```{r global_options, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
```
The dataset has 4898 entries, each described by 13 columns: the ID of the wine (X), 11 variables describing the wine and, finally, the wine’s quality score as rated by at least 3 “wine experts”.
variable | mean | sd | median | min | max | n |
---|---|---|---|---|---|---|
X | 2449.5000000 | 1414.0751394 | 2449.50000 | 1.00000 | 4898.00000 | 4898 |
fixed.acidity | 6.8547877 | 0.8438682 | 6.80000 | 3.80000 | 14.20000 | 4898 |
volatile.acidity | 0.2782411 | 0.1007945 | 0.26000 | 0.08000 | 1.10000 | 4898 |
citric.acid | 0.3341915 | 0.1210198 | 0.32000 | 0.00000 | 1.66000 | 4898 |
residual.sugar | 6.3914149 | 5.0720578 | 5.20000 | 0.60000 | 65.80000 | 4898 |
chlorides | 0.0457724 | 0.0218480 | 0.04300 | 0.00900 | 0.34600 | 4898 |
free.sulfur.dioxide | 35.3080849 | 17.0071373 | 34.00000 | 2.00000 | 289.00000 | 4898 |
total.sulfur.dioxide | 138.3606574 | 42.4980646 | 134.00000 | 9.00000 | 440.00000 | 4898 |
density | 0.9940274 | 0.0029909 | 0.99374 | 0.98711 | 1.03898 | 4898 |
pH | 3.1882666 | 0.1510006 | 3.18000 | 2.72000 | 3.82000 | 4898 |
sulphates | 0.4898469 | 0.1141258 | 0.47000 | 0.22000 | 1.08000 | 4898 |
alcohol | 10.5142670 | 1.2306206 | 10.40000 | 8.00000 | 14.20000 | 4898 |
quality | 5.8779094 | 0.8856386 | 6.00000 | 3.00000 | 9.00000 | 4898 |
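A table like this one can be put together in a few lines of base R. The sketch below is only one possible way to do it, not necessarily how the table was generated: the helper `summarise_columns` and the tiny `toy` data frame (made-up values standing in for the wine data) are assumptions introduced for illustration.

```r
# Hypothetical helper: one row per numeric column, with the same six
# statistics as the table (mean, sd, median, min, max, n).
summarise_columns <- function(df) {
  stats <- sapply(df, function(x) {
    c(mean   = mean(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      min    = min(x, na.rm = TRUE),
      max    = max(x, na.rm = TRUE),
      n      = sum(!is.na(x)))   # n counts non-missing values only
  })
  as.data.frame(t(stats))        # transpose so that variables become rows
}

# Toy data with made-up values, standing in for the real wine data frame.
toy <- data.frame(
  alcohol = c(9.5, 10.1, 11.0, 12.4, NA),
  pH      = c(3.10, 3.25, 3.18, 3.30, 3.05)
)

summary_table <- summarise_columns(toy)
print(summary_table)

# Missing values per column (the "0 missing values" check).
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```

On the real data, the same two calls would be applied to the full data frame instead of `toy`.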
Fixed acidity is measured as the concentration of tartaric acid in the wine. These acids do not evaporate easily. We have 0 missing values.
Bell shaped, with a very small positive skewness.
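To put a number on remarks like "a very small positive skewness", the moment-based sample skewness can be computed directly. This is a minimal sketch, assuming the simple (biased) estimator m3 / m2^(3/2); the function name `sample_skewness` and the two short vectors are made up for illustration and are not wine data.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

# Made-up vectors: one symmetric, one with a long right tail.
symmetric    <- c(1, 2, 3, 4, 5)
right_tailed <- c(1, 2, 3, 4, 10)

skew_symmetric    <- sample_skewness(symmetric)
skew_right_tailed <- sample_skewness(right_tailed)
print(c(symmetric = skew_symmetric, right_tailed = skew_right_tailed))
```

A value near zero means a roughly symmetric shape, a positive value a longer right tail, and a negative value a longer left tail.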
Volatile acidity is measured as the concentration of acetic acid in the wine. At too high a level, it causes an unpleasant, vinegar-like taste. We have 0 missing values.
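Bell shaped, but with a more pronounced skewness here. The mean (0.278) sits above the median (0.26) and the maximum reaches 1.10, which points to a long right tail, that is, a positive skew.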
Found in small quantities, citric acid can add freshness and flavor to wine. We have 0 missing values.
Also bell shaped and a little skewed, with the longer tail on the right (the maximum is 1.66 against a median of 0.32).
Residual sugar is the amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet. We have 0 missing values.
A bimodal shape here, with an extremely high spike at the start (dry wine, with little or no residual sugar), then a second, smaller peak at roughly 8 g/dm^3.
Chlorides measure the amount of salt in the wine. We have 0 missing values.
There are a lot of outliers here, but the main bulk has a balanced bell shape.
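The text does not say how outliers were judged, so the sketch below assumes the usual boxplot rule (points more than 1.5 times the interquartile range beyond the hinges), which base R exposes through `boxplot.stats`. The helper name `count_outliers` and the `toy_chlorides` values are made up for illustration.

```r
# Count points beyond the boxplot whiskers (1.5 * IQR from the hinges).
# This rule is an assumption; other outlier definitions are possible.
count_outliers <- function(x) {
  x   <- x[!is.na(x)]
  out <- boxplot.stats(x)$out    # values lying outside the whiskers
  c(n_outliers = length(out), share = length(out) / length(x))
}

# Made-up values for illustration: a tight bulk plus two large values.
toy_chlorides <- c(0.030, 0.035, 0.040, 0.042, 0.043, 0.045, 0.048, 0.050,
                   0.200, 0.346)

result <- count_outliers(toy_chlorides)
print(result)
```

Running the same helper on each column would make it possible to compare the share of outliers between variables rather than judging it by eye.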
The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine. At free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine. We have 0 missing values.
As with chlorides, the main bulk is bell shaped, but the proportion of outliers here is much lower.
Total sulfur dioxide is the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine. We have 0 missing values.
There are even fewer outliers here than for free sulfur dioxide.
The density of wine is close to that of water, depending on the percent alcohol and sugar content. We have 0 missing values.
Very few outliers, and a numerically very narrow density range. The vast majority of the wines are just a tiny bit less dense than water.
pH describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale. We have 0 missing values.
A classic bell shape, and all wines are within the acidic pH range, with a mean pH around 3.2.
Sulphates are a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant. We have 0 missing values.
Positively skewed bell shape.
The percent alcohol content of the wine. We have 0 missing values.
The alcohol distribution has a heavy positive skewness. The peak is at around 9%, but the bulk of the bottles have a higher alcohol content.
Expert rating for the wine, on a scale from 0 to 10. We have 0 missing values.
The minimum score is 3 and the maximum is 9. The most common score is 6, but more bottles score below 6 than above it.
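The most common score and the balance of bottles on either side of it can be read off a frequency table. This is a minimal sketch: the helper `score_profile` and the `toy_quality` vector (made-up scores, not the wine data) are assumptions for illustration.

```r
# Tabulate scores, find the most common one, and count how many
# observations fall strictly below and strictly above it.
score_profile <- function(quality) {
  quality    <- quality[!is.na(quality)]
  counts     <- table(quality)
  mode_score <- as.numeric(names(counts)[which.max(counts)])
  list(counts     = counts,
       mode       = mode_score,
       below_mode = sum(quality < mode_score),
       above_mode = sum(quality > mode_score))
}

# Made-up scores for illustration only.
toy_quality <- c(3, 5, 5, 5, 6, 6, 6, 6, 7, 8)

profile <- score_profile(toy_quality)
print(profile$counts)
print(unlist(profile[c("mode", "below_mode", "above_mode")]))
```

Note that `which.max` returns the first maximum, so if two scores tie, the lower one is reported as the mode.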
The dataset has 11 variables describing the wine chemically, and one last variable for the wine’s quality as perceived by wine experts, graded from 0 (very bad) to 10 (highest quality). Each wine was evaluated by at least 3 experts. As declared: “Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).”
I honestly would have wished to have the price tag as well.
From the univariate analysis, no variable really stands out. All of them are more or less bell shaped, with different levels of skewness. Intuitively though, the most important factor would be the quality.
I will wait for the correlation calculation to see which variables are most related to quality. I would suspect that acidity, sulfur content and alcohol level would be important factors.
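The correlation step mentioned here could look roughly like the sketch below, assuming Pearson correlation and a ranking by absolute value. The helper `rank_by_correlation` and the `toy` data frame are made up for illustration; the numbers are not taken from the wine data and say nothing about the real relationships.

```r
# Rank variables by the absolute Pearson correlation with a target column.
rank_by_correlation <- function(df, target = "quality") {
  predictors <- setdiff(names(df), target)
  r <- sapply(predictors, function(v) {
    cor(df[[v]], df[[target]], use = "complete.obs")
  })
  r[order(abs(r), decreasing = TRUE)]  # strongest relationship first
}

# Made-up values for illustration only; not taken from the wine data.
toy <- data.frame(
  alcohol = c(9.0, 9.5, 10.0, 11.0, 12.0, 12.5),
  density = c(0.998, 0.997, 0.996, 0.994, 0.992, 0.993),
  pH      = c(3.20, 3.10, 3.30, 3.15, 3.25, 3.18),
  quality = c(5, 5, 6, 6, 7, 7)
)

ranking <- rank_by_correlation(toy)
print(round(ranking, 2))
```

On the real data, the ID column X would need to be dropped first, since it is only a row index.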
Not yet
Alcohol was positively skewed in a very noticeable way, while quality was slightly negatively skewed. There are not many outliers in the data.
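A rough cross-check on these skewness remarks needs only the summary table: Pearson's second skewness coefficient, 3 * (mean - median) / sd. It is a heuristic rather than the moment-based skewness, and it can mislead for discrete variables such as quality, so treat the sketch below as indicative only. The function name `pearson_median_skew` is made up for illustration; the inputs are copied from the summary table.

```r
# Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
pearson_median_skew <- function(mean, median, sd) 3 * (mean - median) / sd

# Values copied from the summary table for four of the variables.
stats <- data.frame(
  variable = c("fixed.acidity", "residual.sugar", "alcohol", "quality"),
  mean     = c(6.8547877, 6.3914149, 10.5142670, 5.8779094),
  median   = c(6.8, 5.2, 10.4, 6.0),
  sd       = c(0.8438682, 5.0720578, 1.2306206, 0.8856386)
)

stats$skew <- with(stats, pearson_median_skew(mean, median, sd))
print(stats[, c("variable", "skew")])
```

With these inputs the coefficient comes out positive for alcohol and negative for quality, in line with the remarks above.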