
Individual Assignment

Tanzila Siddique
MBA 739

3.2
3.3 i)
3.3 ii)

4.4 Breakfast Cereal


4.1 a) Quantitative/Numeric Variables: From the dataset, the quantitative (numeric) variables are:

1. Calories
2. Protein
3. Fat
4. Sodium
5. Fiber
6. Carbohydrates
7. Sugars
8. Potassium
9. Vitamins

b)
1. Which variables have the largest variability?

Based on the standard deviations from the summary statistics and the spread in the histograms:

• Sodium has the largest variability, with a standard deviation of 85.03.
• Potassium also shows considerable variability, with a standard deviation of 84.98.
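The ranking above can be reproduced by computing the sample standard deviation of each numeric column and sorting. This is a minimal sketch, assuming the data are held as a Python dict that maps column names to lists of values, with None marking a missing entry; the column names and numbers in the example are illustrative placeholders, not the actual cereal data.

```python
import statistics


def rank_by_std(columns):
    """Return (variable, sample standard deviation) pairs, largest first.

    `columns` maps a variable name to a list of numbers; None entries
    (missing values) are skipped before computing the spread.
    """
    spreads = {}
    for name, values in columns.items():
        clean = [v for v in values if v is not None]
        # statistics.stdev uses the n - 1 (sample) denominator.
        spreads[name] = statistics.stdev(clean)
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only (hypothetical, not the real cereal dataset).
example = {
    "calories": [100, 110, 120],
    "sodium": [0, 150, 300],
    "potass": [30, None, 90],
}

for name, sd in rank_by_std(example):
    print(f"{name}: {sd:.2f}")
```

Because a standard deviation is expressed in each variable's own units, variables recorded in milligrams (in the usual version of this dataset, sodium and potassium) will tend to show larger values than those recorded in grams, so a unit-free measure such as the coefficient of variation (standard deviation divided by the mean) may give a fairer comparison.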

2. Which variables seem skewed?

Variables that appear skewed based on the histograms:

• Protein, Fat, Sodium, Fiber, Sugars, and Potassium all show some degree of skewness. Specifically:
o Protein and Potassium are skewed towards higher values.
o Fat, Sodium, Fiber, and Sugars are skewed towards lower values.
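Visual impressions of skew from a histogram can be checked with a numeric skewness coefficient. The sketch below uses the moment coefficient g1 = m3 / m2^(3/2), one common definition among several (statistical packages differ slightly in their small-sample adjustments). A positive value indicates a long tail toward higher values and a negative value a long tail toward lower values. The cut-off used to label a variable and the input lists are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments (divided by n).
    Positive -> long right tail; negative -> long left tail.
    Missing values (None) are ignored; a constant column is not supported.
    """
    clean = [v for v in values if v is not None]
    n = len(clean)
    mean = sum(clean) / n
    m2 = sum((v - mean) ** 2 for v in clean) / n
    m3 = sum((v - mean) ** 3 for v in clean) / n
    return m3 / m2 ** 1.5


def skew_direction(values, tol=0.5):
    """Label a variable using a hypothetical cut-off of |g1| >= tol."""
    g1 = skewness(values)
    if g1 >= tol:
        return "right-skewed"
    if g1 <= -tol:
        return "left-skewed"
    return "roughly symmetric"


print(round(skewness([1, 1, 1, 1, 10]), 3))
print(skew_direction([10, 10, 10, 10, 1]))
```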

3. Are there any values that seem extreme?

From the histograms and summary statistics:

• Sodium has values reaching up to 290, which might be considered extreme depending on the context (e.g., dietary recommendations).
• Potassium also has values reaching up to 330, which could be considered high depending on the recommended daily intake.
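Whether a value counts as extreme can also be judged statistically rather than against dietary guidelines. A widely used rule of thumb, Tukey's fences (an assumption here, not necessarily how the values above were identified), flags any value lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The readings in the example are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Quartiles come from statistics.quantiles with method="inclusive",
    i.e. linear interpolation between order statistics.
    None entries (missing values) are ignored.
    """
    clean = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < low or v > high]


# Hypothetical readings: Q1 = 12.5, Q3 = 17.5, so the fences are 5 and 25.
print(iqr_outliers([10, 12, 14, 16, 18, 100]))  # [100]
```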

The plot of calories for hot and cold cereals gives a clear comparison of the two calorie distributions, which helps in understanding their nutritional differences.

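The hot-versus-cold comparison can also be made concrete in numbers by computing each group's five-number summary (minimum, first quartile, median, third quartile, maximum), the same quantities a side-by-side boxplot displays. This sketch assumes each record is a dict with a `type` field coded "H" for hot and "C" for cold plus a `calories` field; those field names and the records themselves are illustrative assumptions.

```python
import statistics
from collections import defaultdict


def summary_by_group(records, group_key, value_key):
    """Return {group: (min, Q1, median, Q3, max)} for one numeric field.

    Records whose value is None are skipped; each group needs at least
    two values for statistics.quantiles to work.
    """
    groups = defaultdict(list)
    for row in records:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    summary = {}
    for group, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[group] = (min(values), q1, median, q3, max(values))
    return summary


# Hypothetical records, not taken from the actual dataset.
cereals = [
    {"type": "C", "calories": 110},
    {"type": "C", "calories": 120},
    {"type": "C", "calories": 70},
    {"type": "H", "calories": 100},
    {"type": "H", "calories": 100},
    {"type": "H", "calories": 110},
]

for cereal_type, five_num in sorted(summary_by_group(cereals, "type", "calories").items()):
    print(cereal_type, five_num)
```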
Based on the correlation matrix computed from the cereal dataset, the pair of variables that is most
strongly correlated is Fiber and Potassium, with a correlation coefficient of approximately 0.7285. This
indicates a strong positive correlation between the amount of fiber and potassium in cereals.
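A correlation matrix like the one described can be scanned programmatically for its strongest pair. The sketch below computes Pearson's r for every pair of columns, using only rows where both values are present (one of several possible ways to handle missing data, assumed here), and keeps the pair with the largest absolute coefficient. The column values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def strongest_pair(columns):
    """Return ((name_a, name_b), r) for the pair with the largest |r|."""
    best = None
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if best is None or abs(r) > abs(best[1]):
            best = ((a, b), r)
    return best


# Hypothetical columns, not the real cereal data.
example = {
    "fiber": [1, 2, 3, 4],
    "potass": [2, 4, 6, 8],
    "sugars": [4, 1, 3, 2],
}
pair, r = strongest_pair(example)
print(pair, round(r, 4))
```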

To reduce the number of variables based on correlations, consider the following strategies:
1. Identify Highly Correlated Pairs: Look for pairs of variables with high absolute correlation coefficients, whether positive or negative. Highly correlated variables carry largely redundant information, so keeping both may add little value to the analysis.
o For example, in the cereal dataset, Fiber and Potassium are highly correlated. Keeping both may be redundant, so depending on the goals of the analysis, only one of them could be retained.
2. Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that can be
used to transform a set of correlated variables into a smaller set of uncorrelated variables (principal
components).
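Strategy 1 can be automated with a simple greedy filter: walk through the variables in order and keep one only if its absolute correlation with every variable already kept is below a chosen cut-off. The threshold of 0.7 and the example columns are hypothetical choices for illustration. Which member of a correlated pair to keep is really an analysis decision; this sketch simply keeps whichever comes first.

```python
import math


def pearson(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.7):
    """Greedy filter: return (kept, dropped) lists of variable names."""
    kept, dropped = [], []
    for name in columns:  # dicts preserve insertion order
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical columns, not the real cereal data.
example = {
    "fiber": [1, 2, 3, 4],
    "potass": [2, 4, 6, 8],
    "sugars": [4, 1, 3, 2],
}
print(drop_correlated(example))
```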
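Strategy 2 can be illustrated without any external library. This is a minimal sketch, assuming no missing values and no constant columns: it standardizes each column, forms the correlation matrix, and uses power iteration to find the first principal component. It returns the component's loadings, the share of total variance it explains (its eigenvalue divided by the number of variables, since the trace of a correlation matrix equals that number), and each row's component score. In practice a tested routine such as R's `prcomp()` or scikit-learn's `PCA` class would normally be used; the example columns are hypothetical.

```python
import math


def standardize(values):
    """Convert a column to z-scores using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_principal_component(columns, iterations=500):
    """Leading principal component of the correlation matrix.

    Returns (loadings, share_of_variance, scores).
    """
    names = list(columns)
    z = [standardize(columns[name]) for name in names]
    n, p = len(z[0]), len(names)
    # Correlation matrix: cross-products of z-scores divided by n - 1.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Power iteration. An uneven start vector avoids, e.g., starting exactly
    # on the eigenvector [1, 1] of a 2x2 matrix [[1, r], [r, 1]].
    vec = [1.0 + i for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the matching eigenvalue.
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(p) for j in range(p))
    scores = [sum(vec[k] * z[k][row] for k in range(p)) for row in range(n)]
    return dict(zip(names, vec)), eigenvalue / p, scores


# Hypothetical columns, not the real cereal data.
example = {
    "fiber": [1, 2, 3, 4],
    "potass": [2, 4, 6, 8],
    "sugars": [4, 1, 3, 2],
}
loadings, share, scores = first_principal_component(example)
print(loadings)
print(f"share of variance explained: {share:.3f}")
```

Here the single score column could stand in for the three original variables, at the cost of the variance share the first component leaves unexplained.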

5.1
[1] 0.09052632
