MS Excel - Exercises - BA Lab Manual
Basics of MS Excel: Features of MS Excel, Worksheets and Workbooks: Definition of Worksheets and
Workbooks, Opening, Labeling and Naming Worksheets and Workbooks, Adding, Deleting and Saving
Worksheets and Workbooks, Format Worksheet Tabs, Reposition Worksheets, Inserting, Deleting, and
Renaming Worksheets, Copy Worksheets.
If you have a workbook that you have already been working on, you can open it from Excel in
three ways:
These methods will close the workbook and leave Excel open.
There are also 3 ways to exit Excel:
To enter data into Excel, click the cell, type your data, and press Enter.
After you press Enter, the cell below the current one becomes the active cell. Other
alternatives are:
• Tab key: Enters the data, and the cell to the right of the current cell becomes the active cell.
• Arrow keys: Enters the data, and the adjacent cell in the direction of the arrow key pressed
becomes the active cell.
• Mouse click: Enters the data, and the cell clicked becomes the active cell.
• Select the row, or a cell in the row below where you want the inserted row to appear. For example, if
you wanted to insert a row between rows 7 and 8, select row 8.
• Select the column, or a cell in the column to the right of where you want the inserted column to appear.
For example, if you wanted to insert a column between columns C and D, select column D.
• Select the cell, or the range of cells where you want to insert the new cells. Select the same number of
cells as you would like to insert.
• In the dialogue box that appears, select the direction in which to shift the surrounding cells.
Deleting rows, columns, and cells
Date:
Suppose you have the following dataset in cells A1 to A10:
1 45
2 32
3 67
4 89
5 22
6 54
7 78
8 43
9 56
10 65
To find the maximum value in this dataset, enter the following formula in a cell:
`=MAX(A1:A10)`
The result should be `89`, which is the highest value in the dataset.
To find the minimum value in the dataset, enter the following formula in a cell:
`=MIN(A1:A10)`
The result should be `22`, which is the lowest value in the dataset.
**AVG (Average Value):**
To find the average (mean) value of the dataset, enter the following formula in a cell:
`=AVERAGE(A1:A10)`
The result should be `55.1`, which is the average value of the dataset.
To find the sum of all values in the dataset, enter the following formula in a cell:
`=SUM(A1:A10)`
The result should be `551`, which is the sum of all values in the dataset.
To calculate the square root of a value, for example the value in cell A3 (which is `67`),
enter the following formula in a cell:
`=SQRT(A3)`
The result should be approximately `8.1853527719`, which is the square root of `67`.
To round a value to a specific number of decimal places, for example the
value in cell A6 (which is `54`) to two decimal places, enter the following formula in a cell:
`=ROUND(A6, 2)`
The result is `54`, because the value is already a whole number. It is displayed as `54.00`
only if the cell is formatted to show two decimal places.
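The worksheet results above can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the ten values from A1:A10 are typed into a list; the variable names are illustrative only.

```python
import math

# Values from cells A1:A10 of the sample dataset.
values = [45, 32, 67, 89, 22, 54, 78, 43, 56, 65]

maximum = max(values)             # counterpart of =MAX(A1:A10)
minimum = min(values)             # counterpart of =MIN(A1:A10)
total = sum(values)               # counterpart of =SUM(A1:A10)
average = total / len(values)     # counterpart of =AVERAGE(A1:A10)
root_a3 = math.sqrt(values[2])    # counterpart of =SQRT(A3)

# Note: Python's round() rounds exact halves to the nearest even digit,
# while Excel's ROUND rounds halves away from zero. They agree for 54.
rounded_a6 = round(values[5], 2)  # counterpart of =ROUND(A6, 2)

print(maximum, minimum, total, average, root_a3, rounded_a6)
```

If a value printed here differs from the worksheet, the usual cause is a mistyped cell or a range that does not cover all ten rows.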
2. (ii) Perform data import/export operations for different file formats.
Date:
The following examples show data import and export operations in Microsoft Excel for
different file formats:
- Suppose you have a CSV file named "data.csv" with the following content:
Carol 28 Chicago
- Open Excel.
- Excel will automatically parse and load the data into a worksheet.
- Suppose you have a text file named "data.txt" with tab-delimited content:
Carol 28 Chicago
- To import this data into Excel, follow the same steps as importing from a CSV file, but
make sure to specify the tab as the delimiter in the Text Import Wizard.
- Suppose you have data in an Excel worksheet, such as the following table in cells A1:C4:
Carol 28 Chicago
- Open a text editor (e.g., Notepad) and paste the data (Ctrl+V).
- Suppose you have data in an Excel worksheet, as shown in the previous example.
- Suppose you have data in an Excel worksheet, as shown in the previous example.
These examples demonstrate how to import data from CSV and text files into Excel and
how to export data from Excel to CSV, PDF, and text files. You can adapt these methods to
your specific data and file format requirements.
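The same import and export idea can be sketched in Python with the standard `csv` module. This is only an illustration: the header names `Name`, `Age`, and `City` are assumptions, since only the "Carol" row is shown in this manual, and the text is held in memory rather than in real files.

```python
import csv
import io

# Hypothetical stand-in for data.csv. Only the "Carol" row comes from the
# manual; the header names are assumed for illustration.
csv_text = "Name,Age,City\nCarol,28,Chicago\n"

# Import: parse comma-separated text into rows (lists of strings).
rows = list(csv.reader(io.StringIO(csv_text)))

# Export: write the same rows as tab-delimited text, like data.txt.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
tab_text = buffer.getvalue()

print(rows)
print(tab_text)
```

Choosing the delimiter explicitly mirrors the delimiter choice made in the Text Import Wizard.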
EXP3. Perform statistical operations: Mean, Median, Mode,
Standard deviation, Variance, Skewness, and Kurtosis
Date:
**Sample Dataset:**
1 45
2 32
3 67
4 89
5 22
6 54
7 78
8 43
9 56
10 65
**Mean (Average):**
To calculate the mean (average) of the dataset, enter the following formula in a cell:
`=AVERAGE(A1:A10)`
**Median:**
To calculate the median of the dataset, enter the following formula in a cell:
`=MEDIAN(A1:A10)`
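**Mode:**
To calculate the mode of the dataset, enter the following formula in a cell. `MODE.SNGL` is an
ordinary formula, so pressing `Enter` is enough (only `MODE.MULT` needs array entry). Because no
value repeats in the sample dataset, the formula returns `#N/A` here: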
`=MODE.SNGL(A1:A10)`
**Standard Deviation:**
To calculate the sample standard deviation of the dataset, enter the following formula in a
cell:
`=STDEV.S(A1:A10)`
**Variance:**
To calculate the sample variance of the dataset, enter the following formula in a cell:
`=VAR.S(A1:A10)`
**Skewness:**
To calculate the skewness of the dataset, enter the following formula in a cell:
`=SKEW(A1:A10)`
**Kurtosis:**
To calculate the kurtosis of the dataset, enter the following formula in a cell:
`=KURT(A1:A10)`
The result should be approximately `-0.48`, indicating that the distribution is platykurtic
(less peaked than a normal distribution).
These formulas can be used to perform these statistical operations on your own dataset in
Excel. Simply replace `A1:A10` with the range of cells containing your data.
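As a cross-check on the worksheet results, here is a minimal Python sketch of the same statistics. It assumes the bias-corrected sample formulas that Excel documents for `SKEW` and `KURT`; the variable names are illustrative.

```python
import statistics

# Values from cells A1:A10 of the sample dataset.
values = [45, 32, 67, 89, 22, 54, 78, 43, 56, 65]
n = len(values)

mean = statistics.mean(values)          # counterpart of =AVERAGE(A1:A10)
median = statistics.median(values)      # counterpart of =MEDIAN(A1:A10)
modes = statistics.multimode(values)    # lists every value when none repeats
stdev = statistics.stdev(values)        # counterpart of =STDEV.S(A1:A10)
variance = statistics.variance(values)  # counterpart of =VAR.S(A1:A10)

# Standardized deviations, using the sample standard deviation.
z = [(x - mean) / stdev for x in values]

# Bias-corrected sample skewness (the formula documented for SKEW).
skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Bias-corrected sample excess kurtosis (the formula documented for KURT).
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(mean, median, modes, stdev, variance, skew, kurt)
```

Because every value in this dataset occurs once, `multimode` returns all ten values, which corresponds to Excel reporting `#N/A` for the mode.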
EXP4. Perform Z-test, T-test, and ANOVA
Date:
Let's perform Z-tests, T-tests, and ANOVA with this data:
**Sample Data:**
Method A  Method B  Method C
85 88 92
89 85 95
90 91 93
87 86 91
84 87 94
88 89 92
85 92 89
86 88 94
90 86 93
89 90 90
**Z-Test Example:**
Suppose we want to perform a Z-test to compare the mean test scores of Method A to a
known population mean of 85.
2. Calculate the population standard deviation (if known). For example, let's assume it's 4.0.
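The Z-test calculation described above can be sketched in Python as follows. This is a minimal illustration, assuming the Method A scores are the first column of the sample data and using the population mean (85) and standard deviation (4.0) stated in the text.

```python
import math
from statistics import NormalDist, mean

# Method A scores (first column of the sample data).
method_a = [85, 89, 90, 87, 84, 88, 85, 86, 90, 89]
population_mean = 85   # known population mean from the text
population_sd = 4.0    # assumed population standard deviation from the text

sample_mean = mean(method_a)
standard_error = population_sd / math.sqrt(len(method_a))
z_score = (sample_mean - population_mean) / standard_error

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))

print(f"z = {z_score:.3f}, p = {p_value:.3f}")
```

With these inputs the z-score is about 1.82, so a two-tailed test at the 5% level (critical value 1.96) would not reject the hypothesis that the mean is 85.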
**T-Test Example:**
Suppose we want to perform a T-test to compare the mean test scores of Method B to a
known or assumed population mean of 85 (population standard deviation is
unknown).
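A one-sample T-test statistic for this comparison can be sketched in Python using only the standard library. The sketch assumes the Method B scores are the second column of the sample data. It stops at the t statistic and degrees of freedom because the standard library has no t-distribution function.

```python
import math
from statistics import mean, stdev

# Method B scores (second column of the sample data).
method_b = [88, 85, 91, 86, 87, 89, 92, 88, 86, 90]
hypothesized_mean = 85

n = len(method_b)
sample_mean = mean(method_b)
sample_sd = stdev(method_b)  # sample standard deviation (n - 1 divisor)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_stat = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))
degrees_of_freedom = n - 1

print(f"t = {t_stat:.3f}, df = {degrees_of_freedom}")
```

The statistic can then be compared with a t-table value. For 9 degrees of freedom and a two-tailed 5% level the critical value is 2.262.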
**ANOVA Example:**
1. Organize the data in Excel as shown above (Method A, Method B, Method C in separate
columns).
- `A1:A10`, `B1:B10`, and `C1:C10` represent the ranges of test scores for each method.
This formula will perform a one-way ANOVA and provide information on whether there are
statistically significant differences between the teaching methods.
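The one-way ANOVA F statistic can also be computed by hand, which shows what the test measures. The following Python sketch assumes the three columns of the sample data are Method A, Method B, and Method C; it computes the F statistic only, not the p-value.

```python
from statistics import mean

# Scores for each teaching method (columns of the sample data).
groups = {
    "Method A": [85, 89, 90, 87, 84, 88, 85, 86, 90, 89],
    "Method B": [88, 85, 91, 86, 87, 89, 92, 88, 86, 90],
    "Method C": [92, 95, 93, 91, 94, 92, 89, 94, 93, 90],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Variation of the scores around their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}")
```

A large F value relative to the critical value for (2, 27) degrees of freedom (about 3.35 at the 5% level) suggests that at least one method mean differs from the others.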
5. Perform Data Preprocessing operations
i) Handling Missing Data
ii) Normalization
Date:
**Sample Dataset:**
Suppose you have a dataset of test scores and ages, and there are some missing values.
85 25
89 30
90 28
87 32
84 27
88 35
Missing
86 29
90 33
89 26
1. You can choose to remove rows with missing data (rows without age values) if the
missing values are sparse.
- Filter the "Age" column and uncheck the box for "Blanks."
This hides (rather than deletes) the rows with missing age values.
1. To fill in the missing values in the "Age" column with the average age, you can create a
new column for the filled values.
- In cell C2, use the following formula to fill missing values with the average age:
Now, the missing values in the "Age" column are filled with the average age.
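Mean imputation as described above can be sketched in Python. This is a minimal illustration, assuming the Age column of the sample dataset is held in a list with `None` marking the missing entry.

```python
from statistics import mean

# Age column from the sample dataset; None marks the missing value.
ages = [25, 30, 28, 32, 27, 35, None, 29, 33, 26]

# Average of the values that are present.
known_ages = [a for a in ages if a is not None]
average_age = mean(known_ages)

# Replace each missing entry with the average; keep the rest unchanged.
filled_ages = [average_age if a is None else a for a in ages]

print(round(average_age, 2))
print(filled_ages)
```

The average is computed from the nine known ages only, so the missing entry does not distort it.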
**ii) Normalization:**
**Min-Max Normalization:**
1. To normalize the "Test Scores" and "Age" columns using Min-Max normalization to the
range of 0 to 1, create new columns for the normalized values.
- Test Scores: `=(A2-MIN($A$2:$A$11))/(MAX($A$2:$A$11)-MIN($A$2:$A$11))`
- Age: `=(B2-MIN($B$2:$B$11))/(MAX($B$2:$B$11)-MIN($B$2:$B$11))`
Now, the "Test Scores" and "Age" columns are normalized to the range of 0 to 1.
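The Min-Max formula can be expressed as a small Python function. The sketch below applies it to the nine known ages from the sample dataset (the row with the missing age is left out); the function name `min_max_normalize` is illustrative.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    # Assumes the values are not all identical (otherwise this divides by zero).
    return [(v - lowest) / (highest - lowest) for v in values]


# Known ages from the sample dataset (missing row excluded).
ages = [25, 30, 28, 32, 27, 35, 29, 33, 26]
normalized_ages = min_max_normalize(ages)

print(normalized_ages)
```

Here the youngest age (25) maps to 0, the oldest (35) maps to 1, and an age of 30 maps to 0.5.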
These examples demonstrate how to handle missing data and perform Min-Max
normalization in Excel with sample data. You can adapt these methods to your specific
dataset and requirements.
6. Perform Dimensionality reduction operation using PCA,
KPCA and SVD
Date:
**Sample Dataset:**
Suppose you have a dataset with three features (variables): "Feature 1," "Feature 2," and
"Feature 3." We will perform PCA on this dataset.
| Sample | Feature 1 | Feature 2 | Feature 3 |
|--------|-----------|-----------|-----------|
|A | 85 | 88 | 92 |
|B | 89 | 85 | 95 |
|C | 90 | 91 | 93 |
|D | 87 | 86 | 91 |
|E | 84 | 87 | 94 |
|F | 88 | 89 | 92 |
|G | 85 | 92 | 89 |
|H | 86 | 88 | 94 |
|I | 90 | 86 | 93 |
|J | 89 | 90 | 90 |
**PCA Example:**
Calculate the mean for each feature (columns B, C, and D). These are the sample means for
Feature 1, Feature 2, and Feature 3.
- Mean(Feature 1) = `=AVERAGE(B2:B11)`
- Mean(Feature 2) = `=AVERAGE(C2:C11)`
- Mean(Feature 3) = `=AVERAGE(D2:D11)`
Create new columns (E, F, and G) to center the data by subtracting the corresponding
means from the original data. For example, cell E2 should contain `=B2-AVERAGE(B$2:B$11)`.
Calculate the covariance matrix for the centered data (columns E, F, and G). You can use
Excel's built-in `COVARIANCE.S` function (sample covariance; the older `COVAR` function returns
the population covariance). For example, `=COVARIANCE.S(E2:E11, F2:F11)` calculates
the covariance between Feature 1 (E) and Feature 2 (F).
Excel doesn't have built-in functions to directly calculate eigenvectors and eigenvalues, so
you may need to use other software or programming languages for this step. The eigenvectors
give the directions of maximum variance in your data, and the eigenvalues give the variance
along each of those directions.
5. **Choose Principal Components:**
Select the top-k eigenvectors corresponding to the largest eigenvalues. Typically, you
might choose a subset of the eigenvectors, reducing the dimensionality to k dimensions.
Multiply your centered data by the selected principal components to project your data
into the new coordinate system defined by the principal components.
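Since the eigenvalue step has to be done outside Excel, the PCA steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the ten rows of the sample dataset are typed in as an array, and `k = 2` components are kept as an arbitrary choice.

```python
import numpy as np

# Feature 1, Feature 2, Feature 3 for the ten samples in the table.
X = np.array([
    [85, 88, 92], [89, 85, 95], [90, 91, 93], [87, 86, 91], [84, 87, 94],
    [88, 89, 92], [85, 92, 89], [86, 88, 94], [90, 86, 93], [89, 90, 90],
], dtype=float)

# Center each feature by subtracting its mean.
centered = X - X.mean(axis=0)

# Sample covariance matrix (n - 1 divisor); columns are the variables.
cov = np.cov(centered, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns ascending eigenvalues; reorder to descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the top-k components and project the centered data onto them.
k = 2  # assumed choice for illustration
components = eigenvectors[:, :k]
projected = centered @ components

explained_ratio = eigenvalues / eigenvalues.sum()
print(explained_ratio)
print(projected.shape)
```

`explained_ratio` gives the share of total variance captured by each component, which is a common basis for choosing k.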
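**KPCA Example:**
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
np.random.seed(0)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05)  # assumed stand-in: the data step is missing from this manual
# Apply KPCA
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)  # gamma=10 is an assumed value
plt.scatter(X[:, 0], X[:, 1], c=y); plt.title("Original Data")
plt.show()
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y); plt.title("Data after KPCA")
plt.show()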
In this example:
2. We apply KPCA with a radial basis function (RBF) kernel (commonly used for non-linear
data).
This Python example illustrates the concept of KPCA and how it can transform non-linear
data into a lower-dimensional space. In practice, you would typically use more complex and
real-world datasets with KPCA to capture and represent non-linear relationships in the
data.
**Sample Dataset:**
1 85 88 92
2 89 85 95
3 90 91 93
4 87 86 91
5 84 87 94
6 88 89 92
7 85 92 89
8 86 88 94
9 90 86 93
10 89 90 90
**SVD Example:**
Calculate the mean for each column (A, B, C) to center the data by subtracting the means
from each data point.
- Mean(A) = `=AVERAGE(A1:A10)`
- Mean(B) = `=AVERAGE(B1:B10)`
- Mean(C) = `=AVERAGE(C1:C10)`
Create new columns (D, E, F) to center the data. For example, cell D1 should contain
`=A1-AVERAGE(A$1:A$10)`.
D E F
1 -2.3 -0.2 -0.3
2 1.7 -3.2 2.7
3 2.7 2.8 0.7
4 -0.3 -2.2 -1.3
5 -3.3 -1.2 1.7
6 0.7 0.8 -0.3
7 -2.3 3.8 -3.3
8 -1.3 -0.2 1.7
9 2.7 -2.2 0.7
10 1.7 1.8 -2.3
Excel has no built-in SVD function, so the decomposition cannot be completed with worksheet
formulas alone. Assuming you have your centered data in columns D, E, and F, you can prepare it
as follows:
- In a new area of the spreadsheet, select a 3×3 range and enter
`=MMULT(TRANSPOSE(D1:F10), D1:F10)` (confirm with `Ctrl+Shift+Enter` in Excel versions without
dynamic arrays). This computes the matrix XᵀX for the centered data X.
- The singular values (Σ) are the square roots of the eigenvalues of this matrix, and its
eigenvectors are the right singular vectors (V). Excel has no eigenvalue function, so this
step requires an add-in or external software.
Please note that in practice, SVD calculations for larger datasets are typically performed
using specialized software or programming languages with dedicated libraries like NumPy
in Python or SVD functions in R. The above example provides a simplified illustration of
how SVD might be implemented in Excel for a small dataset.
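For comparison, the same decomposition takes a few lines with NumPy. This is a minimal sketch, assuming the ten rows of the sample dataset are typed in as an array and that two components are kept for the reduced representation.

```python
import numpy as np

# Columns A, B, C of the sample dataset.
X = np.array([
    [85, 88, 92], [89, 85, 95], [90, 91, 93], [87, 86, 91], [84, 87, 94],
    [88, 89, 92], [85, 92, 89], [86, 88, 94], [90, 86, 93], [89, 90, 90],
], dtype=float)

# Center each column by subtracting its mean.
centered = X - X.mean(axis=0)

# Thin SVD: centered = U @ diag(singular_values) @ Vt
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)

# Multiplying the factors back together recovers the centered data.
reconstructed = (U * singular_values) @ Vt

# Reduced representation using the k largest singular values.
k = 2  # assumed choice for illustration
reduced = U[:, :k] * singular_values[:k]

print(singular_values)
print(reduced.shape)
```

NumPy returns the singular values in descending order, so the first k columns always correspond to the directions that carry the most variation.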
Performing bivariate and multivariate analysis in Microsoft Excel involves exploring the
relationships and interactions between two or more variables in your dataset. Let's use a
sample dataset to demonstrate these analyses.
**Sample Dataset:**
Suppose you have a dataset with the following information about individuals: "Age,"
"Income," and "Spending." We will use this dataset to perform bivariate and multivariate
analyses.
| Person | Age | Income | Spending |
|--------|-----|--------|----------|
|A | 30 | 50000 | 1000 |
|B | 35 | 60000 | 1200 |
|C | 28 | 45000 | 900 |
|D | 40 | 75000 | 1500 |
|E | 25 | 40000 | 800 |
|F | 45 | 80000 | 1600 |
|G | 32 | 55000 | 1100 |
|H | 38 | 68000 | 1360 |
|I | 29 | 48000 | 960 |
|J | 42 | 72000 | 1440 |
**Bivariate Analysis:**
1. **Correlation Analysis:**
You can calculate the correlation between two variables (e.g., Age and Income, Age and
Spending, Income and Spending) to measure the strength and direction of the relationship.
- To calculate the correlation between Age and Income, you can use the formula
`=CORREL(B2:B11, C2:C11)` in Excel.
- To calculate the correlation between Age and Spending, you can use the formula
`=CORREL(B2:B11, D2:D11)` in Excel.
- To calculate the correlation between Income and Spending, you can use the formula
`=CORREL(C2:C11, D2:D11)` in Excel.
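The Pearson correlation that `CORREL` reports can be reproduced with a short Python function. The sketch below assumes the three columns of the sample dataset are typed in as lists; the function name `pearson` is illustrative.

```python
import math

age = [30, 35, 28, 40, 25, 45, 32, 38, 29, 42]
income = [50000, 60000, 45000, 75000, 40000, 80000, 55000, 68000, 48000, 72000]
spending = [1000, 1200, 900, 1500, 800, 1600, 1100, 1360, 960, 1440]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_age_income = pearson(age, income)
r_age_spending = pearson(age, spending)
r_income_spending = pearson(income, spending)

print(r_age_income, r_age_spending, r_income_spending)
```

In this sample every Spending value is exactly 2% of Income, so the Income and Spending correlation is 1 and the two Age correlations are identical.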
**Multivariate Analysis:**
1. **Scatter Plots:**
Create scatter plots to visualize the relationships between multiple variables. Excel has no
built-in scatter plot matrix, so create one scatter chart for each pair of variables (Age and
Income, Age and Spending, Income and Spending).
- Select the two columns to compare, go to the "Insert" tab in Excel, and choose "Scatter" from
the "Charts" group.
Arranging the charts side by side shows the relationships between pairs of
variables.
2. **Regression Analysis:**
- Use Excel's regression analysis tool, which is part of the Analysis ToolPak add-in and must
be enabled first. Click on "Data Analysis" in the "Data" tab, and select
"Regression." Specify your dependent and independent variables to perform the analysis.
Regression analysis helps you model and predict the impact of multiple variables on an
outcome.
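A multiple regression of this kind can be sketched with NumPy's least-squares solver. This is a minimal illustration, assuming Spending is the dependent variable and Age and Income are the independent variables; that choice of variables is an assumption, not something this manual specifies.

```python
import numpy as np

age = np.array([30, 35, 28, 40, 25, 45, 32, 38, 29, 42], dtype=float)
income = np.array([50000, 60000, 45000, 75000, 40000,
                   80000, 55000, 68000, 48000, 72000], dtype=float)
spending = np.array([1000, 1200, 900, 1500, 800,
                     1600, 1100, 1360, 960, 1440], dtype=float)

# Design matrix: an intercept column plus the two predictors.
X = np.column_stack([np.ones_like(age), age, income])

# Ordinary least squares fit of spending on age and income.
coefficients, *_ = np.linalg.lstsq(X, spending, rcond=None)
intercept, coef_age, coef_income = coefficients

# R-squared: share of the variation in spending explained by the model.
predicted = X @ coefficients
ss_residual = ((spending - predicted) ** 2).sum()
ss_total = ((spending - spending.mean()) ** 2).sum()
r_squared = 1 - ss_residual / ss_total

print(intercept, coef_age, coef_income, r_squared)
```

Because Spending is exactly 2% of Income in this sample, the fit is perfect: the Income coefficient comes out at 0.02 and the Age coefficient is effectively zero. Real data would not behave this neatly.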
These are basic examples of bivariate and multivariate analyses in Excel using a small
dataset. More advanced and complex analyses can be performed depending on the nature
of your data and research objectives.
**Sample Dataset:**
For this example, we'll use the same dataset we've used previously:
| Person | Age | Income | Spending |
|--------|-----|--------|----------|
|A | 30 | 50000 | 1000 |
|B | 35 | 60000 | 1200 |
|C | 28 | 45000 | 900 |
|D | 40 | 75000 | 1500 |
|E | 25 | 40000 | 800 |
|F | 45 | 80000 | 1600 |
|G | 32 | 55000 | 1100 |
|H | 38 | 68000 | 1360 |
|I | 29 | 48000 | 960 |
|J | 42 | 72000 | 1440 |
**1. Scatter Plot:**
Create a scatter plot to visualize the relationship between two variables, such as Age and
Income.
This will create a scatter plot showing the relationship between Age and Income.
**2. Histogram:**
You can customize the number of bins and other settings to create the histogram.
**3. Bar Chart:**
Create a bar chart to compare categorical data or discrete values, such as Spending by
Person.
**4. Line Chart:**
Create a line chart to visualize trends over time or across sequential data, such as
Spending plotted for each Person in order.
This will create a line chart showing how Spending changes from one Person to the next.
These are some basic examples of plotting functions in Excel. You can further customize
and explore various chart options to better understand and visualize your dataset. Excel
offers a wide range of chart types and customization features to create meaningful and
informative visualizations for your data.