Business Analytics Assignment: Neha Singh
ASSIGNMENT
IN LIEU OF MID-TERM EXAMINATION
NEHA SINGH
PGDM Batch 2019-21
Roll No. & Section 10, Sec.’A’
Q-1 How do you compute cumulative relative frequencies in an Excel sheet? Take an
example to support your answer and structure the process.
SOLUTION:
To compute the cumulative relative frequency, we first have to find the cumulative frequency
and the relative frequency.
The data in the table represent the tuition for all 2-year community colleges in a region in
2009-2010.
Tuition (dollars)    Number of community colleges
775-799              20
800-824              67
825-849              15
850-874              5
875-899              0
900-924              0
925-949              0
950-974              2
Step 2: Add a third column to your frequency chart. Title it “Cumulative Frequency.”
Step 3: Type the formula “=C2” (where C2 is the actual location of your first frequency count)
in the first row of your new column.
Step 4: Type the formula “=SUM(C$2:C3)” (where C2 is the location of your first frequency
count and C3 is the location of your second frequency count; the $ keeps the starting row
fixed when the formula is copied down) in the second row of your new column.
Step 5: Click the cell in which you entered the formula in Step 4. Click and drag the little
black square in the bottom right-hand corner of the cell to the bottom of the column. Excel
will populate the cells with all of the remaining values.
After calculating the cumulative frequency, we have to find the relative frequency.
Step 2: Type the formula =C2/C$12 in cell E2 (where C2 is the first frequency count and C12
is the cell holding the total of all frequencies), then press the Enter key.
Step 3: Click the cell in which you entered the formula in Step 2. Click and drag the little
black square in the bottom right-hand corner of the cell to the bottom of the column. Excel
will populate the cells with all of the remaining values.
The steps for cumulative relative frequency are:
Step 1: Copy the first relative frequency value as it is into cell F2.
Step 2: Type the formula =SUM(E2:E3) in cell F3, then press the Enter key, OR type =SUM,
select cells E2 to E3, and press the Enter key.
Step 3: Apply the same formula with the range extended by one row: type =SUM(E2:E4) in
cell F4, then press the Enter key, OR type =SUM, select cells E2 to E4, and press the Enter
key. Do the same for all remaining rows to find the cumulative relative frequency.
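The same calculation can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the Excel procedure itself, that uses the frequencies from the tuition table above. The variable names are illustrative assumptions.

```python
from itertools import accumulate

# Tuition classes and frequencies copied from the table above
classes = ["775-799", "800-824", "825-849", "850-874",
           "875-899", "900-924", "925-949", "950-974"]
frequencies = [20, 67, 15, 5, 0, 0, 0, 2]

total = sum(frequencies)                     # plays the role of the total cell
cumulative = list(accumulate(frequencies))   # running total of the frequencies
relative = [f / total for f in frequencies]  # each frequency divided by the total

# Dividing the running total by the grand total gives the same values as
# summing the relative frequencies, with less rounding error.
cumulative_relative = [c / total for c in cumulative]

for row in zip(classes, frequencies, cumulative, relative, cumulative_relative):
    print("{:<8} {:>4} {:>4} {:>7.4f} {:>7.4f}".format(*row))
```

The last cumulative relative frequency should always equal 1, which is a quick way to confirm that the Excel column was filled correctly.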
Q-2 You are asked to analyze the impact of Covid 19 (Novel Coronavirus) on economy of
India for which you need to conduct a survey on individuals in Delhi city. You are
required to ask the following:
• Gender
• Education
• Ethnicity
• Name
• Age
• Length of residency
• Factors affecting Economy (using a scale of 1–5, going from poor to excellent)
SOLUTION:
1) What types of data (categorical, ordinal, interval, or ratio) would each of the survey
items represent and why?
Gender
It is categorical (nominal) data. Nominal scales are often called qualitative scales, and
measurements made on them are called qualitative data.
Education
It is ordinal data when recorded as the highest level completed (for example school,
graduate, postgraduate), because the levels have a natural order but the gaps between them
are not equal. It would be ratio data only if recorded as the number of years of schooling.
Ethnicity
It is covered under the nominal scale. The nominal type differentiates between items or
subjects based only on their names or categories and other qualitative classifications they
belong to, so ethnic groups are labels with no order among them.
Name
It is also covered under the nominal scale, because nominal data are used to label
variables without any quantitative value.
Age
It is ratio data. Age has a true zero and equal intervals, so statements such as "twice
as old" are meaningful. In this case no age categories are given, so age stays on the
ratio scale.
Length of residency
It is also a ratio scale. The ratio scale refers to the level of measurement in which the
attributes composing a variable are measured as numerical values with a true zero; a length
of residency of zero years means no residency at all.
Factors affecting Economy (using a scale of 1–5, going from poor to excellent)
With the 1–5 scale it is categorized as ordinal data, because it is a Likert-type scale:
the ratings have a clear order from poor to excellent, but the gaps between them are not
necessarily equal.
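The scale type decides which summary statistic makes sense for an item. The following Python sketch illustrates this with one common choice per scale (mode for nominal, median for ordinal, mean for ratio). The responses are hypothetical values invented only for illustration, and the field names are assumptions.

```python
from statistics import mean, median, mode

# Hypothetical responses, invented only to illustrate the scale types
responses = [
    {"gender": "F", "age": 24, "rating": 4},
    {"gender": "M", "age": 31, "rating": 2},
    {"gender": "F", "age": 45, "rating": 4},
    {"gender": "F", "age": 28, "rating": 3},
]

# Scale assigned to three of the survey items, following the answers above
scales = {"gender": "nominal", "rating": "ordinal", "age": "ratio"}

# One common choice of summary per scale: mode, median, mean
summarizers = {"nominal": mode, "ordinal": median, "ratio": mean}

summary = {
    item: summarizers[scale]([r[item] for r in responses])
    for item, scale in scales.items()
}
print(summary)
```

Taking a mean of gender codes would be meaningless, which is why the scale has to be identified before choosing the analysis.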
2) What analytical tools would you use on this data to analyze it?
When we talk about analysis tools, three names come to mind: MS-Excel, SPSS and SAS.
A comparison between Excel, SPSS and SAS is given below:
               Excel                    SPSS                     SAS
General Use    Easy to learn and use    Easy to learn and use    Hard to learn and use
Next, consider the popularity of these tools by job function, based on job market data.
SAS
SAS is mostly used in the business analytics and intelligence industry. The IT software
functional area comes second in the list, followed by KPO/BPO, clinical research, finance, etc.
SPSS
SPSS is mostly used in the business analytics and intelligence industry. The IT software
functional area comes second in the list, followed by KPO/BPO, finance and others. Others
include Sales, Retail, Marketing, Logistics, etc.
Excel
Excel is mostly used in the finance industry. The KPO/BPO industry comes second in the list,
followed by IT software, business analytics and others. Others include HR, Retail, Marketing,
Supply Chain, Logistics, etc.
Conclusion
For the Covid-19 survey we are working as market researchers. So, the best tool is SPSS,
which is the most popular in the market research industry and is also the one of the three
whose major job function is analytics. It has many advanced functions and is also cheaper
than SAS.
3) What precautions will you take as an analyst before analyzing the data?
Before analyzing the data, the analyst should remove errors from the data. Here are 7 criteria
you should consider:
1. Respondents who answer only part of the survey
Respondents who answer just a fraction of your required questions can bias your overall results
for many reasons:
• It can be a sign that they weren’t qualified to take your survey to begin with (leading them
to leave).
• It can indicate that they weren’t as engaged and considerate in their responses as those
who were willing to complete it.
• When you're working with an incomplete dataset, using filters or Compare Rules may not
show you the full picture, but offer a partial (and potentially skewed) view instead.
2. Respondents who don’t meet your target criteria
Say you want to survey women between the ages of 18 and 29.
You wouldn’t want the responses of a 50-year-old influencing your overall findings, would
you?
Whatever audience specifications you land on, you can ignore respondents who don’t match
them by filtering them out.
3. Respondents who speed through the survey
If respondents take only a few seconds to complete the survey, they’re likely speeding through
the questions, which means they aren’t reading them carefully and answering them thoughtfully.
So how do you go about deciding who’s a speeder and who isn’t? The answer can vary,
depending on the subject of your survey and the types of questions you ask.
4. Respondents who give the same answer repeatedly
Straightlining is when a respondent chooses the same answer choice over and over again (e.g.
the first answer option). Straightliners are often speeders as well, as they race through the
survey by answering each question with little to no thought.
5. Respondents who provide unrealistic answers
Imagine asking respondents how much TV they watch per week, on average. If a respondent
writes in 165 hours, they’re likely exaggerating (Hint: there are only 168 hours in a week).
We call this type of response an outlier, because it falls beyond the range of answers from our
other respondents, and is, quite frankly, unrealistic.
6. Respondents who give inconsistent answers
When a respondent’s answer contradicts their response to another question, it’s clear that
they’re either being dishonest or careless (or even both!).
7. Respondents who give nonsensical answers to open-ended questions
Having a response like “Fdsklj” might make you smile, but it isn’t going to get you far in your
analysis.
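Several of the checks above can be automated before the analysis starts. The following Python sketch is one possible way to flag incomplete, off-target, speeding, straightlining and unrealistic responses. The rows, the field names and the cutoffs (60 seconds, 100 TV hours) are all hypothetical assumptions chosen for illustration, not values from the survey.

```python
# Hypothetical survey rows; field names and values are invented for illustration
responses = [
    {"id": 1, "age": 25, "seconds": 240, "ratings": [4, 2, 5, 3], "tv_hours": 12},
    {"id": 2, "age": 50, "seconds": 200, "ratings": [3, 4, 2, 5], "tv_hours": 8},
    {"id": 3, "age": 22, "seconds": 15, "ratings": [2, 4, 3, 1], "tv_hours": 10},
    {"id": 4, "age": 27, "seconds": 180, "ratings": [1, 1, 1, 1], "tv_hours": 6},
    {"id": 5, "age": 19, "seconds": 210, "ratings": [5, 3, None, 4], "tv_hours": 9},
    {"id": 6, "age": 23, "seconds": 260, "ratings": [3, 5, 4, 2], "tv_hours": 165},
]

MIN_SECONDS = 60           # assumed cutoff below which a respondent is a speeder
MAX_TV_HOURS = 100         # assumed cap for a plausible weekly answer
MIN_AGE, MAX_AGE = 18, 29  # target audience from the example in the text


def find_problems(row):
    """Return the list of cleaning checks that this response fails."""
    issues = []
    if any(answer is None for answer in row["ratings"]):
        issues.append("incomplete")
    if not MIN_AGE <= row["age"] <= MAX_AGE:
        issues.append("off-target")
    if row["seconds"] < MIN_SECONDS:
        issues.append("speeder")
    if len(set(row["ratings"])) == 1:
        issues.append("straightlining")
    if row["tv_hours"] > MAX_TV_HOURS:
        issues.append("unrealistic")
    return issues


flagged = {row["id"]: find_problems(row) for row in responses}
clean_ids = [rid for rid, issues in flagged.items() if not issues]
print(flagged)
print("kept:", clean_ids)
```

Inconsistent and nonsensical answers usually need a manual review or survey-specific rules, so they are left out of this sketch.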
Q3 Discuss the concept of contingency tables and analyze the dashboard given
below on the following parameters
DATA SET
Cust ID  Region  Payment  Transaction Code  Source  Amount  Product  Time Of Day
10001    East    Paypal   93816545          Web     $20.19  DVD      22:19
OUTPUT OF DASHBOARD
SOLUTION:
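A contingency table (cross-tabulation) shows the number of observations for each combination
of the subcategories of two categorical variables. The subcategories of each variable must be
mutually exclusive and exhaustive, meaning that each observation can be classified into only
one subcategory and that, taken together, the subcategories constitute the complete data set.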
Additionally, a pivot table is one of the possible ways of creating a contingency table. A typical
pivot table has the visual form of a contingency table, and it can be used to quickly create
cross-tabulations and to drill down into a large set of data in numerous ways.
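As an illustration of the idea, the following Python sketch builds a small Region by Payment contingency table by counting pairs. Only the first row (East, Paypal) comes from the data set above; the other rows are hypothetical and were added so that the table has more than one filled cell.

```python
from collections import Counter

# Only the first pair comes from the data set above; the rest are
# hypothetical rows added for illustration.
transactions = [
    ("East", "Paypal"),
    ("East", "Credit"),
    ("West", "Paypal"),
    ("West", "Paypal"),
    ("North", "Credit"),
    ("South", "Credit"),
]

# Each observation falls into exactly one (region, payment) cell
counts = Counter(transactions)
regions = sorted({region for region, _ in transactions})
payments = sorted({payment for _, payment in transactions})

header = "{:<8}".format("Region") + "".join("{:>8}".format(p) for p in payments)
print(header + "{:>8}".format("Total"))
for region in regions:
    row = [counts[(region, p)] for p in payments]
    cells = "".join("{:>8}".format(n) for n in row)
    print("{:<8}".format(region) + cells + "{:>8}".format(sum(row)))
```

Because the categories are mutually exclusive and exhaustive, the cell counts add up to the number of observations, which is the same check a pivot table's grand total provides.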
Payment Mode
In the East, North and South, credit payment is used more than Paypal, but in the West
Paypal is more common.
In total, most of the payment amount is made and accepted by credit.
Sale Amount
The highest sale amount is in the West, because the number of products ordered in the West
is the highest.
The number of products sold in the East and the South does not differ much. Similarly, the
sale amounts do not differ much (the East has $251.67 more than the South).
The North has the lowest sale amount.
REFERENCES
https://www.youtube.com/watch?v=DrtChy0dBuk
https://en.wikipedia.org/wiki/Level_of_measurement
https://stats.stackexchange.com/questions/240363/is-age-interval-scale
https://www.surveymonkey.com/curiosity/survey-data-cleaning-7-things-to-check-before-you-start-your-analysis/
https://www.listendata.com/2013/04/data-analysis-tools-excel-spss-or-sas.html
https://stats.stackexchange.com/questions/44834/what-is-the-difference-between-the-pivot-table-and-contingency-table
Michael.hahsler.net