Name of Student:
Branch:
Sem:
Register Number:
Sl No   Content
17      Collect the data of two-wheelers (with a rider and a pillion) crossing a
        busy junction in your locality during peak hours (the problem statement
        can be changed according to the priorities of the tutor), determine the
        variance of the data in a Microsoft Excel spreadsheet, and
        brief your inference in fewer than 30 words.
Primary data or raw data is a type of information that is obtained directly from the first-
hand source through experiments, surveys, or observations.
Primary data is further classified into two types:
• Qualitative Data
• Quantitative Data
Quantitative Data:
These can be measured and not simply observed. They can be numerically represented and
calculations can be performed on them.
It is also known as numerical data which represents the numerical value i.e., how much, how
often, how many.
It is based on mathematical calculations using various formats like close ended
questions, correlation and regression methods, mean, median or mode measures.
This method is cheaper than qualitative data collection methods.
This can be classified as quantitative.
Secondary data is data that has already been collected and analysed by
someone other than the actual user.
Secondary data includes magazines, newspapers, books, journals, etc.
Example: Government publications, historical and statistical documents, Business
EXPERIMENT -1
AIM: To prepare a close/open ended hand written questionnaire containing 25 questions.
PURPOSE: A questionnaire is helpful for collecting and analysing data.
STRUCTURE:
• The language should be interpreted in the same way for all the questions. This means the language should be
concise.
• Units of questions should be precisely stated or defined in order to ensure proper orientation of the respondent.
• Subjective words such as ‘bad’, ‘good’, ‘fair’ and the like do not lend themselves to either quantitative or
qualitative measurement and as such should be avoided.
• No single question should deal with more than one issue and as such the principle of one question, one
issue should be followed.
• Vocabulary employed in the questions should be appropriate to the background of the respondents.
• Questions should be worded so that the ego of the respondents is not injured in any way.
• Complex questions that require the respondent to go through several steps of reasoning before answering
are undesirable and as such should be avoided.
TOPIC:
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)
20)
21)
22)
23)
24)
25)
• Click on the send, enter the recipient’s email ID, click on dialogue box and send.
2) The TRIM function in Excel removes unwanted spaces from the data in a column: leading and trailing spaces,
and extra spaces between words.
3) The logical test (IF) is used to fill the missing values in those cells.
4) Duplicate values happen when the same value or set of values appear in the data.
The following are the steps involved in removing duplicates in the data set.
5) To remove blanks
Select Data, HOME > FIND&SELECT > REPLACE > FIND AND REPLACE > REPLACE ALL
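The cleaning steps above (trimming spaces, filling missing values, removing duplicates and removing blanks) can also be sketched in Python. This is only an illustration under stated assumptions: the `raw` list and the `"unknown"` placeholder are hypothetical, and `trim` is a helper written here to imitate Excel's TRIM, not an Excel or library function.

```python
# Hypothetical raw survey column with stray spaces, blanks and duplicates.
raw = ["  apple ", "banana", "", "apple", "kiwi   fruit", None, "banana"]


def trim(text):
    """Imitate Excel TRIM: strip the ends and collapse inner runs of spaces."""
    return " ".join(text.split())


# Step 2: trim every cell (missing cells become empty strings).
trimmed = [trim(v) if v is not None else "" for v in raw]

# Step 3: fill missing values, similar to =IF(A2="","unknown",A2).
filled = [v if v != "" else "unknown" for v in trimmed]

# Step 4: remove duplicates, keeping the first occurrence of each value.
deduped = list(dict.fromkeys(trimmed))

# Step 5: remove blanks.
no_blanks = [v for v in deduped if v != ""]

print(filled)
print(no_blanks)
```

Filling and removing blanks are alternatives, so the sketch shows both results separately.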
SUMMARIZATION OF DATA
Statistics is broadly categorized into two types:
1. Descriptive statistics
2. Inferential statistics
Data tabulation
Tabulation is a process of systematic arrangement of the classified data in
rows and columns, in the form of table.
In general
We use the following steps to construct a frequency table:
Step 1:
Construct a table with three columns. Then in the first column, write
down all of the data values in ascending order of magnitude.
Step 2:
To complete the second column, go through the list of data values and
place one tally mark at the appropriate place in the second column for every data
value.
Step 3:
Count the number of tally marks for each data value and write it
in the third column.
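The three steps above can be sketched in Python as a quick cross-check of a hand-made table. The `values` list is hypothetical sample data, and `collections.Counter` is used here in place of manual tallying.

```python
from collections import Counter

# Hypothetical data values (not taken from the manual).
values = [3, 1, 2, 3, 2, 3, 1, 3, 2, 2]

# Count how often each value occurs.
counts = Counter(values)

# Build the three columns: value (ascending), tally marks, frequency.
table = [(v, "|" * counts[v], counts[v]) for v in sorted(counts)]

for value, tally, freq in table:
    print(f"{value:>5}  {tally:<8}  {freq}")
```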
When the set of data values are spread out, it is difficult to set up a
frequency table. So we group the data into class intervals (or groups) to help us
organize, interpret and analyze the data.
Ideally, we should have between five and ten rows in a frequency table.
Example: The number of calls from motorists per day for roadside service was
recorded for the month of December 2003. The results were as follows:
Step 2: Go through the list of data values. Place a tally mark against the
corresponding class interval in the second column.
Step 3: Count the number of tally marks for each group and write it in the
third column.
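Grouping into class intervals can be sketched the same way. The marks and the class width of 10 below are assumptions chosen for illustration, not the data of the example above.

```python
# Hypothetical marks and an assumed class width of 10.
marks = [12, 25, 37, 8, 41, 29, 33, 18, 45, 22, 5, 39]
width = 10

# Prepare the class intervals 0-9, 10-19, ..., 40-49 with zero counts.
grouped = {f"{lo}-{lo + width - 1}": 0 for lo in range(0, 50, width)}

# Place each mark in its class interval (the "tally" step).
for m in marks:
    lo = (m // width) * width
    grouped[f"{lo}-{lo + width - 1}"] += 1

for interval, freq in grouped.items():
    print(interval, freq)
```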
The data set consists of 50 student marks in Mathematics in a class. Marks Scored
TOTAL
1. Count the total number of items. In this chart the total is 40.
INTERFACE:
To find the cumulative relative frequency, follow the steps above to create a
relative frequency distribution table. As a final step, add up the relative
frequencies in another column.
The first entry in the column is the same as the first entry in the rel.freq column (0.29).
Next, I added the first and second entries to get 0.29 + 0.50 = 0.79.
Next, I added the first, second and third entries to get 0.29 + 0.50 + 0.14 = 0.93.
Finally, I added the first, second, third and fourth entries to get 0.29 + 0.50 + 0.14 + 0.07 = 1
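The running totals above can be reproduced with a short Python sketch. It uses the relative frequencies quoted in the text (0.29, 0.50, 0.14, 0.07); rounding to two decimals is an assumption made to hide floating-point noise.

```python
from itertools import accumulate

# Relative frequencies from the worked example above.
rel_freq = [0.29, 0.50, 0.14, 0.07]

# Running totals give the cumulative relative frequency column.
cum_rel_freq = [round(total, 2) for total in accumulate(rel_freq)]

print(cum_rel_freq)
```

The last entry of a cumulative relative frequency column should always come out as 1.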
The bars drawn are of uniform width, and the variable quantity is represented on one of
the axes. The number of values on the x-axis of a bar graph or the y-axis of a column
graph is called the scale.
The frequency distribution tables can be easily represented using bar charts which
simplify the calculations of data.
In a vertical bar chart, the grouped data are represented vertically with the help
of bars.
The data is represented along the y-axis of the graph, and the height of the bars
shows the values.
In a horizontal bar chart, the grouped data are represented horizontally with the help of
bars.
The data is depicted along the x-axis of the graph, and the length of the
bars denotes the values.
Apart from the vertical and horizontal bar graphs, the two other types of bar charts are the grouped bar graph and the stacked bar graph.
Grouped bar graph is also called the clustered bar graph, which is used to represent the
discrete value for more than one object that shares the same category.
Stacked bar graph is also called the composite bar chart, which divides the
aggregate into different parts. In this type of bar graph, each part can be
represented using different colors.
Bar charts possess a discrete domain of divisions and are normally scaled so that all the
data can fit on the graph. When there is no regular order of the divisions being matched,
bars on the chart may be organized in any order. Bar charts organized from the highest to
the lowest number are called Pareto charts.
EXPERIMENT -7
AIM: To conduct a survey on the favorite fruit of 100 persons using an Excel spreadsheet and to plot a bar graph for
the collected data.
PURPOSE: Bar graphs are in widespread use everywhere from textbooks to newspapers, so most
audiences understand how to read a bar graph and can grasp the information the graph conveys.
STEPS INVOLVED:
Step 1: Tabulate the data collected.
Step 2: Select the entire tabulated data as shown in the figure. Count the responses for each fruit using the formula
=COUNTIF(RANGE,"FRUIT"), and similarly for the other fruits.
Step 3: Click on Insert tab > Chart. Modify the chart by adding a title, axis titles and formatted data labels.
[Figure: bar chart of the fruit survey counts. X-axis: Fruits (apple, orange, banana, sapota, grapes, kiwi); y-axis scale from 0 to 15.]
Formula
The sectors of a pie chart together represent all the data, and their central angles add up to 360°.
To work out with the percentage for a pie chart, follow the steps given below:
Imagine a teacher surveys her class on the basis of their favourite Sports:
10 5 5 10 10
Step 2: Add all the values in the table to get the total.
Step 3: Next, divide each value by the total and multiply by 100 to get a per cent:
(10/40) × 100   (5/40) × 100   (5/40) × 100   (10/40) × 100   (10/40) × 100
= 25%           = 12.5%        = 12.5%        = 25%           = 25%
Step 4: Next, to find how many degrees each “pie sector” needs, take a
full circle of 360° and follow the calculations below:
The central angle of each component = (Value of each component / sum of values of all the
components) × 360°
(10/40) × 360°   (5/40) × 360°   (5/40) × 360°   (10/40) × 360°   (10/40) × 360°
= 90°            = 45°           = 45°           = 90°            = 90°
Now you can draw a pie chart.
Step 5: Draw a circle and use the protractor to measure the degree of each sector.
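The percentage and central-angle calculations above can be checked with a few lines of Python. The values are the five counts from the survey example (10, 5, 5, 10, 10); the variable names are chosen here for illustration.

```python
# Survey counts from the worked example above.
values = [10, 5, 5, 10, 10]

# Step 2: total of all values.
total = sum(values)

# Step 3: share of each value as a percentage.
percentages = [v / total * 100 for v in values]

# Step 4: central angle of each sector in degrees.
angles = [v / total * 360 for v in values]

print(total)
print(percentages)
print(angles)
```

The percentages should add up to 100 and the angles to 360, which is a quick way to catch arithmetic slips before drawing the chart.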
A function is just an equation that gives you a unique output for every input. For
Example: y = – 4/5x + 3 is a function because you’ll get a unique value for
The most usual type of data you’ll find on a line graph is how something
changes over time. A line graph that shows changes over time is sometimes
called a Timeplot.
A Dow Jones Timeplot from the Wall Street Journal shows how the
stock market changes over time.
• You have a function. Line graphs are good at showing specific data values, meaning
that if you have one variable (x) you can easily find the other (y).
• You want to show trends. For example, how your investments change over time or
how food prices have increased over time.
• You want to make predictions. A line graph can be extrapolated beyond the
data at hand. They enable you to make predictions about the results of data.
PURPOSE: A line graph, also known as a line chart, is a type of chart used to visualize the
value of something over time.
STEPS INVOLVED:
Step 1: Put each category and the associated value on one line
Step 4: Then, open the Insert tab in the Ribbon. In the Charts group, click the Insert Line or
Area Chart Button:
Step 1- Choose the class intervals and mark the values on the horizontal axis.
Step 2- Mark the mid value of each interval on the horizontal axis.
PURPOSE: Frequency polygons are a graphical device for understanding the shapes
of distributions. They serve the same purpose as histograms, but are especially
helpful for comparing sets of data.
STEPS INVOLVED:
Step 2: Select the entire data, click on Insert tab > Insert Bar Chart, click on the bar
graph, select Format Data Series, and set the gap width to 0 to obtain the histogram.
Step 3: Once the histogram is constructed, join the midpoints of the tops of the bars by hand.
Notice that, unlike a bar chart, there are no "gaps" between the bars
(although some bars might be "absent" reflecting no frequencies). This is
because a histogram represents a continuous data set, and as such, there
are no gaps in the data (although you will have to decide whether you
round up or round down scores on the boundaries of bins).
Choosing the correct bin width
There is no right or wrong answer as to how wide a bin
The major difference is that a histogram is only used to plot the frequency
of score occurrences in a continuous data set that has been divided into
classes, called bins. Bar charts, on the other hand, can be used for a great
deal of other types of variables including ordinal
A boxplot, also called a box and whisker plot, is a way to show the spread
and centers of a data set. Measures of spread include the interquartile
range and the range of the data set. Measures of center include the mean
or average and the median (the middle of a data set).
Five pieces of information (the “five number summary“) are generally included in
the chart:
EXPERIMENT - 11
AIM: To construct a box plot for the given dataset using a Microsoft Excel spreadsheet.
PURPOSE: A box plot is helpful in exploratory data analysis. It indicates the spread of the data based
on the five-number summary, namely minimum, Q1 (Quartile 1), Median, Q3 (Quartile 3), and Maximum.
Sample problem:
Make a box and whiskers chart in Excel for the following data set:
25, 145, 145, 148, 178, 178, 198, 201, 222, 210, 565, 589, 485,
333, 358, 158, 257.
STEPS INVOLVED:
Step 1: Type your data into one column in an Excel worksheet.
For this example, type your data into cells A1:A17.
Q1 =QUARTILE(A1:A17,1)
Median or Q2 =MEDIAN(A1:A17)
Q3 =QUARTILE(A1:A17,3)
Maximum =MAX(A1:A17)
Step 5: Click the graph and then click the “Switch Row/Column” button.
Step 6: Select the left-hand blue box, right-click and then click
“Format Data Series.”
2. Select the “Layout” tab, then click “Error Bars“. Next, click “More
Q1 and the Min into the “Fixed Value” box. For this
sample problem, that value is 133.
6. Click “Close.”
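As a cross-check of the spreadsheet values, the five-number summary of the sample data set can be sketched in Python. The sketch assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as Excel's QUARTILE; for this data it reproduces the difference Q1 − Min = 133 quoted in the steps.

```python
import statistics

# Data set from the sample problem.
data = [25, 145, 145, 148, 178, 178, 198, 201, 222, 210, 565, 589, 485,
        333, 358, 158, 257]

# Quartiles with inclusive interpolation (assumed to match Excel QUARTILE).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

minimum = min(data)
maximum = max(data)

print("Min:", minimum)
print("Q1:", q1)
print("Median:", median)
print("Q3:", q3)
print("Max:", maximum)
print("Q1 - Min:", q1 - minimum)
```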
EXPERIMENT - 12
AIM: Using Microsoft Excel spread sheet construct a stem and leaf plot for the given dataset.
PURPOSE: A stem-and-leaf display (also known as a stemplot) is a diagram designed to allow you
to quickly assess the distribution of a given dataset. It indicates the recurrence of data.
STEPS INVOLVED:
values.
Once you finish typing the formula and click Enter, you will get the following result:
To double check that your results are correct, you can verify three numbers:
Central tendency refers to the centre of the distribution of the data,
which can generally be described using measures such as the mean,
median and mode.
Based on the properties of the data, the measures of central tendency are
selected.
The mean represents the average value of the dataset. It can be calculated as the
sum of all the values in the dataset divided by the number of values
Median is the middle value of the dataset in which the dataset is arranged in the
ascending order or in descending order.
➢ When the dataset contains an even number of values, then the median value
of the dataset can be found by taking the mean of the middle two values.
• Consider the given dataset with an odd number of observations arranged in
descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2
27 and 29.
Now, find out the mean value for these two numbers.
i.e.,(27+29)/2 =28
The mode represents the frequently occurring value in the dataset. Sometimes the
dataset may contain multiple modes and in some cases, it does not contain any
mode at all.
The mode represents the most common value. Hence, the mode of the given dataset is its most frequently repeated
value, 5.
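The three measures can be sketched with Python's standard `statistics` module. `odd_data` is the 13-value dataset listed above; `even_data` and `mode_data` are hypothetical lists, chosen only so that the middle values are 27 and 29 and the most frequent value is 5.

```python
import statistics

# Dataset with an odd number of observations (from the text above).
odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]

# Hypothetical datasets for the even-sized median and the mode.
even_data = [20, 25, 27, 29, 31, 40]
mode_data = [3, 5, 5, 7, 5, 9, 3]

mean_value = statistics.mean(odd_data)      # sum of values / number of values
median_odd = statistics.median(odd_data)    # middle value after sorting
median_even = statistics.median(even_data)  # mean of the two middle values
mode_value = statistics.mode(mode_data)     # most frequently occurring value

print(mean_value, median_odd, median_even, mode_value)
```

`statistics.median` sorts the data itself, so the input order does not matter.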
so the range is 9 − 3 = 6.
EXPERIMENT - 13
AIM: Using Microsoft Excel spread sheet find the Mean, Mode and Median for the data (univariate data)
given and also represent them in a Histogram.
PURPOSE: The central tendencies Mean, Mode and Median help us understand what has already taken
place and predict future values as well.
STEPS INVOLVED:
Step 1: Tabulate the collected data as shown below
Step 2: Select the data and enter the syntax for Mean, Mode and Median
Step 3: Press enter to obtain the Mean or Average, Mode and Median
Step 4: Plot the Histogram with the instruction given in previous examples.
Step 1: Consider the odd number and even number data set as shown below
The interquartile range is defined as the difference between the third and first quartiles,
and half of this range is called the semi-interquartile range (SIQD) or
Step 3: The average of the absolute values is determined. This value is called Mean deviation.
Step 4: Quartile deviation: the quartiles Q1 and Q3 are found, from which the quartile deviation
QD = (Q3 − Q1)/2 is found.
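The mean deviation and the quartile deviation from Steps 3 and 4 can be sketched in Python. The `data` list is hypothetical, and the quartiles use `method="inclusive"`, which is assumed here to correspond to Excel's QUARTILE.

```python
import statistics

# Hypothetical data set (not from the manual).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean deviation: average of the absolute deviations from the mean.
mean = statistics.mean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Quartile deviation: QD = (Q3 - Q1) / 2.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

print(mean_deviation)
print(quartile_deviation)
```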
=VAR.P(number1, [number2], ...)
EXPERIMENT - 17
AIM: To determine the variance for the data collected.
PURPOSE: SD and variance tell us about the spread of our distribution, that is, how close the
individual data values are to the mean value.
STEPS INVOLVED:
Step 1: Use the formula =VAR.P(number1, [number2], ...) to calculate the variance of our data.
Step 2: We have data in cells C2:C15. So the formula will be:
=VAR.P(C2:C15)
INFERENCE: This returns a value of 186.4285714, which is quite a large variance given our data.
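For comparison, the population variance can be computed in Python. The counts below are hypothetical (they are not the data in C2:C15), and `statistics.pvariance` is used as the counterpart of VAR.P, with `statistics.variance` shown as the sample version.

```python
import statistics

# Hypothetical two-wheeler counts per observation interval.
counts = [12, 15, 11, 18, 14, 20]

# Population variance by hand: mean of the squared deviations from the mean.
mean = sum(counts) / len(counts)
variance_manual = sum((x - mean) ** 2 for x in counts) / len(counts)

# The same quantity from the standard library (divides by n, like VAR.P).
variance_population = statistics.pvariance(counts)

# Sample variance divides by n - 1 instead, so it is slightly larger.
variance_sample = statistics.variance(counts)

print(variance_manual, variance_population, variance_sample)
```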
Positive Skewness is when the tail on the right side of the distribution
is longer or fatter. The mean and median will be greater than the mode.
Negative Skewness is when the tail of the left side of the distribution is
longer or fatter than the tail on the right side. The mean and median will
be less than the mode.
So, when is the skewness too much? The rule of thumb seems to be:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and -0.5(negatively skewed) or
between 0.5 and 1(positively skewed), the data are moderately
skewed.
• If the skewness is less than -1(negatively skewed) or greater
than 1(positively skewed), the data are highly skewed.
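The rule of thumb can be written as a small Python helper. This is a sketch under assumptions: `skewness` uses the simple moment-based definition (third central moment divided by the 1.5 power of the second), which can differ slightly from the sample-adjusted value a spreadsheet reports, and both data lists are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5 (an assumed, simple definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def classify(skew):
    """Apply the rule of thumb for how skewed the data are."""
    if -0.5 <= skew <= 0.5:
        return "fairly symmetrical"
    if -1 <= skew <= 1:
        return "moderately skewed"
    return "highly skewed"


symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(classify(skewness(symmetric)))
print(classify(skewness(right_tailed)))
```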
High kurtosis in a data set is an indicator that the data has heavy tails or
outliers. If there is high kurtosis, we need to investigate why we
have so many outliers. It can indicate many things, such as wrong data entry.
Investigate!
Low kurtosis in a data set is an indicator that the data has light tails or a lack of
outliers. If we get low kurtosis (too good to be true), we also need to
investigate and trim the dataset of unwanted results.
It is used for:
• Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
• Python uses new lines to complete a command, as opposed to other
programming languages which often use semicolons or parentheses.
• Python relies on indentation, using whitespace, to define scope; such as the
scope of loops, functions and classes. Other programming languages often
use curly-brackets for this purpose.
Example
if 5 > 2:
  print("Five is greater than two!")
OUTPUT: Five is greater than two!
Syntax Error:
if 5 > 2:
print("Five is greater than two!")
if 5 > 2:
 print("Five is greater than two!")
if 5 > 2:
        print("Five is greater than two!")
Comments start with a #, and Python will render the rest of the line as a comment:
Example
#This is a comment
print("Hello, World!")
OUTPUT: Hello, World!
#print("Hello, World!")
print("Cheers, Mate!")
OUTPUT: Cheers, Mate!
Python does not really have a syntax for multi line comments.
Example
#This is a comment
#written in
#more than just one line
print("Hello, World!")
OUTPUT: Hello, World!
Since Python will ignore string literals that are not assigned to a variable, you can
add a multiline string (triple quotes) in your code, and place your comment inside
it:
"""
This is a comment
written in
more than just one line
"""
print("Hello, World!")
OUTPUT: Hello, World!
You can get the data type of any object by using the type() function:
Example
Print the data type of the variable x:
x = 5
print(type(x))
OUTPUT: <class 'int'>
In Python, the data type is set when you assign a value to a variable:
Example
Variables in Python:
x = 5
y = "Hello, World!"
Example
x = 5
y = "John"
Variables do not need to be declared with any particular type, and can even change
type after they have been set.
x = 4 # x is of type int
x = "Sally" # x is now of type str
print(x)
OUTPUT: Sally
If you want to specify the data type of a variable, this can be done with casting.
Example
x = str(3)    # x will be '3'
y = int(3)    # y will be 3
z = float(3)  # z will be 3.0
You can get the data type of a variable with the type() function.
Example
x = 5
y = "John"
print(type(x))
print(type(y))
OUTPUT:
<class 'int'>
<class 'str'>
Example
x = "John"
# is the same as
x = 'John'
Example
This will create two variables:
4
Sally
• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly in "if statements"
and loops.
Example
If statement:
a = 33
b = 200
if b > a:
  print("b is greater than a")
OUTPUT: b is greater than a
In this example we use two variables, a and b, which are used as part of the if
statement to test whether b is greater than a. As a is 33, and b is 200, we know
that 200 is greater than 33, and so we print to screen that "b is greater than a".
The elif keyword is Python's way of saying "if the previous conditions were not true, then try this
condition".
Example
a = 33
b = 33
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")
OUTPUT: a and b are equal
The else keyword catches anything which isn't caught by the preceding conditions.
Example
a = 200
b = 33
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")
else:
  print("a is greater than b")
OUTPUT: a is greater than b
In this example a is greater than b, so the first condition is not true, also
the elif condition is not true, so we go to the else condition and print to screen
that "a is greater than b".
Example
a = 200
b = 33
if b > a:
  print("b is greater than a")
else:
  print("b is not greater than a")
OUTPUT: b is not greater than a
If you have only one statement to execute, you can put it on the same line as the if
statement.
Example
One line if statement:
Example
One line if else statement:
a = 2
b = 330
print("A") if a > b else print("B")
OUTPUT: B
You can also have multiple else statements on the same line:
Example
One line if else statement, with 3 conditions:
a = 330
b = 330
print("A") if a > b else print("=") if a == b else print("B")
OUTPUT: =
Example
Test if a is greater than b, AND if c is greater than a:
a = 200
b = 33
c = 500
if a > b and c > a:
  print("Both conditions are True")
OUTPUT: Both conditions are True
Example
Test if a is greater than b, OR if a is greater than c:
a = 200
Example
x = 41
if x > 10:
  print("Above ten,")
  if x > 20:
    print("and also above 20!")
  else:
    print("but not above 20.")
OUTPUT:
Above ten,
and also above 20!
if statements cannot be empty, but if you for some reason have an if statement
with no content, put in the pass statement to avoid getting an error.
Example
a = 33
b = 200
if b > a:
  pass
# having an empty if statement like this would raise an error without the
# pass statement
• while loops
• for loops
Example
Print i as long as i is less than 6:
i = 1
while i < 6:
  print(i)
  i += 1
OUTPUT:
1
2
3
4
5
The while loop requires relevant variables to be ready, in this example we need to
define an indexing variable, i, which we set to 1.
With the break statement we can stop the loop even if the while condition is true:
Example
Exit the loop when i is 3:
i = 1
while i < 6:
  print(i)
  if i == 3:
    break
  i += 1
OUTPUT:
1
2
3
With the continue statement we can stop the current iteration, and continue with
the next:
Example
Continue to the next iteration if i is 3:
i = 0
while i < 6:
  i += 1
  if i == 3:
    continue
  print(i)
With the else statement we can run a block of code once when the condition no
longer is true:
Example
Print a message once the condition is false:
i = 1
while i < 6:
  print(i)
  i += 1
else:
  print("i is no longer less than 6")
OUTPUT:
1
2
3
4
5
i is no longer less than 6
A for loop is used for iterating over a sequence (that is either a list, a tuple, a
dictionary, a set, or a string).
This is less like the for keyword in other programming languages, and works more
like an iterator method as found in other object-orientated programming
languages.
With the for loop we can execute a set of statements, once for each item in a list,
tuple, set etc.
Example
Print each fruit in a fruit list:
The for loop does not require an indexing variable to be set beforehand.
With the break statement we can stop the loop before it has looped through all the
items:
Example
Exit the loop when x is "banana":
With the continue statement we can stop the current iteration of the loop, and
continue with the next:
Example
Do not print banana:
apple
cherry
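The three for loop examples described above (plain iteration, break and continue) can be sketched as follows. The contents of `fruits` are an assumption inferred from the outputs shown in this section, and the helper lists `before_break` and `without_banana` are introduced here only to make the results easy to inspect.

```python
# Assumed fruit list, inferred from the outputs in this section.
fruits = ["apple", "banana", "cherry"]

# Print each fruit in the list.
for x in fruits:
    print(x)

# break: leave the loop when x is "banana" (banana itself is still visited).
before_break = []
for x in fruits:
    before_break.append(x)
    if x == "banana":
        break

# continue: skip "banana" and carry on with the next item.
without_banana = []
for x in fruits:
    if x == "banana":
        continue
    without_banana.append(x)

print(before_break)
print(without_banana)
```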
Example
Print all numbers from 0 to 5, and print a message when the loop has ended:
for x in range(6):
  print(x)
else:
  print("Finally finished!")
OUTPUT:
0
1
2
3
4
5
Finally finished!
The "inner loop" will be executed one time for each iteration of the "outer loop":
Example
Print each adjective for every fruit:
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]
for x in adj:
  for y in fruits:
    print(x, y)
OUTPUT:
red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry
Note: This page shows you how to use LISTS as ARRAYS, however, to work with
arrays in Python you will have to import a library, like the NumPy library.
Example
Create an array containing car names:
cars = ["Ford", "Volvo", "BMW"]
Example
Get the value of the first array item:
x = cars[0]  # x is "Ford"
Use the len() function to return the length of an array (the number of elements in
an array).
Example
Return the number of elements in the cars array:
x = len(cars)  # x is 3
Note: The length of an array is always one more than the highest array index.
You can use the for in loop to loop through all the elements of an array.
for x in cars:
  print(x)
OUTPUT:
Ford
Volvo
BMW
Python has a set of built-in methods that you can use on lists/arrays.
Method Description
Note: Python does not have built-in support for Arrays, but Python Lists can be
used instead.
Arguments are specified after the function name, inside the parentheses. You can
add as many arguments as you want, just separate them with a comma.
Python also accepts function recursion, which means a defined function can call
itself.
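Both ideas, passing several arguments and recursion, can be illustrated with a short sketch. The function names `describe` and `factorial` and their sample inputs are hypothetical and chosen only for this illustration.

```python
def describe(name, age):
    """Take two arguments, separated by a comma in the definition and the call."""
    return f"{name} is {age} years old"


def factorial(n):
    """Recursive function: factorial calls itself with a smaller argument."""
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


print(describe("Sally", 21))
print(factorial(5))
```

Every recursive function needs a base case such as `n <= 1`; without it the function would keep calling itself until Python raises a recursion error.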
EXPERIMENT 19
AIM: To write a python program to Convert Decimal to Binary, Octal and Hexadecimal.
CODE:
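One possible solution, given as a minimal sketch rather than the manual's original program, uses Python's built-in `bin()`, `oct()` and `hex()` functions. The input value 344 is an assumption chosen for illustration.

```python
# Assumed decimal input for illustration.
decimal = 344

# Built-in conversions return strings with a 0b, 0o or 0x prefix.
binary = bin(decimal)
octal = oct(decimal)
hexadecimal = hex(decimal)

print("The decimal value of", decimal, "is:")
print(binary, "in binary.")
print(octal, "in octal.")
print(hexadecimal, "in hexadecimal.")
```

The prefixes can be dropped with `format(decimal, "b")`, `format(decimal, "o")` and `format(decimal, "x")` if only the digits are wanted.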
OUTPUT:
CODE:
OUTPUT:
CODE:
OUTPUT:
CODE:
OUTPUT:
CODE:
OUTPUT:
OUTPUT
OUT PUT
AIM: To write a Python program to create a labeled pie graph using matplotlib.pyplot.
CODE:
OUTPUT: