12 IP-Data Visualization (Part-2) - Note
CLASS XII
Topic: DATA VISUALIZATION (Part-2) - NOTES
BAR GRAPHS
A graph drawn using rectangular bars to show how
large each value is. The bars can be horizontal or
vertical.
The length of the rectangular bars, each for a
specific category, represents the value of that
category.
To make a bar chart with matplotlib, we need to use
the plt.bar() function.
Example: Consider the following data. If we try making a bar graph representing age, then it would appear like this.

     Name     Age  Gender
100  Asha     15   F
101  Kashish  18   M
102  Meeta    20   F
103  Sumit    12   M

[Figure: two bar graphs titled "Representing Age". Vertical bar graph: Names of students on the x-axis, Age on the y-axis. Horizontal bar graph: Age on the x-axis, Names of students on the y-axis.]

Remember that the x and y axes will be swapped when using barh, requiring care when labelling.
Representing Columns of a DataFrame with the Help of a Bar Graph
import matplotlib.pyplot as plt
import pandas as pd
# Student data with a custom index (100 to 103)
dict1={'Name':['Asha','Kashish','Meeta','Sumit'],
       'Age':[15,18,20,12],
       'Gender':['F','M','F','M']}
df=pd.DataFrame(dict1,index=[100,101,102,103])
print(df)
x=df['Name']   # categories shown along the x-axis
y=df['Age']    # values shown as bar heights
plt.bar(x,y,width=0.5,color='green')
plt.title("Representing Age of kids",fontsize=14,color="blue")
plt.xlabel("Names of students",fontsize=14,color="red")
plt.ylabel("Age",fontsize=14,color="red")
plt.show()
Horizontal Bars
import matplotlib.pyplot as plt
import pandas as pd
dict1={'Name':['Asha','Kashish','Meeta','Sumit'],
       'Age':[15,18,20,12],
       'Gender':['F','M','F','M']}
df=pd.DataFrame(dict1,index=[100,101,102,103])
print(df)
x=df['Name']
y=df['Age']
# barh draws the categories (x) along the y-axis and the values (y) along the x-axis
plt.barh(x,y,color='green')
plt.title("Representing Age of kids",fontsize=14,color="blue")
plt.xlabel("Age",fontsize=14,color="red")
plt.ylabel("Names of Students",fontsize=14,color="red")
plt.show()
Note: Remember that the x and y axes will be swapped when using barh, requiring care when labelling.
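The axis swap can be checked with a short sketch that draws the same data with bar() and barh() side by side. The two-panel subplot layout, the axes-level calls (ax.bar, ax.barh), the Agg backend and the output file name bar_vs_barh.png are assumptions made for this illustration and are not part of the original notes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

names = ['Asha', 'Kashish', 'Meeta', 'Sumit']
ages = [15, 18, 20, 12]

# Two plots side by side: vertical bars on the left, horizontal bars on the right
fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

vertical = ax_v.bar(names, ages, color='green')
ax_v.set_xlabel("Names of students")   # categories on the x-axis
ax_v.set_ylabel("Age")                 # values on the y-axis
ax_v.set_title("bar()")

horizontal = ax_h.barh(names, ages, color='green')
ax_h.set_xlabel("Age")                 # values move to the x-axis
ax_h.set_ylabel("Names of students")   # categories move to the y-axis
ax_h.set_title("barh()")

# In bar() each value is the height of a bar; in barh() it is the width of a bar
bar_heights = [float(rect.get_height()) for rect in vertical]
barh_widths = [float(rect.get_width()) for rect in horizontal]

fig.tight_layout()
fig.savefig("bar_vs_barh.png")  # hypothetical file name
```

Both calls receive the same two arguments in the same order; only the axis on which the values are drawn changes, which is why the xlabel and ylabel texts have to be exchanged.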
Example: Plot a bar graph to show the month-wise sales of mid-range cars of a company.
import matplotlib.pyplot as plt
import pandas as pd
# Monthly sales figures for three kinds of vehicles
dict1={'Month':['July','August','Sept','Oct','Nov','Dec'],
       'Mid_Car':[100,80,150,170,160,145],
       'Bike':[150,155,170,180,100,90],
       'High_Car':[50,60,70,40,20,10]}
df=pd.DataFrame(dict1)
print(df)
x=df['Month']
y=df['Mid_Car']   # only the mid-range car sales are plotted
plt.bar(x,y,color='green')
plt.title("Month Wise Automobile Sales in North Region",fontsize=14,color="blue")
plt.xlabel("Month",fontsize=14,color="red")
plt.ylabel("Sales of Mid Range Cars",fontsize=14,color="red")
plt.show()
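Some more variations: rotating the x-axis tick labels
# This snippet continues from the previous example and reuses the same df
x=df['Month']
y=df['Mid_Car']
plt.bar(x,y,color='green')
plt.title("Month Wise Automobile Sales in North Region",fontsize=14,color="blue")
plt.xlabel("Month",fontsize=14,color="red")
plt.ylabel("Sales of Mid Range Cars",fontsize=14,color="red")
# Place a tick at every month and rotate the tick labels by 30 degrees
plt.xticks(x,fontsize=10,rotation=30)
plt.show()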
What if we want to show the month-wise sales of multiple items, and thus compare the sales values of the various items?
The pandas plot() method provides what we need here: it puts the index on the x-axis and renders each column as a separate series (set of bars), with a (usually) neatly positioned legend.
import matplotlib.pyplot as plt
import pandas as pd
dict1={'Mid_Car':[100,80,150,170],
       'Bike':[150,155,170,180],
       'High_Car':[50,60,70,40]}
# The months are used as the index, so they appear on the x-axis
df=pd.DataFrame(dict1,index=['July','August','Sept','Oct'])
# One colour per column: Mid_Car, Bike, High_Car
df.plot(kind='bar',color=['green','black','orange'])
plt.title("Comparison of Sales Month wise",fontsize=14,color="blue")
plt.xlabel("Month",fontsize=14,color="red")
plt.ylabel("Sales",fontsize=14,color="red")
plt.xticks(fontsize=10,rotation=30)
plt.show()
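To get a feel for what df.plot(kind='bar') arranges for us, a similar grouped chart can be sketched with plain plt.bar() by shifting each series sideways by one bar width. The bar width of 0.25, the numpy positions, the Agg backend and the output file name are assumptions chosen for illustration; pandas may size and position its bars differently.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = ['July', 'August', 'Sept', 'Oct']
sales = {'Mid_Car': [100, 80, 150, 170],
         'Bike': [150, 155, 170, 180],
         'High_Car': [50, 60, 70, 40]}
colors = ['green', 'black', 'orange']

width = 0.25                        # assumed width of a single bar
positions = np.arange(len(months))  # 0, 1, 2, 3: one slot per month

# Draw each item's bars shifted right by one bar width from the previous item
for i, (item, values) in enumerate(sales.items()):
    plt.bar(positions + i * width, values, width=width, color=colors[i], label=item)

# Put the month label under the middle bar of each group of three
tick_positions = positions + width
plt.xticks(tick_positions, months, fontsize=10, rotation=30)
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Comparison of Sales Month wise")
plt.legend()
plt.savefig("grouped_bars.png")  # hypothetical file name
```

The manual version needs explicit positions, tick labels and a legend call, which is the bookkeeping that the pandas plot() call takes care of.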
HISTOGRAMS
A histogram is a bar chart showing a FREQUENCY DISTRIBUTION.
The data is grouped into ranges, such as "100 to 199", "200 to 299", etc., and each range is plotted as a bar whose height is the number of values that fall in it. These ranges are also called "bins".
It is similar to a bar graph, with the difference that in a histogram each bar stands for a range of data rather than a single category.
The width of each bar corresponds to the class interval (the bin), while the height of each bar corresponds to the frequency of the class it represents.
CONCEPT OF FREQUENCY DISTRIBUTION:
Let's consider a test given to students out of 50 marks. Following are the scores they get.

Test scores: 20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49

As per the scores, let's see how many students scored in different ranges of scores:

Range of scores   Number of students
20-25             3
26-30             1
31-35             2
36-40             0
41-45             1
46-50             4

This data is called the frequency distribution table.
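The frequency distribution table for these scores can also be reproduced with pandas. This is only a sketch and not the method used in these notes: pd.cut() and the bin edges 19, 25, 30, 35, 40, 45, 50 are assumptions, chosen so that each right-closed interval matches one row of the table.

```python
import pandas as pd

scores = pd.Series([20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49])

# Assumed edges: pd.cut builds intervals that are open on the left and closed
# on the right, so (19, 25] holds scores 20 to 25, (25, 30] holds 26 to 30, etc.
edges = [19, 25, 30, 35, 40, 45, 50]
labels = ['20-25', '26-30', '31-35', '36-40', '41-45', '46-50']

# Replace every score by the label of the range it belongs to
ranges = pd.cut(scores, bins=edges, labels=labels)

# Count the scores in each range; sort_index() puts the ranges back in order
freq_table = ranges.value_counts().sort_index()
print(freq_table)
```

The range 36-40 is kept with a count of 0 because the labels are treated as categories, so empty ranges are still listed.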
To manually construct a histogram:
1. The first step is to "bin" the range of values, i.e., divide
the entire range of values into a series of intervals. These
bins may or may not be of the same size.
2. Then count how many values fall into each interval.
NOTE: The bins are usually non-overlapping intervals of a
variable.
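The two steps above can be sketched in plain Python using the test scores from the frequency distribution example. The helper name count_in_bins and the choice of inclusive (low, high) pairs for the bins are illustrative assumptions.

```python
def count_in_bins(values, bins):
    """Return how many values fall inside each inclusive (low, high) bin."""
    counts = []
    for low, high in bins:
        # Step 2: count the values that lie in this interval
        counts.append(sum(1 for v in values if low <= v <= high))
    return counts


scores = [20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49]

# Step 1: divide the range 20 to 50 into non-overlapping intervals
bins = [(20, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]

frequencies = count_in_bins(scores, bins)
for (low, high), freq in zip(bins, frequencies):
    print(f"{low}-{high}: {freq}")
```

Because the bins do not overlap and together cover the whole range, every score is counted exactly once, so the frequencies add up to the number of scores.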
So the histogram of the previously
mentioned data looks like:
HOW TO DRAW HISTOGRAMS IN PYTHON?
Considering the marks data given above, let's write the code to make the
histogram in Python with matplotlib.
Example 1:
import matplotlib.pyplot as plt
data=[20,30,45,32,34,24,25,48,50,50,49]
b=[20,26,31,36,41,46]