12 IP-Data Visualization (Part-2) - Note

Bar graphs and histograms are two common types of charts used for data visualization. Bar graphs show categorical data using rectangular bars of varying heights, while histograms bin continuous data into ranges and show the frequency of data points within each bin. The document discusses how to create bar graphs and histograms in Python using Matplotlib, including how to specify bin sizes, colors, labels, and other formatting options. Key aspects like automatically generated bins and customizing bin sizes and gaps are also covered.

Uploaded by

Priyansh Patel
Copyright
© All Rights Reserved

INFORMATICS PRACTICES

CLASS XII
Topic: DATA VISUALIZATION (Part-2) - NOTES
BAR GRAPHS
A bar graph uses rectangular bars to show how large each value is. The bars can be horizontal or vertical.
Each bar stands for one category, and its length represents the value of that category.
To draw a bar chart with Matplotlib, use the plt.bar() function (plt.barh() for horizontal bars).
Example: Consider the following data. A bar graph of the Age column can be drawn vertically or horizontally.

Index  Name     Age  Gender
100    Asha     15   F
101    Kashish  18   M
102    Meeta    20   F
103    Sumit    12   M

[Figure: two bar graphs titled "Representing Age". Vertical bar graph: x-axis "Names of students", y-axis "Age". Horizontal bar graph: x-axis "Age", y-axis "Names of students".]

Remember that the x and y axes are swapped when using barh, so take care when labelling.
Representing Columns of a dataframe with the help of Bar Graph
import matplotlib.pyplot as plt
import pandas as pd

# Student data; a custom index (100 to 103) is used for the rows
dict1 = {'Name': ['Asha', 'Kashish', 'Meeta', 'Sumit'],
         'Age': [15, 18, 20, 12],
         'Gender': ['F', 'M', 'F', 'M']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103])
print(df)

x = df['Name']   # categories shown along the x-axis
y = df['Age']    # bar heights
plt.bar(x, y, width=0.5, color='green')
plt.title("Representing Age of kids", fontsize=14, color="blue")
plt.xlabel("Names of students", fontsize=14, color="red")
plt.ylabel("Age", fontsize=14, color="red")
plt.show()
Horizontal Bars
import matplotlib.pyplot as plt
import pandas as pd

dict1 = {'Name': ['Asha', 'Kashish', 'Meeta', 'Sumit'],
         'Age': [15, 18, 20, 12],
         'Gender': ['F', 'M', 'F', 'M']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103])
print(df)

x = df['Name']
y = df['Age']
# barh puts the categories on the y-axis and the values along the x-axis,
# so the axis labels are swapped compared with bar()
plt.barh(x, y, color='green')
plt.title("Representing Age of kids", fontsize=14, color="blue")
plt.xlabel("Age", fontsize=14, color="red")
plt.ylabel("Names of Students", fontsize=14, color="red")
plt.show()
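With barh the roles of the two axes are swapped, and one way to see this is to inspect the bars that barh returns. The sketch below is illustrative only: the Agg backend line and the variable names are assumptions made so that it runs without opening a window. For horizontal bars, each data value ends up as the width of its rectangle rather than its height.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; remove this line and call plt.show() to see the chart
import matplotlib.pyplot as plt

names = ['Asha', 'Kashish', 'Meeta', 'Sumit']
ages = [15, 18, 20, 12]

# barh takes the categories first and the values second, just like bar
bars = plt.barh(names, ages, color='green')

# For horizontal bars the value is the bar's width (its extent along the x-axis)
bar_lengths = [rect.get_width() for rect in bars]
print(bar_lengths)

# So the value label belongs on the x-axis and the category label on the y-axis
plt.xlabel("Age")
plt.ylabel("Names of students")
```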
Example: Plot a bar graph showing the month-wise sales of mid-range cars of a company.
import matplotlib.pyplot as plt
import pandas as pd
dict1={'Month':['July','August','Sept','Oct','Nov','Dec'],
'Mid_Car':[100,80,150,170,160,145],
'Bike':[150,155,170,180,100,90],
'High_Car':[50,60,70,40,20,10] }
df=pd.DataFrame(dict1)
print(df)
x=df['Month']
y=df['Mid_Car']
plt.bar(x,y, color='green')
plt.title("Month Wise Automobile Sales in North Region",fontsize=14,color="blue")
plt.xlabel("Month",fontsize=14,color="red")
plt.ylabel("Sales of Mid Range Cars",fontsize=14,color="red")
plt.show()
A variation: rotating the tick labels
# Continues from the previous example (uses the same df)
x = df['Month']
y = df['Mid_Car']
plt.bar(x, y, color='green')
plt.title("Month Wise Automobile Sales in North Region", fontsize=14, color="blue")
plt.xlabel("Month", fontsize=14, color="red")
plt.ylabel("Sales of Mid Range Cars", fontsize=14, color="red")
# Shrink the month labels and tilt them by 30 degrees
plt.xticks(x, fontsize=10, rotation=30)
plt.show()
What if we want to show the month-wise sales of several items side by side, so that their sales values can be compared?
The pandas plot() method provides what we need here: it puts the index on the x-axis and renders each column as a separate series (set of bars), with a (usually) neatly positioned legend.
import matplotlib.pyplot as plt
import pandas as pd
dict1={'Mid_Car':[100,80,150,170],
'Bike':[150,155,170,180],
'High_Car':[50,60,70,40]
}
df=pd.DataFrame(dict1, index=['July','August','Sept','Oct'])
df.plot(kind='bar', color=['green','black','orange'])
plt.title("Comparison of Sales Month wise",fontsize=14,color="blue")
plt.xlabel("Month",fontsize=14,color="red")
plt.ylabel("Sales",fontsize=14,color="red")
plt.xticks(fontsize=10, rotation=30)
plt.show()
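For comparison, the same grouped chart can be sketched with plt.bar alone by shifting each series sideways. This only illustrates the idea and is not a description of how pandas works internally; the bar width of 0.25, the Agg backend line and the variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; remove this line and call plt.show() to see the chart
import matplotlib.pyplot as plt
import numpy as np

months = ['July', 'August', 'Sept', 'Oct']
sales = {'Mid_Car': [100, 80, 150, 170],
         'Bike': [150, 155, 170, 180],
         'High_Car': [50, 60, 70, 40]}
colors = ['green', 'black', 'orange']

positions = np.arange(len(months))  # one slot per month: 0, 1, 2, 3
bar_width = 0.25                    # three bars share each slot

# Shift each series sideways so that its bars sit next to the others
for i, (item, values) in enumerate(sales.items()):
    offset = (i - 1) * bar_width    # -0.25, 0 and +0.25 around the slot centre
    plt.bar(positions + offset, values, width=bar_width, color=colors[i], label=item)

plt.xticks(positions, months, rotation=30)  # month names at the slot centres
plt.xlabel("Month")
plt.ylabel("Sales")
plt.legend()
```

Compared with this, df.plot(kind='bar') handles the offsets, tick labels and legend in a single call.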
HISTOGRAMS
A histogram is a bar chart showing a FREQUENCY DISTRIBUTION.
The data is grouped into ranges, such as "100 to 199", "200 to 299", etc., and each range is plotted as a bar whose height is the number of values in it. These ranges are called "bins" (or class intervals).
It is similar to a bar graph, except that in a histogram each bar stands for a range of data rather than a single category.
The width of each bar corresponds to its class interval, while the height corresponds to the frequency of that class.
CONCEPT OF FREQUENCY DISTRIBUTION:
Let’s consider a test given to students out of 50 marks. The scores they get are:

Test scores: 20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49

From these scores, let’s see how many students scored in each range:

Range   Number of students
20-25   3
26-30   1
31-35   2
36-40   0
41-45   1
46-50   4

This table is called the frequency distribution table.
To manually construct a histogram:
1. The first step is to “bin” the range of values, i.e., divide the entire range of values into a series of intervals. These bins may or may not be of the same size.
2. Then count how many values fall into each interval.
NOTE: The bins are usually non-overlapping intervals of a variable.
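The two steps can be sketched in plain Python, without any plotting library. The class intervals are the ones from the frequency distribution table for the test scores (both ends inclusive); the variable names are illustrative.

```python
scores = [20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49]

# Step 1: bin the range of values into non-overlapping class intervals
classes = [(20, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]

# Step 2: count how many scores fall into each interval
frequency = {}
for low, high in classes:
    frequency[f"{low}-{high}"] = sum(1 for s in scores if low <= s <= high)

for label, count in frequency.items():
    print(label, count)
```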
[Figure: histogram of the test scores listed above.]
HOW TO DRAW HISTOGRAMS IN PYTHON?
Using the marks data given above, let's write the code to draw the histogram in Python with Matplotlib's plt.hist() function.

Example 1:
import matplotlib.pyplot as plt
data=[20,30,45,32,34,24,25,48,50,50,49]
# Bin edges; the last edge (51) closes the 46-50 class so that scores up to 50 are counted
b=[20,26,31,36,41,46,51]

plt.hist(data, bins=b, color="green", label="marks")


plt.xlabel("student marks")
plt.ylabel("frequency")
plt.legend()
plt.show()
Example 2:
import matplotlib.pyplot as plt
data=[20,30,45,32,34,24,25,48,50,50,49]
# Bin edges; the last edge (51) closes the 46-50 class so that scores up to 50 are counted
b=[20,26,31,36,41,46,51]
plt.hist(data,bins=b,color="green", label="marks", edgecolor="black")
plt.xlabel("student marks")
plt.ylabel("frequency")
plt.legend()
plt.show()
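plt.hist also returns the counts and bin edges it used, which is a handy way to check a histogram against a hand-made frequency table. In the sketch below, the final edge 51 is an assumption added so that the top class (46 to 50) is counted, and the Agg backend line only keeps the example from opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; remove this line and call plt.show() to see the chart
import matplotlib.pyplot as plt

data = [20, 30, 45, 32, 34, 24, 25, 48, 50, 50, 49]
bin_edges = [20, 26, 31, 36, 41, 46, 51]

# hist returns the frequency of each bin, the edges used, and the drawn bars
counts, edges, patches = plt.hist(data, bins=bin_edges, color="green", edgecolor="black")

print([int(c) for c in counts])  # one frequency per bin
print(len(patches))              # one bar per bin
```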
Example of a Histogram with varying bin size
Example 3:

Bin      Values included    Frequency
0-33     >=0 and <33        2
33-45    >=33 and <45       3
45-60    >=45 and <60       1
60-100   >=60 and <=100     4

import matplotlib.pyplot as plt
data = [40, 60, 55, 20, 35, 70, 60, 89, 20, 33]
bins = [0, 33, 45, 60, 100]   # bin edges of unequal width
plt.hist(data, bins, color="green", edgecolor="black")
plt.show()
What happens if we do not specify the intervals (bins)?
• Matplotlib automatically makes 10 equal-width bins from the data given to it.

import matplotlib.pyplot as plt
data = [40, 60, 55, 20, 35, 70, 60, 89, 20, 33]
plt.hist(data, color="green", edgecolor="black")
plt.show()
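Since plt.hist does its binning with numpy.histogram, the automatically generated bins can be inspected without drawing anything. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

data = [40, 60, 55, 20, 35, 70, 60, 89, 20, 33]

# With no bins argument, numpy.histogram also defaults to 10 equal-width bins
counts, edges = np.histogram(data)

print(len(counts))           # number of bins
print(edges[0], edges[-1])   # first and last edge: the lowest and highest data values
print(edges[1] - edges[0])   # width of each bin: (89 - 20) / 10
print(counts.tolist())       # frequency of each bin
```

Several of the ten bins are empty here because only ten data values are spread across them.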
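What happens if we specify only the number of bins we want it to create?
• It divides the range from the lowest to the highest value into that many equal-width bins.

import matplotlib.pyplot as plt
data = [40, 60, 55, 20, 35, 70, 60, 89, 20, 33]
plt.hist(data, bins=5, color="green", edgecolor="black")
plt.show()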
Key points about BINS:
• bins can be given as a sequence of numbers that mark the bin edges.
• It is an optional argument to hist().
• When it is not given, Matplotlib creates 10 bins of equal width by default: it takes the lowest and the highest value in the data and divides that range into 10 equal parts.
• If bins is given as a custom list such as [11,15,20,30], there will be 3 bins (one less than the number of values listed).
• For the above example, the bins would be:
  11----15 (including 11 but excluding 15)
  15----20 (including 15 but excluding 20)
  20----30 (including both 20 and 30)
• We can also specify the number of bins we need by writing bins=n.
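The edge rule can be checked with numpy.histogram, which plt.hist uses for its binning. In this small sketch the values placed in the bins are the edges themselves, chosen only to show which bin each boundary value lands in.

```python
import numpy as np

edges = [11, 15, 20, 30]
values = [11, 15, 20, 30]   # one value sitting exactly on each edge

counts, _ = np.histogram(values, bins=edges)

# 11 falls in the first bin, 15 in the second, and both 20 and 30 in the last bin
print(counts.tolist())
print(len(counts))          # one less than the number of edges
```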
How to create gaps between the bars?
import matplotlib.pyplot as plt
data=[40,60,55,20,35,70,60,89,20,33]
plt.hist(data,bins=5,color="green",
edgecolor="black", rwidth=0.9)
plt.show()

By default, rwidth is None, which draws each bar at the full width of its bin (the same as rwidth=1), so there are no gaps.
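To see what rwidth does, the drawn bar width can be compared with the bin width. This is a minimal sketch; the Agg backend line and the variable names are assumptions so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; remove this line and call plt.show() to see the chart
import matplotlib.pyplot as plt

data = [40, 60, 55, 20, 35, 70, 60, 89, 20, 33]

counts, edges, patches = plt.hist(data, bins=5, color="green",
                                  edgecolor="black", rwidth=0.9)

bin_width = edges[1] - edges[0]      # (89 - 20) / 5
bar_width = patches[0].get_width()   # width actually drawn for the first bar

# Each bar covers rwidth (here 0.9) of its bin; the rest is the gap
print(bar_width / bin_width)
```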
