FDS Notes Unit-5
21CSS202T
Unit-5
Visualization
Installation of Matplotlib
pip install matplotlib
Import it in Your Applications
import matplotlib
Check Version
print(matplotlib.__version__)
Pyplot
import matplotlib.pyplot as plt
Example: Draw a line in a diagram from position (0,0) to position (6,250):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Multiple Points
Example: Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
Markers
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()
Marker Reference
Marker   Description
'o'      Circle
'*'      Star
'.'      Point
','      Pixel
'x'      X
'X'      X (filled)
'+'      Plus
'P'      Plus (filled)
's'      Square
'D'      Diamond
'd'      Diamond (thin)
'p'      Pentagon
'H'      Hexagon
'h'      Hexagon
'v'      Triangle Down
'_'      Hline
Line Reference
Line Syntax   Description
'-'      Solid line
':'      Dotted line
'--'     Dashed line
'-.'     Dashed/dotted line
Color Reference
Color Syntax   Description
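The marker, line, and color codes listed in the reference tables can also be combined into one format string, passed to plt.plot() right after the data. The following is a minimal sketch, assuming the usual marker, line, color ordering of that string; the data values are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])  # illustrative values

# 'o:r' = circle marker ('o'), dotted line (':'), red color ('r')
(line,) = plt.plot(ypoints, 'o:r')
plt.show()
```

Writing the three settings as separate keyword arguments (marker, linestyle, color) gives the same kind of result and is easier to read in longer calls.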
Matplotlib Line
Example: Use a dotted line
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, linestyle = 'dotted')
plt.show()
Line Styles
Style      Or (short form)
'solid' (default) '-'
'dotted' ':'
'dashed' '--'
'dashdot' '-.'
'None' '' or ' '
Line Color
plt.plot(ypoints, color = 'r')
Line Width
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, linewidth = 20.5)
Multiple Lines
y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])
plt.plot(y1)
plt.plot(y2)
Example: Draw two lines by specifying the x- and y-point values for both
lines:
import matplotlib.pyplot as plt
import numpy as np
x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])
plt.plot(x1, y1, x2, y2)
plt.show()
Add Grid Lines to a Plot
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110,
115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290,
300, 310, 320, 330])
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.plot(x, y)
plt.grid()
plt.show()
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
plt.plot(x,y)
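The subplot fragment above shows only the second plot. A complete two-panel figure might look like the sketch below; the values for plot 1 are illustrative and not taken from these notes.

```python
import matplotlib.pyplot as plt
import numpy as np

# plot 1 (illustrative values)
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)  # 1 row, 2 columns, first plot
plt.plot(x, y)

# plot 2 (values from the fragment above)
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)  # 1 row, 2 columns, second plot
plt.plot(x, y)

fig = plt.gcf()  # the figure that now holds both plots
plt.show()
```

The three arguments of plt.subplot() are the number of rows, the number of columns, and the index of the current plot, counted from 1.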
Matplotlib Scatter
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
Matplotlib Bars
import matplotlib.pyplot as plt
import numpy as np
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.bar(x,y)
plt.show()
Horizontal Bars
import matplotlib.pyplot as plt
import numpy as np
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.barh(x, y)
plt.show()
Bar Width
plt.bar(x, y, width = 0.1)
Bar Height
plt.barh(x, y, height = 0.1)
Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
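To see the "number of observations within each given interval" as numbers, the same counts can be computed with np.histogram(). This is a sketch under the assumption that the default of 10 equal-width bins is used; the seed is there only to make the run repeatable.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
x = rng.normal(170, 10, 250)     # 250 values, mean 170, standard deviation 10

# np.histogram uses 10 equal-width bins by default, like plt.hist
counts, edges = np.histogram(x)
for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:6.1f} to {high:6.1f}: {n} observations")

# the heights of the bars drawn by plt.hist are these same counts
n_drawn, bins_drawn, _ = plt.hist(x)
plt.show()
```

There is always one more bin edge than there are bins, and the counts add up to the number of values in x.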
1. Line plot: A line plot draws the relationship between x and y, with the possibility of several semantic groupings. It is often used to track changes over intervals.
Syntax : sns.lineplot(x=None, y=None)
Parameters:
x, y: Input data variables; must be numeric. Can pass data directly or reference columns in data.
Example:
import pandas as pd
import seaborn as sns
data = {'Weight':[ 254, 354, 230, 253 ],
'Age':[ 21 , 28 , 29 , 30 ]}
df = pd.DataFrame( data )
sns.lineplot(x=df['Age'], y=df['Weight'])
2. Scatter Plot: Scatter plots are used to visualize the relationship between two numerical
variables. They help identify correlations or patterns. It can draw a two-dimensional graph.
Syntax: seaborn.scatterplot(x=None, y=None)
Parameters:
x, y: Input data variables that should be numeric.
Returns: This method returns the Axes object with the plot drawn onto it.
Example:
import pandas as pd
import seaborn as sns
data = {'Age':[ 21 , 22, 23,24,25, 28 , 29 , 30
], 'Weight':[ 230 , 221 , 243, 246, 265, 268,
259 , 228 ] }
df = pd.DataFrame( data )
sns.scatterplot(x=df['Age'],y=df['Weight'])
3. Box plot: A box plot (or box-and-whisker plot) is a visual representation of groups of numerical data through their quartiles, plotted against continuous or categorical data.
A box plot consists of 5 things.
Minimum
First Quartile or 25%
Median (Second Quartile) or 50%
Third Quartile or 75%
Maximum
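The five values listed above can be computed directly with NumPy. This is a small sketch with illustrative ages; it uses NumPy's default (linear interpolation) definition of a percentile, and other definitions give slightly different quartiles.

```python
import numpy as np

ages = np.array([21, 28, 29, 30])  # illustrative values

minimum = ages.min()
q1, median, q3 = np.percentile(ages, [25, 50, 75])
maximum = ages.max()

print(minimum, q1, median, q3, maximum)
# prints: 21 26.25 28.5 29.25 30
```

Plotting libraries may draw the ends of the whiskers differently. By default, Matplotlib and seaborn extend the whiskers only to the furthest point within 1.5 times the interquartile range (Q3 minus Q1) and draw points beyond that as separate outliers.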
Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None)
Parameters:
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting. If x and y are absent, this is interpreted as wide-form.
Returns: It returns the Axes object with the plot drawn onto it.
Example:
import pandas as pd
import seaborn as sns
data = {'Name':[ 'Mohe' , 'Karnal' , 'Yrik' , 'jack' ],
'Age':[ 21 , 28 , 29, 30 ]}
df = pd.DataFrame( data )
sns.boxplot( x=df['Age'] )
4. Violin Plot: A violin plot is similar to a boxplot. It shows the distribution of quantitative data across one or more categorical variables so that those distributions can be compared.
Syntax: seaborn.violinplot(x=None, y=None, hue=None, data=None)
Parameters:
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting.
Example:
import pandas as pd
import seaborn as sns
data = {'Name':[ 'Mohe' , 'Karnal' , 'Yrik' ,
'jack' ],'Age':[ 30 , 21 , 29 , 28 ]}
df = pd.DataFrame( data )
sns.violinplot(x=df['Age'])
5. Swarm plot: A swarm plot is a scatter plot against categorical data in which the points are adjusted so that they do not overlap.
Syntax: seaborn.swarmplot(x=None, y=None, hue=None, data=None)
Parameters:
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting.
Example:
import pandas
import seaborn
seaborn.set(style = 'whitegrid')
data = pandas.read_csv( "nba.csv" )
seaborn.swarmplot(x = data["Age"])
6. Bar plot: Barplot represents an estimate of central tendency for a numeric variable with
the height of each rectangle and provides some indication of the uncertainty around that estimate
using error bars.
Syntax : seaborn.barplot(x=None, y=None, hue=None, data=None)
Parameters :
x, y : Names of variables in data, or vector data. Inputs for plotting long-form data.
hue : (optional) Column name for colour encoding.
data : (optional) DataFrame, array, or list of arrays to plot. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.
Returns : Returns the Axes object with the plot drawn onto it.
Example:
import pandas
import seaborn
seaborn.set(style = 'whitegrid')
# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.barplot(x ="Age", y ="Weight",
data = data)
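The "estimate of central tendency" that sets the height of each bar is, by default in seaborn, the mean of the y values within each x group. The sketch below reproduces that calculation with pandas on a small hypothetical table standing in for nba.csv, whose contents are not shown in these notes.

```python
import pandas as pd

# hypothetical stand-in for nba.csv
df = pd.DataFrame({
    "Age":    [22, 22, 25, 25, 25, 30],
    "Weight": [180, 200, 210, 220, 230, 240],
})

# one bar per Age value; bar height = mean Weight in that group
bar_heights = df.groupby("Age")["Weight"].mean()
print(bar_heights)
```

The error bars describe how uncertain each of these means is; they are not computed in this sketch.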
7. Point plot: A point plot shows point estimates and confidence intervals using scatter plot glyphs. It represents an estimate of central tendency for a numeric variable by the position of scatter plot points and indicates the uncertainty around that estimate using error bars.
Syntax: seaborn.pointplot(x=None, y=None, hue=None, data=None)
Parameters:
x, y: Inputs for plotting long-form data.
hue: (optional) column name for color encoding.
data: dataframe as a Dataset for plotting.
Return: The Axes object with the plot drawn onto it.
Example:
import pandas
import seaborn
seaborn.set(style = 'whitegrid')
# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.pointplot(x = "Age", y = "Weight",
data = data)
8. Count plot: A count plot shows the counts of observations in each categorical bin using bars.
Syntax : seaborn.countplot(x=None, y=None, hue=None, data=None)
Parameters :
x, y: (optional) Names of variables in data, or vector data. Inputs for plotting long-form data.
hue : (optional) Column name for color encoding.
data : (optional) DataFrame, array, or list of arrays to plot. If x and y are absent, this is interpreted as wide-form. Otherwise, it is expected to be long-form.
Returns: Returns the Axes object with the plot drawn onto it.
Example:
import pandas
import seaborn
seaborn.set(style = 'whitegrid')
data = pandas.read_csv("nba.csv")
seaborn.countplot(x = data["Age"])
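The numbers behind a count plot can be checked with pandas value_counts(). The sketch below uses a short hypothetical list of ages rather than the real nba.csv file.

```python
import pandas as pd

ages = pd.Series([22, 25, 25, 30, 22, 25], name="Age")  # hypothetical values

# number of observations for each distinct age, ordered by age
counts = ages.value_counts().sort_index()
print(counts)
```

Each bar of the count plot corresponds to one row of this result.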
9. KDE Plot: A KDE (Kernel Density Estimate) plot is used for visualizing the probability density of a continuous variable. It depicts the probability density at different values of that variable. Several samples can also be drawn on a single graph, which makes them easier to compare.
Syntax: seaborn.kdeplot(x=None, *, y=None, vertical=False, palette=None, **kwargs)
Parameters:
x, y : vectors or keys in data
vertical : boolean (True or False)
data : pandas.DataFrame, numpy.ndarray, mapping, or sequence
Example:
import seaborn as sns
import pandas
data = pandas.read_csv("nba.csv").head()
sns.kdeplot(x=data['Age'], y=data['Number'])
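The idea behind a kernel density estimate can be shown in a few lines of NumPy: place one Gaussian "bump" on every observation and average the bumps. This is a hand-written sketch, not seaborn's own implementation; the ages are illustrative and the bandwidth value is an assumption chosen by hand, whereas seaborn picks one automatically.

```python
import numpy as np

ages = np.array([21, 22, 23, 24, 25, 28, 29, 30], dtype=float)  # illustrative
bandwidth = 2.0                    # hypothetical smoothing width
grid = np.linspace(10, 41, 1000)   # points where the density is evaluated

# one Gaussian bump per observation, averaged over all observations
z = (grid[:, None] - ages[None, :]) / bandwidth
bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = bumps.mean(axis=1)

# numerical check: the area under a density curve should be close to 1
dx = grid[1] - grid[0]
area = density.sum() * dx
print(round(area, 3))
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual observations more closely.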
10. Heatmap: A heatmap is a graphical representation of data where values in a matrix are
represented as colors. It’s often used to visualize the magnitude of values in a matrix, allowing
patterns and correlations to be easily identified.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
flights = sns.load_dataset("flights")
flights_pivot = flights.pivot(index="month",
columns="year", values="passengers")
sns.heatmap(flights_pivot, annot=True,
fmt="d", cmap="YlGnBu")
plt.show()
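A heatmap is often drawn from a correlation matrix. The sketch below builds one from a small hypothetical table, so no dataset has to be downloaded; the column names and values are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hypothetical measurements
df = pd.DataFrame({
    "Age":    [21, 22, 23, 24, 25, 28, 29, 30],
    "Weight": [230, 221, 243, 246, 265, 268, 259, 228],
    "Height": [170, 168, 175, 177, 180, 182, 179, 169],
})

corr = df.corr()  # 3 x 3 matrix of pairwise correlations

# annot=True writes each value into its cell; fmt=".2f" shows two decimals
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="YlGnBu")
plt.show()
```

The diagonal cells are always 1, because every variable is perfectly correlated with itself.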
11. Cluster Map: A cluster map is a heatmap that organizes rows and columns of a
dataset based on their similarity, often using hierarchical clustering. It’s useful for identifying
patterns and relationships in complex datasets by grouping similar rows and columns together.
import seaborn as sns
import matplotlib.pyplot as plt
flights = sns.load_dataset("flights")
flights_pivot = flights.pivot(index="month",
columns="year", values="passengers")
sns.clustermap(flights_pivot,
cmap="viridis", standard_scale=1)
plt.show()
12. Pair Plot: A pair plot creates a grid of scatterplots and histograms for each pair of
variables in a dataset, allowing for visual exploration of relationships and distributions between
variables. It’s particularly useful for identifying patterns and correlations in multivariate data.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.pairplot(tips, hue="smoker",
palette="coolwarm")
plt.show()
UNIVARIATE GRAPHS FOR NUMERIC AND CATEGORICAL DATA
Univariate Analysis is a type of data visualization where we visualize only a single variable at a
time. Univariate Analysis helps us to analyze the distribution of the variable present in the data
so that we can perform further analysis.
import pandas as pd
import seaborn as sns
data = pd.read_csv('Employee_dataset.csv')
print(data.head())
MULTIVARIATE GRAPHS
Multivariate analysis is an extension of bivariate analysis: it involves multiple variables at the same time in order to find correlations between them. It is a set of statistical models that examine patterns in multidimensional data by considering several data variables at once.
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
def func(x, y):
    return 3 * x + 4 * y - 2 + np.random.randn(30)
x, y = np.random.randn(2, 30)
y *= 100
z = func(x, y)
fig, ax = plt.subplots()
s = ax.scatter(x, y, c=z, s=100, marker='*', cmap='plasma')
fig.colorbar(s)
plt.show()
INTRODUCTION TO DASHBOARDS
Dash is a Python framework for building analytical web applications. Dash helps in building responsive web dashboards that look good and run fast, without the need to understand complex front-end frameworks or languages such as HTML, CSS, and JavaScript. Let’s build our first web dashboard using Dash.
Step 1: Importing all the required libraries: import Dash, Dash Core Components (which has components like graph, inputs, etc.) and Dash HTML Components (which has HTML components like meta tags, body tags, paragraph tags, etc.).
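import dash
# Dash 2.0 and later: dcc and html are imported from the dash package itself;
# the separate dash_core_components and dash_html_components packages are deprecated
from dash import dcc, html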
Step 2: Designing a layout: make a graph which has various parameters such as id (a unique ID for a particular graph), figure (the graph itself), and layout (the basic layout, title of the graph, X-axis and Y-axis data, etc.).
The figure parameter is essentially a dictionary with a 'data' list and a 'layout' entry. Each element of 'data' is itself a dictionary with elements like x, y, type, and name.
x refers to the X-axis values (it can be a list or a single element); y is the same, except that it is associated with the Y-axis.
The type parameter refers to the type of the graph; it may be line or bar, for example.
The name parameter is the label of that data series, as shown in the legend of the graph.
app = dash.Dash()
app.layout = html.Div(children=[
    html.H1("Dash Tutorial"),
    dcc.Graph(
        id="example",
        figure={
            'data': [
                {'x': [1, 2, 3, 4, 5],
                 'y': [5, 4, 7, 4, 8],
                 'type': 'line',
                 'name': 'Trucks'},
                {'x': [1, 2, 3, 4, 5],
                 'y': [6, 3, 5, 3, 7],
                 'type': 'bar',
                 'name': 'Ships'}
            ],
            'layout': {
                'title': 'Basic Dashboard'
            }
        }
    )
])
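Because the figure is an ordinary dictionary, it can also be assembled with plain Python before it is handed to the graph component. The sketch below rebuilds the figure from the layout above; make_trace is a hypothetical helper written for this example and is not part of Dash.

```python
def make_trace(name, y, kind):
    """Hypothetical helper: build one entry of the figure's 'data' list."""
    return {
        "x": list(range(1, len(y) + 1)),  # 1, 2, 3, ... along the X-axis
        "y": y,
        "type": kind,                     # e.g. 'line' or 'bar'
        "name": name,                     # label of this data series
    }

figure = {
    "data": [
        make_trace("Trucks", [5, 4, 7, 4, 8], "line"),
        make_trace("Ships", [6, 3, 5, 3, 7], "bar"),
    ],
    "layout": {"title": "Basic Dashboard"},
}

print(figure["data"][0]["x"])
# prints: [1, 2, 3, 4, 5]
```

The resulting dictionary has the same shape as the one written out by hand in the layout, so it could be passed as the figure argument of the graph in the same way.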
Step 3: Running the server: The dashboard is now ready, but it needs a server
to run on.
if __name__ == '__main__':
    # newer Dash releases use app.run() instead of app.run_server()
    app.run_server()
Open the app in a web browser on localhost at the default port 8050:
http://127.0.0.1:8050/