IP Practical File - Reference
1. PANDAS
• SERIES
• DATAFRAME
2. MATPLOTLIB
3. MYSQL
4. CSV
Pandas
Series:
A Pandas Series is a one-dimensional labeled array that can store values of
various data types. A list, tuple, or dictionary can be converted into a Series with the
pd.Series() constructor. The row labels of a Series are called the index. A Series cannot
contain multiple columns.
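The points above can be checked with a short sketch. The variable names and sample values are assumptions chosen only for illustration; they are not part of the original file.

```python
import pandas as pd

# Sample values chosen only for illustration
from_list = pd.Series([10, 20, 30])        # list -> Series
from_tuple = pd.Series((1.5, 2.5, 3.5))    # tuple -> Series
from_dict = pd.Series({'a': 1, 'b': 2})    # dict keys become the index

print(from_list.index.tolist())   # default row labels start at 0
print(from_dict.index.tolist())   # row labels taken from the dict keys
print(from_list.ndim)             # number of dimensions of a Series
```

When no index is given, pandas numbers the rows from 0; a dictionary supplies its keys as the index.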
Creating a series from a NumPy array
Code:
import pandas as pd
import numpy as np
n=np.arange(1,6)
s=pd.Series(n)
print(s)
Output:
Creating a series from a dictionary
Code:
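import pandas as pd
# sample dictionary (assumed; the original definition is missing)
dictionary={'A':10,'B':20,'C':30}
seri= pd.Series(dictionary)
print(seri)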
Output:
Changing the index values of a
series during creation
Code:
import pandas as pd
s=pd.Series([3,8,2,0,6],index=[6,'A',8,'B',34])
print(s)
Output:
Changing the index values of an
existing series
Code:
import pandas as pd
s=pd.Series([3,8,2,0,6])
print("Original Index")
print(s)
s.index=[6,'A',8,'B',34]
print("New Index")
print(s)
Output:
Create a series having 10 random
integers and having index from 9 to 0
Code:
import pandas as pd
import numpy as np
n=np.random.randint(100,size=10)
t=np.arange(9,-1,-1)
s=pd.Series(n,index=t)
print(s)
Output:
Create a series using a dictionary to
print month name and month number
Code:
import pandas as pd
d={1:'jan',2:'feb',3:'march',4:'april',5:'may',6:'june',
7:'july',8:'aug',9:'sep',10:'oct',11:'nov',12:'dec'}
s=pd.Series(d)
print(s)
Output:
Pandas
DataFrame:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different
types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is
generally the most commonly used pandas object.
Features of DataFrame:
• Columns can be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Can perform arithmetic operations on rows and columns
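The features listed above can be seen in a minimal sketch. The names, marks and row labels are hypothetical sample data, not taken from the original file.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the features
df = pd.DataFrame({'Name': ['Asha', 'Ravi'], 'Marks': [80, 90]},
                  index=['r1', 'r2'])

print(df.dtypes)                          # each column has its own type
df['Bonus'] = [5, 10]                     # size-mutable: a column is added
df['Total'] = df['Marks'] + df['Bonus']   # arithmetic on whole columns
print(df.loc['r2', 'Total'])              # labeled axes: row 'r2', column 'Total'
```

Here df.loc selects by row label and column label, which is what "labeled axes" refers to.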
CODE:
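import pandas as pd
# sample data (assumed; the original definition of Sales is missing)
Sales={'Target':[56000,70000],'Sales':[58000,68000]}
dfsales=pd.DataFrame(Sales)
print(dfsales)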
OUTPUT:
Write a program to create a dataframe from
a list of dictionaries of the sales performance
of four zonal offices. Zone names should be
the row labels.
CODE:
import pandas as pd
zoneA = {'Target' :56000, 'Sales':58000}
zoneB ={'Target':70000, 'Sales':68000}
zoneC = {'Target':75000, 'Sales' : 78000}
zoneD ={'Target' :60000, 'Sales':61000}
sales=[zoneA, zoneB, zoneC, zoneD]
saleDf = pd.DataFrame(sales, index=['zoneA', 'zoneB', 'zoneC', 'zoneD'])
print(saleDf)
OUTPUT:
Write a program to create a dataframe
from a 2D list. Specify own index labels.
CODE:
import pandas as pd
list1 = [[ 25, 45, 60], [34, 67, 89], [88, 90, 56] ]
df1= pd.DataFrame(list1,index=['row1','row2','row3'])
print ( df1 )
OUTPUT:
Write a program to create a dataframe from
a list containing 2 lists, each containing
Target and actual Sales figures of four zonal
offices. Give appropriate row labels.
CODE:
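import pandas as pd
# Figures assumed from the previous zonal example; the original lines are missing
ZoneSales=[[56000,70000,75000,60000],[58000,68000,78000,61000]]
ZsaleDf=pd.DataFrame(ZoneSales,columns=['zoneA','zoneB','zoneC','zoneD'],
index=['Target', 'Sales'])
print( ZsaleDf )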
OUTPUT:
Consider two series objects staff and salaries that
store the number of people in various office
branches and salaries distributed in these
branches, respectively. Write a program to create
another Series object that stores average salary
per branch and then create a dataframe object
from these series objects.
CODE:
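import pandas as pd
# sample figures (assumed; the original definitions are missing)
staff=pd.Series([20,36,44])
salaries=pd.Series([279000,396800,563000])
avg=salaries/staff
org={'people':staff,'Amount':salaries,'Average':avg}
dtf5=pd.DataFrame(org)
print (dtf5)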
OUTPUT:
Write a program to create a dataframe to
store weight, age and names of 3 people.
Print dataframe and its transpose.
CODE:
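import pandas as pd
# sample data (assumed; the original definition of df is missing)
df=pd.DataFrame({'Weight':[42,75,66],'Name':['Arnav','Charles','Guru'],'Age':[15,22,35]})
print (df)
print('Transpose:')
print (df.T)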
OUTPUT:
Write a program to print the dataframe one
row at a time.
CODE:
import pandas as pd
data={'Name':['Ram',"Pam","Sam"],
'Marks':[70,95,80]}
df=pd.DataFrame(data)
for i,j in df.iterrows():
    print(j)
    print("_____________________")
OUTPUT:
Write a program to print the dataframe one
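column at a time.
CODE:
import pandas as pd
data={'Name':['Ram','Pam','Sam'],'Marks':[70,95,80]}  # data assumed from the previous program
df=pd.DataFrame(data)
for i,j in df.items():
    print (j)
    print("________________")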
OUTPUT:
Write a program to print only the values
from marks column, for each row.
CODE:
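import pandas as pd
df=pd.DataFrame({'Name':['Ram','Pam','Sam'],'Marks':[70,95,80]})  # data assumed from the earlier program
for i,row in df.iterrows():
    print(row[ "Marks"])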
OUTPUT:
Write a program to concatenate
two dataframes.
CODE:
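import pandas as pd
# sample data (assumed; the original definitions are missing)
d1={'roll_no':[1,2,3],'name':['Ram','Pam','Sam']}
d2={'roll_no':[4,5,6],'name':['Asma','Bela','Chris']}
df1=pd.DataFrame (d1)
df2=pd.DataFrame (d2)
df3=pd.concat([df1,df2])
print(df3)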
OUTPUT:
Write a program to use notnull() function to
find the non-missing values, when there are
missing values in the dataframe.
CODE:
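import pandas as pd
# sample data with missing values (assumed; the original definition is missing)
df=pd.DataFrame({'Name':['Ram','Pam',None],'Marks':[70,None,80]})
print(df.notnull())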
OUTPUT:
Write a program to Sort the pandas
dataframe on the basis of multiple columns.
CODE:
import pandas as pd
import numpy as np
d={'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit',
'Shikhar']),
'Age':pd.Series([26,25,25,24,31]),
'Score':pd.Series([87,67,89,55,47])}
df= pd.DataFrame(d)
print("Dataframe contents without sorting")
print(df)
df=df.sort_values(by=['Age','Score'],ascending=[True,
False])
print("Dataframe contents after sorting")
print(df)
OUTPUT:
Write a program to implement
aggregate functions on dataframe.
CODE:
import pandas as pd
d={'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit', 'Shikhar']),
'Age':pd.Series([26,25,25,24,31]), 'Score':pd.Series([87,67,89,55,47])}
df= pd.DataFrame(d)
print("Dataframe contents")
print (df)
print()
print("sum-----------\n",df.sum(numeric_only=True))
print("mean----------\n",df.mean(numeric_only=True))
print("median--------\n",df.median(numeric_only=True))
print("mode----------\n",df.mode())
print("count---------\n",df.count())
print("min---\n",df.min())
print("max---\n",df.max())
OUTPUT:
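Write a program to create a dataframe and group its
rows using the groupby() function.
CODE:
import pandas as pd
# 'Team' values are assumed; the original line is missing
ipl_data={'Team':['Riders','Riders','Devils','Devils','Kings','Kings',
'Kings','Kings','Riders','Royals','Royals','Riders'],
'Year':[2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}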
df = pd.DataFrame(ipl_data)
print("Original DataFrame")
print(df)
print()
gdf=df.groupby('Team')
print("Groups are:---\n",gdf.groups)
print()
print("groups on the basis of riders:---\n",gdf.get_group('Riders'))
print("group size-------\n",gdf.size())
print("group count------\n",gdf.count())
OUTPUT:
Write a program to create multiple dataframes and
use merging operation to merge them.
CODE:
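import pandas as pd
# sample data (assumed; the original definitions are missing)
d1={'roll_no':[1,2,3],'name':['Ram','Pam','Sam']}
d2={'roll_no':[4,5,6],'name':['Asma','Bela','Chris']}
d3={'roll_no':[1,2,3,4,5,6],'marks':[70,95,80,65,88,74]}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
df3=pd.concat([df1,df2])
df4=pd.DataFrame(d3)
df5=pd.merge(df3,df4,left_on='roll_no',right_on='roll_no')
print(df5)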
OUTPUT:
Write a program to create a dataframe QtrSales
where each row contains the item category, item
name, and expenditure. Locate the 3 largest values
of expenditure in this data frame.
CODE:
import pandas as pd
QtrSales = pd.DataFrame({'Item Category':[ 'A', 'B', 'A', 'A', 'B', 'C', 'B', 'C'],
'Item Name':['iPad', 'LCD', 'iPhone', 'iWatch', 'Projector', 'Hard disk',
'Smartboard', 'Pen drive'],
'Expenditure': [288000, 356000, 497000, 315000, 413000, 45000,
211000, 21000]})
print ("Dataframe QtrSales is: ")
print (QtrSales)
print("3 largest expenditure values in given dataframe are :")
print(QtrSales.sort_values("Expenditure", ascending=False).head(3) )
OUTPUT:
Write a Pandas program to count
number of columns of a dataframe.
CODE:
import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print("Original DataFrame")
print(df)
print("\nNumber of columns:")
print(len(df.columns))
OUTPUT:
MatPlotLib
Matplotlib is a plotting library for the Python programming language
and its numerical mathematics extension NumPy. It provides an
object-oriented API for embedding plots into applications using
general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
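The object-oriented API mentioned above works with explicit Figure and Axes objects instead of the plt functions used in the programs that follow. This is a minimal sketch; the data points and the output file name are assumptions, and the Agg backend is selected only so that it runs without a GUI window.

```python
import matplotlib
matplotlib.use('Agg')             # non-GUI backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()          # create a Figure and one Axes object
ax.plot([1, 2, 3], [1, 4, 9])     # sample points, chosen for illustration
ax.set_xlabel('x')
ax.set_ylabel('x squared')
ax.set_title('Object-oriented style')
fig.savefig('oo_example.png')     # hypothetical output file name
```

The calls ax.plot() and ax.set_xlabel() correspond to plt.plot() and plt.xlabel() in the function-based style.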
PROGRAMS
ON
MATPLOTLIB
Creating a line graph using matplotlib.
Code:
import matplotlib.pyplot as plt
import numpy as np
x=np.linspace(1,5,6)
y=np.log(x)
plt.plot(x,y)
plt.show()
Output:
Creating a double line graph using
matplotlib.
Code:
import matplotlib.pyplot as plt
import numpy as np
x=np.arange(0,10,0.1)
a=np.cos(x)
b=np.sin(x)
plt.plot(x,a,'b')
plt.plot(x,b,'r')
plt.show()
Output:
Creating a pie chart using matplotlib
Code:
import matplotlib.pyplot as plt
Section='A','B','C','D','E'
val=[8000,12000,9800,11200,11500]
sizes=val
colors=['gold','yellowgreen','lightcoral','lightskyblue','red']
explode=(0,0,0.1,0,0.1)
plt.pie(sizes,explode=explode,labels=Section,colors=colors)
plt.title('Sectionwise Performances')
plt.show()
Output:
Creating a pie chart with percentages
using matplotlib
Code:
import matplotlib.pyplot as plt
labels=['A','B','C','D']  # labels assumed; the original line is missing
votes=[315,130,245,210]
sizes=votes
colors=['gold','yellowgreen','lightcoral','lightskyblue']
explode=(0.3,0,0,0)
plt.pie(sizes,explode=explode,labels=labels,colors=colors,
autopct='%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.show()
Output:
Creating a scatter chart using matplotlib
Code:
import matplotlib.pyplot as plt
x=['x','y','z','a','b']
y=[67,23,90,76,56]
plt.scatter(x,y,color='red')
plt.show()
Output:
Creating a scatter chart of boys and girls
marks using matplotlib
Code:
import matplotlib.pyplot as plt
grades_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
x=grades_range
y=[40, 29, 59, 48, 60, 98, 38, 45, 20, 30]
z=[79, 89, 60, 89, 100, 80, 90, 100, 80, 34]
plt.scatter(x,y,color='red')
plt.scatter(x,z,color='blue')
plt.xlabel('Marks Range')
plt.ylabel('Marks')
plt.show()
Output:
Creating a bar graph using matplotlib
Code:
import matplotlib.pyplot as plt
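# app names are assumed; the original line is missing
name=['App1','App2','App3','App4','App5','App6','App7','App8','App9']
dl=[197000,209000,414000,196000,272000,311000,213000,455000,
278000]
plt.bar(name,dl)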
plt.xlabel("Name Of App")
plt.ylabel("Downloads ")
plt.show()
Output:
Creating a boxplot using matplotlib.
Code:
import matplotlib.pyplot as plt
value1=[72,76,24,40,57,75,78,31,32]
box_plot_data=[value1]
box=plt.boxplot(box_plot_data,vert=1,patch_artist=True,labels=['Course1'])
colors=['cyan']
for patch,color in zip(box['boxes'],colors):
    patch.set_facecolor(color)
plt.show()
Output:
Creating boxplots of marks using
matplotlib.
Code:
import matplotlib.pyplot as plt
value1=[72,45,69,45,57]
value2=[79,65,24,42,87]
value3=[100,45,94,45,47]
value4=[42,79,45,43,77]
box_plot_data=[value1,value2,value3,value4]
box=plt.boxplot(box_plot_data,vert=1,patch_artist=True,labels=
['English','Math','BST','Economics'])
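colors=['cyan','lightblue','lightgreen','tan']  # extra colours assumed, one per box
for patch,color in zip(box['boxes'],colors):
    patch.set_facecolor(color)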
plt.show()
Output:
Creating a histogram using matplotlib.
Code:
import matplotlib.pyplot as plt
ages = [24,55,62,45,11,22,34,42,42,4,99,102,110,120,
121,122,130,111,115,112,80,75,65,54,44,43,42,48]
plt.hist(ages,
bins=[0,10,20,30,40,50,60,70,80,90,100,110,120,130],
edgecolor='red')
plt.xlabel('Ages')
plt.ylabel('No. of people')
plt.title('Population')
plt.show()
Output:
Creating a double bar graph showing
the number of downloads and prices
of apps.
Code:
import matplotlib.pyplot as plt
import numpy as np
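# app names are assumed; the original line is missing
name=['App1','App2','App3','App4','App5','App6','App7','App8','App9']
price=[75,120,190,245,550,55,175,75,140]
dl=[197000,209000,414000,196000,272000,311000,213000,455000,
278000]
w=0.4
bar1=np.arange(len(name))
bar2=bar1+w
dl1=[]
for i in dl:
    k=i/1000   # downloads in thousands, so both bars share one scale
    dl1.append(k)
plt.bar(bar1,price,w,label='price')
plt.bar(bar2,dl1,w,label='downloads')
plt.xticks(bar1+w/2,name)
plt.xlabel("Name Of App")
plt.legend()
plt.show()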
Output:
Creating a horizontal bar chart
using matplotlib.
Code:
import matplotlib.pyplot as plt
height=[5.1,5.5,6.0,5.0,6.3]
Names=('Asma','Bela','Chris','Diya','Saqib')
plt.barh(Names,height)
plt.xlabel("Height")
plt.ylabel("Names")
plt.show()
Output:
MYSQL
MySQL is an open-source relational database management system (RDBMS). Its name is a
combination of "My", the name of co-founder Michael Widenius's daughter, and "SQL", the
abbreviation for Structured Query Language. A relational database organizes data into
one or more data tables in which data types may be related to each other; these relations
help structure the data.
SQL is a language programmers use to create, modify and extract data from the relational
database, as well as control user access to the database. In addition to relational databases
and SQL, an RDBMS like MySQL works with an operating system to implement a relational
database in a computer's storage system, manages users, allows for network access and
facilitates testing database integrity and creation of backups.
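As a minimal sketch of creating, modifying and extracting data with SQL, the example below uses Python's built-in sqlite3 module in place of MySQL, so that it runs without a database server. The table name, columns and rows are hypothetical. With mysql.connector the flow is similar, but the placeholders are written as %s instead of ?.

```python
import sqlite3

con = sqlite3.connect(':memory:')    # temporary in-memory database
cur = con.cursor()

# create: define a table (hypothetical name and columns)
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, "
            "name TEXT, marks INTEGER)")
# insert sample rows using ? placeholders
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, 'Asha', 78), (2, 'Ravi', 64)])
# modify: update one row
cur.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")
con.commit()
# extract: read the rows back
cur.execute("SELECT name, marks FROM student ORDER BY roll_no")
rows = cur.fetchall()
print(rows)
con.close()
```

Each row comes back as a tuple, in the same way as the rows printed by the cursor loops in the programs that follow.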
PROGRAMS
ON
MYSQL
TABLE 1:
Display table EMPLOYEE and SALGRADE;
Code to display both the tables joined on Sgrade
Salgrade.Sgrade;”
Code:
import mysql.connector as mq
mydb=mq.connect(host="localhost",user="root",passwd="spider",
database="school")
mycursor=mydb.cursor()
mycursor.execute("select * from sales;")
for x in mycursor:
    print(x)
Output:
Code to connect to mysql database and print
table Sales using pandas read_sql function
Code:
import mysql.connector as mq
import pandas as pd
mydb=mq.connect(host="localhost",user="root",passwd="spider",
database="school")
mycursor=mydb.cursor()
df1=pd.read_sql("select * from sales;",mydb)
print(df1)
Output:
Code to connect to mysql database and print
table Sales where name like ‘%h%’
Code:
import mysql.connector as mq
import pandas as pd
mydb=mq.connect(host="localhost",user="root",passwd="spider",
database="school")
mycursor=mydb.cursor()
qrystr="select * from sales where name like '%h%';"
df1=pd.read_sql(qrystr,mydb)
print(df1)
Output:
Code to connect to mysql database and count
locationID without any repetitions
Code:
import mysql.connector as mq
import pandas as pd
mydb=mq.connect(host="localhost",user="root",passwd="spider",
database="school")
mycursor=mydb.cursor()
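# table name assumed to be sales; the original query line is missing
df1=pd.read_sql("select count(distinct locationID) from sales;",mydb)
print(df1)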
Output:
Code to connect to mysql database and :
Code:
import mysql.connector as mq
import pandas as pd
mydb=mq.connect(host="localhost",user="root",passwd="spider",
database="school")
mycursor=mydb.cursor()
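# table and column names are assumed; the original statement is truncated
sql="""insert into student(name,age,gender) values(%s,
%s, %s)"""
rows=[
('Paresh',19,'M'),('Ali',17,'M'),('Gargi',17,'F')]
mycursor.executemany(sql, rows)
mydb.commit()
df1=pd.read_sql("select * from student;",mydb)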
print(df1)
mydb.close()
Output:
CSV
A comma-separated values file is a delimited text file that uses a
comma to separate values. Each line of the file is a data record. Each
record consists of one or more fields, separated by commas. The use
of the comma as a field separator is the source of the name for this file
format.
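The record and field structure described above can be seen with a short sketch using Python's built-in csv module. The file contents are invented sample data, held in a string so that no file is needed.

```python
import csv
import io

# Sample file contents, invented for illustration
text = "Name,English,Maths\nRam,70,85\nPam,95,90\n"

reader = csv.reader(io.StringIO(text))   # treat the string as a file
records = list(reader)                   # one list of fields per line

print(records[0])       # first record: the header fields
print(len(records))     # number of records, header included
print(records[1][2])    # third field of the second record
```

Every field is read as text; pandas.read_csv, used in the programs that follow, converts numeric columns automatically.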
PROGRAMS
ON
CSV
Code to print data from csv file
“CSV example.csv”
Code:
import pandas
df=pandas.read_csv("CSV example.csv")
print(df)
Output:
Code to print data from csv file “Marks.csv”
and create columns total and average
Code:
import pandas
df1=pandas.read_csv("Marks.csv")
print(df1)
df1['Total']=df1['English']+df1['Maths']+df1['Science']
df1['Average']=df1['Total']/3
print(df1)
Output:
Code to connect to csv file “EMP.csv” and:
Code:
import pandas
df1=pandas.read_csv("EMP.csv")
print(df1)
df2=pandas.read_csv("EMP.csv",header=None,skiprows=1)
print(df2)
# Print csv table changing column name
df3=pandas.read_csv("EMP.csv",names=["Emp ID","Emp Name","Designation","Salary"],skiprows=1)
print(df3)
df4=pandas.read_csv("EMP.csv")
df5=pandas.read_csv("EMP.csv",nrows=3)
print(df5.tail(1))
Output: