7th class of CSV and DataFrame
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60
When we compare the DataFrame with the CSV file, we see that the DataFrame has taken
the first row as column names.
(Note: when giving file paths, Python sometimes reports an error even though the path is
correct. This happens because a backslash inside an ordinary string starts an escape
sequence. To avoid the error, we may write the path with double backslashes,
i.e. in place of
d:\python_programs\students.csv
we can write
d:\\python_programs\\students.csv)
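The effect of the backslash can be checked without any file on disk. The short sketch below compares the doubled-backslash form with a raw string (the r prefix), and uses a made-up path, d:\new_data\table.csv, only to show what goes wrong when \n and \t are read as escape sequences.

```python
# Two equivalent ways to write the same Windows path safely
escaped = "d:\\python_programs\\students.csv"   # doubled backslashes
raw = r"d:\python_programs\students.csv"        # raw string, backslashes kept as typed

same_path = (escaped == raw)
print(same_path)

# Hypothetical path: here "\n" turns into a newline and "\t" into a tab
broken = "d:\new_data\table.csv"
has_newline = "\n" in broken
has_tab = "\t" in broken
print(has_newline, has_tab)
```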
When we read such a CSV file (one without a header row) by giving the file path, read_csv( )
takes the first row as the column headings, even though that row is itself data:
For example:
import pandas as pd
df1=pd.read_csv("d:\\python_programs\\students.csv")
print(df1)
1 Raman 30 20 40 50 45
0 2 Suman 40 25 50 60 55
1 3 Simran 20 26 60 40 65
2 4 Chetan 50 35 70 60 47
3 5 Ravi 45 45 40 70 65
4 6 Shruti 35 55 50 90 35
5 7 Shikha 50 65 55 80 50
6 8 Rohan 60 35 65 50 60
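The same behaviour can be reproduced without a file on disk. The sketch below is only an illustration: it assumes the file has no header row and uses io.StringIO with three of the rows above as a stand-in for students.csv.

```python
import io
import pandas as pd

# In-memory stand-in for students.csv (assumption: no header row in the file)
csv_text = """1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
"""

df1 = pd.read_csv(io.StringIO(csv_text))

# The first data row has been used up as the column headings
column_labels = list(df1.columns)
row_count = len(df1)
print(column_labels)
print(row_count)
```

Three lines were given, but only two remain as data, because the first line became the heading.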
names
It is used to specify our own column headings in the DataFrame.
In this case we can specify our own column headings in read_csv( ) using the names argument,
as per the syntax given below:
<DataFrame> = pandas.read_csv(<path of the file>, names=<sequence containing column names>)
For example:
import pandas as pd
df1=pd.read_csv("d:\\python_programs\\students.csv",names=["Roll","Names","AC","BS","ECO","ENG","IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60
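A minimal, self-contained sketch of the names argument is given here. It assumes a headerless file and replaces students.csv with an in-memory string so that it can be run anywhere.

```python
import io
import pandas as pd

# In-memory stand-in for a headerless students.csv
csv_text = """1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
"""

headings = ["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"]
df1 = pd.read_csv(io.StringIO(csv_text), names=headings)

# Every line of the file is now kept as data
print(df1)
row_count = len(df1)
first_name = df1.loc[0, "Names"]
```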
header
It is used to get the default column headings 0, 1, 2, 3, ... in the DataFrame.
If we do not want the first row to be used as the header and, at the same time, do not want
to specify our own column headings but prefer the default headings 0, 1, 2, 3, ...,
then we simply give the argument header=None in read_csv( ) as given below:
<dataframe>=pandas.read_csv("path of file",header=None)
import pandas as pd
df1=pd.read_csv("d:\\python_programs\\students.csv",names=["Roll","Names","AC","BS","ECO","ENG","IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60
df2=pd.read_csv("d:\\python_programs\\students.csv", header=None)
print(df2)
0 1 2 3 4 5 6
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60
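The default headings produced by header=None are integers, not strings. This can be confirmed with a small sketch that again uses an in-memory string in place of the file (an assumption made only for illustration).

```python
import io
import pandas as pd

# In-memory stand-in for a headerless students.csv
csv_text = """1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
"""

df2 = pd.read_csv(io.StringIO(csv_text), header=None)

# Column headings are the integers 0 to 6
column_labels = list(df2.columns)
print(column_labels)

# A column is therefore selected with an integer label
names_column = list(df2[1])
print(names_column)
```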
skiprows
It is used to skip the specified number of rows from the top.
When the CSV file contains column headings as its first row but we do not want to use
them and want to show our own column headings instead, we give two arguments:
names=<column heading sequence> and skiprows=<n> (here n is the number of rows
which we want to skip from the CSV file).
Syntax is:
<dataframe>=pandas.read_csv("file path", names=<column heading sequence>, skiprows=<n>)
For example:
df1=pd.read_csv("d:\\python_programs\\students.csv",names=["Roll","Names","AC","BS","ECO","ENG","IP"],skiprows=1)
print(df1)
Roll Names AC BS ECO ENG IP
0 2 Suman 40 25 50 60 55
1 3 Simran 20 26 60 40 65
2 4 Chetan 50 35 70 60 47
3 5 Ravi 45 45 40 70 65
4 6 Shruti 35 55 50 90 35
5 7 Shikha 50 65 55 80 50
6 8 Rohan 60 35 65 50 60
Here we can see that the first row is skipped while reading data from the CSV file.
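The typical use of skiprows together with names is to replace headings that already exist in the file. The sketch below assumes a file whose first line holds unwanted headings; the heading text (R,N,S1,...) is made up for illustration.

```python
import io
import pandas as pd

# Stand-in file whose first line is a heading row we do not want (hypothetical headings)
csv_text = """R,N,S1,S2,S3,S4,S5
1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
"""

headings = ["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"]

# skiprows=1 drops the old heading line, names supplies the new headings
df1 = pd.read_csv(io.StringIO(csv_text), names=headings, skiprows=1)
print(df1)
row_count = len(df1)
first_roll = int(df1.loc[0, "Roll"])
```

Without skiprows=1, the old heading line would appear as the first row of data.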
nrows
It is used to read a specified number of rows from the CSV file.
Giving the argument nrows=<n> in read_csv( ) reads only the first n rows of data from the
CSV file.
For example:
df1=pd.read_csv("d:\\python_programs\\students.csv",names=["Roll","Names","AC","BS","ECO","ENG","IP"],nrows=4)
print(df1)
Roll Names AC BS ECO ENG IP
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
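A small runnable sketch of nrows follows, using an in-memory string in place of students.csv (an assumption made so that no file is needed).

```python
import io
import pandas as pd

# In-memory stand-in for a headerless students.csv
csv_text = """1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
4,Chetan,50,35,70,60,47
"""

headings = ["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"]

# Only the first two rows of data are read
df1 = pd.read_csv(io.StringIO(csv_text), names=headings, nrows=2)
print(df1)
names_read = list(df1["Names"])
```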
sep=<separator character>
It is used to specify our own separator character, which
is a comma by default.
Syntax is:
<dataframe>=pandas.read_csv("file path", sep=<character to be used as separator>, names=<column heading sequence>, skiprows=<n>)
df1=pd.read_csv("d:\\python_programs\\students.csv",sep=';',names=["Roll","Names","AC","BS","ECO","ENG","IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1,Raman,30,20,40,50,45
1 2,Suman,40,25,50,60,55
2 3,Simran,20,26,60,40,65
3 4,Chetan,50,35,70,60,47
4 5,Ravi,45,45,40,70,65
5 6,Shruti,35,55,50,90,35
6 7,Shikha,50,65,55,80,50
7 8,Rohan,60,35,65,50,60
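In the output above, each whole line lands in the first column because the file is comma-separated while ';' was given as the separator. The sketch below contrasts the two cases; the semicolon-separated text is an assumed example, not a file from these notes.

```python
import io
import pandas as pd

headings = ["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"]

# Case 1: the data really uses ';' so sep=';' splits it into seven columns
semicolon_text = "1;Raman;30;20;40;50;45\n2;Suman;40;25;50;60;55\n"
df_ok = pd.read_csv(io.StringIO(semicolon_text), sep=";", names=headings)

# Case 2: the data uses ',' but sep=';' is given, so nothing is split
comma_text = "1,Raman,30,20,40,50,45\n2,Suman,40,25,50,60,55\n"
df_wrong = pd.read_csv(io.StringIO(comma_text), sep=";", names=headings)

print(df_ok)
print(df_wrong)   # whole line sits in "Roll", the other columns are NaN
```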
For Example:
import pandas as pd
import numpy as np
roll=[1,2,3,4,5]
name=['Suhani','Vandana','Ramdev','Subhash',"Ranjana"]
city=['rajkot','mehsana','junavadh','jamnagar','Surat']
subject=["Science","Math","SST","Hindi","English"]
marks=[440,350,410,355,295]
student={"Roll":roll,"Name":name,"City":city,"Subject":subject,"Marks":marks}
stdf1=pd.DataFrame(student)
print(stdf1)
#Storing data of DataFrame to sreport.csv
stdf1.to_csv("d:\\python_programs\\csv1\\sreport.csv")
NOTE:
You can place the r character before the path string (making it a raw string) to take
care of any symbols within the path name, such as the frequently used backslash.
Alternatively, double every backslash. Without either, Python may report a
(unicode error), for example when the path contains \U as in C:\Users.
The filename.csv represents the name of the file you want to create. You can type your own
file name if you like.
The (.csv) represents the file type, which is CSV (Comma Separated Values).
Once you run the Python code, the CSV file will be saved at your specified location.
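To see exactly what to_csv( ) writes, the sketch below saves a small DataFrame and reads the file back as plain text. It writes into a temporary folder created with the standard tempfile module instead of the d: drive, which is an assumption made so that the code runs on any machine.

```python
import os
import tempfile
import pandas as pd

stdf = pd.DataFrame({"Roll": [1, 2],
                     "Name": ["Suhani", "Vandana"],
                     "Marks": [440, 350]})

# Temporary folder used in place of d:\python_programs\csv1
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "sreport.csv")
    stdf.to_csv(target)                 # the index is written as the first column
    with open(target) as f:
        saved_lines = f.read().splitlines()

for line in saved_lines:
    print(line)
```

The first line of the file starts with a comma because the index column has no heading of its own.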
When we store a DataFrame with NaN values, they are stored as empty strings in the CSV file.
For example:
In the above DataFrame stdf1, we assign NaN to the subject of Ramdev and the city of Ranjana and Suhani as
given below:
stdf1.loc[2,"Subject"]=np.nan
stdf1.loc[4,"City"]=np.nan
stdf1.loc[0,"City"]=np.nan
print(stdf1)
Then the output is:
Roll Name City Subject Marks
0 1 Suhani NaN Science 440
1 2 Vandana mehsana Math 350
2 3 Ramdev junavadh NaN 410
3 4 Subhash jamnagar Hindi 355
4 5 Ranjana NaN English 295
In this case we can specify our own string to be written for missing/NaN values by giving the
argument
na_rep=<string value>
This string value will be written in place of missing values. For example:
na_rep="unknown values"
stdf1.to_csv("d:\\python_programs\\csv1\\sreport2.csv",header=False,index=False,na_rep="not known")
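The effect of na_rep can be compared side by side without writing a file: when no path is given, to_csv( ) returns the CSV text as a string. The sketch below uses a shortened three-row version of the student data, chosen only for illustration.

```python
import numpy as np
import pandas as pd

stdf = pd.DataFrame({"Roll": [1, 2, 3],
                     "Name": ["Suhani", "Vandana", "Ramdev"],
                     "City": ["rajkot", "mehsana", "junavadh"]})
stdf.loc[0, "City"] = np.nan            # city of Suhani is now missing

# With no path, to_csv() returns the text instead of saving it
default_lines = stdf.to_csv(index=False).splitlines()
filled_lines = stdf.to_csv(index=False, na_rep="not known").splitlines()

print(default_lines[1])   # missing value written as an empty string
print(filled_lines[1])    # missing value written as "not known"
```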
Question:
Write a program in Python to first accept the number of employees from the user,
then ask the user to enter the name, department and basic salary of each employee.
Calculate da which is 50% of basic, hra which is 25% of basic, and total which is the sum
of basic, da and hra, then calculate income tax which is 10% of total. Store these
values in emp.csv, which is stored inside the csv1 subdirectory on the d drive.
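One possible outline of the calculation part is sketched below. It is not a complete answer: the input( ) loop is left out, the helper name build_salary_frame and the sample employees are made up, and the final to_csv( ) call is left as a comment because it needs the d:\csv1 folder to exist.

```python
import pandas as pd

def build_salary_frame(records):
    """records: list of (name, department, basic) tuples, e.g. collected with input()."""
    emp = pd.DataFrame(records, columns=["Name", "Department", "Basic"])
    emp["DA"] = emp["Basic"] * 0.50                       # da is 50% of basic
    emp["HRA"] = emp["Basic"] * 0.25                      # hra is 25% of basic
    emp["Total"] = emp["Basic"] + emp["DA"] + emp["HRA"]  # sum of basic, da, hra
    emp["Tax"] = emp["Total"] * 0.10                      # income tax is 10% of total
    return emp

# Hypothetical sample data standing in for values typed by the user
sample = [("Asha", "Sales", 20000), ("Vikram", "IT", 30000)]
emp_df = build_salary_frame(sample)
print(emp_df)

# To store the result as asked in the question (folder must already exist):
# emp_df.to_csv("d:\\csv1\\emp.csv", index=False)
```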