
7th class of CSV and DataFrame

The document provides a comprehensive guide on importing and exporting data between CSV files and DataFrames using Python's Pandas library. It explains the structure and advantages of CSV files, demonstrates how to read from and write to CSV files, and discusses various parameters for customizing the read and write operations. Additionally, it covers handling missing values and saving DataFrames without headers and indices.

Uploaded by

Sanjay Kumar
Copyright
© All Rights Reserved

Importing/Exporting Data between CSV files and Data Frames

What is a .CSV file?


CSV stands for comma-separated values. A CSV file stores tabular data as plain text,
with the values in each row separated by commas. CSV files can be opened with almost
any spreadsheet program, such as Microsoft Excel or Google Sheets. They differ from
other spreadsheet file types in that a file can hold only a single sheet and cannot
save cell, column, or row formatting. Formulas cannot be saved in this format either.
Here is an example of the CSV format:

Roll No.    Name     Marks
101         Ruby     87.5
102         Richa    92.5

When converted to CSV format it appears as:

Roll No., Name, Marks
101, Ruby, 87.5
102, Richa, 92.5


As you can see, in CSV format:
i. Each row of the table is stored on one line, i.e., the number of lines in a CSV file
is equal to the number of rows in the table.
ii. The field values of a row are stored together, with a comma after each field value;
after the last field's value in a line, no comma is given, just the end of the line.
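The two rules above can be checked with a short sketch. It assumes pandas is installed, and the variable names are only illustrative. Calling to_csv( ) without a file path returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# The small table from the example above (variable names are illustrative).
table = pd.DataFrame({
    "Roll No.": [101, 102],
    "Name": ["Ruby", "Richa"],
    "Marks": [87.5, 92.5],
})

# With no path given, to_csv() returns the CSV text as a string.
csv_text = table.to_csv(index=False)
lines = csv_text.splitlines()

# One heading line plus one line per table row; no line ends with a comma.
for line in lines:
    print(line)
```

Note that pandas writes no space after each comma; the spaces in the listing above are only for readability.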

The CSV format is popular because it offers the following advantages:

• A simple, compact and universal format for data storage.
• A common format for data interchange.
• It can be opened in popular spreadsheet packages like MS Excel, Calc etc.
• Nearly all spreadsheets and databases support import/export to CSV format.

CSV files also serve other primary business functions:
i. CSV files are plain-text files, making them easier for website developers to
create.
ii. Since they are plain text, they are easier to import into a spreadsheet or another
storage database, regardless of the specific software you are using.
iii. They help to better organize large amounts of data.

Loading Data from CSV to DataFrames:


The Python Pandas library offers two functions, read_csv( ) and to_csv( ),
that help us bring data from a CSV file into a DataFrame and write a DataFrame's data
to a CSV file.
We can create a CSV file by saving the data of an MS Excel file in CSV format, using the
Save As command from the File tab/menu and selecting CSV as the Save As Type.
For example, given below are MS Excel data and the same data in CSV file format.

Reading From a CSV File to DataFrame:


The read_csv( ) function is used to read data from a CSV file into a DataFrame.
The syntax to read data from a CSV file is:

<DataFrame> = pandas.read_csv(<path of the file>)


For example:
import pandas as pd
# pandas is imported as pd, so the function is called as pd.read_csv()
df1 = pd.read_csv("d:\\python_programs\\students.csv")
print(df1)
   Roll    Name  Accounts  BS  Eco  Eng  IP
0     1   Raman        30  20   40   50  45
1     2   Suman        40  25   50   60  55
2     3  Simran        20  26   60   40  65
3     4  Chetan        50  35   70   60  47
4     5    Ravi        45  45   40   70  65
5     6  Shruti        35  55   50   90  35
6     7  Shikha        50  65   55   80  50
7     8   Rohan        60  35   65   50  60

Contents of the CSV file:

Roll,Name,Accounts,BS,Eco,Eng,IP
1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
4,Chetan,50,35,70,60,47
5,Ravi,45,45,40,70,65
6,Shruti,35,55,50,90,35
7,Shikha,50,65,55,80,50
8,Rohan,60,35,65,50,60

When we compare the DataFrame with the CSV file, we see that the DataFrame has taken
the first row as the column names.
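The same behaviour can be tried without a file on disk. This is a minimal sketch, assuming pandas is installed: io.StringIO stands in for students.csv, and only the first three data rows are used.

```python
import io
import pandas as pd

# Stand-in for students.csv: the same text held in memory (first three rows only).
csv_text = """Roll,Name,Accounts,BS,Eco,Eng,IP
1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
3,Simran,20,26,60,40,65
"""

df1 = pd.read_csv(io.StringIO(csv_text))

print(list(df1.columns))  # column names come from the first line
print(df1.shape)          # (rows, columns); the heading line is not counted as a row
```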

(Note: while giving file paths, an error is sometimes reported even though we have
given the right path. This happens because a backslash can start an escape sequence
inside a Python string. To avoid this error we may write file paths with double
backslashes, i.e. in place of
d:\python_programs\students.csv
we can write
d:\\python_programs\\students.csv)
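The note above can be illustrated with plain Python strings. This sketch only compares the spellings of the path and does not open any file. The raw-string and forward-slash forms are common alternatives to doubling the backslashes.

```python
# Three ways to write the same Windows path in Python source code.
doubled = "d:\\python_programs\\students.csv"   # every backslash doubled
raw = r"d:\python_programs\students.csv"        # raw string: backslashes kept as typed
forward = "d:/python_programs/students.csv"     # forward slashes, also accepted on Windows

print(doubled == raw)  # both spell the same text
print(raw)
```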

Different arguments of the read_csv( ) function are:


Reading a CSV File and Specifying Our Own Column Names:
We may have a CSV file that does not have a top row containing column headers.

When we read such a CSV file by giving only the file path, read_csv( ) takes the first
row, which is itself data, as the column header:
For example:
import pandas as pd
df1 = pd.read_csv("d:\\python_programs\\students.csv")
print(df1)
1 Raman 30 20 40 50 45
0 2 Suman 40 25 50 60 55
1 3 Simran 20 26 60 40 65
2 4 Chetan 50 35 70 60 47
3 5 Ravi 45 45 40 70 65
4 6 Shruti 35 55 50 90 35
5 7 Shikha 50 65 55 80 50
6 8 Rohan 60 35 65 50 60
names
It is used to specify our own column headings in the DataFrame.
In this case we can specify our own column headings in read_csv( ) using the names
argument, as per the syntax given below:
<DataFrame> = pandas.read_csv(<path of the file>, names=<sequence containing
column names>)
For example:
import pandas as pd
df1 = pd.read_csv("d:\\python_programs\\students.csv",
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60

header
It is used to tell read_csv( ) which row holds the column headings. With header=None
the file is treated as having no heading row, and the DataFrame gets the default
headings 0, 1, 2, 3, ...
If we do not want the first row to be used as the header and, at the same time, do not
want to specify column headings but would rather go with the default column headings
0, 1, 2, 3, ..., then we simply give the argument header=None in read_csv( ) as given below:
<DataFrame> = pandas.read_csv(<path of the file>, header=None)
import pandas as pd
df1 = pd.read_csv("d:\\python_programs\\students.csv",
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60

df2 = pd.read_csv("d:\\python_programs\\students.csv", header=None)
print(df2)
0 1 2 3 4 5 6
0 1 Raman 30 20 40 50 45
1 2 Suman 40 25 50 60 55
2 3 Simran 20 26 60 40 65
3 4 Chetan 50 35 70 60 47
4 5 Ravi 45 45 40 70 65
5 6 Shruti 35 55 50 90 35
6 7 Shikha 50 65 55 80 50
7 8 Rohan 60 35 65 50 60
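The difference between the default behaviour, names and header=None can be compared side by side. This is a minimal sketch, assuming pandas is installed: io.StringIO stands in for a students.csv without a heading row, and only three rows are used.

```python
import io
import pandas as pd

# Stand-in for a students.csv that has no heading row (first three rows only).
csv_text = "1,Raman,30,20,40,50,45\n2,Suman,40,25,50,60,55\n3,Simran,20,26,60,40,65\n"

# Default: the first data row is consumed as the header, so one row is lost.
default_df = pd.read_csv(io.StringIO(csv_text))

# names=...: our own headings, and every line is kept as data.
named_df = pd.read_csv(io.StringIO(csv_text),
                       names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"])

# header=None: default integer headings 0, 1, 2, ...
plain_df = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(default_df), len(named_df), len(plain_df))
print(list(plain_df.columns))
```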

skiprows
It is used to skip the specified number of rows from the top.
When a CSV file contains column headings in its first row but we do not want
to use them and want to show our own column headings instead, we give
two arguments: names=<column heading sequence> and
skiprows=<n> (here n is the number of rows we want to skip from the CSV
file).
Syntax is:
<DataFrame> = pandas.read_csv(<path of the file>, names=<column heading sequence>,
skiprows=<n>)
For example:
df1 = pd.read_csv("d:\\python_programs\\students.csv",
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"],
                  skiprows=1)
print(df1)
   Roll   Names  AC  BS  ECO  ENG  IP
0     2   Suman  40  25   50   60  55
1     3  Simran  20  26   60   40  65
2     4  Chetan  50  35   70   60  47
3     5    Ravi  45  45   40   70  65
4     6  Shruti  35  55   50   90  35
5     7  Shikha  50  65   55   80  50
6     8   Rohan  60  35   65   50  60

Here we can see that the first row is skipped while reading data from the CSV file.
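The case described in the text, a file that has its own heading row which we want to replace, can be sketched as follows. It assumes pandas is installed, and io.StringIO stands in for the file. Here skiprows=1 removes the file's heading line, so no data row is lost. If the file has no heading row, as in the output above, the same argument skips the first data row instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file whose first line is a heading row we do not want.
csv_text = """Roll,Name,Accounts,BS,Eco,Eng,IP
1,Raman,30,20,40,50,45
2,Suman,40,25,50,60,55
"""

# skiprows=1 drops the file's own heading line; names supplies the new headings.
df1 = pd.read_csv(io.StringIO(csv_text),
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"],
                  skiprows=1)
print(df1)
```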

nrows
It is used to read a specified number of rows from a CSV file.

Giving the argument nrows=<n> in read_csv( ) reads only the specified number of rows
from the CSV file.
For example:
df1 = pd.read_csv("d:\\python_programs\\students.csv",
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"],
                  nrows=4)
print(df1)
   Roll   Names  AC  BS  ECO  ENG  IP
0     1   Raman  30  20   40   50  45
1     2   Suman  40  25   50   60  55
2     3  Simran  20  26   60   40  65
3     4  Chetan  50  35   70   60  47

Because of nrows=4, only 4 rows have been read from the CSV file.
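A self-contained version of the nrows example is sketched below. It assumes pandas is installed. The eight rows are generated placeholder data, not the real students.csv, and io.StringIO stands in for the file.

```python
import io
import pandas as pd

# Eight placeholder rows (generated data) with no heading row.
csv_text = "\n".join(f"{n},Student{n},{n * 10}" for n in range(1, 9)) + "\n"

# nrows=4 stops reading after the first four data rows.
first_four = pd.read_csv(io.StringIO(csv_text),
                         names=["Roll", "Name", "Marks"], nrows=4)
print(first_four)
```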

sep=<separator character>
It is used to specify our own separator character; the default separator
is a comma.
Syntax is:
<DataFrame> = pandas.read_csv(<path of the file>, sep=<character to be used as
separator>, names=<column heading sequence>, skiprows=<n>)
For example, reading the comma-separated students.csv with sep=';' finds no
semicolons, so each line is read as a single value:
df1 = pd.read_csv("d:\\python_programs\\students.csv", sep=';',
                  names=["Roll", "Names", "AC", "BS", "ECO", "ENG", "IP"])
print(df1)
Roll Names AC BS ECO ENG IP
0 1,Raman,30,20,40,50,45
1 2,Suman,40,25,50,60,55
2 3,Simran,20,26,60,40,65
3 4,Chetan,50,35,70,60,47
4 5,Ravi,45,45,40,70,65
5 6,Shruti,35,55,50,90,35
6 7,Shikha,50,65,55,80,50
7 8,Rohan,60,35,65,50,60
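The effect of a matching and a non-matching separator can be compared in a short sketch. It assumes pandas is installed. The semicolon-separated text and the three column names are illustrative and are not taken from students.csv.

```python
import io
import pandas as pd

# Illustrative data that uses a semicolon as the separator.
semicolon_text = "1;Raman;30\n2;Suman;40\n"
cols = ["Roll", "Name", "Marks"]

# Matching separator: the three fields are split correctly.
good = pd.read_csv(io.StringIO(semicolon_text), sep=";", names=cols)

# Wrong separator (the default comma): each whole line lands in the first
# column and the remaining columns are filled with NaN.
bad = pd.read_csv(io.StringIO(semicolon_text), names=cols)

print(good)
print(bad)
```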

Storing pandas DataFrame Data to a CSV File


To work with a CSV file using pandas you need to follow these steps:
1. First, import the pandas module (and numpy, if you need it, for example for NaN values).
2. Create the DataFrame for your data.
3. Call to_csv( ) on your DataFrame to write its data in
CSV file format.
4. By default the CSV file is saved in the current working directory, but to_csv( )
can also save it at a specified location.
Syntax is:
# save the result at the given path (for example, on the desktop)
df.to_csv(r'Path\filename.csv', index=False, mode=<mode>, header=<True/False>)  # writes to a CSV file
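File Mode:
With mode, we specify how the file is opened: for reading (r), writing (w) or appending (a).
These are Python's general file-opening modes, which can also select text mode or
binary mode; when a file is opened with Python's open( ), the default is reading in
text mode. Since to_csv( ) writes data, it is normally used with w (its default, which
overwrites an existing file) or a (which adds rows to the end of an existing file).
Below are some of the most commonly used modes for
opening or creating a file.
• r : opens a text file in reading mode.
• w : opens or creates a text file in writing mode.
• a : opens a text file in append mode.
• r+ : opens a text file in both reading and writing mode.
• w+ : opens or creates a text file in both reading and writing mode, erasing any existing content.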
For Example:
import pandas as pd
import numpy as np   # used later to assign NaN values
roll = [1, 2, 3, 4, 5]
name = ['Suhani', 'Vandana', 'Ramdev', 'Subhash', 'Ranjana']
city = ['rajkot', 'mehsana', 'junavadh', 'jamnagar', 'Surat']
subject = ["Science", "Math", "SST", "Hindi", "English"]
marks = [440, 350, 410, 355, 295]
student = {"Roll": roll, "Name": name, "City": city, "Subject": subject, "Marks": marks}
stdf1 = pd.DataFrame(student)
print(stdf1)
# Storing data of DataFrame to sreport.csv (every backslash in the path is doubled)
stdf1.to_csv("d:\\python_programs\\csv1\\sreport.csv")
NOTE:
• Place the r character before the path name (or double every backslash) so that
symbols within the path name, such as the frequently used backslash, are taken
literally. If you do not, Python may treat a backslash and the character after it
as an escape sequence and report a (unicode error).
• filename.csv represents the name of the file you want to create. You can type your own
file name if you like.
• The .csv extension represents the file type, which is CSV (Comma Separated Values).

Saving csv files without headers and index


#save the data to a csv_file without the headers and index:
Syntax is:

df.to_csv('filename.csv', header=False, index=False)


For Example:
stdf1.to_csv("d:\\python_programs\\csv1\\sreport1.csv", header=False, index=False)

Once you run the Python code, the CSV file will be saved at your specified location.
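What header=False and index=False remove can be seen by comparing the CSV text with and without them. This is a minimal sketch, assuming pandas is installed. The two-row DataFrame is illustrative, and to_csv( ) is called without a path so that it returns the text instead of writing a file.

```python
import pandas as pd

stdf = pd.DataFrame({"Roll": [1, 2], "Name": ["Suhani", "Vandana"]})

# Default: a heading line and the index column are both written.
with_both = stdf.to_csv().splitlines()

# header=False drops the heading line; index=False drops the index column.
data_only = stdf.to_csv(header=False, index=False).splitlines()

print(with_both)
print(data_only)
```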

Handling NaN Values with to_csv( )

When we store a DataFrame with NaN values, they are stored as empty strings in the CSV file.
For example:
In the above DataFrame stdf1, we assign NaN to the subject of Ramdev and the city of Ranjana and Suhani, as
given below:
stdf1.loc[2, "Subject"] = np.nan
stdf1.loc[4, "City"] = np.nan
stdf1.loc[0, "City"] = np.nan
print(stdf1)
Then the output is:
Roll Name City Subject Marks
0 1 Suhani NaN Science 440
1 2 Vandana mehsana Math 350
2 3 Ramdev junavadh NaN 410
3 4 Subhash jamnagar Hindi 355
4 5 Ranjana NaN English 295
In this case we can specify our own string to be written for missing/NaN values by giving the
argument
na_rep=<string value>
This string value will be written in place of the missing values, for example:
na_rep="unknown values"

stdf1.to_csv("d:\\python_programs\\csv1\\sreport2.csv", header=False, index=False, na_rep="not known")
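The effect of na_rep can be checked without writing a file. This is a minimal sketch, assuming pandas and numpy are installed. The two-row DataFrame is a cut-down, illustrative version of stdf1, and to_csv( ) is called without a path so that it returns the CSV text.

```python
import numpy as np
import pandas as pd

# Cut-down, illustrative version of stdf1 with two missing values.
stdf = pd.DataFrame({
    "Name": ["Suhani", "Ramdev"],
    "City": [np.nan, "junavadh"],
    "Subject": ["Science", np.nan],
})

# Default: NaN is written as an empty string between the commas.
default_lines = stdf.to_csv(index=False).splitlines()

# na_rep: the given string is written wherever a value is missing.
filled_lines = stdf.to_csv(index=False, na_rep="not known").splitlines()

print(default_lines)
print(filled_lines)
```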

Question:
Write a program in Python that first accepts the number of employees from the user,
then asks the user to enter the name, department and basic salary of each employee.
Calculate da, which is 50% of basic; hra, which is 25% of basic; and total, which is the sum
of basic, da and hra; then calculate income tax, which is 10% of total. Store these
values in emp.csv, which is stored inside the csv1 subdirectory on the d drive.
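One possible way to approach the calculation part of this question is sketched below. It is not a complete solution. It assumes pandas is installed, uses two made-up sample records in place of values collected with input( ), and writes emp.csv to a temporary folder instead of the csv1 subdirectory so that it runs on any machine. The column names are also only illustrative.

```python
import os
import tempfile
import pandas as pd

# Made-up sample records stand in for values that would be collected with input().
employees = [
    {"Name": "Asha", "Department": "Sales", "Basic": 20000.0},
    {"Name": "Vikram", "Department": "Accounts", "Basic": 30000.0},
]

emp = pd.DataFrame(employees)
emp["DA"] = emp["Basic"] * 0.50                       # da is 50% of basic
emp["HRA"] = emp["Basic"] * 0.25                      # hra is 25% of basic
emp["Total"] = emp["Basic"] + emp["DA"] + emp["HRA"]  # total = basic + da + hra
emp["Income Tax"] = emp["Total"] * 0.10               # income tax is 10% of total

# A temporary folder stands in for the csv1 subdirectory on the d drive.
path = os.path.join(tempfile.mkdtemp(), "emp.csv")
emp.to_csv(path, index=False)
print(emp)
```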
