Data frames pandas, handout 1 (1)
Data Frames
Pandas library
1-Pandas library
2-Notepad .csv file
3-Data frames
4-Reading from Excel file and .csv file
5-The writer function
import pandas as pd
df = pd.read_excel('book4.xlsx')
print(df)
print('done')
###################################
For a Notepad file with the .csv extension
Create a Notepad file containing comma-separated data.
When you save it, set the file type to "All Files"
and add the extension .csv to the file name.
import pandas as pd
df = pd.read_csv('data2.csv')
print(df)
print('done')
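The Notepad steps above can also be sketched entirely in Python. This is a minimal sketch, not part of the original handout: the column names and rows are made-up sample data, and the file is written to a temporary folder (an assumption) so that nothing in the working folder is touched.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents: a header line, then one comma-separated row per line.
csv_text = "name,age,city\nAli,23,Basra\nSara,31,Erbil\n"

# Write the text to a .csv file in a temporary folder, then read it back.
csv_path = os.path.join(tempfile.mkdtemp(), "data2.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write(csv_text)

df_csv = pd.read_csv(csv_path)  # the first line becomes the column headers
print(df_csv)
```

Each comma in the file separates one column from the next, and each line becomes one row of the data frame.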
Data Frames
A pandas DataFrame is a data structure made up of rows and columns,
similar to a database table or an Excel spreadsheet. It can be built from a
dictionary of lists in which each list has its own identifier or key, such as
“last name” or “food group.”
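As a small illustration of the dictionary-of-lists idea, the sketch below builds a data frame from two lists. The names and values are hypothetical sample data chosen only to match the keys mentioned above.

```python
import pandas as pd

# Hypothetical data: each key is a column name, each list holds that column's values.
people = {
    "last name": ["Smith", "Jones", "Lee"],
    "food group": ["fruit", "dairy", "grain"],
}

df_people = pd.DataFrame(people)
print(df_people)

# A single column is selected by its key, just like a dictionary lookup.
print(df_people["food group"])
```

All the lists must have the same length, because each position across the lists forms one row.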
-------------
Example
d = {
    "Duration": {
        "0": 60,
        "1": 60,
        "2": 60,
        "3": 45,
    },
    "Maxpulse": {
        "0": 130,
        "1": 145,
        "2": 135,
        "3": 175,
    },
    "Calories": {
        "0": 409.1,
        "1": 479.0,
        "2": 340.0,
        "3": 282.4,
    },
}
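import json
import pandas as pd

# Save the dictionary as dd.json so that read_json() has a file to read
with open('dd.json', 'w') as f:
    json.dump(d, f)
df = pd.read_json('dd.json')
print(df.to_string())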
Output
   Duration  Maxpulse  Calories
0        60       130     409.1
1        60       145     479.0
2        60       135     340.0
3        45       175     282.4
Method        Usage
setdefault()  Returns the value of the specified key. If the key does not exist, inserts the key with the specified value
items()       Returns a view containing a tuple for each key-value pair
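A short sketch of the two dictionary methods in the table. The dictionary `row` and its values are hypothetical and used only for illustration.

```python
# Hypothetical dictionary describing one course.
row = {"Courses": "Spark", "Fee": 22000}

# The key "Fee" already exists, so setdefault() returns its value and changes nothing.
fee = row.setdefault("Fee", 0)

# The key "Discount" does not exist, so setdefault() inserts it with the given value.
discount = row.setdefault("Discount", 1000)

# items() yields one (key, value) tuple per entry; list() makes them easy to print.
pairs = list(row.items())

print(fee)
print(discount)
print(pairs)
```

Looping with `for key, value in row.items():` is a common way to visit every key together with its value.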
'Leave':['April,20,Aprile,26']
}
Example: generate a data frame column by column
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
column_headers = df.columns.values.tolist()
print("The Column Header :", column_headers)
Output
The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']
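The column headers can be collected in more than one way. The sketch below, which uses a small hypothetical data frame, compares three spellings that should all give the same list.

```python
import pandas as pd

# Hypothetical one-row data frame, used only to have some column headers.
df_cols = pd.DataFrame({"Courses": ["Spark"], "Fee": [22000]})

headers_a = df_cols.columns.values.tolist()  # via the underlying NumPy array
headers_b = df_cols.columns.tolist()         # directly from the column index
headers_c = list(df_cols)                    # iterating a data frame yields its column labels

print(headers_a)
print(headers_b)
print(headers_c)
```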
Output
   Courses    Fee Duration
0    Spark  20000   35days
1  PySpark  20000   35days
2     Java  15000   40days
3      PHP  10000   30days
   a  b  c
0  1  2  3
1  2  5  7
2  3  8  3
   a  b  c
0  1  2  3
1  2  5  7
Done
   a  b  c
0  1  5  3
1  1  6  8
2  2  3  7
3  3  1  8
   a  b  c
0  1  5  3
1  1  6  8
3  3  1  8
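import pandas as pd
import openpyxl  # engine pandas uses to write .xlsx files
# Build the data frame shown in the output below
df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32, 33]],
                  index=['one', 'two', 'three'], columns=['a', 'b', 'c'])
print(df)
#         a   b   c
# one    11  21  31
# two    12  22  32
# three  31  32  33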
You can specify a path as the first argument of the to_excel() method.
Note that if a file already exists at that path, it is overwritten and its original data is lost.
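import pandas as pd
import openpyxl  # engine pandas uses to write .xlsx files
# df is the data frame printed in the previous example
print(df)
# Writing to Excel
df.to_excel('pandas_to_excel.xlsx', sheet_name='new_sheet_name')
# To leave out the row index and the column headers:
# df.to_excel('xxx_no_index_header.xlsx', index=False, header=False)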