0% found this document useful (0 votes)
3 views

Data frames pandas, handout 1 (1)

Uploaded by

ayaqassas21
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Data frames pandas, handout 1 (1)

Uploaded by

ayaqassas21
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

1

Data Frames
Pandas library
1-Pandas library
2-Notepade.csv
3-Data frames
4-Reading from excel file and .csv file
5-The writer function

Reading data from excel file


You have to create this excel file, then read it and display
it in the output using python

Book4.xlsx excel file


a b c
0 1 5 3
1 1 6 8
2 2 3 7
3 3 1 8

Code to read the file, from excel

Page 1
2

You need to import some libraries that are


shown below. To install these libraries use
the pip3
#use pip3 in command window to install
# 1- pandas 2- openpyxl
# pip3 install pandas
# pip3 install openyxl
Example

Example Write the script


below to read

For excel file

import pandas as pd
df = pd.read_excel('book4.xlsx')
print(df)
print('done')

Page 2
3

###################################
For notepad with .csv extension
Create notepad commas separated data
as shown below.
When you save it by going to all
files, then add the extension .csc

Example The code to read notepad.csv

import pandas as pd
df = pd.read_csv('data2.csv')
print(df)
print('done')

Page 3
4

Data Frames
Python dataframe is a data structure constructed with rows and columns,
similar to a database or Excel spreadsheet. It consists of a dictionary of lists in
which the list each have their own identifiers or keys, such as “last name” or
“food group.”

—------------
Example
d={"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,

},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
},
"Calories":{
"0":409.1,
"1":479.0,
"2":340.0,
"3":282.4,

}
}
import json

with open("dd.json", 'w') as fout:


json_dumps_str = json.dumps(d, indent=4)
print(json_dumps_str, file=fout)

import pandas as pd
df = pd.read_json('dd.json')

Page 4
5

print(df.to_string())

Output
Duration Maxpulse Calories
0 60 130 409.1
1 60 145 479.0
2 60 135 340.0
3 45 175 282.4

To create a dataframe you must first create a dictionary.


These are the dictionary methods we use

Method Usage

Values( Return a list of all values in the dictionary


)

Update( Updates the dictionary with the specified key-value pairs


)

setdefa Returns the value of the specified key. If the key does not exist
ult() insert the key, with the specified value

Page 5
6

clear() Removes all the elements from the dictionary

keys() Returns a list containing the keys of the dictionary

pop() Removes the element with the specified key

popitem Removes the last inserted key-value pair


()

get() Returns the value of the specified key

items() Returns a list containing a tuple for each key value pair

copy() Returns a copy of the dictionary

Page 6
7

fromkey Returns a dictionary with the specified keys and value


s()

This how we create a


dictionary and print header
import pandas as pd
import numpy as np
df={
'First name':['Ali','Fred'],
'Last name':['Baba','Json'],
'Visit':['march,19','March,25'],

Page 7
8

'Leave':['April,20,Aprile,26']
}

# Get the list of all column names from headers


for column_headers in df:
print(column_headers, end='\t')

Page 8
9

Example:column by column
generate a data frame
import pandas as pd
import numpy as np

technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)

#notice the column generated in the output

# To Get the list of all column


names from headers
column_headers = list(df.columns.values)
print("The Column Header :", column_headers)
Output:
The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']

Page 9
10

#Example Get the list of all column names


from headers

column_headers = df.columns.values.tolist()
print("The Column Header :", column_headers)

output
The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']

Page 10
11

To reset, and delete


rows
import pandas as pd
# Create DataFrame from dict
df =
pd.DataFrame({'Courses':['Spark','PySpark','Java','PHP'
],
'Fee':[20000,20000,15000,10000],
'Duration':['35days','35days','40days','30days']})
print(df)
df=df.drop([2])
print(df)
df2=df.reset_index()
print(df2)

output
Courses Fee Duration
0 Spark 20000 35days
1 PySpark 20000 35days
2 Java 15000 40days
3 PHP 10000 30days

Courses Fee Duration


0 Spark 20000 35days
1 PySpark 20000 35days
3 PHP 10000 30days
index Courses Fee Duration
0 0 Spark 20000 35 days
1 1 PySpark 20000 35 days

Page 11
12

2 3 PHP 10000 30 days

#Use .scv file to drop a colun


import pandas as pd
df = pd.read_csv('data2.csv')
print(df)
df2=df.drop([2])
print(df2)
print('done')

a b c
0 1 2 3
1 2 5 7
2 3 8 3

a b c
0 1 2 3
1 2 5 7
Done
Page 12
13

0 1 5 3
1 1 6 8
2 2 3 7
3 3 1 8
a b c
0 1 5 3
1 1 6 8
3 3 1 8

Example Write Excel with Python


Pandas. Excel file will be created
You can write any data (lists, strings, numbers etc) to Excel, by first converting
it into a Pandas DataFrame and then writing the DataFrame to Excel.

To export a Pandas DataFrame as an Excel file (extension: .xlsx, .xls), use


the to_excel() method.

Page 13
14

Install the following library


$ pip install xlwt
$ pip install openpyxl
Importing openpyxl is required if you want to append it to an existing Excel file
described at the end.

import pandas as pd
import openpyxl

df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32, 33]],


index=['one', 'two', 'three'], columns=['a', 'b', 'c'])

print(df)
# a b c
# one 11 21 31
# two 12 22 32
# three 31 32 33

You can specify a path as the first argument of the to_excel() method.

Note: that the data in the original file is deleted when overwriting.

The argument new_sheet_name is the name of the sheet. If omitted, it will be


named Sheet1.

Output is the excel file that is created

Page 14
15

import pandas as pd
import openpyxl

df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32,


33]],
index=['one', 'two', 'three'],
columns=['a', 'b', 'c'])

print(df)
#writing to excel
df.to_excel('pandas_to_excel.xlsx',
sheet_name='new_sheet_name')

#If you do not need to write index (row name), columns


(column name),
#the argument index, columns is False

#df.to_excel('xxx_no_index_header.xlsx', index=False,
header=False)

#then use the ExcelWriter() function like


this:

Page 15
16

with pd.ExcelWriter('pandas_to_excel.xlsx') as writer:


df.to_excel(writer, sheet_name='sheet1')
df.to_excel(writer, sheet_name='sheet2')
#You don’t need to call writer.save(), writer.close()
within the blocks.

Page 16

You might also like