0% found this document useful (0 votes)
4 views

5Python Processing XLS Data

Uploaded by

David Osei
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

5Python Processing XLS Data

Uploaded by

David Osei
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

PYTHON ­ PROCESSING XLS DATA

https://www.tutorialspoint.com/python/python_processing_xls_data.htm Copyright © tutorialspoint.com

Advertisements

Microsoft Excel is a very widely used spread sheet program. Its user friendliness and appealing features makes it a
very frequently used tool in Data Science. The Panadas library provides features using which we can read the Excel
file in full as well as in parts for only a selected group of Data. We can also read an Excel file with multiple sheets in
it. We use the read_excel function to read the data from it.

Input as Excel File


We Create an excel file with multiple sheets in the windows OS. The Data in the different sheets is as shown below.

You can create this file using the Excel Program in windows OS. Save the file as input.xlsx.

# Data in Sheet1

id,name,salary,start_date,dept
1,Rick,623.3,2012‐01‐01,IT
2,Dan,515.2,2013‐09‐23,Operations
3,Tusar,611,2014‐11‐15,IT
4,Ryan,729,2014‐05‐11,HR
5,Gary,843.25,2015‐03‐27,Finance
6,Rasmi,578,2013‐05‐21,IT
7,Pranab,632.8,2013‐07‐30,Operations
8,Guru,722.5,2014‐06‐17,Finance

# Data in Sheet2

id name zipcode
1 Rick 301224
2 Dan 341255
3 Tusar 297704
4 Ryan 216650
5 Gary 438700
6 Rasmi 665100
7 Pranab 341211
8 Guru 347480

Reading an Excel File


The read_excel function of the pandas library is used read the content of an Excel file into the python
environment as a pandas DataFrame. The function can read the files from the OS by using proper path to the file.
By default, the function will read Sheet1.

import pandas as pd
data = pd.read_excel('path/input.xlsx')
print (data)

When we execute the above code, it produces the following result. Please note how an additional column starting
with zero as a index has been created by the function.
id name salary start_date dept
0 1 Rick 623.30 2012‐01‐01 IT
1 2 Dan 515.20 2013‐09‐23 Operations
2 3 Tusar 611.00 2014‐11‐15 IT
3 4 Ryan 729.00 2014‐05‐11 HR
4 5 Gary 843.25 2015‐03‐27 Finance
5 6 Rasmi 578.00 2013‐05‐21 IT
6 7 Pranab 632.80 2013‐07‐30 Operations
7 8 Guru 722.50 2014‐06‐17 Finance

Reading Specific Columns and Rows


Similar to what we have already seen in the previous chapter to read the CSV file, the read_excel function of the
pandas library can also be used to read some specific columns and specific rows. We use the multi­axes indexing
method called .loc for this purpose. We choose to display the salary and name column for some of the rows.

import pandas as pd
data = pd.read_excel('path/input.xlsx')

# Use the multi‐axes indexing funtion


print (data.loc[[1,3,5],['salary','name']])

When we execute the above code, it produces the following result.

salary name
1 515.2 Dan
3 729.0 Ryan
5 578.0 Rasmi

Reading Multiple Excel Sheets


Multiple sheets with different Data formats can also be read by using read_excel function with help of a wrapper
class named ExcelFile. It will read the multiple sheets into memory only once. In the below example we read
sheet1 and sheet2 into two data frames and print them out individually.

import pandas as pd
with pd.ExcelFile('C:/Users/Rasmi/Documents/pydatasci/input.xlsx') as xls:
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

print("****Result Sheet 1****")


print (df1[0:5]['salary'])
print("")
print("***Result Sheet 2****")
print (df2[0:5]['zipcode'])

When we execute the above code, it produces the following result.

****Result Sheet 1****


0 623.30
1 515.20
2 611.00
3 729.00
4 843.25
Name: salary, dtype: float64

***Result Sheet 2****


0 301224
1 341255
2 297704
3 216650
4 438700
Name: zipcode, dtype: int64

You might also like