Chapter 1 - Part 2 - DataFrame (1)
DATA FRAMES
Data frames:
• A DataFrame is a two-dimensional data structure of Python pandas that holds heterogeneous
data, usually represented in table format.
• The data is arranged in rows and columns.
• A DataFrame is similar to a spreadsheet or an SQL table.
Need for DataFrame
• A Series is one-dimensional, so it cannot handle the 2D or multi-dimensional
data found in real-world problems.
• For such tasks, pandas provides another data structure called the DataFrame.
10/26/2024
• You can give the row index (row labels) explicitly using the index argument; the default
row index is 0, 1, 2, …
• The column index (column headings) is given using the columns argument.
A DataFrame can be created from:
• Lists
• Series
• Dictionary
• NumPy ndarrays
• Text / CSV files
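A minimal sketch of the index and columns arguments, assuming a small NumPy array of made-up marks. The labels r1, r2, r3 and the column names are illustrative only and do not come from the original slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 rows x 2 columns of marks
data = np.array([[85, 90], [72, 65], [91, 88]])

# No labels given: rows and columns are both numbered 0, 1, 2, ...
df_default = pd.DataFrame(data)
print(df_default)

# Explicit row labels (index) and column headings (columns)
df_labelled = pd.DataFrame(
    data,
    index=["r1", "r2", "r3"],
    columns=["Maths", "Science"],
)
print(df_labelled)
```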
Creating a DataFrame using a nested list with column labels and row labels
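One possible version of this, assuming three made-up student records. Each inner list becomes one row; the names, ages and labels are hypothetical.

```python
import pandas as pd

# Each inner list is one row (hypothetical student records)
rows = [["Asha", 16], ["Ravi", 17], ["Meena", 16]]

# columns supplies the column labels, index supplies the row labels
df = pd.DataFrame(rows, columns=["name", "age"], index=["I", "II", "III"])
print(df)
```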
HOMEWORK
• Solved Questions: Q 3, 17, 18, 27
• When a DataFrame is created from Series, the index of the Series appears as the row index of the DataFrame.
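A short sketch of building a DataFrame from Series, assuming two made-up Series that share the index a, b, c. The Series names and values are illustrative.

```python
import pandas as pd

# Two hypothetical Series with the same index labels
s_maths = pd.Series([85, 72, 91], index=["a", "b", "c"])
s_science = pd.Series([90, 65, 88], index=["a", "b", "c"])

# Dictionary keys become the column labels;
# the index of the Series becomes the row index of the DataFrame
df = pd.DataFrame({"Maths": s_maths, "Science": s_science})
print(df)
```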
HOMEWORK
• Solved Questions: Q 4, 16
Attributes of a Dataframe
• DataFrames have properties known as attributes.
• An attribute of a DataFrame is accessed using the dot (.) operator.
• Syntax:
<Data frame object>.<attribute name>
• Eg:
df.index
df.columns
1. index:
Returns the row index (row labels) of the DataFrame. By default, it is 0, 1, 2, …
Alternate names / labels can also be used as the index.
2. columns:
Returns the column names of the DataFrame.
3. axes:
Returns a list holding both the row index and the column names of the DataFrame.
4. dtypes:
Returns the data type of each column in the DataFrame.
5. size:
Returns the number of elements in the DataFrame, which is the product of the no. of rows and no. of columns.
6. shape:
Returns the shape of the DataFrame as a tuple (no. of rows, no. of columns).
7. ndim:
Returns the number of dimensions of the DataFrame, which is 2.
8. empty:
Returns a boolean value: True if the DataFrame is completely empty,
otherwise False.
9. count():
Unlike the other entries, count() is a method, so it is called with parentheses. It counts the non-missing (non-NaN) values.
count() or count(0) or count(axis='index') counts down the rows,
i.e., gives the no. of items in each column (default value).
count(1) or count(axis='columns') counts across the columns, i.e.,
gives the no. of items in each row.
10. T (Transpose):
Transposes the DataFrame, i.e., rows become columns and
columns become rows.
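The attributes listed above can be tried out on a small DataFrame. This is a sketch that assumes made-up data with one missing value, so that count() has something to skip; the names and marks are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one mark is missing (NaN)
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "marks": [82.5, np.nan, 91.0]},
    index=["I", "II", "III"],
)

print(df.index)     # row labels
print(df.columns)   # column names
print(df.axes)      # list: [row labels, column names]
print(df.dtypes)    # data type of each column
print(df.size)      # 3 rows x 2 columns = 6
print(df.shape)     # (3, 2)
print(df.ndim)      # 2
print(df.empty)     # False

# count() is a method: non-missing items per column, then per row
print(df.count())
print(df.count(axis="columns"))

print(df.T)         # transpose: rows become columns
```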
Using loc[ ]
• df.loc[ ] allows us to access the data in a specified range of rows & columns by label (inclusive of the stop value).
• : as the 2nd argument means all columns.
• : as the 1st argument means all rows.
• To access the data in a single row, give only the label of that row.
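A minimal sketch of these loc[ ] forms, assuming a made-up DataFrame with row labels I, II, III. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16], "marks": [82, 74, 91]},
    index=["I", "II", "III"],
)

# Rows I to II and columns name to age; both stop labels are included
print(df.loc["I":"II", "name":"age"])

# : as the 2nd argument means all columns
print(df.loc["I":"II", :])

# : as the 1st argument means all rows
print(df.loc[:, "age":"marks"])

# A single row, returned as a Series
print(df.loc["II"])
```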
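Using iloc[ ]
• df.iloc[ ] accesses rows & columns by integer position (the stop value is not included).
• To access the data in a single row, give the integer position of that row.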
• To access the data in rows with index 1 & 2 (stop value 3 is not included)
• To access the data in rows with index 0 & 1 and columns with index 1 & 2
• To access the data from row index 1 to 2 and column index 1 to 2
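The iloc[ ] cases above might look like this. The sketch assumes a made-up DataFrame; only the positions matter, the data itself is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16], "marks": [82, 74, 91]},
    index=["I", "II", "III"],
)

# A single row, by position
print(df.iloc[1])

# Rows at positions 1 & 2 (stop value 3 is not included)
print(df.iloc[1:3])

# Rows at positions 0 & 1 and columns at positions 1 & 2
print(df.iloc[0:2, 1:3])

# Rows at positions 1 to 2 and columns at positions 1 to 2
print(df.iloc[1:3, 1:3])
```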
• df.iat[ ] is used to access a single value corresponding to the numeric index of row & column
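A small sketch of iat[ ], with at[ ] (its label-based counterpart from the revision list) shown for comparison. The DataFrame and the new value 80 are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16], "marks": [82, 74, 91]},
    index=["I", "II", "III"],
)

# Read one value: row position 1, column position 2
print(df.iat[1, 2])

# iat[ ] can also be used to change that single value
df.iat[1, 2] = 80

# at[ ] reaches the same cell using the row label and column label
print(df.at["II", "marks"])
```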
Adding a new column to a DataFrame & filling it with the same value
Consider a program which creates a DataFrame with name and
age as columns and I, II, III as the row index.
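The original program was shown as an image, so the following is only a sketch of what such a program could look like. The names, ages, the new column city and its value are all hypothetical.

```python
import pandas as pd

# DataFrame with name and age as columns and I, II, III as row index
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16]},
    index=["I", "II", "III"],
)
print(df)

# Assigning one value to a new column label fills every row with it
df["city"] = "Delhi"
print(df)
```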
In drop(), axis = 1 represents columns, so the column with the specified name is deleted (axis = 0, the default, represents rows).
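A minimal sketch of drop() with both axis values, assuming a made-up DataFrame. Note that drop() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16], "city": ["Delhi"] * 3},
    index=["I", "II", "III"],
)

# axis=1: delete the column named "city"
df_no_city = df.drop("city", axis=1)
print(df_no_city)

# axis=0: delete the row labelled "II"
df_no_row = df.drop("II", axis=0)
print(df_no_row)
```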
Indexing in pandas means selecting particular rows and columns of data from a DataFrame. That can mean
selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of
the rows and some of the columns. Indexing is also known as subset selection.
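The three kinds of subset selection can be sketched as follows, assuming a made-up DataFrame. Lists of labels are used here so that the chosen rows and columns need not be next to each other.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16], "marks": [82, 74, 91]},
    index=["I", "II", "III"],
)

# All of the rows, some of the columns
all_rows = df[["name", "marks"]]

# Some of the rows, all of the columns
some_rows = df.loc[["I", "III"]]

# Some of the rows and some of the columns
both = df.loc[["I", "III"], ["name", "marks"]]

print(all_rows, some_rows, both, sep="\n\n")
```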
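Concatenation – append()
• append() adds the rows of one DataFrame to the end of another. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat() does the same job.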
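Because DataFrame.append() is no longer available from pandas 2.0 onwards, this sketch uses pd.concat() to get the same row-wise concatenation. It is a substitute for the append() call on the slide, not the original code; the data is made up.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [16, 17]})
df2 = pd.DataFrame({"name": ["Meena"], "age": [16]})

# Stack the rows of df2 below the rows of df1.
# ignore_index=True renumbers the rows 0, 1, 2 instead of repeating 0.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```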
Binary Operations
add(), sub(), mul() and div() perform the basic mathematical
operations of addition, subtraction, multiplication and division
on two DataFrames. Values are matched by row label and column label.
For example:
df1.sub(df2) means df1 - df2
df1.rsub(df2) means df2 - df1
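A short sketch of these operations, assuming two made-up DataFrames with the same row and column labels.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
df2 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

print(df1.add(df2))   # df1 + df2
print(df1.sub(df2))   # df1 - df2
print(df1.rsub(df2))  # df2 - df1 (operands reversed)
print(df1.mul(df2))   # df1 * df2
print(df1.div(df2))   # df1 / df2
```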
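iterrows()
• It is used to access the data row-wise.
• The iterrows() function iterates over each row of the DataFrame, giving the row label and the row as a Series.
iteritems()
• It is used to access the data column-wise.
• The iteritems() function iterates over each column of the DataFrame, giving the column name and the column as a Series.
• iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; items() works the same way in both older and newer versions.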
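A minimal sketch of row-wise and column-wise iteration, assuming a made-up DataFrame. It uses items() for the column-wise loop, since iteritems() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [16, 17, 16]},
    index=["I", "II", "III"],
)

# Row-wise: each step gives the row label and the row as a Series
for row_label, row in df.iterrows():
    print(row_label, row["name"], row["age"])

# Column-wise: each step gives the column name and the column as a Series
for col_name, col in df.items():
    print(col_name, list(col))
```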
CSV file
• CSV (Comma Separated Values) is a simple file format used to store tabular
data, such as a spreadsheet or a database.
• It stores tabular data in plain text
• Each line is a record
• Each record consists of one or more fields, separated by commas.
• To work with CSV files in Python, there is a module called csv.
Method:
• Start Excel and enter the data in it.
• Save the file with file type as CSV (comma
delimited) (*.csv)
OR
• Create a text file in Notepad, with values
separated by commas.
• Save the file as .csv
• If the CSV file and the Python program are in the same
folder, only the filename is needed.
• Otherwise, the complete path of the file should be given.
• In the file path, use \\ or / instead of \, because a single \ starts an escape sequence in a Python string.
• Eg: C:\\NCERT\\TestingDF.csv
OR C:/NCERT/TestingDF.csv
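A sketch of writing and reading a CSV file with to_csv() and read_csv(). So that it runs anywhere, it assumes a temporary folder in place of a path such as C:/NCERT; the data is made up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [16, 17]})

# A temporary folder stands in for a real folder such as C:/NCERT
folder = tempfile.mkdtemp()
path = os.path.join(folder, "TestingDF.csv")

# Write the DataFrame to a CSV file (index=False leaves out the row labels)
df.to_csv(path, index=False)

# Read the CSV file back into a new DataFrame
df_back = pd.read_csv(path)
print(df_back)
```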
DataFrame - REVISION
• Create dataframe using list
• Create dataframe using dictionary
• Create dataframe using series
• Attributes of DataFrame: index, columns, axes, empty, ndim, shape, size, dtypes, T, count()
• Accessing Data – loc[], iloc[], at[], iat[]
• Add new row
• Add new column - insert()
• Remove a row – drop()
• Remove a column – drop(), pop(), del
• set_index()
• reset_index()
• sort_values()
• rename()
• fillna()
• append()
• head()
• tail()
• Binary Operations – add(), sub(), mul(), div(), radd(), rsub(), rmul(), rdiv()
• iterrows() & iteritems()
• Create dataframe from csv - read_csv(), to_csv()
• Create dataframe from text files – read_table()
• If the text file and the Python program are in the same
folder, only the filename is needed.
• Otherwise, the complete path of the file should be given.
• In the file path, use \\ or / instead of \, because a single \ starts an escape sequence in a Python string.
• Eg: C:\\NCERT\\TestingDF.txt
OR C:/NCERT/TestingDF.txt
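A sketch of reading a text file with read_table(), which splits fields on tabs by default. It assumes a small tab-separated file written to a temporary folder; the file contents are made up.

```python
import os
import tempfile

import pandas as pd

# A temporary folder stands in for a real folder such as C:/NCERT
folder = tempfile.mkdtemp()
path = os.path.join(folder, "TestingDF.txt")

# Create a small tab-separated text file (hypothetical contents)
with open(path, "w") as f:
    f.write("name\tage\nAsha\t16\nRavi\t17\n")

# read_table() uses the tab character as the separator by default
df = pd.read_table(path)
print(df)
```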
HOMEWORK
• Pg. No. 1.78
Unsolved questions : Q.no. 32 to 37
Homework Answers Q: 30
HOMEWORK
• Pg. No. 1.78 & 1.79
Unsolved questions : Q.no. 38 & 43
HOMEWORK
• Pg. No. 1.78 & 1.79
Unsolved questions : Q.no. 39, 40, 41 & 42
HOMEWORK
• Solved Questions: Q 6, 13, 15, 19 & 20
label=['a','b','c','d','e','f','g','h','i','j']
HOMEWORK
• Pg. No. 1.73
Solved questions : Q. no. 26, 28
• Pg. No. 1.77
Unsolved questions : Q.no. 31
Home work :
Unsolved questions : Q.no. 29
Home work :
Pg. No. 1.77 Unsolved questions : Q.no. 29
(contd.)