Informatics Practices Class 12 Cbse Notes Data Handling
Reshaping Data Structures:
In Pandas, data reshaping means transforming the structure of a table (i.e. a DataFrame or Series) to
make it suitable for further analysis. Two functions are mainly used to reshape data:
1. pivot()
2. pivot_table()
pivot():
The pivot function is used to create a new derived table out of a given one.
Pivot reshapes data and uses unique values from index/columns to form axes of the resulting
DataFrame.
The pivot() function is used to reshape and manipulate an already existing DataFrame. It takes three main parameters:
• index
• columns
• values
For each of the above parameters, a column name from the existing table should be defined.
Then the pivot function will create a new table, whose row and column indices are the unique
values of the respective parameters. The cell values of the new table are taken from the column
specified as the values parameter.
However, if the values parameter is not specified, all the other columns in the original table will be
taken as values, and an individual table will be created for each of them.
NOTE: If the index-column combination of two or more rows is the same, a ValueError is raised.
Syntax:
DataFrame.pivot(index=<name>, columns=<name>, values=<name>)
(or)
pandas.pivot(<DataFrame>, index=<name>, columns=<name>, values=<name>)
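Example (a minimal sketch; the names, degrees and scores below are assumed for illustration). It shows a successful pivot() and the ValueError raised when an index-column combination repeats:

import pandas as pd

# Hypothetical data; each Name-Degree combination is unique
df = pd.DataFrame({'Name':   ['John', 'John', 'Mary'],
                   'Degree': ['Masters', 'Graduate', 'Graduate'],
                   'Score':  [27, 23, 30]})

# Rows are the unique Names, columns are the unique Degrees,
# and the cells are filled from the Score column
print(df.pivot(index='Name', columns='Degree', values='Score'))

# Adding a duplicate John-Masters row makes the index-column combination
# repeat, so pivot() raises a ValueError
dup = pd.DataFrame({'Name': ['John'], 'Degree': ['Masters'], 'Score': [23]})
df2 = pd.concat([df, dup], ignore_index=True)
try:
    df2.pivot(index='Name', columns='Degree', values='Score')
except ValueError as err:
    print("ValueError:", err)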
pivot_table():
As we saw in the above example, whenever the index and column combination is identical, a
ValueError is raised. Using pivot_table(), this problem can be solved.
When there are two or more values for the same index-column combination, the aggfunc parameter
aggregates the duplicate values.
Consider the same example as above, but with the pivot_table function:
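A possible version of this example (the data is assumed for illustration, with John-Masters appearing twice with scores 27 and 23):

import pandas as pd

# Hypothetical data; the Name-Degree combination John-Masters appears twice
df = pd.DataFrame({'Name':   ['John', 'John', 'Mary'],
                   'Degree': ['Masters', 'Masters', 'Graduate'],
                   'Score':  [27, 23, 30]})

# The duplicate John-Masters rows are aggregated using aggfunc (here, sum)
print(df.pivot_table(index='Name', columns='Degree', values='Score', aggfunc='sum'))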
Here, the aggregate function is the sum function. It added up the two values (27, 23) for John-
Masters.
Sorting:
Often you want to sort a Pandas data frame in a specific way. Typically, one may want to sort a Pandas
data frame based on the values of one or more columns, or based on the values of the row index or
row names of the DataFrame. A Pandas data frame has two useful functions for this:
1. sort_values()
2. sort_index()
Each of these functions comes with numerous options, like sorting the data frame in a specific order
(ascending or descending), sorting in place, sorting with missing values, sorting by a specific algorithm,
and so on.
Sorting by values:
The sort_values() function sorts a data frame by the values of one or more columns. In case NaN
values are present, the na_position parameter determines whether they should be placed at the
beginning or at the end.
Syntax:
df = df.sort_values(by=[<a>, <b>, ...], ascending=[True/False, True/False, ...])
Example:
In the example below, the elements are first sorted by age; in case the ages of two rows are the
same, they are then sorted by score.
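A minimal sketch of such a sort (the names, ages and scores are assumed for illustration):

import pandas as pd

# Hypothetical data; Dhoni and Virat share the same age
df = pd.DataFrame({'Name':  ['Rohit', 'Dhoni', 'Virat'],
                   'Age':   [30, 25, 25],
                   'Score': [80, 95, 70]})

# Sort by Age (ascending); ties are broken by Score (descending)
df = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print(df)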
Thus, the data frame can be sorted even when there are duplicate values in the first column by
which it is to be sorted. For example, here Dhoni and Virat have the same value for age (25).
Since there is no other criterion to decide whose data should come first, a second sorting
criterion is added: the score. Since the score is sorted in descending order, the one with the
highest score comes first in the table, as clearly seen in the output.
Sorting by index:
The sort_index() function is used to sort the values according to the indexes.
Syntax:
DataFrame.sort_index(axis=0/1, ascending=True/False,
na_position='last'/'first', sort_remaining=True)
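Example (a small sketch with an assumed, unordered row index):

import pandas as pd

# Hypothetical DataFrame whose row index is not in order
df = pd.DataFrame({'Marks': [88, 72, 95]}, index=[3, 1, 2])

# Sort the rows by index label in ascending order
print(df.sort_index(axis=0, ascending=True))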
Aggregations:
There are many aggregations like count, sum, min, max, median, quantile, etc. They are also called
descriptive statistics.
Syntax:
X = <dataframe name>[['<column name>']].<aggregate function>()
Examples:
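A minimal sketch (the column name and marks are assumed) showing a few of these aggregate functions:

import pandas as pd

# Hypothetical DataFrame of student marks
df = pd.DataFrame({'Marks': [45, 67, 89, 90, 72]})

print(df[['Marks']].count())   # number of non-null values
print(df[['Marks']].sum())     # total of all values
print(df[['Marks']].min())     # smallest value
print(df[['Marks']].max())     # largest value
print(df[['Marks']].median())  # middle value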
Variance:
var() - The variance function in Python Pandas is used to calculate the variance of a given set of numbers.
The variance of a complete data frame, of a column, or of rows can be calculated.
Syntax:
df.var()   (or)   df[<column name>].var()
The variance of the given data can be calculated using the formula:
Variance = Σ(xi − x̄)² / n
where xi is the ith term, x̄ is the mean, and n is the total number of terms.
Example:
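A minimal sketch (column names and values assumed) showing var() on a DataFrame:

import pandas as pd

# Hypothetical DataFrame
df = pd.DataFrame({'A': [10, 20, 30, 40],
                   'B': [5, 15, 25, 35]})

print(df.var())        # variance of every column
print(df['A'].var())   # variance of a single column
print(df.var(axis=1))  # variance of every row

# Note: pandas divides by (n - 1) by default (sample variance);
# pass ddof=0 to divide by n as in the formula above
print(df.var(ddof=0))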
Quantiles:
A quantile statistic describes a part of a data set. It is used to describe data in a clear and understandable
way.
A quantile is part of a distribution which is divided into equal sized subgroups. It is also called a
fractile.
The 0.30 quantile basically says that 30% of the observations in our data set are below a given
line. On the other hand, it also states that the remaining 70% are above the line we set.
Common Quantiles
Certain types of quantiles are used commonly enough to have specific names.
Below is a list of these:
• 2 quantile is called the median
• 3 quantiles are called terciles
• 4 quantiles are called quartiles
• 5 quantiles are called quintiles
• 6 quantiles are called sextiles
• 7 quantiles are called septiles
• 8 quantiles are called octiles
• 10 quantiles are called deciles
• 12 quantiles are called duodeciles
• 20 quantiles are called vigintiles
• 100 quantiles are called percentiles
• 1000 quantiles are called permilles
Finding quantiles:
Sample question: Find the number in the following set of data where 30 percent of values fall
below it, and 70 percent fall above:
Step 1: Order the data from smallest to largest. The data in the question is already in ascending
order.
Step 2: Count how many observations you have in your data set. This particular data set has 40
items.
Step 3: Convert any percentage to a decimal for “q”. We are looking for the number where 30
percent of the values fall below it, so convert that to .3.
Step 4: Insert your values into the formula:
ith observation = q(n + 1)
Here q is the quantile number, and n is the total number of elements in the distribution.
Answer: The ith observation is at 0.3 × (40 + 1) = 12.3, so we round down to 12 (remembering that this formula is
an estimate). The 12th number in the set is 31, which is the number where 30 percent of the values
fall below it.
Here, first the ith term is calculated normally: q is 0.5 and n is 4, which gives the ith term as 0.5 × (4 + 1) = 2.5.
For column b, it is calculated by the interpolation formula i + q(j − i), where j is the element after i.
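A minimal sketch of DataFrame.quantile() with four rows (column names and values assumed), matching the kind of calculation described above:

import pandas as pd

# Hypothetical DataFrame with four rows
df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [10, 20, 30, 40]})

# 0.5 quantile (the median) of each column; the ith term falls at 2.5,
# so the result is interpolated between the 2nd and 3rd values
print(df.quantile(0.5))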
Histograms
A histogram is a powerful technique in data visualization. It is an accurate graphical representation of the
distribution of numerical data. It was first introduced by Karl Pearson.
To construct a histogram, the first step is to “bin” the range of values —i.e. divide the entire range of values
into a series of intervals — and then count how many values fall into each interval.
The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.
The bins are represented on the x-axis, and the y-axis represents the frequency of each interval.
Example code:
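A minimal sketch using matplotlib.pyplot (the marks data and the number of bins are assumed):

import matplotlib.pyplot as plt

# Hypothetical data: marks of 15 students
marks = [12, 25, 33, 37, 41, 45, 48, 52, 55, 61, 64, 72, 78, 85, 92]

# Divide the range into 5 bins and count how many values fall into each
plt.hist(marks, bins=5)
plt.xlabel('Marks (bins)')
plt.ylabel('Frequency')
plt.title('Histogram of Marks')
plt.show()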
Function Applications
To apply functions to a data structure (for example, increasing all values by 2) we mainly use 3 functions:
1. pipe()
2. apply()
3. applymap()
Table wise Function Application in python pandas: pipe()
The pipe() function applies a user-defined function to the whole DataFrame, along with any arguments.
Syntax:
X=df.pipe(<func.name>,value)
Example:
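A minimal sketch (the adder function and the DataFrame values are assumed for illustration):

import pandas as pd

# A hypothetical user-defined function that adds a number to every value
def adder(data, num):
    return data + num

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# pipe() passes the whole DataFrame to adder() together with the value 5
result = df.pipe(adder, 5)
print(result)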
In the above code, 5 has been added to each value in the dataframe.
The pipe function can be chained any number of times.
Syntax:
df.pipe(<func.name>,value).pipe(<func.name>,value).pipe(<func>,value)....
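Example (a small sketch; the adder and multiplier functions are assumed for illustration):

import pandas as pd

# Hypothetical functions: add a number, then multiply by a number
def adder(data, num):
    return data + num

def multiplier(data, num):
    return data * num

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# First 5 is added to every value, then every value is multiplied by 2
result = df.pipe(adder, 5).pipe(multiplier, 2)
print(result)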
Row/Column wise Function Application in python pandas: apply()
The apply() function applies a function along an axis of the DataFrame: column-wise when axis=0 and row-wise when axis=1.
Syntax:
X=df.apply(<funct>,axis=1/0)
Example program:
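A minimal sketch (the DataFrame values are assumed) showing apply() along both axes:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# axis=0 (the default) applies the function to each column
print(df.apply(sum, axis=0))   # column totals

# axis=1 applies the function to each row
print(df.apply(sum, axis=1))   # row totals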
Element wise Function Application in python pandas: applymap()
The applymap() function performs the specified operation for all the elements of the DataFrame.
applymap() commonly uses a lambda function. This is a short version of a user-defined function: instead of
the def syntax for function declaration, we can use a lambda expression to write Python functions. The
lambda syntax closely follows the def syntax.
The lambda expression takes in a comma separated sequence of inputs (like def). Then,
immediately following the colon, it returns the expression without using an explicit return
statement.
Syntax:
X=df.applymap(lambda <argument>: <expression>)
Example:
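A minimal sketch (the DataFrame values are assumed); the lambda doubles every element:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# The lambda is applied element-wise: every value is multiplied by 2
print(df.applymap(lambda x: x * 2))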