Pandas Ip PDF

Uploaded by

abc
PANDAS

The origin of the name panda is the Nepalese word 'nigalya ponya', which means 'eater of bamboo'.
Data Handling using Pandas -I

• Introduction to Python libraries - Pandas
• Data structures in Pandas - Series and DataFrames
• Series: creation of a Series from ndarray, dictionary, scalar value; mathematical operations
• Head and tail functions
• Selection
• Indexing
• Slicing
[pandas] is derived from the term "panel data", an econometrics** term for
data sets that include observations over multiple time periods for the same
individuals.

** Econometrics: the branch of economics concerned with the use of mathematical
methods (especially statistics) in describing economic systems.
Pandas is a high-level data manipulation tool developed by Wes McKinney.
What's Pandas for?
Pandas has many uses.

This tool is essentially your data’s home. Through pandas, you get acquainted with your
data by cleaning, transforming, and analyzing it.

For example, say you want to explore a dataset stored in a CSV on your computer. Pandas
will extract the data from that CSV into a DataFrame — a table, basically — then let you
do things like:

• Calculate statistics and answer questions about the data, like:
   • What's the average, median, max, or min of each column?
   • Does column A correlate with column B?
   • What does the distribution of data in column C look like?
• Clean the data by doing things like removing missing values and filtering rows or
columns by some criteria
• Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles,
and more.
• Store the cleaned, transformed data back into a CSV, other file or database
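
The workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not a recipe: the CSV text and the column names A, B and C are made up for the example, and the data is read from an in-memory string so that no file on disk is needed. Plotting is left out.

```python
import io
import pandas as pd

# Hypothetical CSV content (invented for this sketch), kept in memory.
csv_text = """A,B,C
1,2.0,10
2,4.1,12
3,,11
4,8.2,15
"""

df = pd.read_csv(io.StringIO(csv_text))   # load the CSV into a DataFrame

print(df.describe())                      # count, mean, std, min, quartiles, max per column
print(df["A"].corr(df["B"]))              # does column A correlate with column B?

clean = df.dropna()                       # remove rows that contain a missing value
big = clean[clean["C"] > 10]              # filter rows by a condition

out = io.StringIO()
big.to_csv(out, index=False)              # store the cleaned data back as CSV text
print(out.getvalue())
```

With a real file, a path would normally be passed to read_csv() and to_csv() in place of the StringIO objects.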
Pandas is built on top of the NumPy package, meaning a
lot of the structure of NumPy is used or replicated in
Pandas. Data in pandas is often used to feed statistical
analysis in SciPy, plotting functions from Matplotlib, and
machine learning algorithms in Scikit-learn.
What is a Pandas Series?

What is a Pandas DataFrame?


Core components of pandas: Series and DataFrames
The primary two components of pandas are the Series and DataFrame.

A Series is essentially a column, and a DataFrame is a multi-dimensional
table made up of a collection of Series.
A fruit shop sells apples and oranges. We want to have a column (Series) for
each fruit and a row for each customer purchase.
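
One way the fruit-shop table could look, as a small sketch: the quantities are illustrative, and the column names apples and oranges are chosen for this example.

```python
import pandas as pd

# One Series per fruit; each position is one customer purchase (illustrative numbers).
apples = pd.Series([30, 20, 60, 17])
oranges = pd.Series([560, 345, 756, 298])

# A DataFrame built from a dict of Series: each key becomes a column.
purchases = pd.DataFrame({"apples": apples, "oranges": oranges})

print(purchases)
print(purchases["apples"])   # selecting one column gives back a Series
```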
Creating A Series
import pandas as pd
import numpy as np
# simple array
a = np.array([30,20,60,17])
o = np.array([560,345,756,298])
apple = pd.Series(a)
print(apple)
orange = pd.Series(o)
print(orange)

Output:
0    30
1    20
2    60
3    17
dtype: int64
0    560
1    345
2    756
3    298
dtype: int64
import pandas as pd
d1 = {"cust-1":10,"Cust-2":20}
print("Original dictionary:")
print(d1)
new_series = pd.Series(d1)
print("Converted series:")
print(new_series)

Output:
Original dictionary:
{'cust-1': 10, 'Cust-2': 20}
Converted series:
cust-1    10
Cust-2    20
dtype: int64
import pandas as pd
# a simple list (not named "list", which would shadow the built-in type)
letters = ['k', 'd', 'a', 'v']
# create a Series from a list
ser = pd.Series(letters)
print(ser)

Output:
0    k
1    d
2    a
3    v
dtype: object
import pandas as pd
# giving a scalar value with index
ser = pd.Series(10, index =[0, 1, 2, 3, 4, 5])
print(ser)

Output:
0    10
1    10
2    10
3    10
4    10
5    10
dtype: int64
Create an empty Series

import pandas as pd
S = pd.Series(dtype =int)
print(S)

Series([], dtype: int32)


Create a Series using range()

import pandas as pd
S = pd.Series(range(2,24,3))
print(S)
0 2
1 5
2 8
3 11
4 14
5 17
6 20
7 23
dtype: int64
Create a Series using range() and for loop

import pandas as pd
S = pd.Series(range(1,12,2),index=[x for x in 'abcdef'])
print(S)
a 1
b 3
c 5
d 7
e 9
f 11
dtype: int64
Create a Series using a list with floating point

import pandas as pd
S = pd.Series([1,8,9.5])
print(S)
0 1.0
1 8.0
2 9.5
dtype: float64
Create a Series using two different lists

import pandas as pd
Fruits=['Apple','Orange','Banana']
NoofFruitssold=[12,45,67]
S = pd.Series(NoofFruitssold,index=Fruits)
print(S)
Apple 12
Orange 45
Banana 67
dtype: int64
Create a Series with missing values

import pandas as pd
import numpy as np
Fruits=['Apple','Orange','Banana']
NoofFruitssold=[12,45,np.nan]
S = pd.Series(NoofFruitssold,index=Fruits)
print(S)
Apple 12.0
Orange 45.0
Banana NaN
dtype: float64
Create a Series using mathematical expression

import pandas as pd
import numpy as np
L1 =np.arange(2,24,2)
ind =range(1,12)
S = pd.Series(index=ind,data=L1*2)
print(S)
1 4
2 8
3 12
4 16
5 20
6 24
7 28
8 32
9 36
10 40
11 44
dtype: int32
Write the output:

import pandas as pd
import numpy as np
L1 =np.array([4,9,16,25])
ind =range(4)
S = pd.Series(index=ind,data=L1**0.5)
print(S)
0 2.0
1 3.0
2 4.0
3 5.0
dtype: float64
A Series in pandas can be created with the library's Series() constructor.

Any list, dictionary, or scalar value can be converted to a Series.

Syntax: pandas.Series(data, index)


A series can be created using various inputs like −

Array
Dict
Scalar value or constant
Mathematical expression
Data Structure    Dimensions    Description
Series            1             1D labeled homogeneous array; size immutable, values mutable

A Series is treated as size-immutable, which means that once a Series object is created, operations that would change
its size (such as appending or deleting elements) are not allowed. The values it holds, however, can be changed.
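
A small sketch of what "values mutable" means in practice. The Series contents are made up, and drop() is used here only as one example of a size-changing operation that hands back a new object instead of resizing the original.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

s["b"] = 99                    # values are mutable: overwrite one element in place
print(s)

shorter = s.drop("c")          # drop() returns a new, shorter Series
print(len(s), len(shorter))    # the original still has all three elements
```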
Mathematical Operations on Series
Mean, median, and mode are three kinds of "averages".

The "mean": add up all the numbers, then divide
by the number of numbers.
The "median" is the "middle" value in the list of numbers.
To find the median, the numbers have to be listed in
numerical order from smallest to largest.
The "mode" is the value that occurs most often. If no
number in the list is repeated, then there is no mode for
the list.
Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
Mean
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15

The median is the middle value, so first I'll have to rewrite the list in numerical order:

13, 13, 13, 13, 14, 14, 16, 18, 21

There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number:

13, 13, 13, 13, 14, 14, 16, 18, 21

So the median is 14.

The mode is the number that is repeated more often than any other, so 13 is the mode.

The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
mean: 15
median: 14
mode: 13
range: 8
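
The same worked example can be checked with pandas. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import pandas as pd

values = pd.Series([13, 18, 13, 14, 13, 16, 14, 21, 13])

mean = values.mean()                         # sum of the values divided by their count
median = values.median()                     # middle value of the sorted data
mode = values.mode()                         # a Series, since several values can tie
value_range = values.max() - values.min()    # largest minus smallest

print(mean, median, list(mode), value_range)
```

Note that mode() returns a Series rather than a single number, because more than one value can occur equally often.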
import pandas as pd
s = pd.Series([3,45,1,2,3])
ss = s.sum()
print(ss)
cc = s.count()
print(cc)
mm = s.mean()
print(mm)
stds = s.std()
print(stds)
mn = s.min()
mx = s.max()
print(mn)
print(mx)
indexmin = s.idxmin()
indexmax = s.idxmax()
print(indexmin)
print(indexmax)
md = s.median()
mo = s.mode()
print(md)
print(mo)
va = s.value_counts()
print(va)
al = s.describe()
print(al)
FUNCTION                    USE
s.sum()                     Returns the sum of all values in the Series
s.mean()                    Returns the mean of all values in the Series; equal to s.sum()/s.count()
s.std()                     Returns the standard deviation of all values
s.min() or s.max()          Return the minimum and maximum values in the Series
s.idxmin() or s.idxmax()    Return the index label of the minimum or maximum value in the Series
s.median()                  Returns the median of all values
s.mode()                    Returns the mode(s) of the Series
s.value_counts()            Returns a Series with the frequency of each value
s.describe()                Returns a Series of summary statistics that depends on the dtype of the data
                            (count, mean, std, min, quartiles and max for numeric data)
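
A short check of a few entries from the table, using the Series [3, 45, 1, 2, 3]. This is only a sketch to make the return values concrete.

```python
import pandas as pd

s = pd.Series([3, 45, 1, 2, 3])

print(s.sum(), s.count())          # total and number of values
print(s.mean())                    # same as s.sum() / s.count()
print(s.idxmin(), s.idxmax())      # index labels of the smallest and largest values
print(s.median(), list(s.mode()))  # middle value and most frequent value(s)
print(s.value_counts())            # frequency of each distinct value
print(list(s.describe().index))    # names of the statistics describe() reports
```

For numeric data, describe() reports count, mean, std, min, the 25%, 50% and 75% quartiles, and max.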
Vector Operations on Series
import pandas as pd
s = pd.Series([1,2,3])
t = pd.Series([13,24,54])
u = s+t
print(u)
w = s+2
print(w)

Output:
0    14
1    26
2    57
dtype: int64
0    3
1    4
2    5
dtype: int64
import pandas as pd
fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print("sum of S: ", sum(S))

apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115
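
When two Series are added, pandas pairs the values by index label, not by position. The sketch below reuses the fruit data, with the second Series deliberately listed in a different order (an arrangement made up for this illustration) to show that the sums per fruit stay the same.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=["apples", "oranges", "cherries", "pears"])
# Same labels and values as S2 in the example, but in a different order.
S2 = pd.Series([32, 31, 13, 17],
               index=["pears", "cherries", "oranges", "apples"])

total = S + S2             # values are matched by label before adding
print(total["apples"])     # 20 + 17
print(total["pears"])      # 10 + 32
```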
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
print(S.index)
print(S.values)

RangeIndex(start=0, stop=6, step=1)


[11 28 72 3 5 8]
PANDAS-Series
Selection, Indexing and Slicing.
Selection in Series
Accessing data in a series
import pandas as pd
s = pd.Series([56,45,90,45,32,78])
print("Original Series :")
print(s)
print("Element at position 0 :")
print(s[0])
print("Elements from the start up to index 2 :")
print(s[:3])
print("Elements from index -3 to the end :")
print(s[-3:])

Output:
Original Series :
0    56
1    45
2    90
3    45
4    32
5    78
dtype: int64
Element at position 0 :
56
Elements from the start up to index 2 :
0    56
1    45
2    90
dtype: int64
Elements from index -3 to the end :
3    45
4    32
5    78
dtype: int64
iloc and loc
import pandas as pd
s = pd.Series([56,45,90,45,32,78],index =['a','b','c','d','e','f'])
print("Original Series :")
print(s)
print("Elements at positions 1, 2 (b, c) :")
print(s.iloc[1:3])
print("Elements at labels c, d, e, f :")
print(s.loc['c':'f'])

Output:
Original Series :
a    56
b    45
c    90
d    45
e    32
f    78
dtype: int64
Elements at positions 1, 2 (b, c) :
b    45
c    90
dtype: int64
Elements at labels c, d, e, f :
c    90
d    45
e    32
f    78
dtype: int64
Retrieve Data from selection
There are two methods for data selection:
• loc gets rows (or columns) with particular labels from the
index.
• iloc gets rows (or columns) at particular positions in the index
(so it only takes integers).
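
The difference is easiest to see when the index labels are integers that do not match the positions. The labels 10, 20, 30 below are made up for this sketch.

```python
import pandas as pd

s = pd.Series([56, 45, 90], index=[10, 20, 30])

print(s.loc[20])       # label 20
print(s.iloc[1])       # position 1 (the same element as label 20)

print(s.loc[10:20])    # label slice: both end labels are included
print(s.iloc[0:1])     # position slice: the stop position is excluded
```

The slicing behaviour matches the earlier example, where iloc[1:3] returned two elements and loc['c':'f'] included 'f'.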
Reindex

import pandas as pd
s = pd.Series([56,45,90,45,32,78],index =['a','b','c','d','e','f'])
s =s.reindex(['f','a','c','d','e','b'])
print(s)
f 78
a 56
c 90
d 45
e 32
b 45
dtype: int64
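
reindex() returns a new Series arranged according to the labels passed in. As a small sketch (the label 'z' is invented here to show a label that the Series does not contain), a missing label gets NaN unless a fill_value is supplied:

```python
import pandas as pd

s = pd.Series([56, 45, 90], index=["a", "b", "c"])

r = s.reindex(["c", "a", "z"])                      # 'z' is not in s, so it gets NaN
filled = s.reindex(["c", "a", "z"], fill_value=0)   # use 0 instead of NaN

print(r)
print(filled)
print(s)                                            # the original Series is unchanged
```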
