Day 1 Pandas Library in Python 1729578062

The document provides an overview of the Pandas library in Python, focusing on the creation and manipulation of Pandas Series. It explains how to create Series from lists and dictionaries, customize indexes, and utilize attributes and methods for data analysis. Additionally, it discusses reading data from CSV files into Series format and demonstrates various operations on Series objects.

Uploaded by

Ankur Sharma

5/29/23, 11:14 AM 1.

pandas Series - Jupyter Notebook

Pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

pandas website: https://pandas.pydata.org/about/index.html

Pandas Series
A Pandas Series is like a column in a table. It is a 1-D array holding data of any type.

Importing Pandas
In [1]:

1 import numpy as np
2 import pandas as pd

Making a pandas Series from a list

In [2]:

country = ['india', 'pakistan', 'usa', 'nepal', 'srilanka']

# pandas is the library and Series is a class inside it.
# Here we create an object of the Series class by passing
# the country list to its constructor.
print(pd.Series(country))

0 india
1 pakistan
2 usa
3 nepal
4 srilanka
dtype: object

A Series object has two main parts: the values and the index label assigned to each value.
Here the dtype is object, which is the dtype pandas uses for Python strings in this version.
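
As a small illustration of the two parts of a Series, the sketch below rebuilds the country Series and pulls out its index labels and its values separately. The variable names labels and items are chosen here only for illustration.

```python
import pandas as pd

country = ['india', 'pakistan', 'usa', 'nepal', 'srilanka']
s = pd.Series(country)

labels = list(s.index)   # automatically generated integer labels
items = s.tolist()       # the stored values as a plain Python list

print(labels)
print(items)
```

The index and the values are stored side by side, so the label at a given position always belongs to the value at the same position.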

localhost:8888/notebooks/1. Python/4. Python libraries/2. Pandas/1. Pandas Series Campus X/1. pandas Series.ipynb 1/55

In [3]:

1 # integers series
2 runs = [13,24,56,78,110]
3 runs_score = pd.Series(runs)
4 runs_score

Out[3]:

0 13
1 24
2 56
3 78
4 110
dtype: int64

In the two Series above, the index was generated automatically.

We can also supply a custom index.

In [4]:

# custom index
marks = [67, 57, 80, 100]
subject = ['maths', 'english', 'science', 'hindi']

# We want a Series where the marks are the values and the subjects
# are the index; pass the subjects through the index parameter of Series.
pd.Series(marks, index=subject)
# the dtype of the values is integer (int64)

Out[4]:

maths 67
english 57
science 80
hindi 100
dtype: int64

In [5]:

# we can give a name to our Series object
pd.Series(marks, index=subject, name='Himanshu marks')

Out[5]:

maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64


In [6]:

# we are storing the Series in the result variable
result = pd.Series(marks, index=subject, name='Himanshu marks')
result

Out[6]:

maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64

Making a pandas Series from a dictionary

In [7]:

1 marks = {'maths':67,'english':57,'science':80,'hindi':100}
2 mark_series = pd.Series(marks, name='Himanshu marks')
3 mark_series

Out[7]:

maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64

Series Attributes
In [8]:

1 marks = {'maths':67,'english':57,'science':80,'hindi':100}
2 mark_series = pd.Series(marks, name='Himanshu marks')
3 mark_series

Out[8]:

maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64

size

In [9]:

1 mark_series.size

Out[9]:

4

dtype

In [10]:

1 mark_series.dtype

Out[10]:

dtype('int64')

name

In [11]:

1 mark_series.name

Out[11]:

'Himanshu marks'

is_unique

is_unique tells us whether all the values in the Series are unique.

In [12]:

1 mark_series.is_unique

Out[12]:

True

In [13]:

1 pd.Series([1,1,2,3,5,6,6]).is_unique

Out[13]:

False

index

In [14]:

1 mark_series.index

Out[14]:

Index(['maths', 'english', 'science', 'hindi'], dtype='object')

values


In [15]:

1 mark_series.values

Out[15]:

array([ 67, 57, 80, 100], dtype=int64)

Series using read_csv


Now we will build Series from real data.

In [16]:

data = pd.read_csv('subs.csv')
print(data)
type(data)
# at this point the data is a DataFrame, not a Series

Subscribers gained
0 48
1 57
2 40
3 43
4 44
.. ...
360 231
361 226
362 155
363 144
364 172

[365 rows x 1 columns]

Out[16]:

pandas.core.frame.DataFrame


In [17]:

# read_csv returns a DataFrame; to get a Series from a single-column file,
# append .squeeze("columns") to the call. (The older squeeze=True argument
# of read_csv is deprecated and was removed in pandas 2.0.)
print(pd.read_csv('subs.csv').squeeze("columns"))
print(type(pd.read_csv('subs.csv').squeeze("columns")))

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
<class 'pandas.core.series.Series'>
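
The conversion from a one-column DataFrame to a Series can be tried without any file on disk. The sketch below is only an illustration: the CSV text is a made-up stand-in for subs.csv, read through io.StringIO, and the names csv_text, frame and series are assumptions of this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for subs.csv: one column, a few rows.
csv_text = "Subscribers gained\n48\n57\n40\n"

frame = pd.read_csv(io.StringIO(csv_text))   # DataFrame with a single column
series = frame.squeeze("columns")            # collapse that column into a Series

print(type(frame).__name__, type(series).__name__)
```

The column header becomes the name of the resulting Series.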


In [18]:

subs = pd.read_csv('subs.csv').squeeze("columns")
print(subs)
# the printed values are truncated because the data has too many rows to show

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64

In [19]:

1 kohli = pd.read_csv('kohli_ipl.csv')
2 print(kohli)
3 print(type(kohli))

match_no runs
0 1 1
1 2 23
2 3 13
3 4 12
4 5 1
.. ... ...
210 211 0
211 212 20
212 213 73
213 214 25
214 215 7

[215 rows x 2 columns]


<class 'pandas.core.frame.DataFrame'>

Here we also specify which column to use as the index.


In [20]:

vk = pd.read_csv('kohli_ipl.csv', index_col='match_no').squeeze("columns")
print(vk)
type(vk)

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

Out[20]:

pandas.core.series.Series
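
To see how index_col and squeeze work together, here is a minimal sketch on an in-memory CSV. The data is a made-up stand-in that only reuses the two column names of kohli_ipl.csv; csv_text and runs are names assumed for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for kohli_ipl.csv with the same two column names.
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

# match_no becomes the index, so only one data column (runs) is left to squeeze.
runs = pd.read_csv(io.StringIO(csv_text), index_col="match_no").squeeze("columns")

print(runs.index.name, runs.name)
```

Once one column is used as the index, a two-column file leaves a single data column, which is why the result can be squeezed into a Series.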

In [21]:

1 print(pd.read_csv('bollywood.csv'))
2 type(pd.read_csv('bollywood.csv'))

movie lead
0 Uri: The Surgical Strike Vicky Kaushal
1 Battalion 609 Vicky Ahuja
2 The Accidental Prime Minister (film) Anupam Kher
3 Why Cheat India Emraan Hashmi
4 Evening Shadows Mona Ambegaonkar
... ... ...
1495 Hum Tumhare Hain Sanam Shah Rukh Khan
1496 Aankhen (2002 film) Amitabh Bachchan
1497 Saathiya (film) Vivek Oberoi
1498 Company (film) Ajay Devgn
1499 Awara Paagal Deewana Akshay Kumar

[1500 rows x 2 columns]

Out[21]:

pandas.core.frame.DataFrame


In [22]:

movies = pd.read_csv('bollywood.csv', index_col='movie').squeeze("columns")
print(movies)
type(movies)

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

Out[22]:

pandas.core.series.Series

Series methods

head

In [23]:

# by default head shows the first 5 rows
vk.head()

Out[23]:

match_no
1 1
2 23
3 13
4 12
5 1
Name: runs, dtype: int64


In [24]:

1 subs.head()

Out[24]:

0 48
1 57
2 40
3 43
4 44
Name: Subscribers gained, dtype: int64

In [25]:

# here we want the first 10 rows
movies.head(10)

Out[25]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
Soni (film) Geetika Vidya Ohlyan
Fraud Saiyaan Arshad Warsi
Bombairiya Radhika Apte
Manikarnika: The Queen of Jhansi Kangana Ranaut
Thackeray (film) Nawazuddin Siddiqui
Name: lead, dtype: object

tail

In [26]:

# by default tail returns the last 5 rows
vk.tail()

Out[26]:

match_no
211 0
212 20
213 73
214 25
215 7
Name: runs, dtype: int64


In [27]:

# here we want the last 10 rows
movies.tail(10)

Out[27]:

movie
Raaz (2002 film) Dino Morea
Zameen (2003 film) Ajay Devgn
Waisa Bhi Hota Hai Part II Arshad Warsi
Devdas (2002 Hindi film) Shah Rukh Khan
Kaante Amitabh Bachchan
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, dtype: object

sample

By default, sample returns one randomly chosen row from the whole data.
Because the rows are picked at random, sample is helpful when the data is ordered or biased: looking at a
random subset gives a less biased view than looking only at the first or last rows.

In [28]:

1 subs.sample()

Out[28]:

180 93
Name: Subscribers gained, dtype: int64

In [29]:

# here we will get 5 random rows from the data
movies.sample(5)

Out[29]:

movie
31st October (film) Soha Ali Khan
Brothers (2015 film) Akshay Kumar
Fredrick (film) Avinash Dhyani
Banjo (2016 film) Riteish Deshmukh
Jhootha Kahin Ka Rishi Kapoor
Name: lead, dtype: object
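
Since sample is random, every run gives different rows. If a repeatable sample is wanted, the random_state parameter can be fixed. The sketch below uses a small made-up Series; the names first and second are assumptions of this example.

```python
import pandas as pd

s = pd.Series(range(100, 110))

first = s.sample(3, random_state=42)    # 3 random rows, seeded
second = s.sample(3, random_state=42)   # same seed, so the same rows again

print(first.equals(second))
```

Without random_state the two calls would usually return different rows.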

value_counts()


In [30]:

# suppose we want to know how many movies each actor has done,
# i.e. the frequency count of each value
movies.value_counts()
# the result is sorted in descending order by default

Out[30]:

Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 566, dtype: int64

In [31]:

# we can also get it in ascending order
movies.value_counts(ascending=True)

Out[31]:

Sharib Hashmi 1
Ravi Kishan 1
Sagar Bhangade 1
Harish Chabbra 1
Bidita Bag 1
..
Sanjay Dutt 26
Salman Khan 31
Ajay Devgn 38
Amitabh Bachchan 45
Akshay Kumar 48
Name: lead, Length: 566, dtype: int64
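
Besides absolute counts, value_counts can also report relative frequencies through its normalize parameter. A minimal sketch, using made-up lead names rather than the real bollywood data:

```python
import pandas as pd

leads = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])   # hypothetical lead actors

counts = leads.value_counts()                 # absolute frequencies, largest first
shares = leads.value_counts(normalize=True)   # relative frequencies that sum to 1

print(counts)
print(shares)
```

In both results the distinct values become the index and the frequencies become the values.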

sort_values()

sort_values sorts the Series by its values.


In [32]:

1 vk.sort_values()
2 # our whole series will be sorted in the ascending order

Out[32]:

match_no
87 0
211 0
207 0
206 0
91 0
...
164 100
120 100
123 108
126 109
128 113
Name: runs, Length: 215, dtype: int64

In [33]:

vk.sort_values(ascending=False)
# pass ascending=False to sort in descending order

Out[33]:

match_no
128 113
126 109
123 108
164 100
120 100
...
93 0
211 0
130 0
8 0
135 0
Name: runs, Length: 215, dtype: int64

In [34]:

vk.sort_values(ascending=False).head(1).values
# this gives a NumPy array containing the value 113
# sort_values does not make permanent changes to our data

Out[34]:

array([113], dtype=int64)


In [35]:

# This style is called method chaining: methods are called one after
# another, so the result of each call becomes the input of the next.
vk.sort_values(ascending=False).head(1).values[0]

Out[35]:

113

In [36]:

# If we pass inplace=True, sort_values makes the change permanently
# in the Series itself instead of returning a sorted copy.
# vk.sort_values(inplace=True)

In [37]:

# print(vk)
# after an in-place sort, the original Series itself would be reordered
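
The difference between the default behaviour and inplace=True can be checked on a tiny made-up Series. The names runs, sorted_copy, order_before and order_after are assumptions of this sketch, not part of the notebook.

```python
import pandas as pd

runs = pd.Series([13, 110, 24], index=[1, 2, 3])

sorted_copy = runs.sort_values(ascending=False)   # returns a new, sorted Series
order_before = runs.index.tolist()                # original order is untouched

runs.sort_values(ascending=False, inplace=True)   # now the original is reordered
order_after = runs.index.tolist()

print(order_before, order_after)
```

With inplace=True the method returns None, so it should not be used inside a method chain.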

sort_index()

sort_index sorts the Series by its index labels.

In [38]:

1 movies.sort_index()
2 # here also we can use the inplace parameter to make the changes permanently

Out[38]:

movie
1920 (film) Rajniesh Duggall
1920: London Sharman Joshi
1920: The Evil Returns Vicky Ahuja
1971 (2007 film) Manoj Bajpayee
2 States (2014 film) Arjun Kapoor
...
Zindagi 50-50 Veena Malik
Zindagi Na Milegi Dobara Hrithik Roshan
Zindagi Tere Naam Mithun Chakraborty
Zokkomon Darsheel Safary
Zor Lagaa Ke...Haiya! Meghan Jadhav
Name: lead, Length: 1500, dtype: object


Series Maths Methods

count

count returns the number of non-missing values; missing values (NaN) are not counted.

In [39]:

1 vk.count()

Out[39]:

215
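
The effect of missing values on count is easy to see on a small made-up Series that contains one NaN. The names n_present and n_total are chosen here for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30, 40])

n_present = s.count()   # non-missing values only
n_total = s.size        # every slot, including the missing one

print(n_present, n_total)
```

For vk both numbers are the same because the data has no missing values.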

sum

In [40]:

1 subs.sum()

Out[40]:

49510

In [41]:

subs.product()
# returns the product of all the values

Out[41]:

mean

In [42]:

1 subs.mean()

Out[42]:

135.64383561643837

In [43]:

1 vk.median()

Out[43]:

24.0


In [44]:

1 print(movies.mode())

0 Akshay Kumar
Name: lead, dtype: object

In [45]:

1 subs.std()

Out[45]:

62.67502303725269

In [46]:

1 vk.var()

Out[46]:

688.0024777222344

min/max

In [47]:

1 subs.min()

Out[47]:

33

In [48]:

1 subs.max()

Out[48]:

396

describe

describe returns summary statistics (count, mean, std, min, quartiles and max) for numerical values.


In [49]:

1 vk.describe()

Out[49]:

count 215.000000
mean 30.855814
std 26.229801
min 0.000000
25% 9.000000
50% 24.000000
75% 48.000000
max 113.000000
Name: runs, dtype: float64

In [50]:

1 subs.describe()

Out[50]:

count 365.000000
mean 135.643836
std 62.675023
min 33.000000
25% 88.000000
50% 123.000000
75% 177.000000
max 396.000000
Name: Subscribers gained, dtype: float64
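
The entries of describe are the same numbers that the individual methods return. A minimal sketch on a made-up Series, where summary is a name assumed for this example:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
summary = s.describe()   # itself a Series, indexed by statistic name

print(summary['mean'] == s.mean())
print(summary['50%'] == s.median())   # the 50% row is the median
print(summary['max'] == s.max())
```

Because the result is itself a Series, single statistics can be picked out by label as shown.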

Series Indexing
In [51]:

1 # integer indexing
2 x = pd.Series([12,13,14,35,46,57,58,79,9])
3 x

Out[51]:

0 12
1 13
2 14
3 35
4 46
5 57
6 58
7 79
8 9
dtype: int64


In [52]:

1 x[0]

Out[52]:

12

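Negative indexing with [] does not work when the index is made of integers (generated automatically or custom), because pandas treats -1 as a label and no such label exists.
If the index is made of strings, [] falls back to positions in this pandas version, so negative indexing works; newer pandas versions deprecate that fallback in favor of .iloc.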

In [53]:

x[-1]
# it will throw an error

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
384 try:
--> 385 return self._range.index(new_key)
386 except ValueError as err:

ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_14052\2813742014.py in <module>
----> 1 x[-1]
2 # it will throw an error

In [55]:

1 vk

Out[55]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64


In [56]:

vk[-1]
# this will throw an error
# because here the custom index is made of integers

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

...


In [57]:

1 movies

Out[57]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

In [58]:

1 movies[-1]
2 # here custom indexing is a string so negative indexing will work

Out[58]:

'Akshay Kumar'


In [59]:

1 print(movies[0])
2 print(movies['Uri: The Surgical Strike'])
3 # we can fetch the values in above two ways

Vicky Kaushal
Vicky Kaushal
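
A way to avoid the confusion between labels and positions is to say explicitly which one is meant: .iloc for positions and .loc for labels. The sketch below assumes a small made-up Series vk_like with integer labels starting at 1, similar in shape to vk.

```python
import pandas as pd

x = pd.Series([12, 13, 14, 35, 46, 57, 58, 79, 9])   # default index 0..8
vk_like = pd.Series([1, 23, 13], index=[1, 2, 3])    # hypothetical integer labels

last_x = x.iloc[-1]          # position-based: last element
last_vk = vk_like.iloc[-1]   # works even though -1 is not a label
by_label = vk_like.loc[1]    # label-based: the row labelled 1

print(last_x, last_vk, by_label)
```

With .iloc and .loc the result does not depend on whether the index happens to hold integers or strings.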

slicing

In [60]:

1 vk

Out[60]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

In [61]:

1 vk[5:16]

Out[61]:

match_no
6 9
7 34
8 0
9 21
10 3
11 10
12 38
13 3
14 11
15 50
16 2
Name: runs, dtype: int64
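
The slice vk[5:16] above is taken by position, which is why it starts at match_no 6. The sketch below contrasts position-based and label-based slicing on a made-up Series labelled 1 to 20; the names by_position and by_label are assumptions of this example.

```python
import pandas as pd

# Hypothetical Series labelled 1..20, mimicking the match_no index.
s = pd.Series(range(101, 121), index=range(1, 21))

by_position = s.iloc[5:16]   # positions 5..15, end excluded
by_label = s.loc[5:16]       # labels 5..16, end included

print(by_position.index.tolist())
print(by_label.index.tolist())
```

Position slices exclude the end point like Python lists, while label slices with .loc include it.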


In [62]:

1 vk[-5:]

Out[62]:

match_no
211 0
212 20
213 73
214 25
215 7
Name: runs, dtype: int64

In [63]:

1 movies[-5:]

Out[63]:

movie
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, dtype: object

In [64]:

1 movies[::2]
2 # 2 is a step or jump value

Out[64]:

movie
Uri: The Surgical Strike Vicky Kaushal
The Accidental Prime Minister (film) Anupam Kher
Evening Shadows Mona Ambegaonkar
Fraud Saiyaan Arshad Warsi
Manikarnika: The Queen of Jhansi Kangana Ranaut
...
Raaz (2002 film) Dino Morea
Waisa Bhi Hota Hai Part II Arshad Warsi
Kaante Amitabh Bachchan
Aankhen (2002 film) Amitabh Bachchan
Company (film) Ajay Devgn
Name: lead, Length: 750, dtype: object

fancy indexing


In [65]:

1 vk[[1,3,5,6]]

Out[65]:

match_no
1 1
3 13
5 1
6 9
Name: runs, dtype: int64

indexing with labels

In [66]:

1 movies

Out[66]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

In [67]:

1 movies['Hum Tumhare Hain Sanam']

Out[67]:

'Shah Rukh Khan'

Editing Series

using indexing


In [68]:

1 marks = {'maths':67,'english':57,'science':80,'hindi':100}
2 mark_series = pd.Series(marks, name='Himanshu marks')
3 mark_series

Out[68]:

maths 67
english 57
science 80
hindi 100
Name: Himanshu marks, dtype: int64

In [69]:

1 mark_series[1]=100
2 mark_series

Out[69]:

maths 67
english 100
science 80
hindi 100
Name: Himanshu marks, dtype: int64

What if the index label does not exist?

In [70]:

mark_series['social study'] = 80
# this does not throw an error; instead it adds
# a new entry to the existing Series
mark_series

Out[70]:

maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64

slicing


In [71]:

1 runs = [13,24,56,78,110]
2 runs_score = pd.Series(runs)
3 runs_score

Out[71]:

0 13
1 24
2 56
3 78
4 110
dtype: int64

In [72]:

1 runs_score

Out[72]:

0 13
1 24
2 56
3 78
4 110
dtype: int64

In [73]:

1 runs_score[2:4]=[90,95]
2 runs_score

Out[73]:

0 13
1 24
2 90
3 95
4 110
dtype: int64

fancy indexing

In [74]:

1 runs_score[[0,3,4]] = [0,0,0]
2 runs_score

Out[74]:

0 0
1 24
2 90
3 0
4 0
dtype: int64


using index label

In [75]:

1 movies

Out[75]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India Emraan Hashmi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

In [76]:

movies['Why Cheat India'] = 'himanshu gadhavi'
movies

Out[76]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object


Series with Python Functionalities


In [77]:

1 subs

Out[77]:

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64

len / type / dir / sorted / max / min

In [78]:

1 print(len(subs))

365

In [79]:

1 print(type(subs))

<class 'pandas.core.series.Series'>


In [80]:

1 print(dir(subs))

['T', '_AXIS_LEN', '_AXIS_ORDERS', '_AXIS_TO_AXIS_NUMBER', '_HANDLED_TYPES',
'__abs__', '__add__', '__and__', '__annotations__', '__array__',
'__array_priority__', '__array_ufunc__', '__array_wrap__', '__bool__',
'__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__',
'__delitem__', '__dict__', '__dir__', '__divmod__', '__doc__', '__eq__',
'__finalize__', '__float__', '__floordiv__', '__format__', '__ge__',
'__getattr__', '__getattribute__', '__getitem__', '__getstate__',
'__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__imod__',
'__imul__', '__init__', '__init_subclass__', '__int__', '__invert__',
'__ior__', '__ipow__', '__isub__', '__iter__', '__itruediv__', '__ixor__',
'__le__', '__len__', '__long__', '__lt__', '__matmul__', '__mod__',
'__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__',
'__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__',
'__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmatmul__',
'__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__',
'__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__',
'__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__weakref__', '__xor__', '_accessors', '_accum_func',
'_add_numeric_operations', '_agg_by_level', '_agg_examples_doc', ...

In [81]:

print(sorted(subs))
# sorted returns the result as a plain Python list
# in ascending order

[33, 33, 35, 37, 39, 40, 40, 40, 40, 42, 42, 43, 44, 44, 44, 45, 46, 46,
48, 49, 49, 49, 49, 50, 50, 50, 51, 54, 56, 56, 56, 56, 57, 61, 62, 64, 65,
65, 66, 66, 66, 66, 67, 68, 70, 70, 70, 71, 71, 72, 72, 72, 72, 72, 73,
74, 74, 75, 76, 76, 76, 76, 77, 77, 78, 78, 78, 79, 79, 80, 80, 80, 81, 81,
82, 82, 83, 83, 83, 84, 84, 84, 85, 86, 86, 86, 87, 87, 87, 87, 88, 88,
88, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 92, 92, 92, 93, 93, 93, 93, 95,
95, 96, 96, 96, 96, 97, 97, 98, 98, 99, 99, 100, 100, 100, 101, 101, 101,
102, 102, 103, 103, 104, 104, 104, 105, 105, 105, 105, 105, 105, 105, 105,
105, 108, 108, 108, 108, 108, 108, 109, 109, 110, 110, 110, 111, 111, 112,
113, 113, 113, 114, 114, 114, 114, 115, 115, 115, 115, 117, 117, 117, 118,
118, 119, 119, 119, 119, 120, 122, 123, 123, 123, 123, 123, 124, 125, 126,
127, 128, 128, 129, 130, 131, 131, 132, 132, 134, 134, 134, 135, 135, 136,
136, 136, 137, 138, 138, 138, 139, 140, 144, 145, 146, 146, 146, 146, 147,
149, 150, 150, 150, 150, 151, 152, 152, 152, 153, 153, 153, 154, 154, 154,
155, 155, 156, 156, 156, 156, 157, 157, 157, 157, 158, 158, 159, 159, 160,
160, 160, 160, 162, 164, 166, 167, 167, 168, 170, 170, 170, 170, 171, 172,
172, 173, 173, 173, 174, 174, 175, 175, 176, 176, 177, 178, 179, 179, 180,
180, 180, 182, 183, 183, 183, 184, 184, 184, 185, 185, 185, 185, 186, 186,
186, 188, 189, 190, 190, 192, 192, 192, 196, 196, 196, 197, 197, 202, 202,
202, 203, 204, 206, 207, 209, 210, 210, 211, 212, 213, 214, 216, 219, 220,
221, 221, 222, 222, 224, 225, 225, 226, 227, 228, 229, 230, 231, 233, 236,
236, 237, 241, 243, 244, 245, 247, 249, 254, 254, 258, 259, 259, 261, 261,
265, 267, 268, 269, 276, 276, 290, 295, 301, 306, 312, 396]


In [82]:

print(sorted(subs, reverse=True))
# with reverse=True the list is sorted in descending order

[396, 312, 306, 301, 295, 290, 276, 276, 269, 268, 267, 265, 261, 261,
259, 259, 258, 254, 254, 249, 247, 245, 244, 243, 241, 237, 236, 236, 233,
231, 230, 229, 228, 227, 226, 225, 225, 224, 222, 222, 221, 221, 220, 219,
216, 214, 213, 212, 211, 210, 210, 209, 207, 206, 204, 203, 202, 202, 202,
197, 197, 196, 196, 196, 192, 192, 192, 190, 190, 189, 188, 186, 186, 186,
185, 185, 185, 185, 184, 184, 184, 183, 183, 183, 182, 180, 180, 180, 179,
179, 178, 177, 176, 176, 175, 175, 174, 174, 173, 173, 173, 172, 172, 171,
170, 170, 170, 170, 168, 167, 167, 166, 164, 162, 160, 160, 160, 160, 159,
159, 158, 158, 157, 157, 157, 157, 156, 156, 156, 156, 155, 155, 154, 154,
154, 153, 153, 153, 152, 152, 152, 151, 150, 150, 150, 150, 149, 147, 146,
146, 146, 146, 145, 144, 140, 139, 138, 138, 138, 137, 136, 136, 136, 135,
135, 134, 134, 134, 132, 132, 131, 131, 130, 129, 128, 128, 127, 126, 125,
124, 123, 123, 123, 123, 123, 122, 120, 119, 119, 119, 119, 118, 118, 117,
117, 117, 115, 115, 115, 115, 114, 114, 114, 114, 113, 113, 113, 112, 111,
111, 110, 110, 110, 109, 109, 108, 108, 108, 108, 108, 108, 105, 105, 105,
105, 105, 105, 105, 105, 105, 104, 104, 104, 103, 103, 102, 102, 101, 101,
101, 100, 100, 100, 99, 99, 98, 98, 97, 97, 96, 96, 96, 96, 95, 95, 93,
93, 93, 93, 92, 92, 92, 91, 90, 90, 90, 90, 89, 89, 89, 88, 88, 88, 88, 88,
87, 87, 87, 87, 86, 86, 86, 85, 84, 84, 84, 83, 83, 83, 82, 82, 81, 81,
80, 80, 80, 79, 79, 78, 78, 78, 77, 77, 76, 76, 76, 76, 75, 74, 74, 73, 72,
72, 72, 72, 72, 71, 71, 70, 70, 70, 68, 67, 66, 66, 66, 66, 65, 65, 64,
62, 61, 57, 56, 56, 56, 56, 54, 51, 50, 50, 50, 49, 49, 49, 49, 48, 46, 46,
45, 44, 44, 44, 43, 42, 42, 40, 40, 40, 40, 39, 37, 35, 33, 33]

In [83]:

1 print(min(subs))

33

In [84]:

1 print(max(subs))

396

type conversion

In [85]:

1 mark_series

Out[85]:

maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64


In [86]:

list(mark_series)
# converting the Series to a list

Out[86]:

[67, 100, 80, 100, 80]

In [87]:

dict(mark_series)
# converting the Series to a dictionary

Out[87]:

{'maths': 67, 'english': 100, 'science': 80, 'hindi': 100, 'social study': 80}
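
Besides the built-in list and dict, a Series also has its own conversion methods, tolist and to_dict. A minimal sketch on a made-up marks Series, where as_list and as_dict are names assumed for this example:

```python
import pandas as pd

marks = pd.Series({'maths': 67, 'english': 100, 'science': 80})

as_list = marks.tolist()    # values only, as plain Python ints
as_dict = marks.to_dict()   # index labels become the keys

print(as_list)
print(as_dict)
```

The list keeps only the values, while the dictionary keeps the pairing between each index label and its value.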

membership operator

The in operator checks only the index labels of the Series, not its values.

In [88]:

1 movies

Out[88]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

In [89]:

print('Saathiya (film)' in movies)
# this returns True because the in operator checks the index labels

True


In [90]:

print('Amitabh Bachchan' in movies)
# returns False because the in operator checks
# only the index, not the values

False

In [91]:

# to search the values instead,
# add .values after the Series name
'Amitabh Bachchan' in movies.values

Out[91]:

True
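
The three checks above can be put side by side. This is a minimal sketch on made-up data (leads is a hypothetical two-row stand-in for movies); it also shows isin(...).any() as one possible vectorized alternative to .values.

```python
import pandas as pd

# Hypothetical movie -> lead actor data
leads = pd.Series(
    ["Vivek Oberoi", "Amitabh Bachchan"],
    index=["Saathiya (film)", "Aankhen (2002 film)"],
    name="lead",
)

label_found = "Saathiya (film)" in leads            # checks index labels
actor_in_labels = "Amitabh Bachchan" in leads       # still checks labels only
actor_in_values = "Amitabh Bachchan" in leads.values  # checks the values
actor_isin = bool(leads.isin(["Amitabh Bachchan"]).any())

print(label_found, actor_in_labels, actor_in_values, actor_isin)
```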

looping

Looping over a Series iterates over its values, not its index.

In [92]:

for i in movies:
    print(i)

Vicky Kaushal
Vicky Ahuja
Anupam Kher
himanshu gadhavi
Mona Ambegaonkar
Geetika Vidya Ohlyan
Arshad Warsi
Radhika Apte
Kangana Ranaut
Nawazuddin Siddiqui
Ali Asgar
Ranveer Singh
Prit Kamani
Ajay Devgn
Sushant Singh Rajput
Amitabh Bachchan
Abhimanyu Dasani
Talha Arshad Reshi
Nawazuddin Siddiqui
...


In [93]:

# to loop over the index instead
for i in movies.index:
    print(i)

Uri: The Surgical Strike
Battalion 609
The Accidental Prime Minister (film)
Why Cheat India
Evening Shadows
Soni (film)
Fraud Saiyaan
Bombairiya
Manikarnika: The Queen of Jhansi
Thackeray (film)
Amavas
Gully Boy
Hum Chaar
Total Dhamaal
Sonchiriya
Badla (2019 film)
Mard Ko Dard Nahi Hota
Hamid (film)
Photograph (film)
...
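
When both the label and the value are needed in the same loop, Series.items() yields them as pairs. A small sketch on made-up data (leads is a hypothetical stand-in for movies):

```python
import pandas as pd

leads = pd.Series(
    ["Vicky Kaushal", "Vicky Ahuja"],
    index=["Uri: The Surgical Strike", "Battalion 609"],
)

# items() yields (index label, value) pairs
pairs = list(leads.items())
for movie, actor in pairs:
    print(f"{movie}: {actor}")
```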

Arithmetic Operators

In [94]:

mark_series

Out[94]:

maths 67
english 100
science 80
hindi 100
social study 80
Name: Himanshu marks, dtype: int64


In [95]:

print(100 - mark_series)
# a good example of broadcasting: the single scalar 100
# is applied to every element of the Series

maths 33
english 0
science 20
hindi 0
social study 20
Name: Himanshu marks, dtype: int64

In [96]:

100 + mark_series

Out[96]:

maths 167
english 200
science 180
hindi 200
social study 180
Name: Himanshu marks, dtype: int64

In [97]:

2 * mark_series

Out[97]:

maths 134
english 200
science 160
hindi 200
social study 160
Name: Himanshu marks, dtype: int64

In [98]:

2 / mark_series

Out[98]:

maths 0.029851
english 0.020000
science 0.025000
hindi 0.020000
social study 0.025000
Name: Himanshu marks, dtype: float64


In [99]:

mark_series ** 2

Out[99]:

maths 4489
english 10000
science 6400
hindi 10000
social study 6400
Name: Himanshu marks, dtype: int64
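
The cells above combine a Series with a scalar. When both operands are Series, pandas aligns them by index label rather than by position. A minimal sketch with two made-up Series (term1 and term2 are hypothetical names):

```python
import pandas as pd

term1 = pd.Series({"maths": 67, "english": 100, "science": 80})
term2 = pd.Series({"science": 90, "maths": 70, "hindi": 85})

# Labels present in only one Series produce NaN
total = term1 + term2

# add() with fill_value treats a missing label as 0 instead
total_filled = term1.add(term2, fill_value=0)

print(total)
print(total_filled)
```

Labels found in both Series are summed; english and hindi appear in only one of them, so they are NaN in total and keep their single value in total_filled.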

Relational Operators

In [100]:

vk

Out[100]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

In [101]:

vk >= 50
# we get a boolean Series

Out[101]:

match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool


Boolean Indexing on Series

Find the number of 50s and 100s (scores of 50 or more) made by Kohli

In [102]:

vk >= 50

Out[102]:

match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool

In [103]:

vk[vk >= 50]
# indexing with the boolean Series keeps only those
# matches in which Virat Kohli made 50 or more runs

In [104]:

vk[vk >= 50].count()
# total number of such matches

Out[104]:

50

In [105]:

vk[vk >= 50].size

Out[105]:

50
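
Because True counts as 1, a boolean Series can also be summed directly instead of filtering first and then taking .count() or .size. A sketch on made-up scores (runs is a hypothetical stand-in for vk):

```python
import pandas as pd

runs = pd.Series([1, 23, 73, 0, 100, 50, 0], name="runs")

fifty_plus = int((runs >= 50).sum())     # scores of 50 or more
hundreds = int((runs >= 100).sum())      # scores of 100 or more
ducks = int((runs == 0).sum())           # scores of exactly 0

# Conditions are combined with & and |, each wrapped in parentheses
fifties_only = int(((runs >= 50) & (runs < 100)).sum())

print(fifty_plus, hundreds, ducks, fifties_only)
```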

Find the number of ducks (scores of 0)


In [106]:

vk[vk == 0]

Out[106]:

match_no
8 0
87 0
91 0
93 0
130 0
135 0
206 0
207 0
211 0
Name: runs, dtype: int64

In [107]:

vk[vk == 0].size

Out[107]:

9

Count the number of days on which more than 200 subscribers were gained

In [108]:

subs

Out[108]:

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64


In [109]:

subs[subs > 200]

Out[109]:

165 225
166 249
167 265
168 306
169 261
170 222
225 224
226 254
227 214
228 236
229 261
230 247
231 207
232 254
233 301
234 233
240 202
246 259

In [110]:

subs[subs > 200].size

Out[110]:

59

find actors who have done more than 20 movies

In [111]:

movies

Out[111]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object


In [112]:

movies.value_counts()

Out[112]:

Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 567, dtype: int64

In [113]:

num_movies = movies.value_counts()
# store the counts in a variable

In [114]:

num_movies > 20

Out[114]:

Akshay Kumar True
Amitabh Bachchan True
Ajay Devgn True
Salman Khan True
Sanjay Dutt True
...
Diganth False
Parveen Kaur False
Seema Azmi False
Akanksha Puri False
Edwin Fernandes False
Name: lead, Length: 567, dtype: bool

In [115]:

num_movies[num_movies > 20]

Out[115]:

Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
Shah Rukh Khan 22
Name: lead, dtype: int64
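
The count-then-filter steps above can also be written as one chain by passing a function to .loc, which avoids the intermediate variable. A sketch on made-up data (leads and the threshold of 2 are illustrative):

```python
import pandas as pd

leads = pd.Series(["Actor A", "Actor B", "Actor A", "Actor C", "Actor A", "Actor B"])

# Two-step version, as in the cells above
counts = leads.value_counts()
frequent = counts[counts >= 2]

# Chained version: .loc accepts a function of the Series being indexed
frequent_chained = leads.value_counts().loc[lambda c: c >= 2]

print(frequent_chained)
```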


Plotting Graphs on Series


In [116]:

subs.plot()

Out[116]:

<AxesSubplot:>

In [117]:

movies.value_counts()
# number of movies by each actor

Out[117]:

Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
..
Diganth 1
Parveen Kaur 1
Seema Azmi 1
Akanksha Puri 1
Edwin Fernandes 1
Name: lead, Length: 567, dtype: int64


In [118]:

movies.value_counts().head(20)
# the top 20 actors by number of movies

Out[118]:

Akshay Kumar 48
Amitabh Bachchan 45
Ajay Devgn 38
Salman Khan 31
Sanjay Dutt 26
Shah Rukh Khan 22
Emraan Hashmi 20
Saif Ali Khan 18
John Abraham 18
Shahid Kapoor 17
Sunny Deol 17
Jimmy Sheirgill 16
Tusshar Kapoor 16
Arjun Rampal 14
Manoj Bajpayee 14
Irrfan Khan 14
Anupam Kher 13
Hrithik Roshan 12
Kangana Ranaut 12
Ayushmann Khurrana 12
Name: lead, dtype: int64

In [119]:

movies.value_counts().head(20).plot(kind='bar')

Out[119]:

<AxesSubplot:>


In [120]:

movies.value_counts().head(20).plot(kind='barh')

Out[120]:

<AxesSubplot:>

In [121]:

movies.value_counts().head(20).plot(kind='pie')

Out[121]:

<AxesSubplot:ylabel='lead'>

Some Important Series Methods

astype

astype converts a Series to another dtype. Choosing a smaller dtype reduces the memory footprint of the data.


In [122]:

vk

Out[122]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

In [123]:

import sys
sys.getsizeof(vk)
# size in bytes occupied by the Series

Out[123]:

11752

In [124]:

vk.astype('int16')

Out[124]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int16

In [125]:

sys.getsizeof(vk.astype('int16'))
# the size is reduced

Out[125]:

10462
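
The saving reported by sys.getsizeof looks small because that figure appears to include the index and object overhead, which astype does not change. As an alternative, Series.memory_usage(index=False) reports only the bytes used by the values. A sketch on made-up data of the same length (runs is a hypothetical stand-in for vk):

```python
import pandas as pd

# 215 small integers, stored as int64 by default
runs = pd.Series(range(215), dtype="int64")

bytes_int64 = runs.memory_usage(index=False)                   # 8 bytes per value
bytes_int16 = runs.astype("int16").memory_usage(index=False)   # 2 bytes per value

print(bytes_int64, bytes_int16)
```

Before downcasting, it is worth checking with .min() and .max() that every value fits in the target dtype (int16 covers -32768 to 32767).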

between

between returns a boolean Series that is True where the value lies in the given range; both endpoints are included by default.

In [126]:

vk.between(51,99)
# returns a boolean Series

Out[126]:

match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 True
214 False
215 False
Name: runs, Length: 215, dtype: bool

In [127]:

vk[vk.between(51,99)]

Out[127]:

match_no
34 58
41 71
44 56
45 67
52 70
57 57
68 73
71 51
73 58
74 65
80 57
81 93
82 99
85 56
97 67
99 73
103 51

In [128]:

vk[vk.between(51,99)].count()

Out[128]:

43
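
With its default settings, between(51, 99) behaves like the two comparisons it replaces, including both endpoints. A sketch on made-up boundary values (runs is hypothetical):

```python
import pandas as pd

runs = pd.Series([50, 51, 75, 99, 100])

mask = runs.between(51, 99)
equivalent = (runs >= 51) & (runs <= 99)   # same result written by hand

print(mask.tolist())
```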

clip


lower : All values below this threshold will be set to it.

upper : All values above this threshold will be set to it.

In [129]:

subs

Out[129]:

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64

In [130]:

print(subs.clip(100,200))
# values below 100 become 100,
# values above 200 become 200,
# values between 100 and 200 are left unchanged

0 100
1 100
2 100
3 100
4 100
...
360 200
361 200
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64
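
Either bound can also be given on its own through the lower and upper keyword arguments. A sketch on three made-up values (this small subs is illustrative, not the notebook's data):

```python
import pandas as pd

subs = pd.Series([48, 155, 231])

both = subs.clip(100, 200)        # limit on both sides
floor_only = subs.clip(lower=100) # only raise small values
cap_only = subs.clip(upper=200)   # only lower large values

print(both.tolist(), floor_only.tolist(), cap_only.tolist())
```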

drop_duplicates


In [131]:

temp = pd.Series([1,1,2,2,3,3,4,4])
temp

Out[131]:

0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
dtype: int64

In [132]:

print(temp.drop_duplicates())
# the first occurrence is kept and later duplicates are dropped

0 1
2 2
4 3
6 4
dtype: int64

In [133]:

print(temp.drop_duplicates(keep='last'))
# here the last occurrence is kept and earlier duplicates are dropped

1 1
3 2
5 3
7 4
dtype: int64


In [134]:

movies.drop_duplicates()

Out[134]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Sssshhh... Tanishaa Mukerji
Rules: Pyaar Ka Superhit Formula Tanuja
Right Here Right Now (film) Ankit
Talaash: The Hunt Begins... Rakhee Gulzar
The Pink Mirror Edwin Fernandes
Name: lead, Length: 567, dtype: object

To find out how many duplicate values exist in the data, use the .duplicated() method.

In [135]:

temp.duplicated()

Out[135]:

0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 True
dtype: bool

In [136]:

temp.duplicated().sum()
# total number of duplicate values

Out[136]:

4

In [137]:

movies.duplicated().sum()

Out[137]:

933
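
Besides 'first' and 'last', the keep parameter also accepts False, which drops every value that occurs more than once. The sketch below uses a small made-up Series (values is a hypothetical name) and also checks how the duplicate count relates to nunique().

```python
import pandas as pd

values = pd.Series([1, 1, 2, 3, 3, 4])

first_kept = values.drop_duplicates()             # one row per distinct value
unique_only = values.drop_duplicates(keep=False)  # values that occur exactly once

n_dupes = int(values.duplicated().sum())          # rows flagged as repeats
n_distinct = values.nunique()                     # number of distinct values

print(first_kept.tolist(), unique_only.tolist(), n_dupes, n_distinct)
```

For data without missing values, the number of rows minus the duplicate count equals the number of distinct values, which matches the movies output above (1500 - 933 = 567).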


isnull

isnull flags the null (missing) values in the data; summing its result gives their count.

In [138]:

temp = pd.Series([1,2,3,np.nan,5,6,np.nan,8,np.nan,10])
temp

Out[138]:

0 1.0
1 2.0
2 3.0
3 NaN
4 5.0
5 6.0
6 NaN
7 8.0
8 NaN
9 10.0
dtype: float64

In [139]:

temp.size
# total number of values, including missing ones

Out[139]:

10

In [140]:

temp.count()
# counts only the non-null values

Out[140]:

7

In [141]:

temp.isnull()

Out[141]:

0 False
1 False
2 False
3 True
4 False
5 False
6 True
7 False
8 True
9 False
dtype: bool


In [142]:

temp.isnull().sum()
# there are 3 missing values

Out[142]:

3

dropna

dropna removes all the missing values from the data.

In [143]:

temp.dropna()

Out[143]:

0 1.0
1 2.0
2 3.0
4 5.0
5 6.0
7 8.0
9 10.0
dtype: float64

fillna

In [144]:

temp.fillna(0)
# fill the missing values with zero

Out[144]:

0 1.0
1 2.0
2 3.0
3 0.0
4 5.0
5 6.0
6 0.0
7 8.0
8 0.0
9 10.0
dtype: float64


In [145]:

temp.fillna(temp.mean())
# fill the missing values with the mean of the data

Out[145]:

0 1.0
1 2.0
2 3.0
3 5.0
4 5.0
5 6.0
6 5.0
7 8.0
8 5.0
9 10.0
dtype: float64
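
Zero and the mean are only two possible fill values. The sketch below shows two other common choices, the median and a forward fill, on a small made-up Series (readings is a hypothetical name); which one is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

readings = pd.Series([1, 2, np.nan, 4, np.nan])

n_missing = int(readings.isna().sum())             # isna is an alias of isnull
filled_median = readings.fillna(readings.median()) # median ignores NaN
filled_forward = readings.ffill()                  # repeat the last seen value

print(n_missing)
print(filled_median.tolist())
print(filled_forward.tolist())
```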

isin

In [146]:

vk

Out[146]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

Suppose we want the matches in which Virat Kohli got out on 49 or 99.


In [147]:

(vk == 49) | (vk == 99)

Out[147]:

match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 False
214 False
215 False
Name: runs, Length: 215, dtype: bool

In [148]:

vk[(vk == 49) | (vk == 99)]

Out[148]:

match_no
82 99
86 49
Name: runs, dtype: int64

Now suppose we want the matches in which Virat Kohli got out on 49, 99 or 79.
isin handles this case in one line, rather than writing a separate condition for each value.

In [149]:

vk.isin([49,99,79])

Out[149]:

match_no
1 False
2 False
3 False
4 False
5 False
...
211 False
212 False
213 False
214 False
215 False
Name: runs, Length: 215, dtype: bool


In [150]:

vk[vk.isin([49,99,79])]

Out[150]:

match_no
82 99
86 49
117 79
Name: runs, dtype: int64
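
The boolean Series returned by isin can be inverted with ~ to select everything except the listed values. A sketch on made-up scores (runs and targets are hypothetical names):

```python
import pandas as pd

runs = pd.Series([49, 12, 99, 79, 0])
targets = [49, 99, 79]

hits = runs[runs.isin(targets)]      # scores in the list
misses = runs[~runs.isin(targets)]   # all other scores

print(hits.tolist(), misses.tolist())
```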

apply

apply lets us run custom logic on each value of a Series.

In [151]:

movies

Out[151]:

movie
Uri: The Surgical Strike Vicky Kaushal
Battalion 609 Vicky Ahuja
The Accidental Prime Minister (film) Anupam Kher
Why Cheat India himanshu gadhavi
Evening Shadows Mona Ambegaonkar
...
Hum Tumhare Hain Sanam Shah Rukh Khan
Aankhen (2002 film) Amitabh Bachchan
Saathiya (film) Vivek Oberoi
Company (film) Ajay Devgn
Awara Paagal Deewana Akshay Kumar
Name: lead, Length: 1500, dtype: object

Suppose that in the movies data we want only the first name of each actor, and that name should be in capital
letters.


In [152]:

movies.apply(lambda x: x.split())
# each actor name is split into a list of words

Out[152]:

movie
Uri: The Surgical Strike [Vicky, Kaushal]
Battalion 609 [Vicky, Ahuja]
The Accidental Prime Minister (film) [Anupam, Kher]
Why Cheat India [himanshu, gadhavi]
Evening Shadows [Mona, Ambegaonkar]
...
Hum Tumhare Hain Sanam [Shah, Rukh, Khan]
Aankhen (2002 film) [Amitabh, Bachchan]
Saathiya (film) [Vivek, Oberoi]
Company (film) [Ajay, Devgn]
Awara Paagal Deewana [Akshay, Kumar]
Name: lead, Length: 1500, dtype: object

In [153]:

movies.apply(lambda x: x.split()[0])
# take the first item of each list, i.e. the first name

Out[153]:

movie
Uri: The Surgical Strike Vicky
Battalion 609 Vicky
The Accidental Prime Minister (film) Anupam
Why Cheat India himanshu
Evening Shadows Mona
...
Hum Tumhare Hain Sanam Shah
Aankhen (2002 film) Amitabh
Saathiya (film) Vivek
Company (film) Ajay
Awara Paagal Deewana Akshay
Name: lead, Length: 1500, dtype: object


In [154]:

movies.apply(lambda x: x.split()[0].upper())

Out[154]:

movie
Uri: The Surgical Strike VICKY
Battalion 609 VICKY
The Accidental Prime Minister (film) ANUPAM
Why Cheat India HIMANSHU
Evening Shadows MONA
...
Hum Tumhare Hain Sanam SHAH
Aankhen (2002 film) AMITABH
Saathiya (film) VIVEK
Company (film) AJAY
Awara Paagal Deewana AKSHAY
Name: lead, Length: 1500, dtype: object

In [155]:

subs

Out[155]:

0 48
1 57
2 40
3 43
4 44
...
360 231
361 226
362 155
363 144
364 172
Name: Subscribers gained, Length: 365, dtype: int64

Suppose that in the subscribers data above, a day counts as a good day if it gained more subscribers
than the average, and as a bad day otherwise.

In [156]:

subs.mean()

Out[156]:

135.64383561643837


In [157]:

subs.apply(lambda x: 'good day' if x > subs.mean() else 'bad day')

Out[157]:

0 bad day
1 bad day
2 bad day
3 bad day
4 bad day
...
360 good day
361 good day
362 good day
363 good day
364 good day
Name: Subscribers gained, Length: 365, dtype: object
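
apply calls a Python function once per value, and the lambda above recomputes subs.mean() on every call. The sketch below shows possible alternatives on small made-up data (leads and this short subs are hypothetical): the .str accessor for the first-name case, and a mean computed once, with either apply or np.where, for the labelling case.

```python
import numpy as np
import pandas as pd

# First name in capitals via the .str accessor
leads = pd.Series(["Vicky Kaushal", "Shah Rukh Khan"])
first_upper = leads.str.split().str[0].str.upper()

# Good day / bad day with the mean computed only once
subs = pd.Series([48, 57, 231, 226])
avg = subs.mean()
labels = subs.apply(lambda x: "good day" if x > avg else "bad day")

# Same labels without a Python-level function call per value
labels_np = pd.Series(np.where(subs > avg, "good day", "bad day"), index=subs.index)

print(first_upper.tolist())
print(labels.tolist())
```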

copy

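In the pandas version used for this notebook, .head() and .tail() return a view of the data rather than a copy, so a change made
through a saved head or tail also changes the original data. This behaviour depends on the pandas version: with Copy-on-Write enabled (the default from pandas 3.0), the original is left untouched. Calling .copy() makes the intent explicit in either case.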

In [158]:

vk

Out[158]:

match_no
1 1
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

In [159]:

vk.head(5)

Out[159]:

match_no
1 1
2 23
3 13
4 12
5 1
Name: runs, dtype: int64


In [160]:

new = vk.head(5)

In [161]:

new[1] = 10

In [162]:

vk
# the original data has changed as well

Out[162]:

match_no
1 10
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64

In [163]:

new1 = vk.head(5).copy()
new1

Out[163]:

match_no
1 10
2 23
3 13
4 12
5 1
Name: runs, dtype: int64


In [164]:

new1[1] = 100
new1
# the change is made only in new1,
# which is a copy of the original data

Out[164]:

match_no
1 100
2 23
3 13
4 12
5 1
Name: runs, dtype: int64

In [165]:

vk
# the original data remains unchanged

Out[165]:

match_no
1 10
2 23
3 13
4 12
5 1
..
211 0
212 20
213 73
214 25
215 7
Name: runs, Length: 215, dtype: int64
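
The .copy() pattern can be checked on a small made-up Series (runs and top are hypothetical names). The sketch uses .loc so that the assignment is clearly label-based.

```python
import pandas as pd

runs = pd.Series([1, 23, 13, 12, 1], index=[1, 2, 3, 4, 5], name="runs")

top = runs.head(3).copy()   # independent copy of the first three rows
top.loc[1] = 100            # modify the copy only

print(top.loc[1], runs.loc[1])
```

Because top owns its own data, the assignment leaves runs unchanged.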
