Numpy Pandas

Uploaded by Kingshuk Kundu

Numpy

MULTIDIMENSIONAL ARRAY
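Array

• An array is a collection of items stored at contiguous memory locations.
• The idea is to store multiple items of the same type together.
• This makes it easy to calculate the position of each element by adding an offset to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array).

The example below uses a Python list for illustration. Note that a list stores references to its items, so it is not a contiguous block of same-type values in the sense described above.

cars = ["Ford", "Volvo", "BMW"]
x = cars[0]     # "Ford" (indexing starts at 0)
cars[1]         # "Volvo"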
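A minimal NumPy sketch of the offset idea above, assuming a 4-byte int32 array (the values are illustrative): the byte offset of element i is i times the item size, so reading those bytes back from the array's raw buffer recovers the element.

```python
import numpy as np

# An int32 array: every element occupies the same 4-byte block of memory
a = np.array([10, 20, 30, 40], dtype=np.int32)

print(a.itemsize)  # 4 (bytes per element)
print(a.strides)   # (4,) (bytes to step from one element to the next)

# Byte offset of element i from the start of the array's buffer
i = 2
offset = i * a.itemsize
print(offset)      # 8

# Read the element directly from the raw bytes at that offset
raw = a.tobytes()
value = np.frombuffer(raw[offset:offset + a.itemsize], dtype=np.int32)[0]
print(value)       # 30
```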
Numpy

• NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays.
• Using NumPy, mathematical and logical operations on arrays can be performed.
Operations using NumPy

Using NumPy, a developer can perform the following operations:
• Mathematical and logical operations on arrays.
• Fourier transforms and routines for shape manipulation.
• Operations related to linear algebra.

NumPy has built-in functions for linear algebra and random number generation.
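A short sketch touching each operation listed above, using small made-up inputs. The seed passed to np.random.default_rng is an arbitrary choice for repeatability.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

# Mathematical and logical operations are applied element by element
doubled = a * 2          # [2. 4. 6. 8.]
mask = a > 2             # [False False  True  True]

# Fourier transform: the first FFT coefficient equals the sum of the inputs
spectrum = np.fft.fft(a)

# Shape manipulation
grid = a.reshape(2, 2)

# Linear algebra: solve 2x + y = 3 and x + 3y = 5
m = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(m, rhs)   # approximately [0.8 1.4]

# Random number generation (seeded so that runs are repeatable)
rng = np.random.default_rng(seed=0)
samples = rng.random(3)

print(doubled, mask, spectrum[0], grid, solution, samples, sep="\n")
```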
NumPy - Ndarray Object

• The most important object defined in NumPy is an N-dimensional array type called ndarray.
• It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index.
• Every item in an ndarray takes a block of the same size in memory. The type of the elements is described by a data-type object (called dtype).
• The basic ndarray is created using the numpy.array function as follows:

Syntax

numpy.array(object, dtype = None)

This is a simplified form: numpy.array also accepts further optional parameters such as copy, order and ndmin.
Sr.No.  Parameter  Description
1       object     Any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
2       dtype      Desired data type of the array (optional).
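A small sketch of the two parameters in the table, with illustrative values: the first call lets NumPy infer the dtype from a list, the second passes dtype explicitly.

```python
import numpy as np

# From a (nested) list; the dtype is inferred from the items
a = np.array([1, 2, 3])

# Explicit dtype: the integers are stored as 64-bit floats
b = np.array([[1, 2], [3, 4]], dtype=float)

print(a.dtype)           # an integer type such as int64 (platform dependent)
print(b.dtype, b.shape)  # float64 (2, 2)
print(b[0, 1])           # 2.0, zero-based indexing
print(b.itemsize)        # 8 bytes per element, the same for every item
```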
NumPy - Data Types

Sr.No.  Data Type  Description
1       bool_      Boolean (True or False) stored as a byte
2       int_       Default integer type (same as C long; normally either int64 or int32)
...
15      float32    Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa
16      float64    Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa
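The bit layouts in the table can be checked with np.finfo, which reports the exponent and mantissa widths of each floating-point type. This is a small sketch; comparing two NumPy scalars avoids depending on how plain Python floats are promoted.

```python
import numpy as np

# np.finfo reports the layout of the floating-point types from the table
f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
print(f32.nexp, f32.nmant)   # 8 23
print(f64.nexp, f64.nmant)   # 11 52

# bool_ is stored as one byte
print(np.dtype(np.bool_).itemsize)  # 1

# With fewer mantissa bits, 0.1 is rounded differently in float32
print(np.float32(0.1) == np.float64(0.1))   # False
```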
Data Type Objects (dtype)

A data type object describes how the fixed block of memory corresponding to an array should be interpreted, depending on the following aspects:
• Type of data (integer, float or Python object)
• Size of data
• Byte order (little-endian or big-endian)
• In case of a structured type, the names of the fields, the data type of each field and the part of the memory block taken by each field.
• If the data type is a subarray, its shape and data type.

# using array-scalar type
import numpy as np
dt = np.dtype(np.int32)
print(dt)

The output is as follows:

int32
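A sketch of a structured dtype that exercises every aspect in the list above: field names, a type per field, an explicit byte order and a subarray field. The field names and records are made up for illustration.

```python
import numpy as np

# A structured dtype: field names, a data type per field, and an
# explicit little-endian ('<') byte order for the numeric fields.
# 'scores' is a subarray field holding 3 float64 values.
student = np.dtype([
    ("name", "U10"),           # up to 10 Unicode characters (4 bytes each)
    ("age", "<i4"),            # little-endian 32-bit integer
    ("scores", "<f8", (3,)),   # subarray of shape (3,)
])

people = np.array(
    [("Tom", 28, (7.5, 8.0, 9.0)), ("Jack", 34, (6.0, 7.0, 8.5))],
    dtype=student,
)

print(student.names)           # ('name', 'age', 'scores')
print(student.itemsize)        # 40 + 4 + 24 = 68 bytes per record
print(people["age"])           # [28 34]
print(people["scores"].shape)  # (2, 3)
print(np.dtype(">i4").str)     # >i4 (big-endian 32-bit integer)
```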
ndarray.shape

• This array attribute returns a tuple consisting of the array dimensions. Assigning a new tuple to it reshapes the array in place (the total number of elements must stay the same).

import numpy as np
b = np.array([])                      # empty array, shape (0,)
c = np.array([[1, 2], [3, 4]])        # 2 x 2 array, shape (2, 2)
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)

The output is as follows:

(2, 3)
NumPy also provides a reshape function, which returns the array's data with a new shape.

import numpy as np
a = np.array([[1,2,3],[4,5,6]])
b = a.reshape(3,2)
print(b)

The output is as follows:

[[1 2]
 [3 4]
 [5 6]]
ndarray.ndim

• This array attribute returns the number of array dimensions.

# an array of evenly spaced numbers
import numpy as np
a = np.arange(24)
print(a)

The output is as follows:

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
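The example above builds the array but does not print ndim itself. A short sketch that does, reshaping the same 24 numbers into three dimensions:

```python
import numpy as np

a = np.arange(24)
print(a.ndim)      # 1

# The same 24 numbers arranged as 2 blocks of 4 rows x 3 columns
b = a.reshape(2, 4, 3)
print(b.ndim)      # 3
print(b.shape)     # (2, 4, 3)
print(len(b.shape) == b.ndim)  # True
```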
NumPy - Array Creation Routines

numpy.empty
• It creates an uninitialized array of the specified shape and dtype. It uses the following constructor:

numpy.empty(shape, dtype = float, order = 'C')

import numpy as np
x = np.empty([3,2], dtype = int)
print(x)

The output is as follows (the elements are uninitialized, so the values are arbitrary and will differ from run to run):

[[  22649312 1701344351]
 [1818321759 1885959276]
 [  16779776  156368896]]

numpy.zeros
• Returns a new array of the specified size, filled with zeros.

numpy.zeros(shape, dtype = float, order = 'C')

The constructor takes the following parameters:
• shape: shape of the new array, an int or a tuple of ints
• dtype: desired data type of the output (float by default)
• order: 'C' for row-major (C-style) or 'F' for column-major (Fortran-style) storage in memory

# array of five zeros. Default dtype is float
import numpy as np
x = np.zeros(5)
print(x)

The output is as follows:

[0. 0. 0. 0. 0.]
numpy.random.rand: samples from a uniform distribution over [0, 1)
• All the values are generated randomly in the half-open interval [0, 1): 0 can occur, 1 cannot.

numpy.random.randn(): generates samples from the standard normal distribution (mean 0, variance 1), so any real number can be generated, although values far from 0 are rare.

import numpy as np
# 1D Array
array = np.random.randn(5)
print("1D Array filled with random values : \n", array)

Output:

1D Array filled with random values :
 [-0.51733692  0.48813676 -0.88147002  1.12901958  0.68026197]
Randomly constructing a 2D array with the numpy.random.randn() method

import numpy as np
# 2D Array
array = np.random.randn(3, 4)
print("2D Array filled with random values : \n", array)

Output:

2D Array filled with random values :
 [[ 1.33262386 -0.88922967 -0.07056098  0.27340112]
 [ 1.00664965 -0.68443807  0.43801295 -0.35874714]
 [-0.19289416 -0.42746963 -1.80435223  0.02751727]]
PANDAS
Introduction to Pandas

• Library for computation with tabular data
• Mixed types of data allowed in a single table
• Columns and rows of data can be named
• Advanced data aggregation and statistical functions

Basic data structures

TYPE                    PANDAS NAME
Vector (1 Dimension)    Series
Array (2 Dimensions)    DataFrame
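A small sketch of the two structures in the table, with made-up names and ages: a Series is one-dimensional with named row labels, a DataFrame is two-dimensional with named columns that may hold different types, and each DataFrame column is itself a Series.

```python
import pandas as pd

# A Series is one-dimensional: values plus an index of labels
ages = pd.Series([28, 34, 29], index=["Tom", "Jack", "Steve"], name="Age")

# A DataFrame is two-dimensional; its columns may hold different types
df = pd.DataFrame({"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34, 29]})

print(ages.ndim, df.ndim)   # 1 2
print(ages["Jack"])         # 34, rows can be selected by name
print(type(df["Age"]))      # each column of a DataFrame is itself a Series
print(df.dtypes)            # one dtype per column (text and integers here)
```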
pandas.Series
pandas.Series(data, index, dtype)

S.No  Parameter  Description
1     data       data takes various forms like ndarray, list, constants
2     index      Index values must be hashable and the same length as data. They are normally unique, although pandas does allow duplicate labels. Defaults to np.arange(n) if no index is passed.
3     dtype      dtype is for data type. If None, the data type will be inferred.
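A sketch of the parameters in the table, with illustrative values: the default index, an explicit dtype, a constant broadcast to the index length, and duplicate labels.

```python
import pandas as pd

# No index given: a default integer index 0..n-1 is created
s1 = pd.Series([10, 20, 30])
print(list(s1.index))          # [0, 1, 2]

# dtype given explicitly instead of being inferred
s2 = pd.Series([1, 2, 3], dtype="float64")
print(s2.dtype)                # float64

# A constant is repeated to match the length of the index
s3 = pd.Series(5, index=["x", "y", "z"])
print(s3.tolist())             # [5, 5, 5]

# Duplicate labels are allowed; selecting one returns all matches
s4 = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(s4["a"].tolist())        # [1, 2]
```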
Create a Series from ndarray

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

Its output is as follows:

0    a
1    b
2    c
3    d
dtype: object
Pandas Series with index

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

Its output is as follows:

100    a
101    b
102    c
103    d
dtype: object
Accessing Data from Series with Position

import pandas as pd
s = pd.Series([1,2,3,4,5])
#retrieve the first element
print(s[0])

Output:

1
Retrieve the first three elements in the Series.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#retrieve the first three elements
print(s[:3])

Its output is as follows:

a    1
b    2
c    3
dtype: int64
Retrieve the last three elements.

import pandas as pd
s = pd.Series([1,2,3,4,5])
#retrieve the last three elements
print(s[-3:])

Its output is as follows:

2    3
3    4
4    5
dtype: int64
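With an integer index, plain s[...] with a single key looks up labels rather than positions, which can be surprising. A sketch using the .iloc and .loc indexers (covered in more detail later in the slides) to make the intent explicit:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# .iloc always selects by position, .loc always selects by label
print(s.iloc[0])               # 1
print(s.iloc[-1])              # 5
print(s.loc["b"])              # 2
print(s.iloc[-3:].tolist())    # [3, 4, 5]

# Label slices with .loc include the end label
print(s.loc["b":"d"].tolist()) # [2, 3, 4]
```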
Python Pandas - DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of DataFrame
• Columns can be of different types
• Size-mutable (rows and columns can be added or removed)
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
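A sketch of the features above on a small made-up table of marks: columns of different types, arithmetic along columns and along rows, and adding and removing data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Jack", "Steve"],   # text column
    "Math": [80, 90, 75],               # integer columns
    "Science": [70, 85, 95],
})

# Arithmetic on columns produces a new column (size-mutable: the frame grows)
df["Total"] = df["Math"] + df["Science"]

# Arithmetic across each row: average of the two marks per student
df["Average"] = df[["Math", "Science"]].mean(axis=1)

# Rows can be removed as well (here the row labelled 0)
df = df.drop(index=0)
print(df)
```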
Structure
Let us assume that we are creating a data frame
with rows and columns.
pandas.DataFrame

• A pandas DataFrame can be created using the following constructor:
• pandas.DataFrame(data, index, columns, dtype)

Create DataFrame

A pandas DataFrame can be created using various inputs like:

• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
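Lists and dicts are shown on the following slides; this sketch covers the other three inputs with illustrative data. Passing columns together with an existing DataFrame keeps only those columns.

```python
import numpy as np
import pandas as pd

# From a dict of Series: rows are aligned on the index labels, gaps become NaN
from_series = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([10, 20], index=["a", "b"]),
})

# From a NumPy ndarray, naming the columns
from_array = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["x", "y", "z"])

# From another DataFrame (here keeping only some of its columns)
subset = pd.DataFrame(from_array, columns=["x", "z"])

print(from_series, from_array, subset, sep="\n\n")
```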
Create a DataFrame from Lists

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

Its output is as follows:

   0
0  1
1  2
2  3
3  4
4  5

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

Its output is as follows:

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
Create a DataFrame from Dict of ndarrays / Lists

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

Its output is as follows (recent pandas versions keep the dict's key order for the columns):

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Missing data

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Its output is as follows:

   a   b     c
0  1   2   NaN
1  5  10  20.0

Note: NaN (Not a Number) is filled in wherever a value is missing. Because NaN is a float, column c is stored as float (20.0).
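A sketch of three common ways to deal with the NaN shown above: detect it, replace it, or drop the affected rows. Filling with 0 is an arbitrary choice for illustration.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

print(df.isna())          # True where a value is missing
print(df.fillna(0))       # replace missing values with 0
print(df.dropna())        # keep only rows without missing values
```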
Pandas descriptive statistics

S.No.  Function   Description
1      count()    Number of non-null observations
2      sum()      Sum of values
3      mean()     Mean of values
4      median()   Median of values
5      mode()     Mode of values
6      std()      Standard deviation of the values
7      min()      Minimum value
8      max()      Maximum value
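A sketch applying each function in the table to a small made-up Series with one missing value. Note that count() ignores the NaN, and std() uses the sample formula (ddof=1) unless told otherwise.

```python
import pandas as pd

# Eight observations plus one missing value (None becomes NaN)
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9, None])

print(s.count())          # 8, NaN is not counted
print(s.sum())            # 40.0, NaN is skipped
print(s.mean())           # 5.0
print(s.median())         # 4.5
print(s.mode().tolist())  # [4.0]
print(s.min(), s.max())   # 2.0 9.0

# std() uses the sample formula (ddof=1) by default
print(s.std())            # about 2.138
print(s.std(ddof=0))      # 2.0, population standard deviation
```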
mean()
Returns the average value of each numeric column.

import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
#Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True skips the text column 'Name'
# (recent pandas versions raise a TypeError without it)
print(df.mean(numeric_only=True))

Its output is as follows:

Age       31.833333
Rating     3.743333
dtype: float64
std()
Returns the sample standard deviation (ddof=1) of the numerical columns.

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True skips the text column 'Name'
print(df.std(numeric_only=True))

Its output is as follows:

Age       9.232682
Rating    0.661628
dtype: float64
Summarizing Data
The describe() function computes a summary of statistics pertaining to the DataFrame columns. By default only the numeric columns are summarized, so 'Name' does not appear.

import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())

Its output is as follows:

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
Python Pandas - Indexing and Selecting Data

Indexing   Description
.loc[]     Label based
.iloc[]    Integer based

.loc[]

Pandas provides various methods for purely label-based indexing. When slicing, both the start and the stop labels are included. Integers are valid labels, but they refer to the label and not the position.

.loc[] has multiple access methods like:

• A single scalar label
• A list of labels
• A slice object
• A Boolean array
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])
print(df)

Its output is as follows (the values are random and will differ on each run):

          A         B         C         D
a -0.069384 -0.787414 -0.474020  0.216364
b -1.265146  1.431168 -0.443679  0.435746
c -0.483534  1.478549 -0.619949  0.475728
d -0.770839 -0.272018 -0.361404  0.684284
e  0.141069 -1.162204  0.047874 -0.054955
f  0.056770  0.214658 -0.180290 -1.325190
g  0.976647  0.768103  1.535049  0.682851
h  1.249561 -2.757903  1.181472 -1.311080
Using .loc to select data

#select all rows for a specific column
print(df.loc[:,'A'])

Its output is as follows:

a   -0.069384
b   -1.265146
c   -0.483534
d   -0.770839
e    0.141069
f    0.056770
g    0.976647
h    1.249561
Name: A, dtype: float64
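A sketch of the four .loc access methods listed above. It uses the numbers 0 to 31 instead of random values so that every selection can be checked by eye.

```python
import numpy as np
import pandas as pd

# Deterministic data instead of random numbers: 0..31 in an 8 x 4 grid
df = pd.DataFrame(np.arange(32).reshape(8, 4),
                  index=list("abcdefgh"), columns=list("ABCD"))

row_a = df.loc["a"]                        # single label: row 'a' as a Series
corners = df.loc[["a", "h"], ["A", "D"]]   # lists of labels
b_to_d = df.loc["b":"d", "A"]              # slice: 'd' is included
big_a = df.loc[df["A"] > 15]               # Boolean array: rows where A > 15

print(row_a, corners, b_to_d, big_a, sep="\n\n")
```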
.iloc[] (integer location)

Pandas provides various methods to get purely integer-based indexing. Like Python and NumPy, positions are 0-based.
The various access methods are as follows:

• An integer
• A list of integers
• A range of values

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# select the first four rows (positions 0 to 3)
print(df.iloc[:4])

Its output is as follows (the values are random and will differ on each run):

          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
Move to Practical in Jupyter Notebook
