Numpy Pandas
MULTIDIMENSIONAL ARRAY
Array
Data type   Description
bool_       Boolean (True or False) stored as a byte
int_        Default integer type (historically the same as C long; normally either int64 or int32, depending on the platform)
float32     Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64     Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
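To check the sizes in the table on your own machine, here is a small sketch. It uses the `itemsize` attribute (bytes per element) and `np.finfo`, whose `nexp` and `nmant` attributes report the exponent and mantissa bit counts. The array contents are arbitrary.

```python
import numpy as np

# bytes per element for each dtype in the table
flags = np.array([True, False], dtype=np.bool_)
singles = np.array([1.5, 2.5], dtype=np.float32)
doubles = np.array([1.5, 2.5], dtype=np.float64)
print(flags.itemsize, singles.itemsize, doubles.itemsize)   # 1 4 8

# exponent and mantissa bit counts reported by np.finfo
f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
print(f32.nexp, f32.nmant)   # 8 23
print(f64.nexp, f64.nmant)   # 11 52

# the default integer dtype depends on the platform (int64 or int32)
print(np.array([1, 2, 3]).dtype)
```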
Data Type Objects (dtype)
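import numpy as np
b = np.array([])                  # an empty 1-D array
c = np.array([[1, 2], [3, 4]])    # a 2 x 2 array
The shape attribute returns a tuple of the array's dimensions:
a = np.array([[1,2,3],[4,5,6]])
print(a.shape)
(2, 3)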
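This part is headed "Data Type Objects (dtype)", so a short sketch connecting `shape` and `dtype` may help. It uses only `np.array`, `np.dtype`, and the `shape`, `dtype` and `itemsize` attributes. The variable names are illustrative.

```python
import numpy as np

b = np.array([])                      # empty 1-D array
c = np.array([[1, 2], [3, 4]])        # 2 x 2 integer array
print(b.shape, c.shape)               # (0,) (2, 2)
print(b.dtype)                        # float64 (the default for an empty list)

# a dtype object can be created on its own and passed to np.array
dt = np.dtype(np.int32)
d = np.array([[1, 2], [3, 4]], dtype=dt)
print(d.dtype, dt.itemsize)           # int32 4
```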
NumPy also provides a reshape function that changes the shape of an array without changing its data. The new shape must hold the same number of elements.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
b = a.reshape(3,2)
print(b)
The output is as follows −
[[1 2]
 [3 4]
 [5 6]]
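Because reshape only rearranges existing elements, the total size must match. Here is a minimal sketch of two consequences. The `-1` placeholder lets NumPy infer one dimension. An incompatible shape raises `ValueError`.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total size (6 elements)
b = a.reshape(3, -1)
print(b.shape)        # (3, 2)

# reshape keeps the row-major element order
print(a.reshape(6))   # [1 2 3 4 5 6]

# a shape with a different number of elements is rejected
try:
    a.reshape(4, 2)
except ValueError as err:
    print("cannot reshape:", err)
```

Note that `a` itself keeps its original shape `(2, 3)`. reshape returns a new array object.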
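ndarray.ndim
This array attribute returns the number of array dimensions. For example, np.arange(24) creates a 1-D array of evenly spaced values:
import numpy as np
a = np.arange(24)
print(a)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]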
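Here is a minimal sketch of ndim on a 1-D array and on the same 24 elements reshaped into three dimensions. The shape `(2, 3, 4)` is an arbitrary choice.

```python
import numpy as np

a = np.arange(24)              # 1-D array with 24 elements
b = a.reshape(2, 3, 4)         # the same 24 elements arranged as 2 x 3 x 4
print(a.ndim, b.ndim)          # 1 3
print(b.shape)                 # (2, 3, 4)
# ndim always equals the length of the shape tuple
print(b.ndim == len(b.shape))  # True
```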
NumPy - Array Creation Routines
numpy.empty
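It creates an uninitialized array of the specified shape and dtype. It uses the following constructor −
numpy.empty(shape, dtype=float, order='C')
For example:
import numpy as np
x = np.empty([3, 2], dtype=int)
print(x)
The elements contain arbitrary values because they are not initialized, e.g.:
[[  22649312 1701344351]
 [1818321759 1885959276]
 [  16779776  156368896]]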
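Because numpy.empty skips initialization, it is only safe when every element is written before it is read. Here is a small sketch of that pattern. The fill values are arbitrary.

```python
import numpy as np

x = np.empty((3, 2), dtype=int)   # contents are whatever was in memory
for i in range(3):
    x[i] = [i, i * 10]            # overwrite every row before using x
print(x)
# [[ 0  0]
#  [ 1 10]
#  [ 2 20]]
```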
numpy.zeros
Returns a new array of the specified size, filled with zeros (float64 by default).
import numpy as np
x = np.zeros(5)
print(x)
[0. 0. 0. 0. 0.]
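The size can also be a shape tuple, and `dtype` overrides the float64 default. Here is a minimal sketch. It also shows the related `np.ones`, which is not covered in the notes above.

```python
import numpy as np

z = np.zeros((2, 3), dtype=int)   # shape given as a tuple, integer dtype
print(z)
# [[0 0 0]
#  [0 0 0]]
o = np.ones(3)                    # np.ones fills with 1.0 instead
print(o)                          # [1. 1. 1.]
```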
numpy.random.rand draws samples from a uniform distribution over [0, 1).
Every value is therefore greater than or equal to 0 and less than 1.
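Here is a minimal sketch of `np.random.rand` with a fixed seed so the run is repeatable. It shows scaling the samples to another interval and, for comparison, the newer Generator API (`np.random.default_rng`). The seed and the interval `[5, 10)` are arbitrary choices.

```python
import numpy as np

np.random.seed(42)                 # fixed seed so the run is repeatable
u = np.random.rand(3, 2)           # 3 x 2 array of uniform samples in [0, 1)
print(u)
print(u.min() >= 0.0, u.max() < 1.0)   # True True

# samples from [low, high) by scaling and shifting
low, high = 5.0, 10.0
v = low + (high - low) * np.random.rand(4)
print(v)

# newer code often uses the Generator API instead
rng = np.random.default_rng(42)
w = rng.random((3, 2))             # also uniform on [0, 1)
```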
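numpy.random.randn returns samples from the standard normal distribution (mean 0, variance 1). Its values are not limited to a fixed range, although about 99.7% of them fall between -3 and 3.
import numpy as np
# 1D Array
array = np.random.randn(5)
print("1D Array filled with random values : \n", array)
The output differs on every run because the values are random.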
import numpy as np
# 2D Array
array = np.random.randn(3, 4)
print("2D Array filled with random values : \n", array)
Output (example; values differ on every run):
2D Array filled with random values :
 [[ 1.33262386 -0.88922967 -0.07056098  0.27340112]
 [ 1.00664965 -0.68443807  0.43801295 -0.35874714]
 [-0.19289416 -0.42746963 -1.80435223  0.02751727]]
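With many samples, the mean and standard deviation of `randn` output approach 0 and 1. Scaling by sigma and shifting by mu gives a normal distribution N(mu, sigma**2). Here is a minimal sketch. The seed, sample count and the values of `mu` and `sigma` are arbitrary.

```python
import numpy as np

np.random.seed(0)                        # fixed seed for a repeatable run
samples = np.random.randn(10000)         # standard normal: mean 0, std 1
print(samples.mean(), samples.std())     # both close to 0 and 1

# to sample from N(mu, sigma**2), scale by sigma and shift by mu
mu, sigma = 3.0, 2.5
scaled = sigma * np.random.randn(10000) + mu
print(scaled.mean(), scaled.std())       # close to 3.0 and 2.5
```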
PANDAS
Introduction to Pandas
Series: 1 dimension, like a vector
DataFrame: 2 dimensions, like an array
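The difference in dimensions can be checked directly with the `ndim` and `shape` attributes. Here is a minimal sketch with made-up data.

```python
import pandas as pd

s = pd.Series([10, 20, 30])                      # one axis: the index
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})    # two axes: rows and columns
print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (2, 2)
# each DataFrame column is itself a Series
print(type(df['x']).__name__)   # Series
```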
pandas.Series
pandas.Series(data, index, dtype)
S.No  Parameter  Description
1     data       Data takes various forms, such as an ndarray, a list, or a constant.
2     index      Index values must be hashable and the same length as data. They are usually unique, although pandas does not enforce this. Defaults to np.arange(n) if no index is passed.
3     dtype      The data type. If None, the data type is inferred.
Create a Series from ndarray
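Here is a minimal sketch of creating a Series from an ndarray. It first uses the default integer index, then explicit index labels and an explicit `dtype`. The data and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

s1 = pd.Series(data)                              # default index 0..3
print(s1.index.tolist())                          # [0, 1, 2, 3]

s2 = pd.Series(data, index=[100, 101, 102, 103])  # index length must match data
print(s2.loc[101])                                # b

nums = pd.Series(np.array([1, 2, 3]), dtype='float64')  # dtype given explicitly
print(nums.dtype)                                 # float64
```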
Features of DataFrame
Columns can be of different types
Size is mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns
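Each feature in the list can be seen in a few lines. Here is a minimal sketch with made-up names and numbers. It shows columns of different types, adding a column and a row (size is mutable), and arithmetic on a whole column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})
print(df.dtypes)                 # Name holds text, Age holds integers

df['Score'] = [88.5, 92.0]       # size is mutable: add a column
df.loc[2] = ['Steve', 29, 75.0]  # ... or a row with a new label

df['AgeNextYear'] = df['Age'] + 1   # arithmetic on a whole column
print(df)
```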
Structure
Let us assume that we are creating a data frame
with rows and columns.
pandas.DataFrame
A DataFrame can be created from various inputs, such as:
Lists
Dicts
Series
NumPy ndarrays
Another DataFrame
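Two of the inputs listed above, ndarrays and Series, are not covered by the examples that follow. Here is a minimal sketch with made-up values. For a dict of Series, rows are aligned on the index labels, and missing positions become NaN.

```python
import numpy as np
import pandas as pd

# from a NumPy ndarray: column labels are supplied separately
arr = np.array([[1, 2], [3, 4], [5, 6]])
df1 = pd.DataFrame(arr, columns=['x', 'y'])
print(df1.shape)    # (3, 2)

# from a dict of Series: rows are aligned on the index labels
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df2 = pd.DataFrame(d)
print(df2)          # row 'd' has NaN in column 'one'

# from another DataFrame (here, a selection of its columns)
df3 = pd.DataFrame(df2[['two']])
print(df3.columns.tolist())   # ['two']
```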
Create a DataFrame from a Dict of Lists
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
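Create a DataFrame from a List of Dicts
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)   # keys become columns; 'c' is missing in the first dict, so it becomes NaN
print(df)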
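The `index` and `columns` parameters of `pd.DataFrame` also apply to a list of dicts. `columns` selects and orders the keys that become columns. Here is a minimal sketch using the same data, with made-up row labels.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# columns= keeps only 'a' and 'c', in that order; index= labels the rows
df = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'c'])
print(df)
print(pd.isna(df.loc['first', 'c']))   # True: 'c' is missing in the first dict
```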
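mean()
Returns the mean of the numerical columns.
import pandas as pd
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
df = pd.DataFrame(d)
# numeric_only=True skips the text column 'Name'; without it, pandas 2.x raises a TypeError
print(df.mean(numeric_only=True))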
Its output is as follows −
Age 31.833333
Rating 3.743333
dtype: float64
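By default mean() works column by column (axis=0). Here is a minimal sketch with made-up scores. It shows `axis=1` for one mean per row, again with `numeric_only=True` to skip the text column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Lee'],
                   'Math': [80, 90, 70],
                   'Physics': [60, 70, 80]})

col_means = df.mean(numeric_only=True)          # one mean per column (axis=0)
row_means = df.mean(axis=1, numeric_only=True)  # one mean per row
print(col_means)   # Math: 80.0, Physics: 70.0
print(row_means)   # rows 0, 1, 2: 70.0, 80.0, 75.0
```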
std()
Returns the standard deviation of the
numerical columns.
import pandas as pd
#Create a DataFrame (d is the dictionary of Series from the previous example)
df = pd.DataFrame(d)
print(df.std(numeric_only=True))
Its output is as follows −
Age 9.232682
Rating 0.661628
dtype: float64
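pandas std() computes the sample standard deviation (dividing by n - 1, i.e. `ddof=1`), whereas `np.std` divides by n by default (`ddof=0`). Here is a minimal sketch with made-up values chosen so that the population standard deviation is exactly 2.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(s.std())                # sample std (divides by n - 1): about 2.138
print(np.std(s.to_numpy()))   # population std (divides by n): 2.0
print(s.std(ddof=0))          # pandas with ddof=0 matches NumPy: 2.0
```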
Summarizing Data
The describe() function computes a summary of statistics
pertaining to the DataFrame columns.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
'Lee','David','Gasper','Betina','Andres']),
'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())
Its output is as follows −
             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000
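By default describe() summarizes only the numeric columns, so 'Name' is left out above. Here is a minimal sketch with made-up rows. It shows `include='all'`, which adds count, unique, top and freq for text columns, and `percentiles`, which replaces the 25%/75% rows (the 50% median is always kept).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky', 'Tom'],
                   'Age': [25, 26, 25, 23]})

numeric_only = df.describe()                      # Age only
everything = df.describe(include='all')           # Name gets unique/top/freq
custom = df.describe(percentiles=[0.1, 0.9])      # 10%, 50%, 90% rows
print(numeric_only)
print(everything)
print(custom)
```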
Python Pandas - Indexing and Selecting Data
Indexing   Description
.loc[]     Label based
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])
print(df)
A B C D
a -0.069384 -0.787414 -0.474020 0.216364
b -1.265146 1.431168 -0.443679 0.435746
c -0.483534 1.478549 -0.619949 0.475728
d -0.770839 -0.272018 -0.361404 0.684284
e 0.141069 -1.162204 0.047874 -0.054955
f 0.056770 0.214658 -0.180290 -1.325190
g 0.976647 0.768103 1.535049 0.682851
h 1.249561 -2.757903 1.181472 -1.311080
Selecting column 'A' for all rows by label with .loc:
print(df.loc[:, 'A'])
a -0.069384
b -1.265146
c -0.483534
d -0.770839
e 0.141069
f 0.056770
g 0.976647
h 1.249561
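.loc also accepts a single row label, lists of labels, label slices and boolean conditions. Here is a minimal sketch. It uses `np.arange` instead of random numbers so the selected values are predictable. Note that a label slice such as `'a':'c'` includes the end label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['a', 'b', 'c', 'd'], columns=['A', 'B', 'C', 'D'])

print(df.loc['a'])                     # one row by label (a Series)
print(df.loc[['a', 'c'], ['A', 'D']])  # lists of row and column labels
print(df.loc['a':'c', 'B'])            # label slices include the end label 'c'
print(df.loc[df['A'] > 4])             # rows where the condition is True
```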
.iloc[] (integer location)
Pandas provides various ways to do purely integer-based indexing. As in Python and NumPy, positions are 0-based.
The various access methods are as follows −
An Integer
A list of integers
A range of values
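import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# select the first four rows by integer position
print(df.iloc[:4])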
A B C D
0 0.699435 0.256239 -1.270702 -0.645195
1 -0.685354 0.890791 -0.813012 0.631615
2 -0.783192 -0.531378 0.025070 0.230806
3 0.539042 -1.284314 0.826977 -0.026251
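The three access methods listed above (an integer, a list of integers, a range) can be sketched as follows. The example uses `np.arange` values so the results are predictable. Unlike a .loc label slice, an integer slice excludes its end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['A', 'B', 'C', 'D'])

print(df.iloc[1])            # an integer: the second row
print(df.iloc[[0, 2]])       # a list of integers: rows 0 and 2
print(df.iloc[1:3, 0:2])     # a range: rows 1-2, columns 0-1 (end excluded)
```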
Move to Practical in Jupyter Notebook