22mbada303 Module 4
# creating list
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
2. Shape: The number of elements along each axis, expressed as a tuple.
Numpy array :
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
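The step that turns the three lists into the array above is not shown in these notes. A minimal sketch, assuming the array was built by passing the lists to `np.array` (the name `arr` is chosen here for illustration only), also prints the rank and shape:

```python
import numpy as np

list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]

# Assumed construction: each list becomes one row of a 2-D array
arr = np.array([list_1, list_2, list_3])

print("Numpy array :\n", arr)
print("Rank (number of axes):", arr.ndim)   # 2 axes: rows and columns
print("Shape:", arr.shape)                  # tuple: (rows, columns)
```

Here `arr.shape` is a tuple with one entry per axis, so a rank-2 array has a shape of length two.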
Rank 1
Rank 2
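# Import module
import numpy as np
# example arrays (values assumed for illustration; the originals are not shown)
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
# add arrays
print("Array sum:\n", a + b)
# matrix multiplication
print("Matrix multiplication:\n", a.dot(b))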
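Element-wise operations and matrix multiplication are easy to confuse. The following sketch contrasts them; the arrays `a` and `b` are example values assumed for illustration, not taken from the notes.

```python
import numpy as np

# Assumed example values
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

elementwise_sum = a + b        # adds matching entries
elementwise_product = a * b    # multiplies matching entries
matrix_product = a.dot(b)      # row-by-column matrix multiplication

print("Array sum:\n", elementwise_sum)
print("Element-wise product:\n", elementwise_product)
print("Matrix multiplication:\n", matrix_product)
```

The `@` operator (`a @ b`) gives the same result as `a.dot(b)` for 2-D arrays.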
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and
manipulating data.
The name "Pandas" refers to both "Panel Data" and
"Python Data Analysis".
Pandas is particularly well-suited for working with tabular
data, such as spreadsheets or SQL tables.
Its versatility and ease of use make it an essential tool for
data analysts and scientists working with structured data in Python.
Pandas generally provides two data structures
for manipulating data. They are:
•Series
•DataFrame
A Pandas Series is a one-dimensional labeled array capable of holding
data of any type (integer, string, float, Python objects, etc.).
•The axis labels are collectively called the index.
A Pandas Series is comparable to a single column in an Excel sheet.
In the real world, a Pandas Series is usually created by loading a
dataset from existing storage, such as a SQL database, a CSV
file, or an Excel file.
Pandas Series can be created from lists, dictionaries, etc.
import pandas as pd
import numpy as np
# Creating empty series
ser = pd.Series(dtype="object")  # explicit dtype avoids a warning in some pandas versions
print("Pandas Series: ", ser)
# simple array
data = np.array(['p', 'a', 'n', 'd', 'a', 's'])
ser = pd.Series(data)
print("Pandas Series:\n", ser)
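Since a Series can also be created from lists and dictionaries, here is a small sketch of both. The values, index labels, and variable names are made up for illustration.

```python
import pandas as pd

# From a list, with explicit index labels (illustrative labels)
ser_list = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From a dictionary: keys become the index, values become the data
ser_dict = pd.Series({"x": 1.5, "y": 2.5})

print("Series from list:\n", ser_list)
print("Series from dictionary:\n", ser_dict)

# Elements are accessed through their index label
print("Value at label 'b':", ser_list["b"])
```

If no index is given, pandas assigns the default integer labels 0, 1, 2, and so on.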
Pandas DataFrame is a two-dimensional data structure with
labeled axes (rows and columns).
In the real world, a Pandas DataFrame is usually created by
loading a dataset from existing storage, such as a
SQL database, a CSV file, or an Excel file.
Pandas DataFrame can be created from lists, dictionaries,
etc.
import pandas as pd
# list of strings
lst = ['Python', 'NumPy', 'Pandas', 'Data', 'Analytics']
# creating a DataFrame from the list
df = pd.DataFrame(lst)
print(df.head())
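A DataFrame can also be built from a dictionary, where each key becomes a column label. This is a minimal sketch; the column names and values are invented for illustration.

```python
import pandas as pd

# Made-up sample data: each key is a column, each list holds that column's values
data = {"Name": ["Asha", "Ravi", "Meena"],
        "Age": [21, 23, 22]}

df_people = pd.DataFrame(data)

print(df_people.head())
print("Shape (rows, columns):", df_people.shape)
```

All lists in the dictionary need the same length, since each one fills a full column.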
# reading selected columns from a CSV file
df = pd.read_csv('people.csv', header=0,
                 usecols=["First Name", "Sex", "Email"])
# printing dataframe
print(df.head())
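The `read_csv` call above needs a `people.csv` file on disk. To try the same `header` and `usecols` arguments without that file, the sketch below reads from an in-memory string instead. The CSV contents and the extra "Last Name" column are assumptions made up for this example.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for people.csv
csv_text = """First Name,Last Name,Sex,Email
Asha,Rao,Female,asha@example.com
Ravi,Kumar,Male,ravi@example.com
"""

# header=0 treats the first line as column names;
# usecols keeps only the listed columns
df_csv = pd.read_csv(io.StringIO(csv_text), header=0,
                     usecols=["First Name", "Sex", "Email"])

print(df_csv.head())
```

The "Last Name" column is dropped because it is not listed in `usecols`.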
Descriptive statistics describe data with the help of representative
methods such as charts, graphs, tables, and Excel files. The goal is to
summarize the data and present it in a meaningful way so that it can be
easily understood.
Mean
It is the sum of observations divided by the total number of observations.
import numpy as np
# Sample Data
arr = [5, 6, 11]
# Mean
mean = np.mean(arr)
print("Mean = ", mean)
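As a check on the definition, the mean can be computed by hand as the sum of the observations divided by their count. This short sketch uses the same sample data and compares the result with `np.mean`; the variable names are chosen here for illustration.

```python
import numpy as np

arr = [5, 6, 11]

# Definition: sum of observations / number of observations
manual_mean = sum(arr) / len(arr)   # (5 + 6 + 11) / 3 = 22 / 3
numpy_mean = np.mean(arr)

print("Manual mean = ", manual_mean)
print("NumPy mean  = ", numpy_mean)
```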
Median
It is the middle value of the sorted data set. It splits the data into two halves.
When the number of observations is even, it is the average of the two middle values.
import numpy as np
# sample Data
arr = [1, 2, 3, 4]
# Median
median = np.median(arr)
print("Median = ", median)
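The median behaves slightly differently for odd and even numbers of observations. The sketch below shows both cases; the odd-length sample is made up for illustration.

```python
import numpy as np

odd_data = [7, 1, 3]        # sorted: 1, 3, 7 -> middle value is 3
even_data = [1, 2, 3, 4]    # two middle values 2 and 3 -> average is 2.5

median_odd = np.median(odd_data)
median_even = np.median(even_data)

print("Median (odd count)  = ", median_odd)
print("Median (even count) = ", median_even)
```

`np.median` sorts the data internally, so the input does not need to be ordered.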
Missing Data can occur when no information is provided for one or more items
or for a whole unit.
Missing Data is a very big problem in real-life scenarios.
Missing Data is also referred to as NA (Not Available) values in pandas.
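import pandas as pd
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a DataFrame from the dictionary
df = pd.DataFrame(data)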
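The notes define the score dictionary but do not show what is done with it. As a hedged sketch of common ways to handle missing values, the example below uses the pandas methods `isnull`, `fillna`, and `dropna`. The variable names and the choice of 0 as fill value are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Same scores as above, under a name that does not shadow the built-in dict
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df_scores = pd.DataFrame(scores)

# Detect missing values: True where a value is NaN
missing_mask = df_scores.isnull()
missing_per_column = missing_mask.sum()

# Fill missing values with a constant (0 is an arbitrary choice here)
filled = df_scores.fillna(0)

# Drop every row that contains at least one missing value
complete_rows = df_scores.dropna()

print("Missing per column:\n", missing_per_column)
print("Filled with 0:\n", filled)
print("Rows without missing values:\n", complete_rows)
```

None of these calls modifies `df_scores` itself; each returns a new object.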