NumPy and Pandas (1)
Installing NumPy
sh
pip install numpy
1. Creating Arrays
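import numpy as np
array_1d = np.array([1, 2, 3, 4, 5])         # Creating a 1D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])  # Creating a 2D array
zeros_array = np.zeros((3, 3))               # 3x3 array filled with 0.0
ones_array = np.ones((2, 2))                 # 2x2 array filled with 1.0
range_array = np.arange(10)                  # Integers 0 through 9
print("1D Array:", array_1d, "\n2D Array:\n", array_2d)
print("Zeros Array:\n", zeros_array, "\nOnes Array:\n", ones_array, "\nRange Array:", range_array)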
Output:
1D Array: [1 2 3 4 5]
2D Array:
[[1 2 3]
[4 5 6]]
Zeros Array:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Ones Array:
[[1. 1.]
[1. 1.]]
Range Array: [0 1 2 3 4 5 6 7 8 9]
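Every array also carries its shape, number of dimensions, and element type, which is often the quickest way to check what was just created. A short sketch that inspects some of the arrays above (recreated here so the snippet runs on its own); the exact integer dtype name can differ by platform:

```python
import numpy as np

# Recreate arrays from the example above
array_1d = np.array([1, 2, 3, 4, 5])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
zeros_array = np.zeros((3, 3))

# shape: size along each axis; ndim: number of axes; dtype: element type
for name, arr in [("array_1d", array_1d), ("array_2d", array_2d), ("zeros_array", zeros_array)]:
    print(name, arr.shape, arr.ndim, arr.dtype)

# reshape gives the same data a new shape (a view when possible);
# the total number of elements must stay the same (10 = 2 * 5)
reshaped = np.arange(10).reshape(2, 5)
print(reshaped)
```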
2. Array Operations
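# Arithmetic operations on a 1D array
array = np.array([1, 2, 3, 4])
# Addition (10 is added to every element)
array_add = array + 10
# Multiplication
array_mult = array * 2
# Element-wise exponentiation
array_square = array ** 2
print("Original Array:", array)
print("Array + 10:", array_add)
print("Array * 2:", array_mult)
print("Array ** 2:", array_square)
Output:
Original Array: [1 2 3 4]
Array + 10: [11 12 13 14]
Array * 2: [2 4 6 8]
Array ** 2: [ 1  4  9 16]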
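Adding a scalar such as 10 works because NumPy broadcasts the scalar across every element. The same broadcasting rules let arrays of compatible shapes combine without explicit loops. A minimal sketch (the `column` values are illustrative):

```python
import numpy as np

array = np.array([1, 2, 3, 4])

# A scalar is stretched to match the array's shape
shifted = array + 10
print(shifted)

# A (2, 1) column combined with a (4,) row broadcasts to a (2, 4) result
column = np.array([[0], [100]])
grid = column + array
print(grid)

# Equal-shaped arrays combine position by position
squared = array * array
print(squared)
```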
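# Mathematical functions, applied element-wise to `array` from the example above
# Sine function
array_sin = np.sin(array)
# Exponential function
array_exp = np.exp(array)
print("Sine of Array:", array_sin)
print("Exponential of Array:", array_exp)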
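`np.sin` and `np.exp` are universal functions (ufuncs): a single call applies them to every element of the array. A sketch comparing `np.sin` with an explicit Python loop over the standard `math` module, which should agree up to floating-point rounding:

```python
import math

import numpy as np

array = np.array([1, 2, 3, 4])

# Vectorized: one call processes every element
vectorized_sin = np.sin(array)

# Equivalent pure-Python loop, one element at a time
looped_sin = np.array([math.sin(x) for x in array])

# The two results agree to within rounding
print(np.allclose(vectorized_sin, looped_sin))

# Ufuncs compose: exp(log(x)) recovers x up to rounding
print(np.allclose(np.exp(np.log(array)), array))
```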
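matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
matrix_product = np.dot(matrix_a, matrix_b)          # Matrix multiplication (same as matrix_a @ matrix_b)
matrix_inv = np.linalg.inv(matrix_a)                 # Inverse of a matrix
eigenvalues, eigenvectors = np.linalg.eig(matrix_a)  # Eigenvalues and eigenvectors
print("Matrix Product:\n", matrix_product)
print("Inverse of Matrix A:\n", matrix_inv)
print("Eigenvectors:\n", eigenvectors)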
Output:
Matrix Product:
[[19 22]
[43 50]]
Inverse of Matrix A:
[[-2. 1. ]
[ 1.5 -0.5]]
Eigenvectors:
[[-0.82456484 -0.41597356]
[ 0.56576746 -0.90937671]]
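These results can be checked directly: A multiplied by its inverse should give the identity matrix, and each eigenvector column v with eigenvalue λ should satisfy A v = λ v. A sketch, assuming `matrix_a` is [[1, 2], [3, 4]] (the matrix consistent with the outputs above); the `b` vector in the last step is illustrative:

```python
import numpy as np

# Assumed input matrix, consistent with the outputs shown above
matrix_a = np.array([[1, 2], [3, 4]])

matrix_inv = np.linalg.inv(matrix_a)
eigenvalues, eigenvectors = np.linalg.eig(matrix_a)

# A @ A^-1 should equal the 2x2 identity, up to floating-point error
print(np.allclose(matrix_a @ matrix_inv, np.eye(2)))

# Column i of `eigenvectors` pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(matrix_a @ v, eigenvalues[i] * v))

# To solve A x = b, np.linalg.solve avoids forming the inverse explicitly
b = np.array([5, 11])
x = np.linalg.solve(matrix_a, b)
print(x)
```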
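random_array = np.random.rand(5)                    # 5 floats drawn uniformly from [0, 1)
normal_array = np.random.randn(5)                   # 5 floats from the standard normal distribution
random_integers = np.random.randint(1, 10, size=5)  # 5 integers from [1, 10); bounds are illustrative
print("Random Integers:", random_integers)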
Output: (Note: Output will vary each time due to random generation)
Random Integers: [4 1 6 9 7]
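Because the values are random, the output above changes on every run. Seeding a generator makes runs repeatable. A sketch using NumPy's `Generator` API (`np.random.default_rng`), which recent NumPy versions recommend over the legacy `np.random.rand`/`randn`/`randint` functions; the seed `42` is arbitrary:

```python
import numpy as np

# Two generators with the same seed produce identical streams
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

uniform_a = rng_a.random(5)  # 5 floats in [0, 1), like np.random.rand(5)
uniform_b = rng_b.random(5)
print(np.array_equal(uniform_a, uniform_b))

rng = np.random.default_rng(42)
normal = rng.standard_normal(5)          # like np.random.randn(5)
integers = rng.integers(1, 10, size=5)   # 5 ints in [1, 10); upper bound excluded
print(normal)
print(integers)
```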
Pandas
Pandas is a powerful and widely used Python library for data manipulation
and analysis. Its two core data structures, Series and DataFrame, are designed
to make cleaning, transforming, and analyzing labeled data fast and
convenient. Let's explore some of its core features.
1. Data Structures:
○ Series: One-dimensional labeled array capable of holding any
data type.
○ DataFrame: Two-dimensional labeled data structure with
columns of potentially different types, similar to a table in a
database or an Excel spreadsheet.
2. Data Cleaning and Preparation:
○ Handling missing data, filtering, and cleaning data.
○ Data transformation and normalization.
3. Data Analysis and Exploration:
○ Aggregation, grouping, merging, and joining data.
○ Descriptive statistics and data summarization.
4. Time Series Analysis:
○ Tools for working with time-indexed data, resampling, and time-
based aggregations.
5. Data Input and Output:
○ Reading from and writing to various file formats such as CSV,
Excel, SQL databases, and more.
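Points 4 and 5 can be seen together in a small round trip: read CSV data with a date column, resample it by week, and write the result back out. A sketch in which the CSV text is a hypothetical stand-in for a file on disk (`io.StringIO` lets `read_csv` read from a string):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """date,sales
2024-01-01,10
2024-01-02,20
2024-01-03,30
2024-01-08,40
2024-01-09,50
"""

# Data input: parse the CSV and use the 'date' column as a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Time series analysis: sum daily rows into weekly totals (weeks ending Sunday)
weekly = df.resample("W").sum()
print(weekly)

# Data output: write the weekly totals back out as CSV text
csv_out = weekly.to_csv()
print(csv_out)
```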
Installing Pandas
sh
pip install pandas
import pandas as pd
# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:\n", series)
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Country': ['USA', 'UK', 'Canada', 'Australia']}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)
Output:
Series:
0 1
1 2
2 3
3 4
4 5
dtype: int64
DataFrame:
Name Age Country
0 Alice 25 USA
1 Bob 30 UK
2 Charlie 35 Canada
3 David 40 Australia
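Both structures are labeled: the default index is 0, 1, 2, and so on, but any labels can be supplied and then used for lookup. A short sketch reusing the names and ages above as labels and values:

```python
import pandas as pd

# Series with string labels instead of the default integer index
ages = pd.Series([25, 30, 35, 40], index=["Alice", "Bob", "Charlie", "David"], name="Age")
print(ages["Charlie"])

# DataFrame whose rows are labeled by name; each column is itself a Series
df = pd.DataFrame(
    {"Age": [25, 30, 35, 40], "Country": ["USA", "UK", "Canada", "Australia"]},
    index=["Alice", "Bob", "Charlie", "David"],
)
print(type(df["Age"]).__name__)
print(df.loc["Bob", "Country"])
print(df.dtypes)
```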
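# Selecting a single column returns a Series
ages = df['Age']
print("Ages:\n", ages)
# Selecting several columns returns a DataFrame
subset = df[['Name', 'Country']]
print("\nSubset of DataFrame:\n", subset)
# Filtering rows with a boolean condition
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame:\n", filtered_df)
Output: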
Ages:
0 25
1 30
2 35
3 40
Name: Age, dtype: int64
Subset of DataFrame:
Name Country
0 Alice USA
1 Bob UK
2 Charlie Canada
3 David Australia
Filtered DataFrame:
Name Age Country
2 Charlie 35 Canada
3 David 40 Australia
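The same kinds of selection can be written with the label-based `.loc` and position-based `.iloc` indexers, which also select rows and columns in one step. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Country': ['USA', 'UK', 'Canada', 'Australia'],
})

# .loc: rows by boolean mask, columns by label
older = df.loc[df['Age'] > 30, ['Name', 'Country']]
print(older)

# .iloc: rows and columns by integer position (end position excluded)
first_two = df.iloc[0:2, 0:2]
print(first_two)

# Combining conditions needs & or |, with parentheses around each comparison
uk_or_young = df[(df['Country'] == 'UK') | (df['Age'] < 30)]
print(uk_or_young)
```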
4. Data Cleaning
# Handling missing values
df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})
print("Original DataFrame:\n", df)
Output:
Original DataFrame:
A B
0 1.0 NaN
1 2.0 4.0
2 NaN 5.0
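The frame above has one missing value in each column. Typical next steps are to count, drop, or fill them. A sketch with `isna`, `dropna`, and `fillna`; the fill value 0 and the column-mean strategy are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})

# Count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Drop every row that contains at least one missing value
dropped = df.dropna()
print(dropped)

# Replace missing values with a constant
filled_zero = df.fillna(0)
print(filled_zero)

# Replace missing values with each column's mean (A: 1.5, B: 4.5)
filled_mean = df.fillna(df.mean())
print(filled_mean)
```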
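# Grouping: average age per country, using the Name/Age/Country `data` dictionary defined earlier
people = pd.DataFrame(data)
grouped_df = people.groupby('Country')[['Age']].mean()
print("Grouped DataFrame:\n", grouped_df)
Output: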
Grouped DataFrame:
Age
Country
Australia 40.0
Canada 35.0
UK 30.0
USA 25.0
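Grouping can also compute several statistics at once with `agg`. A sketch with one hypothetical extra row ("Eve") so that one country contains more than one person:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],  # 'Eve' is a hypothetical extra row
    'Age': [25, 30, 35, 40, 45],
    'Country': ['USA', 'UK', 'Canada', 'Australia', 'USA'],
})

# Mean age per country; groups are sorted by key by default
mean_age = df.groupby('Country')['Age'].mean()
print(mean_age)

# Several aggregations per group in one call
summary = df.groupby('Country')['Age'].agg(['count', 'mean', 'max'])
print(summary)
```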