NumPy and Pandas (1)

The document provides an overview of NumPy and Pandas, two essential libraries for scientific computing and data manipulation in Python. It details key features, installation instructions, and basic operations for both libraries, including array creation, mathematical functions, data structures, and data cleaning techniques. The document serves as a guide for users to effectively utilize NumPy and Pandas for various data analysis tasks.

Uploaded by

Akshat Joshi
Copyright
© All Rights Reserved

NumPy and Pandas

NumPy is a fundamental package for scientific computing with Python. It
provides support for arrays, matrices, and a large collection of
mathematical functions to operate on these data structures efficiently.

Key Features of NumPy

1. N-dimensional Array Object:
○ The core of NumPy is the ndarray, a powerful n-dimensional
array object.
○ Supports various data types and operations.
2. Universal Functions (ufuncs):
○ Functions that operate element-wise on arrays.
○ Includes mathematical, logical, bitwise, and other functions.
3. Broadcasting:
○ Allows arithmetic operations between arrays of different but
compatible shapes, without explicit loops.
○ Simplifies code and improves performance.
4. Linear Algebra:
○ Provides tools for performing linear algebra operations, such as
matrix multiplication, eigenvalues, and singular value
decomposition.
5. Random Number Generation:
○ Generates random numbers for various distributions.
○ Useful for simulations and statistical computations.
6. Integration with Other Libraries:
○ Works seamlessly with other scientific computing libraries like
SciPy, Pandas, and Matplotlib.
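
Broadcasting is listed above but is easiest to understand from an example. The following is a minimal sketch; the array names (matrix, row, column) are illustrative and not part of the original document.

```python
import numpy as np

# A (3, 3) matrix plus a (3,) vector: the vector is applied to every row.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])
row_sum = matrix + row
print(row_sum)
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]

# A (3, 1) column times a (3,) row broadcasts to a (3, 3) grid.
column = np.array([[0], [1], [2]])
grid = column * row
print(grid.shape)  # (3, 3)
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.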

Installing NumPy

You can install NumPy using pip:

pip install numpy

Basic Operations with NumPy

1. Creating Arrays

import numpy as np

# Creating a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", array_1d)

# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)

# Creating arrays with zeros, ones, and a range of numbers
zeros_array = np.zeros((3, 3))
ones_array = np.ones((2, 2))
range_array = np.arange(10)
print("Zeros Array:\n", zeros_array)
print("Ones Array:\n", ones_array)
print("Range Array:", range_array)

Output:

1D Array: [1 2 3 4 5]
2D Array:
[[1 2 3]
 [4 5 6]]
Zeros Array:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones Array:
[[1. 1.]
 [1. 1.]]
Range Array: [0 1 2 3 4 5 6 7 8 9]
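
Once an array exists, it is often useful to inspect its shape and data type, or to change its layout. The sketch below is illustrative only; the variable names are assumptions, while shape, ndim, size, dtype, reshape, and np.linspace are standard NumPy.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d.shape)  # (2, 3): 2 rows, 3 columns
print(array_2d.ndim)   # 2 dimensions
print(array_2d.size)   # 6 elements in total

# The data type can be requested explicitly when creating an array.
float_array = np.array([1, 2, 3], dtype=np.float64)
print(float_array.dtype)  # float64

# reshape rearranges the same elements into a new shape.
reshaped = np.arange(6).reshape(3, 2)
print(reshaped)
# [[0 1]
#  [2 3]
#  [4 5]]

# linspace returns evenly spaced values, including both endpoints.
points = np.linspace(0.0, 1.0, 5)
print(points)  # [0.   0.25 0.5  0.75 1.  ]
```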

2. Array Operations

# Arithmetic operations
array = np.array([1, 2, 3, 4])
print("Original Array:", array)

# Addition
array_add = array + 10
print("Array + 10:", array_add)

# Multiplication
array_mult = array * 2
print("Array * 2:", array_mult)

# Element-wise operations
array_square = array ** 2
print("Array squared:", array_square)

Output:

Original Array: [1 2 3 4]
Array + 10: [11 12 13 14]
Array * 2: [2 4 6 8]
Array squared: [ 1  4  9 16]

3. Universal Functions (ufuncs)

# Using ufuncs for element-wise operations
array = np.array([1, 2, 3, 4])

# Sine function
array_sin = np.sin(array)
print("Sine of Array:", array_sin)

# Exponential function
array_exp = np.exp(array)
print("Exponential of Array:", array_exp)

# Square root function
array_sqrt = np.sqrt(array)
print("Square Root of Array:", array_sqrt)

Output:

Sine of Array: [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
Exponential of Array: [ 2.71828183  7.3890561  20.08553692 54.59815003]
Square Root of Array: [1.         1.41421356 1.73205081 2.        ]
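
The ufuncs above are all mathematical, but the feature list also mentions logical and bitwise ufuncs. A small sketch of those, using two made-up arrays a and b (names chosen here for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

# Comparison ufunc: element-wise a > b
greater = np.greater(a, b)
print(greater)  # [False False  True  True]

# Logical ufunc: True where both conditions hold
both = np.logical_and(a > 1, b > 1)
print(both)     # [False  True  True False]

# Bitwise ufunc: element-wise AND on the integer bit patterns
bits = np.bitwise_and(a, b)
print(bits)     # [0 2 2 0]

# Element-wise maximum of two arrays
largest = np.maximum(a, b)
print(largest)  # [4 3 3 4]
```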

4. Linear Algebra Operations

# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
matrix_product = np.dot(matrix_a, matrix_b)
print("Matrix Product:\n", matrix_product)

# Inverse of a matrix
matrix_inv = np.linalg.inv(matrix_a)
print("Inverse of Matrix A:\n", matrix_inv)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix_a)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

Output:

Matrix Product:
[[19 22]
 [43 50]]
Inverse of Matrix A:
[[-2.   1. ]
 [ 1.5 -0.5]]
Eigenvalues: [-0.37228132  5.37228132]
Eigenvectors:
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
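
The feature list also mentions singular value decomposition, which the example above does not cover. The sketch below reuses the same 2x2 matrix; the right-hand side vector b is a made-up value chosen so that the solution is easy to check by hand.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])

# The @ operator performs matrix multiplication, like np.dot for 2D arrays.
product = matrix_a @ matrix_a
print(product)
# [[ 7 10]
#  [15 22]]

# Singular value decomposition: matrix_a = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(matrix_a)
reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(reconstructed, matrix_a))  # True

# Solving the linear system matrix_a @ x = b
# (x + 2y = 5 and 3x + 4y = 11 give x = 1, y = 2)
b = np.array([5, 11])
x = np.linalg.solve(matrix_a, b)
print(np.allclose(x, [1, 2]))  # True
```

For solving linear systems, np.linalg.solve is generally preferable to computing the inverse explicitly and multiplying by it.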

5. Random Number Generation

# Generating random floats, uniformly distributed in [0, 1)
random_array = np.random.rand(5)
print("Random Array:", random_array)

# Generating random integers from 1 to 9 (the upper bound 10 is exclusive)
random_integers = np.random.randint(1, 10, size=5)
print("Random Integers:", random_integers)

# Generating numbers from a standard normal distribution (mean 0, std 1)
normal_array = np.random.randn(5)
print("Normal Distribution Array:", normal_array)

Output: (Note: Output will vary each time due to random generation)

Random Array: [0.85953447 0.73381974 0.37786374 0.84847527 0.64217697]
Random Integers: [4 1 6 9 7]
Normal Distribution Array: [ 0.35743143 -1.32095611 -0.61792992 0.77700679
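
Because the output changes on every run, simulations are often made reproducible by seeding the generator. The sketch below uses NumPy's Generator interface (np.random.default_rng) rather than the functions shown above; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)

uniform = rng.random(5)                          # floats in [0, 1)
integers = rng.integers(1, 10, size=5)           # integers from 1 to 9
normal = rng.normal(loc=0.0, scale=1.0, size=5)  # mean 0, std 1

print("Uniform:", uniform)
print("Integers:", integers)
print("Normal:", normal)

# A second generator with the same seed repeats the same values.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```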

Pandas
Pandas is a powerful and widely-used Python library for data manipulation
and analysis. It provides data structures like DataFrames and Series, which
are designed to make data cleaning, manipulation, and analysis fast and
easy. Let's explore some of the core functionalities of Pandas.

Key Features of Pandas

1. Data Structures:
○ Series: One-dimensional labeled array capable of holding any
data type.
○ DataFrame: Two-dimensional labeled data structure with
columns of potentially different types, similar to a table in a
database or an Excel spreadsheet.
2. Data Cleaning and Preparation:
○ Handling missing data, filtering, and cleaning data.
○ Data transformation and normalization.
3. Data Analysis and Exploration:
○ Aggregation, grouping, merging, and joining data.
○ Descriptive statistics and data summarization.
4. Time Series Analysis:
○ Tools for working with time-indexed data, resampling, and
time-based aggregations.
5. Data Input and Output:
○ Reading from and writing to various file formats such as CSV,
Excel, SQL databases, and more.
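
Time series support is listed above but not demonstrated later in this guide. As a minimal sketch, assuming a made-up series of six daily readings (the dates and values are invented for illustration), resampling into two-day totals looks like this:

```python
import pandas as pd

# Six consecutive daily readings, indexed by date.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Group the readings into 2-day bins and sum each bin.
two_day_totals = readings.resample("2D").sum()
print(two_day_totals.tolist())  # [3, 7, 11]

# Time-based selection: a date string selects matching rows.
print(readings["2024-01-03"])   # 3
```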

Installing Pandas

You can install Pandas using pip:

pip install pandas

Basic Operations with Pandas

1. Creating Series and DataFrames

import pandas as pd

# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:\n", series)

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Country': ['USA', 'UK', 'Canada', 'Australia']}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Output:
Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64

DataFrame:
      Name  Age    Country
0    Alice   25        USA
1      Bob   30         UK
2  Charlie   35     Canada
3    David   40  Australia
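
A Series does not have to use the default 0, 1, 2, ... index, and a DataFrame can be summarized quickly after creation. The sketch below is illustrative; the prices Series and its labels are made-up values, and the DataFrame repeats the one created above so the block runs on its own.

```python
import pandas as pd

# A Series with custom labels instead of the default integer index.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"])
print(prices["banana"])  # 2.0

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Country': ['USA', 'UK', 'Canada', 'Australia']})

print(df.shape)          # (4, 3): 4 rows, 3 columns
print(list(df.columns))  # ['Name', 'Age', 'Country']
print(df.head(2))        # first two rows
print(df.describe())     # summary statistics for the numeric column Age
```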

2. Reading and Writing Data

# Reading from a CSV file
# Assuming 'data.csv' exists with appropriate data
# (a separate name keeps df from the previous example intact)
df_from_csv = pd.read_csv('data.csv')
print("DataFrame from CSV:\n", df_from_csv)

# Writing to a CSV file (index=False leaves the row index out of the file)
df.to_csv('output.csv', index=False)

Output:

DataFrame from CSV:
(output will depend on the contents of 'data.csv')
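
To try reading and writing without an existing file, the round trip can be done in memory. This is a sketch only: it assumes the Name/Age/Country DataFrame from the earlier example and uses io.StringIO as a stand-in for a file on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Country': ['USA', 'UK', 'Canada', 'Australia']})

# Write the DataFrame as CSV text into an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])  # Name,Age,Country

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
df_loaded = pd.read_csv(buffer)
print(df_loaded)
```

Replacing the buffer with a file path such as 'output.csv' gives the file-based version shown above.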

3. Data Selection and Filtering

# df is the Name/Age/Country DataFrame created in section 1

# Selecting a single column
ages = df['Age']
print("Ages:\n", ages)

# Selecting multiple columns
subset = df[['Name', 'Country']]
print("Subset of DataFrame:\n", subset)

# Filtering rows based on a condition
filtered = df[df['Age'] > 30]
print("Filtered DataFrame:\n", filtered)

Output:

Ages:
0    25
1    30
2    35
3    40
Name: Age, dtype: int64

Subset of DataFrame:
      Name    Country
0    Alice        USA
1      Bob         UK
2  Charlie     Canada
3    David  Australia

Filtered DataFrame:
      Name  Age    Country
2  Charlie   35     Canada
3    David   40  Australia
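
Filters can also combine several conditions, and rows and columns can be selected together. The sketch below assumes the same Name/Age/Country DataFrame; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Country': ['USA', 'UK', 'Canada', 'Australia']})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
over_25_not_uk = df[(df['Age'] > 25) & (df['Country'] != 'UK')]
print(over_25_not_uk['Name'].tolist())  # ['Charlie', 'David']

# loc selects by label: rows matching a condition, and one column.
names_over_30 = df.loc[df['Age'] > 30, 'Name']
print(names_over_30.tolist())           # ['Charlie', 'David']

# iloc selects by integer position: the first two rows.
first_two = df.iloc[:2]
print(first_two['Name'].tolist())       # ['Alice', 'Bob']

# isin keeps rows whose value appears in a list.
selected = df[df['Country'].isin(['USA', 'Canada'])]
print(selected['Name'].tolist())        # ['Alice', 'Charlie']
```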

4. Data Cleaning

# Handling missing values (None is stored as NaN in numeric columns)
df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})
print("Original DataFrame:\n", df)

# Filling missing values with 0
df_filled = df.fillna(0)
print("DataFrame with filled values:\n", df_filled)

# Dropping every row that contains at least one missing value
df_dropped = df.dropna()
print("DataFrame with dropped rows:\n", df_dropped)

Output:

Original DataFrame:
     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

DataFrame with filled values:
     A    B
0  1.0  0.0
1  2.0  4.0
2  0.0  5.0

DataFrame with dropped rows:
     A    B
1  2.0  4.0
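
Filling with 0 or dropping whole rows is not always appropriate. As a hedged sketch of two common alternatives, using the same small A/B DataFrame: count the missing values first, then fill each column with its own mean, or drop rows only when a specific column is missing.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 4, 5]})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column.tolist())     # [1, 1]

# Fill each column's gaps with that column's mean (A: 1.5, B: 4.5).
df_mean_filled = df.fillna(df.mean())
print(df_mean_filled['A'].tolist())    # [1.0, 2.0, 1.5]
print(df_mean_filled['B'].tolist())    # [4.5, 4.0, 5.0]

# Drop rows only when column A is missing; a gap in B is tolerated.
df_a_present = df.dropna(subset=['A'])
print(df_a_present.index.tolist())     # [0, 1]
```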

5. Data Aggregation and Grouping

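# Grouping data by a column and calculating aggregate statistics
# (df was reassigned in the data cleaning example, so rebuild it from
# the Name/Age/Country dictionary `data` defined in section 1)
df = pd.DataFrame(data)
grouped = df.groupby('Country').agg({'Age': 'mean'})
print("Grouped DataFrame:\n", grouped)

Output:

Grouped DataFrame:
            Age
Country
Australia  40.0
Canada     35.0
UK         30.0
USA        25.0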
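
In the example above every country appears once, so each group mean is just that person's age. Grouping is more informative when groups have several rows. The sketch below uses a made-up sales table and a made-up region lookup (both hypothetical, not from the original data) to show several aggregations at once and a merge followed by a second grouping.

```python
import pandas as pd

# Hypothetical sales records with repeated countries.
sales = pd.DataFrame({'Country': ['USA', 'UK', 'USA', 'UK', 'Canada'],
                      'Amount': [100, 80, 50, 20, 70]})

# Several aggregate statistics per group in one call.
summary = sales.groupby('Country')['Amount'].agg(['sum', 'mean', 'count'])
print(summary)

# Hypothetical lookup table, merged in to add a Region column.
regions = pd.DataFrame({'Country': ['USA', 'UK', 'Canada'],
                        'Region': ['Americas', 'Europe', 'Americas']})
merged = sales.merge(regions, on='Country', how='left')

# Total amount per region.
region_totals = merged.groupby('Region')['Amount'].sum()
print(region_totals.to_dict())  # {'Americas': 220, 'Europe': 100}
```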
