De Lab Manual New
a. Basic ndarray
b. Array of zeros
c. Array of ones
f. Identity matrix in NumPy
Here’s how you can create different types of NumPy arrays in Python using the numpy library:
import numpy as np
# a. Basic ndarray
arr = np.array([1, 2, 3])
# b. Array of zeros
zeros = np.zeros((2, 3))
# c. Array of ones
ones = np.ones((2, 3))
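The list above also includes an identity matrix (item f). A minimal sketch, assuming np.eye() is the function intended for that item:

# f. Identity matrix (assuming np.eye is the intended function)
identity = np.eye(3)
print(identity)
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]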
Example: reshaping an array with .reshape()

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
reshaped = arr.reshape(3, 2)
print(reshaped)
# Output:
# [[1 2]
# [3 4]
# [5 6]]
Flattening converts a multi-dimensional array into a 1-D array. Example:
flat = arr.flatten()
print(flat) # Output: [1 2 3 4 5 6]
Transposing swaps the axes of the array (rows become columns and vice versa).
Use .T or .transpose().
Example:
transposed = arr.T
print(transposed)
# Output:
# [[1 4]
# [2 5]
# [3 6]]
Expanding an array typically means adding dimensions, often to make it compatible with operations
like broadcasting.
Using np.expand_dims():

import numpy as np

arr = np.array([1, 2, 3])
expanded = np.expand_dims(arr, axis=0)
print(expanded.shape)  # Output: (1, 3)
Using np.squeeze() removes axes of length 1:

squeezed = np.squeeze(expanded)
print(squeezed.shape)  # Output: (3,)
Sorting: .sort() sorts an array in place; for 2D arrays, use np.sort(arr, axis=...) to sort along a given axis.

arr = np.array([3, 1, 2])
arr.sort()
print(arr)  # Output: [1 2 3]
Indexing and slicing in three dimensions:

import numpy as np

arr3d = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]]])
print(arr3d[:, 1, :]) # Output: [[3 4], [7 8]] (2nd row from each matrix)
2D example (reversing both axes):

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(arr2d[::-1, ::-1])
# Output:
# [[9 8 7]
# [6 5 4]
# [3 2 1]]
a. Stacking ndarrays
b. Concatenating ndarrays
a. Stacking ndarrays
Stacking refers to joining arrays along a new axis. There are a few key functions for stacking in
NumPy:
np.stack()
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
stacked = np.stack((a, b))
print(stacked)
# Output:
# [[1 2 3]
# [4 5 6]]
np.vstack()
print(np.vstack((a, b)))
# Output:
# [[1 2 3]
# [4 5 6]]
np.hstack()
print(np.hstack((a, b)))
# Output:
# [1 2 3 4 5 6]
np.dstack()
print(np.dstack((a, b)))
# Output:
# [[[1 4]
# [2 5]
# [3 6]]]
b. Concatenating ndarrays
Concatenation joins arrays along an existing axis (unlike stacking which creates a new one).
np.concatenate()
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])
print(np.concatenate((a, b), axis=0))
# Output:
# [[1 2]
# [3 4]
# [5 6]]
Rules of Broadcasting:
If arrays have different dimensions, the smaller one is padded with 1s on the left.
Dimensions must be equal, or one of them must be 1.
a = np.array([1, 2, 3])
b = 10
print(a + b)  # Output: [11 12 13]
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([[10], [20]])
print(a + b)
# Output:
# [[11 12 13]
# [24 25 26]]
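When neither broadcasting rule holds, NumPy raises an error. A quick illustration (not in the original text; the shapes are chosen only to trigger the failure):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2])  # shapes (3,) and (2,) are incompatible
try:
    a + b
except ValueError as err:
    print("Broadcast error:", err)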
a. Creating a DataFrame
b. concat()
c. Setting conditions
import pandas as pd
a. Creating a DataFrame

# Sample data (illustrative)
data1 = {'Name': ['Alice', 'Bob'], 'Age': [25, 32]}
data2 = {'Name': ['Charlie', 'David'], 'Age': [28, 41]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(df1)
print(df2)

b. Using concat()

df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)
c. Setting Conditions

age_above_30 = df_combined[df_combined['Age'] > 30]
print(age_above_30)
a. Filling missing values
b. Sorting values
c. groupby()

a. Filling Missing Values
Use the .fillna() method to replace missing (NaN) values with a string.
import pandas as pd
import numpy as np

# Sample DataFrame (illustrative values)
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 32, 29],
    'City': ['Delhi', np.nan, 'Mumbai']
}
df = pd.DataFrame(data)
df_filled = df.fillna('Unknown')
print(df_filled)
b. Sorting Values
Use .sort_values() to sort rows by a column.

df_sorted = df.sort_values(by='Age')
print(df_sorted)
c. Using groupby()
Group the DataFrame by a column and apply aggregate functions like .sum(), .mean(), etc.
# Example DataFrame (illustrative values)
data = {
    'Department': ['HR', 'IT', 'HR', 'IT'],
    'Salary': [40000, 60000, 45000, 65000]
}
df2 = pd.DataFrame(data)
grouped = df2.groupby('Department')['Salary'].mean()
print(grouped)
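Since groupby() can apply several aggregates at once, here is a short illustrative extension reusing df2 from above (not part of the original example):

# Multiple aggregates per department
print(df2.groupby('Department')['Salary'].agg(['mean', 'sum', 'max']))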
a. Text files
b. CSV files
c. Excel files
d. JSON files
a. Text Files
If the text file is delimited (e.g., tab, space), use read_csv() with a delimiter:
import pandas as pd

# 'file.txt' is a placeholder file name; adjust the delimiter to match the file
df_txt = pd.read_csv('file.txt', delimiter='\t')
print(df_txt.head())
b. CSV Files
df_csv = pd.read_csv('file.csv')
c. Excel Files
df_excel = pd.read_excel('file.xlsx')

d. JSON Files
df_json = pd.read_json('file.json')
a. Pickle files
a. Pickle Files
import pickle

# 'data.pkl' is a placeholder file name
with open('data.pkl', 'rb') as file:
    data = pickle.load(file)
print(data)
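For the load above to work, a pickle file has to exist first. A minimal sketch that writes one (the file name data.pkl and the sample dictionary are assumptions):

import pickle

sample = {'name': 'Alice', 'score': 95}
with open('data.pkl', 'wb') as file:
    pickle.dump(sample, file)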
b. Image Files using PIL (Pillow)
PIL (now maintained as Pillow) is used for opening, manipulating, and saving image files.
from PIL import Image

img = Image.open('image.jpg')
print(img.size)  # (width, height)
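Since PIL also supports manipulating and saving images, a brief illustrative continuation (the output file name is an assumption):

# Resize the image and save it under a new name
resized = img.resize((200, 200))
resized.save('image_resized.jpg')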
The glob module finds all the pathnames matching a specified pattern.
import glob
file_list = glob.glob('path/to/directory/*.txt')
# Read them
for file_name in file_list:
    with open(file_name) as file:
        content = file.read()
        print(content)
Using sqlite3 (for SQLite) or other database connectors like psycopg2 (PostgreSQL), pyodbc (SQL
Server), or mysql.connector.
import sqlite3
# Connect to database
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
# Execute a query ('table_name' is a placeholder)
cursor.execute('SELECT * FROM table_name')

# Fetch data
rows = cursor.fetchall()
for row in rows:
    print(row)
# Close connection
conn.close()
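Query results can also be pulled straight into a pandas DataFrame, which fits the rest of this manual. A sketch assuming the same example.db and a placeholder table name:

import sqlite3
import pandas as pd

conn = sqlite3.connect('example.db')
df = pd.read_sql_query('SELECT * FROM table_name', conn)  # 'table_name' is a placeholder
conn.close()
print(df.head())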
import requests
from bs4 import BeautifulSoup

# Target URL
url = 'http://quotes.toscrape.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
for quote in soup.find_all('div', class_='quote'):
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    print(f'"{text}" - {author}')
Sample Output:
“The world as we have created it is a process of our thinking. It cannot be changed without changing
our thinking.” - Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.” - J.K. Rowling
11. Perform following preprocessing techniques on loan prediction dataset a. Feature Scaling
b. Feature Standardization c. Label Encoding d. One Hot Encoding
Let's go through each preprocessing technique using Python and the pandas, scikit-learn, and numpy libraries. These techniques are often applied to prepare data for machine learning models.
Here’s how we can perform each technique on the "loan prediction dataset":
1. Feature Scaling: This involves scaling features so they are within a similar range.
Typically, this is done using MinMaxScaler or StandardScaler.
2. Feature Standardization: Standardization scales the features so that they have a
mean of 0 and a standard deviation of 1, using StandardScaler.
3. Label Encoding: For categorical labels (target variable), we encode them as numeric
labels using LabelEncoder.
4. One-Hot Encoding: For categorical features, we create dummy variables (binary
columns) for each unique value in the categorical feature using OneHotEncoder or
pd.get_dummies().
Code Example:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder

# Sample loan prediction data (illustrative values)
data = {
    'LoanAmount': [120, 100, 150, 140, 250],
    'ApplicantIncome': [2500, 1800, 4000, 4500, 6000],
    'Loan_Status': ['Y', 'N', 'Y', 'Y', 'N'],
    'Credit_History': ['Good', 'Bad', 'Good', 'Good', 'Bad']
}
df = pd.DataFrame(data)

# ----------------------
# a. Feature Scaling (Min-Max Scaling)
# ----------------------
scaler = MinMaxScaler()
df[['LoanAmount', 'ApplicantIncome']] = scaler.fit_transform(df[['LoanAmount', 'ApplicantIncome']])

# ----------------------
# b. Feature Standardization
# ----------------------
standard_scaler = StandardScaler()
df[['LoanAmount', 'ApplicantIncome']] = standard_scaler.fit_transform(df[['LoanAmount', 'ApplicantIncome']])

# ----------------------
# c. Label Encoding
# ----------------------
label_encoder = LabelEncoder()
df['Loan_Status'] = label_encoder.fit_transform(df['Loan_Status'])

# ----------------------
# d. One Hot Encoding
# ----------------------
df = pd.get_dummies(df, columns=['Credit_History'], drop_first=True)

# ----------------------
# Final DataFrame
# ----------------------
print(df)
Explanation:
1. Feature Scaling:
o The MinMaxScaler() rescales each feature to the range [0, 1] using the formula: x_scaled = (x - min) / (max - min).
2. Feature Standardization:
o The StandardScaler() standardizes the features by transforming them to
have a mean of 0 and a standard deviation of 1.
o The formula is: z = (x - mean) / (standard deviation).
3. Label Encoding:
o The target variable Loan_Status (which is categorical: 'Y' for Yes and 'N' for
No) is encoded into 1 for 'Y' and 0 for 'N'.
4. One-Hot Encoding:
o The categorical column Credit_History is converted into binary dummy
variables: Credit_History_Good and Credit_History_Bad, dropping the
first column to avoid multicollinearity.
Output (the exact values depend on the sample data used):
LoanAmount ApplicantIncome Loan_Status Credit_History_Good
0 -0.183679 -1.414214 1 1
1 -0.490019 -1.632993 0 0
2 0.183679 0.000000 1 1
3 0.000000 0.707107 1 1
4 1.490019 1.414214 0 0
12. Perform following visualizations using matplotlib a. Bar Graph b. Pie Chart c. Box Plot d.
Histogram e. Line Chart and Subplots f. Scatter Plot
To perform the following visualizations using matplotlib, I'll walk you through how to
create each plot with code examples. You can run the code in a Python environment where
matplotlib is installed.
If it is not installed yet, install it first:
pip install matplotlib
a. Bar Graph
import matplotlib.pyplot as plt

# Data for bar graph (illustrative values)
categories = ['A', 'B', 'C', 'D']
values = [10, 24, 36, 18]
plt.bar(categories, values)
plt.title('Bar Graph Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
b. Pie Chart
# Data for pie chart
labels = ['Apples', 'Bananas', 'Cherries', 'Grapes']
sizes = [40, 30, 20, 10]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart Example')
plt.show()
c. Box Plot
import numpy as np

# Data for box plot (illustrative random data)
data = np.random.randn(100)
plt.boxplot(data)
plt.title('Box Plot Example')
plt.ylabel('Value')
plt.show()
d. Histogram
# Data for histogram
data = np.random.randn(1000)  # Random normal data
plt.hist(data, bins=30)
plt.title('Histogram Example')
plt.show()
e. Line Chart and Subplots
A line chart is used to represent data over a continuous range. Subplots are useful to display multiple charts together.
# Data for line chart
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Creating subplots
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
axes[0].plot(x, y1)
axes[0].set_title('Sine')
axes[1].plot(x, y2)
axes[1].set_title('Cosine')
plt.tight_layout()
plt.show()
f. Scatter Plot
A scatter plot is used to represent the relationship between two continuous variables.
# Data for scatter plot
x = np.random.rand(100)
y = np.random.rand(100)
plt.scatter(x, y, color='green')
plt.title('Scatter Plot Example')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Summary of Plots: bar graphs compare values across categories, pie charts show proportions of a whole, box plots summarize a distribution, histograms show the frequency distribution of data, line charts (and subplots) show trends over a continuous range, and scatter plots show the relationship between two variables.
To get started with the Natural Language Toolkit (NLTK), you first need to install it. You can
do this using pip, which is the Python package manager. Here's how you can install NLTK:

pip install nltk

After the installation is complete, you can start using NLTK in your Python scripts.
Once NLTK is installed, you can import it and start using it. Here's a simple example of
importing NLTK and downloading some resources that you'll need for common NLP tasks:
import nltk
nltk.download('punkt') # Downloads the tokenizer models
nltk.download('stopwords') # Downloads a list of common stop words
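A quick check that the downloaded resources work, using word_tokenize and the stop-word list (the sample sentence is illustrative):

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

words = word_tokenize("This is a simple NLTK test sentence.")
filtered = [w for w in words if w.lower() not in stopwords.words('english')]
print(filtered)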
14. Python program to implement with Python Sci Kit-Learn & NLTK
Sure! Below is an example of how to use Python's Sci-Kit Learn (sklearn) and NLTK
(Natural Language Toolkit) together to implement a simple text classification model.
We will use NLTK for text preprocessing and Sci-Kit Learn to train a machine learning
model (such as Naive Bayes classifier) to classify text.
Steps:
1. Load a labelled text dataset (the NLTK movie_reviews corpus is assumed here).
2. Preprocessing: split the data into features (text) and labels (categories).
3. Vectorize the text and train a Naive Bayes classifier with Sci-Kit Learn.
4. Evaluate the model on a held-out test set.
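A minimal sketch of this pipeline is shown below; the dataset (NLTK's movie_reviews corpus) and the variable names are assumptions, since the original listing is incomplete:

import nltk
from nltk.corpus import movie_reviews
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

nltk.download('movie_reviews')

# Step 1: Load the dataset as (text, label) pairs
reviews = [(movie_reviews.raw(fileid), category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]

# Step 2: Preprocessing
# Split data into features (text) and labels (categories)
texts, labels = zip(*reviews)

# Step 3: Vectorize the text (stop-word removal happens here) and split the data
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

# Step 4: Train a Naive Bayes classifier and evaluate it
model = MultinomialNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions) * 100:.2f}%")
print(classification_report(y_test, predictions))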
Explanation:
Libraries used:
- NLTK supplies the text data and preprocessing utilities (tokenization, stop words).
- Sci-Kit Learn supplies the vectorizer, the Naive Bayes classifier, and the evaluation metrics.
Output Example:
Accuracy: 79.25%
precision recall f1-score support
To implement a Natural Language Processing (NLP) program in Python, we can use libraries
like NLTK (Natural Language Toolkit), spaCy, or PyNLPI. These libraries provide a wide
range of functions to process text, including tokenization, named entity recognition (NER),
part-of-speech (POS) tagging, etc.
For this example, I'll walk you through a simple Python NLP program using NLTK and
spaCy.
1. Using NLTK:
NLTK is one of the most widely used Python libraries for text processing and analysis.
Below is an example Python program using NLTK to perform basic text processing tasks,
such as tokenization and POS tagging.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk import pos_tag

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Sample text
text = "Hello! My name is John. I work at OpenAI, and I love programming in
Python."
# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentences:", sentences)
# Word Tokenization
words = word_tokenize(text)
print("Words:", words)
# Part-of-Speech Tagging
pos_tags = pos_tag(words)
print("POS Tags:", pos_tags)
2. Using spaCy:
spaCy is another powerful NLP library that is more efficient and suited for production
environments. It provides various pre-trained models for tasks like tokenization, part-of-
speech tagging, named entity recognition, and more.
Install spaCy and download its small English model:
pip install spacy
python -m spacy download en_core_web_sm
import spacy

# Load the English model and process the sample text
nlp = spacy.load("en_core_web_sm")
text = "Hello! My name is John. I work at OpenAI, and I love programming in Python."
doc = nlp(text)
# Tokenization
print("Tokens:")
for token in doc:
    print(token.text)
# Part-of-Speech Tagging
print("\nPOS Tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")
3. Using PyNLPI:
PyNLPI is a lesser-known NLP library, primarily designed for linguistic processing tasks. If
you'd like to use PyNLPI, you can install it via:

pip install pynlpl
However, PyNLPI is not as popular as NLTK and spaCy, so many people prefer NLTK or
spaCy for most NLP tasks. Here's a sample program using PyNLPI for tokenization:
# Sample text
text = "Hello! My name is John. I work at OpenAI, and I love programming in Python."
Conclusion: NLTK, spaCy, and PyNLPI all provide tools for tokenization and basic text analysis. NLTK and spaCy are the most widely used; spaCy is generally preferred for production workloads, while NLTK is convenient for learning and experimentation.