De Lab Manual New

The document provides a comprehensive guide on creating and manipulating NumPy arrays and performing operations using pandas. It covers topics such as creating different types of arrays, reshaping, indexing, and performing mathematical operations, as well as data manipulation techniques in pandas like concatenation, filtering, and reading various file formats. Additionally, it includes examples of web scraping using Python with libraries like requests and BeautifulSoup.

1. Creating a NumPy Array

a. Basic ndarray

b. Array of zeros

c. Array of ones

d. Random numbers in ndarray

e. An array of your choice

f. Identity matrix (I matrix) in NumPy

g. Evenly spaced ndarray

Here’s how you can create different types of NumPy arrays in Python using the numpy library:

import numpy as np

# a. Basic ndarray

basic_array = np.array([1, 2, 3, 4, 5])

print("a. Basic ndarray:\n", basic_array)

# b. Array of zeros

zeros_array = np.zeros((3, 4)) # 3 rows, 4 columns

print("\nb. Array of zeros:\n", zeros_array)

# c. Array of ones

ones_array = np.ones((2, 5)) # 2 rows, 5 columns

print("\nc. Array of ones:\n", ones_array)

# d. Random numbers in ndarray

random_array = np.random.rand(3, 3) # 3x3 array with random floats between 0 and 1

print("\nd. Random numbers in ndarray:\n", random_array)

# e. An array of your choice

custom_array = np.array([[10, 20], [30, 40]])

print("\ne. An array of your choice:\n", custom_array)

# f. Identity matrix in NumPy (often called "Imatrix")

identity_matrix = np.eye(4) # 4x4 identity matrix


print("\nf. Identity matrix (Imatrix) in NumPy:\n", identity_matrix)

# g. Evenly spaced ndarray

evenly_spaced_array = np.linspace(0, 10, 6) # 6 values evenly spaced from 0 to 10

print("\ng. Evenly spaced ndarray:\n", evenly_spaced_array)

2. The Shape and Reshaping of NumPy Array

a. Dimensions of NumPy array

b. Shape of NumPy array

c. Size of NumPy array

d. Reshaping a NumPy array

e. Flattening a NumPy array

f. Transpose of a NumPy array

2. The Shape and Reshaping of NumPy Array

a. Dimensions of NumPy Array

The number of axes (also called rank) of an array.

Use .ndim to get the number of dimensions.

Example:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim) # Output: 2 (2D array)

b. Shape of NumPy Array

A tuple showing the size of the array along each dimension.

Use .shape to get or set the shape.

Example:

print(arr.shape) # Output: (2, 3)

c. Size of NumPy Array

The total number of elements in the array.


Use .size to get it.

Example:

print(arr.size) # Output: 6 (2*3)

d. Reshaping a NumPy Array

Changing the shape without changing the data.

Use .reshape(new_shape) method.

Example:

reshaped = arr.reshape(3, 2)

print(reshaped)

# Output:

# [[1 2]

# [3 4]

# [5 6]]

e. Flattening a NumPy Array

Converts a multi-dimensional array into a 1D array.

Use .flatten() or .ravel().

Example:

flat = arr.flatten()

print(flat) # Output: [1 2 3 4 5 6]

f. Transpose of a NumPy Array

Swaps the axes of the array (rows become columns and vice versa).

Use .T or .transpose().

Example:

transposed = arr.T

print(transposed)

# Output:

# [[1 4]
# [2 5]

# [3 6]]

3. Expanding and Squeezing a NumPy Array

a. Expanding a NumPy array

b. Squeezing a NumPy array

c. Sorting in NumPy Arrays

3a. Expanding a NumPy Array

Expanding an array typically means adding dimensions, often to make it compatible with operations
like broadcasting.

Using np.expand_dims()


import numpy as np

arr = np.array([1, 2, 3]) # Shape: (3,)

expanded = np.expand_dims(arr, axis=0) # Shape: (1, 3)

Using None or np.newaxis

expanded = arr[np.newaxis, :] # Shape: (1, 3)

expanded = arr[:, np.newaxis] # Shape: (3, 1)

3b. Squeezing a NumPy Array

Squeezing means removing dimensions of size 1.

Using np.squeeze()

arr = np.array([[[1], [2], [3]]]) # Shape: (1, 3, 1)

squeezed = np.squeeze(arr) # Shape: (3,)

You can also specify an axis:

squeezed = np.squeeze(arr, axis=0) # Only squeeze axis 0

3c. Sorting in NumPy Arrays

You can sort an array either in-place or return a sorted copy.


Using np.sort() (returns a sorted copy)

arr = np.array([3, 1, 2])

sorted_arr = np.sort(arr) # [1, 2, 3]

For 2D arrays:

arr2d = np.array([[3, 1], [2, 4]])

np.sort(arr2d, axis=0) # Sorts each column

np.sort(arr2d, axis=1) # Sorts each row

Using .sort() method (in-place)

arr.sort()

Getting sorted indices with np.argsort()

arr = np.array([3, 1, 2])

indices = np.argsort(arr) # [1, 2, 0]
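The indices returned by np.argsort() can be used to reorder the original array (or to sort one array by the values of another):

print(arr[indices]) # [1 2 3] -- same result as np.sort(arr)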

4. Indexing and Slicing of NumPy Array

a. Slicing 1-D NumPy arrays

b. Slicing 2-D NumPy arrays

c. Slicing 3-D NumPy arrays

d. Negative slicing of NumPy arrays

4. Indexing and Slicing of NumPy Array

a. Slicing 1-D NumPy Arrays

A 1D NumPy array is similar to a list.

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[1:4]) # Output: [20 30 40]

print(arr[:3]) # Output: [10 20 30]

print(arr[2:]) # Output: [30 40 50]


b. Slicing 2-D NumPy Arrays

2D arrays require slicing both rows and columns: array[row_start:row_end, col_start:col_end]

arr2d = np.array([[1, 2, 3],

[4, 5, 6],

[7, 8, 9]])

print(arr2d[0:2, 1:3]) # Output: [[2 3], [5 6]]

print(arr2d[:, 0]) # Output: [1 4 7] (all rows, first column)

print(arr2d[1, :]) # Output: [4 5 6] (second row, all columns)

c. Slicing 3-D NumPy Arrays

3D arrays are sliced using three indices: array[depth, row, column]

arr3d = np.array([[[1, 2], [3, 4]],

[[5, 6], [7, 8]]])

print(arr3d[0, :, :]) # Output: [[1 2], [3 4]] (1st matrix)

print(arr3d[:, 1, :]) # Output: [[3 4], [7 8]] (2nd row from each matrix)

d. Negative Slicing of NumPy Arrays

Negative slicing allows reverse indexing.

arr = np.array([10, 20, 30, 40, 50])

print(arr[-3:]) # Output: [30 40 50]

print(arr[::-1]) # Output: [50 40 30 20 10] (reverse array)

2D example:

arr2d = np.array([[1, 2, 3],

[4, 5, 6],

[7, 8, 9]])

print(arr2d[::-1, ::-1])

# Output:

# [[9 8 7]

# [6 5 4]
# [3 2 1]]

5. Stacking and Concatenating Numpy Arrays

a. Stacking ndarrays

b. Concatenating ndarrays

c. Broadcasting in Numpy Arrays

5. Stacking and Concatenating Numpy Arrays

a. Stacking ndarrays

Stacking refers to joining arrays along a new axis. There are a few key functions for stacking in
NumPy:

np.stack()

Combines arrays along a new axis.

import numpy as np

a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

stacked = np.stack((a, b)) # default is axis=0

print(stacked)

# Output:

# [[1 2 3]

# [4 5 6]]

np.vstack()

Stacks arrays vertically (row-wise).

np.vstack((a, b))

# Output:

# [[1 2 3]

# [4 5 6]]

np.hstack()

Stacks arrays horizontally (column-wise).


np.hstack((a, b))

# Output:

# [1 2 3 4 5 6]

np.dstack()

Stacks arrays depth-wise (3rd dimension).

np.dstack((a, b))

# Output:

# [[[1 4]

# [2 5]

# [3 6]]]

b. Concatenating ndarrays

Concatenation joins arrays along an existing axis (unlike stacking which creates a new one).

np.concatenate()

a = np.array([[1, 2], [3, 4]])

b = np.array([[5, 6]])

# Concatenate along axis 0 (row-wise)

np.concatenate((a, b), axis=0)

# Output:

# [[1 2]

# [3 4]

# [5 6]]

You can also concatenate along other axes if dimensions match.
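For example, concatenating along axis 1 (column-wise) works when the arrays have the same number of rows:

a = np.array([[1, 2], [3, 4]])

c = np.array([[7], [8]])

np.concatenate((a, c), axis=1)

# Output:

# [[1 2 7]

# [3 4 8]]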

c. Broadcasting in NumPy Arrays

Broadcasting allows NumPy to perform operations on arrays of different shapes in a memory-efficient way.

Rules of Broadcasting:

If the arrays have a different number of dimensions, the shape of the smaller one is padded with 1s on the left.
Along each dimension, the sizes must either be equal, or one of them must be 1.

Example 1: Adding a scalar to an array

a = np.array([1, 2, 3])

b = 10

a + b # b is "broadcast" to [10, 10, 10]

# Output: [11 12 13]

Example 2: Adding a column vector to a matrix

a = np.array([[1, 2, 3],

[4, 5, 6]])

b = np.array([[10], [20]])

# b is broadcast to match a's shape

a + b

# Output:

# [[11 12 13]

# [24 25 26]]

6. Perform following operations using pandas

a. Creating dataframe

b. concat()

c. Setting conditions

d. Adding a new column

Step 1: Import Pandas

import pandas as pd

a. Creating a DataFrame

Let's create two sample DataFrames:

data1 = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}

data2 = {'Name': ['Charlie', 'David'], 'Age': [35, 40]}


df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

print(df1)

print(df2)

b. Using concat() to Combine DataFrames

Concatenate df1 and df2 vertically:

df_combined = pd.concat([df1, df2], ignore_index=True)

print(df_combined)

c. Setting Conditions

Filter rows where age is greater than 30:

age_above_30 = df_combined[df_combined['Age'] > 30]

print(age_above_30)

d. Adding a New Column

Add a new column called "Is_Adult" based on age:

df_combined['Is_Adult'] = df_combined['Age'] >= 18

print(df_combined)

7. Perform following operations using pandas

a. Filling NaN with string

b. Sorting based on column values

c. groupby()

a. Filling NaN with a String

Use the .fillna() method to replace missing (NaN) values with a string.

import pandas as pd

# Sample DataFrame

data = {

'Name': ['Alice', 'Bob', None, 'David'],

'Age': [25, None, 30, 22]

}
df = pd.DataFrame(data)

# Fill NaN with a string

df_filled = df.fillna('Unknown')

print(df_filled)

b. Sorting Based on Column Values

Use the .sort_values() method to sort by a specific column.

# Sort DataFrame by 'Age' column

df_sorted = df.sort_values(by='Age')

print(df_sorted)

c. Using groupby()

Group the DataFrame by a column and apply aggregate functions like .sum(), .mean(), etc.

# Example DataFrame

data = {

'Department': ['HR', 'IT', 'HR', 'IT', 'Finance'],

'Salary': [40000, 60000, 45000, 62000, 50000]

}

df2 = pd.DataFrame(data)

# Group by 'Department' and get average salary

grouped = df2.groupby('Department')['Salary'].mean()

print(grouped)

8. Read the following file formats using pandas

a. Text files

b. CSV files

c. Excel files

d. JSON files

a. Text Files

If the text file is delimited (e.g., tab, space), use read_csv() with a delimiter:
import pandas as pd

# Example: space-delimited text file

df_text = pd.read_csv('file.txt', delimiter=' ')
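If the text file is tab-delimited instead, pass the tab character as the separator (the file name here is only an example):

df_tab = pd.read_csv('file.txt', sep='\t')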

b. CSV Files

CSV (Comma-Separated Values) files are read with:

df_csv = pd.read_csv('file.csv')

c. Excel Files

You can read Excel files using:

df_excel = pd.read_excel('file.xlsx') # Default is the first sheet

# You can specify a sheet name:

# df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')

d. JSON Files

JSON files are read with:

df_json = pd.read_json('file.json')

9. Read the following file formats

a. Pickle files

b. Image files using PIL

c. Multiple files using Glob

d. Importing data from database

a. Pickle Files

Python's pickle module is used to serialize and deserialize Python objects.

import pickle

# Reading a pickle file

with open('data.pkl', 'rb') as file:
    data = pickle.load(file)

print(data)
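Since pickle also handles serialization, writing an object to a pickle file follows the same pattern (the file name is just an example):

# Writing (serializing) a Python object to a pickle file
with open('data_out.pkl', 'wb') as file:
    pickle.dump(data, file)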
b. Image Files using PIL (Pillow)

PIL (now maintained as Pillow) is used for opening, manipulating, and saving image files.

from PIL import Image

# Open an image file

img = Image.open('image.jpg')

img.show() # To display the image
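Pillow can also manipulate and save images, as mentioned above; a minimal sketch (the output file name is just an example):

# Resize the image and save the result under a new name
resized = img.resize((200, 200))
resized.save('image_resized.jpg')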

c. Multiple Files using Glob

The glob module finds all the pathnames matching a specified pattern.

import glob

# Get all .txt files in a directory

file_list = glob.glob('path/to/directory/*.txt')

# Read them

for filename in file_list:
    with open(filename, 'r') as file:
        content = file.read()
        print(content)

d. Importing Data from a Database

Using sqlite3 (for SQLite) or other database connectors like psycopg2 (PostgreSQL), pyodbc (SQL
Server), or mysql.connector.

Example with SQLite:

import sqlite3

# Connect to database

conn = sqlite3.connect('example.db')

cursor = conn.cursor()

# Execute a query

cursor.execute('SELECT * FROM users')

# Fetch data

rows = cursor.fetchall()
for row in rows:
    print(row)

# Close connection

conn.close()
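Since the rest of this manual works with pandas, the same query result can also be loaded directly into a DataFrame using pandas.read_sql_query(); a minimal sketch assuming the same example.db and users table as above:

import sqlite3

import pandas as pd

# Read the query result straight into a DataFrame
conn = sqlite3.connect('example.db')

df_db = pd.read_sql_query('SELECT * FROM users', conn)

conn.close()

print(df_db.head())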

10. Demonstrate web scraping using python

Step-by-step Web Scraping Example in Python

1. Install Required Libraries

pip install requests beautifulsoup4

2. Python Script: Scrape Quotes

import requests

from bs4 import BeautifulSoup

# Target URL

url = 'http://quotes.toscrape.com/'

# Send GET request

response = requests.get(url)

# Parse HTML content using BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')

# Find all quote containers

quotes = soup.find_all('div', class_='quote')

# Loop through and extract quote text and author

for quote in quotes:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    print(f'"{text}" - {author}')

Sample Output:

“The world as we have created it is a process of our thinking. It cannot be changed without changing
our thinking.” - Albert Einstein

“It is our choices, Harry, that show what we truly are, far more than our abilities.” - J.K. Rowling
11. Perform following preprocessing techniques on loan prediction dataset

a. Feature Scaling

b. Feature Standardization

c. Label Encoding

d. One Hot Encoding

Let's go through each preprocessing technique using Python and the pandas, scikit-learn, and numpy libraries. These techniques are often applied to prepare data for machine learning models.

Here’s how we can perform each technique on the "loan prediction dataset":

1. Feature Scaling: This involves scaling features so they are within a similar range.
Typically, this is done using MinMaxScaler or StandardScaler.
2. Feature Standardization: Standardization scales the features so that they have a
mean of 0 and a standard deviation of 1, using StandardScaler.
3. Label Encoding: For categorical labels (target variable), we encode them as numeric
labels using LabelEncoder.
4. One-Hot Encoding: For categorical features, we create dummy variables (binary
columns) for each unique value in the categorical feature using OneHotEncoder or
pd.get_dummies().

Code Example:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler,
LabelEncoder
from sklearn.model_selection import train_test_split

# Load the dataset (assuming you have it in a .csv file)


# df = pd.read_csv('loan_prediction.csv')

# For demonstration, let's assume we have a small dataset like below:


data = {
'LoanAmount': [200, 150, 300, 250, 500],
'ApplicantIncome': [5000, 4000, 6000, 7000, 8000],
'Credit_History': ['Good', 'Bad', 'Good', 'Good', 'Bad'],
'Loan_Status': ['Y', 'N', 'Y', 'Y', 'N']
}

df = pd.DataFrame(data)

# ----------------------
# a. Feature Scaling (Min-Max Scaling)
# ----------------------

scaler = MinMaxScaler()

# Scale features - we will scale numerical columns


df[['LoanAmount', 'ApplicantIncome']] = scaler.fit_transform(df[['LoanAmount', 'ApplicantIncome']])

# ----------------------
# b. Feature Standardization
# ----------------------
standard_scaler = StandardScaler()

# Standardize features - we standardize the same numerical columns


df[['LoanAmount', 'ApplicantIncome']] = standard_scaler.fit_transform(df[['LoanAmount', 'ApplicantIncome']])

# ----------------------
# c. Label Encoding
# ----------------------

label_encoder = LabelEncoder()

# Apply Label Encoding on the target variable (Loan_Status)


df['Loan_Status'] = label_encoder.fit_transform(df['Loan_Status'])

# ----------------------
# d. One Hot Encoding
# ----------------------

# Apply One Hot Encoding on the 'Credit_History' categorical feature


df = pd.get_dummies(df, columns=['Credit_History'], drop_first=True)

# ----------------------
# Final DataFrame
# ----------------------

print(df)

Explanation:

1. Feature Scaling (Min-Max Scaling):


o The MinMaxScaler() scales the values of LoanAmount and ApplicantIncome
to a range of 0 to 1.
o The formula is:

X_{\text{scaled}} = \frac{X - \min(X)}{\max(X) - \min(X)}

2. Feature Standardization:
o The StandardScaler() standardizes the features by transforming them to
have a mean of 0 and a standard deviation of 1.
o The formula is:

X_{\text{standardized}} = \frac{X - \mu}{\sigma}

where μ is the mean and σ is the standard deviation of the feature.

3. Label Encoding:
o The target variable Loan_Status (which is categorical: 'Y' for Yes and 'N' for
No) is encoded into 1 for 'Y' and 0 for 'N'.
4. One-Hot Encoding:
o The categorical column Credit_History is converted into dummy variables; with
drop_first=True the first category (Credit_History_Bad) is dropped to avoid
multicollinearity, leaving a single binary column Credit_History_Good.

Output:
Because standardization is applied after Min-Max scaling on the same columns, the final numeric values reflect the standardized data (mean 0, standard deviation 1). With the small example dataset above, the result is (recent pandas versions show the Credit_History_Good column as True/False instead of 1/0):

   LoanAmount  ApplicantIncome  Loan_Status  Credit_History_Good
0   -0.662085        -0.707107            1                    1
1   -1.075888        -1.414214            0                    0
2    0.165521         0.000000            1                    1
3   -0.248282         0.707107            1                    1
4    1.820734         1.414214            0                    0
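As noted earlier, scikit-learn's OneHotEncoder is an alternative to pd.get_dummies(); a minimal sketch on the same Credit_History values (the sparse_output parameter assumes scikit-learn 1.2 or newer; older versions use sparse=False instead):

from sklearn.preprocessing import OneHotEncoder

# Rebuild a small frame with the original Credit_History values from the example
credit = pd.DataFrame({'Credit_History': ['Good', 'Bad', 'Good', 'Good', 'Bad']})

encoder = OneHotEncoder(drop='first', sparse_output=False)
encoded = encoder.fit_transform(credit)

print(encoder.get_feature_names_out()) # ['Credit_History_Good']
print(encoded) # one column of 1.0 / 0.0 values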

12. Perform following visualizations using matplotlib

a. Bar Graph

b. Pie Chart

c. Box Plot

d. Histogram

e. Line Chart and Subplots

f. Scatter Plot

To perform the following visualizations using matplotlib, each plot is shown below with a code example. You can run the code in any Python environment where matplotlib is installed; install it first if needed:

pip install matplotlib

Now, let's proceed with the visualizations:

a. Bar Graph

A bar graph is used to represent categorical data with rectangular bars.

import matplotlib.pyplot as plt

# Data for bar graph


categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 30]

plt.bar(categories, values)
plt.title('Bar Graph Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

b. Pie Chart

A pie chart is used to represent proportions of a whole.

# Data for pie chart
labels = ['Apples', 'Bananas', 'Cherries', 'Grapes']
sizes = [40, 30, 20, 10]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart Example')
plt.show()

c. Box Plot

A box plot is used to visualize the distribution and outliers in a dataset.

import numpy as np

# Data for box plot


data = np.random.rand(10, 5) * 100 # Random data for illustration

plt.boxplot(data)
plt.title('Box Plot Example')
plt.ylabel('Value')
plt.show()

d. Histogram

A histogram is used to represent the frequency distribution of a dataset.

# Data for histogram
data = np.random.randn(1000) # Random normal data

plt.hist(data, bins=30, edgecolor='black')


plt.title('Histogram Example')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

e. Line Chart and Subplots

A line chart is used to represent data over a continuous range. Subplots are useful to display
multiple charts together.

# Data for line chart
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Creating subplots
fig, axes = plt.subplots(2, 1, figsize=(8, 6))

# First subplot (sine wave)


axes[0].plot(x, y1, label='sin(x)', color='blue')
axes[0].set_title('Sine Wave')
axes[0].set_xlabel('x')
axes[0].set_ylabel('sin(x)')
axes[0].legend()
# Second subplot (cosine wave)
axes[1].plot(x, y2, label='cos(x)', color='red')
axes[1].set_title('Cosine Wave')
axes[1].set_xlabel('x')
axes[1].set_ylabel('cos(x)')
axes[1].legend()

plt.tight_layout() # To prevent overlap


plt.show()

f. Scatter Plot

A scatter plot is used to represent the relationship between two continuous variables.

# Data for scatter plot
x = np.random.rand(100)
y = np.random.rand(100)

plt.scatter(x, y, color='green')
plt.title('Scatter Plot Example')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

Summary of Plots:

1. Bar Graph: Categorical data comparison.


2. Pie Chart: Proportions of a whole.
3. Box Plot: Distribution of data with quartiles.
4. Histogram: Frequency distribution of a dataset.
5. Line Chart and Subplots: Continuous data visualization with multiple subplots.
6. Scatter Plot: Relationship between two continuous variables.

13. Getting started with NLTK, install NLTK using PIP

To get started with the Natural Language Toolkit (NLTK), you first need to install it. You can
do this using pip, which is the Python package manager. Here's how you can install NLTK:

1. Open your command-line interface (Terminal, Command Prompt, or PowerShell).


2. Run the following command to install NLTK:

pip install nltk

After the installation is complete, you can start using NLTK in your Python scripts.

Example: Importing NLTK and Downloading Resources

Once NLTK is installed, you can import it and start using it. Here's a simple example of
importing NLTK and downloading some resources that you'll need for common NLP tasks:
import nltk
nltk.download('punkt') # Downloads the tokenizer models
nltk.download('stopwords') # Downloads a list of common stop words
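A quick check that the downloaded resources work; a minimal sketch using a made-up sentence:

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

text = "NLTK makes it easy to work with text in Python."

tokens = word_tokenize(text) # uses the 'punkt' tokenizer models
filtered = [w for w in tokens if w.lower() not in stopwords.words('english')] # uses 'stopwords'

print("Tokens:", tokens)
print("Without stop words:", filtered)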

14. Python program to implement with Python Scikit-Learn & NLTK

Below is an example of how to use Python's Scikit-Learn (sklearn) and NLTK (Natural Language Toolkit) together to implement a simple text classification model.

We will use NLTK for text preprocessing and Sci-Kit Learn to train a machine learning
model (such as Naive Bayes classifier) to classify text.

In this example, we’ll classify movie reviews as positive or negative.

Steps:

1. Load dataset (movie reviews dataset in this case).


2. Preprocess text using NLTK.
3. Train a Naive Bayes classifier using Sci-Kit Learn.
4. Evaluate the model.

Full Python Code:


import nltk
from nltk.corpus import movie_reviews
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Download necessary NLTK data


nltk.download('movie_reviews')
nltk.download('stopwords')

# Step 1: Prepare the dataset (movie reviews)


# Load movie reviews from NLTK
reviews = [(movie_reviews.raw(fileid), category) for fileid in movie_reviews.fileids() for category in movie_reviews.categories(fileid)]

# Step 2: Preprocessing
# Split data into features (text) and labels (categories)
texts, labels = zip(*reviews)

# Step 3: Text Vectorization


# We will use CountVectorizer to convert text data to a bag-of-words model
vectorizer = CountVectorizer(stop_words=nltk.corpus.stopwords.words('english'))
X = vectorizer.fit_transform(texts)

# Step 4: Train-test split


X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)

# Step 5: Train Naive Bayes classifier


classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# Step 6: Make Predictions


y_pred = classifier.predict(X_test)

# Step 7: Evaluate the model


accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")

# Step 8: Show Classification Report


print(metrics.classification_report(y_test, y_pred))

# Example: Predict new review


new_review = ["This movie was amazing, I loved it!"]
new_review_vectorized = vectorizer.transform(new_review)
prediction = classifier.predict(new_review_vectorized)
print(f"Prediction for new review: {prediction[0]}")

Explanation:

1. Dataset (Movie Reviews):


o NLTK provides a collection of movie reviews categorized as 'pos' (positive)
and 'neg' (negative). These reviews are used as the dataset.
2. Preprocessing:
o The raw text is converted to features by CountVectorizer, which tokenizes it internally;
NLTK's stopwords list (common words like 'the', 'and', etc.) is passed to it so those words are removed.
3. Text Vectorization:
o We use CountVectorizer from sklearn to convert the raw text into a
numerical feature set (bag of words model).
4. Train-Test Split:
o We split the dataset into a training set and a testing set (75% for training, 25%
for testing).
5. Training the Model:
o We use MultinomialNB (Naive Bayes classifier) to train the model on the text
data.
6. Prediction and Evaluation:
o We make predictions using the trained model and evaluate its performance
using metrics such as accuracy and classification report (precision, recall, and
F1-score).
7. Prediction for New Reviews:
o Finally, we show an example of how to predict the sentiment of a new movie
review.

Libraries used:

 NLTK (Natural Language Toolkit): For text preprocessing, tokenization, and stopwords.
 Scikit-Learn: For machine learning, vectorization, and model evaluation.

Output Example:
Accuracy: 79.25%
precision recall f1-score support

neg 0.79 0.79 0.79 247


pos 0.79 0.79 0.79 253

accuracy 0.79 500


macro avg 0.79 0.79 0.79 500
weighted avg 0.79 0.79 0.79 500

Prediction for new review: pos

15. Python program to implement with Python NLTK / spaCy / PyNLPl.

To implement a Natural Language Processing (NLP) program in Python, we can use libraries
like NLTK (Natural Language Toolkit), spaCy, or PyNLPl. These libraries provide a wide
range of functions to process text, including tokenization, named entity recognition (NER),
part-of-speech (POS) tagging, etc.

For this example, I'll walk you through a simple Python NLP program using NLTK and
spaCy.

1. Using NLTK:

NLTK is one of the most widely used Python libraries for text processing and analysis.
Below is an example Python program using NLTK to perform basic text processing tasks,
such as tokenization and POS tagging.

First, install NLTK:

pip install nltk

Then, here's a simple program that uses NLTK to process text:

import nltk

from nltk.tokenize import word_tokenize, sent_tokenize


from nltk import pos_tag

# Download necessary NLTK resources (only need to run once)


nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Sample text
text = "Hello! My name is John. I work at OpenAI, and I love programming in
Python."

# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentences:", sentences)

# Word Tokenization
words = word_tokenize(text)
print("Words:", words)
# Part-of-Speech Tagging
pos_tags = pos_tag(words)
print("POS Tags:", pos_tags)

Explanation of the Code:

 sent_tokenize(): Splits the text into sentences.


 word_tokenize(): Splits the text into words.
 pos_tag(): Assigns part-of-speech tags to each word (e.g., noun, verb).

2. Using spaCy:

spaCy is another powerful NLP library that is more efficient and suited for production
environments. It provides various pre-trained models for tasks like tokenization, part-of-
speech tagging, named entity recognition, and more.

First, install spaCy:

pip install spacy

Next, download a pre-trained model for English:

python -m spacy download en_core_web_sm

Now, here's a simple program using spaCy for NLP tasks:

import spacy

# Load the spaCy model


nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Hello! My name is John. I work at OpenAI, and I love programming in
Python."

# Process the text


doc = nlp(text)

# Tokenization
print("Tokens:")
for token in doc:
    print(token.text)

# Part-of-Speech Tagging
print("\nPOS Tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

# Named Entity Recognition


print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
Explanation of the Code:

 The text is processed using nlp(text).


 Tokenization is done automatically when we process the text using spaCy.
 Part-of-speech tagging is done via token.pos_.
 Named entity recognition is done via doc.ents which detects entities like people,
organizations, dates, etc.

3. Using PyNLPl:

PyNLPl is a lesser-known NLP library, primarily designed for linguistic processing tasks. If
you'd like to use PyNLPl, you can install it via:

pip install pynlpl

However, PyNLPl is not as popular as NLTK and spaCy, so many people prefer NLTK or
spaCy for most NLP tasks. Here's a sample program using PyNLPl for tokenization:

# Note: this sketch assumes PyNLPl's textprocessors module; the exact
# tokenization API may differ between PyNLPl versions.
from pynlpl.textprocessors import tokenize

# Sample text
text = "Hello! My name is John. I work at OpenAI, and I love programming in Python."

# Tokenize the text
tokens = tokenize(text)
print("Tokens:", tokens)

Conclusion:

 NLTK is great for educational purposes and prototyping.


 spaCy is faster and more suitable for production-grade NLP applications.
 PyNLPl can be useful for certain linguistic processing tasks but is not as widely used
as NLTK or spaCy.
