DS Practical

The document outlines various practical assignments for a Data Science Lab course, focusing on the installation and study of data analytics tools, particularly Python and its libraries such as NumPy and scikit-learn. It includes aims for practical tasks such as implementing classification techniques (KNN, Naïve Bayes), clustering techniques (K-means, DBSCAN), and association rule mining (Eclat, Apriori), along with source code and output for each task. The document serves as a guide for students applying data analytics concepts with Python in real-world scenarios.


Name: Harshala Sonawane        Roll No: C-19
Class: SY-MCA        Subject Incharge: [Link] Jadhao
Subject: Data Science Lab        A.Y. 2024-25

Practical No.01
Aim: Installation and study of any one Data Analytics Tool/Framework.

Theory:

Programming languages are used to solve a wide variety of data problems. Here we focus on general-purpose, text-based languages: they use letters, numbers, and symbols, follow a formal syntax, and require the programmer to write software that ultimately solves the problem. Examples include C#, Java, PHP, Ruby, Julia, and Python, among many others on the market. This practical presents Python, one of the best tools for data analysts who also have coding knowledge.

PYTHON

KEY FEATURES:
- An open-source solution with simple coding processes and syntax, so it is fairly easy to learn.
- Integration with other languages such as C/C++, Java, PHP, and C#.
- Advanced analysis processes through machine learning and text mining.

Python is extremely accessible compared with other popular languages such as Java, and its syntax is relatively easy to learn, which makes it popular among users looking for an open-source solution and simple coding processes. In data analysis, Python is used for data crawling, cleaning, modeling, and constructing analysis algorithms based on business scenarios. One of its best features is its user-friendliness: programmers do not need to remember the architecture of the system or manage memory by hand, because Python is a high-level language that is not tied to the computer's local processor. Another notable feature is portability: the same code runs on several operating systems, such as Windows and macOS, without changes, so there is no need to rewrite it for each platform. An extensive set of modules, packages, and libraries makes Python a respected and widely used language across industries, with companies such as Spotify, Netflix, Dropbox, and Reddit among its best-known users. With features such as text mining and machine learning, Python is becoming a standard choice for advanced analysis processes.
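
As a brief illustrative sketch of such a workflow (the file name sales.csv and its column names are hypothetical, chosen only for this example), a few lines of pandas are enough to import, clean, and summarize a dataset:

import pandas as pd

# Hypothetical file and column names, used only to illustrate the workflow
df = pd.read_csv("sales.csv")                  # import the raw data
df = df.dropna()                               # cleaning: drop incomplete rows
df["revenue"] = df["price"] * df["quantity"]   # derive a new column
print(df.describe())                           # quick statistical summary
print(df.groupby("region")["revenue"].sum())   # simple aggregation by region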


Practical No.02

Aim: Write a Python program to demonstrate the use of NumPy.


Source Code:
import numpy as np

# Create a 1-D and a 2-D array
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([[1, 2], [3, 4]])

print("Array 1:", arr1)
print("Array 2:\n", arr2)

# Broadcasting: add a scalar to every element
arr3 = arr1 + 10
print("Array 1 + 10:", arr3)

# Aggregate functions
print("Mean of Array 1:", np.mean(arr1))
print("Sum of Array 2:", np.sum(arr2))

# Reshape the 1-D array into a 2x2 matrix
reshaped = arr1.reshape(2, 2)
print("Reshaped Array 1:\n", reshaped)

Output:
Array 1: [1 2 3 4]
Array 2:
[[1 2]
[3 4]]
Array 1 + 10: [11 12 13 14]
Mean of Array 1: 2.5
Sum of Array 2: 10
Reshaped Array 1:
[[1 2]
[3 4]]


Practical No.03
Aim: Design and develop at least 10 problem statements that demonstrate the use of data structures, functions, and importing/exporting data in any data analytics tool.

Source Code :
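
The listing below is an illustrative sketch only, not the submitted solution: it shows how one of the ten problems could combine a list of dictionaries (data structure), a small function, and CSV export/import with pandas. The records and the file name students_out.csv are hypothetical.

import pandas as pd

# Data structure: a list of dictionaries holding student records
records = [
    {"name": "Asha", "marks": 78},
    {"name": "Ravi", "marks": 65},
    {"name": "Meera", "marks": 88},
]

# Function: classify a student based on marks
def grade(marks):
    return "Distinction" if marks >= 75 else "Pass"

df = pd.DataFrame(records)              # build a DataFrame from the list
df["grade"] = df["marks"].apply(grade)  # apply the function to a column

df.to_csv("students_out.csv", index=False)  # exporting data to CSV
df2 = pd.read_csv("students_out.csv")       # importing it back
print(df2)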

Output:

Practical No.04
Aim: Design and develop at least 5 problem statements that demonstrate the use of control structures in any data analytics tool.
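
As an illustrative sketch only (not the submitted work), one such problem could bin hypothetical exam scores with a for loop and an if/elif/else ladder, and then use a while loop:

# Control structures: for loop, if/elif/else ladder, and a while loop
scores = [35, 72, 90, 58, 44]   # hypothetical exam scores

counts = {"fail": 0, "pass": 0, "merit": 0}
for s in scores:
    if s < 40:
        counts["fail"] += 1
    elif s < 75:
        counts["pass"] += 1
    else:
        counts["merit"] += 1
print(counts)

# while loop: how many scores must be summed before the total crosses 150
total, i = 0, 0
while total < 150 and i < len(scores):
    total += scores[i]
    i += 1
print("Scores needed to cross 150:", i)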


Practical No.05
Aim: Implement the KNN classification technique using any data analytics tool.
Source Code:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset and split it 50/50 into train and test sets
iris = load_iris()
x, y = iris.data, iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=52)

# Fit a 3-nearest-neighbour classifier and evaluate it on the held-out set
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy", accuracy)

Output:
Accuracy 0.96


Practical No.06
Aim: Implement the Naïve Bayes classification technique using any data analytics tool.
Source code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset and hold out 20% of it for testing
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Gaussian Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate the predictions
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)


Output:
Accuracy: 1.00
Confusion Matrix:
[[10 0 0]
[ 0 8 0]
[ 0 0 12]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         8
           2       1.00      1.00      1.00        12

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


Practical No.07
Aim: Implement the K-means clustering technique using any data analytics tool.
Source code:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate three Gaussian blobs as synthetic 2-D data
np.random.seed(42)
X = np.vstack([
    np.random.normal(loc=[0, 0], scale=1.0, size=(100, 2)),
    np.random.normal(loc=[5, 5], scale=1.0, size=(100, 2)),
    np.random.normal(loc=[-5, 5], scale=1.0, size=(100, 2))
])

# Fit K-means with 3 clusters
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
kmeans.fit(X)
centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Plot the points colored by cluster label, with the centroids marked
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, marker='X', label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()

Output:
(Scatter plot of the three generated clusters colored by K-means label, with red 'X' markers at the centroids.)

Practical No.08
Aim: Implement the DBSCAN clustering technique using any data analytics tool.
Source code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Generate the two-moons dataset (non-convex clusters)
X, _ = make_moons(n_samples=300, noise=0.1, random_state=42)

# Cluster with DBSCAN: eps is the neighbourhood radius,
# min_samples the minimum points needed to form a dense region
dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)

# Plot the points colored by cluster label (-1 marks noise)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k', s=50)
plt.title('DBSCAN Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
Output:
(Scatter plot of the two-moons data colored by DBSCAN cluster label.)

Practical No.09
Aim: Implement the Eclat association rule mining technique using any data analytics tool.
Source code:
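
The listing below is an illustrative sketch only, not the submitted program. Eclat works on a vertical data layout: each item is mapped to the set of transaction IDs (TID set) that contain it, and frequent itemsets are found by intersecting these TID sets depth-first. The transactions and the absolute support threshold here are hypothetical.

# Eclat sketch: vertical TID-set representation with recursive intersection
transactions = [
    {'Milk', 'Bread', 'Eggs'},
    {'Bread', 'Diapers', 'Beer'},
    {'Milk', 'Diapers', 'Beer', 'Cola'},
    {'Bread', 'Milk', 'Diapers'},
]
min_support = 2  # itemset must appear in at least 2 transactions

# Build the vertical database: item -> set of transaction IDs containing it
tidsets = {}
for tid, items in enumerate(transactions):
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

def eclat(prefix, items, results):
    # Extend the current prefix with each remaining item whose TID-set
    # intersection still meets the support threshold, then recurse.
    while items:
        item, tids = items.pop()
        if len(tids) >= min_support:
            new_prefix = prefix + (item,)
            results[new_prefix] = len(tids)
            suffix = [(other, tids & other_tids) for other, other_tids in items]
            eclat(new_prefix, suffix, results)

results = {}
eclat((), sorted(tidsets.items()), results)
for itemset, support in sorted(results.items()):
    print(itemset, support)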


Practical No.10
Aim: Implement the Apriori association rule mining technique using any data analytics tool.
Source code:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Small transactional dataset
data = {
    'Transaction_ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Items': [
        ['Milk', 'Bread', 'Eggs'],
        ['Bread', 'Diapers', 'Beer', 'Eggs'],
        ['Milk', 'Diapers', 'Beer', 'Cola'],
        ['Bread', 'Milk', 'Diapers'],
        ['Bread', 'Milk', 'Cola'],
        ['Diapers', 'Cola'],
        ['Bread', 'Eggs'],
        ['Milk', 'Diapers', 'Beer']
    ]
}
df = pd.DataFrame(data)

# One-hot encode the item lists (one column per item, 1 if present)
onehot = df['Items'].str.join('|').str.get_dummies()

# Mine frequent itemsets and derive association rules
min_support = 0.25
frequent_itemsets = apriori(onehot, min_support=min_support, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.25)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)

Output:
Frequent Itemsets:
support itemsets
0 0.375 (Beer)
1 0.625 (Bread)
2 0.375 (Cola)
3 0.625 (Diapers)
4 0.375 (Eggs)
5 0.625 (Milk)
6 0.375 (Beer, Diapers)
7 0.250 (Beer, Milk)
8 0.250 (Bread, Diapers)
9 0.375 (Bread, Eggs)
10 0.375 (Bread, Milk)
11 0.250 (Cola, Diapers)
12 0.250 (Cola, Milk)
13 0.375 (Diapers, Milk)
14 0.250 (Beer, Diapers, Milk)

Association Rules:
antecedents consequents antecedent support consequent support \
0 (Beer) (Diapers) 0.375 0.625
1 (Diapers) (Beer) 0.625 0.375
2 (Beer) (Milk) 0.375 0.625
3 (Milk) (Beer) 0.625 0.375
4 (Bread) (Diapers) 0.625 0.625
5 (Diapers) (Bread) 0.625 0.625
6 (Bread) (Eggs) 0.625 0.375
7 (Eggs) (Bread) 0.375 0.625
8 (Bread) (Milk) 0.625 0.625

9 (Milk) (Bread) 0.625 0.625
10 (Cola) (Diapers) 0.375 0.625
11 (Diapers) (Cola) 0.625 0.375
12 (Cola) (Milk) 0.375 0.625
13 (Milk) (Cola) 0.625 0.375
14 (Diapers) (Milk) 0.625 0.625
15 (Milk) (Diapers) 0.625 0.625
16 (Beer, Diapers) (Milk) 0.375 0.625
17 (Beer, Milk) (Diapers) 0.250 0.625
18 (Diapers, Milk) (Beer) 0.375 0.375
19 (Beer) (Diapers, Milk) 0.375 0.375
20 (Diapers) (Beer, Milk) 0.625 0.250
21 (Milk) (Beer, Diapers) 0.625 0.375

support confidence lift leverage conviction zhangs_metric


0 0.375 1.000000 1.600000 0.140625 inf 0.600000
1 0.375 0.600000 1.600000 0.140625 1.562500 1.000000
2 0.250 0.666667 1.066667 0.015625 1.125000 0.100000
3 0.250 0.400000 1.066667 0.015625 1.041667 0.166667
4 0.250 0.400000 0.640000 -0.140625 0.625000 -0.600000
5 0.250 0.400000 0.640000 -0.140625 0.625000 -0.600000
6 0.375 0.600000 1.600000 0.140625 1.562500 1.000000
7 0.375 1.000000 1.600000 0.140625 inf 0.600000
8 0.375 0.600000 0.960000 -0.015625 0.937500 -0.100000
9 0.375 0.600000 0.960000 -0.015625 0.937500 -0.100000
10 0.250 0.666667 1.066667 0.015625 1.125000 0.100000
11 0.250 0.400000 1.066667 0.015625 1.041667 0.166667
12 0.250 0.666667 1.066667 0.015625 1.125000 0.100000
13 0.250 0.400000 1.066667 0.015625 1.041667 0.166667
14 0.375 0.600000 0.960000 -0.015625 0.937500 -0.100000

15 0.375 0.600000 0.960000 -0.015625 0.937500 -0.100000
16 0.250 0.666667 1.066667 0.015625 1.125000 0.100000
17 0.250 1.000000 1.600000 0.093750 inf 0.500000
18 0.250 0.666667 1.777778 0.109375 1.875000 0.700000
19 0.250 0.666667 1.777778 0.109375 1.875000 0.700000
20 0.250 0.400000 1.600000 0.093750 1.250000 1.000000
21 0.250 0.400000
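
As a quick sanity check of the metrics above, take rule 0, (Beer) -> (Diapers): confidence = support(Beer, Diapers) / support(Beer) = 0.375 / 0.375 = 1.0, and lift = confidence / support(Diapers) = 1.0 / 0.625 = 1.6, which matches the table; conviction is infinite because the rule holds with 100% confidence.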


Practical No.11
Aim: Visualize all the statistical measures (mean, mode, median, range, interquartile range, etc.) using histograms, boxplots, scatter plots, etc.
Source code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Generate 1000 normally distributed values (mean 50, standard deviation 10)
np.random.seed(42)
data = np.random.normal(loc=50, scale=10, size=1000)
df = pd.DataFrame(data, columns=['Value'])

# Compute the statistical measures
mean = df['Value'].mean()
median = df['Value'].median()
mode = df['Value'].mode()[0]  # mode() can return multiple values; take the first
data_range = df['Value'].max() - df['Value'].min()
q1 = df['Value'].quantile(0.25)
q3 = df['Value'].quantile(0.75)
iqr = q3 - q1

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Range: {data_range}")
print(f"IQR: {iqr}")

plt.figure(figsize=(15, 5))

# Histogram with mean, median and mode marked
plt.subplot(1, 3, 1)
sns.histplot(df['Value'], bins=30, kde=True)
plt.axvline(mean, color='red', linestyle='dashed', linewidth=1, label='Mean')
plt.axvline(median, color='blue', linestyle='dashed', linewidth=1, label='Median')
plt.axvline(mode, color='green', linestyle='dashed', linewidth=1, label='Mode')
plt.title('Histogram with Mean, Median, Mode')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

# Boxplot (shows the median, quartiles and IQR)
plt.subplot(1, 3, 2)
sns.boxplot(y=df['Value'])
plt.title('Boxplot')
plt.ylabel('Value')

# Scatter plot with mean and median as horizontal reference lines
plt.subplot(1, 3, 3)
plt.scatter(range(len(df)), df['Value'], alpha=0.5)
plt.axhline(mean, color='red', linestyle='dashed', linewidth=1, label='Mean')
plt.axhline(median, color='blue', linestyle='dashed', linewidth=1, label='Median')
plt.title('Scatter Plot with Mean and Median')
plt.xlabel('Index')
plt.ylabel('Value')
plt.legend()

plt.tight_layout()
plt.show()

Output:
Mean: 50.193320558223256
Median: 50.25300612234888
Mode: 17.58732659930927
Range: 70.93998830723794
IQR: 12.955341809352817
