Python GTU Study Material E-Notes Unit-5

These notes cover performance analysis in Python: text analysis on sample sentences (about Darshan Engineering College in Rajkot, Gujarat) using scikit-learn's CountVectorizer, benchmarking a prime-number loop with %%timeit, memory profiling of CountVectorizer with memory_profiler, and comparing RandomForestClassifier setups with different n_jobs values to observe the effect of parallelization.





Sample documents:
    1. darshan is the best engineering college in rajkot
    2. rajkot is famous for engineering
    3. college is located in rajkot morbi road

Token index sequence of each document (each distinct word numbered 1-13 in order of first appearance):
    Doc 1: 1 2 3 4 5 6 7 8
    Doc 2: 8 2 9 10 5
    Doc 3: 6 2 11 7 8 12 13

Count vector of each document (one column per word index 1-13, entry = number of occurrences):
    Doc 1: 1 1 1 1 1 1 1 1 0 0 0 0 0
    Doc 2: 0 1 0 0 1 0 0 1 1 1 0 0 0
    Doc 3: 0 1 0 0 0 1 1 1 0 0 1 1 1

from sklearn.feature_extraction.text import CountVectorizer

a = ["darshan is the best engineering college in rajkot",
     "rajkot is famous for engineering",
     "college is located in rajkot morbi road"]

countVector = CountVectorizer()                 # tokenise text and count words
X_train_counts = countVector.fit_transform(a)   # learn vocabulary, build document-term matrix
print(countVector.vocabulary_)                  # word -> column index mapping
print(X_train_counts.toarray())                 # counts as a dense array

{'darshan': 2,
'is': 7,
'the': 12,
'best': 0,
'engineering': 3,
'college': 1,
'in': 6,
'rajkot': 10,
'famous': 4,
'for': 5,
'located': 8,
'morbi': 9,
'road': 11}
array([[1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]], dtype=int64)
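As a further illustration (a minimal sketch, not part of the original notes), the fitted vectorizer can map new text onto the same 13-word vocabulary with transform(); words never seen during fitting are simply ignored:

new_doc = ["darshan college is in rajkot city"]   # hypothetical sentence; "city" is not in the vocabulary
new_counts = countVector.transform(new_doc)       # reuse the vocabulary learned by fit_transform()
print(new_counts.toarray())                       # shape (1, 13); the unseen word "city" contributes nothing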



%%timeit -n 1000 -r 7
# collect all primes below 1000 by trial division
listPrime = []
for i in range(2, 1000):
    flag = 0
    for j in range(2, i):
        if i % j == 0:      # i has a divisor, so it is not prime
            flag = 1
            break
    if flag == 0:
        listPrime.append(i)
#print(listPrime)
5.17 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
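A common refinement (an illustrative sketch, not part of the original notes) is to test divisors only up to the square root of each candidate, since any composite number must have a factor no larger than its square root:

%%timeit -n 1000 -r 7
# sketch: trial division, but checking divisors only up to sqrt(i)
import math
listPrime = []
for i in range(2, 1000):
    is_prime = True
    for j in range(2, math.isqrt(i) + 1):
        if i % j == 0:
            is_prime = False
            break
    if is_prime:
        listPrime.append(i)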




%load_ext memory_profiler
from sklearn.feature_extraction.text import CountVectorizer
%memit countVector = CountVectorizer(stop_words='english', analyzer='word')

peak memory: 85.61 MiB, increment: 0.02 MiB
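The increment above is tiny because constructing a CountVectorizer allocates almost nothing. As an illustrative sketch (not part of the original notes), %memit can also wrap the fit_transform() call, where the document-term matrix is actually built:

# sketch: profile the memory cost of building the document-term matrix
# (assumes the list a of sample sentences defined earlier in these notes)
%memit X_counts = countVector.fit_transform(a)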





%%timeit -n 1 -r 1
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)
# define the model
model = RandomForestClassifier(n_estimators=500, n_jobs=1)

25.2 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


%%timeit -n 1 -r 1
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)
# define the model
model = RandomForestClassifier(n_estimators=500, n_jobs=-1)

12.8 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
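In both cells above only the dataset and the model object are constructed; n_jobs takes effect when the forest is actually trained or used for prediction. A minimal sketch (not from the original notes) that times the fit step itself, which is where the parallel speed-up from n_jobs=-1 is normally observed:

%%timeit -n 1 -r 1
# sketch: time the training step, where n_jobs parallelism applies
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)
model = RandomForestClassifier(n_estimators=500, n_jobs=-1)   # use all available cores
model.fit(X, y)   # builds the 500 trees in parallel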


