Python GTU Study Material E-Notes Unit-5 16012021061815AM

The document covers text analysis in Python using documents describing Darshan Engineering College in Rajkot, Gujarat: building bag-of-words representations with scikit-learn's CountVectorizer, timing a prime-number check with %%timeit, measuring memory usage with %memit, and comparing the performance of a random forest classifier with different numbers of parallel jobs (n_jobs).

Uploaded by Aman Goyal





Consider three short documents:

1. darshan is the best engineering college in rajkot
2. rajkot is famous for engineering
3. college is located in rajkot morbi road

Assigning each distinct word an ID in order of first appearance (darshan = 1, is = 2, the = 3, ..., road = 13), the documents become ID sequences:

1 2 3 4 5 6 7 8
8 2 9 10 5
6 2 11 7 8 12 13

Each document can then be represented as a binary vector over the 13-word vocabulary (1 if the word occurs in the document, 0 otherwise):

1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 0 0 1 0 0 1 1 1 0 0 0
0 1 0 0 0 1 1 1 0 0 1 1 1

a = ["darshan is the best engineering college in rajkot",
     "rajkot is famous for engineering",
     "college is located in rajkot morbi road"]
from sklearn.feature_extraction.text import CountVectorizer
countVector = CountVectorizer()
X_train_counts = countVector.fit_transform(a)
print(countVector.vocabulary_)
print(X_train_counts.toarray())

{'darshan': 2,
'is': 7,
'the': 12,
'best': 0,
'engineering': 3,
'college': 1,
'in': 6,
'rajkot': 10,
'famous': 4,
'for': 5,
'located': 8,
'morbi': 9,
'road': 11}
array([[1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]], dtype=int64)
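Once fitted, the same vocabulary can also encode unseen text via transform(). The following sketch (not part of the original notes) reuses the three documents above; words absent from the fitted vocabulary, such as "gujarat", are simply dropped:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["darshan is the best engineering college in rajkot",
        "rajkot is famous for engineering",
        "college is located in rajkot morbi road"]

vectorizer = CountVectorizer()
vectorizer.fit(docs)  # learn the 13-word vocabulary

# Encode an unseen sentence with the fitted vocabulary;
# out-of-vocabulary words are ignored.
new_counts = vectorizer.transform(["engineering college in gujarat"])
print(new_counts.toarray())
```

Only "engineering", "college", and "in" are counted, so the resulting row sums to 3.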



%%timeit -n 1000 -r 7
listPrime = []
for i in range(2, 1000):
    flag = 0
    for j in range(2, i):
        if i % j == 0:
            flag = 1
            break
    if flag == 0:
        listPrime.append(i)
# print(listPrime)

5.17 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
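The nested loop above tries every divisor j below i. A common speed-up (a sketch, not from the original notes) is to test divisors only up to the integer square root of i, since any composite number must have a divisor no larger than that:

```python
import math

def primes_below(n):
    """Collect primes below n, testing divisors only up to sqrt(i)."""
    primes = []
    for i in range(2, n):
        is_prime = True
        for j in range(2, math.isqrt(i) + 1):
            if i % j == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(i)
    return primes

print(len(primes_below(1000)))  # 168 primes below 1000
```

Timing this version with the same %%timeit -n 1000 -r 7 line should show a clear improvement over the linear divisor scan.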




%load_ext memory_profiler
from sklearn.feature_extraction.text import CountVectorizer
%memit countVector = CountVectorizer(stop_words='english', analyzer='word')

peak memory: 85.61 MiB, increment: 0.02 MiB
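%memit requires IPython plus the memory_profiler package; in a plain Python script, the standard-library tracemalloc module gives a comparable current/peak memory figure. A minimal sketch (the allocation is an arbitrary example, not from the original notes):

```python
import tracemalloc

tracemalloc.start()
data = [list(range(1000)) for _ in range(100)]  # allocate some memory to measure
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")
```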





%%timeit -n 1 -r 1
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)
# define the model
model = RandomForestClassifier(n_estimators=500, n_jobs=1)

25.2 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


%%timeit -n 1 -r 1
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)
# define the model
model = RandomForestClassifier(n_estimators=500, n_jobs=-1)

12.8 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
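The two cells above only construct the classifier; n_jobs actually pays off when fit() runs, since the 500 trees are then built in parallel. A small sketch (not from the original notes, with a reduced sample size and tree count so it finishes quickly) that times the training step itself:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=3)

for n_jobs in (1, -1):
    model = RandomForestClassifier(n_estimators=100, n_jobs=n_jobs,
                                   random_state=3)
    start = time.perf_counter()
    model.fit(X, y)  # training is where n_jobs parallelism applies
    elapsed = time.perf_counter() - start
    print(f"n_jobs={n_jobs:>2}: {elapsed:.2f} s")
```

On a multi-core machine the n_jobs=-1 run (all cores) should train noticeably faster than n_jobs=1.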


