Section 10: The SKlearn Library

A. Data Preparation
1. Data files from SKlearn
2. Data cleaning
3. Metrics module
4. Feature Selection
5. Data Scaling
6. Data Split

B. ML Algorithms
1. Linear Regression
2. Logistic Regression
3. Neural Network
4. SVR
5. SVC
6. K-means
7. PCA
8. Decision Tree
9. Ensemble Regression
10. Ensemble Classifier
11. K Nearest Neighbors
12. Naïve Bayes
13. LDA, QDA
14. Hierarchical Clusters
15. DbScan
16. NLP
17. Apriori

C. Algorithm Evaluation
1. Model Check
2. Grid Search
3. Pipeline
4. Model Save

D. Time Series
3.2) Grid Search
 Grid search is the process of trying many hyperparameter settings on a model and selecting the best one.
 It is used through the model_selection module.
 It has more than one tool, such as:
3.2.1 model_selection.GridSearchCV
3.2.2 model_selection.RandomizedSearchCV
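As a sketch of what a grid search enumerates (the parameter names below are only an illustration), the number of candidate settings is the product of the per-parameter option counts, and sklearn's ParameterGrid makes this explicit:

```python
from sklearn.model_selection import ParameterGrid

# an illustrative grid: 2 kernel options x 5 values of C = 10 candidates;
# GridSearchCV fits each candidate once per cross-validation fold
param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 2, 3, 4, 5]}
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 10
```

This is why large grids get expensive quickly: the cost grows multiplicatively with every parameter added.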

3.2.1) GridSearchCV

 GridSearchCV is used to evaluate the hyperparameters used in any algorithm, and even specific options inside the algorithm (such as a linear kernel or another type). We give it a set of options, and it tries all of them and selects the most accurate.
 It is used through the module model_selection.GridSearchCV.
 The steps to apply it are as follows:

1. Import it via: from sklearn.model_selection import GridSearchCV
2. Import the model to be examined and write its details.
3. Build a dictionary whose keys are the parameter names and whose values are the values to be examined.
4. Run the GridSearchCV command on the required model with that dictionary.
5. Show the results via a number of attributes, such as:

cv_results_ , best_estimator_ , best_score_ , best_params_ , best_index_ , scorer_ , n_splits_ , refit_time_
General form
#Import Libraries
from sklearn.model_selection import GridSearchCV
import pandas as pd
#----------------------------------------------------

#Applying Grid Searching :


'''
model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None,
n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0,
pre_dispatch='2*n_jobs', error_score='raise-deprecating',
return_train_score='warn')
'''

#=======================================================================
#Example :
#from sklearn.svm import SVR
#SelectedModel = SVR(epsilon=0.1,gamma='auto')
#SelectedParameters = {'kernel':('linear', 'rbf'), 'C':[1,2,3,4,5]}
#=======================================================================
GridSearchModel = GridSearchCV(SelectedModel,SelectedParameters, cv = 2,return_train_score=True)
GridSearchModel.fit(X_train, y_train)
sorted(GridSearchModel.cv_results_.keys())
GridSearchResults = pd.DataFrame(GridSearchModel.cv_results_)[['mean_test_score', 'std_test_score', 'params' ,
'rank_test_score' , 'mean_fit_time']]

# Showing Results
print('All Results are :\n', GridSearchResults )
print('Best Score is :', GridSearchModel.best_score_)
print('Best Parameters are :', GridSearchModel.best_params_)
print('Best Estimator is :', GridSearchModel.best_estimator_)

Example
#Import Libraries
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2; requires a version < 1.2
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import pandas as pd
#----------------------------------------------------

#load boston data

BostonData = load_boston()

#X Data
X = BostonData.data
#print('X Data is \n' , X[:10])
#print('X shape is ' , X.shape)
#print('X Features are \n' , BostonData.feature_names)

#y Data
y = BostonData.target
#print('y Data is \n' , y[:10])
#print('y shape is ' , y.shape)

#----------------------------------------------------
#Splitting data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=44, shuffle =True)

#----------------------------------------------------
#Applying Grid Searching :
'''
model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None,
n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0,
pre_dispatch='2*n_jobs', error_score='raise-deprecating',
return_train_score='warn')
'''

#Example :
from sklearn.svm import SVR
SelectedModel = SVR(epsilon=0.1,gamma='auto')
SelectedParameters = {'kernel':('linear', 'rbf'), 'C':[1,2,3,4,5]}
GridSearchModel = GridSearchCV(SelectedModel,SelectedParameters, cv = 2,return_train_score=True)
GridSearchModel.fit(X_train, y_train)
sorted(GridSearchModel.cv_results_.keys())
GridSearchResults = pd.DataFrame(GridSearchModel.cv_results_)[['mean_test_score', 'std_test_score', 'params' ,
'rank_test_score' , 'mean_fit_time']]

# Showing Results
print('All Results are :\n', GridSearchResults )
print('Best Score is :', GridSearchModel.best_score_)
print('Best Parameters are :', GridSearchModel.best_params_)
print('Best Estimator is :', GridSearchModel.best_estimator_)

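Because refit=True by default, the fitted GridSearchCV object itself acts as the best model, so there is no need to retrain manually before predicting on the held-out split. A minimal self-contained sketch (using a synthetic regression dataset in place of the Boston data, which is no longer shipped with recent scikit-learn releases):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVR

# synthetic stand-in for the Boston data
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=44)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=44)

GridSearchModel = GridSearchCV(SVR(epsilon=0.1, gamma='auto'),
                               {'kernel': ('linear', 'rbf'), 'C': [1, 2, 3]},
                               cv=2, return_train_score=True)
GridSearchModel.fit(X_train, y_train)

# with refit=True (the default) the best estimator is refit on all of X_train,
# so predict/score can be called on the search object directly
y_pred = GridSearchModel.predict(X_test)
print('Test R2 :', GridSearchModel.score(X_test, y_test))
```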
Example
import pandas as pd
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1,2,3,4,5]}
svc = svm.SVC(gamma="scale")
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(iris.data, iris.target)

sorted(clf.cv_results_.keys())
pd.DataFrame(clf.cv_results_)[['mean_test_score', 'std_test_score', 'params' , 'rank_test_score' , 'mean_fit_time']]

print('score : ' , clf.best_score_)


print('params : ' , clf.best_params_)
print('best : ' , clf.best_estimator_)

Example
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

iris = load_iris()
X = iris.data
y = iris.target

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(scores)

k_range = list(range(1, 31))


print(k_range)

param_grid = dict(n_neighbors=k_range)
print(param_grid)

grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)


grid.fit(X, y)

pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

print(grid.cv_results_['params'])
print(grid.cv_results_['mean_test_score'])

grid_mean_scores = grid.cv_results_['mean_test_score']
print(grid_mean_scores)

plt.plot(k_range, grid_mean_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')
plt.show()

print('score : ' , grid.best_score_)


print('params : ' , grid.best_params_)
print('best : ' , grid.best_estimator_)

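The peak of the plotted accuracy-vs-K curve is exactly what best_params_ reports. A short sketch verifying that the winning K corresponds to the maximum of mean_test_score:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': list(range(1, 31))},
                    cv=10, scoring='accuracy')
grid.fit(iris.data, iris.target)

# best_score_ is the mean_test_score entry of the rank-1 candidate,
# i.e. the highest point of the plotted curve
mean_scores = grid.cv_results_['mean_test_score']
print(grid.best_params_['n_neighbors'], grid.best_score_ == mean_scores.max())
```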
Example
import time
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.kernel_ridge import KernelRidge

rng = np.random.RandomState(0)

# #############################################################################
# Generate sample data
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()

# Add noise to targets


y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))

X_plot = np.linspace(0, 5, 100000)[:, None]

# #############################################################################
# Fit regression model
train_size = 100
svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1), cv=5,
param_grid={"C": [1e0, 1e1, 1e2, 1e3],
"gamma": np.logspace(-2, 2, 5)})

kr = GridSearchCV(KernelRidge(kernel='rbf', gamma=0.1), cv=5,
                  param_grid={"alpha": [1e0, 0.1, 1e-2, 1e-3],
                              "gamma": np.logspace(-2, 2, 5)})

t0 = time.time()
svr.fit(X[:train_size], y[:train_size])
svr_fit = time.time() - t0
print("SVR complexity and bandwidth selected and model fitted in %.3f s"
% svr_fit)

t0 = time.time()
kr.fit(X[:train_size], y[:train_size])
kr_fit = time.time() - t0
print("KRR complexity and bandwidth selected and model fitted in %.3f s"
% kr_fit)
sv_ratio = svr.best_estimator_.support_.shape[0] / train_size
print("Support vector ratio: %.3f" % sv_ratio)

t0 = time.time()
y_svr = svr.predict(X_plot)
svr_predict = time.time() - t0
print("SVR prediction for %d inputs in %.3f s"
% (X_plot.shape[0], svr_predict))

t0 = time.time()
y_kr = kr.predict(X_plot)
kr_predict = time.time() - t0
print("KRR prediction for %d inputs in %.3f s"
% (X_plot.shape[0], kr_predict))

pd.DataFrame(kr.cv_results_)[['mean_test_score', 'std_test_score', 'params']]


pd.DataFrame(svr.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

print('score : ' , kr.best_score_)


print('params : ' , kr.best_params_)
print('best : ' , kr.best_estimator_)

print('==================================')

print('score : ' ,svr.best_score_)


print('params : ' , svr.best_params_)
print('best : ' , svr.best_estimator_)

Example
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

iris = load_iris()

X = iris.data
y = iris.target

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(scores)

k_range = list(range(1, 31))


weight_options = ['uniform', 'distance']

param_grid = dict(n_neighbors=k_range, weights=weight_options)
print(param_grid)

grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)


grid.fit(X, y)

pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

print(grid.best_score_)
print(grid.best_params_)

knn = KNeighborsClassifier(n_neighbors=13, weights='uniform')


knn.fit(X, y)

knn.predict([[3, 5, 4, 2]])

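The manual refit above (n_neighbors=13, weights='uniform') can be skipped entirely: since refit=True is the default, the fitted grid object already wraps a re-trained copy of the best model. A sketch:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
param_grid = dict(n_neighbors=list(range(1, 31)),
                  weights=['uniform', 'distance'])
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring='accuracy')
grid.fit(iris.data, iris.target)

# the search object delegates predict() to best_estimator_,
# so no manual KNeighborsClassifier(...) with the best values is needed
print(grid.predict([[3, 5, 4, 2]]))
```

This also avoids the risk of copying the best parameters by hand incorrectly.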
3.2.2) RandomizedSearchCV

 This is similar in idea to grid search in looking for the ideal values, except that it does not evaluate every listed value, because that would take an enormous amount of time; instead, it picks random values and checks which of them is best.
 Its usage is similar to grid search, except that we also give it the allowed number of trials, n_iter, as well as the way the randomness is generated, random_state.
 It is used through the module model_selection.RandomizedSearchCV.

 The steps to apply it are as follows:

1. Import it via: from sklearn.model_selection import RandomizedSearchCV
2. Import the model to be examined and write its details.
3. Build a dictionary whose keys are the parameter names and whose values are the values to be examined.
4. Run the RandomizedSearchCV command on the required model with that dictionary.
5. Show the results via a number of attributes, such as:

cv_results_ , best_estimator_ , best_score_ , best_params_ , best_index_ , scorer_ , n_splits_ , refit_time_

General form
#Import Libraries
from sklearn.model_selection import RandomizedSearchCV
import pandas as pd
#----------------------------------------------------

#Applying Randomized Grid Searching :


'''
model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10,
scoring=None, fit_params=None, n_jobs=None, iid='warn', refit=True, cv='warn',
verbose=0, pre_dispatch='2*n_jobs', random_state=None,
error_score='raise-deprecating', return_train_score='warn')
'''

#=======================================================================
#Example :
#from sklearn.svm import SVR
#SelectedModel = SVR(epsilon=0.1,gamma='auto')
#SelectedParameters = {'kernel':('linear', 'rbf'), 'C':[1,2,3,4,5]}
#=======================================================================
RandomizedSearchModel = RandomizedSearchCV(SelectedModel,SelectedParameters, cv = 2,return_train_score=True)
RandomizedSearchModel.fit(X_train, y_train)
sorted(RandomizedSearchModel.cv_results_.keys())
RandomizedSearchResults = pd.DataFrame(RandomizedSearchModel.cv_results_)[['mean_test_score', 'std_test_score',
'params' , 'rank_test_score' , 'mean_fit_time']]

# Showing Results
print('All Results are :\n', RandomizedSearchResults )
print('Best Score is :', RandomizedSearchModel.best_score_)
print('Best Parameters are :', RandomizedSearchModel.best_params_)
print('Best Estimator is :', RandomizedSearchModel.best_estimator_)

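Internally, RandomizedSearchCV draws its candidates with sklearn's ParameterSampler; a small sketch showing how n_iter and random_state control the draw (the parameter names are only an illustration):

```python
from sklearn.model_selection import ParameterSampler

param_dist = {'n_neighbors': list(range(1, 31)),
              'weights': ['uniform', 'distance']}

# 5 random candidates out of the 60 possible combinations;
# random_state makes the same draw reproducible
sampled = list(ParameterSampler(param_dist, n_iter=5, random_state=5))
print(len(sampled))  # 5
for params in sampled:
    print(params)
```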
Example
#Import Libraries
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2; requires a version < 1.2
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
import pandas as pd
#----------------------------------------------------

#load boston data

BostonData = load_boston()

#X Data
X = BostonData.data
#print('X Data is \n' , X[:10])
#print('X shape is ' , X.shape)
#print('X Features are \n' , BostonData.feature_names)

#y Data
y = BostonData.target
#print('y Data is \n' , y[:10])
#print('y shape is ' , y.shape)

#----------------------------------------------------
#Splitting data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=44, shuffle =True)

#----------------------------------------------------
#Applying Randomized Grid Searching :
'''
model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10,
scoring=None, fit_params=None, n_jobs=None, iid='warn', refit=True, cv='warn',
verbose=0, pre_dispatch='2*n_jobs', random_state=None,
error_score='raise-deprecating', return_train_score='warn')
'''

#=======================================================================
#Example :
from sklearn.svm import SVR
SelectedModel = SVR(epsilon=1,gamma='auto')
SelectedParameters = {'kernel':('linear', 'rbf'), 'C':[1,2]}
RandomizedSearchModel = RandomizedSearchCV(SelectedModel,SelectedParameters, cv = 2,return_train_score=True)
RandomizedSearchModel.fit(X_train, y_train)
sorted(RandomizedSearchModel.cv_results_.keys())
RandomizedSearchResults = pd.DataFrame(RandomizedSearchModel.cv_results_)[['mean_test_score', 'std_test_score',
'params' , 'rank_test_score' , 'mean_fit_time']]

# Showing Results
print('All Results are :\n', RandomizedSearchResults )
print('Best Score is :', RandomizedSearchModel.best_score_)
print('Best Parameters are :', RandomizedSearchModel.best_params_)
print('Best Estimator is :', RandomizedSearchModel.best_estimator_)

Example
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import pandas as pd

# read in the iris data


iris = load_iris()

# create X (features) and y (response)


X = iris.data
y = iris.target

# 10-fold cross-validation with K=5 for KNN (the n_neighbors parameter)


knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(scores)

from sklearn.model_selection import RandomizedSearchCV

# define the parameter values that should be searched
k_range = list(range(1, 31))
weight_options = ['uniform', 'distance']

# specify "parameter distributions" rather than a "parameter grid"


param_dist = dict(n_neighbors=k_range, weights=weight_options)

# n_iter controls the number of searches


rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy', n_iter=10, random_state=5,
return_train_score=False)
rand.fit(X, y)
pd.DataFrame(rand.cv_results_)[['mean_test_score', 'std_test_score', 'params']]



# examine the best model
print(rand.best_score_)
print(rand.best_params_)

# run RandomizedSearchCV 20 times (with n_iter=10) and record the best score
best_scores = []
for _ in range(20):
    rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy', n_iter=10, return_train_score=False)
    rand.fit(X, y)
    best_scores.append(round(rand.best_score_, 3))
print(best_scores)

Example
# Load libraries
from scipy.stats import uniform
from sklearn import linear_model, datasets
from sklearn.model_selection import RandomizedSearchCV

iris = datasets.load_iris()
X = iris.data
y = iris.target

logistic = linear_model.LogisticRegression(solver='liblinear')  # liblinear supports both the l1 and l2 penalties searched below

penalty = ['l1', 'l2']

# Create regularization hyperparameter distribution using uniform distribution


C = uniform(loc=0, scale=4)

# Create hyperparameter options


hyperparameters = dict(C=C, penalty=penalty)

# Create randomized search 5-fold cross validation and 100 iterations
clf = RandomizedSearchCV(logistic, hyperparameters, random_state=1, n_iter=100, cv=5, verbose=0, n_jobs=-1)

# Fit randomized search


best_model = clf.fit(X, y)

# View best hyperparameters


print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])

# Predict target vector


best_model.predict(X)

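Unlike a fixed list of values, the scipy uniform(loc=0, scale=4) object above is a continuous distribution, so RandomizedSearchCV can draw a fresh value of C on every iteration. A sketch of what such a draw looks like:

```python
from scipy.stats import uniform

C_dist = uniform(loc=0, scale=4)  # uniform over the interval [0, 4]
samples = C_dist.rvs(size=5, random_state=1)
print(samples)                    # five candidate values for C
print(all(0 <= c <= 4 for c in samples))  # True
```

This is the practical advantage of randomized search: continuous hyperparameters do not have to be discretized into a grid.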
