UNIT IV
Supervised Learning: Regression

Introduction to Regression, Example of Regression, Common Regression Algorithms: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Logistic Regression.

Linear regression is a machine-learning algorithm that learns from labelled datasets and maps the data points to the most optimized linear functions, which can then be used for prediction on new datasets. But first of all we should know what a supervised machine learning algorithm is. It is a type of machine learning where the algorithm learns from labelled data; labelled data means datasets whose respective target values are already known. Supervised learning has two types:

• Classification: it predicts the class of the dataset based on the independent input variables. A class is a categorical or discrete value, e.g., is the image of an animal a cat or a dog?
• Regression: it predicts the continuous output variables based on the independent input variables, e.g., the prediction of house prices based on different parameters like house age, distance from the main road, location, area, etc.

Here, we will discuss one of the simplest types of regression, i.e., Linear Regression.

What is Linear Regression?
Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data. When there is only one independent feature, it is known as Simple Linear Regression; when there is more than one feature, it is known as Multiple Linear Regression. Similarly, when there is only one dependent variable it is considered Univariate Linear Regression, while when there is more than one dependent variable it is known as Multivariate Regression.

Why is Linear Regression Important?
The interpretability of linear regression is a notable strength. The model's equation provides clear coefficients that elucidate the impact of each independent variable on the dependent variable, facilitating a deeper understanding of the underlying dynamics. Its simplicity is a virtue: linear regression is transparent, easy to implement, and serves as a foundational concept for more complex algorithms. Linear regression is not merely a predictive tool; it forms the basis for various advanced models. Techniques like regularization and support vector machines draw inspiration from linear regression, expanding its utility. Additionally, linear regression is a cornerstone in assumption testing, enabling researchers to validate key assumptions about the data.

Types of Linear Regression
There are two main types of linear regression:

Simple Linear Regression
This is the simplest form, with one independent and one dependent variable:

Y = β0 + β1X

where
• Y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope

Multiple Linear Regression
This involves more than one independent variable and one dependent variable:

Y = β0 + β1X1 + β2X2 + … + βnXn

where
• Y is the dependent variable
• X1, X2, …, Xn are the independent variables
• β0 is the intercept
• β1, β2, …, βn are the slopes

The goal of the algorithm is to find the best-fit line equation that can predict the values of the dependent variable based on the independent variables. In regression, a set of records is present with X and Y values, and these values are used to learn a function that represents the relationship between the dependent and independent variables. The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).

In a simple linear regression setting — for example, predicting the weight y of a person from the height x — the task is to find the best-fit line among the many candidate regression lines. The model takes the form

ŷi = θ1 + θ2·xi

Here,
• xi is the i-th input (independent) value
• ŷi is the predicted value for xi
• θ1 is the intercept and θ2 is the coefficient of the input variable

Once we find the best θ1 and θ2 values, we get the best-fit line; when we finally use the model for prediction, it will predict the value of y for an input value of x.
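To make the best-fit line concrete, here is a minimal sketch that fits θ1 (intercept) and θ2 (slope) with scikit-learn; the height/weight numbers are invented purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up heights (cm) and weights (kg), illustration only
X = np.array([[150], [160], [170], [180], [190]])
y = np.array([50, 56, 64, 71, 79])

model = LinearRegression().fit(X, y)
print('theta1 (intercept):', model.intercept_)
print('theta2 (slope):', model.coef_[0])
print('predicted weight for 175 cm:', model.predict([[175]])[0])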
How do we update θ1 and θ2 to get the best-fit line?
To achieve the best-fit regression line, the model aims to predict the target value ŷ such that the error difference between the predicted value ŷ and the true value y is minimum. So it is very important to update the θ1 and θ2 values to reach the best values that minimize the error between the predicted and the true values of y.

Evaluation Metrics for Linear Regression
A variety of evaluation measures can be used to determine the strength of any linear regression model. The most common measurements are:

Mean Squared Error (MSE)
MSE is the average of the squared differences between the actual and the predicted values for all data points:

MSE = (1/n) Σ (yi − ŷi)²

where
• n is the number of observations
• yi is the actual value for the i-th data point
• ŷi is the predicted value for the i-th data point

MSE quantifies the accuracy of a model's predictions. It is sensitive to outliers, since large errors contribute significantly to the overall score.

Mean Absolute Error (MAE)
MAE is an evaluation metric used to calculate the accuracy of a regression model; it measures the average absolute difference between the predicted values and the actual values:

MAE = (1/n) Σ |yi − ŷi|

Unlike MSE, MAE is not sensitive to outliers.

Residual Sum of Squares (RSS)
The sum of the squared residuals, i.e., of the errors of the model:

RSS = Σ (yi − ŷi)²

Total Sum of Squares (TSS)
The sum of the squared differences of the actual values from their mean:

TSS = Σ (yi − ȳ)²

R-squared (R²)
R-squared is the proportion of variance in the dependent variable that is explained by the independent variables:

R² = 1 − RSS/TSS

Adjusted R-squared
Adjusted R-squared measures the proportion of variance in the dependent variable explained by the independent variables while accounting for the number of predictors in the model; it penalizes the model for including irrelevant predictors that do not contribute significantly to explaining the variance:

Adjusted R² = 1 − [(1 − R²)(n − 1)] / (n − k − 1)

where
• n is the number of observations
• k is the number of predictors in the model
• R² is the coefficient of determination

Python implementation of Linear Regression
Import the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

Linear regression is of two types:
1. Simple Linear Regression
2. Multiple Linear Regression

Let's discuss Multiple Linear Regression using Python.
Multiple Linear Regression attempts to model the relationship between two or more features and a response by fitting a linear equation to observed data. The steps to perform it are almost similar to those of simple linear regression; we can use it to find out which factor has the highest impact on the output and how the different variables relate to each other. Here

Y = b0 + b1·x1 + b2·x2 + b3·x3 + … + bn·xn

where Y is the dependent variable and x1, x2, x3, …, xn are the independent variables.

Assumptions of the Regression Model
• Linearity: the relationship between the dependent and independent variables should be linear.
• Homoscedasticity: constant variance of the errors should be maintained.
• Multivariate normality: multiple regression assumes that the residuals are normally distributed.
• Lack of multicollinearity: it is assumed that there is little or no multicollinearity in the data.

Dummy Variables
As we know, in the Multiple Regression Model we use a lot of categorical data. Using categorical data is a good method to include non-numeric data in the model. Categorical data refers to data values that represent categories — data values with a fixed and unordered number of values, for instance gender (male/female). In the model, these values can be represented by dummy variables that take the value 1 or 0 for the presence or absence of a category.

Methods of Building Models:
• All-in
• Backward Elimination
• Forward Selection
• Bidirectional Elimination
• Score Comparison

Backward Elimination
Step 1: Select a significance level (SL) to stay in the model (e.g., SL = 0.05).
Step 2: Fit the full model with all possible predictors.
Step 3: Consider the predictor with the highest P-value; if P > SL, go to Step 4, otherwise the model is ready.
Step 4: Remove the predictor.
Step 5: Fit the model without this variable, then return to Step 3.

Forward Selection
Step 1: Select a significance level (SL) to enter the model (e.g., SL = 0.05).
Step 2: Fit all simple regression models and select the one with the lowest P-value.
Step 3: Keep this variable and fit all possible models with one extra predictor added to the one(s) you already have.
Step 4: Consider the predictor with the lowest P-value; if P < SL, go to Step 3, otherwise the model is ready.
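The elimination loop above can be sketched in a few lines with statsmodels, which exposes the fitted P-values directly. This is a minimal illustration on synthetic data, not part of the original tutorial; the significance level and the data are assumptions:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                          # four candidate predictors
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=100)  # only the first two matter

SL = 0.05
kept = list(range(X.shape[1]))
while kept:
    model = sm.OLS(y, sm.add_constant(X[:, kept])).fit()
    pvals = model.pvalues[1:]          # skip the intercept's P-value
    worst = int(pvals.argmax())
    if pvals[worst] > SL:
        kept.pop(worst)                # drop the least significant predictor
    else:
        break
print('kept predictor indices:', kept)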
Steps Involved in Any Multiple Linear Regression Model
Step #1: Data Pre-Processing
1. Importing the libraries
2. Importing the dataset
3. Encoding categorical data
4. Avoiding the dummy variable trap

The program below illustrates multiple linear regression on a synthetic dataset: it generates the data, plots it in 3-D, and fits the coefficients by gradient descent with Adam-style moment estimates.

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection

def generate_dataset(n):
    x = []
    y = []
    random_x1 = np.random.rand()
    random_x2 = np.random.rand()
    for i in range(n):
        x1 = i
        x2 = i / 2 + np.random.rand() * n
        x.append([1, x1, x2])          # leading 1 for the intercept term
        y.append(random_x1 * x1 + random_x2 * x2 + 1)
    return np.array(x), np.array(y)

x, y = generate_dataset(200)

mpl.rcParams['legend.fontsize'] = 12
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x[:, 1], x[:, 2], y, label='y', s=5)
ax.legend()
plt.show()

def mse(coef, x, y):
    return np.mean((np.dot(x, coef) - y) ** 2) / 2

def gradients(coef, x, y):
    return np.mean(x.transpose() * (np.dot(x, coef) - y), axis=1)

def multilinear_regression(coef, x, y, lr, b1=0.9, b2=0.999, epsilon=1e-8):
    prev_error = 0
    m_coef = np.zeros(coef.shape)      # first-moment estimate
    v_coef = np.zeros(coef.shape)      # second-moment estimate
    t = 0
    while True:
        error = mse(coef, x, y)
        if abs(error - prev_error) <= epsilon:
            break
        prev_error = error
        grad = gradients(coef, x, y)
        t += 1
        m_coef = b1 * m_coef + (1 - b1) * grad
        v_coef = b2 * v_coef + (1 - b2) * grad ** 2
        moment_m_coef = m_coef / (1 - b1 ** t)   # bias-corrected moments
        moment_v_coef = v_coef / (1 - b2 ** t)
        delta = (lr / (moment_v_coef ** 0.5 + 1e-8)) * moment_m_coef
        coef = np.subtract(coef, delta)
    return coef

coef = np.array([0.0, 0.0, 0.0])
c = multilinear_regression(coef, x, y, 1e-1)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x[:, 1], x[:, 2], y, label='y', s=5, color='dodgerblue')
ax.scatter(x[:, 1], x[:, 2], c[0] + c[1] * x[:, 1] + c[2] * x[:, 2],
           label='regression', s=5, color='orange')
ax.legend()
plt.show()

For problems with several independent variables and a single dependent variable, scikit-learn also provides a ready-made implementation of multiple linear regression (the LinearRegression class), which fits the coefficients by minimizing the sum of squared errors via the normal equation.

There are 7 assumptions taken while using Linear Regression:
• Linear model
• No multicollinearity in the data
• Homoscedasticity of residuals or equal variances
• No autocorrelation in residuals
• Number of observations greater than the number of predictors
• Each observation is unique
• Residuals are distributed normally

Linear Model
According to this assumption, the relationship between the independent and dependent variables should be linear.

No Multicollinearity in the Data
If the independent features are highly correlated among themselves, then the data has a multicollinearity problem. But why is this a problem? Highly correlated features make it hard to separate the individual effect of each feature on the target, and the estimated coefficients become unstable. We can identify highly correlated features using scatter plots or a heatmap.

Homoscedasticity of Residuals or Equal Variances
Homoscedasticity is the term that states that the residuals should have constant variance; unequal variance indicates an unsatisfactory model. One can easily get an idea of the homoscedasticity of the residuals by plotting a scatter plot of the residuals against the predicted values.

No Autocorrelation in Residuals
One of the critical assumptions of multiple linear regression is that there should be no autocorrelation in the data. When the residuals are dependent on each other, there is autocorrelation. This factor is visible in the case of stock prices, where the price of a stock is not independent of its previous one. Plotting the variables on a graph like a scatter plot or line plot allows you to check for autocorrelations, if any.

Number of Observations Greater than the Number of Predictors
For a reliable model, the number of observations should be greater than the number of independent variables (predictors); otherwise the estimates suffer from the curse of dimensionality.

Each Observation is Unique
Each observation should be independent of the others; duplicated records bias the estimates.

Residuals are Distributed Normally
The residuals are assumed to be normally distributed, which can be checked by plotting them (e.g., a histogram or Q-Q plot).

Data science is an iterative process: a model solution is built through data collection, data analysis and visualization, model refinement, model evaluation, and user feedback.

[Figure: Data Science process flow]

Let us go through these phases with a worked regression example. The task is to predict medical insurance charges from attributes of the insured person; this is a regression problem, as our target variable — charges (the insurance cost) — is numeric. Let's begin by loading the dataset and exploring the attributes (EDA — Exploratory Data Analysis).

[Figure: first rows of the insurance DataFrame — columns include age, sex, bmi, children, smoker, region, charges]

The data has 1338 records and 6 features. Smoker, sex, and region are categorical variables, while age, BMI, and children are numeric.

Missing values: checking the count of missing values per column with df.isnull().sum().sort_values(ascending=False) shows none for age, bmi, charges, region, smoker, or children.

Comparing the categorical features against the target, smokers have much higher average and median charges and a wider range of charges, while the remaining categories have almost the same distribution; the percentage of female smokers is less than the percentage of male smokers. Looking at how the different features act on the insurance charges, we can conclude that 'smoker' has a considerable impact, while gender has the least impact.
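A minimal sketch of the missing-value check and the smoker comparison described above; the file name 'insurance.csv' is an assumption for illustration:

import pandas as pd

df = pd.read_csv('insurance.csv')  # assumed location of the insurance data
print(df.isnull().sum().sort_values(ascending=False))           # missing values per column
print(df.groupby('smoker')['charges'].agg(['mean', 'median']))  # smokers vs non-smokers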
Let's create a heatmap to understand the strength of the correlation between charges and the numeric features — age, BMI, and children:

sns.heatmap(df[['age', 'bmi', 'children', 'charges']].corr(), cmap='Blues', annot=True)
plt.show()

[Figure: correlation map of age, bmi, children, and charges]

We see that age and BMI have an average positive correlation with charges.

We now go over the steps of model preparation and model development.

Feature Encoding
The categorical columns have to be converted to numbers: binary columns such as sex and smoker can be label-encoded to 0/1, while for a nominal column such as region, one-hot (dummy) encoding is used for this column.

Model building and evaluation
After splitting the data and fitting a linear regression on the top features, the model evaluates to an R² score of more than 76%. Let's see how we can make it better.

3A. Feature Engineering
We experimented with a couple of trials, comparing models on MAE, MSE, and R² score: the plain Linear model scored MAE 4238, MSE 27408884, and R² 0.773; a DecisionTrees model was also evaluated, and the best trial reached an R² score of 0.782. I found that the Polynomial Regression model gives the best result. We improved the model score by:
• Grouping the northeast and northwest regions into 'north' and the southeast and southwest regions into 'south' in the region column.
• Transforming 'children' into a categorical feature called 'more_than_one_child', which is 'Yes' if the number of children is more than 1.
• Fitting a Polynomial Regression model.

Polynomial Regression
Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is given below:

y = b0 + b1x + b2x² + b3x³ + … + bnxⁿ

It is also called the special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression. It is a linear model with some modification in order to increase the accuracy. The dataset used in Polynomial Regression for training is of non-linear nature; it makes use of a linear regression model to fit complicated and non-linear functions and datasets. Hence, "in Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modelled using a linear model."

If we apply a simple linear model to a dataset whose points are arranged in a non-linear fashion, it will hardly cover any data points and will produce a large error; for such cases we need the Polynomial Regression model. If we compare the three equations

Simple Linear Regression:    y = b0 + b1x
Multiple Linear Regression:  y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
Polynomial Regression:       y = b0 + b1x + b2x² + b3x³ + … + bnxⁿ

we can clearly see that they differ only in the degree of the variables: the Simple and Multiple Linear Regression equations are of degree 1, while the Polynomial Regression equation is of degree n.

As a worked example we use a position/salary dataset:

Position           Level   Salary
…                  …       …
Manager            4       80000
Country Manager    5       110000
Region Manager     6       150000
Partner            7       200000
Senior Partner     8       300000
C-level            9       500000
CEO                10      1000000

Steps for Polynomial Regression:
The main steps involved in Polynomial Regression are given below:
• Data pre-processing
• Build a Linear Regression model and fit it to the dataset
• Build a Polynomial Regression model and fit it to the dataset
• Visualize the results for the Linear Regression and Polynomial Regression models
• Predicting the output

We build the Linear Regression model as well so that we can compare between the predictions of the two models.

Data Pre-processing Step:
The data pre-processing step remains the same as in the earlier models:

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

We want to predict the output for level 6.5, because the candidate has 4+ years' experience as a regional manager, so he must be somewhere between levels 6 and 7.

Building the Linear Regression model: now we build and fit the Linear Regression model to the data; in building the Polynomial Regression model we will take it as a reference and compare the two results. We create an object of the LinearRegression class and fit it:

from sklearn.linear_model import LinearRegression
lin_regs = LinearRegression()
lin_regs.fit(x, y)

Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Building the Polynomial Regression model: here we use the PolynomialFeatures class of the preprocessing library to add some extra polynomial terms to the dataset:

from sklearn.preprocessing import PolynomialFeatures
poly_regs = PolynomialFeatures(degree=2)
x_poly = poly_regs.fit_transform(x)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(x_poly, y)

In the above lines of code we have used poly_regs.fit_transform(x), because first we convert our feature matrix into a polynomial feature matrix and then fit it to the Polynomial Regression model. After executing the code, we get another matrix, x_poly, which can be seen under the variable explorer option.
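The transform-then-fit pattern above can be packaged with a scikit-learn Pipeline, and the evaluation metrics from earlier can be computed with sklearn.metrics. This is a minimal sketch on the level/salary table; the first three salary values and the degree are assumptions for illustration:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

x = np.arange(1, 11).reshape(-1, 1)  # position levels 1..10
y = np.array([45000, 50000, 60000,   # first three salaries assumed for illustration
              80000, 110000, 150000, 200000, 300000, 500000, 1000000], dtype=float)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
pred = model.predict(x)
print('MAE:', mean_absolute_error(y, pred))
print('MSE:', mean_squared_error(y, pred))
print('R2 :', r2_score(y, pred))
print('salary at level 6.5:', model.predict([[6.5]])[0])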
How does Polynomial Regression work?
The idea is that, to evolve from linear regression to polynomial regression, we add higher-order terms of the independent variable as new features — close to, but not exactly, what is known as feature engineering. The general polynomial model is

y = b0 + b1x + b2x² + … + bnxⁿ + e

where
• y is the dependent variable
• x is the independent variable
• b0, b1, …, bn are the coefficients
• n is the degree of the polynomial
• e is the error term

For degree 1 this is the straight line y = a + bx + e, where a is the y-intercept, b is the slope, and e is the error rate. A straight line cannot describe every process: in chemical synthesis, for example, the production of the compound first rises and then falls with the temperature at which the synthesis takes place; in such cases we use a quadratic model:

y = a + b1x + b2x² + e

Here,
• y is the dependent variable on x
• a is the y-intercept and e is the error rate.

Since the coefficients are unknown quantities, these models are fit through the least squares technique: the polynomial regression model is trained to find the coefficient values that minimize the sum of squared differences between the predicted values and the actual values in the training data. By adding higher-order response terms (quadratic, cubic, etc.), the model can capture the non-linear patterns observed in the training data; the degree should be chosen based on the complexity of the underlying relationship, especially since higher degrees of the polynomial increase the risk of overfitting. Once the model is trained, it can be used to make predictions on new, unseen data.

Implementation of Polynomial Regression using Python
Suppose we have measurements of a feature (temperature) and a target variable (pressure), and we suspect that the relationship might be non-linear; we fit the data with both Linear and Polynomial Regression and compare the results. Note that the input variable must be in a 2-d NumPy array (matrix) shape, while the target variable can be a 1-d array.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
datas = pd.read_csv('data.csv')

# Feature matrix and the target variable
X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values

# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin = LinearRegression()
lin.fit(X, y)

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
lin2 = LinearRegression()
lin2.fit(X_poly, y)

# Visualising the Polynomial Regression results using a scatter plot
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
plt.show()

Output:
[Figure: Temperature vs Pressure scatter with the degree-4 polynomial fit]

Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms, and it comes under the supervised learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable; therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, true or false, etc., but instead of giving the exact values 0 and 1, it gives the probabilistic values which lie between 0 and 1.
• Logistic regression is much like linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it has the ability to provide probabilities and to classify new data using continuous and discrete datasets.
• Logistic regression can be used to classify observations using different types of data and can easily determine the most effective variables used for the classification.

[Figure: the S-shaped logistic function used in predictive modelling]

Note: logistic regression uses the concept of predictive modelling as regression — hence the name logistic regression — but it is used to classify samples, so it falls under the classification algorithms.

Logistic Function (Sigmoid Function)
• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The value of the logistic regression must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve like the "S" form; the S-form curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of the threshold value, which defines the probability of either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.

Assumptions for Logistic Regression:
• The dependent variable must be categorical in nature.
• The independent variables should not have multicollinearity.
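A minimal sketch of the sigmoid function described above; the plotting details are assumptions for illustration:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # maps any real value into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.axhline(0.5, linestyle='--')  # a common choice of threshold value
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.show()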
Logistic Regression Equation
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps are given below:
• We know the equation of the straight line can be written as:
y = b0 + b1x1 + b2x2 + … + bnxn
• In logistic regression, y can be between 0 and 1 only, so let us divide the above equation of y by (1 − y):
y / (1 − y);   0 for y = 0, and infinity for y = 1
• But we need a range between −∞ and +∞; taking the logarithm of the equation, it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + … + bnxn
The above equation is the final equation for logistic regression.

Python Implementation of Logistic Regression
To understand the implementation of logistic regression in Python, we will use an example in which we predict whether users want to purchase a product or not. Below are the steps:
• Data pre-processing step
• Fitting logistic regression to the training set
• Predicting the test result
• Test accuracy of the result (creation of the confusion matrix)
• Visualizing the test set result

1. Data pre-processing step: in this step we pre-process/prepare the data so that we can use it in our code efficiently; it will be the same as we have done in the data pre-processing topic. We extract the independent and dependent variables from the given dataset, split the data into a training set and a test set and, because we want accurate results of predictions, apply feature scaling — only to the independent variables, since the dependent variable has only 0 and 1 values:

from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

2. Fitting logistic regression to the training set:

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

After executing the above code, we get the output:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, …, warm_start=False)

Hence our model is fitted to the training set.

3. Predicting the test result: our model is well trained on the training set, so we now predict the result by using the test set data:

y_pred = classifier.predict(x_test)

The output shows, for each test user, whether the user is predicted to purchase the product or not.

4. Test accuracy of the result: now we create the confusion matrix to check the accuracy of the classification on the test set result:

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

5. Visualizing the training set result: to visualize the result we use the ListedColormap class of the matplotlib library (nm and mtp below are the numpy and matplotlib.pyplot aliases used earlier). Below is the code for it:

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2,
             classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))

Output:
[Figure: Logistic Regression (training set) — purple and green decision regions; green points lie within the green region and purple points within the purple region]

As shown in the above image, all the data observations are plotted in a two-dimensional diagram, where the X-axis represents the independent variable and the Y-axis represents the target. The line is drawn to separate positives and negatives: according to the algorithm, the observations above the line are considered positive data points, and data points below the line are negative.

Maximum Likelihood Estimation: Code Example
We can quickly implement the maximum likelihood estimation technique using logistic regression on any classification dataset. Let us try to implement the same:

import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.linear_model import LogisticRegression

# X_train, y_train, X_test and df_pred are assumed to be prepared beforehand
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

sns.regplot(x='X', y='lr_pred', data=df_pred, logistic=True, ci=None)

The above code will fit the logistic regression for the given dataset and plot the logistic curve for the data, representing the distribution of the data and the best fit according to the algorithm.

Key Takeaways
• Maximum likelihood is a function that describes the data points and their likeliness to the model for best fitting.
• Maximum likelihood is different from the probabilistic methods, where probabilistic methods work on the principle of calculating probabilities. In contrast, the likelihood method tries to maximize the likelihood of data observations according to the data distribution.
• Maximum likelihood is an approach used for solving problems like density estimation, and it is a base for some algorithms like logistic regression.
• The approach is very similar to, and is predominantly known as, the perceptron trick in terms of deep learning methods.

In this article we discussed maximum likelihood estimation, its core intuition, and its working mechanism, with practical examples and some key takeaways.
This will help one understand maximum likelihood better and more deeply, and help answer related interview questions efficiently.
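As a self-contained companion to the purchase-prediction walkthrough above, here is a minimal end-to-end sketch covering fitting, prediction, and the confusion matrix; the synthetic data and parameters are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data standing in for (age, salary)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # purchase / no purchase

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

classifier = LogisticRegression(random_state=0).fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print(confusion_matrix(y_test, y_pred))  # rows = actual class, columns = predicted class
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions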
