Mean Encoding - Machine Learning Last Updated : 03 Jun, 2020 Comments Improve Suggest changes Like Article Like Report During Feature Engineering the task of converting categorical features into numerical is called Encoding. There are various ways to handle categorical features like OneHotEncoding and LabelEncoding, FrequencyEncoding or replacing by categorical features by their count. In similar way we can uses MeanEncoding. Created a DataFrame having two features named subjects and Target and we can see that here one of the features (SubjectName) is Categorical, so we have converted it into the numerical feature by applying Mean Encoding. Code: Python3 # importing libraries import pandas as pd # creating dataset data={'SubjectName':['s1','s2','s3','s1','s4','s3','s2','s1','s2','s4','s1'], 'Target':[1,0,1,1,1,0,0,1,1,1,0]} df = pd.DataFrame(data) print(df) Output: SubjectName Target 0 s1 1 1 s2 0 2 s3 1 3 s1 1 4 s4 1 5 s3 0 6 s2 0 7 s1 1 8 s2 1 9 s4 1 10 s1 0 Code : Counting every datapoints in SubjectName Python3 df.groupby(['SubjectName'])['Target'].count() Output: subjectName s1 4 s2 3 s3 2 s4 2 Name: Target, dtype: int64 Code: groupby data with SubjectName with their mean according to their positive target value Python3 df.groupby(['SubjectName'])['Target'].mean() Output: subjectName s1 0.750000 s2 0.333333 s3 0.500000 s4 1.000000 Name: Target, dtype: float64 The output shows the mean mapped with data point in SubjectName with their positive target value (1-positive and 0-Negative). Code : Finally assigning the mean value and map with df['SubjectName'] Python3 Mean_encoded_subject = df.groupby(['SubjectName'])['Target'].mean().to_dict() df['SubjectName'] = df['SubjectName'].map(Mean_encoded_subject) print(df) Output : Mean Encoded Data SubjectName Target 0 0.750000 1 1 0.333333 0 2 0.500000 1 3 0.750000 1 4 1.000000 1 5 0.500000 0 6 0.333333 0 7 0.750000 1 8 0.333333 1 9 1.000000 1 10 0.750000 0 Pros of MeanEncoding: Capture information within the label, therefore rendering more predictive features Creates a monotonic relationship between the variable and the target Cons of MeanEncodig: It may cause over-fitting in the model. Comment More infoAdvertise with us Next Article Machine Learning Models V Vikash_Kumar_Chaurasia Follow Improve Article Tags : Machine Learning python Practice Tags : Machine Learningpython Similar Reads Maths for Machine Learning Mathematics is the foundation of machine learning. Math concepts plays a crucial role in understanding how models learn from data and optimizing their performance. Before diving into machine learning algorithms, it's important to familiarize yourself with foundational topics, like Statistics, Probab 5 min read Machine Learning Models Machine Learning models are very powerful resources that automate multiple tasks and make them more accurate and efficient. ML handles new data and scales the growing demand for technology with valuable insight. It improves the performance over time. This cutting-edge technology has various benefits 14 min read Machine Learning Models Machine Learning models are very powerful resources that automate multiple tasks and make them more accurate and efficient. ML handles new data and scales the growing demand for technology with valuable insight. It improves the performance over time. This cutting-edge technology has various benefits 14 min read 50 Machine Learning Terms Explained Machine Learning has become an integral part of modern technology, driving advancements in everything from personalized recommendations to autonomous systems. As the field evolves rapidly, itâs essential to grasp the foundational terms and concepts that underpin machine learning systems. Understandi 8 min read Statistics For Machine Learning Machine Learning Statistics: In the field of machine learning (ML), statistics plays a pivotal role in extracting meaningful insights from data to make informed decisions. Statistics provides the foundation upon which various ML algorithms are built, enabling the analysis, interpretation, and predic 7 min read How does Machine Learning Works? Machine Learning is a subset of Artificial Intelligence that uses datasets to gain insights from it and predict future values. It uses a systematic approach to achieve its goal going through various steps such as data collection, preprocessing, modeling, training, tuning, evaluation, visualization, 7 min read Like