DS Day 5
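- Min-Max Scaling: Rescales each feature to a fixed range, [0, 1] by default.
Formula:
\[
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\]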
Example:
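python
from sklearn.preprocessing import MinMaxScaler
data = [[10.0], [20.0], [30.0], [40.0]]  # example: one feature, four samples
scaler = MinMaxScaler()
print(scaler.fit_transform(data))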
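To check that MinMaxScaler applies the formula above, the sketch below computes it by hand with NumPy and compares the two results. The four-value `data` array is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[10.0], [20.0], [30.0], [40.0]])  # hypothetical single-feature data

# x' = (x - x_min) / (x_max - x_min), computed column by column
manual = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

scaled = MinMaxScaler().fit_transform(data)
print(manual.ravel())
print(np.allclose(manual, scaled))  # True
```

The manual version divides by zero if a column is constant. MinMaxScaler guards against that case, so the library call is the safer choice in real pipelines.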
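- Standardization (Z-score Normalization): Rescales each feature to mean 0 and standard deviation 1.
Formula:
\[
z = \frac{x - \mu}{\sigma}
\]
where \(\mu\) is the feature's mean and \(\sigma\) its standard deviation.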
Example:
python
from sklearn.preprocessing import StandardScaler
data = [[2.0], [4.0], [4.0], [5.0]]  # example: one feature, four samples
scaler = StandardScaler()
print(scaler.fit_transform(data))
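The same kind of check works for the z-score formula: compute μ and σ with NumPy and compare the result against StandardScaler. Note that StandardScaler uses the population standard deviation (ddof=0), so the manual version must use it too. The eight sample values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])  # hypothetical

mu = data.mean(axis=0)
sigma = data.std(axis=0)  # population std (ddof=0), matching StandardScaler
manual = (data - mu) / sigma

scaled = StandardScaler().fit_transform(data)
print(mu, sigma)  # [5.] [2.]
print(np.allclose(manual, scaled))  # True
```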
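- Label Encoding: Converts each category into an integer code.
Example:
python
from sklearn.preprocessing import LabelEncoder
data = ['red', 'green', 'blue', 'green']  # example categories
encoder = LabelEncoder()
print(encoder.fit_transform(data))  # [2 1 0 1]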
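One pitfall with label encoding is that the integer codes follow the sorted order of the category names, not any natural order. The sketch below uses hypothetical size labels to show this with LabelEncoder's `classes_` and `inverse_transform`. It then contrasts that behavior with scikit-learn's OrdinalEncoder, a different class that accepts an explicit category order.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = ['low', 'high', 'medium', 'low']  # hypothetical ordered categories

# LabelEncoder assigns codes in sorted (alphabetical) order of the classes
label_enc = LabelEncoder()
codes = label_enc.fit_transform(sizes)
print(label_enc.classes_.tolist())  # ['high', 'low', 'medium']
print(codes.tolist())               # [1, 0, 2, 1]
print(label_enc.inverse_transform(codes).tolist())  # back to the original strings

# OrdinalEncoder takes an explicit category order (and expects 2-D input)
ord_enc = OrdinalEncoder(categories=[['low', 'medium', 'high']])
ordered = ord_enc.fit_transform([[s] for s in sizes])
print(ordered.ravel().tolist())     # [0.0, 2.0, 1.0, 0.0]
```

scikit-learn documents LabelEncoder as intended for target labels (y). For input features, OrdinalEncoder or one-hot encoding is the usual choice.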
- One-Hot Encoding: Converts categorical values into a series of binary
columns. Each unique value is represented as a binary column with a 1 or
0 indicating the presence or absence of the category.
Example:
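python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
data = np.array([['red'], ['green'], ['blue'], ['green']])  # 2-D: one column of categories
encoder = OneHotEncoder(sparse_output=False)  # scikit-learn < 1.2: use sparse=False
print(encoder.fit_transform(data))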
Feature Engineering
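- Feature Creation: Deriving new features from existing ones, for example by combining two columns arithmetically.
Example:
python
import pandas as pd
df = pd.DataFrame({'Length': [2, 3, 4], 'Width': [5, 6, 7]})  # example data
df['Area'] = df['Length'] * df['Width']  # new feature built from two existing ones
print(df)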
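Arithmetic combinations are only one way to derive features. This sketch shows two other common techniques on made-up data: binning a continuous column into categories with `pd.cut`, and extracting calendar parts from a timestamp with the `.dt` accessor. The bin edges and labels are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical raw data
df = pd.DataFrame({
    'age': [15, 27, 45, 70],
    'signup': pd.to_datetime(['2023-01-15', '2023-06-30', '2024-02-29', '2024-12-01']),
})

# Binning: turn a continuous column into ordered categories (right edges inclusive)
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 40, 65, 120],
                         labels=['child', 'young_adult', 'adult', 'senior'])

# Date decomposition: pull calendar parts out of a timestamp
df['signup_year'] = df['signup'].dt.year
df['signup_month'] = df['signup'].dt.month
df['signup_weekday'] = df['signup'].dt.day_name()

print(df)
```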
- Feature Selection: Selecting the most relevant features for a model. This
can be done using techniques such as Recursive Feature Elimination
(RFE), L1 regularization (Lasso), and tree-based feature importance.
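python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=5)  # keep 5 features (an example choice)
fit = rfe.fit(X, y)
print(fit.support_)  # True for each selected feature
print(fit.ranking_)  # 1 = selected; higher = eliminated earlier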
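The other two selection techniques named above can be sketched with scikit-learn. The first uses a random forest's impurity-based `feature_importances_`. The second uses a Lasso model, whose L1 penalty sets weak coefficients to exactly zero. The synthetic datasets, `alpha=1.0`, and keeping the top three features are assumptions for illustration, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso

# Tree-based importance on a classification problem with 3 informative features
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = forest.feature_importances_  # one score per feature, summing to 1
top3 = np.argsort(importances)[::-1][:3]   # indices of the 3 highest scores
print("Top 3 features by importance:", top3)

# L1 regularization (Lasso) on a regression problem: weak coefficients become exactly 0
X_r, y_r = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=5.0, random_state=42)
lasso = Lasso(alpha=1.0).fit(X_r, y_r)
kept = np.flatnonzero(lasso.coef_)  # indices of non-zero coefficients
print("Features kept by Lasso:", kept)
```

scikit-learn's `SelectFromModel` can wrap either fitted model into a transformer that drops the low-scoring features automatically.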
Descriptive Statistics
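- Mean: The sum of the data points divided by their count.
- Median: The middle value when the data points are sorted (the average of the two middle values when the count is even).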
Example:
python
import numpy as np
data = [1, 2, 2, 3, 4]
print(np.mean(data))  # Mean: 2.4
print(np.median(data))  # Median: 2.0
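- Variance: The average squared deviation of the data points from the mean.
Example:
python
import numpy as np
data = [1, 2, 2, 3, 4]
print(np.var(data))  # Variance (population: divides by n); pass ddof=1 for sample variance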
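Working the statistics out by hand for the same five values makes the definitions concrete. It also shows where NumPy's conventions come from, in particular that `np.var` divides by n unless `ddof=1` is passed.

```python
import numpy as np

data = [1, 2, 2, 3, 4]
n = len(data)

mean = sum(data) / n                                 # 12 / 5
ordered = sorted(data)
mid = n // 2
# Odd count: the middle element; even count: average of the two middle elements
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
pop_var = sum((x - mean) ** 2 for x in data) / n           # like np.var(data)
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)  # like np.var(data, ddof=1)

print(mean, median)                             # 2.4 2
print(round(pop_var, 4), round(sample_var, 4))  # 1.04 1.3
print(np.isclose(pop_var, np.var(data)), np.isclose(sample_var, np.var(data, ddof=1)))
print(np.std(data))  # standard deviation = square root of the population variance
```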
Data Visualization
- Types of Visualizations:
  - Bar Charts: Used for categorical data to show the frequency of each
    category.
  - Box Plots: Used to show the distribution of data and identify outliers.
  - Scatter Plots: Used to show the relationship between two numerical
    variables.
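python
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Category': ['A', 'B', 'C'], 'Count': [10, 15, 7]})  # example data
df.plot(kind='bar', x='Category', y='Count')  # bar chart of counts per category
plt.show()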
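The box plot and scatter plot described above can be sketched with Matplotlib. The data is randomly generated for illustration. The box plot uses a roughly normal sample with two injected extreme values, and the scatter plot uses a noisy linear relationship.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = np.append(rng.normal(50, 5, 100), [80, 15])  # two injected extreme values
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(0, 1, 50)                      # noisy linear relationship

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: points beyond the whiskers (1.5 * IQR by default) are drawn as outliers
ax_box.boxplot(values)
ax_box.set_title("Box plot")

# Scatter plot: one point per (x, y) pair
ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter plot")

plt.tight_layout()
plt.show()
```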
Data Summarization
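python
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 4, 5, 8]})  # example data
print(df.corr())  # Correlation matrix (Pearson by default)
print(df.cov())  # Covariance matrix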
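Correlation is covariance rescaled by the two standard deviations, r = cov(A, B) / (std(A) * std(B)), which confines it to the range [-1, 1]. The sketch below checks this identity with pandas on a small hypothetical DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 4, 5, 8]})  # hypothetical data

cov_ab = df['A'].cov(df['B'])                 # sample covariance (divides by n - 1)
r = cov_ab / (df['A'].std() * df['B'].std())  # Pearson r; the n - 1 factors cancel

print(cov_ab, r)
print(np.isclose(r, df['A'].corr(df['B'])))   # True
```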
Example:
python
import pandas as pd
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Values': [1, 2, 3, 4]
})
grouped = df.groupby('Category')
print(grouped['Values'].sum())  # A: 4, B: 6
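A grouped object is usually followed by an aggregation. This sketch uses hypothetical data to compute several statistics per group in one `agg` call, then uses `describe` for an overall numeric summary.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [1, 2, 3, 4, 5],
})  # hypothetical data

# Several summary statistics per group in one call
summary = df.groupby('Category')['Values'].agg(['count', 'mean', 'sum'])
print(summary)

# Overall numeric summary: count, mean, std, min, quartiles, max
print(df['Values'].describe())
```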
Task 5
12. How can new features be created from existing features? Provide an example.