Feature Selection
data
# Drop the "Front Door Color" column as it doesn't add value to our prediction
data = data.drop(columns=['Front Door Color'])
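As a minimal sketch of how such a DataFrame might be built and the column dropped; the column names and the door colours below are illustrative assumptions, and only the numeric values come from the preview that follows:

import pandas as pd

# Hypothetical reconstruction of the housing data (column names and colours assumed)
data = pd.DataFrame({
    'Size (sqft)': [1500, 1600, 1700, 1800, 1900],
    'Bedrooms': [3, 3, 4, 4, 5],
    'Age (years)': [10, 15, 20, 25, 30],
    'Distance to City (km)': [5, 4, 6, 3, 2],
    'Front Door Color': ['Red', 'Blue', 'Green', 'White', 'Black'],
    'Price': [300000, 320000, 350000, 370000, 400000],
})

# The door colour carries no predictive signal for the price, so drop it
data = data.drop(columns=['Front Door Color'])
print(data)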
0 1500 3 10 5 300000
1 1600 3 15 4 320000
2 1700 4 20 6 350000
3 1800 4 25 3 370000
4 1900 5 30 2 400000
data
170: This is the mean (average) of the distribution. The generated numbers will center around 170.
10: This is the standard deviation (spread) of the distribution. It determines how much the numbers vary around the mean.
100: This is the number of values to generate. In this case, it will create an array of 100 random numbers.
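Taken together, these are presumably the arguments of np.random.normal(170, 10, 100). A minimal sketch (the variable name heights is an assumption, e.g. heights in cm):

import numpy as np

np.random.seed(0)  # make the draw reproducible

# 100 values from a normal distribution with mean 170 and standard deviation 10
heights = np.random.normal(170, 10, 100)

print(heights.mean())  # close to 170
print(heights.std())   # close to 10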
# Apply PCA
pca = PCA(n_components=1)
data_transformed = pca.fit_transform(data)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-509fb8b12387> in <cell line: 2>()
1 # Apply PCA
----> 2 pca = PCA(n_components=1)
3 data_transformed = pca.fit_transform(data)
4
5 # Print the transformed data (principal component)
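The NameError above occurs simply because PCA was never imported in that session; adding from sklearn.decomposition import PCA fixes it. A self-contained sketch, where two correlated columns stand in for the notebook's data variable:

import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: two strongly correlated columns (the notebook's `data` would be used instead)
rng = np.random.default_rng(0)
x = rng.normal(170, 10, 100)
data = np.column_stack([x, 2 * x + rng.normal(0, 5, 100)])

# Apply PCA, keeping a single principal component
pca = PCA(n_components=1)
data_transformed = pca.fit_transform(data)

# Print the transformed data (principal component) and the variance it retains
print(data_transformed[:5])
print(pca.explained_variance_ratio_)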
# Sample documents
documents = [
"The cat is on the table",
"The dog is under the table",
"Cats and dogs are friends",
"Dogs run and cats jump"
]
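The feature names printed below look like a plain bag-of-words vocabulary; a short sketch assuming scikit-learn's CountVectorizer applied to the documents list above:

from sklearn.feature_extraction.text import CountVectorizer

# Build the vocabulary and the document-term count matrix from `documents`
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

print("Feature Names:", vectorizer.get_feature_names_out())
print(X.toarray())  # one row per document, one column per vocabulary word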
Feature Names: ['and' 'are' 'cat' 'cats' 'dog' 'dogs' 'friends' 'is' 'jump' 'on' 'run'
'table' 'the' 'under']
Hours Studied
Attendance Rate
Participation in Class
Previous Grades
Extra-Curricular Activities
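These read as candidate features for predicting a student's final grade. A hedged sketch of how such a dataset could be set up; all values and the target column 'Final Grade' are made-up assumptions:

import pandas as pd

df = pd.DataFrame({
    'Hours Studied': [5, 8, 2, 7, 4, 9, 3, 6],
    'Attendance Rate': [0.90, 0.95, 0.60, 0.85, 0.70, 0.98, 0.65, 0.80],
    'Participation in Class': [3, 4, 1, 4, 2, 5, 1, 3],
    'Previous Grades': [75, 88, 55, 80, 65, 92, 58, 72],
    'Extra-Curricular Activities': [2, 1, 0, 3, 1, 2, 0, 2],
    'Final Grade': [78, 90, 52, 84, 66, 95, 55, 75],
})

X = df.drop(columns=['Final Grade'])  # candidate features
y = df['Final Grade']                 # target to predict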
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Find the feature that gives the best improvement in model performance
best_feature = max(scores, key=scores.get) # Select the feature with the highest score
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
while remaining_features:: Start a loop that runs as long as there are still features we haven't selected yet.
scores = {}: Initialize an empty dictionary to keep track of model scores for each feature we test.
current_features = selected_features + [feature]: Add the current feature to the already selected features to form the candidate set to test.
X_subset = X[current_features]: Create X_subset, which includes only the selected features.
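Putting those pieces together, a sketch of the full forward-selection loop, assuming X and y as set up earlier and mean cross-validated R² as the score:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

selected_features = []                 # features chosen so far
remaining_features = list(X.columns)   # features not yet tried
model = LinearRegression()

while remaining_features:
    scores = {}  # score for each candidate feature in this round
    for feature in remaining_features:
        current_features = selected_features + [feature]
        X_subset = X[current_features]
        # Mean cross-validated R^2 with this candidate added
        scores[feature] = cross_val_score(model, X_subset, y, cv=3).mean()

    # Find the feature that gives the best improvement in model performance
    best_feature = max(scores, key=scores.get)
    selected_features.append(best_feature)
    remaining_features.remove(best_feature)
    print(f"Added {best_feature!r}, CV score = {scores[best_feature]:.3f}")

A fuller version would stop once adding another feature no longer improves the score; this sketch simply ranks all features in the order they would be added.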
import numpy as np
np.random.seed(0)
np.random.choice([0, 1], size=100) simulates flipping a coin 100 times. Here, 0 represents Tails and 1 represents Heads.
We count how many times 1 (Heads) and 0 (Tails) occur in the array.
By dividing the count of each outcome by the total number of flips, we get the empirical probability of Heads and of Tails.
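A short self-contained sketch of this simulation (the variable names are assumptions):

import numpy as np

np.random.seed(0)

# Flip a fair coin 100 times: 0 = Tails, 1 = Heads
flips = np.random.choice([0, 1], size=100)

heads = np.sum(flips == 1)
tails = np.sum(flips == 0)

# Empirical probabilities: counts divided by the total number of flips
print("P(Heads) =", heads / len(flips))
print("P(Tails) =", tails / len(flips))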
import numpy as np
# Calculate correlation
correlation = np.corrcoef(study_time, test_scores)[0, 1]
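study_time and test_scores are not shown in this excerpt, so here is a self-contained sketch with made-up values:

import numpy as np

# Hypothetical data: hours studied and the corresponding test scores
study_time = np.array([1, 2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] picks the off-diagonal entry,
# i.e. the correlation between the two arrays
correlation = np.corrcoef(study_time, test_scores)[0, 1]
print(correlation)  # close to 1: a strong positive relationship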