# ML Lab
The built-in `statistics` module provides basic statistical functions.
**Key Functions:**
- `mean(data)`: Returns the arithmetic mean of `data`.
- `median(data)`: Returns the median (middle value) of `data`.
- `mode(data)`: Returns the most common value in `data`.
- `variance(data)` / `stdev(data)`: Return the sample variance and sample standard deviation.
**Example:**
```python
import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5]
print(statistics.mode(data))  # Output: 4
```
The built-in `math` module provides common mathematical functions.
**Key Functions:**
- `gcd(a, b)`: Returns the greatest common divisor of `a` and `b`.
- `log(x, [base])`: Returns the logarithm of `x` to the given base. If the
base is not specified, it returns the natural logarithm.
**Example:**
```python
import math
print(math.gcd(12, 18), math.log(8, 2))  # gcd -> 6, log base 2 of 8 -> 3.0
```
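To illustrate the default-base behaviour mentioned above, a quick sketch:
```python
import math

# With no base argument, math.log returns the natural logarithm: ln(e) = 1
print(math.log(math.e))   # ~1.0
# An explicit base gives the logarithm to that base: log10(100) = 2
print(math.log(100, 10))  # ~2.0
```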
`scipy` is a third-party library for scientific computing.
**Key Modules:**
- `scipy.stats`: Contains a large number of probability distributions and
statistical functions.
**Example:**
```python
from scipy import stats
print(stats.norm.cdf(0))  # Output: 0.5
```
Install it with pip if needed:
```bash
pip install scipy
```
### Summary
- The `statistics` and `math` libraries are built into Python and provide
basic functions for statistics and mathematics.
- Make sure to install third-party libraries like `scipy` using pip if they are
not already installed.
### Experiment 1: Computing Basic Statistical Measures
```python
import statistics
from collections import Counter

def compute_statistics(data):
    # Calculate mean
    mean = statistics.mean(data)
    # Calculate median
    median = statistics.median(data)
    # Calculate mode (the most common value)
    mode_data = Counter(data)
    mode = mode_data.most_common(1)[0][0]
    # Calculate variance and standard deviation
    variance = statistics.variance(data)
    std_deviation = statistics.stdev(data)
    return mean, median, mode, variance, std_deviation

# Example data
data = [10, 20, 20, 30, 40, 40, 50, 60, 70, 80, 80, 90, 100]

# Compute statistics
mean, median, mode, variance, std_deviation = compute_statistics(data)
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_deviation}")
```
### Explanation:
1. **Mean**: The average of the dataset, computed with `statistics.mean`.
2. **Median**: The middle value of the dataset when it's sorted. If the dataset has an even number of observations, the median is the average of the two middle numbers (see the example below).
3. **Mode**: The most frequent value, found here with `collections.Counter`.
4. **Variance**: The sample variance, a measure of how spread out the values are.
5. **Standard Deviation**: The square root of the variance, expressed in the same units as the data.
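For instance, the even-length case can be checked directly:
```python
import statistics

# Four observations: the median is the average of the two middle values
print(statistics.median([1, 2, 3, 4]))  # Output: 2.5, i.e. (2 + 3) / 2
```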
This program assumes that the input `data` is a list of numbers. You can
modify the list `data` to test with different datasets.
When you run the program with the example data, the output is approximately (standard deviation shown rounded):
```
Mean: 53.07692307692308
Median: 50
Mode: 20
Variance: 856.4102564102564
Standard Deviation: 29.2645
```
- **Mean**: The average value of the dataset, calculated as the sum of all elements divided by the number of elements.
- **Median**: The middle value of the sorted dataset, which in this case is 50.
- **Mode**: The most frequent value; here 20, 40, and 80 each appear twice, and the first of these (20) is reported.
- **Variance** and **standard deviation**: Measures of how spread out the values are around the mean (verified by hand in the sketch below).
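As a sketch, the mean and sample variance can be recomputed by hand from their definitions and checked against the `statistics` module:
```python
import statistics

data = [10, 20, 20, 30, 40, 40, 50, 60, 70, 80, 80, 90, 100]

# Mean: sum of all elements divided by the number of elements
mean = sum(data) / len(data)
# Sample variance: squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(abs(mean - statistics.mean(data)) < 1e-9)          # True
print(abs(variance - statistics.variance(data)) < 1e-9)  # True
```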
### Experiment 2: Linear Regression for House Price Prediction

This experiment fits a linear regression model that predicts house price from square footage (the `Price` values below are illustrative placeholders).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# House data ('Price' values are illustrative placeholders)
data = {
    'Square Footage': [1500, 1800, 1200, 1700, 2000, 2200, 1600, 1100, 1950, 1450],
    'Price': [300000, 360000, 240000, 340000, 400000, 440000, 320000, 220000, 390000, 290000],
}
df = pd.DataFrame(data)
X = df[['Square Footage']]  # feature matrix
y = df['Price']             # target variable
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Evaluate on the test set
y_pred = model.predict(X_test)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
# Predict the price of a new house
new_house = pd.DataFrame({'Square Footage': [1650]})
predicted_price = model.predict(new_house)
print("Predicted price:", predicted_price[0])
```
### Explanation:
1. **Dataset Creation**: The house data is loaded into a pandas DataFrame.
2. **Feature/Target Selection**:
   - `X`: The feature matrix (`Square Footage`).
   - `y`: The target variable (`Price`), which is the house price we want to predict.
3. **Train-Test Split**: The data is divided into training (80%) and testing (20%) sets.
4. **Model Training**: A `LinearRegression` model is fitted to the training data.
5. **Prediction**: The trained model predicts prices for the test set.
6. **Model Evaluation**:
   - **R-squared (R²)**: Indicates how well the model fits the data, with values closer to 1 indicating better performance (see the sketch after this list).
7. **Making New Predictions**: After the model is trained, you can input new data (e.g., a house with specific features) to predict its price.
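As a sketch of what the R² metric computes (the toy arrays below are illustrative): R² = 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares:
```python
import numpy as np

def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0])  # illustrative values
y_pred = np.array([2.8, 5.1, 7.2])
print(r_squared(y_true, y_pred))    # Close to 1 -> good fit
```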
### Output:
A sample run printed, for example:
```
Intercept: -178076.92307693383
```
(The exact values depend on the dataset used.)
- The **coefficients** are the weights assigned to each feature (i.e., how
much each feature contributes to the house price).
- The **intercept** is the base price when all feature values are zero.
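To see how they combine, a prediction can be reproduced by hand from `coef_` and `intercept_` (the numbers below are hypothetical):
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative fit: price = 200 * sqft + 10000 (hypothetical data)
X = np.array([[1000], [1500], [2000]])
y = np.array([210000, 310000, 410000])
model = LinearRegression().fit(X, y)

# A single-feature prediction is just coefficient * feature + intercept
sqft = 1650
manual = model.coef_[0] * sqft + model.intercept_
print(np.isclose(manual, model.predict([[sqft]])[0]))  # True
```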
### Experiment 3: Decision Tree Classifier with Parameter Tuning

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a baseline Decision Tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Parameter grid ('max_depth' and 'min_samples_split' values are illustrative)
param_grid = {
    'max_depth': [2, 3, 4, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}
# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print("Best parameters:", grid_search.best_params_)

# Evaluate the tuned model
best_dt = grid_search.best_estimator_
y_pred_best = best_dt.predict(X_test)
print("Tuned accuracy:", accuracy_score(y_test, y_pred_best))
```
### Explanation:
1. **Load the Dataset**: The Iris dataset is loaded and split into features (`X`) and target (`y`).
2. **Train-Test Split**: The data is divided into training (80%) and testing (20%) sets.
3. **Model Training**: A `DecisionTreeClassifier` is fitted to the training data.
4. **Initial Predictions**: Predictions are made on the test set, and the accuracy and classification report are printed.
5. **Parameter Tuning**: `GridSearchCV` is used to find the best hyperparameters for the Decision Tree (a hand-rolled sketch of the same idea follows this list).
6. **Tuned Evaluation**: The best estimator is evaluated on the test set.
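Conceptually, `GridSearchCV` cross-validates every parameter combination and keeps the best one. A hand-rolled sketch of the same idea, with a hypothetical one-parameter grid:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

best_score, best_params = -1.0, None
for max_depth in [2, 3, 4, None]:  # hypothetical grid
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    if score > best_score:
        best_score, best_params = score, {'max_depth': max_depth}

print(best_params, best_score)
```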
### Output:
This code will output the initial accuracy, the best parameters found
through tuning, and the accuracy of the tuned model.
```plaintext
Accuracy: 1.0
Best parameters: ...
Tuned accuracy: 1.0
```
### Note:
The exact output may vary depending on the specific train-test split,
especially with a small dataset like Iris.
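To see that variability directly, a small sketch that retrains the tree on several random splits:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Accuracy depends on which 30 samples end up in the test set
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
    print(f"split {seed}: accuracy = {model.score(X_te, y_te):.3f}")
```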
### Experiment 4: K-Nearest Neighbors Classification

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset and separate features from target
data = load_iris()
X = data.data
y = data.target
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the KNN classifier (k = 3 as an example)
k = 3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Make predictions and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
```
### Explanation:
1. **Load the Dataset**: The Iris dataset is loaded and divided into features (`X`) and target (`y`).
2. **Train-Test Split**: The data is split into training (80%) and testing (20%) sets using `train_test_split`.
3. **Initialize the Model**: A `KNeighborsClassifier` is created with `n_neighbors=k`.
4. **Train the Model**: The model is trained using the training data with the `fit` method.
5. **Make Predictions**: Predictions are made on the test set using the `predict` method.
6. **Evaluate**: The accuracy and a classification report are printed.
### Output:
```plaintext
Accuracy: 1.0
Classification Report:
...
accuracy                           1.00        30
```
### Note:
The exact output may vary based on the train-test split, especially with a
small dataset like Iris. You can adjust the number of neighbors (`k`) to see
how it affects performance.
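A quick sketch for exploring different values of `k` on the same split:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Try several neighbor counts and compare test accuracy
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k}: accuracy={knn.score(X_test, y_test):.3f}")
```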
### Experiment 5: K-Means Clustering

This program generates synthetic 2-D data (the generation step here uses `make_blobs`) and clusters it with K-Means.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate synthetic 2-D data with 4 cluster centers
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

# Visualize the generated data
plt.scatter(X[:, 0], X[:, 1], s=30)
plt.title('Generated Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

# Initialize and fit the K-Means model
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=labels, s=30, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='Centers')
plt.title('Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
```
### Explanation:
1. **Import Libraries**: `matplotlib`, `KMeans`, and `make_blobs` are imported.
2. **Generate the Data**: Synthetic 2-D data with four cluster centers is created.
3. **Visualize the Data**: The generated data points are plotted for visualization.
4. **Initialize the Model**: A `KMeans` model is created with `n_clusters=4`.
5. **Fit the Model**: The model is fitted to the generated data using the `fit` method.
6. **Retrieve Results**: The cluster centers and labels for each data point are obtained.
### Output:
The program produces two plots:
1. A scatter plot of the raw generated data points.
2. A scatter plot showing the clustering results with different colors for each cluster and red "X" markers for the cluster centers.
### Note
You can adjust the number of clusters in the `KMeans` initialization to see
how the model performs with different values. Additionally, you can use
real datasets by loading them with `pandas` or `sklearn` and fitting the
KMeans model similarly.
Running the provided K-Means clustering code will generate two plots. Here's a description of the expected output:
- A scatter plot of the generated data points ("Generated Data").
- A scatter plot displaying the same data points, but now colored according to the clusters assigned by the K-Means algorithm ("Clustering Results").
- The cluster centers will be marked with large red "X" symbols, indicating the mean position of each cluster.
- **Generated Data**:
```
| o |
| o o |
| o o|
|o |
| o o |
```
- **Clustering Results**:
```
| * * |
| * *|
|* * |
| * X *|
| * * |
```
(Note: The actual scatter plots would have smooth distributions rather
than ASCII representation.)
The exact appearance of the plots will vary depending on the random
state and the data generated. Running the code will produce the
visualizations in a graphical window.
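As the earlier note suggests, the number of clusters is a free parameter. One common, hedged way to choose it is the elbow method: fit K-Means for several values of k and look for the bend in the inertia curve (synthetic `make_blobs` data assumed):
```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

inertias = []
ks = range(1, 9)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the closest center

plt.plot(list(ks), inertias, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
```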
When working with **Machine Learning (ML) applications**, several
Python libraries help in different aspects of data processing, visualization,
model building, and evaluation. Below is a study of important libraries,
including **Matplotlib**, for ML applications:
### **1. Matplotlib (Data Visualization)**
**Key Features:**
- Line, scatter, bar, and histogram plots.
- Full control over titles, axis labels, legends, and styling.
**Example:**
```python
import numpy as np
import matplotlib.pyplot as plt

# Plot a sine curve
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, label="sin(x)")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
```
### **2. NumPy (Numerical Computing)**
**Key Features:**
- Fast N-dimensional arrays with vectorized operations and broadcasting.
- Linear algebra routines and random number generation.
**Example:**
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array:\n", arr)
```
### **3. Pandas (Data Manipulation)**
**Key Features:**
- `DataFrame` and `Series` structures for tabular data.
- Reading/writing CSV files, grouping, and handling missing data.
**Example:**
```python
import pandas as pd

# Sample data (illustrative)
data = {"Name": ["Alice", "Bob"], "Age": [25, 30]}
df = pd.DataFrame(data)
print(df)
```
### **4. Scikit-learn (Machine Learning)**
**Key Features:**
- Ready-made algorithms for classification, regression, and clustering.
- Utilities for train-test splitting, metrics, and hyperparameter tuning.
**Example:**
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit y = 2x on toy data (illustrative)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, y)
print("Predicted:", model.predict([[6]]))  # Predicts ~12 for x = 6
```
### **5. TensorFlow (Deep Learning)**
**Key Features:**
- Building and training neural networks via the Keras API.
- GPU acceleration and automatic differentiation.
**Example:**
```python
import tensorflow as tf

# Define a simple feed-forward network
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
```
### **6. Seaborn (Statistical Data Visualization)**
**Key Features:**
- High-level statistical plots (heatmaps, pair plots, distribution plots) built on Matplotlib.
**Example:**
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Heatmap of a random 5x5 matrix
data = np.random.rand(5, 5)
df = pd.DataFrame(data)
sns.heatmap(df, annot=True)
plt.show()
```
### **Conclusion**
Together, these libraries cover the full ML workflow: NumPy and Pandas for numerical computing and data manipulation, Matplotlib and Seaborn for visualization, Scikit-learn for classical machine learning, and TensorFlow for deep learning.