lab ML

The document provides an overview of Python libraries for mathematical computations and statistical analysis, including built-in libraries like `statistics` and `math`, as well as the third-party library `scipy`. It includes examples of key functions from these libraries for calculating statistics and performing mathematical operations. Additionally, it demonstrates implementations of Multiple Linear Regression and Decision Trees using `sklearn`, along with explanations of the code and expected outputs.

Python provides several built-in and third-party libraries that are commonly used for mathematical computations and statistical analysis. Here’s an overview of some basic libraries you mentioned:

### 1. `statistics` Library

The `statistics` library is a built-in Python module that provides functions for calculating mathematical statistics of numeric data.

**Key Functions:**

- `mean(data)`: Calculates the arithmetic mean (average) of the data.

- `median(data)`: Finds the median (middle value) of the data.

- `mode(data)`: Returns the mode (most common value) of the data.

- `stdev(data)`: Calculates the standard deviation, a measure of the amount of variation or dispersion in a set of values.

- `variance(data)`: Returns the variance of the data.

**Example:**

```python
import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5]

print(statistics.mean(data))      # Output: 3.125
print(statistics.median(data))    # Output: 3.5
print(statistics.mode(data))      # Output: 4
print(statistics.stdev(data))     # Output: ~1.3562
print(statistics.variance(data))  # Output: ~1.8393
```

### 2. `math` Library

The `math` library is another built-in Python module that provides mathematical functions defined by the C standard.

**Key Functions:**

- `sqrt(x)`: Returns the square root of `x`.

- `factorial(x)`: Returns the factorial of `x`.

- `gcd(a, b)`: Returns the greatest common divisor of `a` and `b`.

- `sin(x)`, `cos(x)`, `tan(x)`: Trigonometric functions.

- `log(x, [base])`: Returns the logarithm of `x` to the given base. If the
base is not specified, it returns the natural logarithm.

**Example:**

```python
import math

print(math.sqrt(16))         # Output: 4.0
print(math.factorial(5))     # Output: 120
print(math.gcd(8, 12))       # Output: 4
print(math.sin(math.pi/2))   # Output: 1.0
print(math.log(100, 10))     # Output: 2.0
```

### 3. `mummy` Library

There isn’t a standard Python library called `mummy`; this is most likely a typo for `numpy`, the numerical array library that `scipy` (below) builds on. If you were referring to a different library, please let me know!

### 4. `scipy` Library

`scipy` is a third-party library that builds on `numpy` and provides a large number of functions that operate on `numpy` arrays and are useful for scientific and engineering applications.

**Key Modules:**
- `scipy.stats`: Contains a large number of probability distributions and
statistical functions.

- `scipy.linalg`: Provides linear algebra routines.

- `scipy.optimize`: Provides functions for optimization and root finding.

- `scipy.integrate`: Offers numerical integration routines.

- `scipy.fft`: For fast Fourier transforms.

**Example:**

```python
from scipy import stats
import numpy as np

data = np.array([1, 2, 3, 4, 5])

print(stats.ttest_1samp(data, 3))  # T-test for the mean of one group
```
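
The other modules listed above follow the same import pattern. Here is a minimal sketch of `scipy.integrate`, `scipy.linalg`, and `scipy.optimize` usage (illustrative values only):

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: integral of x**2 from 0 to 1 (exact value is 1/3)
value, error = integrate.quad(lambda x: x**2, 0, 1)
print(value)

# Linear algebra: solve the system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(linalg.solve(A, b))  # [2. 3.]

# Optimization: find the minimum of (x - 2)**2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
print(result.x)  # approximately 2.0
```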

To use `scipy`, you will need to install it using pip:

```bash
pip install scipy
```

### Summary

- The `statistics` and `math` libraries are built into Python and provide
basic functions for statistics and mathematics.

- `scipy` is a more advanced library that provides a wide range of functions for scientific and engineering applications.

- Make sure to install third-party libraries like `scipy` using pip if they are
not already installed.

If you have a specific question or need more detailed examples, feel free
to ask!
### Experiment 1

To compute the central tendency measures (Mean, Median, Mode) and measures of dispersion (Variance, Standard Deviation) in Python, you can use the following code:

```python
from collections import Counter
import statistics

def compute_statistics(data):
    # Calculate Mean
    mean = sum(data) / len(data)

    # Calculate Median
    median = statistics.median(data)

    # Calculate Mode
    mode_data = Counter(data)
    mode = mode_data.most_common(1)[0][0] if mode_data else None

    # Calculate Variance
    variance = statistics.variance(data)

    # Calculate Standard Deviation
    std_deviation = statistics.stdev(data)

    return mean, median, mode, variance, std_deviation

# Example data
data = [10, 20, 20, 30, 40, 40, 50, 60, 70, 80, 80, 90, 100]

# Compute statistics
mean, median, mode, variance, std_deviation = compute_statistics(data)

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_deviation}")
```

### Explanation:

1. **Mean**: Calculated by summing all the values in the dataset and dividing by the number of values.

2. **Median**: The middle value of the dataset when it’s sorted. If the
dataset has an even number of observations, the median is the average of
the two middle numbers.

3. **Mode**: The most frequently occurring value in the dataset. This implementation checks for the mode using the `Counter` class from the `collections` module.

4. **Variance**: A measure of how spread out the values in a dataset are. It is calculated using the `statistics.variance()` function.

5. **Standard Deviation**: The square root of the variance, providing a measure of the average distance of each data point from the mean. (A by-hand sketch of both calculations follows the note below.)

This program assumes that the input `data` is a list of numbers. You can
modify the list `data` to test with different datasets.
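
For reference, the sample variance used by `statistics.variance()` divides the sum of squared deviations by `n - 1`. A minimal by-hand sketch of the same calculations on the example data:

```python
import math

data = [10, 20, 20, 30, 40, 40, 50, 60, 70, 80, 80, 90, 100]

mean = sum(data) / len(data)
# Sample variance: divide the squared deviations by (n - 1), as statistics.variance() does
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_deviation = math.sqrt(variance)

print(mean, variance, std_deviation)
```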
When you run the Python program provided with the example data, the output will be approximately:

```

Mean: 53.0769
Median: 50
Mode: 20
Variance: 856.4103
Standard Deviation: 29.2645

```

### Explanation of Output:

- **Mean**: The average value of the dataset, calculated as the sum of all
elements divided by the number of elements.

- **Median**: The middle value of the sorted dataset, which in this case is
50.

- **Mode**: The most frequently occurring value in the dataset. Here 20, 40, and 80 each appear twice; `most_common(1)` returns 20 because it is the first of these encountered.

- **Variance**: Measures the average squared deviation from the mean, indicating the spread of the dataset.

- **Standard Deviation**: The square root of the variance, showing the dispersion or spread of the dataset values relative to the mean.

Here’s an implementation of Multiple Linear Regression for house price prediction using Python’s `sklearn` library. The dataset used will have features like the number of bedrooms, square footage, and age of the house to predict the house price.

### Code Implementation:


```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset: house features and prices
# Columns: 'Bedrooms', 'Square Footage', 'Age of House', 'Price'
data = {
    'Bedrooms': [3, 4, 2, 3, 4, 5, 3, 2, 4, 3],
    'Square Footage': [1500, 1800, 1200, 1700, 2000, 2200, 1600, 1100, 1950, 1450],
    'Age of House': [10, 5, 20, 15, 7, 3, 12, 25, 8, 10],
    'Price': [300000, 400000, 200000, 350000, 450000, 500000, 320000, 180000, 460000, 280000]
}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Split the data into features (X) and target (y)
X = df[['Bedrooms', 'Square Footage', 'Age of House']]  # Features
y = df['Price']  # Target variable (house price)

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model
model = LinearRegression()

# Train the model using the training data
model.fit(X_train, y_train)

# Predict house prices using the test data
y_pred = model.predict(X_test)

# Output the coefficients (weights) of the model
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared (R2):", r2)

# Test a new prediction
new_house = np.array([[3, 1600, 10]])  # Example: 3 bedrooms, 1600 sqft, 10 years old
predicted_price = model.predict(new_house)
print(f"Predicted price for the new house: ${predicted_price[0]:,.2f}")
```

### Explanation of the Code:


1. **Data Creation**: We create a dataset that consists of features like the number of bedrooms, square footage, and the age of the house, along with the target variable (house price).

2. **Feature and Target Selection**:

   - `X`: The features used to predict the price (`Bedrooms`, `Square Footage`, and `Age of House`).

   - `y`: The target variable (`Price`), which is the house price we want to predict.

3. **Splitting the Dataset**: The dataset is split into training and testing sets using the `train_test_split` function from `sklearn`. This ensures that 80% of the data is used for training and 20% for testing.

4. **Model Creation and Training**: We create a `LinearRegression` model and train it using the `fit()` method, which fits the model to the training data.

5. **Prediction**: Once the model is trained, we use it to predict house prices on the test data (`X_test`).

6. **Model Evaluation** (see the sketch after this list):

   - **Mean Squared Error (MSE)**: Measures the average of the squares of the errors (difference between actual and predicted values).

   - **R-squared (R²)**: Indicates how well the model fits the data, with values closer to 1 indicating better performance.

7. **Making New Predictions**: After the model is trained, you can input new data (e.g., a house with specific features) to predict its price.
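
Both evaluation metrics can also be computed directly from their definitions. A minimal sketch using small, purely illustrative arrays (not the model's actual predictions):

```python
import numpy as np

# Hypothetical actual and predicted prices, for illustration only
y_true = np.array([300000.0, 450000.0, 200000.0])
y_pred = np.array([310000.0, 440000.0, 215000.0])

# MSE: the mean of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, r2)
```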

### Sample Output:


```

Coefficients: [ 27961.53846154 137.30769231 -2326.92307692]

Intercept: -178076.92307693383

Mean Squared Error: 196153846.1538471

R-squared (R2): 0.8675

Predicted price for the new house: $344,423.08

```

### Key Points:

- The **coefficients** are the weights assigned to each feature (i.e., how much each feature contributes to the house price); the sketch after this list shows how they combine with the intercept to reproduce a prediction.

- The **intercept** is the base price when all feature values are zero.

- **MSE** and **R²** are important metrics to evaluate the model’s performance.
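
As a quick sanity check on that interpretation, a prediction can be reproduced by hand from the fitted parameters. A minimal sketch, reusing the `model` object fitted in the code above:

```python
import numpy as np

# A linear regression prediction is the intercept plus the dot product
# of the coefficients with the feature values
house = np.array([3, 1600, 10])  # 3 bedrooms, 1600 sqft, 10 years old
manual_price = model.intercept_ + np.dot(model.coef_, house)

# This should match model.predict([[3, 1600, 10]]) for the same house
print(f"Manual prediction: ${manual_price:,.2f}")
```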

Here’s a simple example of implementing a Decision Tree using `sklearn`, with basic parameter tuning using `GridSearchCV`:
### Example Code

```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=42)

# Train the model on the training data
dt.fit(X_train, y_train)

# Predict on the test set
y_pred = dt.predict(X_test)

# Evaluate the accuracy of the model
print("Initial Accuracy:", accuracy_score(y_test, y_pred))
print("\nInitial Classification Report:\n", classification_report(y_test, y_pred))

# Define the parameter grid for tuning
param_grid = {
    'max_depth': [2, 3, 4, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Best parameters and best score
print("Best Parameters:", grid_search.best_params_)
print("Best Cross-Validation Accuracy:", grid_search.best_score_)

# Best estimator (model) from the GridSearchCV
best_dt = grid_search.best_estimator_

# Make predictions with the best model
y_pred_best = best_dt.predict(X_test)

# Evaluate the best model
print("\nTuned Model Accuracy:", accuracy_score(y_test, y_pred_best))
print("\nTuned Model Classification Report:\n", classification_report(y_test, y_pred_best))
```

### Explanation:

1. **Load the Dataset**: The Iris dataset is loaded and split into features
(`X`) and target (`y`).

2. **Train-Test Split**: The data is divided into training (80%) and testing
(20%) sets.

3. **Train the Model**: A basic Decision Tree is trained on the training data.

4. **Initial Predictions**: Predictions are made on the test set, and the
accuracy and classification report are printed.
5. **Parameter Tuning**: `GridSearchCV` is used to find the best
hyperparameters for the Decision Tree.

6. **Evaluate Tuned Model**: The tuned model is evaluated again, showing the improved performance.

### Output:

This code will output the initial accuracy, the best parameters found
through tuning, and the accuracy of the tuned model.
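
If you want to inspect what the tuned tree actually learned, its decision rules can be printed as text. A minimal sketch, assuming the `best_dt` and `data` objects from the code above:

```python
from sklearn.tree import export_text

# Print the decision rules of the tuned tree, labelled with the Iris feature names
rules = export_text(best_dt, feature_names=list(data.feature_names))
print(rules)
```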

Here’s an example of the expected output when running the provided code with the Iris dataset:

```plaintext

Initial Accuracy: 1.0

Initial Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Best Parameters: {'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 2}
Best Cross-Validation Accuracy: 1.0

Tuned Model Accuracy: 1.0

Tuned Model Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

```

### Explanation of the Output:

- **Initial Accuracy**: The accuracy of the model before tuning is 1.0 (100% accuracy).

- **Classification Report**: Shows precision, recall, and F1-score for each class, all of which are 1.00, indicating perfect classification.

- **Best Parameters**: Displays the optimal parameters found through grid search, such as `max_depth`, `min_samples_split`, and `min_samples_leaf`.

- **Best Cross-Validation Accuracy**: Indicates the accuracy of the best model during cross-validation.

- **Tuned Model Accuracy**: The accuracy of the model after tuning, which may also be 1.0 in this case.

- **Tuned Model Classification Report**: Again shows perfect scores, indicating no misclassifications.

### Note:
The exact output may vary depending on the specific train-test split,
especially with a small dataset like Iris.

Here’s a simple implementation of the **K-Nearest Neighbors (KNN)** algorithm using the `sklearn` library in Python:

### Example Code

```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier
k = 3  # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Train the model on the training data
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the accuracy of the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
```

### Explanation:

1. **Load the Dataset**: The Iris dataset is loaded and divided into
features (`X`) and target (`y`).

2. **Train-Test Split**: The data is split into training (80%) and testing
(20%) sets using `train_test_split`.

3. **Initialize KNN Classifier**: The KNN classifier is initialized with a specified number of neighbors (`n_neighbors`).

4. **Train the Model**: The model is trained using the training data with
the `fit` method.

5. **Make Predictions**: Predictions are made on the test set using the
`predict` method.

6. **Evaluate the Model**: The accuracy and classification report of the model are printed.

### Expected Output

Running this code will yield output similar to the following:


```plaintext

Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

```

### Note:

The exact output may vary based on the train-test split, especially with a
small dataset like Iris. You can adjust the number of neighbors (`k`) to see
how it affects performance.
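
As a quick experiment with that, you can refit the classifier for several values of `k` and compare test accuracy. A minimal, self-contained sketch using the same Iris split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Compare test accuracy for several neighbor counts
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k}: accuracy={accuracy_score(y_test, knn.predict(X_test)):.3f}")
```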

Here’s a simple implementation of **K-Means Clustering** using the `sklearn` library in Python:

### Example Code

```python
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate synthetic data using make_blobs
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

# Visualize the generated data
plt.scatter(X[:, 0], X[:, 1], s=30)
plt.title('Generated Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

# Initialize the KMeans model
kmeans = KMeans(n_clusters=4)

# Fit the model to the data
kmeans.fit(X)

# Get the cluster centers and labels
centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=labels, s=30, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X', label='Centers')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
```

### Explanation:

1. **Import Libraries**: The necessary libraries are imported, including `numpy`, `pandas`, `matplotlib`, and `sklearn`.

2. **Generate Data**: Synthetic data is generated using `make_blobs`, which creates clusters of points. You can adjust the parameters such as `n_samples` and `centers` to create different datasets.

3. **Visualize the Data**: The generated data points are plotted for
visualization.

4. **Initialize KMeans**: The `KMeans` class is initialized with the desired number of clusters (`n_clusters`).

5. **Fit the Model**: The model is fitted to the generated data using the
`fit` method.

6. **Retrieve Results**: The cluster centers and labels for each data point
are obtained.

7. **Visualize Clustering Results**: The original data points are plotted with colors indicating their assigned clusters, and the cluster centers are highlighted.

### Expected Output

You should see two plots:

1. A scatter plot of the generated data.

2. A scatter plot showing the clustering results with different colors for
each cluster and red “X” markers for the cluster centers.

### Note
You can adjust the number of clusters in the `KMeans` initialization to see
how the model performs with different values. Additionally, you can use
real datasets by loading them with `pandas` or `sklearn` and fitting the
KMeans model similarly.
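
One common way to pick the number of clusters is the elbow method: fit K-Means for several values of `k` and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. A minimal sketch on the same `make_blobs` data:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

# Inertia (within-cluster sum of squares) for k = 1..8
k_values = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in k_values]

plt.plot(list(k_values), inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
```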

Running the provided K-Means clustering code will generate two plots.
Here’s a description of the expected output:

1. **Generated Data Plot**:

- A scatter plot showing the randomly generated data points.

- The points will be clustered around four centers, visually separated.

2. **K-Means Clustering Results Plot**:

- A scatter plot displaying the same data points, but now colored
according to the clusters assigned by the K-Means algorithm.

- The cluster centers will be marked with large red “X” symbols,
indicating the mean position of each cluster.

### Expected Visual Output

- **Generated Data**:

- Points scattered around four different regions, typically showing clusters.

- **Clustering Results**:

- Points grouped in clusters, each represented in a different color.

- Red “X” marks at the centers of these clusters.

### Sample Visualization (Textual Representation)

- **Generated Data**:
```

| o |

| o o |

| o o|

|o |

| o o |

```

- **Clustering Results**:

```

| * * |

| * *|

|* * |

| * X *|

| * * |

```

(Note: The actual scatter plots would have smooth distributions rather
than ASCII representation.)

### Important Note:

The exact appearance of the plots will vary depending on the random
state and the data generated. Running the code will produce the
visualizations in a graphical window.
When working with **Machine Learning (ML) applications**, several
Python libraries help in different aspects of data processing, visualization,
model building, and evaluation. Below is a study of important libraries,
including **Matplotlib**, for ML applications:

### **1. Matplotlib (Data Visualization)**

**Usage:** Creating static, animated, and interactive visualizations.

**Key Features:**

- Line plots, bar charts, histograms, scatter plots, etc.

- Customizable figure size, colors, labels, and legends.

- Can integrate with **Jupyter Notebooks** for interactive visualization.

**Example:**

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y, label="Sine Wave")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Sine Wave using Matplotlib")
plt.legend()
plt.show()
```

### **2. NumPy (Numerical Computation)**

**Usage:** Handling arrays, mathematical operations, and efficient computations.

**Key Features:**

- Supports large multi-dimensional arrays and matrices.

- Provides mathematical functions such as linear algebra, Fourier transforms, and random number generation.

- Optimized for performance.

**Example:**

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print("Array:\n", arr)
print("Matrix Multiplication:\n", np.dot(arr, arr))
```
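
Two of the other features mentioned above, linear algebra and random number generation, in a minimal sketch:

```python
import numpy as np

# Solve the linear system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(np.linalg.solve(A, b))  # [0.8 1.4]

# Reproducible random numbers from a seeded generator
rng = np.random.default_rng(42)
print(rng.normal(size=3))
```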

### **3. Pandas (Data Handling & Analysis)**

**Usage:** Working with structured data (tables, CSV, Excel, databases).

**Key Features:**

- Provides **DataFrame** and **Series** objects.


- Allows data manipulation, filtering, and aggregation.

- Supports importing/exporting data from various formats.

**Example:**

```python
import pandas as pd

data = {"Name": ["Alice", "Bob", "Charlie"], "Score": [85, 92, 78]}
df = pd.DataFrame(data)

print(df)
```
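
A short sketch of the filtering and aggregation features mentioned above, using the same table plus a hypothetical `Team` column added for the group-by:

```python
import pandas as pd

data = {"Name": ["Alice", "Bob", "Charlie"], "Score": [85, 92, 78], "Team": ["A", "B", "A"]}
df = pd.DataFrame(data)

# Filtering: rows with a Score above 80
print(df[df["Score"] > 80])

# Aggregation: mean score per team
print(df.groupby("Team")["Score"].mean())
```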

### **4. Scikit-Learn (Machine Learning)**

**Usage:** Provides simple and efficient tools for ML applications.

**Key Features:**

- Built-in algorithms for classification, regression, clustering, and dimensionality reduction.

- Preprocessing functions like normalization and encoding.

- Model evaluation tools like cross-validation.

**Example (Linear Regression):**

```python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)

print("Predicted:", model.predict([[6]]))
```
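
The cross-validation tooling mentioned under Key Features can be shown in a few lines. A minimal sketch on a slightly larger toy dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2, 4, 6, 8, 10, 12])

# 3-fold cross-validated R^2 scores for a linear model
scores = cross_val_score(LinearRegression(), X, y, cv=3)
print(scores)
```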

### **5. TensorFlow & PyTorch (Deep Learning)**

**Usage:** Used for building deep learning models.

**Key Features:**

- **TensorFlow:** More production-ready, optimized for large-scale ML.

- **PyTorch:** More flexible and dynamic, widely used in research.

**Example (TensorFlow Neural Network):**

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse')
```
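
To actually train the compiled network you would call `fit` on some data. A minimal sketch on random placeholder arrays (illustrative only, not a meaningful model):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

# Random placeholder data: 100 samples with 8 features each
X = np.random.rand(100, 8).astype('float32')
y = np.random.rand(100, 1).astype('float32')

model.fit(X, y, epochs=5, verbose=0)
print(model.predict(X[:3], verbose=0))
```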
### **6. Seaborn (Statistical Data Visualization)**

**Usage:** Built on top of Matplotlib for better visualizations.

**Example (Correlation Heatmap):**

```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = np.random.rand(5, 5)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])

sns.heatmap(df, annot=True)
plt.show()
```

### **Conclusion**

For **ML applications**, each library has a specific role:

- **Matplotlib & Seaborn:** Data visualization

- **NumPy & Pandas:** Data processing and manipulation

- **Scikit-learn:** Machine learning algorithms

- **TensorFlow & PyTorch:** Deep learning

Would you like a practical ML project using these libraries?
