Data Analysis with Pandas & Matplotlib


To load a dataset from a CSV file using Pandas, first make sure that the file exists in the specified directory. Here's a complete example that demonstrates how to load the dataset, perform some basic operations, and visualize the data using Matplotlib.

Let's assume that `[Link]` contains columns `YearsExperience` and `Salary`.


### Step-by-Step Example

#### 1. Import Libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### 2. Load the Dataset

Make sure `[Link]` is in the same directory as your script, or provide the full path to
the file.

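# Load the dataset
dataset = pd.read_csv('[Link]')

# Display the first few rows of the dataset
print(dataset.head())

# In an interactive session, dataset.head() on its own also displays the rows.
# Related inspection tools: dataset.tail(), dataset.info(), dataset.describe(),
# and the attributes dataset.shape and dataset.size.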
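Since loading depends on the file actually being where the script expects it, a small existence check can make a wrong path easier to diagnose. The following is a minimal sketch, not part of the original example: `load_csv_checked` is a hypothetical helper name, and the demo file and its two rows are made up for illustration.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv_checked(path):
    """Hypothetical helper: fail early with the resolved path if the CSV is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV file not found: {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Demonstration with a throwaway file holding made-up rows.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "demo.csv"
demo_file.write_text("YearsExperience,Salary\n1.0,40000\n2.0,45000\n")

demo = load_csv_checked(demo_file)
print(demo.head())
```

`pd.read_csv` raises `FileNotFoundError` on its own for a missing file; the only thing the check adds is the resolved absolute path in the message, which helps when the script runs from an unexpected working directory.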

#### 3. Explore the Dataset

# Display basic information about the dataset
# (info() prints its report directly and returns None, so it needs no print())
dataset.info()

# Display summary statistics
print(dataset.describe())
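To make the inspection tools concrete, here is a small sketch on a hand-built DataFrame. The five rows are made-up illustrative values, not the contents of the CSV file, and the variable names are chosen only for this example.

```python
import pandas as pd

# Made-up illustrative rows, not the contents of the CSV file discussed here.
df = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000, 45000, 52000, 58000, 65000],
})

n_rows, n_cols = df.shape   # (rows, columns) as a tuple
n_cells = df.size           # total number of cells: rows * columns
last_two = df.tail(2)       # the last 2 rows
stats = df.describe()       # count, mean, std, min, quartiles, max per numeric column

print(n_rows, n_cols, n_cells)
print(last_two)
print(stats.loc["mean"])
```

Note that `shape` and `size` are attributes, so they are read without parentheses, while `head`, `tail`, `info`, and `describe` are methods.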

#### 4. Visualize the Data

Create a scatter plot to visualize the relationship between `YearsExperience` and `Salary`.

# Scatter plot of YearsExperience vs Salary
plt.scatter(dataset['YearsExperience'], dataset['Salary'], color='blue')

# Adding title and labels
plt.title('Years of Experience vs Salary')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot
plt.show()
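When the script runs somewhere without a display (a server or a scheduled job, for example), `plt.show()` has no window to open. One possible variation, sketched here under the assumption that a saved image is acceptable, is to select the non-interactive `Agg` backend and write the figure to a file. The data values and the output file name are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt

# Made-up illustrative values, not taken from the dataset in this document.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000, 45000, 52000, 58000, 65000]

fig, ax = plt.subplots()
ax.scatter(years, salary, color="blue")
ax.set_title("Years of Experience vs Salary")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "scatter.png")
fig.savefig(out_path)
plt.close(fig)  # release the figure's memory once it is saved
print("Saved plot to", out_path)
```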

#### 5. Perform Regression Analysis

Let's perform a simple linear regression to predict Salary based on Years of Experience.

# train_test_split (from sklearn.model_selection) splits the dataset into
# training and testing sets.
from sklearn.model_selection import train_test_split

# LinearRegression is the model class used for the regression. The workflow is:
# split the data into training and testing sets, train the model, make
# predictions, and evaluate the model.
from sklearn.linear_model import LinearRegression

# mean_squared_error and r2_score (from sklearn.metrics) evaluate the
# performance of a regression model:
#   - Mean Squared Error (MSE): the average squared difference between the
#     actual and predicted values. Lower values are better.
#   - R-squared (R²) score: the proportion of variance in the dependent
#     variable that is predictable from the independent variable(s). Higher
#     values (closer to 1) are better.
from sklearn.metrics import mean_squared_error, r2_score
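The two metrics described above can be worked out by hand on a tiny example. This is a sketch using the standard definitions (MSE as the mean of squared errors, R² as 1 minus the ratio of residual to total sum of squares); the function names `mse_by_hand` and `r2_by_hand` and the four data points are made up for illustration.

```python
def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_by_hand(y_true, y_pred):
    """1 - SS_res / SS_tot: the share of variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values: two predictions are exact, two are off by 1.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(mse_by_hand(actual, predicted), r2_by_hand(actual, predicted))  # 0.5 0.9
```

Here the squared errors are 1, 0, 1, 0, so the MSE is 2 / 4 = 0.5. The actual values have mean 6 and total sum of squares 20, giving R² = 1 - 2 / 20 = 0.9.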

# Define the features (X) and target (y)

X = dataset[['YearsExperience']]
y = dataset['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Arguments to train_test_split:
#   - X: the feature(s) of the dataset. In this case, it is YearsExperience.
#   - y: the target variable. In this case, it is Salary.
#   - test_size=0.2: 20% of the data is used as the test set.
#   - random_state=42: ensures reproducibility of the split. Using the same
#     random state always produces the same split.
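The effect of `test_size` and `random_state` can be checked directly on a toy input. This sketch uses ten placeholder integers standing in for dataset rows; the variable names are chosen only for this example.

```python
from sklearn.model_selection import train_test_split

samples = list(range(10))  # ten placeholder samples standing in for dataset rows

# Two independent calls with the same random_state
train_a, test_a = train_test_split(samples, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(samples, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))  # 8 2
print(test_a == test_b)           # True: same seed, same split
```

With ten samples, `test_size=0.2` puts two of them in the test set and the remaining eight in the training set, and every sample lands in exactly one of the two.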
# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Mean Squared Error:', mse)
print('R-squared:', r2)
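For intuition about what fitting a line involves, ordinary least squares with a single feature has a closed-form solution: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below implements that formula in plain Python. It is an illustration of the technique, not scikit-learn's internal code, and `fit_line` and the four points are made up for this example.

```python
def fit_line(xs, ys):
    """Closed-form least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up points lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

After `model.fit(...)`, the corresponding values learned by scikit-learn are available as `model.coef_` and `model.intercept_`.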

```python
# Plot the regression line
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red', linewidth=2)

# Adding title and labels
plt.title('Years of Experience vs Salary (with Regression Line)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot
plt.show()
```
