LAB01

Uploaded by nhut.an41004
Copyright © All Rights Reserved

| Index | Requirements | Project 1 (Stock Market Analysis) | Project 2 (Weather Data Analysis) | Project 3 (Healthcare Data Analysis) |
|---|---|---|---|---|
| 1 | Business Understanding and Analytic Approach | Analyze stock prices and predict future movements. | Analyze historical weather data to predict future weather conditions. | Analyze patient data to identify trends and patterns in health conditions. |
| 2 | Data Collection, Understanding, Preparation | Describe the relevant data collected and preprocess it. Describe the data before and after preprocessing. | Describe the relevant data collected and preprocess it. Describe the data before and after preprocessing. | Describe the relevant data collected and preprocess it. Describe the data before and after preprocessing. |
| 3 | Data Analysis with SQL | Import the data into a database management system (online or offline). Execute SQL queries to understand the stock market. | Import the data into a database management system (online or offline). Execute SQL queries to understand the weather and the factors affecting it. | Import the data into a database management system (online or offline). Execute SQL queries to identify trends and patterns in patients' health conditions. |
| 4 | Data Analysis with Python | Perform data analysis in Python using libraries dedicated to data analysis. Use online platforms (Jupyter Notebook, Google Colab) or offline platforms (PyCharm, Visual Studio) to analyze the data. | Perform data analysis in Python using libraries dedicated to data analysis. Use online platforms (Jupyter Notebook, Google Colab) or offline platforms (PyCharm, Visual Studio) to analyze the data. | Perform data analysis in Python using libraries dedicated to data analysis. Use online platforms (Jupyter Notebook, Google Colab) or offline platforms (PyCharm, Visual Studio) to analyze the data. |
| 5 | Data Visualization | Visualize data to understand the factors affecting stock prices. | Visualize data to understand historical weather data and the factors affecting weather. | Visualize data to identify trends and patterns in health conditions. |
| 6 | Regression Analysis | Use regression analysis to analyze the factors affecting stock prices and predict future data. | Use regression analysis to analyze the factors affecting weather and predict future data. | Use regression analysis to analyze the factors affecting patients and predict future data. |
| 7 | Data Analysis with Tool | Use analysis tools such as Google Looker or Power BI to visualize data and perform regression analysis. | Use analysis tools such as Google Looker or Power BI to visualize data and perform regression analysis. | Use analysis tools such as Google Looker or Power BI to visualize data and perform regression analysis. |
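Requirement 3 asks you to import the data into a database management system and query it with SQL, but the steps below give no example. The sketch that follows shows one possible approach, assuming SQLite (bundled with Python as `sqlite3`) stands in for the DBMS. The table name `stock`, the sample rows, and the query are illustrative only. For MySQL or PostgreSQL, `DataFrame.to_sql` would need a SQLAlchemy engine instead of a `sqlite3` connection.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for a cleaned Kaggle stock dataset.
stock_data = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Close": [100.0, 102.5, 101.0, 104.0],
    "Volume": [1200, 1500, 900, 2000],
})

# An in-memory SQLite database stands in for the DBMS; pass a file path
# such as "lab01.db" instead to keep the data between sessions.
conn = sqlite3.connect(":memory:")

# Import the DataFrame as a SQL table named "stock" (table name is an assumption).
stock_data.to_sql("stock", conn, index=False, if_exists="replace")

# Example query: trading days with above-average volume, busiest first.
query = """
SELECT Date, Close, Volume
FROM stock
WHERE Volume > (SELECT AVG(Volume) FROM stock)
ORDER BY Volume DESC;
"""
high_volume_days = pd.read_sql_query(query, conn)
print(high_volume_days)
conn.close()
```

The same pattern applies to the weather and healthcare projects: load the CSV with `pd.read_csv`, write it with `to_sql`, then run queries with `pd.read_sql_query`.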

Step 1: Collect Data from Kaggle


1. Create a Kaggle Account:
o Go to Kaggle and create an account or log in if you already have one.

2. Find a Dataset:
o Use the search bar on the Kaggle homepage to find relevant datasets for your project.
   - Stock Market Analysis: Search for terms like "stock market prices," "historical stock data," or "S&P 500 data."
   - Weather Data Analysis: Search for "weather data," "historical weather," or "climate data."
   - Healthcare Data Analysis: Search for "healthcare datasets," "patient data," or "hospital data."
3. Download the Dataset:
o Navigate to the dataset page, click the "Download" button, and choose the CSV
format if available.
4. Install the Kaggle API (optional, for Jupyter/Colab):
o If you want to download data directly into your Jupyter notebook or Google Colab:
   - Install the Kaggle API with this command:
!pip install kaggle
   - Authenticate with your Kaggle API token: download kaggle.json from your Kaggle account settings, upload it to the notebook's working directory, and move it into place:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
   - Download the dataset directly, using the owner/dataset identifier shown in the dataset's Kaggle URL:
!kaggle datasets download -d <owner>/<dataset-name>
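The download command typically saves the dataset as a .zip archive in the current directory (the CLI also accepts an `--unzip` flag). As a hedged alternative, the helper below extracts an archive with Python's standard `zipfile` module and lists the CSV files it contained. The function name `extract_csv_files` and the default `data` folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_csv_files(zip_path, dest_dir="data"):
    """Extract a downloaded Kaggle archive and return the sorted CSV paths found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search recursively because some datasets nest files in sub-folders.
    # Note: CSVs already present in dest_dir from earlier runs are listed too.
    return sorted(dest.rglob("*.csv"))
```

Usage: `csv_files = extract_csv_files('<dataset-name>.zip')`, then load one of the returned paths with `pd.read_csv(csv_files[0])`.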

Project 1: Stock Market Analysis


Step 2: Data Collection and Preparation
After downloading a stock market dataset from Kaggle (for example, Historical Stock Market Dataset), follow these steps:
1. Load the Data into Python:
import pandas as pd
# Load CSV file
stock_data = pd.read_csv('path_to_stock_data.csv')
2. Data Exploration:
o Inspect the first few rows of the dataset:

stock_data.head()
o Check for missing values:

stock_data.isnull().sum()
3. Data Cleaning:
o Remove or fill missing values:

stock_data = stock_data.dropna() # or fill missing values


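o Convert the Date column to datetime and sort chronologically (the moving average and time-based analysis assume oldest-to-newest order):

stock_data['Date'] = pd.to_datetime(stock_data['Date'])
stock_data = stock_data.sort_values('Date').reset_index(drop=True)
o Filter relevant columns:

stock_data = stock_data[['Date', 'Open', 'Close', 'High', 'Low', 'Volume']]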
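The cleaning step above uses `dropna()`, which discards every row that has a gap; its comment mentions filling instead. One common alternative for daily prices is to carry the last known value forward. This is a minimal sketch, assuming a single-ticker dataset with the `Date`, `Open`, `Close`, `High`, `Low`, and `Volume` columns used in this lab (if the file mixes several tickers, filter to one first). `clean_stock_data` is a hypothetical helper name.

```python
import pandas as pd


def clean_stock_data(df):
    """Parse dates, sort chronologically, and forward-fill gaps in price columns."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    # Forward filling only makes sense once rows are in time order.
    df = df.sort_values("Date").reset_index(drop=True)
    price_cols = ["Open", "Close", "High", "Low"]
    # Carry the last known price forward instead of dropping the row.
    df[price_cols] = df[price_cols].ffill()
    # Rows still missing values (e.g. gaps before the first price, or a missing
    # Volume) are dropped.
    return df.dropna(subset=price_cols + ["Volume"])
```

Usage: `stock_data = clean_stock_data(stock_data)` in place of the `dropna()` line. Forward filling assumes prices stay flat across a gap, which is reasonable for a day or two but can hide longer outages.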


Step 3: Data Analysis
1. Basic Statistics:
stock_data.describe()
2. Calculate Moving Average:
stock_data['Moving_Avg'] = stock_data['Close'].rolling(window=20).mean()
3. Visualization of Stock Prices:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(stock_data['Date'], stock_data['Close'], label='Close Price')
plt.plot(stock_data['Date'], stock_data['Moving_Avg'], label='20-Day Moving Avg')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Close Price and 20-Day Moving Average')
plt.legend()
plt.show()
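To see what `rolling(window=20).mean()` computes, here is the same operation on five made-up prices with a 3-row window. Each output value is the mean of the current row and the previous two, and rows without a full window are `NaN`. This is why the first 19 rows of `Moving_Avg` are empty in the stock plot. The `min_periods` argument relaxes that requirement.

```python
import pandas as pd

# Five hypothetical closing prices, oldest first.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-row simple moving average: mean of the current row and the two before it.
moving_avg = close.rolling(window=3).mean()
print(moving_avg.tolist())  # [nan, nan, 11.0, 12.0, 13.0]

# min_periods=1 averages whatever rows are available at the start.
partial = close.rolling(window=3, min_periods=1).mean()
print(partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0]
```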
Step 4: Regression Analysis
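1. Train a Linear Regression Model:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Features and target (column names depend on the dataset)
X = stock_data[['Open', 'High', 'Low', 'Volume']]
y = stock_data['Close']

# shuffle=False keeps the most recent 20% of rows as the test set,
# so the model is evaluated on data that comes after the training period
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)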
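The regression step stops at `predict` without measuring how good the predictions are. The sketch below adds two standard scikit-learn metrics, mean absolute error and R², and splits chronologically with `shuffle=False` so the test rows come after the training rows. The helper name `fit_and_evaluate` and the synthetic data are assumptions for illustration; with the lab's data you would call `fit_and_evaluate(X, y)` on the `X` and `y` defined above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X, y, test_size=0.2):
    """Fit a linear model on the earliest rows and score it on the latest rows."""
    # shuffle=False keeps time order, so the test set is strictly later data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)
    return model, {
        "MAE": mean_absolute_error(y_test, predictions),
        "R2": r2_score(y_test, predictions),
    }


# Hypothetical synthetic data: Close is a noisy mix of Open, High and Low.
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Open": rng.uniform(90, 110, n),
    "High": rng.uniform(100, 120, n),
    "Low": rng.uniform(80, 100, n),
    "Volume": rng.uniform(1e5, 1e6, n),
})
y_demo = (0.4 * X_demo["Open"] + 0.3 * X_demo["High"]
          + 0.3 * X_demo["Low"] + rng.normal(0, 0.5, n))

model, scores = fit_and_evaluate(X_demo, y_demo)
print(scores)
```

Because `High` and `Low` are only known once the trading day ends, a high score on real data mostly reflects how well same-day prices explain the close, not the model's ability to forecast future movements.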

Project 2: Weather Data Analysis


Step 2: Data Collection and Preparation
Download a weather dataset from Kaggle (for example, Historical Hourly Weather Data):
1. Load the Data into Python:
weather_data = pd.read_csv('path_to_weather_data.csv')
2. Data Exploration:
o Inspect the dataset:

weather_data.head()
o Check for missing values:

weather_data.isnull().sum()
3. Data Cleaning:
o Handle missing values (fill each numeric column with its mean):

weather_data = weather_data.fillna(weather_data.mean(numeric_only=True))
o Convert Date column to datetime:

weather_data['Date'] = pd.to_datetime(weather_data['Date'])
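Filling gaps with the column mean ignores the time order of hourly readings. For continuous quantities such as temperature, interpolating between neighbouring readings is a common alternative. This is a sketch, assuming a `Date` column and numeric weather columns; `interpolate_weather` is a hypothetical name. `method="time"` weights by the elapsed time between readings, so irregular gaps are handled correctly.

```python
import numpy as np
import pandas as pd


def interpolate_weather(df, date_col="Date"):
    """Fill numeric gaps by interpolating linearly in time between readings."""
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.sort_values(date_col).set_index(date_col)
    numeric_cols = df.select_dtypes(include="number").columns
    # method="time" requires a DatetimeIndex and weights by elapsed time.
    df[numeric_cols] = df[numeric_cols].interpolate(method="time")
    return df.reset_index()


# Hypothetical readings with a missing value at 01:00.
raw = pd.DataFrame({
    "Date": ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00"],
    "Temperature": [10.0, np.nan, 16.0],
})
print(interpolate_weather(raw))
```

With readings of 10 at 00:00 and 16 at 03:00, the missing 01:00 value becomes 12, whereas interpolating by row position would give 13. Gaps before the first valid reading stay `NaN`.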
Step 3: Data Analysis
1. Basic Statistics:
weather_data.describe()
2. Visualization of Weather Trends:
plt.plot(weather_data['Date'], weather_data['Temperature'], label='Temperature')
plt.xlabel('Date')
plt.legend()
plt.show()
3. Correlation Matrix:
weather_data.corr(numeric_only=True)
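The full correlation matrix is hard to read when there are many columns. To rank the factors that move together with temperature, one option is to take the `Temperature` column of the matrix and sort it by absolute value, so strong negative relationships rank alongside positive ones. This is a sketch; `correlations_with` is a hypothetical helper, and `Temperature` is assumed to be the column name in your dataset.

```python
import numpy as np
import pandas as pd


def correlations_with(df, target):
    """Return each numeric column's Pearson correlation with `target`, strongest first."""
    # numeric_only=True skips the Date column and any text columns.
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute value so strong negative correlations also rank high.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```

Usage: `correlations_with(weather_data, 'Temperature')`. These coefficients only measure linear association between pairs of columns.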
Step 4: Regression Analysis
1. Train a Regression Model:
X = weather_data[['Humidity', 'Pressure', 'Wind Speed']]
y = weather_data['Temperature']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)


predictions = model.predict(X_test)
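To analyze the factors affecting weather from the fitted model, its coefficients can be paired with the feature names. Each coefficient estimates how much `Temperature` changes per one-unit increase in that feature, holding the others fixed. The sketch uses synthetic data with known effects so the output can be checked; the numbers are invented, not taken from any Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical readings: Temperature = 30 - 0.1*Humidity + 0.05*Wind Speed + noise.
rng = np.random.default_rng(42)
n = 300
X_demo = pd.DataFrame({
    "Humidity": rng.uniform(20, 100, n),
    "Pressure": rng.uniform(990, 1030, n),
    "Wind Speed": rng.uniform(0, 15, n),
})
y_demo = 30 - 0.1 * X_demo["Humidity"] + 0.05 * X_demo["Wind Speed"] + rng.normal(0, 0.2, n)

model = LinearRegression().fit(X_demo, y_demo)

# Label each coefficient with its feature name.
coefficients = pd.Series(model.coef_, index=X_demo.columns)
print(coefficients.round(3))
print("Intercept:", round(model.intercept_, 2))
```

Because the features use different units, coefficient sizes are not directly comparable; standardizing the features first (for example with scikit-learn's `StandardScaler`) puts them on a common scale.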

Project 3: Healthcare Data Analysis


Step 2: Data Collection and Preparation
Download healthcare-related data from Kaggle (for example, Heart Disease UCI):
1. Load the Data into Python:
health_data = pd.read_csv('path_to_health_data.csv')
2. Data Exploration:
o Inspect the dataset:

health_data.head()
o Check for missing values:

health_data.isnull().sum()
3. Data Cleaning:
o Handle missing values (fill or drop them) and outliers (clip or remove extreme values).
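The lab leaves the handling of missing values and outliers open. One possible approach is sketched below: fill missing values with the column median, then clip values outside the interquartile-range fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) rather than deleting patients. Pass only continuous measurement columns in `cols`, because clipping a 0/1 target such as `Outcome` could corrupt it. The helper name and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_health_data(df, cols, k=1.5):
    """Median-fill missing values in `cols`, then clip outliers to the IQR fences."""
    df = df.copy()
    # The median is less affected by extreme values than the mean.
    df[cols] = df[cols].fillna(df[cols].median())
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds with the DataFrame's columns.
    # Clipping (winsorizing) keeps every patient row instead of dropping it.
    df[cols] = df[cols].clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)
    return df
```

Usage: `health_data = clean_health_data(health_data, ['Age', 'Cholesterol', 'Blood Pressure'])`. Whether an extreme value is an error or a real, clinically important reading is a judgment call, so it is worth inspecting clipped rows before relying on this.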

Step 3: Data Analysis


1. Basic Statistics:
health_data.describe()
2. Visualization of Patient Data:
health_data['Diagnosis'].value_counts().plot(kind='bar')
plt.show()
3. Correlation Matrix:
health_data.corr(numeric_only=True)
Step 4: Regression Analysis
1. Train a Regression Model:
X = health_data[['Age', 'Cholesterol', 'Blood Pressure']]
y = health_data['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)


predictions = model.predict(X_test)
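If `Outcome` is a binary label (for example, 1 = disease present, 0 = absent), `LinearRegression` can output values below 0 or above 1. Logistic regression, which is still a regression model but predicts a probability, is a common alternative for binary targets. The sketch below runs on synthetic patients; the data and feature effects are invented, and the column names follow the lab's placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic patients; risk rises with Age and Cholesterol.
rng = np.random.default_rng(7)
n = 400
X_demo = pd.DataFrame({
    "Age": rng.uniform(30, 80, n),
    "Cholesterol": rng.uniform(150, 300, n),
    "Blood Pressure": rng.uniform(90, 180, n),
})
risk = 0.08 * (X_demo["Age"] - 55) + 0.03 * (X_demo["Cholesterol"] - 225)
y_demo = (risk + rng.normal(0, 0.5, n) > 0).astype(int)

# stratify keeps the same share of 0s and 1s in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scaling first keeps the solver well-conditioned when features use different units.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# predict_proba gives P(Outcome = 1); predict applies a 0.5 threshold.
probabilities = clf.predict_proba(X_test)[:, 1]
predicted_labels = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predicted_labels))
```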

Step 5: Data Visualization with Google Looker or Power BI


1. Load CSV Data into Power BI:
o Open Power BI Desktop, click "Get Data," choose "Text/CSV," and select your CSV file.

o Create visualizations such as line charts for stock prices, weather trends, or patient
data.
2. Create Dashboards and Reports:
o Use the drag-and-drop interface to build interactive reports.

o For regression analysis, visualize predicted trends and compare them to actual
data.
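To compare predicted trends with actual data in Power BI or Looker, the predictions must first be saved to a file. The minimal sketch below writes actual and predicted values, plus their difference, to a CSV. `export_for_dashboard` and the file name are hypothetical.

```python
import numpy as np
import pandas as pd


def export_for_dashboard(dates, actual, predicted, path):
    """Write actual vs. predicted values to a CSV that a BI tool can import."""
    # np.asarray drops any pandas index, so inputs are combined by position.
    out = pd.DataFrame({
        "Date": pd.to_datetime(np.asarray(dates)),
        "Actual": np.asarray(actual, dtype=float),
        "Predicted": np.asarray(predicted, dtype=float),
    })
    out["Error"] = out["Actual"] - out["Predicted"]
    out.to_csv(path, index=False)
    return out
```

For the stock project, `export_for_dashboard(stock_data.loc[X_test.index, 'Date'], y_test, predictions, 'stock_predictions.csv')` would produce a file that can be loaded through "Get Data" and plotted as two lines on a shared date axis.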
