LAB01

Copyright © All Rights Reserved

Project Requirements
(Project 1: Stock Market Analysis; Project 2: Weather Data Analysis; Project 3: Healthcare Data Analysis)

1. Business Understanding and Analytic Approach
o Project 1: Analyze stock prices and predict future movements.
o Project 2: Analyze historical weather data to predict future weather conditions.
o Project 3: Analyze patient data to identify trends and patterns in health conditions.

2. Data Collection, Understanding, and Preparation
o All projects: Describe the relevant data collected, then preprocess it. Describe the data both before and after preprocessing.

3. Data Analysis with SQL
o All projects: Import the data into a database management system (online or offline).
o Project 1: Execute SQL queries to understand the stock market.
o Project 2: Execute SQL queries to understand the weather and the factors affecting it.
o Project 3: Execute SQL queries to identify trends and patterns in patients' health conditions.

4. Data Analysis with Python
o All projects: Perform data analysis in Python using libraries dedicated to data analysis. Use online platforms (Jupyter Notebook, Google Colab) or offline platforms (PyCharm, Visual Studio) to analyze the data.

5. Data Visualization
o Project 1: Visualize data to understand the market factors affecting stock prices.
o Project 2: Visualize data to understand historical weather data and the factors affecting weather.
o Project 3: Visualize data to identify trends and patterns in health conditions.

6. Regression Analysis
o Project 1: Use regression analysis to analyze the factors affecting stock prices and predict future data.
o Project 2: Use regression analysis to analyze the factors affecting weather and predict future data.
o Project 3: Use regression analysis to analyze the factors affecting patients and predict future data.

7. Data Analysis with Tools
o All projects: Use analysis tools such as Google Looker and Power BI to visualize data and perform regression analysis.
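The SQL requirement (row 3) can be tried without installing a database server. The sketch below assumes SQLite, through Python's built-in sqlite3 module, as the offline DBMS; the table name stock_prices and the sample rows are hypothetical. It loads a DataFrame into the database with pandas DataFrame.to_sql and runs an aggregate query.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for a downloaded Kaggle CSV.
df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-02-01", "2024-02-02"],
    "Close": [100.0, 102.0, 110.0, 108.0],
})

# An in-memory SQLite database plays the role of the offline DBMS.
conn = sqlite3.connect(":memory:")
df.to_sql("stock_prices", conn, index=False)

# Average closing price per month; substr(Date, 1, 7) extracts 'YYYY-MM'.
query = """
SELECT substr(Date, 1, 7) AS month, AVG(Close) AS avg_close
FROM stock_prices
GROUP BY month
ORDER BY month
"""
monthly = pd.read_sql_query(query, conn)
print(monthly)
conn.close()
```

The same query pattern (load once, then GROUP BY a time period) carries over to the weather and patient tables.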

Step 1: Collect Data from Kaggle

1. Create a Kaggle Account:

o Go to Kaggle and create an account, or log in if you already have one.

2. Find a Dataset:

o Use the search bar on the Kaggle homepage to find datasets relevant to your project.

▪ Stock Market Analysis: Search for terms like "stock market prices," "historical stock data," or "S&P 500 data."

▪ Weather Data Analysis: Search for "weather data," "historical weather," or "climate data."

▪ Healthcare Data Analysis: Search for "healthcare datasets," "patient data," or "hospital data."

3. Download the Dataset:

o Navigate to the dataset page, click the "Download" button, and choose the CSV format if available.

4. Install the Kaggle API (Optional, for Jupyter/Colab):

o If you want to download data directly into your Jupyter notebook or Google Colab:

▪ Install the Kaggle API using this command:

!pip install kaggle

▪ Authenticate by uploading your Kaggle API token, kaggle.json (download it from your Kaggle account settings), and moving it into place:

!mkdir -p ~/.kaggle

!cp kaggle.json ~/.kaggle/

!chmod 600 ~/.kaggle/kaggle.json

▪ Download the dataset directly, using its identifier in the form <owner>/<dataset-name> as it appears in the dataset page URL:

!kaggle datasets download -d <owner>/<dataset-name>
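The CLI saves the dataset as a .zip archive in the current directory unless --unzip is added. A minimal sketch for loading a CSV straight from that archive, assuming the first CSV in alphabetical order is the one you want; load_first_csv is a hypothetical helper, and the demo builds a small archive in memory instead of using a real download:

```python
import io
import zipfile

import pandas as pd


def load_first_csv(zip_path):
    """Return (member_name, DataFrame) for the alphabetically first CSV in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        csv_names = sorted(n for n in zf.namelist() if n.lower().endswith(".csv"))
        if not csv_names:
            raise ValueError(f"no CSV file found in {zip_path}")
        with zf.open(csv_names[0]) as fh:
            return csv_names[0], pd.read_csv(fh)


# Demo: build a small archive in memory instead of a real Kaggle download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("prices.csv", "Date,Close\n2024-01-02,100\n2024-01-03,102\n")
buffer.seek(0)

name, data = load_first_csv(buffer)
print(name, data.shape)
```

When the archive contains exactly one CSV, passing the .zip path directly to pd.read_csv also works, because pandas infers zip compression from the file extension.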

Project 1: Stock Market Analysis

Step 2: Data Collection and Preparation

▪ After downloading a stock market dataset from Kaggle (for example, Historical Stock Market Dataset), follow these steps:

1. Load the Data into Python:

import pandas as pd

# Load the CSV file
stock_data = pd.read_csv('path_to_stock_data.csv')

2. Data Exploration:

o Inspect the first few rows of the dataset:

stock_data.head()

o Check for missing values:

stock_data.isnull().sum()

3. Data Cleaning:

o Remove or fill missing values:

stock_data = stock_data.dropna() # or fill missing values

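o Convert the Date column to datetime and sort chronologically (the moving average below assumes rows are in date order):

stock_data['Date'] = pd.to_datetime(stock_data['Date'])
stock_data = stock_data.sort_values('Date').reset_index(drop=True)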

o Filter relevant columns:

stock_data = stock_data[['Date', 'Open', 'Close', 'High', 'Low', 'Volume']]
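The requirements ask for a description of the data before and after preprocessing. One simple way is to record the same summary at both points; summarize is a hypothetical helper, and the raw rows (including the extra Name column) are made up for illustration:

```python
import numpy as np
import pandas as pd


def summarize(df, label):
    """Print and return a compact description of a DataFrame (hypothetical helper)."""
    summary = {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_cells": int(df.isnull().sum().sum()),
    }
    print(label, summary)
    return summary


# Hypothetical raw data with one missing Close value and an unused column.
raw = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "Open": [99.0, 101.0, 103.0],
    "Close": [100.0, np.nan, 104.0],
    "High": [101.0, 103.0, 105.0],
    "Low": [98.0, 100.0, 102.0],
    "Volume": [1000, 1200, 900],
    "Name": ["AAA", "AAA", "AAA"],
})
before = summarize(raw, "before:")

# The same cleaning steps as above: drop missing rows, parse dates, keep columns.
clean = raw.dropna().copy()
clean["Date"] = pd.to_datetime(clean["Date"])
clean = clean[["Date", "Open", "Close", "High", "Low", "Volume"]]
after = summarize(clean, "after:")
```

Calling describe() on both versions alongside this summary gives the per-column statistics for the report.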

Step 3: Data Analysis

1. Basic Statistics:

stock_data.describe()

2. Calculate Moving Average:

stock_data['Moving_Avg'] = stock_data['Close'].rolling(window=20).mean()

3. Visualization of Stock Prices:

import matplotlib.pyplot as plt

plt.plot(stock_data['Date'], stock_data['Close'], label='Close Price')

plt.plot(stock_data['Date'], stock_data['Moving_Avg'], label='20 Day Moving Avg')

plt.legend()

plt.show()
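A 20-day window leaves the first 19 rows of Moving_Avg empty (NaN), which is why the moving-average line starts later than the price line. A small sketch of the same rolling(window=...).mean() call on hypothetical values with a 3-day window:

```python
import pandas as pd

# Toy closing prices (hypothetical) to show how rolling(window=...).mean() behaves.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# A 3-day window: each value is the mean of the current and previous two rows.
# The first window - 1 rows are NaN because there is not yet a full window.
moving_avg = close.rolling(window=3).mean()
print(moving_avg.tolist())

# min_periods=1 averages whatever rows are available instead of returning NaN.
partial_avg = close.rolling(window=3, min_periods=1).mean()
print(partial_avg.tolist())
```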

Step 4: Regression Analysis


1. Train a Linear Regression Model:

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

X = stock_data[['Open', 'High', 'Low', 'Volume']]

y = stock_data['Close']

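# shuffle=False keeps the rows in date order, so the model is tested on the most
# recent 20% of the data instead of on days mixed in from the training period
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)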

model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)
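To judge the regression, compare the predictions with the held-out actual prices. A sketch on synthetic data (the column names match the stock dataset, but the values are generated, not real prices), using scikit-learn's mean_absolute_error and r2_score. It uses a chronological split (shuffle=False) so the test rows come after the training rows, which is closer to predicting future data than a random split:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for stock_data (hypothetical): Close depends linearly on Open/High/Low.
rng = np.random.default_rng(0)
n = 200
open_ = rng.uniform(90, 110, n)
high = open_ + rng.uniform(0, 5, n)
low = open_ - rng.uniform(0, 5, n)
volume = rng.integers(1_000, 5_000, n)
close = 0.2 * open_ + 0.4 * high + 0.4 * low + rng.normal(0, 0.5, n)
stock_data = pd.DataFrame(
    {"Open": open_, "High": high, "Low": low, "Volume": volume, "Close": close}
)

X = stock_data[["Open", "High", "Low", "Volume"]]
y = stock_data["Close"]
# shuffle=False keeps the last 20% of rows as the test set (chronological split).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Compare predictions with the held-out actual values.
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MAE: {mae:.3f}  R^2: {r2:.3f}")
```

MAE is in the same units as Close, which makes it easy to explain in the report; R² close to 1 means the features explain most of the variation in the test rows.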

Project 2: Weather Data Analysis

Step 2: Data Collection and Preparation

▪ Download a weather dataset from Kaggle (for example, Historical Hourly Weather Data):

1. Load the Data into Python:

weather_data = pd.read_csv('path_to_weather_data.csv')

2. Data Exploration:

o Inspect the dataset:

weather_data.head()

o Check for missing values:

weather_data.isnull().sum()

3. Data Cleaning:

o Handle missing values in the numeric columns:

weather_data = weather_data.fillna(weather_data.mean(numeric_only=True))

o Convert the Date column to datetime:

weather_data['Date'] = pd.to_datetime(weather_data['Date'])
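pandas 2.0 changed the default of numeric_only to False, so weather_data.mean() raises a TypeError while the data still has text columns such as an unconverted Date. A sketch on hypothetical weather rows that fills numeric gaps with column means and text gaps with the most frequent value (the mode):

```python
import numpy as np
import pandas as pd

# Hypothetical weather rows: text columns plus numeric columns with gaps.
weather_data = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "Summary": ["Clear", None, "Clear", "Rain"],
    "Temperature": [10.0, np.nan, 14.0, 12.0],
    "Humidity": [0.80, 0.70, np.nan, 0.60],
})

# numeric_only=True skips text columns, which would otherwise make .mean() fail.
column_means = weather_data.mean(numeric_only=True)
weather_data = weather_data.fillna(column_means)  # fills only columns present in column_means

# Non-numeric gaps need their own rule, here the most frequent value (mode).
weather_data["Summary"] = weather_data["Summary"].fillna(weather_data["Summary"].mode()[0])
weather_data["Date"] = pd.to_datetime(weather_data["Date"])
print(weather_data)
```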

Step 3: Data Analysis

1. Basic Statistics:

weather_data.describe()

2. Visualization of Weather Trends:

plt.plot(weather_data['Date'], weather_data['Temperature'], label='Temperature')

plt.legend()

plt.show()

3. Correlation Matrix:

weather_data.corr(numeric_only=True)
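To see which variables move with temperature, take the Temperature column of the correlation matrix and sort it by absolute value. A sketch on hypothetical readings; numeric_only=True excludes the Date column, which cannot be correlated:

```python
import pandas as pd

# Hypothetical weather readings; Date is included to show numeric_only in action.
weather_data = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "Temperature": [10.0, 12.0, 14.0, 16.0, 18.0],
    "Humidity": [0.90, 0.80, 0.70, 0.60, 0.50],
    "Pressure": [1012.0, 1015.0, 1011.0, 1014.0, 1013.0],
})

# numeric_only=True drops the datetime column instead of raising an error.
corr = weather_data.corr(numeric_only=True)

# Correlation of every other variable with Temperature, strongest first.
temp_corr = corr["Temperature"].drop("Temperature").sort_values(key=abs, ascending=False)
print(temp_corr)
```

A value near -1 or 1 marks a strong linear relationship; correlation alone does not show which variable causes the other.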

Step 4: Regression Analysis

1. Train a Regression Model:

X = weather_data[['Humidity', 'Pressure', 'Wind Speed']]

y = weather_data['Temperature']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)
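The fitted coefficients show how each feature relates to temperature. A sketch on synthetic data generated from known coefficients, so you can check that LinearRegression recovers them; for brevity it fits on all rows rather than on X_train:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic weather data (hypothetical): Temperature built from known coefficients.
rng = np.random.default_rng(1)
n = 500
weather_data = pd.DataFrame({
    "Humidity": rng.uniform(0.2, 1.0, n),
    "Pressure": rng.uniform(990, 1030, n),
    "Wind Speed": rng.uniform(0, 20, n),
})
weather_data["Temperature"] = (
    30
    - 15 * weather_data["Humidity"]
    + 0.05 * (weather_data["Pressure"] - 1010)
    - 0.3 * weather_data["Wind Speed"]
    + rng.normal(0, 0.5, n)
)

X = weather_data[["Humidity", "Pressure", "Wind Speed"]]
y = weather_data["Temperature"]
model = LinearRegression().fit(X, y)

# Each coefficient is the expected change in Temperature per unit change in that
# feature, holding the other features fixed.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients.round(3))
print("intercept:", round(model.intercept_, 3))
```

Coefficients are in the units of each feature (a change of 1.0 in Humidity is large, one unit of Pressure is small), so compare their magnitudes only after standardizing the features.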

Project 3: Healthcare Data Analysis

Step 2: Data Collection and Preparation

▪ Download healthcare-related data from Kaggle (for example, Heart Disease UCI):

1. Load the Data into Python:

health_data = pd.read_csv('path_to_health_data.csv')

2. Data Exploration:

o Inspect the dataset:

health_data.head()

o Check for missing values:

health_data.isnull().sum()

3. Data Cleaning:

o Handle missing values and outliers.
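One possible way to handle missing values and outliers in numeric columns: fill gaps with the column median and clip values outside the 1.5 × IQR fences (the interquartile-range rule). clip_iqr_outliers is a hypothetical helper, and the patient values are made up:

```python
import numpy as np
import pandas as pd


def clip_iqr_outliers(df, columns, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] for the given columns (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical patient rows with a missing Age and an implausible Cholesterol reading.
health_data = pd.DataFrame({
    "Age": [45, 52, np.nan, 61, 58, 49],
    "Cholesterol": [210.0, 230.0, 220.0, 240.0, 900.0, 225.0],
})

# Fill missing numeric values with the column median (robust to outliers).
health_data = health_data.fillna(health_data.median(numeric_only=True))
health_data = clip_iqr_outliers(health_data, ["Cholesterol"])
print(health_data)
```

Clipping keeps every patient in the dataset; dropping the outlier rows instead is also common but shrinks the sample.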

Step 3: Data Analysis

1. Basic Statistics:

health_data.describe()

2. Visualization of Patient Data:

health_data['Diagnosis'].value_counts().plot(kind='bar')

plt.show()

3. Correlation Matrix:

health_data.corr(numeric_only=True)
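Comparing average measurements between diagnosis groups is a direct way to look for patterns in health conditions. A sketch on hypothetical records; the column names follow the ones used in this project, but check them against your dataset:

```python
import pandas as pd

# Hypothetical patient records using the Diagnosis, Age and Cholesterol columns.
health_data = pd.DataFrame({
    "Diagnosis": ["Healthy", "Heart Disease", "Healthy", "Heart Disease", "Healthy"],
    "Age": [40, 60, 50, 70, 45],
    "Cholesterol": [200, 260, 210, 280, 190],
})

# Average of each numeric measurement within each diagnosis group.
group_means = health_data.groupby("Diagnosis")[["Age", "Cholesterol"]].mean()
print(group_means)
```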

Step 4: Regression Analysis

1. Train a Regression Model:

X = health_data[['Age', 'Cholesterol', 'Blood Pressure']]

y = health_data['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)
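If Outcome is binary (0/1), as in many heart-disease datasets, LinearRegression can return values outside 0 and 1. Logistic regression, a different technique swapped in here, predicts a class and a probability instead. A sketch on synthetic patients (all values generated); scaling the features is an added assumption to help the solver converge:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic patients (hypothetical): risk of Outcome = 1 rises with Age and Cholesterol.
rng = np.random.default_rng(2)
n = 400
health_data = pd.DataFrame({
    "Age": rng.uniform(30, 80, n),
    "Cholesterol": rng.uniform(150, 300, n),
    "Blood Pressure": rng.uniform(90, 180, n),
})
logit = 0.1 * (health_data["Age"] - 55) + 0.04 * (health_data["Cholesterol"] - 225)
health_data["Outcome"] = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

X = health_data[["Age", "Cholesterol", "Blood Pressure"]]
y = health_data["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features, then fit logistic regression, as one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

predictions = model.predict(X_test)                 # class labels 0/1
probabilities = model.predict_proba(X_test)[:, 1]   # estimated P(Outcome = 1)
print("accuracy:", accuracy_score(y_test, predictions))
```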

Step 5: Data Visualization with Google Looker or Power BI

1. Load CSV Data into Power BI:

o Open Power BI, click on "Get Data," and select your CSV file.

o Create visualizations such as line charts for stock prices, weather trends, or patient data.

2. Create Dashboards and Reports:

o Use the drag-and-drop interface to build interactive reports.

o For regression analysis, visualize predicted trends and compare them to actual data.
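For the Power BI comparison of predicted and actual trends, one option is to export both series to a CSV file from Python and load it with "Get Data." A sketch with hypothetical values; in the projects, Actual and Predicted would come from y_test and predictions, and the file name regression_results.csv is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical test-set values; in the projects these come from y_test and model.predict(X_test).
dates = pd.date_range("2024-03-01", periods=4, freq="D")
actual = np.array([101.0, 102.5, 101.8, 103.2])
predicted = np.array([100.6, 102.9, 102.1, 102.7])

# One row per observation, with both series side by side for a line chart.
results = pd.DataFrame({
    "Date": dates,
    "Actual": actual,
    "Predicted": predicted,
})
results["Residual"] = results["Actual"] - results["Predicted"]

# Write without the index so Power BI sees only the four named columns.
results.to_csv("regression_results.csv", index=False)
print(results)
```

In Power BI, plotting Actual and Predicted on one line chart against Date shows where the model drifts from the real values.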
