MEE 6070 Data Science and Analytics: Importing Data Using Plotting The Data Checking For Linearity

Here are the steps to web scrape COVID-19 data from a web resource:

1. Identify the website that hosts the COVID-19 data you want to scrape. Some common sources include:
   - Johns Hopkins COVID-19 data repository on GitHub
   - Our World in Data COVID-19 dataset on GitHub
   - Kaggle COVID-19 datasets
2. Inspect the website code to understand how the data is structured and find the relevant HTML tags containing the data.
3. Use Python libraries like BeautifulSoup and requests to connect to the website and parse the HTML. BeautifulSoup allows you to navigate the HTML tags.
4. Extract the text or attribute values from the tags containing the actual data values like cases, deaths

Uploaded by

Krupa
Copyright © All Rights Reserved
MEE 6070 Data Science and Analytics

This example demonstrates the use of the Python computing environment, run in Google Colab, for the following:

1. Importing data using pandas

2. Plotting the data and checking for linearity using matplotlib

Example 1:
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
my_dist = {'name': ["a", "b", "c", "d", "e", "f", "g"],
           'age': [20, 27, 35, 55, 18, 21, 35],
           'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]}

df = pd.DataFrame(my_dist)
print(df)

Example 2:
import pandas as pd

df = pd.DataFrame({'X': [78, 85, 96, 80, 86],
                   'Y': [84, 94, 89, 83, 86],
                   'Z': [86, 97, 96, 72, 83]})
print(df)

# Example Python program to get the contents of a DataFrame as a CSV string
import pandas as pds

# Standard deviations in the thickness of three wooden board variants, in mm
standardDeviations = {"Wood 1": [0.4, 0.5, 0.3],
                      "Wood 2": [0.1, 0.2, 0.3],
                      "Wood 3": [0.7, 0.6, 0.7]}

# Load data into a DataFrame instance
dataFrame = pds.DataFrame(data=standardDeviations,
                          index=("Variant1", "Variant2", "Variant3"))

# With no path argument, to_csv() returns the CSV text as a string
csv = dataFrame.to_csv()

print("Contents of the DataFrame as a CSV string:")
print(csv)

# Example Python program to write the contents of a DataFrame to a buffer
import pandas as pds
from io import StringIO

# Closing price of 3 different stocks over 5 trading days
closingPrices = {"Stock1": [34.17, 34.25, 34.2, 34.24, 34.3],
                 "Stock2": [10.01, 10.20, 10.1, 10.15, 10.2],
                 "Stock3": [41.6, 42.1, 41.89, 42.4, 42.7]}

# Load data from the Python dictionary into a DataFrame instance
dataFrame = pds.DataFrame(data=closingPrices)

# Create an in-memory text stream
textStream = StringIO()

# Write the DataFrame contents to the text stream's buffer as CSV
dataFrame.to_csv(textStream)

print("DataFrame as CSV (from the buffer):")

# Print the buffer contents
print(textStream.getvalue())

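
Since the first topic is importing data, the reverse direction may be worth a brief sketch: CSV text can be read back into a DataFrame with pandas' read_csv by wrapping it in a StringIO. The names closing_prices, csv_text and restored are illustrative and do not come from the original handout.

```python
import pandas as pd
from io import StringIO

# Same closing prices as in the buffer example above
closing_prices = {"Stock1": [34.17, 34.25, 34.2, 34.24, 34.3],
                  "Stock2": [10.01, 10.20, 10.1, 10.15, 10.2],
                  "Stock3": [41.6, 42.1, 41.89, 42.4, 42.7]}
original = pd.DataFrame(closing_prices)

# Export to CSV text; index=False leaves out the row labels
csv_text = original.to_csv(index=False)

# Import: wrap the text in a StringIO so read_csv treats it like a file
restored = pd.read_csv(StringIO(csv_text))

print(restored)
print(restored.dtypes)
```

The same read_csv call accepts a file path or a URL in place of the StringIO object, which is the usual way to import a CSV file stored on disk or in Google Colab.
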
In the following example, we will use multiple linear regression to predict
the stock index price (i.e., the dependent variable) of a fictitious economy
by using 2 independent/input variables:

- Interest Rate (X1)
- Unemployment Rate (X2)

Stock_Index_Price = (Intercept) + (Interest_Rate coef)*X1 + (Unemployment_Rate coef)*X2

Substituting the estimated intercept and coefficients:

Stock_Index_Price = (1798.4040) + (345.5401)*X1 + (-250.1466)*X2
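
The intercept and coefficients in an equation of this form can be estimated by ordinary least squares. The sketch below is one way to do it, using NumPy's lstsq on the Stock_Market data listed in the next section. It is not necessarily the method used to obtain the values quoted above, and the variable names (interest_rate, beta, new_x1 and so on) are illustrative.

```python
import numpy as np

# Data copied from the Stock_Market table in this handout (24 monthly rows)
interest_rate = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                          2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
                          1.75, 1.75, 1.75, 1.75])
unemployment_rate = np.array([5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6,
                              5.7, 5.9, 6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1,
                              5.9, 6.2, 6.2, 6.1])
stock_index_price = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195,
                              1159, 1167, 1130, 1075, 1047, 965, 943, 958, 971,
                              949, 884, 866, 876, 822, 704, 719], dtype=float)

# Design matrix: a column of ones for the intercept, then X1 and X2
X = np.column_stack([np.ones_like(interest_rate),
                     interest_rate,
                     unemployment_rate])

# Ordinary least squares: choose beta to minimise ||X @ beta - y||^2
beta, _, _, _ = np.linalg.lstsq(X, stock_index_price, rcond=None)
intercept, coef_interest, coef_unemployment = beta

print("Intercept:", intercept)
print("Interest_Rate coef:", coef_interest)
print("Unemployment_Rate coef:", coef_unemployment)

# Prediction for a hypothetical observation (values chosen for illustration)
new_x1, new_x2 = 2.75, 5.3
predicted = intercept + coef_interest * new_x1 + coef_unemployment * new_x2
print("Predicted Stock_Index_Price:", predicted)
```

If the data are entered as listed, the printed values should be close to the intercept and coefficients quoted in the equation above.
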

1. Importing and printing data in Python

import pandas as pd

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
              12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                      2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                          1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]
}

df = pd.DataFrame(Stock_Market,
                  columns=['Year', 'Month', 'Interest_Rate',
                           'Unemployment_Rate', 'Stock_Index_Price'])

print(df)

2. Checking for Linearity (plotting scatter diagrams)

import pandas as pd
import matplotlib.pyplot as plt

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
              12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                      2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                          1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]
}

df = pd.DataFrame(Stock_Market,
                  columns=['Year', 'Month', 'Interest_Rate',
                           'Unemployment_Rate', 'Stock_Index_Price'])

# Scatter plot of stock index price against interest rate
plt.scatter(df['Interest_Rate'], df['Stock_Index_Price'], color='red')
plt.title('Stock Index Price Vs Interest Rate', fontsize=14)
plt.xlabel('Interest Rate', fontsize=14)
plt.ylabel('Stock Index Price', fontsize=14)
plt.grid(True)
plt.show()

import pandas as pd
import matplotlib.pyplot as plt

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
              12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                      2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                          1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]
}

df = pd.DataFrame(Stock_Market,
                  columns=['Year', 'Month', 'Interest_Rate',
                           'Unemployment_Rate', 'Stock_Index_Price'])

# Scatter plot of stock index price against unemployment rate
plt.scatter(df['Unemployment_Rate'], df['Stock_Index_Price'], color='green')
plt.title('Stock Index Price Vs Unemployment Rate', fontsize=14)
plt.xlabel('Unemployment Rate', fontsize=14)
plt.ylabel('Stock Index Price', fontsize=14)
plt.grid(True)
plt.show()

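
The scatter plots give a visual check. As a complementary numeric check, the Pearson correlation coefficient measures the strength of a linear relationship: values near +1 or -1 suggest a strong linear trend, and values near 0 suggest little or none. A minimal sketch using pandas' DataFrame.corr on the same data follows; this step is an addition and not part of the original example.

```python
import pandas as pd

# Same three columns as in the Stock_Market data above
Stock_Market = {
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                      2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                          1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]
}
df = pd.DataFrame(Stock_Market)

# Pairwise Pearson correlations (the default method of DataFrame.corr)
corr = df.corr()

# Correlation of each input variable with the stock index price
print(corr['Stock_Index_Price'])
```

A positive value for Interest_Rate and a negative value for Unemployment_Rate would agree with the upward and downward trends seen in the two scatter plots.
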
https://www.youtube.com/watch?v=z1pBdycYCGY
Digital Assignment-1

MEE 6070 Data Science and Analytics

Import the following Table in Python using Pandas

Input-1     Input-2     Output1   Output1
2.781084    2.550537    0         Apple
1.465489    2.362125    0         Apple
3.396562    4.400294    0         Apple
1.38807     1.85022     0         Apple
3.064072    3.005306    0         Apple
7.627531    2.759262    1         orange
5.332441    2.088627    1         orange
6.922597    1.771064    1         orange
8.675419    -0.24207    1         orange
7.673756    3.508563    1         orange
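
One possible approach to this import is sketched below, under a few assumptions: the table rows are pasted as whitespace-separated text, and because the header repeats "Output1", explicit column names are supplied. The name "Label" for the fourth column is hypothetical and does not come from the assignment.

```python
import pandas as pd
from io import StringIO

# Table rows pasted as whitespace-separated text (header row left out)
table_text = """\
2.781084 2.550537 0 Apple
1.465489 2.362125 0 Apple
3.396562 4.400294 0 Apple
1.38807 1.85022 0 Apple
3.064072 3.005306 0 Apple
7.627531 2.759262 1 orange
5.332441 2.088627 1 orange
6.922597 1.771064 1 orange
8.675419 -0.24207 1 orange
7.673756 3.508563 1 orange
"""

# "Label" is an assumed name for the second output column
column_names = ["Input-1", "Input-2", "Output1", "Label"]

# sep=r"\s+" splits on any run of whitespace; header=None because names are given
df = pd.read_csv(StringIO(table_text), sep=r"\s+",
                 header=None, names=column_names)

print(df)
print(df.dtypes)
```

If the table is instead saved as a CSV or Excel file, the StringIO object would be replaced by the file path, with read_excel used for spreadsheets.
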


Web scraping of COVID-19 data from a web resource (e.g., GitHub, Kaggle)
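
The general pattern is to fetch a page, locate the table rows, and pull out the cell text. The sketch below parses a small inline HTML snippet with Python's standard-library html.parser so that it runs without network access. The snippet, its region names and its numbers are placeholders, not real COVID-19 figures, and the class name TableParser is illustrative. On a real page the HTML would come from an HTTP request, and a library such as BeautifulSoup could do the parsing instead.

```python
from html.parser import HTMLParser

# Placeholder HTML standing in for a downloaded page (NOT real data)
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Cases</th><th>Deaths</th></tr>
  <tr><td>Region A</td><td>100</td><td>5</td></tr>
  <tr><td>Region B</td><td>250</td><td>12</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed rows, each a list of cell strings
        self._row = None       # row currently being read, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

# First row is the header; remaining rows become one dictionary each
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The resulting list of dictionaries can be passed directly to pd.DataFrame for further analysis. For sources that publish CSV files, such as the GitHub repositories mentioned above, reading the file with pandas is usually simpler than parsing HTML.
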
