Earthquake Data (USGS API)
1. Refer to the site below for the API data schema:
[Link]
2. Follow the steps below to get data from the source.
a. First, get the historical data (last month's data):
[Link]
b. Then get data for each day (the URL below pulls data for the past day):
[Link]
3. Once you have this data from the source, perform the steps below.
a. First, get the data from the source.
b. Using Spark, flatten all of the columns from the source.
i. Flatten the column names; if there are nested columns, unnest them.
Example:
"test": "ha",
"Feature": [
{"Type": "abc",
"Name": "abc"},
{"Type": "pqr",
"Name": "pqr"}
]
After flattening the above JSON file, the target table should contain the
columns: test, feature_type, feature_name (see the PySpark sketch after this list).
c. Store the flattened data in the target location:
earthquakeanalysis/raw/<date in YYYYMMDD>/<target file>.parquet
d. The target data contains a detail URL for each record:
[Link]
Base URL: [Link]
Endpoint URL: earthquakes/feed/v1.0/detail/<id>.geojson
e. Pick up the URL in this "detail" field and pull the data from that URL using the
REST API (see the detail-pull sketch after this list).
f. Using PySpark, flatten all of the columns in the detail data and store it in the
location below:
earthquakeanalysis/raw/<date in YYYYMMDD>/<ids>_<target file>.parquet
g. The "ids" value above comes from the same URL or from the previously copied data.
4. Once this is done, questions for building the analysis layer will be shared with you.
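As an illustration of step 3b, here is a minimal PySpark sketch that flattens the nested "Feature" example above into test, feature_type and feature_name; the gs:// bucket, date folder and file name in the commented write are assumptions, not part of the spec.

import json

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten-example").getOrCreate()

# One record matching the JSON snippet in step 3b.
sample = json.dumps({"test": "ha",
                     "Feature": [{"Type": "abc", "Name": "abc"},
                                 {"Type": "pqr", "Name": "pqr"}]})
df = spark.read.json(spark.sparkContext.parallelize([sample]))

# explode() yields one row per element of the Feature array; the nested
# struct fields are then promoted to top-level columns.
flat_df = (df
           .withColumn("feature", F.explode("Feature"))
           .select(F.col("test"),
                   F.col("feature.Type").alias("feature_type"),
                   F.col("feature.Name").alias("feature_name")))

flat_df.show()
# Target location from step 3c (assumed bucket, date folder and file name):
# flat_df.write.mode("overwrite").parquet(
#     "gs://earthquakeanalysis/raw/20241019/earthquake.parquet")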
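For steps 3d-3g, the sketch below pulls each event's detail feed with the requests library and writes one Parquet file per id. The endpoint pattern comes from step 3d; the base URL, bucket path, date folder and the "id" column name are assumptions.

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("detail-pull").getOrCreate()

BASE_URL = "https://earthquake.usgs.gov/"                      # assumed base URL
DETAIL_ENDPOINT = "earthquakes/feed/v1.0/detail/{id}.geojson"  # from step 3d
TARGET = "gs://earthquakeanalysis/raw/20241019/{id}_detail.parquet"  # step 3f pattern, assumed bucket/date

# The event ids come from the previously flattened raw data (step 3g);
# "id" is an assumed column name for the flattened event identifier.
raw_df = spark.read.parquet("gs://earthquakeanalysis/raw/20241019/earthquake.parquet")
event_ids = [row["id"] for row in raw_df.select("id").distinct().collect()]

for event_id in event_ids:
    resp = requests.get(BASE_URL + DETAIL_ENDPOINT.format(id=event_id), timeout=30)
    resp.raise_for_status()
    # Parse the detail GeoJSON into a DataFrame; flatten further as in step 3f.
    detail_df = spark.read.json(spark.sparkContext.parallelize([resp.text]))
    detail_df.write.mode("overwrite").parquet(TARGET.format(id=event_id))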
Step 1: API Request
There are two scenarios:
1. Using PySpark (Dataproc or Databricks) with the Python requests library.
2. Using Cloud Dataflow with the Python requests library.
Landing Location: gs://earthquake_analysis/pyspark/landing/20241019/*.json
gs://earthquake_analysis/dataflow/landing/20241019/*.json
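A minimal sketch of the PySpark/Dataproc route for Step 1, using the requests library to pull the monthly feed and land the raw GeoJSON unchanged in the landing path; the concrete feed URL is an assumption standing in for the "[Link]" placeholders above, and the bucket/object names mirror the landing location.

import datetime

import requests
from google.cloud import storage

# Monthly (historical) feed; swap in the daily feed URL for the daily run.
# This concrete URL is an assumption standing in for the "[Link]" above.
FEED_URL = ("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/"
            "all_month.geojson")

resp = requests.get(FEED_URL, timeout=60)
resp.raise_for_status()

# Land the response as-is under the dated landing folder.
run_date = datetime.date.today().strftime("%Y%m%d")
bucket = storage.Client().bucket("earthquake_analysis")
blob = bucket.blob(f"pyspark/landing/{run_date}/earthquake_raw.json")
blob.upload_from_string(resp.text, content_type="application/json")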
Step 2: Flattening the data
1. Using PySpark
2. Using Cloud Dataflow
- While flattening, also perform the transformations below (see the sketch after the example):
- Convert the "time" and "updated" columns from epoch milliseconds to timestamps.
- Generate an "area" column based on the existing "place" column.
Silver Location: gs://earthquake_analysis/Silver/20241019/*.json
Flatten the historical and daily data based on the example below:
"mag": 0.89,
"place": "6 km NW of The Geysers, CA",
"time": 1729308248850,
"updated": 1729308343908,
"tz": null,
"url": "[Link]
"detail": "[Link]
"felt": null,
"cdi": null,
"mmi": null,
"alert": null,
"status": "automatic",
"tsunami": 0,
"sig": 12,
"net": "nc",
"code": "75076006",
"ids": ",nc75076006,",
"sources": ",nc,",
"types": ",nearby-cities,origin,phase-data,",
"nst": 9,
"dmin": 0.01303,
"rms": 0.02,
"gap": 77,
"magType": "md",
"type": "earthquake",
"title": "M 0.9 - 6 km NW of The Geysers, CA",
"geometry": {
"longtitude":-122.813163757324,
"latitude":38.8125,
"depth": 3.25999999046326
}
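The sketch below is one way to implement Step 2 in PySpark for a subset of the columns shown above: it explodes the GeoJSON features, converts the epoch-millisecond time/updated values, and derives "area" from "place". The landing/silver paths match Step 1 and Step 2; the rule "area = the text after ' of ' in place" is an assumed interpretation.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("earthquake-flatten").getOrCreate()

# The landed file is one large GeoJSON document, hence multiLine=True.
raw = spark.read.option("multiLine", True).json(
    "gs://earthquake_analysis/pyspark/landing/20241019/*.json")

flat = (raw
        .select(F.explode("features").alias("f"))
        .select(
            F.col("f.id").alias("id"),
            F.col("f.properties.mag").alias("mag"),
            F.col("f.properties.place").alias("place"),
            # time/updated are epoch milliseconds; convert to seconds, then cast.
            (F.col("f.properties.time") / 1000).cast("timestamp").alias("time"),
            (F.col("f.properties.updated") / 1000).cast("timestamp").alias("updated"),
            F.col("f.properties.url").alias("url"),
            F.col("f.properties.status").alias("status"),
            F.col("f.properties.magType").alias("magType"),
            F.col("f.properties.type").alias("type"),
            F.col("f.properties.title").alias("title"),
            F.col("f.geometry.coordinates")[0].alias("longitude"),
            F.col("f.geometry.coordinates")[1].alias("latitude"),
            F.col("f.geometry.coordinates")[2].alias("depth"))
        # Assumed rule: "area" is everything after " of " in place,
        # e.g. "6 km NW of The Geysers, CA" -> "The Geysers, CA".
        .withColumn("area",
                    F.when(F.col("place").contains(" of "),
                           F.element_at(F.split("place", " of "), -1))
                     .otherwise(F.col("place"))))

flat.write.mode("overwrite").json("gs://earthquake_analysis/Silver/20241019/")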
Step 3: Load data into BigQuery
- Add two extra columns
- 1. Insert date: insert_dt (Timestamp)
BQ Table: earthquake_db.earthquake_data
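A minimal sketch of the BigQuery load from the silver layer, adding the insert_dt audit column and writing through the spark-bigquery connector (available on Dataproc); the temporary GCS bucket name is an assumption.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("earthquake-bq-load").getOrCreate()

silver = spark.read.json("gs://earthquake_analysis/Silver/20241019/")

# Audit column recording when the row was loaded.
to_load = silver.withColumn("insert_dt", F.current_timestamp())

(to_load.write
    .format("bigquery")
    .option("table", "earthquake_db.earthquake_data")
    .option("temporaryGcsBucket", "earthquake_analysis_tmp")  # assumed staging bucket
    .mode("append")
    .save())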
Do the below analysis using PySpark and BigQuery:
1. Count the number of earthquakes by region.
2. Find the average magnitude by region.
3. Find how many earthquakes occur on the same day.
4. Find how many earthquakes occur on the same day and in the same region.
5. Find the average number of earthquakes per day.
6. Find the average number of earthquakes per day per region.
7. Find the region that had the highest-magnitude earthquake last week.
8. Find the regions that have earthquakes with magnitudes higher than 5.
9. Find the regions with the highest frequency and intensity of earthquakes.
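As a hedged starting point, the first two analyses in PySpark; the remaining ones follow the same groupBy/aggregate pattern, and equivalent SQL can be run directly against earthquake_db.earthquake_data in BigQuery. The column names ("area", "mag") follow the silver schema assumed in Step 2.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("earthquake-analysis").getOrCreate()
df = spark.read.json("gs://earthquake_analysis/Silver/20241019/")

# 1. Count the number of earthquakes by region.
quakes_by_region = df.groupBy("area").agg(F.count("*").alias("quake_count"))

# 2. Find the average magnitude by region.
avg_mag_by_region = df.groupBy("area").agg(F.avg("mag").alias("avg_mag"))

quakes_by_region.show()
avg_mag_by_region.show()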
Cloud Composer
Historical load - manual; this is a one-time activity.
Daily load -
- Ingestion -> transformation -> BQ load
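A minimal Cloud Composer (Airflow) DAG sketch for the daily load, chaining ingestion -> transformation -> BQ load. The task callables are placeholder stubs standing in for the jobs sketched above (in practice these might be Dataproc submit operators); the dag_id and schedule are assumptions.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_daily_feed(**context):
    """Placeholder: pull the daily USGS feed into the landing bucket."""


def flatten_and_transform(**context):
    """Placeholder: flatten the landed JSON and write the silver layer."""


def load_to_bigquery(**context):
    """Placeholder: append the silver data to earthquake_db.earthquake_data."""


with DAG(
    dag_id="earthquake_daily_load",
    start_date=datetime(2024, 10, 19),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingestion", python_callable=ingest_daily_feed)
    transform = PythonOperator(task_id="transformation",
                               python_callable=flatten_and_transform)
    bq_load = PythonOperator(task_id="bq_load", python_callable=load_to_bigquery)

    ingest >> transform >> bq_load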