IBDT Project 2


FINAL PROJECT: CLOUD-BASED PROTOTYPE

Introduction to Big Data


Modern Technologies course

22.04.2024
Pieces of the project

0  Concept of your final projects
   Datasets for the projects

1  Introduction to Yandex.Cloud platform
   First steps with data for the projects

2  Work with simple frameworks to deal with data
   Dashboards in the cloud (DataLens)
   (TODAY'S SEMINAR)

3  Final project presentation
   Final QA session
   DataSphere demo (free time)

Prerequisites for the start

Class 6
   Your object storage (bucket) with data
   Enter ID of GSOM Federation: bpfq2tge5rupggql4gmq
   Your GSOM account to enter
   Service account to work with services
   Your project selected
   Data for the project is processed
   DataLens connection to your data is tested
Class 7

Ask me if it is not working …and… check the permissions!

Pipeline S3-YQL: data visualization

[Diagram: Options market data, YQL]

TODAY'S SEMINAR

Steps:
1. Write YQL (SQL) queries for the EDA (Exploratory Data Analysis)
2. Create a connection with Yandex.DataLens for further visualizations
3. Create dataset(s) within Yandex.DataLens for the charts and diagrams
4. Create a dashboard for the final project's presentation
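
The kind of EDA queries meant in step 1 can be sketched locally before running them in YQL. The example below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for YQL, and the table name `options`, its columns, and the sample rows are assumptions made for this sketch, not the course dataset.

```python
import sqlite3

# Hypothetical sample rows standing in for options market data;
# the column names are assumptions, not the course schema.
ROWS = [
    ("2024-04-01", "CALL", 100.0, 12),
    ("2024-04-01", "PUT", 95.0, 7),
    ("2024-04-02", "CALL", 105.0, 20),
    ("2024-04-02", "CALL", 110.0, 3),
]


def run_eda(rows):
    """Load rows into an in-memory table and run a few basic EDA queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE options ("
        "trade_date TEXT, option_type TEXT, strike REAL, volume INTEGER)"
    )
    conn.executemany("INSERT INTO options VALUES (?, ?, ?, ?)", rows)

    summary = {}
    # Size of the data: number of records.
    summary["row_count"] = conn.execute(
        "SELECT COUNT(*) FROM options"
    ).fetchone()[0]
    # Unique values of a categorical column.
    summary["option_types"] = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT option_type FROM options ORDER BY option_type"
        )
    ]
    # Time span covered by the data.
    summary["date_range"] = conn.execute(
        "SELECT MIN(trade_date), MAX(trade_date) FROM options"
    ).fetchone()
    # A simple aggregate per category.
    summary["volume_by_type"] = dict(
        conn.execute(
            "SELECT option_type, SUM(volume) FROM options GROUP BY option_type"
        )
    )
    conn.close()
    return summary


if __name__ == "__main__":
    print(run_eda(ROWS))
```

The same COUNT, DISTINCT, MIN/MAX and GROUP BY queries are the usual starting point for the EDA; the exact YQL syntax for reading from the bucket is not shown here and should be taken from the course materials.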

Pipeline S3-ClickHouse-DataLens: data visualization

[Diagram: JupyterHub logs]

TODAY'S SEMINAR

Steps:
1. Verify the data in ClickHouse or PostgreSQL
2. Create a connection with Yandex.DataLens for further visualizations
3. Verify the structure of the data (data model) for the datasets
4. (Optional) Create new tables in the database to speed up visualizations
5. Create dataset(s) within Yandex.DataLens for the charts and diagrams
6. Create a dashboard for the final project's presentation
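
Steps 1 and 3 (verifying the data and its structure) can be illustrated with a small check that runs on rows fetched from the database. This is a minimal sketch under assumptions: the column names in `EXPECTED_SCHEMA`, the timestamp format, and the function `check_rows` are hypothetical and only show the idea of validating a data model before building datasets on it.

```python
from datetime import datetime

# Hypothetical expected data model for a log table;
# column names and types are assumptions for this sketch.
EXPECTED_SCHEMA = {"event_time": str, "user_name": str, "event": str}


def check_rows(rows, schema=EXPECTED_SCHEMA, time_format="%Y-%m-%d %H:%M:%S"):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        # Every expected column must be present.
        missing = set(schema) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        # Every value must have the expected Python type.
        for col, expected_type in schema.items():
            if not isinstance(row[col], expected_type):
                problems.append(f"row {i}: {col} is not {expected_type.__name__}")
        # The timestamp column must parse with the expected format.
        try:
            datetime.strptime(row["event_time"], time_format)
        except (TypeError, ValueError):
            problems.append(f"row {i}: event_time is not a valid timestamp")
    return problems
```

An empty result means the sample matches the assumed data model; any reported problem is worth fixing in the database before creating the DataLens dataset.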

FINAL PROJECT:
PRESENTATION
How it will be organized

You will:
(1) …have 10 minutes for your project's presentation
(2) …show the architecture of your solution and explain why you think this approach works for your data
(3) …demonstrate your data processing pipeline
(4) …show your dashboard(s) with EDA and insights

I will:
(1) …listen to your presentation carefully
(2) …ask you a few questions (a short QA session of about 5 minutes)
(3) …provide you with some comments (if necessary)
(4) …grade your project (50 points maximum)

My expectations / 1

Architecture of your solution

1  Tools / Frameworks
   Cloud vs Local
   Pros and Cons for the tools
   Alternative tools

2  Data model
   Data schema
   Normalization
   Data types
   Data formats

3  Motivation to use selected frameworks for your data
   Data structure and volume
   Users' requirements
   Skills required

4  Data pipeline
   Data preprocessing
   Data transformation steps
   Connection to dashboards

My expectations / 2

EDA (exploratory data analysis) and insights

1  EDA
   Size of the data, number of records
   Structure
   Data types
   Unique values for categories

2  Insights and analytics: we do not build a model, but we can find dependencies
   High-load time periods
   Trends
   Structure changes

3  Final dashboard
   2-3 pages / sheets recommended
   Indicators are good practice
   Bar plots
   Pie charts
   Histograms
   Tables (if needed)
   ~8-10 diagrams / indicators required

Criteria

What will be graded

1  Your data is in cloud storage / database
   Data is complete
   Correct format
   Access control

2  Data processing pipeline
   It works
   You can walk me through the pipeline
   Credentials are secured (would be a plus)
   Serverless (would be a plus)

3  Dashboard
   Connections are working
   Datasets are based on connections
   Indicators and charts are enough for the data (8-10 diagrams)
   Filters etc.

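
One common way to meet the "credentials are secured" criterion is to keep keys out of notebooks and scripts and read them from the environment at run time. The sketch below only illustrates that idea: the variable names `STORAGE_ACCESS_KEY_ID` and `STORAGE_SECRET_ACCESS_KEY` and both helper functions are assumptions for this example, not names required by the course or by the cloud platform.

```python
import os


def load_credentials(env=os.environ):
    """Read storage credentials from environment variables instead of hard-coding them."""
    # Variable names are assumptions chosen for this sketch.
    required = ("STORAGE_ACCESS_KEY_ID", "STORAGE_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}


def mask(secret, visible=4):
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With this approach the code can be shown during the presentation without exposing any secret, and `mask` keeps keys out of logs and screenshots.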
FINAL PROJECT:
LAST OF LABS
Plan for lab today

JUPYTER LOGS PROJECT
   One more time about serverless
      Drop tables by trigger
      Upload data by trigger
   Create connection and dataset
   Create test chart
   "Pulse" of the Jupyter (create new field, convert to timestamp with DATETIME_PARSE, aggregate by time period)
   Improve performance with table for "pulse" chart (count events, count distinct names)
   Indicators
   Pie charts
   Create a dashboard
      Sheets
      Filters

OPTIONS PROJECT
   Create connections and datasets
      Various datasets for different charts and indicators
   Indicators
   Pie charts
   Create a dashboard
      Sheets
      Filters
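
The "pulse" steps above (convert text to a timestamp, aggregate by time period, count events and distinct names) can be sketched outside DataLens to check the expected numbers. The example is a rough Python stand-in: `datetime.strptime` plays the role that DATETIME_PARSE plays in DataLens, and the function name `pulse`, the hourly bucket, and the input format are assumptions for this sketch.

```python
from collections import defaultdict
from datetime import datetime


def pulse(events, time_format="%Y-%m-%d %H:%M:%S"):
    """Aggregate (timestamp_string, user_name) pairs into hourly buckets.

    Returns a list of (hour, event_count, distinct_name_count) tuples
    sorted by hour.
    """
    buckets = defaultdict(lambda: {"events": 0, "names": set()})
    for raw_time, name in events:
        # Parse the text into a timestamp, then truncate it to the hour.
        hour = datetime.strptime(raw_time, time_format).replace(minute=0, second=0)
        buckets[hour]["events"] += 1
        buckets[hour]["names"].add(name)
    return [
        (hour, buckets[hour]["events"], len(buckets[hour]["names"]))
        for hour in sorted(buckets)
    ]
```

Storing the result of such an aggregation as a separate table is one way to read the "improve performance" item: the chart then queries a few rows per period instead of every raw log record.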

Quest for homework

1  Create a data visualization pipeline
   YQL or databases connect to DataLens

2  Create dashboards for the final project
   Work with DataLens: EDA and insights

3  Prepare for the final presentation


May the Force be with You!
