Webinar _Learn More About Data Scientist, Data Analyst & Data Engineer

Uploaded by

Lê Tâm
Copyright
© All Rights Reserved

Meet our speakers!

01 Roles in Business Intelligence

02 Learn more about Data Analyst + QA

03 Learn more about Data Engineer + QA

04 Learn more about Data Scientist + QA

05 Minigame
ROLES IN
01 BUSINESS
INTELLIGENCE
BENEFITS OF
BUSINESS
INTELLIGENCE
ROLES IN BI
Learn more about

02 DATA
ANALYST
01 What is Data Analytics?

02 How to become a Data Analyst?

03 Skill set of Data Analyst

04 Career Path

05 Challenges & Lessons Learned

06 Q&A
WHAT IS
1 DATA ANALYTICS?
Data analytics is the science of analyzing raw data to draw conclusions about
that information and then help a business optimize its performance.

RAW DATA → SCIENCE OF ANALYZING → OPTIMIZE PERFORMANCE
(Vertical axis: BUSINESS VALUE. Horizontal axis: DEGREE OF INTELLIGENCE.)

BUSINESS ANALYTICS
8 OPTIMISATION: What's the best that can happen?
7 PREDICTIVE MODELLING: What will happen next?
6 FORECASTING: What if this trend continues?
5 STATISTICAL ANALYSIS: Why is this happening?

BUSINESS INTELLIGENCE
4 ALERTS: What actions are needed?
3 QUERY DRILLDOWN (OLAP): Where exactly is the problem?
2 AD-HOC REPORTS: How many? How often? Where?
1 STANDARD REPORT: What happened?
SOME PROBLEMS TO
SOLVE IN CỐC CỐC
1. How to organize reports/dashboards so other teams can use them easily?
2. How to define meaningful metrics?
3. Why are users more active in the browser during a specific period?
4. Which features do users use most frequently? How to improve them?
5. How does a new feature impact user behavior?
6. Why do users churn?
7. Which groups of users retain best?
8. How to trigger the aha moment to increase retention?
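
As an illustration of question 7, retention per user group could be computed roughly as follows. This is a minimal sketch, not the method used at Cốc Cốc: the event format `(user_id, group, day)`, the function name `retention_by_group`, and the "active again exactly N days after first seen" definition of retention are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def retention_by_group(events, day_n=7):
    """Share of users per group who are active again exactly day_n days after first seen.

    `events` is an iterable of (user_id, group, day) tuples (a hypothetical format).
    """
    first_seen = {}
    active_days = defaultdict(set)
    group_of = {}
    for user_id, group, day in events:
        group_of[user_id] = group
        active_days[user_id].add(day)
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    totals = defaultdict(int)
    retained = defaultdict(int)
    for user_id, start in first_seen.items():
        group = group_of[user_id]
        totals[group] += 1
        # A user counts as retained if any activity falls exactly day_n days after the first one.
        if any((d - start).days == day_n for d in active_days[user_id]):
            retained[group] += 1
    return {g: retained[g] / totals[g] for g in totals}


# Hypothetical sample data, for illustration only.
events = [
    ("u1", "mobile", date(2021, 1, 1)),
    ("u1", "mobile", date(2021, 1, 8)),
    ("u2", "mobile", date(2021, 1, 1)),
    ("u3", "desktop", date(2021, 1, 2)),
    ("u3", "desktop", date(2021, 1, 9)),
]
rates = retention_by_group(events)
```

Comparing the resulting rates across groups is one way to start answering which users retain best; a real analysis would also need to fix the retention definition with the product team.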

“If you want to know widely, become a data analyst.

If you want to go deep, become a data scientist…”
HOW TO BECOME
2 DATA ANALYST?

● Learn what the market needs: check job descriptions (Demo)

● Learn from other data analysts: research their CVs/LinkedIn profiles


Some confusing terms:

Job                 Tasks
Data Analyst        Data Analytics
Business Analyst    Business Analytics
SKILL SET OF
3 DATA ANALYST

RAW DATA → DATA ANALYST → BUSINESS

Technical skills:
● Statistics
● BI Tools: Power BI, Excel, Tableau
● Programming: R, Python

Soft skills:
● Critical thinking and problem solving
● Writing and communication
● Domain knowledge
DATA ANALYST ROADMAP
4 CAREER PATH
... It depends on how well you know yourself.
Data Analytics in Cốc Cốc
5 CHALLENGES & LESSONS LEARNED

DATA ANALYST IN REAL WORLD

● What are the key metrics we should measure? How do we calculate them?
● Where is the data?
● How do we communicate insights to other teams?

Ad-hoc Analysis
Build Dashboard
Data Analytics in Cốc Cốc
5 CHALLENGES & LESSONS LEARNED
Challenges:
● Huge amount of data
● Various data types: transactional, log, NLP,...
● Various domains: business cases, departments
● High demand for data insights to make decisions
● Close collaboration with data engineers and data scientists

Lessons learned:
● Expand our technical skill sets: not just Excel anymore
● Improve domain knowledge
● Increase working efficiency
● Improve communication/presentation skills
Q&A
Do you need to major in information technology to work as a DA?

What else should an economics student learn to become a DA?

How can someone with a finance and accounting background find an internship?
Which software should they learn, and where should a career changer look for
learning resources?
QUESTION FOR
WEBINAR ATTENDEES
The 3 players with the fastest correct answers
will each receive a 50K voucher
How many questions can be
solved by a Data Analyst?
8 questions
Learn more about

03 DATA
ENGINEER
01 Who are Data Engineers?

02 A day of a Data Engineer

03 Job Market & Career Path

04 Skill set & How to start with DE?

05 Challenges & Lessons Learned

06 Q&A
WHO ARE THEY?
1 WHAT THEY DO?
They build and maintain their
organization’s data ecosystem.

● Build data pipelines for bringing together information from different
data sources
● Distribute data throughout the
organization
● Prepare data for analytical, machine
learning and operational uses
● Understand business objectives
A DAY OF
2 DATA ENGINEER
● Daily meetings
● Data loading: SQL, ETL tools, and cron jobs that take data from data sources
● Data manipulation: coding tasks in Java, Python, or C++ to extract and aggregate
raw data into usable forms (docs, tables,...)
● Data pipeline: create tables and indexes, and store the results in the data
warehouse
● Optimizing: monitor the jobs and results, optimize performance,...
● Data servicing: create web APIs that serve our data to teams and clients
DEMO - DATA PIPELINE

Data sources:

● Third-party APIs: Safeforce, AccessTrade, CityAds...
● Cloud Storage: Coc Coc Apps on Google Play, App Store,...
● Browser metrics: Coc Coc browser usage, crash reports,...
● Service databases: Advertisement, Coc Coc Account,...

ETLs: in-house ETL tools (Java, C/C++, Go, Python)

Database: ClickHouse, NoSQL (MongoDB, ElasticSearch,...), SQL (MySQL)

Visualization: Grafana, in-house tools
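
The stages listed above (data sources, ETL, database) can be sketched end to end in a few lines. This is only an illustrative sketch: the in-memory SQLite database stands in for a real warehouse such as ClickHouse, and the event fields, the `daily_events` table, and the function names are assumptions made for the example, not the in-house tooling.

```python
import sqlite3


def extract():
    """Extract: return raw events (hypothetical data standing in for APIs, logs, or service databases)."""
    return [
        {"user_id": "u1", "event": "open_tab", "day": "2021-01-01"},
        {"user_id": "u1", "event": "open_tab", "day": "2021-01-01"},
        {"user_id": "u2", "event": "crash", "day": "2021-01-01"},
        {"user_id": None, "event": "open_tab", "day": "2021-01-01"},  # malformed row
    ]


def transform(rows):
    """Transform: drop malformed rows and aggregate raw events into daily counts."""
    counts = {}
    for row in rows:
        if row["user_id"] is None:
            continue
        key = (row["day"], row["event"])
        counts[key] = counts.get(key, 0) + 1
    return [(day, event, n) for (day, event), n in sorted(counts.items())]


def load(conn, rows):
    """Load: store the aggregated rows in a table that dashboards can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS daily_events (day TEXT, event TEXT, n INTEGER)")
    conn.executemany("INSERT INTO daily_events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real data warehouse
load(conn, transform(extract()))
result = conn.execute("SELECT day, event, n FROM daily_events ORDER BY event").fetchall()
```

In production, each stage would typically run as a scheduled job (for example via cron) and be monitored separately, as described in the daily tasks of a data engineer.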


DEMO - NEW FEED
JOB MARKET &
3 CAREER PATH
Data Engineer Job Growth

Data Engineer Shortage


VIETNAM
JOB MARKET

“Data Scientist and Big Data Engineer are expected to be at
the top of the list of most in-demand positions in the
next 5 years with their rapidly growing rate.”

Adecco Vietnam Salary Guide 2020


CAREER PATH
Start off as software engineers or BI analysts

(*) Cốc Cốc Grading (figure; labels: Autonomy, Train team members, Own disciplines, Other disciplines, Tech leader, Process, Approach, Business sense, Scale)


SKILL SET &
4 HOW TO START WITH DE
Programming languages: Java, Python, SQL,...

Good understanding of data structures and algorithms (DS&A)

*nix environment

Understanding of ETL tools

Database systems (SQL, NoSQL)

Web services
ROADMAP

Source: simplilearn.com
DATA ENGINEERING IN CỐC CỐC
5 CHALLENGES & LESSONS LEARNED

● Data in various domains: Browser, Ads, Search,...
● Serve our 25 million Vietnamese users!
● Terabyte-scale databases
● Billions of events/day
● Systems handling thousands of rps
● Talented teams
Q&A
Which additional skills does one need to move from software engineering to
data engineering?

As a second-year student, what kind of work can I do that would support a
future data engineering job?

I am 26 this year and have just started learning the basics to become a
Data Engineer. Will Cốc Cốc open recruitment programs for interns or
juniors in the future?
QUESTION FOR
WEBINAR ATTENDEES
The 3 players with the fastest correct answers
will each receive a 50K voucher
What does "ETL" stand for?
Extract - Transform - Load
Learn more about

04 DATA
SCIENTIST
Increase in Demand for Data Scientist

Source: kickstarter.com
Which industries are hiring us?
ROLES OF
DATA SCIENTIST

Contribute to News Feed Recommendation

The click-through rate increased by up to 20% compared to the human setting.
ROLES OF
DATA SCIENTIST

Contribute to Monetization (Advertising Department)

The ad relevance increased by up to 8% compared to the non-optimized solution.
ROLES OF
DATA SCIENTIST

Contribute to Anomaly Detection and Early Warning System

Reduces up to 100% of the human workload for rule-based settings


01 Who are Data Scientists?

02 A day of a Data Scientist

03 How to become a Data Scientist?

04 Career Path

05 Challenges & Opportunities

06 Lesson Learned from Cốc Cốc Project

07 Q&A
1 WHO ARE THEY?
A DAY IN THE LIFE OF
2 A DATA SCIENTIST
WHAT DOES
A DATA SCIENTIST DO?

RESEARCH → DEVELOP HYPOTHESIS → CREATE VARIATIONS → RUN EXPERIMENT → MEASURE RESULTS → (back to RESEARCH)


WORKLOAD
HOW TO BECOME
3 A DATA SCIENTIST

● Study Data Science courses
● Read Data Science books: Link
● Participate in the Data Science community
IMPORTANT SKILL
ROADMAP 1 - 1.5y

Road map: http://nirvacana.com/thoughts/2013/07/08/becoming-a-data-scientist/


A TYPICAL
DATA SCIENTIST
CODING TOOLBOX
CAREER PATH &
4 PROGRESSION

Data Engineering Computer Science

Data Analysis ?
DATA SCIENCE IN CỐC CỐC
5 CHALLENGES & OPPORTUNITIES

● Build ML/AI prediction models for AdEngine (CTR prediction, Quality Score model)
● Design and develop machine learning and deep learning systems
● Run machine learning tests and experiments
● Implement appropriate ML algorithms and turn them into microservices in our
production environments
● Improve the current targeting model (user interests/user catalogs)
● Build recommendation/suggestion models for our current products (Music
Recommendation Engine, B2B, B2C products)
● Others: analyze data, do ad-hoc analysis, and present results in a clear manner
LESSON LEARNED FROM
6 NEWS FEED PROJECT
The personalized news content distribution
platform (NewsFeed)

The system increased CTR by up to 20% and
DAU by 5% compared to the human setting
Let's look back at our life!

RESEARCH → DEVELOP HYPOTHESIS → CREATE VARIATIONS → RUN EXPERIMENT → MEASURE RESULTS → (back to RESEARCH)

Step 1:
Research - Recommendation Engine
Step 2:
Create Hypothesis about Relationship between
User Behavior & User - Articles Representation ( GraphDB)
Step 3:
Create Variations (code the solutions, machine learning models)

GraphDB Relationship.

● (user1)-[:IS_SIMILAR]->(user2), based on queries, topic model, ... (cosine similarity)


● (article1)-[:IS_SIMILAR]->(article2)
● (article1)-[:BELONGS_TO]->(#TOPIC_01)
● (article1)-[:IN_CATEGORY]->(CATEGORY)
● (user1)-[:INTERESTED_IN:weight]->(#TOPIC)
● (user1)-[:INTERESTED_IN:weight]->(#CATEGORY)
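
The IS_SIMILAR relationship above is described as based on cosine similarity. A minimal sketch of how such edges might be derived is shown below, assuming each user is represented by a dictionary of topic weights. The profile format, the 0.8 threshold, and the function names are assumptions for illustration; the slides do not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {feature: weight} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_user_edges(profiles, threshold=0.8):
    """Return (user, "IS_SIMILAR", other_user, score) tuples for pairs at or above the threshold."""
    edges = []
    users = sorted(profiles)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            score = cosine_similarity(profiles[u], profiles[v])
            if score >= threshold:
                edges.append((u, "IS_SIMILAR", v, score))
    return edges


# Hypothetical topic-weight profiles, for illustration only.
profiles = {
    "user1": {"sports": 1.0, "tech": 1.0},
    "user2": {"sports": 1.0, "tech": 1.0},
    "user3": {"cooking": 1.0},
}
edges = similar_user_edges(profiles)
```

The pairwise loop is quadratic in the number of users, so at the scale of millions of users an approximate nearest-neighbor approach would likely be needed instead.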

Query samples:

● Get the top K recent articles in {categories} OR {topics} that we can recommend to our target user and their neighbors.
● neighbors = MATCH (u:User {userId: $uid})-[:IS_SIMILAR]->(n:User) RETURN n.userId
○ recommendation = SUM OF
■ (1) articles read by similar users: MATCH (n:User)-[:READ]->(a:Article) WHERE n.userId IN $lst_neighbors RETURN a.articleId AND
■ (2) articles in topics/categories the neighbors are interested in: MATCH (n:User)-[i:INTERESTED_IN]->(t)<-[:BELONGS_TO|IN_CATEGORY]-(a:Article) WHERE
i.weight > 0.5 AND n.userId IN $lst_neighbors RETURN a.articleId AND
■ (3) by the user himself: MATCH (u:User {userId: $uid})-[i:INTERESTED_IN]->(t)<-[:BELONGS_TO|IN_CATEGORY]-(a:Article) WHERE
i.weight > 0.5 RETURN a.articleId
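
To make the intent of the query samples above easier to follow, the same three sources can be emulated in plain Python over in-memory dictionaries. This is a sketch under assumptions, not the NewsFeed implementation: the dictionaries `similar`, `read`, `interests`, and `article_tags` are hypothetical stand-ins for the graph, each source contributes one point to an article's score, and articles the user has already read are excluded (the slides do not say whether they are).

```python
from collections import Counter


def recommend(uid, similar, read, interests, article_tags, k=3, min_weight=0.5):
    """Rank articles for `uid` by summing three signals, then return the top k article ids."""
    neighbors = similar.get(uid, set())
    scores = Counter()

    # (1) Articles read by similar users.
    for n in neighbors:
        for article in read.get(n, set()):
            scores[article] += 1

    # (2) Topics/categories the neighbors are interested in, and (3) the user's own interests.
    for person in sorted(neighbors) + [uid]:
        liked = {t for t, w in interests.get(person, {}).items() if w > min_weight}
        for article, tags in article_tags.items():
            if tags & liked:
                scores[article] += 1

    # Assumption: do not recommend what the user has already read.
    already_read = read.get(uid, set())
    ranked = sorted(
        (a for a in scores if a not in already_read),
        key=lambda a: (-scores[a], a),
    )
    return ranked[:k]


# Hypothetical graph data, for illustration only.
similar = {"u1": {"u2"}}
read = {"u1": {"a1"}, "u2": {"a1", "a2"}}
interests = {"u1": {"tech": 0.9}, "u2": {"sports": 0.7, "tech": 0.2}}
article_tags = {"a1": {"tech"}, "a2": {"sports"}, "a3": {"tech"}, "a4": {"cooking"}}

top = recommend("u1", similar, read, interests, article_tags)
```

Here `a2` ranks first because a neighbor both read it and is interested in its topic, while `a3` matches only the user's own interest.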
Step 4:
Run Experiment (Online/Offline A/B Testing)
Step 5:
Monitor and measure the results
The click-through rate increased by up to 20%
compared to the human setting, at a 99%
statistical confidence level
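
One common way to check whether a CTR difference in an A/B test is statistically significant is a two-proportion z-test. The sketch below shows that test; it is not necessarily the method used in the project, and the click and view counts are made-up numbers chosen to give a 20% relative lift.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in CTR between variant B and control A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants share the same rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control CTR 5%, variant CTR 6% (a 20% relative lift).
z, p_value = two_proportion_z_test(1000, 20000, 1200, 20000)
relative_lift = (1200 / 20000 - 1000 / 20000) / (1000 / 20000)
significant_at_99 = p_value < 0.01
```

With these made-up counts the p-value falls well below 0.01, so the lift would be considered significant at the 99% level; with fewer views the same relative lift might not be.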
Q&A
Which projects are suitable for newbies to gradually get used to a real working
environment? What basic errors and mistakes do newbies make while studying and
when starting real work?

Can a newbie from an unrelated field, or one without much experience or
knowledge of IT and programming, switch to Data Science?

Data science spans many domains. For students who are just starting, besides
general data knowledge, do they also need to learn about those domains?
MINIGAME
RULE:

1. Click on the link sent in the chat box
2. Register to join the game
3. Answer 10 multiple-choice questions
4. The winner is the player with the most correct and fastest answers
CONGRATULATIONS
The winner, please send your information (name, phone number, email,
address) with a screenshot of your result to the
fanpage Trình duyệt Cốc Cốc
THANK YOU!
