Introductions To Data Science - Lecture 1 - Introduction


Introduction to Data Science
Lecture 1 - Introduction

Ph.D. Vahan Sargsyan


2021
Welcome to Data Science
• Often called the sexiest profession of the 21st century.
About Myself
• Who: Ph.D. Vahan Sargsyan
• Where: CERGE-EI, Prague, Czech Republic
• What: Data Scientist at NetSuite (Oracle)
• When: Monday 07:30 – 09:00 a.m. CEST, Thursday 07:30 – 08:15 a.m. CEST

• Teaching Assistants:
• International School of Economics at TSU (ISET) - Giorgi Kvinikadze ([email protected])
• Far Eastern Federal University (FEFU)- Valeria Shichalina ([email protected])
• Novosibirsk State University (NSU) - Elena Limanova ([email protected])
• Westminster International University (WIUT) - Ziyodakhon Malikova ([email protected])

• Contact information: [email protected]

• Office Hours: Tuesday 07:30 – 08:30 CET, or by appointment.
About You! 
• Far Eastern Federal University (Russia)
• Novosibirsk State University (Russia)
• Westminster International University (Uzbekistan)
• International School of Economics at TSU (ISET, Georgia)

• https://cutt.ly/jc4Bifu
About You! 
• DO NOT HESITATE TO ASK QUESTIONS!!!
About the Course
• Check the Syllabus
Data Science as a Profession
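• Statistics and Mathematics
• Probability, algebra, regression, etc.
• Choose procedure
• Diagnose problem
• Coding
• Databases
• Tools, programming languages
• Field-specific knowledge (Domain)
• Experience in field
• Goals, methods and constraints

[Venn diagram of three circles: Statistics and Mathematics, Coding, Field-specific knowledge. Overlaps: Machine Learning (Statistics and Mathematics + Coding), Traditional Research (Statistics and Mathematics + Field-specific knowledge), Danger Zone (Coding + Field-specific knowledge), Data Science (all three).]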
Coding and Software
• Programming Languages
What is Machine Learning?
• Term coined by Arthur Samuel (IBM) in 1959
• Machines doing things without being explicitly programmed to do so.
• An algorithm that performs two consecutive steps:
1. Find a pattern in the data;
2. Make a prediction based on the pattern found.
• Human learning is very similar:

X1   X2   Y
2    5    12
4    3    10
4    2    8
3    1    ?

Pattern: 1*X1 + 2*X2 = Y
1*2 + 2*5 = 12
1*4 + 2*3 = 10
1*4 + 2*2 = 8
Prediction: 1*3 + 2*1 = 5, so ? = 5
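The two steps can be sketched in a few lines of Python. This is only an illustration, not the method used in the course: it assumes the pattern is linear with no intercept and recovers the two coefficients by least squares (normal equations). The function names `fit_two_weights` and `predict` are made up for this sketch.

```python
# Rows from the slide: (X1, X2, Y). The last row (3, 1, ?) is the one to predict.
train = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]


def fit_two_weights(rows):
    """Least-squares fit of Y = a*X1 + b*X2 (no intercept) via the normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12  # non-zero as long as X1 and X2 are not proportional
    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    return a, b


def predict(weights, x1, x2):
    """Apply the fitted pattern to a new row."""
    a, b = weights
    return a * x1 + b * x2


weights = fit_two_weights(train)     # step 1: find the pattern
prediction = predict(weights, 3, 1)  # step 2: predict the missing value
print(weights, prediction)
```

On these three rows the fit recovers the coefficients 1 and 2, so the predicted value for the last row is 5.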
What is Machine Learning?
• Term coined by Arthur Samuel (IBM) in 1959
• Doing things without being explicitly programmed to do so.
• An algorithm that performs two consecutive steps:
1. Find a pattern in the data;
2. Make a prediction based on the pattern found.
• Human learning is very similar (same data, with the columns relabelled):

Y    X2   X1
2    5    12
4    3    10
4    2    8
3    1    ?

Pattern: -2*X2 + 1*X1 = Y
-2*5 + 1*12 = 2
-2*3 + 1*10 = 4
-2*2 + 1*8 = 4
Prediction: -2*1 + 1*? = 3, so ? = 5
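The same data supports a different pattern once the columns change roles. As a rough sketch, the search for that pattern can be done by brute force over small integer coefficients. The search range of -3 to 3 and the function name `find_integer_pattern` are assumptions made for this illustration.

```python
from itertools import product

# Rows from the slide after relabelling the columns: (Y, X2, X1).
known = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]


def find_integer_pattern(rows, low=-3, high=3):
    """Return the first (a, b) in the range with a*X2 + b*X1 == Y for every row."""
    for a, b in product(range(low, high + 1), repeat=2):
        if all(a * x2 + b * x1 == y for y, x2, x1 in rows):
            return a, b
    return None


a, b = find_integer_pattern(known)

# Last row: Y = 3, X2 = 1, X1 unknown. Solve Y = a*X2 + b*X1 for X1.
missing_x1 = (3 - a * 1) / b
print(a, b, missing_x1)
```

The search returns the coefficients -2 and 1, and solving the last row for the unknown gives 5, matching the slide.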
Disadvantages and Advantages
of Machine Learning
• Disadvantages:
• Slightly distorted inputs → completely wrong output
• Advantages:
• The computing power of the machine → may reveal a non-ideal functional form of the relationship.

Ideal form:

(1*)X1   (+2*)X2   (=)y
2        5         12
4        3         10
4        2         8
3        1         5

Non-ideal form (extra variable X3 and an error term):

(1*)X1   (+2*)X2   (-0.5*)X3   (=)y   (+)error   (=)Y
2        5         4           10     -1         9
4        3         6           7      +0.6       7.6
4        2         2           7      -0.8       6.2
3        1         10          0      +0.4       0.4
6        3         2           11     +1         12
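The arithmetic behind the six-column table can be checked with a short Python sketch. It takes the X1, X2, X3 and error values from the slide and recomputes y and Y; the function name `noiseless` is made up for this illustration.

```python
# Rows of the six-column table: (X1, X2, X3, error).
rows = [
    (2, 5, 4, -1.0),
    (4, 3, 6, 0.6),
    (4, 2, 2, -0.8),
    (3, 1, 10, 0.4),
    (6, 3, 2, 1.0),
]


def noiseless(x1, x2, x3):
    """The functional form from the slide: y = 1*X1 + 2*X2 - 0.5*X3."""
    return 1 * x1 + 2 * x2 - 0.5 * x3


table = []
for x1, x2, x3, error in rows:
    y = noiseless(x1, x2, x3)
    observed = round(y + error, 1)  # Y = y + error, rounded to one decimal as on the slide
    table.append((y, observed))

print(table)
```

Each pair holds the noiseless y and the observed Y for one row, so the output can be compared directly with the last columns of the table.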
Advantages of Machine Learning
• The computing power of the machine.

• What if:
• the functional form of the relationship is not ideal;
• the provided data is not complete.

(1*)X1   (+2*)X2   (-0.5*)X3   (=)Y
2        5         -           9
4        -         -           7.6
4        2         -           6.2
-        1         -           0.4
6        3         -           ?
(- = missing value)
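To see why incomplete data matters, here is a hedged sketch of the simplest possible response: drop every row with a missing input (complete-case analysis) and fit Y = a*X1 + b*X2 on what is left. Two things are assumptions here: the placement of the blanks (inferred by comparing the values with the complete table on the previous slide) and the choice of this strategy, which the slide does not prescribe.

```python
# Incomplete rows as (X1, X2, Y); None marks a missing cell. X3 is never observed.
# Which cells are blank is an assumption based on the complete table shown earlier.
observed = [
    (2, 5, 9.0),
    (4, None, 7.6),
    (4, 2, 6.2),
    (None, 1, 0.4),
]

# Complete-case analysis: keep only rows where both inputs are present.
complete = [row for row in observed if None not in row]

# Exactly two complete rows remain, so a and b follow from a 2x2 linear system
# (solved here with Cramer's rule).
(x11, x12, y1), (x21, x22, y2) = complete
det = x11 * x22 - x12 * x21
a = (y1 * x22 - x12 * y2) / det
b = (x11 * y2 - y1 * x21) / det

# Predict Y for the last row of the slide: X1 = 6, X2 = 3.
prediction = a * 6 + b * 3
print(a, b, prediction)
```

With only two usable rows and X3 unobserved, the fitted coefficients (about 0.81 and 1.48) drift away from 1 and 2, and the prediction of about 9.3 falls well short of the 12 listed for this row in the complete table.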
Econometrics and Regression
• Econometrics is the application of statistical methods to economic data
in order to give empirical content to economic relationships (i.e. find
patterns).
• In statistical modeling, regression analysis is a set of statistical
processes for estimating the relationships between a dependent
variable and one or more independent variables.
• Some regression models (most common):
• Ordinary Least Squares (OLS);
• Logit;
• Random Forest;
• Neural Networks and Deep Learning.
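As a minimal sketch of the first model in the list, the snippet below fits a one-variable OLS regression with an intercept using the closed-form formulas. The data are the X1 and Y columns of the noisy table from the earlier slide; using a single regressor is a simplification chosen for this illustration, and the function name `ols_simple` is hypothetical.

```python
# X1 and Y columns of the noisy table from the earlier slide.
x = [2, 4, 4, 3, 6]
y = [9.0, 7.6, 6.2, 0.4, 12.0]


def ols_simple(xs, ys):
    """Closed-form OLS for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_simple(x, y)

# Residuals: the part of y the fitted line does not explain.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(intercept, slope)
```

A useful sanity check for any OLS fit with an intercept is that the residuals sum to zero and are uncorrelated with the regressor.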
Data Science and Machine Learning
• Research for an appropriate regression model: about 5%;
• Data mining and cleaning: about 80%;
• Trial and improvement, retrial and improvement, …: about 15%.

• Machine learning is a continuously evolving system, driven by new data and corrections of past predictions.

• Python as the programming language for ML.
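The idea of a system that evolves with new data and corrects its past predictions can be sketched as an online update rule: predict, measure the error, nudge the weights. The example below is a simplified illustration on the 1*X1 + 2*X2 data from the earlier slides. The learning rate, the number of passes and the function name `online_fit` are assumptions for this sketch, not course settings.

```python
# Data from the earlier slides: (X1, X2, Y) with Y = 1*X1 + 2*X2.
stream = [(2, 5, 12), (4, 3, 10), (4, 2, 8), (3, 1, 5)]


def online_fit(rows, learning_rate=0.01, passes=500):
    """Update the weights one observation at a time, correcting each prediction error."""
    w1, w2 = 0.0, 0.0
    for _ in range(passes):
        for x1, x2, y in rows:
            error = y - (w1 * x1 + w2 * x2)   # how wrong the current prediction is
            w1 += learning_rate * error * x1  # move each weight to reduce that error
            w2 += learning_rate * error * x2
    return w1, w2


w1, w2 = online_fit(stream)
print(w1, w2)
```

Starting from weights of zero, repeated small corrections bring the weights close to 1 and 2 without ever solving the whole system at once.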


This course
• Planning

• Data Gathering

• Data Cleaning

• Modeling

• Analytics and Evaluation
