Introductions To Data Science - Lecture 1 - Introduction
• Teaching Assistants:
• International School of Economics at TSU (ISET) - Giorgi Kvinikadze
• Far Eastern Federal University (FEFU) - Valeria Shichalina
• Novosibirsk State University (NSU) - Elena Limanova
• Westminster International University (WIUT) - Ziyodakhon Malikova
• https://cutt.ly/jc4Bifu
About You!
• DO NOT HESITATE TO ASK QUESTIONS!!!
About the Course
• Check the Syllabus
Data Science as a Profession
• Statistics and Mathematics
• Probability, algebra, regression, etc.
• Choose procedure
• Diagnose problem
• Coding
• Databases
• Tools, programming languages
• Field-specific knowledge
• Experience in field
• Goals, methods and constraints (Domain)
[Figure: Venn diagram of Statistics and Mathematics, Coding, and Field-specific knowledge, with regions labelled Machine Learning, Traditional Research, Danger Zone, and Data Science.]
Coding and Software
• Programming Languages
What is Machine Learning?
• Term coined by Arthur Samuel (IBM) in 1959
• Machines doing things without being explicitly programmed to do so.
• An algorithm that performs two consecutive steps:
1. Find a pattern in the data;
2. Make a prediction based on the pattern found.
• Human learning is very similar:
X1   X2   Y
2    5    12     1*2 + 2*5 = 12
4    3    10     1*4 + 2*3 = 10
4    2    8      1*4 + 2*2 = 8
3    1    ?      1*X1 + 2*X2 = Y, so ? = 1*3 + 2*1 = 5
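The two steps above (find a pattern, then predict) can be sketched in a few lines. This is only an illustration, not the lecture's own method: it assumes the pattern has the form Y = a*X1 + b*X2 with no intercept and estimates a and b by least squares. The function name `fit_two_weights` is made up for this example.

```python
# Rows from the slide: (X1, X2, Y). The last row's Y is the value to predict.
data = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]

def fit_two_weights(rows):
    """Least-squares fit of Y = a*X1 + b*X2 (no intercept) via the normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("X1 and X2 are collinear; the weights are not unique")
    # Cramer's rule on the 2x2 normal equations.
    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    return a, b

a, b = fit_two_weights(data)   # step 1: find the pattern
prediction = a * 3 + b * 1     # step 2: predict Y for X1 = 3, X2 = 1
print(a, b, prediction)        # prints 1.0 2.0 5.0
```

Because the three known rows fit the pattern exactly, the estimated weights match the 1 and 2 a person would spot by eye.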
What is Machine Learning?
• Human learning is very similar:
Y    X2   X1
2    5    12     -2*5 + 1*12 = 2
4    3    10     -2*3 + 1*10 = 4
4    2    8      -2*2 + 1*8 = 4
3    1    ?      -2*X2 + 1*X1 = Y, so ? = 5
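The relabelled table can also be approached the way a person might, by trying small whole-number weights until every row fits. The sketch below is an assumption-laden illustration: the search range of -3 to 3 and the helper name `find_integer_pattern` are chosen for this example and do not come from the lecture.

```python
from itertools import product

# Rows from the slide, read as (Y, X2, X1); the last row's X1 is unknown.
rows = [(2, 5, 12), (4, 3, 10), (4, 2, 8)]

def find_integer_pattern(rows, low=-3, high=3):
    """Return every (a, b) in the search range with a*X2 + b*X1 == Y for all rows."""
    candidates = product(range(low, high + 1), repeat=2)
    return [(a, b) for a, b in candidates
            if all(a * x2 + b * x1 == y for y, x2, x1 in rows)]

matches = find_integer_pattern(rows)
a, b = matches[0]

# Last row: Y = 3, X2 = 1, X1 missing. Rearranging a*X2 + b*X1 = Y for X1:
y_new, x2_new = 3, 1
x1_missing = (y_new - a * x2_new) / b
print(matches, x1_missing)   # prints [(-2, 1)] 5.0
```

Exhaustive search only works for tiny, exact problems like this one; it is shown here to mirror human guessing, not as a practical learning algorithm.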
Disadvantages and Advantages of Machine Learning
• Disadvantages:
• Slightly distorted inputs can produce completely wrong output
• Advantages:
• The computing power of the machine may reveal a non-ideal functional form of the relationship.
• What if:
• the functional form of the relationship is not ideal;
• the provided data is not complete.
X1   X2   X3   Y      (1*X1 + 2*X2 - 0.5*X3 = Y)
2 5 9
4 7.6
4 2 6.2
1 0.4
6 3 ?
Econometrics and Regression
• Econometrics is the application of statistical methods to economic data
in order to give empirical content to economic relationships (i.e. find
patterns).
• In statistical modeling, regression analysis is a set of statistical
processes for estimating the relationships between a dependent
variable and one or more independent variables.
• Some regression models (most common):
• Ordinary Least Squares (OLS);
• Logit;
• Random Forest;
• Neural Networks and Deep Learning.
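As a rough illustration of the first model in the list, the sketch below computes Ordinary Least Squares for a single independent variable using the standard closed-form formulas (slope = covariance of x and y divided by variance of x). The function name `ols_simple` and the sample observations are hypothetical and are not taken from the course material.

```python
def ols_simple(x, y):
    """Closed-form OLS for y = intercept + slope * x with a single regressor."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations (not from the lecture), roughly y = 1 + 2x plus noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
intercept, slope = ols_simple(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

With noisy data the fitted line no longer passes through every point; OLS picks the line that minimises the sum of squared vertical distances.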
Data Science and Machine Learning
• Searching for an appropriate regression model – about 5%;
• Data mining and cleaning – about 80%;
• Trial and improvement, retrial and improvement, … – about 15%
• Data Gathering
• Data Cleaning
• Modeling
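To give a feel for the cleaning stage, here is a minimal sketch that keeps only records whose fields can be parsed as numbers. The records, the field names "x" and "y", and the helper `clean` are all hypothetical; real cleaning usually involves many more checks than this.

```python
# Hypothetical raw records, as they might arrive from a gathering step.
raw_records = [
    {"x": "1.0", "y": "2.9"},
    {"x": "2.0", "y": None},      # missing value
    {"x": "three", "y": "7.1"},   # not a number
    {"x": "4.0", "y": "9.0"},
]

def clean(records):
    """Keep only records where both fields convert to float."""
    cleaned = []
    for rec in records:
        try:
            cleaned.append((float(rec["x"]), float(rec["y"])))
        except (TypeError, ValueError):
            # float(None) raises TypeError; float("three") raises ValueError.
            continue
    return cleaned

usable = clean(raw_records)
print(f"kept {len(usable)} of {len(raw_records)} records: {usable}")
```

Only the usable rows would then be passed on to the modeling stage.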