Lecture 1. Data and Sampling

Lecture 1

Sampling and Data

Shakhrukh Bobojanov
[email protected]
Room: 308
Office hours: By appointment

Turin Polytechnic University in Tashkent


About Me
Academic Degree
• Bachelor 2014-2018 : BSc (Hons) Economics with Finance at WIUT
• Master 2020 – 2022 : MSc in Applied Economics at WIUT
• Master 2022 – 2024 : MSc in Data Science at University of Arizona, USA
Professional Career
• Chief Financist at Artel 2018 – 2020
• Leading Research Specialist at the Center for Economic Research and Reforms under the President's Administration 2020 – 2022
• Research and Strategy Analyst at Uzbekistan Mortgage Refinancing Company 02/2024 – 08/2024
• Teaching Assistant at Turin Polytechnic & New Uzbekistan University from 09/2024
My interests
• Video Games, Coding (Python, MQL), Trading, Politics.



Grading Policy
• Class participation: max 5 points (1 point for each handout question solved correctly on a voluntary basis during Practice/Lab classes)
• Online quizzes: max 5 points (0.5 point per correct answer; there will be 15 quizzes, which may be held at any time during Lecture, Practice or Laboratory classes)
• Mid-term exam: 10 points (computer based)
• Final exam: 12 points (computer based)
Note that Mid-term and Final exam questions will be taken only from questions solved in Practice/Lab classes.

Letter Grade   Points   Performance
A+             31-32    Merit
A              29-30    Excellent Work
B              25-28    Very Good Work
C              21-24    Good Work
D              19-20    Average Work
E              18       Poor Work / Passing Grade
F              0-17     Failing Work
Core Textbook: Introductory Business Statistics 2e. Author: Alexander Holmes



Why Should You Study Statistics?
1. Career Opportunities:
o Researcher: Contribute to academic, scientific, and market research in both business and government organizations.
o Business Analyst: Analyze and interpret data to guide business decisions.
o Data Scientist: Extract insights from data, build predictive models, and contribute to AI projects.
o Statistician: Work with data in various sectors including government, healthcare, and finance.
o Financial Analyst: Evaluate financial data to make investment recommendations.
o Econometrician: Apply statistical methods to economic data and forecasting.
o Operations Research Analyst: Use statistical methods to improve business processes.
o Quality Analyst: Ensure product or service quality through statistical quality control.

2. Competitive Salary:
o In Uzbekistan: Fresh graduates can earn over $2000 per month.
o Global Perspective: Skilled statisticians are in demand worldwide, offering even higher salary prospects.

3. High Demand for Skills:
o Since 2022, Uzbekistan has been establishing research departments in every government organization.

4. Applications:
o In business and government: From marketing analytics to risk management, statistics is integral to strategic decisions.
o In Healthcare: Biostatistics is critical for medical research, public health, and clinical trials.
o In Technology: Statisticians are crucial in developing algorithms, machine learning, and AI systems.



What is statistics?
Statistics is the science of collecting, analyzing, interpreting, and presenting numerical data to support effective decisions in any field:
• Collect: gather data from primary or secondary sources.
• Analyze: examine each data component to find patterns or trends in the data.
• Interpret: explain what the findings mean in the real-world context.
• Present: organize and communicate the findings logically.



Basic Terms

Population: A collection of individuals, objects, or events whose properties are to be analyzed.

Sample: A subset, or small portion, of the population, selected to represent the population.
Sampling allows us to draw conclusions about the population based on results from the sample.



Population vs Sample
Population: every data point in the group. Sample: a subset of the population.



Population vs Sample

Parameter: A numerical value summarizing all the data of an entire population, for instance, a population mean, mode and median.
Statistic: A numerical value summarizing the sample data, for instance, a sample mean, mode and median.

Measure       Parameter (population)   Statistic (sample)
Mean          μ                         x̄
Proportion    p                         p̂
Std Dev.      σ                         s
Size          N                         n
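To make the notation concrete, here is a small Python sketch that computes parameters from a made-up population and the matching statistics from a random sample of it. The scores, the seed and the 70-point threshold used for the proportion are assumptions for illustration only. Note that `statistics.pstdev` divides by N (population), while `statistics.stdev` divides by n - 1 (sample).

```python
import random
import statistics

# Hypothetical population of N = 10 exam scores (illustrative values only)
population = [62, 75, 81, 90, 55, 68, 77, 84, 71, 66]

# Parameters: computed from every member of the population
N = len(population)
mu = statistics.mean(population)          # population mean, μ
sigma = statistics.pstdev(population)     # population std dev, σ (divides by N)
p = sum(score >= 70 for score in population) / N   # population proportion scoring 70+

# Statistics: computed from a sample drawn from that population
random.seed(1)                             # fixed seed so the draw is repeatable
sample = random.sample(population, 4)     # simple random sample of size n = 4
n = len(sample)
x_bar = statistics.mean(sample)           # sample mean, x̄
s = statistics.stdev(sample)              # sample std dev, s (divides by n - 1)
p_hat = sum(score >= 70 for score in sample) / n   # sample proportion scoring 70+

print(f"Parameters: N={N}, mu={mu:.2f}, sigma={sigma:.2f}, p={p:.2f}")
print(f"Statistics: n={n}, x_bar={x_bar:.2f}, s={s:.2f}, p_hat={p_hat:.2f}")
```

Rerunning with a different seed changes x̄, s and p̂ but never μ, σ or p: parameters are fixed numbers, while statistics vary from sample to sample.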



Two Areas of Statistics

• Descriptive Statistics: collection, presentation, and description of sample data.

• Inferential Statistics: making decisions and drawing conclusions about populations based on sample data.



Descriptive vs Inferential Statistics
[Figure: a sample is drawn from the population by sampling; the sample data are shown below]

Sample data:

Case   Gender   IQ    Missed Classes
1      Male     110   1
2      Female   85    7
3      Male     65    14
4      Male     75    11
5      Male     100   2
6      Female   87    4
…      …        …     …
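The descriptive side can be sketched in Python using only the six cases visible in the table (the remaining rows are elided, so they are left out). This only summarizes the sample; generalizing these figures to the whole population would be inferential statistics.

```python
from collections import Counter
from statistics import mean

# The six cases shown in the sample table: (case, gender, IQ, missed classes)
sample_rows = [
    (1, "Male", 110, 1),
    (2, "Female", 85, 7),
    (3, "Male", 65, 14),
    (4, "Male", 75, 11),
    (5, "Male", 100, 2),
    (6, "Female", 87, 4),
]

# Descriptive statistics: summarize only the data we actually have
mean_iq = mean(row[2] for row in sample_rows)
mean_missed = mean(row[3] for row in sample_rows)
gender_counts = Counter(row[1] for row in sample_rows)

print("Mean IQ:", mean_iq)                  # 87
print("Mean missed classes:", mean_missed)  # 6.5
print("Gender counts:", dict(gender_counts))
```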



What are DATA?

Data can be numbers, record names, or other labels recorded for the observational unit. Data consist of:
• Observational/experimental unit: the person or thing on which the variable is observed or measured; simply, a case. E.g., an individual, household, community, country, etc.
• Variables: characteristics recorded about each individual or thing. Each variable should have a name that identifies what has been measured. E.g., height, weight, price, color, etc.



Data Tables
• The following data table clearly shows the context of the data presented:

• Each column shows a variable (a characteristic of the observational unit), and each row shows an individual observational unit.



Types of Variables
• Qualitative or Categorical Variable:
A variable that identifies a category for each case, for example, gender.
Note: Arithmetic operations, such as addition and averaging, are not meaningful for data resulting from a qualitative variable.

• Quantitative or Numerical Variable:
A variable that records measurements or amounts of something, usually with units of measurement, for example, weight measured in kg.
Note: Arithmetic operations, such as addition and averaging, are meaningful for data resulting from a quantitative variable.
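The difference in allowed arithmetic can be illustrated with a short Python sketch; the five records are hypothetical. Averaging works for the quantitative variable, while the qualitative variable is summarized by counts and the mode. Python's `statistics.mean` raises `TypeError` when given text labels.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical records for five people (illustrative values only)
genders = ["Male", "Female", "Female", "Male", "Female"]   # qualitative (categorical)
weights_kg = [72.5, 58.0, 61.2, 80.3, 55.4]                 # quantitative, in kg

# Quantitative: averaging is meaningful
avg_weight = mean(weights_kg)
print(f"Average weight: {avg_weight:.2f} kg")

# Qualitative: summarize with counts and the most common category instead
gender_counts = Counter(genders)
print("Counts:", dict(gender_counts))
print("Most common category:", mode(genders))

# Trying to average category labels fails
try:
    mean(genders)
except TypeError:
    print("Averaging category labels is not meaningful (TypeError)")
```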



Subdividing Variables Further
Qualitative and quantitative variables can be further subdivided into:

• Qualitative data: Nominal, Ordinal
• Quantitative data: Discrete, Continuous



Discrete data

➢ Take countable values (things that can be counted)
E.g., # of students in a class - 30
# of books on a shelf - 20
# of visits to a doctor per week - 2

➢ Cannot be broken down into portions (no decimal values).
E.g., the number of students cannot be 30.5

➢ Can be visualized with bar charts and pie charts

➢ Can be counted but not measured.



Continuous data

➢ Can take any value within a specified range
E.g., weight of a person
Distance from A to B
Time taken to travel from A to B

➢ Can be broken down into portions (decimal values).
E.g., weight can be 40.5 kg

➢ Can be visualized with histograms and scatterplots

➢ Measured rather than counted.
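The visualization advice for the two slides above can be sketched without a plotting library: discrete counts get one bar per distinct value, whereas continuous measurements are first grouped into intervals, which is what a histogram does. The data and the 10 kg bin width are hypothetical; `#` characters stand in for bar heights.

```python
from collections import Counter

# Hypothetical discrete data: doctor visits per week for 8 people (counts)
visits = [2, 0, 1, 2, 3, 1, 2, 0]
# Hypothetical continuous data: weights in kg (measurements)
weights = [40.5, 52.3, 47.8, 61.0, 55.6, 49.9, 58.2, 44.1]

# Discrete: each distinct value gets its own bar (bar chart)
visit_freq = Counter(visits)
for value in sorted(visit_freq):
    print(f"{value} visits: {'#' * visit_freq[value]}")

# Continuous: values are grouped into intervals (bins) first (histogram)
bin_width = 10
bins = Counter(int(w // bin_width) * bin_width for w in weights)
for start in sorted(bins):
    print(f"[{start}, {start + bin_width}) kg: {'#' * bins[start]}")
```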



Nominal & Ordinal data
Nominal data: categorical characteristics or groups with no rank order
• Gender (male, female)
• Ethnicity (Uzbek, Kazakh, Russian)
• Personal preferences (coffee, tea, water)
• Marital status (married, not married, divorced)

Ordinal data: qualitative characteristics that have a natural rank order
• Income levels (low, medium, high)
• Levels of satisfaction (Worst, Bad, Neutral, Good, Excellent)
• Levels of education (school, college, higher education)
• Medal ranks (bronze, silver, gold)
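Because ordinal categories have a rank order, they support an order-based summary such as the median, while nominal categories only support counts and the mode. A minimal Python sketch, using the satisfaction scale and drink preferences from this slide with hypothetical responses:

```python
from statistics import median_low, mode

# Ordinal scale from the slide, listed from lowest to highest rank
SATISFACTION = ["Worst", "Bad", "Neutral", "Good", "Excellent"]
rank = {level: i for i, level in enumerate(SATISFACTION)}

# Hypothetical survey responses (illustrative values only)
responses = ["Good", "Neutral", "Excellent", "Bad", "Good", "Good", "Worst"]
drinks = ["Coffee", "Tea", "Coffee", "Water", "Tea", "Coffee"]  # nominal

# Ordinal: ranks can be ordered, so a median category makes sense
median_rank = median_low(rank[r] for r in responses)
print("Median satisfaction:", SATISFACTION[median_rank])

# Nominal: no order exists, so only counts and the mode are meaningful
print("Most common drink:", mode(drinks))
```

`median_low` is used so the result is always an actual rank on the scale rather than an average of two ranks.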



Quick test (Discrete & Continuous & Nominal & Ordinal)
• Temperature: the temperature in a room: 21.5°C.
• Job Roles: Intern, Junior, Senior, Manager.
• Number of Patients in a Hospital: 12 patients.
• Music Genres: Rock, Jazz, Classical, Pop.
• Spiciness Levels: Mild, Medium, Hot, Extra Hot.
• Volume of Liquid: the volume of water in a glass: 250.5 ml.
• Number of Cars in a Parking Lot: 15 cars.
• Types of Pets: Dog, Cat, Bird, Fish.

In many cases, discrete and continuous variables can be distinguished by determining whether the variable is related to a count or a measurement. Discrete variables are usually associated with counting. Continuous variables are usually associated with measurements.



Fact: Mind wandering or Mental time travel
Mind wandering refers to when people's thoughts shift away from the present moment, often to future tasks or past experiences. It can lead to situations where individuals are not fully engaged with what they are currently doing.
o Research from Harvard University found that people spend about 47% of their waking hours thinking about something other than what they are currently doing. Interestingly, the study also found that mind wandering is linked to lower happiness, suggesting that being present may lead to greater well-being and achievement in life.
o When your mind wanders during a task, especially learning, it significantly reduces your ability to understand the information. This is because attention is a key factor in memory formation, and when you are not focused, the brain struggles to store new information.
o Studies show that when your mind drifts away from a task, you are more likely to make mistakes. This is particularly true in activities that require attention to detail, such as studying or taking exams.
o Mind wandering can also increase stress levels because it often leads to ruminating on past events or worrying about the future. Staying focused on the present can help reduce anxiety and make learning more enjoyable.



Data Collection

• The first problem a statistician faces is how to obtain the data.

• Usually the data are sample data collected from a portion of the population. It is important to obtain good, representative sample data.
• Statistical inferences about the population are made based on statistics computed from the sample data collected.



Biased Sampling
Biased Sampling Method: a sampling method that produces data that differ systematically from the sampled population.
An unbiased sampling method is one that is not biased.

Sampling methods that often result in biased samples:

• Convenience sample: a sample selected from elements of a population that are easily accessible
• Volunteer sample: a sample collected from those elements of the population that chose to contribute the needed information on their own initiative
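A small simulation shows why convenience samples tend to be biased. Assume, hypothetically, that we want the average distance students live from campus and that the most accessible students are those living closest. The population values below are randomly generated, not real data.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: distance (km) from campus for 1,000 students
population = [random.uniform(0.5, 30.0) for _ in range(1000)]

# Convenience sample: the 50 students easiest to reach, i.e. those living closest
convenience = sorted(population)[:50]

# Simple random sample of the same size, for comparison
random_sample = random.sample(population, 50)

print(f"Population mean:    {mean(population):.1f} km")
print(f"Convenience sample: {mean(convenience):.1f} km")  # systematically too low
print(f"Random sample:      {mean(random_sample):.1f} km")
```

The convenience sample misses the target in the same direction every time the study is repeated, which is exactly what "systematically differs" means; the random sample's error goes either way and shrinks as the sample grows.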



Methods Used to Collect Data

Data can be collected by performing an experiment, a survey, or a census:
Experiment: the investigator controls or modifies the environment and observes the effect on the variable under study.

Survey: data are obtained by sampling some of the population of interest. The investigator does not modify the environment.

Census: a 100% survey. Every element of the population is listed. It is not frequently used because it is difficult and time-consuming to compile, and expensive.



Sampling methods

• Probability Sampling
In probability sampling, every member of the population has a known, nonzero chance of being selected; in simple random sampling, that chance is equal for every member. For example, when we flip a fair coin, there is a 50/50 chance of getting heads or tails.

• Non-Probability Sampling
It involves non-random selection based on convenience or voluntary participation. It can lead to the selection bias problem.

Selection bias results when a subset of experimental units in the population is excluded so that these units have no chance of being selected in the sample.
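The equal-chance property of simple random sampling can be checked by simulation: draw many samples of n = 6 from a population numbered 1 to 24 and count how often each member is picked. Under equal-probability selection each member should appear in about n/N = 6/24 = 0.25 of the samples. The number of trials and the seed are arbitrary choices for this sketch.

```python
import random
from collections import Counter

random.seed(0)
N, n, trials = 24, 6, 20_000  # population size, sample size, repeated draws

# Count how often each member (numbered 1..N) lands in a simple random sample
hits = Counter()
for _ in range(trials):
    hits.update(random.sample(range(1, N + 1), n))

# Under equal-probability selection each member appears about n/N of the time
expected = n / N
for member in (1, 12, 24):
    print(f"member {member}: selected {hits[member] / trials:.3f} (expected {expected:.3f})")
```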



Random Sampling
[Figure: population of 24 members, numbered 1 to 24]

• Simple Random Sampling
In simple random sampling, every member of the study population has an equal chance of being selected.
Randomly generated numbers: 4, 8, 14, 17, 22, 23

• Use cases: not limited to any particular setting.
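Drawing a simple random sample like the one on this slide takes one call to Python's `random.sample`, which picks distinct items with equal probability. The helper name `simple_random_sample` and the seed are hypothetical; the drawn numbers will generally differ from the slide's 4, 8, 14, 17, 22, 23.

```python
import random

population = list(range(1, 25))   # members numbered 1 to 24, as in the figure

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)      # separate generator so results can be reproduced
    return sorted(rng.sample(units, n))

print(simple_random_sample(population, 6, seed=7))
```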



Systematic Sampling
[Figure: population of 24 members, numbered 1 to 24]

• Systematic Sampling
The starting element is randomly selected from the population, and then every nth element after it is selected.
First element: 3
Every 4th element: 3, 7, 11, 15, 19, 23

• Use cases: when we have a complete list of the population.
o Full list of employees in the company
o Full list of people in the population for a telephone survey
o Full list of patients who had a specific disease
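The slide's example can be reproduced with list slicing. Here the sampling interval is k = N / n = 24 / 6 = 4, and, following a common convention (an assumption, since the slide does not say), the random start is chosen among the first k members. `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(units, k, start=None, seed=None):
    """Take every k-th unit, beginning at a 1-based start position."""
    if start is None:
        # Random start among the first k units (1..k), a common convention
        start = random.Random(seed).randint(1, k)
    return units[start - 1::k]

population = list(range(1, 25))     # members numbered 1 to 24
N, n = len(population), 6
k = N // n                          # sampling interval: 24 // 6 = 4

print(systematic_sample(population, k, start=3))    # [3, 7, 11, 15, 19, 23]
print(systematic_sample(population, k, seed=2024))  # random start
```

Restricting the start to the first k members guarantees that every possible start yields exactly n members.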



Cluster Sampling
[Figure: population divided into clusters]

• Cluster Sampling
We divide the population into smaller groups (clusters), and then randomly select some of these clusters; all members of the selected clusters form the sample.

• Use cases
o When the population is spread out over a large area
o Public health research during a pandemic
o Educational research (clusters can be schools, universities)
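A minimal sketch of one-stage cluster sampling, assuming hypothetical schools as clusters: whole clusters are chosen at random, and every member of a chosen cluster enters the sample. The school and student names and the seed are invented for illustration.

```python
import random

# Hypothetical clusters: schools, each listing its students (illustrative names only)
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2", "D3"],
    "School E": ["E1", "E2"],
}

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return chosen, [member for name in chosen for member in clusters[name]]

chosen, sample = cluster_sample(schools, 2, seed=3)
print("Chosen clusters:", chosen)
print("Sample:", sample)
```

Randomness enters only at the cluster level, which is why this method is cheap when the population is spread over a large area: fieldwork is limited to the chosen clusters.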



Stratified Sampling
[Figure: population divided into strata: Age 18-25, Age 26-35, Age 36-45, Age 46-55]

• Stratified Sampling
We divide the population into smaller groups (strata) based on certain characteristics (age, education, gender, race), and then randomly select a fixed number of items from each stratum.

• Use cases
o Public policy studies
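A minimal sketch of stratified sampling with the slide's four age strata. The population of 200 people and their ages are randomly generated (hypothetical), and three people are drawn at random from each stratum.

```python
import random

# Hypothetical population of (person_id, age) pairs, illustrative only
random.seed(5)
people = [(pid, random.randint(18, 55)) for pid in range(1, 201)]

STRATA = [(18, 25), (26, 35), (36, 45), (46, 55)]  # age groups from the slide

def stratified_sample(units, per_stratum, seed=None):
    """Split units into age strata, then randomly draw a fixed number from each."""
    rng = random.Random(seed)
    result = {}
    for low, high in STRATA:
        stratum = [u for u in units if low <= u[1] <= high]
        result[f"Age {low}-{high}"] = rng.sample(stratum, per_stratum)
    return result

for group, members in stratified_sample(people, per_stratum=3, seed=11).items():
    print(group, members)
```

Unlike cluster sampling, every stratum is represented in the sample; randomness is applied within each group rather than to the choice of groups.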



Any Questions?
Deadline: 12:00 PM (noon) on Thursday, 19/09/2024
Mark: 0.5 point.

https://forms.gle/Hgrvqe33EPJdeT1e7



Reading materials

Introductory Business Statistics 2e. Author: Alexander Holmes
Chapter 1. Sampling and Data. Pages 5-43
