STATISTICAL CONCEPTS-module1

STATISTICAL CONCEPTS

Uploaded by

Smitha Rajesh

MODULE I

INTRODUCTION TO BIG DATA

Statistical concepts

• Statistics is a branch of applied or business mathematics in which we
collect, organize, analyse and interpret numerical facts. Statistical
methods are the concepts, models, and formulas of mathematics used in
the statistical analysis of data.
• Statistical methods can be subdivided into two main categories:
descriptive statistics and inferential statistics.

• Descriptive statistics consists of measures of central tendency and
measures of dispersion, while inferential statistics consists of estimation
and hypothesis testing.
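As a rough illustration of the two families of descriptive measures, the sketch below computes central tendency (mean, median, mode) and dispersion (range, variance, standard deviation) with Python's standard `statistics` module. The list of scores is made-up data, not taken from these notes.

```python
import statistics

# Hypothetical sample of exam scores (illustrative data only)
scores = [62, 75, 75, 80, 88, 91, 94]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} variance={variance:.2f} stdev={std_dev:.2f}")
```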

1. Descriptive statistics

Descriptive statistical methods summarize or describe a sample of data in
various forms to give an overall picture of the data.

2. Inferential Statistics

In contrast, inferential statistics draws conclusions (inferences) about the
population from which a sample was taken, or predicts various outcomes,
based on that sample.
RANDOM EXPERIMENTS
• A random experiment is a process whose outcome cannot be predicted with
certainty before it is performed.
• When we toss a coin, the outcome is uncertain; hence, tossing a coin is a
random experiment.
• The result of a random experiment is called an outcome, and the set of all
possible outcomes of an experiment is called the sample space. For a coin
toss, the sample space is {Head, Tail}.
• If we repeat an experiment n times, each single performance of the
experiment is called a trial.
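A coin-toss experiment can be simulated to make the terms outcome, sample space and trial concrete. This is a minimal sketch: the function name `toss_coin`, the seed and the number of trials are arbitrary choices for illustration, and Python's pseudo-random generator stands in for a physical coin.

```python
import random
from collections import Counter

# Sample space of a single coin toss: the set of all possible outcomes
sample_space = {"H", "T"}

def toss_coin(rng):
    """One trial of the random experiment: returns a single outcome."""
    return rng.choice(sorted(sample_space))

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 1000                 # number of trials
outcomes = [toss_coin(rng) for _ in range(n)]

counts = Counter(outcomes)
print(counts)                # how often each outcome occurred
print(counts["H"] / n)       # relative frequency of heads, typically near 0.5
```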
RANDOM VARIABLES

• A random variable is a variable whose value is unknown until the
experiment is performed; formally, it is a function that assigns a numerical
value to each of an experiment’s outcomes.
• Random variables are usually denoted by capital letters such as X, Y or Z.
• Random variables are classified as discrete, which take specific,
countable values, or continuous, which can take any value within a
continuous range.
• A random variable differs from an algebraic variable. The variable in an
algebraic equation is an unknown value that can be calculated (for example,
x in x + 2 = 5 is 3).
• A random variable, by contrast, has a set of possible values, and any of
those values can be the result of the experiment.
• Example: the number of heads when tossing a coin, or the number shown
when rolling a die.
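To show a random variable as a function on outcomes, the sketch below takes the experiment of rolling two dice, defines a hypothetical random variable `X` as the sum of the two faces, and derives its probability distribution by counting equally likely outcomes (assuming fair dice).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for rolling two six-sided dice: all 36 ordered pairs
sample_space = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: assigns a number (the sum of the faces) to each outcome."""
    return outcome[0] + outcome[1]

# Every outcome is equally likely, so P(X = x) = (# outcomes with X = x) / 36
counts = Counter(X(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```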
Types of random variables

1. Discrete Random Variable

• As the name suggests, a discrete random variable takes distinct, separate
values; the number of values it can take is countable.
Now consider an experiment where a coin is tossed five times.

• Discrete Random Variable Example:

♦ Tossing a Coin

On each toss the outcome is either a Head or a Tail. If X is the number of
heads obtained in the five tosses, X can take only the distinct values
0, 1, 2, 3, 4 or 5, so X is a discrete random variable.
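The five-toss experiment can be worked out by enumeration. Assuming a fair coin, the sketch lists all 32 equally likely sequences and counts the heads in each, which shows that X takes only the countable values 0 to 5.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**5 = 32 equally likely sequences of five fair-coin tosses
sequences = list(product("HT", repeat=5))

# X = number of heads in a sequence; it takes only the countable values 0..5
counts = Counter(seq.count("H") for seq in sequences)
pmf = {k: Fraction(counts[k], len(sequences)) for k in range(6)}

for k, p in pmf.items():
    print(f"P(X = {k}) = {p}")
```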
2. Continuous Random Variable

A continuous random variable can take any value within a continuous range.
An example is an experiment that involves measuring the amount of rainfall
in a city over a year, or the average height of a random group of 25 people.

• Continuous Random Variable Example:

► Heights of people playing basketball.

Here height can take any value between 4 feet and 7 feet, including
fractional values such as 6.25 feet.
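For a continuous random variable, probabilities attach to intervals rather than to single values. The sketch below assumes, purely for illustration, that basketball players' heights are uniformly distributed between 4 and 7 feet (real height data are not uniform); `prob_between` is a hypothetical helper.

```python
import random

# Assumption for illustration only: heights are uniformly distributed
# between 4 and 7 feet (real heights are not uniform).
LOW, HIGH = 4.0, 7.0

def prob_between(a, b):
    """P(a <= H <= b) for H ~ Uniform(LOW, HIGH): overlap length / total length."""
    a, b = max(a, LOW), min(b, HIGH)
    return max(0.0, b - a) / (HIGH - LOW)

rng = random.Random(0)
heights = [rng.uniform(LOW, HIGH) for _ in range(5)]
print([round(h, 3) for h in heights])  # any real value in the range can occur

print(prob_between(5.0, 6.0))  # 1/3 of the range
print(prob_between(6.0, 6.0))  # a single exact value has probability 0
```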

POPULATION AND SAMPLING

Population
• A population is the entire collection of objects or observations from which
we may collect data. It is the entire group we are interested in, which we
wish to describe or draw conclusions about.
• For each population, there are many possible samples. It is important
that the investigator carefully and completely defines the population
before collecting the sample, including a description of the members to
be included.
Sample
• A sample is a group of units selected from a larger group (the
population).
• By studying the sample, we hope to draw valid conclusions about the
larger group.
• A sample is generally selected for study because the population is too
large to study in its entirety. The sample should be representative of the
population, and this is often best achieved by random sampling.
SAMPLING DISTRIBUTION

• A sampling distribution is the probability distribution of a statistic
obtained from a large number of samples of the same size drawn from a
specific population.
• It describes how often each possible value of the statistic (for example,
the sample mean) would occur across the different samples that could be
drawn from the population.
• Much of the data collected and used by academics, statisticians,
researchers, marketers, and analysts are actually samples, not populations,
so conclusions drawn from them depend on the sampling distribution of the
statistics involved.
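The definition can be checked by simulation: draw many samples of the same size from one population, compute a statistic for each, and look at how those values are distributed. In this sketch the population is synthetic (normally distributed values with mean 170 and standard deviation 10, an arbitrary assumption), the statistic is the sample mean, and the observed spread is compared with the standard result σ/√n for the standard error of the mean.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 10,000 values generated once and then treated as fixed
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 25              # size of each sample
num_samples = 2000  # number of samples drawn

# Draw many samples and record the statistic (here, the sample mean) for each
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

print(f"population mean        {mu:.2f}")
print(f"mean of sample means   {statistics.mean(sample_means):.2f}")
print(f"spread of sample means {statistics.stdev(sample_means):.2f}")
print(f"sigma / sqrt(n)        {sigma / n ** 0.5:.2f}")
```

The sample means cluster around the population mean, and their spread shrinks as `n` grows.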

Figure: SAMPLING is divided into PROBABILITY SAMPLING and NON-PROBABILITY SAMPLING.

Probability vs. Non-Probability Samples

As a group, sampling methods fall into one of two categories.

• Probability samples. With probability sampling methods, each
population element has a known (non-zero) chance of being chosen for
the sample.
• Non-probability samples. With non-probability sampling methods, we
do not know the probability that each population element will be chosen,
and/or we cannot be sure that each population element has a non-zero
chance of being chosen.
Non-probability sampling methods offer two potential advantages: convenience
and cost. The main disadvantage is that non-probability sampling methods do
not allow you to estimate the extent to which sample statistics are likely to
differ from population parameters. Only probability sampling methods permit
that kind of analysis.

Non-Probability Sampling Methods

Two of the main types of non-probability sampling methods are voluntary
samples and convenience samples.

• Voluntary sample. A voluntary sample is made up of people who
self-select into the survey. Often, these people have a strong interest in
the main topic of the survey.

Suppose, for example, that a news show asks viewers to participate in an
online poll. This would be a voluntary sample: the sample is chosen by the
viewers, not by the survey administrator.

• Convenience sample. A convenience sample is made up of people who
are easy to reach.

Consider the following example. A pollster interviews shoppers at a local
mall. If the mall was chosen because it was a convenient site from which
to solicit survey participants and/or because it was close to the pollster's
home or business, this would be a convenience sample.

Probability Sampling Methods

The main types of probability sampling methods are simple random sampling,
stratified sampling, cluster sampling, multistage sampling, and systematic
random sampling. The key benefit of probability sampling methods is that every
element has a known chance of selection, so the sample is not shaped by the
investigator's choices and the likely difference between sample statistics and
population parameters can be estimated. This is what allows valid statistical
conclusions to be drawn from the sample.

• Simple random sampling. Simple random sampling refers to any
sampling method that has the following properties.
  - The population consists of N objects.
  - The sample consists of n objects.
  - If all possible samples of n objects are equally likely to occur, the
    sampling method is called simple random sampling.

There are many ways to obtain a simple random sample. One way would
be the lottery method. Each of the N population members is assigned a
unique number. The numbers are placed in a bowl and thoroughly mixed.
Then, a blindfolded researcher selects n numbers. Population members
having the selected numbers are included in the sample.
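In software, the lottery method is usually replaced by a pseudo-random draw without replacement. Python's `random.sample` picks `n` distinct items such that every subset of size `n` is equally likely, which matches the definition above; the numbered population here is hypothetical.

```python
import random

# Hypothetical population of N = 100 members, each given a unique number 1..N
N, n = 100, 10
population = list(range(1, N + 1))

# random.sample draws n distinct members without replacement, so every
# subset of size n is equally likely: the software analogue of the lottery bowl
rng = random.Random(7)
srs = rng.sample(population, n)
print(sorted(srs))
```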

• Stratified sampling. With stratified sampling, the population is divided
into groups, based on some characteristic. Then, within each group, a
probability sample (often a simple random sample) is selected. In
stratified sampling, the groups are called strata.

As an example, suppose we conduct a national survey. We might divide
the population into groups or strata based on geography: north, east,
south, and west. Then, within each stratum, we might randomly select
survey respondents.

• Cluster sampling. With cluster sampling, every member of the
population is assigned to one, and only one, group. Each group is called a
cluster. A sample of clusters is chosen, using a probability method (often
simple random sampling). Only individuals within sampled clusters are
surveyed.

Note the difference between cluster sampling and stratified sampling.
With stratified sampling, the sample includes elements from each
stratum. With cluster sampling, in contrast, the sample includes elements
only from sampled clusters.
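The following sketch contrasts with stratified sampling: whole clusters are drawn at random, and then every member of the chosen clusters is included. The 20 "schools" of 30 students each are hypothetical.

```python
import random

# Hypothetical population grouped into clusters: 20 schools of 30 students.
# Every student belongs to exactly one cluster.
clusters = {f"school_{c}": [f"s{c}_{i}" for i in range(30)] for c in range(20)}

rng = random.Random(5)
# Choose whole clusters by simple random sampling...
chosen = rng.sample(sorted(clusters), 4)
# ...then survey every member of each chosen cluster (and nobody else)
sample = [student for name in chosen for student in clusters[name]]

print(chosen, len(sample))
```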

• Multistage sampling. With multistage sampling, we select a sample by
using combinations of different sampling methods.

For example, in Stage 1, we might use cluster sampling to choose clusters
from a population. Then, in Stage 2, we might use simple random
sampling to select a subset of elements from each chosen cluster for the
final sample.
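The two-stage design described above can be sketched by chaining the two methods: a simple random sample of clusters, then a simple random sample within each chosen cluster. Cluster names, cluster sizes and the per-stage sample sizes are arbitrary assumptions.

```python
import random

# Hypothetical population: 20 clusters (e.g. schools) of 30 members each
clusters = {f"school_{c}": [f"s{c}_{i}" for i in range(30)] for c in range(20)}
rng = random.Random(11)

# Stage 1: cluster sampling, choose 4 clusters at random
stage1 = rng.sample(sorted(clusters), 4)

# Stage 2: simple random sampling of 5 members within each chosen cluster
final_sample = {name: rng.sample(clusters[name], 5) for name in stage1}

for name, members in final_sample.items():
    print(name, members)
```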

• Systematic random sampling. With systematic random sampling, we
create a list of every member of the population. From the list, we
randomly select the first sample element from the first k elements on the
population list, where k is the sampling interval. Thereafter, we select
every kth element on the list.

This method is different from simple random sampling because not every
possible sample of n elements is equally likely.
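A sketch of systematic random sampling, assuming the sampling interval k is taken as N // n (a common choice, not the only one). `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first k elements, then every kth element.
    k = N // n is the sampling interval (assumes 1 <= n <= N)."""
    k = len(population) // n
    start = rng.randrange(k)  # random index in 0..k-1
    return population[start::k][:n]

population = list(range(1, 101))  # hypothetical list of N = 100 members
rng = random.Random(2)
sample = systematic_sample(population, n=10, rng=rng)
print(sample)
```

With N = 100 and n = 10 the start can only be one of the first 10 positions, so there are only 10 possible samples, whereas simple random sampling could produce any of the C(100, 10) ≈ 1.7 × 10^13 subsets of size 10. That is why the two methods differ.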

RE-SAMPLING

• Resampling is a method that consists of drawing repeated samples from
the original data sample. Resampling is generally a nonparametric method
of statistical inference. In other words, it does not rely on standard
distribution tables (for example, normal distribution tables) to compute
approximate p-values.
• Resampling (as in bootstrapping) involves selecting cases at random, with
replacement, from the original data sample, so that each resample contains
the same number of cases as the original sample. Because of replacement, a
resample usually contains some cases more than once and omits others.
• Bootstrapping is one of the best-known resampling methods; because it
approximates a sampling distribution through repeated random draws, it is a
form of Monte Carlo estimation.
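A minimal bootstrap sketch: resample the original data with replacement, keeping each resample the same size as the original sample, record the mean of each resample, and read a 95% interval off the percentiles of those means. The data values are made up, and the simple percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics

# Hypothetical original sample (illustrative values only)
data = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.9, 12.5, 9.5]

def bootstrap_means(data, num_resamples, rng):
    """Draw resamples of the same size as `data`, with replacement,
    and return the mean of each resample."""
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(num_resamples)]

rng = random.Random(0)
boot = sorted(bootstrap_means(data, 5000, rng))

# Percentile 95% interval: roughly the 2.5th and 97.5th percentiles of the bootstrap means
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]
print(f"sample mean {statistics.mean(data):.2f}, 95% bootstrap CI ({lower:.2f}, {upper:.2f})")
```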

STATISTICAL INFERENCE

• The general idea that underlies statistical inference is the comparison of
particular statistics from an observational data set (e.g. the mean, the
standard deviation, or the differences among the means of subsets of the
data) with an appropriate reference distribution, in order to judge the
significance of those statistics.
• When the required assumptions are met, and specific hypotheses about the
values those statistics should take have been specified in advance,
statistical inference can be a powerful approach for drawing scientific
conclusions, making efficient use of existing data or of data collected
specifically to test those hypotheses.
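The sketch below applies this idea with a permutation test, itself a resampling method: the statistic is the difference between two group means, and the reference distribution is built by repeatedly shuffling the group labels, which models the hypothesis that the labels do not matter. The two groups of measurements are hypothetical, and the permutation test is an illustration chosen here, not a method prescribed by these notes.

```python
import random
import statistics

# Hypothetical measurements from two groups (illustrative values only)
group_a = [23.1, 25.4, 24.8, 26.0, 24.2, 25.9]
group_b = [21.7, 22.9, 23.5, 22.1, 24.0, 22.6]

observed = statistics.mean(group_a) - statistics.mean(group_b)

# Reference distribution under the hypothesis "group labels do not matter":
# shuffle the pooled data, re-split it, and recompute the difference in means.
rng = random.Random(0)
pooled = group_a + group_b
n_a = len(group_a)
reference = []
for _ in range(5_000):
    rng.shuffle(pooled)
    reference.append(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))

# Two-sided p-value: share of reference differences at least as extreme as observed
p_value = sum(abs(d) >= abs(observed) for d in reference) / len(reference)
print(f"observed difference {observed:.2f}, p ≈ {p_value:.4f}")
```

A small p-value means the observed difference would be unusual if the group labels were irrelevant.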
