Complete Basic Stats

These notes present information about basic statistics.

Uploaded by

Vinut P Maradur

Sourav Kapil
Data Scientist
4st Jan, 2023

Statistics Notes for Data Science

Basic Stats

Statistics

Statistics is the science of collecting, organising and analysing data, with the aim of better decision making. We analyse and interpret data based on how it is represented, for example in pie charts, bar graphs or tables.

* Data - Facts or pieces of information that can be measured.
  For example: Age of students of a class - Data : {25, 21, 22, 20, 23}

Types of Statistics

Statistics is divided into two types:
* Descriptive Statistics
* Inferential Statistics

1. Descriptive Statistics - Descriptive statistics is a means of describing features of a dataset by generating summaries about the data samples. There are 4 major types of descriptive stats -
> Measure of Frequency (Count, Percent, Frequency)
> Measure of Central Tendency (Mean, Median, Mode)
> Measure of Dispersion and Variation (Range, Variance, Standard Deviation)
> Measure of Position (Percentile/Quartile Ranks)

* The distribution concerns the frequency of each value.
* The central tendency concerns the averages of the values.
* The variability or dispersion concerns how spread out the values are.

[Figure: Mean, Median, Mode; Range, Standard deviation, Variance, Interquartile range. Source: Scribbr]

2. Inferential Statistics - Inferential statistics is a way of making inferences about a population based on samples. It is a technique where we use the data that we have measured to form conclusions. There are 3 major types of inferential stats -
> Confidence Interval
> Hypothesis Testing
> Regression Analysis

Population Vs Sample

A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population.
* Samples are used to make inferences about populations.
* Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.
* In research, a population doesn't always refer to people.

[Figure: Population and Sample]

Population Mean And Sample Mean

* The sample mean is the mean calculated from a group of random variables drawn from the population. Compared to the population, the sample size is small. The sample size is represented by 'n'. The sample mean is calculated as under :-

  Sample Mean: $\bar{x} = \frac{\sum a_i}{n}$
  where
  n = Size of the sample
  $\sum$ = Add up
  $a_i$ = All the observations

* The population mean represents the actual mean of the whole population. The population size is large, and it is denoted by 'N'. The population mean is calculated as under :-

  Population Mean: $\mu = \frac{\sum a_i}{N}$
  where
  N = Size of the population
  $\sum$ = Add up
  $a_i$ = All the observations

Sampling

Sampling means selecting a group of data as a sample from the entire data for your research. It allows you to test hypotheses about the characteristics of a population.

Reasons for sampling -
> Necessity: Sometimes it's simply not possible to study the whole population due to its size or inaccessibility.
> Practicality: It's easier and more efficient to collect data from a sample.
> Cost-effectiveness: There are fewer participant, laboratory, equipment, and researcher costs involved.
> Manageability: Storing and running statistical analyses on smaller datasets is easier and more reliable.

Sampling Techniques

When you conduct research about a group of people, it's rarely possible to collect data from every person in that group. Instead, you select a sample. The sample is the group of individuals who will actually participate in the research. To draw valid conclusions from your results, you have to carefully decide how you will select a sample that is representative of the group as a whole. This is called a sampling method.
There are two primary types of sampling methods that you can use in your research:
* Probability Sampling involves random selection, allowing you to make strong statistical inferences about the whole group.
* Non-probability Sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect data.

Notes :- Sampling Frame
The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should include the entire target population (and nobody who is not part of that population).

Example: Sampling Frame
You are doing research on working conditions at a social media marketing company. Your population is all 1000 employees of the company. Your sampling frame is the company's HR database, which lists the names and contact details of every employee.

Probability Sampling

There are four main types of probability samples -
> Simple Random Sampling
> Systematic Sampling
> Stratified Sampling
> Cluster Sampling

[Figure: Simple random sample, Systematic sample, Stratified sample. Source: Scribbr]

1. Simple Random Sampling
* In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population.
* To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

Example: You want to select a simple random sample of 100 of the 1000 employees of a social media marketing company. You assign a number to every employee in the company database from 1 to 1000, and use a random number generator to select 100 numbers.

[Figure: Simple random sampling]

2. Systematic Random Sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.
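The two selection procedures described so far can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names, the fixed seed, and the interval of 10 are assumptions made for the sketch, and the frame is simply the employee numbers 1 to 1000 from the sampling-frame example.

```python
import random


def simple_random_sample(frame, k, seed=None):
    """Pick k members so that every member has an equal chance of selection."""
    rng = random.Random(seed)      # seed is only here to make the sketch repeatable
    return rng.sample(frame, k)    # sampling without replacement


def systematic_sample(frame, step, seed=None):
    """Pick a random starting point in the first `step` members, then every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)    # index 0 .. step-1
    return frame[start::step]


# Hypothetical sampling frame: employee numbers 1..1000
employees = list(range(1, 1001))

srs = simple_random_sample(employees, 100, seed=42)
systematic = systematic_sample(employees, 10, seed=42)

print(len(srs), len(systematic))
```

With 1000 listed members and an interval of 10, the systematic version always returns 100 members, whichever starting point is drawn.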
Example: All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.

[Figure: Systematic sampling]

3. Stratified Sampling
* Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.
* To use this sampling method, you divide the population into subgroups (called strata) based on the relevant characteristics (e.g., gender identity, age range, income bracket, job role).
* Based on the overall proportions of the population, you calculate how many people should be sampled from each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.

Example: The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.

[Figure: Stratified sampling]

4. Cluster Sampling
* Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups. If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above.
This is called multistage sampling.
* This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It's difficult to guarantee that the sampled clusters are really representative of the whole population.

Example: The company has offices in 10 cities across the country (all with roughly the same number of employees in similar roles). You don't have the capacity to travel to every office to collect your data, so you use random sampling to select 3 offices; these are your clusters.

[Figure: Cluster sampling]

Non-Probability Sampling

There are also 4 types of non-probability sampling -
> Convenience Sampling
> Purposive Sampling
> Snowball Sampling
> Quota Sampling

[Figure: Convenience sample, Purposive sample, Snowball sample, Quota sample. Source: Scribbr]

1. Convenience Sampling
* A convenience sample simply includes the individuals who happen to be most accessible to the researcher. This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the population, so it can't produce generalizable results. Convenience samples are at risk for both sampling bias and selection bias.

Example: You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.

[Figure: Convenience sampling]

Variable

* In statistical research, a variable is defined as an attribute of an object of study.
* A variable is a characteristic that can be measured and that can assume different values. Height, age, income, grades obtained at school and type of housing are all examples of variables.
Variables may be classified into two main categories -
> Quantitative/Numerical represents amounts
> Qualitative/Categorical represents groupings

[Figure: Variable, split into Quantitative (Continuous, Discrete) and Categorical (Ordinal, Nominal, Binary)]

1. Quantitative Variable
A variable that contains quantitative data is a quantitative variable. When you collect quantitative data, the numbers you record represent real amounts that can be added, subtracted, divided, etc. There are two types of quantitative variables:
> Discrete Variables
> Continuous Variables

Discrete vs Continuous

Type of variable                          What does the data represent?                      Examples
Discrete variables (Integer variables)    Counts of individual items or values.              Number of students in a class; Number of different tree species in a forest
Continuous variables (Ratio variables)    Measurements of continuous or non-finite values.   Distance; Volume; Age

2. Qualitative Variable
Qualitative or Categorical variables represent groupings of some kind. They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things. There are three types of categorical variables -
> Binary Variables
> Nominal Variables
> Ordinal Variables

Binary vs nominal vs ordinal variables

Type of variable     What does the data represent?                   Examples
Binary variables     Yes or no outcomes.                             Heads/tails in a coin flip; Win/lose in a football game
Nominal variables    Groups with no rank or order between them.      Species names; Colours; Brands
Ordinal variables    Groups that are ranked in a specific order.     Finishing place in a race; Employee Designation

Frequency Distribution And Cumulative Frequency

In a frequency distribution, the sum of all the frequencies is equal to the total number of observations. In the cumulative frequency distribution, the last cumulative frequency is the same as the total number of observations.
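The link between the two columns can be checked with a short Python sketch. The six frequencies are those of the school-grade table in this section; the variable names and the use of itertools.accumulate are choices made for the illustration.

```python
from itertools import accumulate

# Frequency of students per school grade (grades 1 to 6)
frequencies = {1: 23, 2: 20, 3: 15, 4: 12, 5: 10, 6: 8}

# Running total of the frequencies, paired back with each grade
cumulative = dict(zip(frequencies, accumulate(frequencies.values())))

total_observations = sum(frequencies.values())

for grade, freq in frequencies.items():
    print(grade, freq, cumulative[grade])

# The last cumulative frequency equals the total number of observations
print(cumulative[6] == total_observations)
```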
School Grade    Frequency of students    Cumulative Frequency
1               23                       23
2               20                       23 + 20 = 43
3               15                       43 + 15 = 58
4               12                       58 + 12 = 70
5               10                       70 + 10 = 80
6               8                        80 + 8 = 88

What is a Histogram?

A histogram is a graphical representation of data points organized into user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins.

[Figure: Histogram of No. of students against Marks obtained]

Descriptive Statistics

1. Measure of Frequency distribution
A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

For the variable of gender, you list all possible answers in the left-hand column. You count the number or percentage of responses for each answer and display it in the right-hand column.

Gender    Number
Male      182
Female    235
Other     [illegible]

From this table, you can see that more women than men or people with another gender identity took part in the study.

2. Measure of Central Tendency
Measures of central tendency estimate the centre or average of a data set. The mean, median and mode are 3 ways of finding the average.

[Figure: (a) Negatively skewed, (b) Normal (no skew), (c) Positively skewed distributions, with the positions of the mean, median and mode]

* The Mean, or M, is the most commonly used method for finding the average. To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.
Mean number of library visits
Data set                    15, 3, 12, 0, 24, 3
Sum of all values           15 + 3 + 12 + 0 + 24 + 3 = 57
Total number of responses   N = 6
Mean                        Divide the sum of values by N to find M: 57/6 = 9.5

* The Median is the value that's exactly in the middle of a data set. To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

Median number of library visits
Ordered data set    0, 3, 3, 12, 15, 24
Middle numbers      3, 12
Median              Find the mean of the two middle numbers: (3 + 12)/2 = 7.5

* The Mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode. To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Mode number of library visits
Ordered data set    0, 3, 3, 12, 15, 24
Mode                Find the most frequently occurring response: 3

3. Measure of Variability or Dispersion - Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

* Range
The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value.

Range of visits to the library in the past year -
Ordered data set: 0, 3, 3, 12, 15, 24
Range: 24 - 0 = 24

* Standard deviation
The standard deviation (s or SD) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:
1. List each score and find their mean.
2. Subtract the mean from each score to get the deviation from the mean.
3. Square each of these deviations.
4. Add up all of the squared deviations.
5. Divide the sum of the squared deviations by N - 1 (for a sample of N scores).
6. Find the square root of the number you found.

Standard deviation of visits to the library in the past year -

Raw data    Deviation from mean    Squared deviation
15          15 - 9.5 = 5.5         30.25
3           3 - 9.5 = -6.5         42.25
12          12 - 9.5 = 2.5         6.25
0           0 - 9.5 = -9.5         90.25
24          24 - 9.5 = 14.5        210.25
3           3 - 9.5 = -6.5         42.25
M = 9.5     Sum = 0                Sum of squares = 421.5

Step 5: 421.5/5 = 84.3
Step 6: √84.3 = 9.18

From learning that s = 9.18, you can say that on average, each score deviates from the mean by 9.18 points.

* Variance
The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean. To find the variance, simply square the standard deviation. The symbol for variance is s².

Population variance: $\sigma^2 = \frac{\sum (a_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (a_i - \bar{x})^2}{n - 1}$
where $\mu$ is the population mean, $\bar{x}$ is the sample mean, and $a_i$ are the observations. The standard deviation is the square root of the variance: $\sigma$ for a population, s for a sample.

Variance of visits to the library in the past year -
Data set: 15, 3, 12, 0, 24, 3
s = 9.18
s² = 84.3

* Z-Score

Difference between univariate, bivariate and multivariate descriptive statistics -
> Univariate statistics summarize only one variable at a time.
> Bivariate statistics compare two variables.
> Multivariate statistics compare more than two variables.
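The central tendency and dispersion figures worked out by hand for the library-visits data set can be cross-checked with Python's standard statistics module. This is only a sketch for verification; the variable names are assumptions. statistics.variance and statistics.stdev divide by n - 1 (the sample formulas), while statistics.pvariance divides by N.

```python
import statistics

# Library visits data set from the notes
visits = [15, 3, 12, 0, 24, 3]

mean = statistics.mean(visits)                # sum of values / number of values
median = statistics.median(visits)            # mean of the two middle values here
mode = statistics.mode(visits)                # most frequent value
value_range = max(visits) - min(visits)       # highest minus lowest

sample_variance = statistics.variance(visits)        # sum of squares / (n - 1)
sample_sd = statistics.stdev(visits)                 # square root of the sample variance
population_variance = statistics.pvariance(visits)   # sum of squares / N

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", value_range)
print("sample variance:", round(sample_variance, 2))
print("sample standard deviation:", round(sample_sd, 2))
print("population variance:", round(population_variance, 2))
```

The sample figures match the hand calculation (s² = 84.3, s = 9.18). The population variance is smaller because the same sum of squares, 421.5, is divided by 6 instead of 5.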
