MMW (Data Management) - Part 1

The document discusses key concepts in data management and statistics including descriptive statistics, inferential statistics, populations and samples, parameters and statistics, variables, data types, levels of measurement, measures of central tendency, and measures of dispersion. Examples are provided to illustrate computing measures such as mean, median, mode, range, variance, and standard deviation.

MATHEMATICS OF THE MODERN WORLD
Data Management
Luzviminda T. Orilla, PhD
A Mathematical Tool: Data Management
What is statistics?

• Statistics is a branch of applied mathematics concerned
with collecting, organizing, and interpreting data. It
attempts to infer the properties of a large collection of
data from inspection of a sample of the collection,
thereby allowing educated guesses to be made with a
minimum of expense.
MAIN BRANCHES OF STATISTICS

• Descriptive Statistics refers to the
collection, presentation, and summary of
data (either using charts and graphs or
using a numerical summary).
• Inferential Statistics refers to generalizing
from a sample to a population, estimating
unknown population parameters, drawing
conclusions, and making decisions.
Population and Sample
Population refers to all the items (finite or infinite) that we are
interested in. It consists of the totality of the observations,
individuals, or objects in which the investigator or researcher is
interested.
Sample is a subset or portion of the population. It involves
looking only at some items selected from a population.
Situations where a sample may be preferred:
1. Infinite Population
2. Destructive Testing
3. Timely Results
4. Accuracy
5. Cost
6. Sensitive Information
Situations where a population may be preferred:
1. Small Population
2. Large Sample Size
3. Database Exists
4. Legal Requirements
PARAMETER and STATISTIC

• Parameter – a value calculated using all the
data from a population.
• Statistic – a value calculated using the data
from a sample.
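The distinction can be sketched in Python. The population below (ages of all ten members of a small club) is hypothetical and made up only for illustration; the point is that the parameter uses every value, while the statistic uses only the sampled values.

```python
import random
import statistics

# Hypothetical population (illustrative only): ages of all 10 club members.
population = [21, 25, 30, 34, 38, 41, 45, 52, 58, 66]

# Parameter: computed from every value in the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample, i.e. a subset of the population.
random.seed(0)  # fixed seed so the same sample is drawn on every run
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

A different sample would generally give a different statistic, while the parameter stays fixed for a given population.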
What is a variable?
• A VARIABLE is a characteristic of interest about an object under
investigation that can take on different possible outcomes, such as
age, hair color, height, weight, and religious preference.
• Two kinds of Variables
QUALITATIVE VARIABLES – These are variables that can be placed into
distinct categories, according to some characteristics or attributes.
QUANTITATIVE VARIABLES – These are numerical and can be ordered
or ranked. They are of two types: Discrete and Continuous.
• Discrete variables are obtained by counting (e.g., frequencies).
• Continuous variables are obtained by measurement.
DATA

• Data is a set of values of a variable collected from
each of the subjects that belong to the sample. It refers
to a collection of natural phenomena descriptors such as
results from experiences, observations or experiments, or
a set of premises. It may consist of numbers, words, or
images.
• Data can be classified according to the type of variable from
which it was drawn. There are two general types of data
according to how the data vary across cases:
TYPES OF DATA
• Categorical data have values that are described by words rather
than numbers. They are of limited statistical use. On occasion, the values
of these variables might be represented using numbers. This is called
coding. Example: 1 = cash; 2 = check; 3 = credit/debit; 4 = gift card
• Coding a category as a number does not make the data numerical, and
the numbers do not typically imply a rank. Example: 1 = Bachelor’s;
2 = Master’s; 3 = Doctorate
• Numerical data arise from counting, measuring something, or some
kind of mathematical operation. Example: number of insurance
claims; sales for the last quarter; accounting data; economic
indicators; financial ratios.
• Two types of numerical data: discrete (distinct numbers or integers)
and continuous (any value within an interval).
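The point about coding can be sketched as follows. The code-to-label mapping is the payment example above; the list of coded responses is hypothetical and made up for illustration.

```python
from collections import Counter

# Codes from the payment example: the numbers are labels, not quantities.
PAYMENT_CODES = {1: "cash", 2: "check", 3: "credit/debit", 4: "gift card"}

# Hypothetical coded responses (illustrative only).
coded_payments = [1, 3, 3, 2, 1, 4, 3]

# Decoding and counting are meaningful operations on categorical data.
labels = [PAYMENT_CODES[code] for code in coded_payments]
counts = Counter(labels)
print(counts)

# Arithmetic on the codes runs without error but has no interpretation:
# there is no such thing as an "average payment type".
meaningless_average = sum(coded_payments) / len(coded_payments)
print(meaningless_average)
```

Keeping the mapping in one dictionary makes it easy to convert the codes back to words before summarizing.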
LEVELS OF MEASUREMENT

NOMINAL LEVEL
*From the Latin nomen, meaning “name”; the weakest level of
measurement.
*It merely identifies a category.
*Nominal data are the same as “qualitative”, “categorical”, or “classification” data.
*These data may be coded numerically. The codes are arbitrary
placeholders with no numerical meaning.
*With these data, the only permissible mathematical operation is
counting (e.g., frequencies).
ORDINAL LEVEL
*Ordinal data codes connote a ranking of data values.
*Ordinal data can be treated as nominal, but not vice versa.
*There is no clear meaning to the distance between 1 and 2,
between 2 and 3, or between 3 and 4 (for example, the distance
between “rarely” and “never” has no clear meaning).
INTERVAL LEVEL
*Interval data are ranked data with meaningful intervals between scale
points.
*Since intervals between numbers represent distances,
mathematical operations such as taking the “average” can be done.
*The absence of a true zero is a key characteristic of interval data.
RATIO LEVEL
*It has all the properties of the other three data types and is
considered the strongest level of measurement.
*It possesses a meaningful zero that represents the absence of the
quantity being measured.
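A small sketch of why a meaningful zero matters. Temperature is not an example from the slides; it is used here as an assumed illustration, with Celsius as an interval scale (its zero is arbitrary) and kelvin as a ratio scale (its zero means no thermal energy).

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0  # two temperatures in degrees Celsius

# Differences are meaningful on an interval scale: both scales agree.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are only meaningful on a ratio scale: 20 °C is not "twice as hot"
# as 10 °C, as the ratio computed in kelvins shows.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print("Difference:", diff_c, "°C vs", round(diff_k, 2), "K")
print("Ratio on Celsius scale:", ratio_c)
print("Ratio on kelvin scale:", round(ratio_k, 4))
```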
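Data: (male, female, male, male, female)

Table 1: Respondents in terms of sex, n = 5

Sex      Frequency   %
Male     3           60
Female   2           40
TOTAL    5           100

Relative frequency (%): R = (f / N) × 100%, where f is the frequency
of a category and N is the total number of observations.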
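The table can be reproduced with a short Python sketch that applies the relative-frequency formula to the five observations above. The variable names are illustrative.

```python
from collections import Counter

data = ["male", "female", "male", "male", "female"]
n = len(data)

# Frequency f of each category.
freq = Counter(data)

# Relative frequency in percent: R = (f / N) * 100.
percent = {category: count * 100 / n for category, count in freq.items()}

# Print the rows of the frequency table.
for category, count in freq.items():
    print(f"{category.capitalize():<8}{count:>3}{percent[category]:>6.0f}")
print(f"{'TOTAL':<8}{n:>3}{sum(percent.values()):>6.0f}")
```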
MEASURES OF CENTRAL TENDENCY
Types of Measures for Center
• Once the data are collected, it is useful to summarize the
data set by identifying a value around which the data are
centered.
• Mode – is the most frequently occurring number in a
data set.
• Median – is the middle number or the mean of the
two middle numbers in an ordered set of data.
• Mean – is the numerical balancing point of the data
set.
• The mean is easy to compute. You only deal with one
number. It is not so with the median.
• The mean is affected by outliers while the median is
resistant. In a sense, the median is able to resist the
pull of a faraway value, but the mean is drawn to such
values.
• A change in any of the numbers changes the mean,
and the mean can be changed drastically by changing
an extreme value.
• In contrast, the median and the mode of a set of data
are usually not changed by changing an extreme value.
• The mean, the median, and the mode are all averages;
however, they are generally not equal.
Example
• Which measure of center is most useful?
• A teacher wants to know about her students' family
situation. She asks for the number of children in their
families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
• A shoe manufacturer wants to know the average shoe size
of women.
• Another teacher wants to know how well her class
performed in a long test.
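For the first situation, the three measures can be computed directly from the listed data. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Number of children in each student's family (data from the example).
children = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

mean = statistics.mean(children)      # 39 / 14, about 2.79
median = statistics.median(children)  # mean of the two middle values, 2 and 3
mode = statistics.mode(children)      # 2 occurs five times

print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode:", mode)
```

With 14 values, the median is the mean of the 7th and 8th values in the ordered list, which is why it need not be one of the observed numbers.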
Compare the mean, the median, and the mode for the salaries of
5 employees of a small company.

Salaries: P370,000 P60,000 P36,000 P20,000 P20,000

Mean = P101,200
Median = P 36,000
Mode = P 20,000

Most of the employees of this company would probably
agree that the median of P36,000 better represents the average of
the salaries than does either the mean or the mode.
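The salary comparison can be checked with a short sketch. The second half changes the extreme salary to a hypothetical P100,000 (a value not in the original example) to show how the mean responds while the median and mode do not.

```python
import statistics

# Salaries of the 5 employees, in pesos (data from the example).
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

print("Mean:", statistics.mean(salaries))      # 101200
print("Median:", statistics.median(salaries))  # 36000
print("Mode:", statistics.mode(salaries))      # 20000

# Hypothetical change: the extreme salary becomes P100,000.
changed = [100_000, 60_000, 36_000, 20_000, 20_000]

print("Mean after change:", statistics.mean(changed))      # 47200
print("Median after change:", statistics.median(changed))  # 36000
print("Mode after change:", statistics.mode(changed))      # 20000
```

Only the mean moves, from P101,200 to P47,200, which matches the earlier remark that the mean is drawn toward extreme values.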
Measures of Dispersion
Types of Measures of Dispersion or Variability
Another important feature that can help us understand more about
a data set is the manner in which the data are distributed.
• Range is the difference between the largest value (maximum) and
the smallest value (minimum) in the data.
• Standard deviation is an extremely important measure of spread
that is based on the mean. It measures the typical deviation
of the data points from the mean.
• Variance is the square of the standard deviation of the data. Because
it is expressed in squared units, it does not use the same unit of
measure as the original data.
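A minimal sketch of the three measures, using the number-of-children data from the earlier example. The slides do not state whether the sample or the population formula is intended, so both are shown; treating the data as a sample is an assumption.

```python
import statistics

# Number-of-children data from the earlier example.
data = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Sample versions (divide by n - 1), assuming the data are a sample.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population versions (divide by n), if the data are the whole population.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

print("Range:", data_range)
print("Sample variance:", round(sample_variance, 4))
print("Sample standard deviation:", round(sample_sd, 4))
print("Population variance:", round(population_variance, 4))
print("Population standard deviation:", round(population_sd, 4))
```

In both cases the variance is the square of the corresponding standard deviation, so the standard deviation is the one expressed in the original unit (here, children).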
Illustration in Computing
• EXAMPLE: A consumer group has tested a sample of 8
size-AAA batteries from each of 3 companies. The
results of the tests are shown in the following table.
According to these tests, which company produces
batteries for which the values representing hours of
constant use have the smallest standard deviation?

Ans. Company Dependable

Computing using Excel, the standard deviations for the three companies are:
s = 1.3277
s = 0.7191
s = 0.8767
The smallest value, s = 0.7191, corresponds to Company Dependable.
