Scientific Data
Dr Anand Prakash
Department of Zoology
SVP College, Bhabua
What is data
Data is a collection of measurements or observations in the form of numbers, text, sound, images, behaviors,
test results or any other format.
On the basis of its nature, data can be qualitative or quantitative:
Qualitative data refers to information about qualities, that is, information that cannot be measured numerically. It is usually
descriptive and textual. Examples: eye color or the type of car. In surveys, it is often used to categorize ‘yes’ or ‘no’ answers.
Quantitative data is numerical. It describes information that can be counted or measured. Examples include distance, speed,
height, length and weight.
On the basis of its source, data can be primary or secondary:
Primary data: Data obtained directly from an event or experiment.
Secondary data: Data obtained from other sources such as newspapers, the internet, the meteorological department, the
pollution department, etc.
What is Information
Data that has been processed, organized, or structured so that it yields a meaningful, valuable and useful conclusion is
called information. Information gives knowledge, understanding and insights that can be used for decision-making,
problem-solving, communication and various other purposes.
Levels of Measurements
• To measure or observe data, we need an object that can measure or quantify an event
• This object or device is called a scale
In total, four different scales are used to measure data:
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale
1. Nominal scale
• Nominal scales are also called “names” or labels.
• Nominal scale variables are used for labeling, classifying, categorizing or discriminating, without any quantitative value.
• Data from nominal scales are mutually exclusive (no overlap) and non-numeric (qualitative); none of them has
any numerical significance.
• There is no order or ranking between the categories
Example: Please list the type of blood group
1-A 2-AB 3-B
In this particular example, the numbers are simply used as tags and have no value.
Examples of Nominal Scale data
• Gender: Male or female
• Race/caste/religion: The race/caste/religion of a person
• Blood type: The blood type of a person
• Eye color: The color of a person's eyes
• Marital status: Single, married, or divorced
• Type of car: Sedan, SUV, or truck
• Mode of transportation: Bus, train, car, bike, or walking
• Continent: North America, South America, Asia, Europe, Africa, or Australia
• Behavioral pattern: Extroverted/Introverted
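A minimal Python sketch of the point that nominal codes are only tags. The code-to-label mapping follows the blood group example; the list of responses is hypothetical and made up purely for illustration.

```python
from collections import Counter

# Codes from the blood group example: the numbers are tags, not quantities.
blood_group_codes = {1: "A", 2: "AB", 3: "B"}

# Hypothetical survey responses (illustrative only, not real data).
responses = [1, 3, 1, 2, 1, 3]

# Counting how often each category occurs is meaningful for nominal data.
counts = Counter(blood_group_codes[code] for code in responses)

# The most frequent category (the mode) is a sensible summary here;
# averaging the codes 1, 2, 3 would have no meaning.
most_common_group, most_common_count = counts.most_common(1)[0]

print(counts)
print(most_common_group, most_common_count)
```

Replacing the codes with any other distinct tags (for example 10, 20, 30) would leave the counts unchanged, which is what "no numerical significance" means in practice.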
2. Ordinal Scale
• The ordinal scale is the 2nd level of measurement. It reports the ranking and ordering of the data without establishing the
distance between the values, so it cannot answer “how much” two categories differ.
• It is used for qualitative data
• Ordinal scale data include ratings of opinions or perceptions, or demographic factors that are categorized into
levels or brackets
• Ordinal variables are useful in social science research
• Ordinal data/variables are collected using closed-ended survey questions, opinion polls and surveys to compare data between
participants.
Example of Ordinal scale
• Out of the five mentioned laptop brands, rate the order of preference:
1. HP 2. Apple 3. Lenovo 4. Dell 5. Acer
You cannot answer “how much” the two categories differ.
Other examples:
• Olympic medal positions
• Level of pain on a pain scale
• Siblings ranked by age
• Ranking in a class/company
• Customer satisfaction
• Socio-economic background
• Frequency of occurrence
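A short Python sketch of working with ordinal data, assuming a hypothetical five-level customer satisfaction question. The level names and the responses are invented for illustration; only the idea (order is meaningful, distance is not) comes from the slides.

```python
# Ordered levels of a hypothetical customer-satisfaction question.
levels = ["Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied"]

# Rank each label by its position; the rank encodes order only, not distance.
rank = {label: position for position, label in enumerate(levels, start=1)}

# Hypothetical responses (illustrative only).
responses = ["Satisfied", "Neutral", "Very satisfied", "Unsatisfied", "Satisfied"]

# Sorting by rank is valid because the categories have an order.
ordered = sorted(responses, key=rank.get)

# With an odd number of responses, the median is the middle response.
median_response = ordered[len(ordered) // 2]

print(ordered)
print(median_response)
```

The median is used rather than the mean because subtracting or averaging ranks would assume equal gaps between levels, which an ordinal scale does not guarantee.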
3. Interval Scale
• An interval scale is a type of scale where the distance between any two points can be measured. It requires a
reference (zero) point, which is arbitrary, and a unit of measurement.
• An example is the Celsius temperature scale, which is referenced to the melting point of ice; each temperature
measurement lies a specific number of degrees above or below this reference point.
• The interval scale is quantitative as it can quantify the difference between the values
• It allows calculating the mean and median of the variables
• To understand the difference between two values, you can subtract one from the other
• The interval scale is often preferred in statistics because it allows numerical values to be assigned to arbitrary
assessments such as feelings, calendar types, etc.
4. Ratio Scale
• The ratio scale is the 4th level of measurement and possesses a true zero point (point of origin).
This is a unique feature of this scale.
• The ratio scale conveys order, interval and values, and its true zero is
essential for calculating ratios
• Examples:
• Temperature in Kelvin
• Height in feet and inches
• Distance in miles or kilometers
• Age in years
• Price of goods in dollars
Contrast with the interval scale: if the temperature outside is 0 degrees Celsius, the 0 does not mean
an absence of temperature; it is simply a value on the scale.
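The difference between an interval scale (Celsius) and a ratio scale (Kelvin) can be sketched in Python as follows. The two temperatures are hypothetical values chosen for illustration, and `celsius_to_kelvin` is a helper defined only for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful where zero is a true zero (Kelvin).
ratio_c = t2_c / t1_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # only slightly above 1

print(diff_c, diff_k)
print(ratio_c, ratio_k)
```

The difference comes out the same on both scales, while the ratio changes once the zero point moves, which is why ratios are reserved for scales with a true zero.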
Discrete and continuous data (Variables)
Discrete Variable: The data or variables we observe take countable values such as 1, 2, 3, 4, 5, ...
For example: number of houses in Bhabua, number of students in colleges, number of bulbs produced by a company.
Continuous Variable: The data we observe can take any real value (including fractions). Infinitely many
values are possible between two limits, e.g. 1.1, 1.21, 1.22, etc.
Examples: height of a human or plant, size of planets and moons in our galaxy.
Binary Data: Binary data takes only two possible values. For example, lamp is on or lamp is off, answer is true
or false, 0 or 1, yes or no, heads or tails, male or female birth, black or white, etc.
Sample and sampling
Sample
• Suppose you wish to conduct research on data with large numbers,
such as the production of electric wires, scales or thermometers, or on a large
population (consider the population of Kaimur district).
• It is rarely possible to collect data from every item/person in that
group.
• So we take a few items or individuals from the group; this subset is called
a sample.
• The sample is the group of individuals who will actually participate in
the research/study.
Imagine your doctor taking your blood sample to check your health status.
Sampling
• To draw valid conclusions from your results, you have to carefully decide
how you will select a sample that is representative of the group as a whole.
This selection process is called sampling.
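As an illustration of drawing a sample, the sketch below uses simple random sampling, which is one common method and not necessarily the one covered in these slides. The population of 1000 numbered individuals, the sample size of 50 and the seed are all hypothetical choices.

```python
import random

# Hypothetical population: 1000 individuals identified by ID numbers.
population = list(range(1, 1001))

# A fixed seed makes the draw reproducible; the value 42 is arbitrary.
rng = random.Random(42)

# Simple random sampling without replacement: every individual has the
# same chance of being selected and nobody is selected twice.
sample = rng.sample(population, k=50)

print(len(sample))
print(sorted(sample)[:5])
```

Only the 50 selected individuals would take part in the study; conclusions about the whole population rest on the sample being representative.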
Sampling Methods
Statistics and numbers lack value without skilled analysis.
Raw and Arrayed data
Raw Data: Data presented exactly as they were collected are called raw data.
Arrayed Data: Data arranged in ascending or descending order are called arrayed data.
Simple frequency table
Example: Marks obtained by 50 students in a test of 100 marks.
Marks obtained by the 50 students:
80, 70, 90, 20, 20, 45, 50, 65, 30, 50, 70, 20, 4, 90, 49, 40, 45, 30, 30, 50,
20, 80, 39, 30, 50, 50, 70, 70, 20, 40, 90, 30, 40, 50, 65, 45, 70, 79, 20,
4, 30, 50, 20, 45, 50, 45, 90, 30, 4, 50
Data arranged in ascending order:
4, 4, 4, 20, 20, 20, 20, 20, 20, 20, 30, 30, 30, 30, 30, 30, 30, 39, 40, 40,
40, 45, 45, 45, 45, 45, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 65, 65, 70,
70, 70, 70, 70, 79, 80, 80, 90, 90, 90, 90.
Marks Obtained    Number of Students
4                 3
20                7
30                7
39                1
40                3
45                5
49                1
50                9
65                2
70                5
79                1
80                2
90                4
Total             50
The data displayed in the given table can be arranged in a shorter form by
grouping the whole collection; the result is called a distribution table.
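The steps from raw data to arrayed data to a simple frequency table can be sketched in Python as below. The marks are the 50 values of the example; the third value is taken as 90, which is what the arrayed list and the frequency table (four students with 90, lowest mark 4) imply.

```python
from collections import Counter

# Raw data: marks of the 50 students in the order they were collected.
# The third value is taken as 90, consistent with the arrayed list.
marks = [
    80, 70, 90, 20, 20, 45, 50, 65, 30, 50, 70, 20, 4, 90, 49, 40, 45, 30, 30, 50,
    20, 80, 39, 30, 50, 50, 70, 70, 20, 40, 90, 30, 40, 50, 65, 45, 70, 79, 20,
    4, 30, 50, 20, 45, 50, 45, 90, 30, 4, 50,
]

# Arrayed data: the same values in ascending order.
arrayed = sorted(marks)

# Simple frequency table: how many students obtained each mark.
frequency = Counter(arrayed)

print("Marks  Students")
for mark in sorted(frequency):
    print(f"{mark:>5} {frequency[mark]:>9}")
print(f"Total {sum(frequency.values()):>9}")
```

`Counter` does the tallying that would otherwise be done by hand with tally marks; sorting its keys gives the rows of the table in ascending order.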
Framing a continuous distribution table from arrayed data
In order to make a continuous distribution table from the simple frequency table, we have to calculate:
1. Number of classes
Rule 1- Identify the lowest value (L); here it is 4
Rule 2- Identify the highest value (H); here it is 90
Difference (H-L): 90-4 = 86
The number of classes (k) can be found through Sturges' rule.
Frequency distribution table
Class     Class limit    Frequency
0-15      15             3
15-30     30             14
30-45     45             9
45-60     60             10
60-75     75             7
75-90     90             7
Total                    50
[Figure: charts of Frequency against Class Interval]
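The grouping step can be sketched in Python as below. The rule for the number of classes is presumably Sturges' rule, k = 1 + 3.322 log10(N); that formula is an assumption here, since the slide does not spell it out. The class width of 15 and the limits 0 to 90 are taken from the frequency distribution table, and treating the upper limit as inclusive is also an assumption, chosen because it reproduces the table's frequencies.

```python
import math
from collections import Counter

# Simple frequency table of the example (mark -> number of students).
simple_frequency = {4: 3, 20: 7, 30: 7, 39: 1, 40: 3, 45: 5, 49: 1,
                    50: 9, 65: 2, 70: 5, 79: 1, 80: 2, 90: 4}

n = sum(simple_frequency.values())   # number of observations
lowest = min(simple_frequency)       # L
highest = max(simple_frequency)      # H
data_range = highest - lowest        # H - L

# Sturges' rule (assumed form): k = 1 + 3.322 * log10(N).
k = 1 + 3.322 * math.log10(n)

# Class limits as used in the table: width 15, from 0 to 90.
width = 15
limits = list(range(0, 90 + width, width))

grouped = Counter()
for mark, count in simple_frequency.items():
    for lower, upper in zip(limits, limits[1:]):
        # Assumption: a mark equal to the upper limit belongs to that class.
        if lower < mark <= upper:
            grouped[(lower, upper)] += count
            break

print(n, data_range, round(k, 2))
for (lower, upper), count in sorted(grouped.items()):
    print(f"{lower}-{upper}: {count}")
```

For 50 observations the rule gives a value between 6 and 7; the table uses 6 classes of width 15, which covers the range of 86 with round class limits.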
Example: The number of goals scored per match by Kriti during a
hockey season was recorded. What is the median number of goals
scored by Kriti during a game?
Number of goals in a match    Frequency    Cumulative Frequency
0                             1            1
1                             6            7
2                             7            14
3                             2            16
4                             3            19
5                             1            20
The total number of matches is 20, which is even. Therefore, the median is the arithmetic mean
of the (20/2)th and {(20/2)+1}th observations, i.e. the 10th and 11th observations.
Both lie in the row with cumulative frequency 14 (2 goals), so the median is 2.
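The median calculation from a cumulative frequency table can be sketched in Python as follows. The goals and frequencies are those of the example; `value_at` is a helper name introduced only for this illustration.

```python
from itertools import accumulate

goals = [0, 1, 2, 3, 4, 5]
frequency = [1, 6, 7, 2, 3, 1]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequency))
n = cumulative[-1]  # total number of matches

def value_at(position):
    """Return the value of the observation at a 1-based position."""
    for value, cum in zip(goals, cumulative):
        if position <= cum:
            return value
    raise ValueError("position out of range")

if n % 2 == 0:
    # Even n: mean of the (n/2)th and (n/2 + 1)th observations.
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    # Odd n: the ((n + 1)/2)th observation.
    median = value_at((n + 1) // 2)

print(cumulative)
print(median)
```

Looking up a position in the cumulative column avoids writing out all 20 observations: the first row whose cumulative frequency reaches the position holds that observation.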
Example 1) The posted speed limit along a busy highway is 80 km/h. The following values represent the speeds
(in km/h) of 10 cars that were stopped for violating the speed limit:
96, 101, 99, 100, 98, 103, 97, 99, 102, 95
What is the mode?
Arranged in ascending order: 95 96 97 98 99 99 100 101 102 103
Mode = 99 (the only value that occurs more than once)
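The same result can be checked with Python's standard `statistics` module, as a small sketch using the ten speeds of the example.

```python
from statistics import mode

# Speeds (km/h) of the 10 cars from the example.
speeds = [96, 101, 99, 100, 98, 103, 97, 99, 102, 95]

# Arranging the values makes repeated values easy to spot by eye.
arranged = sorted(speeds)

# The mode is the value that occurs most often.
speed_mode = mode(speeds)

print(arranged)
print(speed_mode)
```

Sorting is not required by `mode`; it is shown only because it mirrors the manual method of arranging the data first.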
Example 2) The following table represents the number of times that 100 randomly selected students ate at
the school cafeteria during the first month of school:
What is the mode of the number of times a student ate at the cafeteria?
Number of times 2 3 4 5 6 7 8
Number of students 3 8 22 29 20 8 10
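For data already summarized in a frequency table, the mode is the value with the highest frequency. A minimal Python sketch using the cafeteria table above:

```python
# Frequency table from the example: number of times -> number of students.
times = [2, 3, 4, 5, 6, 7, 8]
students = [3, 8, 22, 29, 20, 8, 10]
frequency = dict(zip(times, students))

# Sanity check: the table should cover all 100 students.
total_students = sum(frequency.values())

# The mode is the value whose frequency is largest.
mode_times = max(frequency, key=frequency.get)

print(total_students)
print(mode_times, frequency[mode_times])
```

Here the largest frequency is 29 students, which belongs to 5 visits, so the mode is 5 times.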
Where,
f2 = 14
The shoe size of 155 people was recorded and
the raw data was presented in the form of the
following frequency table:
Types of vulnerable data
• Personal information, such as names, addresses, and social security numbers
• Financial information, such as bank details and credit cards
• Medical information, such as medical history and test results
• Corporate information, such as company secrets and customer lists