Data Collection and Presentation
Data Collection and Presentation
OBJECTIVES:
DISCUSSION:
Data Management
Data management is the development, execution and supervision of plan and policies, programs and
practices that control and protect, deliver and enhance the value of data and information assets. It is an
administrative process by which that required data is a acquired, validated, stored, protected, and
processed, and by which its accessibility, reliability, and timeliness is ensured to satisfy the needs of the
data users.
The process is actually what statistics is all about. Statistics is a branch of mathematics dealing with the
collection, organization, presentation, analysis and interpretation of data. Statistical treatment of data is
essential in order to make use of data in the right form. Raw data collection is only one aspects of any
experiment ; the organization of data is equally important so that appropriate conclusions can be drawn.
This is what statistical treatment of data is all about. There are many techniques involved in statistics that
treat data in the required manner. Statistical treatment of data is essential in all experiments, whether
social, scientific or any other form. Statistical of data greatly depends on the kind of experiment and the
desired result from the experiment (Kalla, 2009).
An investigation should always be based accurate data which requires good management.
Correct methods of collecting data, right way of organizing them and good data presentation will result a
precise analysis and interpretation.
All the various steps of data processing should be planned when the study is designed, before
any data are connected. All forms used for recording data should be carefully designed and tested to
ensure that the data can easily be extracted for processing. This general principle applies to all data-
collection forms: questionnaires, as well as forms for recording results from laboratory assays, data
extraction forms from hospital notes, etc. .
1. Data gathering
There are different methods used in gathering or collecting data. These are:
Take note that the data set obtained using the interval and ratio scale is called measured data.
There are different ways or forms to present data. These are: textual, tabular and graphical.
Textual form makes use of words, sentences and paragraphs in presentation. It is commonly
used when there are only few numerical data to be enumerated or to be compared with other
data.
Tabular form is a systematic presentation of data in rows and columns. It is used when related
numerical facts need to be classified in arrays.
Graphical form shows numerical values or relationships in a pictorial form. It makes use of
graphs, symbols or visual aids.
It should be simple.
It should focus the reader’s attention on the data rather than on the form.
It should make the meanings and significance of information being presented clear.
Heading which shows the table number, title, and head note.
The title is a brief statement of the nature, classification and time reference of the
information presented and the area to which the statistics refer.
The head note is a statement enclosed in brackets between the title and the top rule of
the table that provides additional information.
Box Head is the portion that contains the column heads, which describe the data in each column.
Stub is the first column on the left of the table, which describes the data on the given row.
Footnote is a statement inserted at the bottom of the table.
Source note is the exact citation of the source of data which is usually include to acknowledge
the origin of the data.
Example:
Graphical Presentation
Accurate- the dimensional aspects should reflect the highest degree of accuracy
possible within the practical limits imposed by expert draftsman or the electronic
computer being used. It should not be deceptive, distorted, or misleading or in any way
susceptible to wrong interpretation as a result of inaccurate or careless construction.
Simple- the basic design should be simple and straight forward and not loaded with
irrelevant, or trivial symbols and ornamentation.
Clear- it should be easily read understood. There should be a forceful and unmistakable
focus of the message that the graph is trying to communicate and there should be a
truthful and unambiguous representation of the facts and that the message it conveys is
meaningful.
Attractive- it is designed and constructed to attract and hold the attention by holding a
neat, dignified, and professional appearance. It should be stylish.
Line graph is used when (1) data cover a long period of time; (2) several series are
compared; (3) movements are to be emphasized; (4) trends are to be established; (5)
estimates are to be forcasted.
Example:
Bar graph is used when numerical values of an item over a period of time are compared.
It consists of regular bars where the height of bars represents quantity of frequency for
each category.
Example:
When you conduct research about a group of people, it’s rarely possible to collect data from every person
in that group. Instead, you select a sample. The sample is the group of individuals who will actually
participate in the research.
To draw valid conclusions from your results, you have to carefully decide how you will select a sample
that is representative of the group as a whole. There are two types of sampling methods:
Probability sampling involves random selection, allowing you to make strong statistical inferences about
the whole group.
Non-probability sampling involves non-random selection based on convenience or other criteria, allowing
you to easily collect data.
You should clearly explain how you selected your sample in the methodology section of your paper or
thesis.
Population vs sample
First, you need to understand the difference between a population and a sample, and identify the target
population of your research.
The sample is the specific group of individuals that you will collect data from.
The population can be defined in terms of geographical location, age, income, and many other
characteristics.
It can be very broad or quite narrow: maybe you want to make inferences about the whole adult
population of your country; maybe your research focuses on customers of a certain company, patients
with a specific health condition, or students in a single school.
It is important to carefully define your target population according to the purpose and practicalities of your
project.
If the population is very large, demographically mixed, and geographically dispersed, it might be difficult
to gain access to a representative sample.
Sampling frame
The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it should
include the entire target population (and nobody who is not part of that population).
You are doing research on working conditions at Company X. Your population is all 1000 employees of
the company. Your sampling frame is the company’s HR database which lists the names and contact
details of every employee.
Sample size
The number of individuals you should include in your sample depends on various factors, including the
size and variability of the population and your research design. There are different sample size
calculators and formulas depending on what you want to achieve with statistical analysis.
Probability sampling means that every member of the population has a chance of being selected. It is
mainly used in quantitative research. If you want to produce results that are representative of the whole
population, probability sampling techniques are the most valid choice.
In a simple random sample, every member of the population has an equal chance of being selected. Your
sampling frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or other techniques
that are based entirely on chance.
Example
You want to select a simple random sample of 100 employees of Company X. You assign a number to
every employee in the company database from 1 to 1000, and use a random number generator to select
100 numbers.
2. Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to
conduct. Every member of the population is listed with a number, but instead of randomly
generating numbers, individuals are chosen at regular intervals.
Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you
randomly select a starting point: number 6. From number 6 onwards, every 10th person on the
list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
If you use this technique, it is important to make sure that there is no hidden pattern in the list
that might skew the sample. For example, if the HR database groups employees by team, and
team members are listed in order of seniority, there is a risk that your interval might skip over
people in junior roles, resulting in a sample that is skewed towards senior employees.
3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in
important ways. It allows you draw more precise conclusions by ensuring that every subgroup is
properly represented in the sample.
To use this sampling method, you divide the population into subgroups (called strata) based on
the relevant characteristic (e.g. gender, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be
sampled from each subgroup. Then you use random or systematic sampling to select a sample
from each subgroup.
Example
The company has 800 female employees and 200 male employees. You want to ensure that the
sample reflects the gender balance of the company, so you sort the population into two strata
based on gender. Then you use random sampling on each group, selecting 80 women and 20
men, which gives you a representative sample of 100 people.
ROMMEL H. SARREAL, RME
INSTRUCTOR I
STAT 2
4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup
should have similar characteristics to the whole sample. Instead of sampling individuals from
each subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the
clusters themselves are large, you can also sample individuals from within each cluster using
one of the techniques above.
This method is good for dealing with large and dispersed populations, but there is more risk of
error in the sample, as there could be substantial differences between clusters. It’s difficult to
guarantee that the sampled clusters are really representative of the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the same number of
employees in similar roles). You don’t have the capacity to travel to every office to collect your
data, so you use random sampling to select 3 offices – these are your clusters.
In a non-probability sample, individuals are selected based on non-random criteria, and not
every individual has a chance of being included.
This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias.
That means the inferences you can make about the population are weaker than with probability
samples, and your conclusions may be more limited. If you use a non-probability sample, you
should still aim to make it as representative of the population as possible.
Non-probability sampling techniques are often used in exploratory and qualitative research. In
these types of research, the aim is not to test a hypothesis about a broad population, but to
develop an initial understanding of a small or under-researched population.
A convenience sample simply includes the individuals who happen to be most accessible to the
researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the
sample is representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university, so after each of
your classes, you ask your fellow students to complete a survey on the topic. This is a
convenient way to gather data, but as you only surveyed students taking the same classes as
you at the same level, the sample is not representative of all the students at your university.
Voluntary response samples are always at least somewhat biased, as some people will
inherently be more likely to volunteer than others.
You send out the survey to all students at your university and a lot of students decide to
complete it. This can certainly give you some insight into the topic, but the people who
responded are more likely to be those who have strong opinions about the student support
services, so you can’t be sure that their opinions are representative of all students.
3. Purposive sampling
This type of sampling, also known as judgement sampling, involves the researcher using their
expertise to select a sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge
about a specific phenomenon rather than make statistical inferences, or where the population is
very small and specific. An effective purposive sample must have clear criteria and rationale for
inclusion.
Example
You want to know more about the opinions and experiences of disabled students at your
university, so you purposefully select a number of students with different support needs in order
to gather a varied range of data on their experiences with student services.
4. Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via
other participants. The number of people you have access to “snowballs” as you get in contact
with more people.
Example
You are researching experiences of homelessness in your city. Since there is no list of all
homeless people in the city, probability sampling isn’t possible. You meet one person who
agrees to participate in the research, and she puts you in contact with other homeless people
that she knows in the area.