Descriptive Statistics
Data is defined as distinct pieces of information, and it can come in many forms.
From numbers in a spreadsheet and records in databases to text, video, images, and
audio recordings, utilizing data in its different forms is the new way of the world.
Data is used to understand and improve nearly every facet of our lives. So, no
matter what field you are in, you can utilize data to make better decisions and
accomplish your goals.
We will start this lesson with an overview of data types and the most common
statistics used when analyzing data.
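We'll discuss:
Quantitative vs. Categorical data
Categorical Ordinal vs. Categorical Nominal data
Continuous vs. Discrete data
Measures of Center (mean, median, and mode)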
Categorical data is used to label a group or set of items (like dog breeds: Collies,
Labs, Poodles, etc.).
5. Categorical Ordinal vs. Categorical Nominal
We can divide categorical data further into two types: Ordinal and Nominal.
Categorical Ordinal data have a rank ordering (like letter grades or survey ratings).
Categorical Nominal data do not have an order or ranking (like dog breeds).
6. Continuous vs. Discrete
We can think of quantitative data as being either continuous or discrete.
Continuous data can be split into smaller and smaller units, and still a smaller
unit exists. An example of this is the age of the dog - we can measure the units of
the age in years, months, days, hours, seconds, but there are still smaller units
that could be associated with the age.
Discrete data only takes on countable values. The number of dogs we interact with
is an example of a discrete data type.
Another Look
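To break down our data types, there are two main blocks: quantitative data, which is
either continuous or discrete, and categorical data, which is either ordinal or
nominal.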
You should now have a good sense of which types of data in the world around us fall
into each of these four buckets: Discrete, Continuous, Nominal, and Ordinal. In the
next sections, we will work through the numeric summaries that relate specifically to
quantitative variables.
Height, Age, the Number of Pages in a Book, and Annual Income all take on values
that we can add, subtract and perform other operations with to gain useful insight.
Hence, these are quantitative.
Gender, Letter Grade, Breakfast Type, Marital Status, and Zip Code can be thought
of as labels for a group of items or individuals. Hence, these are categorical.
Additionally, Letter Grade and Survey Ratings have a rank ordering associated with
them, which makes them ordinal data. If you receive an A, this is higher than an A-.
An A- is ranked higher than a B+, and so on. Ordinal variables frequently occur on
rating scales from very poor to very good. In many cases, we turn these ordinal
variables into numbers because numbers are easier to analyze, but more on this later!
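As a minimal sketch of turning an ordinal variable into numbers, the Python snippet
below maps rating labels to ranks. The five labels, the 1 to 5 scale, and the sample
responses are illustrative assumptions, not data from this lesson.

```python
# Hypothetical rating scale, ordered from lowest to highest.
RATING_ORDER = ["very poor", "poor", "fair", "good", "very good"]

# Map each label to its rank (1 = lowest, 5 = highest).
RATING_TO_RANK = {label: rank for rank, label in enumerate(RATING_ORDER, start=1)}


def encode_ratings(ratings):
    """Convert a list of ordinal rating labels into numeric ranks."""
    return [RATING_TO_RANK[rating] for rating in ratings]


# Made-up survey responses, used only to show the mapping.
responses = ["good", "very poor", "very good", "fair"]
print(encode_ratings(responses))  # [4, 1, 5, 3]
```

The ranks preserve the ordering of the labels, but the gaps between them are a
modeling choice: nothing says "good" is exactly one unit better than "fair".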
Final Words
In this section, we looked at the different data types we might work with in the
world around us. When we work with data in the real world, it might not be very
clean - sometimes there are typos or missing values. When this is the case, simply
having some expertise regarding the data and knowing the data type can assist in
our ability to ‘clean’ this data. Understanding data types can also assist in our
ability to build visuals to best explain the data. But more on this very soon!
9. Analyzing Quantitative Data
Four Aspects for Quantitative Data
There are four main aspects to analyzing Quantitative data.
Measures of Center
Measures of Spread
Shape of the Data
Outliers
Analyzing Categorical Data
Though not discussed in the video, analyzing categorical data has fewer parts to
consider. Categorical data is usually analyzed by looking at the counts or
proportions of individuals that fall into each group. For example, if we were
looking at the breeds of the dogs, we would care about how many dogs are of each
breed, or what proportion of dogs are of each breed type.
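A short Python sketch of this kind of summary is shown below. The list of breeds is
a made-up sample, and `summarize_categories` is a hypothetical helper name used only
for illustration.

```python
from collections import Counter

# Hypothetical sample of dog breeds (categorical nominal data).
breeds = ["Collie", "Lab", "Poodle", "Lab", "Lab", "Collie"]


def summarize_categories(values):
    """Return the count and the proportion of each distinct category."""
    counts = Counter(values)
    total = len(values)
    proportions = {category: count / total for category, count in counts.items()}
    return counts, proportions


counts, proportions = summarize_categories(breeds)
print(counts["Lab"])       # 3
print(proportions["Lab"])  # 0.5
```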
Measures of Center
There are three measures of center:
Mean
Median
Mode
The Mean
In this video, we focused on the calculation of the mean. The mean is often called
the average or the expected value in mathematics. We calculate the mean by adding
all of our values together and dividing by the number of values in our dataset.
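The calculation can be sketched in a few lines of Python. The dataset here is made
up for illustration, and rejecting an empty dataset is an assumption of this sketch.

```python
def mean(values):
    """Add all values together and divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical dataset: number of dogs seen on each of five days.
dogs_per_day = [5, 8, 15, 7, 10]
print(mean(dogs_per_day))  # 9.0
```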
11. The Median
The median splits our data so that 50% of our values are lower and 50% are higher.
We found in this video that how we calculate the median depends on whether we have
an even number of observations or an odd number of observations.
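The sketch below follows the usual convention: with an odd number of observations
the median is the middle value of the sorted data, and with an even number it is the
average of the two middle values. The datasets are made up for illustration.

```python
def median(values):
    """Return the middle value of the sorted data (odd count) or the
    average of the two middle values (even count)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[middle]
    # Even count: average the two values around the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 8, 15, 7, 10]))     # 8   (sorted: 5, 7, 8, 10, 15)
print(median([5, 8, 15, 7, 10, 3]))  # 7.5 (sorted: 3, 5, 7, 8, 10, 15)
```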
Whether we use the mean or median to describe a dataset is largely dependent on the
shape of our dataset and if there are any outliers. We will talk about this in just
a bit!
13. The Mode
The mode is the most frequently observed value in our dataset.
No Mode
If all observations in our dataset are observed with the same frequency, there is
no mode. If we have the dataset:
1, 1, 2, 2, 3, 3, 4, 4
There is no mode because all observations occur the same number of times.
Many Modes
If two (or more) numbers share the maximum frequency, then there is more than one mode.
If we have the dataset:
1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9
There are two modes, 3 and 6, because these values share the maximum frequency of
3 occurrences, while all other values appear only once.
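The three cases (one mode, no mode, many modes) can be sketched in Python as
follows. The function name `modes` is a hypothetical helper. Treating a dataset with
only one distinct value as having that value as its mode is an assumption of this
sketch, since the lesson does not cover that case.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or an empty list when every
    distinct value occurs the same number of times (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # No mode: several distinct values, all with the same frequency.
    if len(counts) > 1 and all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                    # []
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))     # [3, 6]
```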
15. Notation
Notation is a common language used to communicate mathematical ideas. Think of
notation as a universal language used by academic and industry professionals to
convey mathematical ideas. In the next videos, you might see things that seem
confusing. Use the quizzes to assist with your understanding of the concepts.
You likely already know some notation. Plus, minus, multiply, division, and equal
signs all have mathematical symbols that you are likely familiar with. Each of
these symbols replaces an idea for how numbers interact with one another. In the
coming concepts, you will be introduced to some additional ideas related to
notation. Though you will not need to use notation to complete the project, it does
have the following properties:
Understanding how to correctly use notation makes you seem really smart. Knowing
how to read and write in notation is like learning a new language. A language that
is used to convey ideas associated with mathematics.
It allows you to read documentation and apply an idea to your own problem.
Notation is used to convey how problems are solved all the time. One really popular
mathematical algorithm that is used to solve some of the world's most difficult
problems is known as Gradient Boosting. The way that it solves problems is
explained here. If you really want to understand how this
algorithm works, you need to be able to read and understand notation.
It makes ideas that are hard to say in words easier to convey. Sometimes we just
don't have the right words to say. For those situations, I prefer to use notation
to convey the message. Similar to the way an emoji or meme might convey a feeling
better than words, the notation can convey an idea better than words. Usually,
those ideas are related to mathematics, but I am not here to stifle your
creativity.