Data v2
Data is everywhere
Data science is a multidisciplinary blend of data inference, algorithm development, and technology used to solve analytically complex problems.
We will consider data science as a field of study and practice that involves the collection, storage, and processing of data in order to derive important insights into a problem or a phenomenon. Such data may be generated by humans (surveys, logs, etc.) or machines (weather data, road vision, etc.), and could be in different formats (text, audio, video, augmented or virtual reality, etc.).
The number of job postings for “data scientist” grew 57% year-over-year in the first quarter of 2015. Both industry and academia have recently increased their demand for data science and data scientists. The reason is not surprising: we have a lot of data, we continue to generate a staggering amount of data at an unprecedented and ever-increasing speed, analyzing such data wisely necessitates the involvement of competent and well-trained practitioners, and analyzing such data can provide actionable insights.
The “3V model” attempts to lay this out in a simple (and catchy) way. These are the three Vs:
1. Volume: The sheer scale of data being generated and stored.
2. Velocity: The speed at which data is generated and must be processed.
3. Variety: The massive array of data and types (structured and unstructured).
Data can be seen as (1) fact, (2) signal, and (3) symbol. Here, information is differentiated from data in that it is “useful.”
The question should be: Where do we not see data science these days? It is unlimited; it is everywhere.
Increase of data volume in the last 15 years.
Libraries
Security
Data Science and Business Analytics
Business analytics (BA) refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past and current business performance to gain insight and be strategic.
There are four types of analytics, each of which holds opportunities for data scientists in business analytics:
• Decision analytics: supports decision-making with visual analytics that reflect reasoning.
• Descriptive analytics: provides insight from historical data with reporting, scorecards, clustering, etc.
• Predictive analytics: employs predictive modeling using statistical and machine learning techniques.
• Prescriptive analytics: recommends decisions using optimization, simulation, etc.
Broadly speaking, engineering in various fields (chemical, civil, computer, mechanical, etc.) has created demand for data scientists and data science methods.
Engineers constantly need data to solve problems. Data scientists have been called upon to develop methods and techniques to meet these needs. Likewise, engineers have assisted data scientists. Data science has benefited from new software and hardware developed via engineering, such as the CPU (central processing unit) and GPU (graphics processing unit) that substantially reduce computing time.
Computer scientists have developed numerous techniques and methods, such as (1) database (DB) systems that can handle the increasing volume of data in both structured and unstructured formats, expediting data analysis; (2) visualization techniques that help people make sense of data; and (3) algorithms that make it possible to compute complex and heterogeneous data in less time.
What is computational thinking? Typically, it means thinking like a computer scientist. Computational thinking is using abstraction and decomposition when attacking a large, complex task or designing a large, complex system.
By now, perhaps you are convinced that: (1) data science is a flourishing and fantastic field; (2) it is virtually everywhere; and (3) perhaps you want to pursue it as a career! To do so, you need three important skills:
1. Willingness to experiment.
3. Data literacy: the ability to extract meaningful information from a dataset.
Going forward, it is important that you develop a solid foundation in statistical techniques and computational thinking. You then need to pick up a couple of programming and data processing tools – Python, R, and SQL. If you already know some programming language (e.g., C, Java, PHP) or a scientific data processing environment (e.g., Matlab), you could use them to solve many or most of the problems and tasks in data science.
More on Data
“Just as trees are the raw material from which paper is produced, so too, can data be viewed as the raw material from which information is obtained.”
What matters is that any data – whether it is a number, a category, or a text – is labeled. In other words, we know what that number, category, or text means.
Unstructured data is data without labels. The lack of structure makes compiling and organizing unstructured data a time- and energy-consuming task. It would be easy to derive insights from unstructured data if it could be instantly transformed into structured data. However, structured data is akin to machine language, in that it makes information much easier for computers to parse. Unstructured data, on the other hand, is often how humans communicate (“natural language”); but people do not interact naturally with information in strict, database format.
• Public
• Accessible
• Described
• Reusable
• Complete
• Timely
• Managed post-release
We are living in a world where more and more devices – from lightbulbs to cars – are getting connected to the Internet, creating an emerging trend of the Internet of Things (IoT). These devices are generating and using much data, but not all of it is of “traditional” types (numbers, text). When dealing with such contexts, we may need to collect and explore multimodal (different forms) and multimedia (different media) data such as images, music and other sounds, gestures, body posture, and the use of space.
1. CSV (Comma-Separated Values) format is the most common import and export format for spreadsheets and databases.
2. TSV (Tab-Separated Values) files are used for raw data and can be imported into and exported from spreadsheet software.
3. XML (eXtensible Markup Language) was designed to be both human- and machine-readable, and can thus be used to store and transport data. In the real world, computer systems and databases contain data in incompatible formats. As XML data is stored in plain text format, it provides a software- and hardware-independent way of storing data. This makes it much easier to create data that can be shared by different applications.
4. RSS (Really Simple Syndication) is a format used to share data between services, and which was defined in the 1.0 version of XML. It facilitates the delivery of information from various sources on the Web. Information provided by a website in an XML file in such a way is called an RSS feed. Most current Web browsers can directly read RSS files, but a special RSS reader or aggregator may also be used.
5. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is not only easy for humans to read and write, but also easy for machines to parse and generate. It is based on a subset of the JavaScript programming language and is built on two structures:
• A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
• An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
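As a quick, minimal sketch of how these formats might be read in Python (the records and field names below are made up for illustration, not taken from any particular dataset):

# Reading small CSV, TSV, and JSON samples; the data here is hypothetical.
import io
import json
import pandas as pd

csv_text = "date,product,amount\n2021-01-05,widget,3\n2021-01-06,gadget,5"
sales = pd.read_csv(io.StringIO(csv_text))           # CSV: comma-separated fields, one record per line

tsv_text = "date\tproduct\tamount\n2021-01-05\twidget\t3"
logs = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # TSV: same idea, tab-separated

json_text = '[{"name": "Ann", "orders": [3, 5]}, {"name": "Raj", "orders": [2]}]'
customers = json.loads(json_text)                    # name/value pairs and ordered lists
customers_df = pd.DataFrame(customers)               # flatten into a table for analysis

print(sales)
print(logs)
print(customers_df)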
Data in the real world is often dirty; that is, it needs to be cleaned up before it can be used for a desired purpose. This is often called data pre-processing. What makes data “dirty”? Here are some of the factors that indicate that data is not clean or ready to process:
• Incomplete. When some of the attribute values are lacking, certain attributes of interest are lacking, or attributes contain only aggregate data.
• Noisy. When data contains errors or outliers. For example, some of the data points in a dataset may contain extreme values that can severely affect the dataset’s range.
• Inconsistent. Data contains discrepancies in codes or names. For example, if the “Name” column for registration records of employees contains values other than alphabetical letters, or if records do not start with a capital letter, discrepancies are present.
Forms of data pre-processing
Data Cleaning
Since there are several reasons why data could be “dirty,” there are just as many ways to “clean” it. Below are three key methods that describe ways in which data may be “cleaned,” better organized, or scrubbed of potentially incorrect, incomplete, or duplicated information.
Data Munging
Often, the data is not in a format that is easy to work with. For example, it may be stored or presented in a way that is hard to process. Thus, we need to convert it to something more suitable for a computer to understand. To accomplish this, there is no specific scientific method. The approaches to take are all about manipulating or wrangling (or munging) the data to turn it into something that is more convenient or desirable. This can be done manually, automatically, or, in many cases, semi-automatically.
Consider the instruction “Add two diced tomatoes, three cloves of garlic, and a pinch of salt in the mix.” This can be turned into a table.
This table conveys the same information as the text, but it is more “analysis friendly.”
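A rough sketch of what such munging could look like in Python; the column names (ingredient, quantity, preparation) are one reasonable choice, not a prescribed structure:

# Turning the recipe sentence above into an "analysis friendly" table.
import pandas as pd

recipe = pd.DataFrame(
    [
        {"ingredient": "tomato", "quantity": 2, "preparation": "diced"},
        {"ingredient": "garlic", "quantity": 3, "preparation": "cloves"},
        {"ingredient": "salt", "quantity": "a pinch", "preparation": None},
    ]
)
print(recipe)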
Sometimes data may be in the right format, but some of the values are missing. Other times data may be missing due to problems with the process of collecting data, or an equipment malfunction. Or, comprehensiveness may not have been considered important at the time of collection. Furthermore, some data may get lost due to system or human error while storing or transferring the data. So, what to do when we encounter missing data? There is no single good answer. We need to find a suitable strategy based on the situation. Strategies to combat missing data include ignoring that record, using a global constant to fill in all missing values, imputation, inference-based solutions (Bayesian formula or a decision tree), etc. These inference techniques are vital topics in machine learning and data mining.
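A small sketch of two of these strategies in pandas; the DataFrame and its values are hypothetical:

# Handling a missing value by (1) ignoring the record or (2) imputing the column mean.
import numpy as np
import pandas as pd

readings = pd.DataFrame({"city": ["A", "B", "C", "D"],
                         "temperature": [70.1, np.nan, 68.4, 71.0]})

dropped = readings.dropna()                # strategy 1: ignore the incomplete record

imputed = readings.copy()                  # strategy 2: fill the gap with the mean
imputed["temperature"] = imputed["temperature"].fillna(imputed["temperature"].mean())

print(dropped)
print(imputed)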
There are times when the data is not missing, but it is corrupted for some reason. This is, in some ways, a bigger problem than missing data. Data corruption may be a result of faulty data collection instruments, data entry problems, or technology limitations. For example, a digital thermometer measures temperature to one decimal point (e.g., 70.1°F), but the storage system ignores the decimal points. So, now we have 70.1°F and 70.9°F both stored as 70°F. This may not seem like a big deal, but for humans a 99.4°F temperature means you are fine, and 99.8°F means you have a fever, and if our storage system represents both of them as 99°F, then it fails to differentiate between healthy and sick persons!
Just as there is no single technique to take care of missing data, there is no one way to remove noise, or smooth out the noisiness in the data. However, there are some steps to try. First, you should identify or remove outliers. For example, records of previous students who sat for a data science examination show all students scored between 70 and 90 points, barring one student who received just 12 points. It is safe to assume that the last student’s record is an outlier (unless we have a reason to believe that this anomaly is really an unfortunate case for a student!). Second, you could try to resolve inconsistencies in the data. For example, all entries of customer names in the sales data should follow the convention of capitalizing all letters, and you could easily correct them if they are not.
Data Integration
To be as efficient and effective for various data analyses as possible, data from various sources commonly needs to be integrated. The following steps describe how to integrate multiple databases or files.
1. Combine data from multiple sources into a coherent storage place (e.g., a single file or a database).
2. Engage in schema integration, or the combining of metadata from different sources.
a. A conflict may arise; for instance, the presence of different attributes and values from various sources for the same real-world entity.
b. Reasons for this conflict could be different representations or different scales; for example, metric vs. British units.
3. Address redundant data in data integration. Redundant data is commonly generated in the process of integrating multiple databases; for example, the same attribute may appear under different names in different sources.
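A brief sketch of these steps in pandas; the two source tables, their column names, and the key reconciliation are all hypothetical:

# Integrate two sources that describe the same customers under different schemas.
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Raj"]})
sales = pd.DataFrame({"cust_id": [1, 2], "total_usd": [120.0, 80.5]})

# Schema integration: reconcile the differing key names, then combine the sources.
combined = crm.merge(sales.rename(columns={"cust_id": "customer_id"}),
                     on="customer_id", how="inner")

# Address redundancy: drop any exact duplicate rows produced by the integration.
combined = combined.drop_duplicates()
print(combined)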
Data Transformation
Data must be transformed so it is consistent and readable (by a system). The following five processes may be used for data transformation.
4. Normalization: values are scaled to fall within a small, specified range, often combined with aggregation. Some of the techniques that are used for accomplishing normalization (but we will not be covering them here) are:
a. Min–max normalization.
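As a quick illustration, min–max normalization rescales values into a small range such as [0, 1]; the numbers below are made up:

# Min–max normalization: (x - min) / (max - min) maps each value into [0, 1].
import numpy as np

values = np.array([70.0, 75.0, 90.0, 60.0])     # hypothetical raw values
normalized = (values - values.min()) / (values.max() - values.min())
print(normalized)                               # 60 -> 0.0, 90 -> 1.0, others in between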
Data Reduction
Data reduction is a key process in which a reduced representation of a dataset is obtained that produces the same or similar analytical results. One example of a large dataset that could warrant reduction is a data cube. Data cubes are multidimensional sets of data that can be stored in a spreadsheet.
A data cube could be in two, three, or a higher dimension. Each dimension typically represents an attribute of interest. Two of the most common techniques used for data reduction are described below.
• Data Cube Aggregation. The lowest level of a data cube is the aggregated data for an individual entity of interest. To do this, use the smallest representation that is sufficient to address the given task. In other words, we reduce the data to its most meaningful size and structure for the task at hand.
• Dimensionality Reduction. In contrast with the data cube aggregation method, where the data reduction is done with consideration of the task, the dimensionality reduction method works with respect to the nature of the data. Here, a dimension or a column in your data spreadsheet is referred to as a “feature,” and the goal of the process is to identify which features to remove or collapse into a combined feature. This requires identifying redundancy in the given data and/or creating composite dimensions or features that could sufficiently represent a set of raw features. Strategies for reduction include sampling, clustering, principal component analysis, etc.
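A minimal sketch of dimensionality reduction with principal component analysis (PCA) in scikit-learn; the data here is randomly generated just to show the mechanics:

# Collapse 5 raw features into 2 composite features (principal components).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # hypothetical: 100 records, 5 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_)        # how much variance each component keeps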
Data Discretization
We are often dealing with data that are collected from processes that are continuous, such as temperature, ambient light, and a company’s stock price. But sometimes we need to convert these continuous values into more manageable parts. This mapping is called discretization. And as you can see, in undertaking discretization, we are also essentially reducing data. Thus, this process of discretization could also be perceived as a means of data reduction, but it holds particular importance for numerical data.
To achieve discretization, divide the range of continuous attributes into intervals. For instance, we could decide to split the range of temperature values into cold, moderate, and hot, or the price of company stock into above or below its market valuation.
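A small sketch of the temperature example in pandas; the readings and the bin edges (50°F and 75°F) are illustrative choices:

# Discretize continuous temperatures into the intervals cold / moderate / hot.
import pandas as pd

temps = pd.Series([31.0, 55.2, 68.9, 77.5, 93.1])
labels = pd.cut(temps, bins=[-float("inf"), 50, 75, float("inf")],
                labels=["cold", "moderate", "hot"])
print(labels.tolist())   # ['cold', 'moderate', 'moderate', 'hot', 'hot']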
These two terms – data analysis and data analytics – are often used interchangeably and could be confusing. Data analysis refers to hands-on data exploration and evaluation. Data analytics is a broader term and includes data analysis as a necessary subcomponent. Analytics defines the science behind the analysis. The science means understanding the cognitive processes an analyst uses to understand problems and explore data in meaningful ways.
One way to understand the difference between analysis and analytics is to think in terms of past and future. Analysis looks backwards, providing marketers with a historical view of what has happened. Analytics, on the other hand, models the future or predicts a result.
Analytics makes extensive use of mathematics and statistics and the use of descriptive techniques and predictive models to gain valuable knowledge from data. We can categorize analysis techniques into six classes of analysis and analytics: descriptive analysis, diagnostic analytics, predictive analytics, prescriptive analytics, exploratory analysis, and mechanistic analysis.
Descriptive Analysis
Descriptive analysis is about “what is happening now based on incoming data.” It is a method for quantitatively describing the main features of a collection of data. Here are a few key points about descriptive analysis:
Take the example of the Census Data Set, where descriptive analysis is applied on a whole population.
Of course, data needs to be displayed. Once some data has been collected, it is useful to plot a graph showing how many times each score occurs. This is known as a frequency distribution. Frequency distributions come in different shapes and sizes. Therefore, it is important to have some general descriptions for common types of distribution. The following are some of the ways in which statisticians can present numerical findings.
Histogram. Histograms plot values of observations on the horizontal axis, with a bar showing how many times each value occurred in the dataset.
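A minimal sketch of plotting such a frequency distribution with matplotlib; the scores here are randomly generated rather than taken from a real dataset:

# Plot a histogram (frequency distribution) of hypothetical exam scores.
import numpy as np
import matplotlib.pyplot as plt

scores = np.random.default_rng(1).normal(loc=80, scale=5, size=200)

plt.hist(scores, bins=15)        # bar heights show how often each range of scores occurs
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.title("Frequency distribution of scores")
plt.show()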
Normal Distribution. In an ideal world, data would be distributed symmetrically around the center of all scores. Thus, if we drew a vertical line through the center of a distribution, both sides should look the same. This so-called normal distribution is characterized by a bell-shaped curve.
There are two ways in which a distribution can deviate from normal: lack of symmetry (skew) and pointiness (kurtosis).
Measures of Centrality
Common measures of centrality are the mean, the median, and the mode.
Diagnostic Analytics
Diagnostic analytics are used for discovery, or to determine why something happened. Sometimes this type of analytics, when done hands-on with a small dataset, is also known as causal analysis, since it involves at least one cause (usually more than one) and one effect.
This allows a look at past performance to determine what happened and why. The result of the analysis is often referred to as an analytic dashboard. There are various types of techniques available for diagnostic or causal analytics. Among them, one of the most frequently used is correlation.
Correlations
Correlation is a statistical analysis that is used to measure and describe the strength and direction of the relationship between two variables. Strength indicates how closely two variables are related to each other, and direction indicates how one variable would change its value as the value of the other variable changes.
Correlation is a simple statistical measure that examines how two variables change together over time. Take, for example, “umbrella” and “rain.” If someone who grew up in a place where it never rained saw rain for the first time, this person would observe that, whenever it rains, people use umbrellas. They may also notice that, on dry days, folks do not carry umbrellas. By definition, “rain” and “umbrella” are said to be correlated! More specifically, this relationship is strong and positive. Think about this for a second.
An important statistic, Pearson’s r correlation, is widely used to measure the degree of the relationship between linearly related variables. When examining the stock market, for example, Pearson’s r correlation can measure the degree to which two commodities are related.
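A small sketch of Pearson’s r using SciPy; the rainfall and umbrella counts are invented to mimic the example above:

# Compute Pearson's r for two hypothetical, strongly related variables.
from scipy.stats import pearsonr

rainfall  = [0, 2, 5, 1, 0, 8, 3]      # daily rainfall (mm)
umbrellas = [1, 10, 22, 6, 0, 30, 14]  # umbrellas observed on the street

r, p_value = pearsonr(rainfall, umbrellas)
print(round(r, 2))                     # close to +1: a strong, positive correlation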
Predictive Analytics
As you may have guessed, predictive analytics has its roots in our ability to predict what might happen. These analytics are about understanding the future using the data and the trends we have seen in the past, as well as emerging new contexts and processes. An example is trying to predict how people will spend their tax refunds based on how consumers normally behave around a given time of the year (past data and trends), and how a new tax policy (new context) may affect people’s refunds.
1. First, once the data collection is complete, it needs to go through the process of cleaning.
2. Cleaned data can help us obtain hindsight into relationships between different variables. Plotting the data (e.g., on a scatterplot) is a good place to look for hindsight.
3. Next, we need to confirm the existence of such relationships in the data. This is where regression comes into play. From the regression equation, we can confirm the pattern of distribution inside the data. In other words, we obtain insight from hindsight.
4. Finally, based on the identified patterns, or insight, we can predict the future, i.e., foresight.
Process of predictive analytics
Prescriptive Analytics
Prescriptive analytics is dedicated to finding the best course of action for a given situation. It may start by first analyzing the situation (using descriptive analysis), but then moves toward finding connections among various parameters/variables, and their relation to each other, to address a specific problem, more likely that of prediction.
A process-intensive task, the prescriptive approach analyzes potential decisions, the interactions between decisions, the influences that bear upon these decisions, and the bearing all of this has on an outcome, to ultimately prescribe an optimal course of action in real time.
Prescriptive analytics can also suggest options for taking advantage of a future opportunity or mitigating a future risk, and illustrate the implications of each. In practice, prescriptive analytics can continually and automatically process new data to improve the accuracy of predictions and provide advantageous decision options.
For example, in healthcare, we can better manage the patient population by using prescriptive analytics to measure the number of patients who are clinically obese, then add filters for factors like diabetes and LDL cholesterol levels to determine where to focus treatment.
Exploratory Analysis
Often when working with data, we may not have a clear understanding of the problem or the situation. And yet, we may be called on to provide some insights. In other words, we are asked to provide an answer without knowing the question! This is where we go for an exploration.
Exploratory analysis is an approach to analyzing datasets to find previously unknown relationships. Often such analysis involves using various data visualization approaches. Yes, sometimes seeing is believing! But more important, when we lack a clear question or a hypothesis, plotting the data in different forms could provide us with some clues regarding what we may find or want to find in the data. Such insights can then be useful for defining future studies/questions, leading to other forms of analysis.
Usually not the definitive answer to the question at hand but only the start, exploratory analysis should not be used alone for generalizing and/or making predictions from the data.
Exploratory data analysis is an approach that postpones the usual assumptions about what kind of model the data follows in favor of the more direct approach of allowing the data itself to reveal its underlying structure in the form of a model. Thus, exploratory analysis is not a mere collection of techniques; rather, it offers a philosophy as to how to dissect a dataset; what to look for; how to look; and how to interpret the outcomes.
As exploratory analysis consists of a range of techniques, its application is varied as well. However, the most common application is looking for patterns in the data, such as finding groups of similar genes from a collection of samples.
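As a rough sketch of what such an exploration might look like in Python (the tiny dataset and its columns are hypothetical; in practice you would plot whatever data you were handed):

# Quick numerical summary and a couple of exploratory plots.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"height_cm": [160, 172, 181, 158, 169, 190],
                   "weight_kg": [55, 70, 84, 52, 66, 95]})

print(df.describe())                       # summary statistics for each column

df.plot.scatter(x="height_cm", y="weight_kg")   # look for relationships
df.hist()                                        # look at each variable's distribution
plt.show()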
Mechanistic Analysis
Mechanistic analysis involves understanding the exact changes in variables that lead to changes in other variables for individual objects. For instance, we may want to know how the number of free doughnuts per employee per day affects employee productivity. Perhaps by giving them one extra doughnut we gain a 5% productivity boost, but two extra doughnuts could end up making them lazy (and diabetic)!
More seriously, though, think about studying the effects of carbon emissions on bringing about the Earth’s climate change. Here, we are interested in seeing how the increased amount of CO2 in the atmosphere is causing the overall temperature to change. We now know that, in the last 150 years, CO2 levels have gone from 280 parts per million to 400 parts per million. And in that time, the Earth has heated up by 1.53 degrees Fahrenheit (0.85 degrees Celsius). This is a clear sign of climate change, something that we all need to be concerned about, but I will leave it there for now. What I want to bring you back to thinking about is the kind of analysis we presented here – that of studying a relationship between two variables. Such relationships are often explored using regression.
Regression
In statistical modeling, regression analysis is a process for estimating the relationships among variables. Given this definition, you may wonder how regression differs from correlation. The answer can be found in the limitations of correlation analysis. Correlation by itself does not provide any indication of how one variable can be predicted from another. Regression provides this crucial information.
Beyond estimating a relationship, regression analysis is a way of predicting an outcome variable from one predictor variable (simple linear regression) or several predictor variables (multiple linear regression). Linear regression, the most common form of regression used in data analysis, assumes this relationship to be linear. In other words, the relationship of the predictor variable(s) and outcome variable can be expressed by a straight line.
Regression analysis has a number of salient applications to data science and other statistical fields. In the business realm, for example, powerful linear regression can be used to generate insights on consumer behavior, which helps professionals understand business and factors related to profitability. It can also help a corporation understand how sensitive its sales are to advertising expenditures, or it can examine how a stock price is affected by changes in interest rates. Regression analysis may even be used to look to the future; an equation may forecast demand for a company’s products or predict stock behavior.
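A minimal sketch of fitting such a straight line with NumPy; the advertising-spend and sales figures are made up purely for illustration:

# Fit a line: sales ≈ slope * ad_spend + intercept, then forecast a new value.
import numpy as np

ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical spend (thousands of dollars)
sales = np.array([12.0, 15.5, 18.0, 22.5, 24.0]) # hypothetical sales (units)

slope, intercept = np.polyfit(ad_spend, sales, deg=1)
forecast = slope * 6.0 + intercept               # predicted sales for a new spend level
print(round(slope, 2), round(intercept, 2), round(forecast, 1))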
Machine Learning
Machine learning is a spin-off, or a subset, of artificial intelligence (AI). Here, the goal is to give “computers the ability to learn without being explicitly programmed.”
d. Scalability.
e. Ensemble modeling.
Note here that, in most cases, the application of machine learning is entwined with the application of statistical analysis. Therefore, it is important to remember the differences in the nomenclature of these two fields.
• In machine learning, a target is called a label.
Machine learning algorithms are organized into a taxonomy, based on the desired outcome of the algorithm. Common algorithm types include:
a. Supervised learning. When we know the labels on the training examples we are using to learn.
b. Unsupervised learning. When we do not know the labels (or even the number of labels or classes) from the training examples we are using for learning.
Also to note: one phrase you often hear with machine learning is data mining. That is because machine learning and data mining overlap quite significantly in many places. Depending on who you talk to, one is seen as a precursor or entry point for the other. In the end, it does not matter, as long as we keep our focus on understanding the context and deriving some meaning out of the data.
Data mining is about understanding the nature of the data to gain insight into the problem that generated the dataset in the first place, or some unidentified issues that may arise in the future. Take the case of customers’ brand loyalty in the highly competitive e-commerce market. All of the e-commerce platforms store a database of customers’ previous purchases and return history along with customer profiles. This kind of dataset not only helps the business owners to understand existing customers’ purchasing patterns, such as the products they may be interested in, or to measure brand loyalty, but also provides in-depth knowledge about potential new customers.
Regression
Think about it as a much more sophisticated version of extrapolation. For example, if you know the relationship between education and income (the more someone is educated, the more money they make), then we could predict someone’s income based on their education. Simply speaking, learning such a relationship is regression.
In more technical terms, regression is concerned with modeling the relationship between variables of interest. These relationships use some measures of error in the predictions to refine the models iteratively. In other words, regression is a process.
We can learn about two variables relating in some way (e.g., correlation), but if there is a relationship of some kind, can we figure out if or how one variable could predict the other? Linear regression allows us to do that. Specifically, we want to see how a variable X affects a variable y. Here, X is called the independent variable or predictor; y is called the dependent variable or response. Take note of the notation here. The X is in uppercase because it could have multiple feature vectors, making it a feature matrix. If we are dealing with only a single feature for X, we may decide to use the lowercase x. On the other hand, y is in lowercase because it is a single value or feature being predicted.
As mentioned previously, linear regression fits a line (or plane, or hyperplane) to the dataset. For example, in the figure below, we want to predict the annual return using the excess return of a stock in a stock portfolio. The line represents the relation between these two variables. Here, it happens to be quite linear (most of the data points lie close to the line), but such is not always the case.
• Linear regression
• Logistic regression
• Stepwise regression
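A brief sketch of the X/y convention described above, using scikit-learn’s linear regression; the feature matrix and response values are hypothetical:

# Fit a line to a single-feature X and numerical response y, then predict a new point.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5]])   # feature matrix (one column)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])             # response values

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)       # slope and intercept of the fitted line
print(model.predict([[3.0]]))              # prediction for a new value of X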
While it is easy to understand individual tools and methods, it is not always clear how to pick the best one(s) for a given problem. There are multiple factors that need to be considered before choosing the right algorithm for a problem. Some of these factors are discussed below.
Accuracy
Most of the time, beginners in machine learning incorrectly assume that for each problem the best algorithm is the most accurate one. However, getting the most accurate answer possible is not always necessary. Sometimes an approximation is adequate, depending on the problem. If so, you may be able to cut your processing time dramatically by sticking with more approximate methods. Another advantage of more approximate methods is that they naturally tend to avoid overfitting.
Training Time
The number of minutes or hours necessary to train a model varies between algorithms. Training time is often closely tied to accuracy – one typically accompanies the other. In addition, some algorithms are more sensitive to the number of data points than others. A limit on time can drive the choice of algorithm, especially when the dataset is large.
Linearity
Lots of machine learning algorithms make use of linearity. Linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog). These include logistic regression and support vector machines. Linear regression algorithms assume that data trends follow a straight line. These assumptions are not bad for some problems, but on others they bring accuracy down.
Number of Parameters
Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that affect the algorithm’s behaviour, such as error tolerance, number of iterations, or options between variants of how the algorithm behaves. The training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings. Typically, algorithms with a large number of parameters require the most trial and error to find a good combination.
Number of Features
For certain types of data, the number of features can be very large compared to the number of data points. This is often the case with genetics or textual data. The large number of features can bog down some learning algorithms, making training time unfeasibly long. Support vector machines are particularly well suited to this case.
Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. How do we learn about when to use which estimator or technique? There are two primary ways that I can think of: (1) developing a comprehensive theoretical understanding of different ways we could develop estimators or build models; and (2) through lots of hands-on experience. As you may have guessed, in this book, we are going with the latter.
Supervised learning
Supervised learning algorithms use a set of examples from previous records to make predictions about the future. For instance, existing car prices can be used to make guesses about future models. Each example used to train such an algorithm is labeled with the value of interest – in this case, the car’s price. A supervised learning algorithm looks for patterns in a training set. It may use any information that might be relevant – the season, the car’s current sales records, similar offerings from competitors, the manufacturer’s brand perception among consumers – and each algorithm may look for a different set of information and find different types of patterns. Once the algorithm has found the best pattern it can, it uses that pattern to make predictions for unlabeled testing data – tomorrow’s values.
There are several types of supervised learning that exist within machine learning. Among them, the three most commonly used algorithm types are regression, classification, and anomaly detection.
Logistic Regression
One thing to note about linear regression is that the outcome variable is numerical. So, the question is: What happens when the outcome variable is not numerical? For example, suppose you have a weather dataset with the attributes humidity, temperature, and wind speed, each describing one aspect of the weather for a day. Based on these attributes, you want to predict if the weather for the day is suitable for playing golf. In this case, the outcome variable that you want to predict is categorical (“yes” or “no”). Fortunately, to deal with this kind of classification problem, we have logistic regression.
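A minimal sketch of such a classifier with scikit-learn; the weather records and the “good day for golf” labels are invented for illustration:

# Logistic regression on a hypothetical weather dataset (1 = play golf, 0 = don't).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[85, 72, 5], [90, 65, 12], [60, 75, 3],     # [humidity %, temperature °F, wind mph]
              [55, 80, 4], [95, 60, 15], [50, 78, 2]])
y = np.array([0, 0, 1, 1, 0, 1])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[58, 76, 4]]))          # predicted class for a new day
print(clf.predict_proba([[58, 76, 4]]))    # probability of each class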
Softmax Regression
So far, we have seen regression for a numerical outcome variable as well as regression for a binomial (“yes” or “no”, “1” or “0”) categorical outcome. But what happens if we have more than two categories? For example, you want to rate a student’s performance, based on the marks they got in individual subjects, as “excellent,” “good,” “average,” or “below average.” We need multinomial logistic regression for this. In this sense, multinomial logistic regression, or softmax regression, is a generalization of regular logistic regression to handle multiple (more than two) classes.
In softmax regression, we replace the sigmoid function from logistic regression with the so-called softmax function. This function takes a vector of n real numbers as input and normalizes the vector into a distribution of n probabilities. That is, the function transforms all the n components from any real values (positive or negative) to values in the interval (0, 1).
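A tiny sketch of the softmax function itself; the raw scores for four hypothetical performance categories are made up:

# The softmax function: turn a vector of real numbers into a probability distribution.
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])   # hypothetical raw scores for 4 categories
probs = softmax(scores)
print(probs, probs.sum())                  # values in (0, 1) that sum to 1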
Classification can be supervised or unsupervised. The former is the case when assigning a label to a picture as, for example, either “cat” or “dog.” Here the number of possible choices is predetermined. When there are only two choices, it is called two-class or binomial classification. When there are more categories, it is known as multiclass or multinomial classification. There are many methods and algorithms for building classifiers, with k nearest neighbor (kNN) being one of the most popular.
1. Store the existing data points along with their labels (the training set).
2. When we get a new data point, we compare it to each of our existing data points and find similarity.
3. Take the k most similar data points (the k nearest neighbors).
4. From these k data points, take the majority vote of their labels. The winning label is the label/class of the new data point.
The number k is usually small, between 2 and 20. As you can imagine, the larger the number of nearest neighbors (the value of k), the longer it takes us to do the processing.
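A minimal sketch of kNN with scikit-learn; the six labeled points and the value k = 3 are arbitrary choices for illustration:

# k nearest neighbors: classify new points by majority vote among the 3 closest training points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]])   # two features per point
y = np.array([0, 0, 0, 1, 1, 1])                                 # two classes

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [6, 5]]))       # each prediction is the majority label of its 3 neighbors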
Decision Tree
In machine learning, a decision tree is used for classification problems. In such problems, the goal is to create a model that predicts the value of a target variable based on several input variables. A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
Several algorithms exist that generate decision trees, such as ID3/4/5, CART, and CLS.
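A small sketch of growing and inspecting such a tree with scikit-learn (which uses a CART-style algorithm); the weather-like data and labels are hypothetical:

# Build a shallow decision tree and print its decision nodes and leaf nodes.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[85, 10], [90, 15], [60, 5], [65, 3], [95, 20], [55, 4]]   # [humidity %, wind mph]
y = [0, 0, 1, 1, 0, 1]                                          # 1 = play golf, 0 = don't

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["humidity", "wind"]))    # the learned splits
print(tree.predict([[70, 6]]))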
Decision Rule
Rules are a popular alternative to decision trees. Rules typically take the form of an {IF: THEN} expression (e.g., {IF “condition” THEN “result”}). Typically, for any dataset, an individual rule in itself is not a model, as this rule can be applied only when the associated condition is satisfied. Therefore, rule-based machine learning methods typically identify a set of rules that collectively comprise the prediction model, or the knowledge base.
Decision rules (left) and decision tree (right) for weather data
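To make the idea concrete, here is a hypothetical rule set for the golf-playing decision, written as plain {IF: THEN} expressions; the conditions and thresholds are illustrative, not taken from the figure:

# A small set of decision rules that together form the prediction model.
def classify_weather(outlook, humidity, windy):
    if outlook == "overcast":
        return "play"
    if outlook == "sunny" and humidity <= 75:
        return "play"
    if outlook == "rainy" and not windy:
        return "play"
    return "don't play"

print(classify_weather("sunny", 80, windy=False))   # -> "don't play"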
Random Forest
A decision tree seems like a nice method for doing classification – it typically has good accuracy, and, more importantly, it provides human-understandable insights. But one big problem the decision tree algorithm has is that it could overfit the data. What does that mean? It means it could try to model the given data so well that, while the classification accuracy on that dataset would be wonderful, the model may find itself crippled when looking at any new data; it learned too much from the data!
One way to address this problem is to use not just one, not just two, but many decision trees, each one created slightly differently, and then take some kind of average from what these trees decide and predict. Such an approach is so useful and desirable that in many situations there is a whole set of algorithms that apply it. They are called ensemble methods.
In machine learning, ensemble methods rely on multiple learning algorithms to obtain better prediction accuracy than what any of the constituent learning algorithms can achieve. In general, an ensemble algorithm consists of a concrete and finite set of alternative models but incorporates a much more flexible structure among those alternatives. One example of an ensemble method is random forest, which can be used for both regression and classification tasks.
Random forest operates by constructing a multitude of decision trees at training time and selecting the mode of the class as the final class label for classification, or the mean prediction of the individual trees when used for regression tasks. The advantage of using random forest over a decision tree is that the former tries to correct the decision tree’s habit of overfitting the data to the training set.
For a training set of size N, each decision tree is created in the following manner:
1. A sample of the N training cases is taken at random but with replacement from the original training set. This sample will be used as the training set to grow the tree.
2. If the dataset has M input variables, a number m (m being a lot smaller than M) is specified such that, at each node, m variables are selected at random out of the M. Among these m, the best split is used to split the node. The value of m is held constant while we grow the forest.
3. Following the above steps, each tree is grown to its largest possible extent and there is no pruning.
4. Predict new data by aggregating the predictions of the n trees (i.e., majority votes for classification, average for regression).
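A minimal sketch of the procedure above using scikit-learn’s random forest; the data points and the choice of 100 trees are arbitrary for illustration:

# Random forest: many trees grown on bootstrap samples, combined by majority vote.
from sklearn.ensemble import RandomForestClassifier

X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7], [2, 2], [7, 7]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict([[2, 3], [6, 5]]))    # each prediction is a majority vote across the trees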
Random forest is considered a panacea for all data science problems among many of its practitioners. There is a belief that, when you cannot think of any algorithm, irrespective of the situation, use random forest. This is a bit irrational, since no algorithm strictly dominates in all applications (one size does not fit all). Nonetheless, people have their favorite algorithms. And there are reasons why, for many data scientists, random forest is the favorite:
1. It can solve both types of problems, that is, classification and regression, and does a decent estimation for both.
2. Random forest requires almost no input preparation. It can handle binary features, categorical features, and numerical features without any need for scaling.
3. Random forest is not very sensitive to the specific set of parameters used. As a result, it does not require a lot of tweaking and fiddling to get a decent model; just use a large number of trees and things will not go terribly awry.
So, is random forest a silver bullet? Absolutely not. First, it does a good job at classification but not as good a job at regression problems, since it does not give precise continuous predictions. Second, random forest can feel like a black-box approach for statistical modelers, as you have very little control over what the model does. At best, you can try different parameters and random seeds and hope that will change the output.
Naïve Bayes
This is a very popular and robust approach for classification that uses Bayes’ theorem. Bayesian classification represents a supervised learning method as well as a statistical method for classification. In a nutshell, it is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. Here, all attributes contribute equally and independently to the decision. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about three inches in diameter. Even if these features depend on each other or upon the existence of other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as naïve. It turns out that, in most cases, while such a naïve assumption is found to be untrue, the resulting classification models do amazingly well.
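A small sketch of a Naïve Bayes classifier in scikit-learn, using the fruit example above with made-up feature values (redness, roundness, diameter):

# Gaussian Naïve Bayes: each feature contributes independently to the class probability.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[0.9, 0.8, 3.0], [0.8, 0.9, 3.2], [0.2, 0.4, 1.0],
              [0.3, 0.3, 0.8], [0.85, 0.95, 2.9], [0.1, 0.5, 1.2]])
y = np.array([1, 1, 0, 0, 1, 0])           # 1 = apple, 0 = not an apple

nb = GaussianNB().fit(X, y)
print(nb.predict([[0.88, 0.85, 3.1]]))      # predicted class for a new fruit
print(nb.predict_proba([[0.88, 0.85, 3.1]]))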
One thing that has been common in all the classifier models we have seen so far is that they assume linear separation of classes. In other words, they try to come up with a decision boundary that is a line (or a hyperplane in a higher dimension). But many problems do not have such linear characteristics. Support vector machine (SVM) is a method for the classification of both linear and nonlinear data.
SVMs are considered by many to be the best stock classifier for doing machine learning tasks. By stock, here we mean in its basic form and not modified. This means you can take the basic form of the classifier and run it on the data, and the results will have low error rates. Support vector machines make good decisions for data points that are outside the training set. In a nutshell, an SVM is an algorithm that uses nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane (i.e., a decision boundary separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors (“essential” training tuples) and margins (defined by the support vectors).
Linearly separable data
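A brief sketch of an SVM with a nonlinear (RBF) kernel in scikit-learn; the two-class points below are invented so that one class surrounds the other and a straight line cannot separate them:

# SVM with a nonlinear kernel: map the data to a higher dimension and separate it there.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0.3, 0.2], [-0.2, 0.1], [2, 2], [-2, 2], [2, -2], [-2, -2]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

svm = SVC(kernel="rbf").fit(X, y)
print(svm.support_vectors_)                 # the "essential" training points
print(svm.predict([[0.1, -0.1], [1.8, 1.9]]))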
We saw how to learn from data when the labels or true values associated with them are available. In other words, we knew what was right or wrong, and we used that information to build a regression or classification model that could then make predictions for new data. Such a process fell under supervised learning. Now, we will consider the other big area of machine learning where we do not know the true labels or values for the given data, and yet we want to learn the underlying structure of that data and be able to explain it. This is called unsupervised learning.
In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters or finding different ways of looking at complex data so that it appears simpler or more organized.
Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.
Agglomerative Clustering
This is a bottom-up approach of building clusters, or groups of similar data points, from individual data points. Following is a general outline of how an agglomerative clustering algorithm runs.
1. Use any computable cluster similarity measure, for example, Euclidean distance, cosine similarity, etc.
2. Start by treating each data point as its own cluster.
3. Repeat {
– identify the two most similar clusters Cj and Ck (there could be ties – choose one pair)
– merge Cj and Ck into a single cluster
} until only one cluster (or the desired number of clusters) remains.
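A minimal sketch of this bottom-up merging with scikit-learn; the six 2-D points and the choice of two final clusters are arbitrary:

# Agglomerative clustering: repeatedly merge the most similar clusters until 2 remain.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1.2, 0.9], [0.8, 1.1], [5, 5], [5.2, 4.8], [4.9, 5.1]])

agg = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = agg.fit_predict(X)
print(labels)     # cluster assignment for each point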
Divisive Clustering
The reverse of the agglomerative technique, divisive clustering works in a top-down mode, where the goal is to break up the cluster containing all objects into smaller clusters.
There is a simple and effective algorithm to carry out the general approach described above: k-means. One of the most frequently used clustering algorithms, k-means clustering is an algorithm to classify or to group your objects, based on attributes or features, into k groups, where k is a positive integer.
1. The basic step of k-means clustering is simple. In the beginning, we determine the number of clusters (k) that we want, and we assume the centroid or center of these clusters. We can take any random objects as the initial centroids, or the first k objects in sequence can also serve as the initial centroids.
2. Then the k-means algorithm will do the three steps below until convergence.
Step 2: Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly or systematically, as in the following:
2. Assign each of the remaining (N − K) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroid of the cluster gaining the new sample and the cluster losing the sample.
Repeat the above three steps until convergence is achieved – that is, until a pass through the training samples causes no new assignments.
We have seen clustering, classification algorithms, and probabilistic models that are based on the existence of efficient and robust procedures for learning parameters from observations. Often, however, the only data available for training a model are incomplete. Missing values can occur, for example, in medical diagnoses, where patient histories generally include results from a limited battery of tests. The expectation maximization (EM) algorithm is a fantastic approach to addressing this problem. The EM algorithm enables parameter estimation in probabilistic models with incomplete data.
Reinforcement learning
Reinforcement learning (RL) attempts to model how software agents should take actions in an environment so as to maximize some form of cumulative reward.
Let us take an example. Imagine you want to train a computer to play chess against a human. In such a case, determining the best move to make depends on a number of factors. The number of possible states that can exist in a game is usually very large. To cover these many states using a standard rules-based approach would mean specifying a lot of hard-coded rules. RL cuts out the need to manually specify rules, and RL agents learn simply by playing the game. For two-player games, such as backgammon, agents can be trained by playing against other human players or even other RL agents.
In RL, the algorithm decides to choose the next course of action once it sees a new data point. Based on how suitable the action is, the learning algorithm also gets some incentive a short time later. The algorithm always modifies its course of action toward the highest reward. Reinforcement learning is common in robotics, where the set of sensor readings at one point in time is a data point, and the algorithm must choose the robot’s next action. It is also a natural fit for Internet-of-Things (IoT) applications.
The typical framing of a reinforcement learning (RL) scenario: an agent takes actions in an environment, which are interpreted into a reward and a representation of the state, which is then fed back into the agent.
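As a rough illustration of the reward-driven loop described above, here is a tiny tabular Q-learning sketch on a made-up five-state corridor; the environment, rewards, and parameter values are all hypothetical and chosen only to show the mechanics:

# Tabular Q-learning on a toy corridor: reaching state 4 yields reward 1.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))        # estimated cumulative reward per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.3      # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != 4:
        # Mostly exploit the best known action, sometimes explore a random one.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == 4 else 0.0
        # Q-learning update: nudge the estimate toward the reward plus discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.round(2))   # the "move right" action values dominate, pointing toward the reward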