ML & AI-Introduction To Data-Science Tools

This document provides an introduction to common data science tools used for extracting knowledge from large volumes of structured and unstructured data. It discusses linear algorithms like linear and logistic regression, principal component analysis, and tree-based algorithms like decision trees, random forests and gradient boosting. It also mentions neural networks and their use in problems like image recognition.

Uploaded by

san_misus

Francisco Villarreal-Valderrama

Dec 15, 2021


Introduction to data-science tools

Data science is an interdisciplinary approach to extracting knowledge
from large volumes of noisy, structured and unstructured data. It
encompasses preparing data for analysis and processing, performing
advanced data analysis, and presenting the results to reveal patterns.

The process of data mining and analysis involves applying mathematics,
statistics, computer science, information science, and domain
knowledge to tell stories that clearly convey the meaning of results
to decision-makers and stakeholders at every level of technical
knowledge and understanding. This defines the role of a data
scientist: someone who writes code and combines it with statistical
knowledge to explain how the obtained results can be used to solve
business problems.

As a scientific field, data science unifies scientific methods,
processes, algorithms and systems into a set of tools based on
statistics, data analysis and informatics. Data science is closely
related to data mining, machine learning and big data. The most
common tools include:

Linear algorithms

Linear regression

It produces numerical predictions from the best linear fit to a
dataset. The resulting model is easy to understand, and its
coefficients indicate the biggest drivers of the results. Nonetheless,
it can be too simple to capture more complex relationships among the
variables.
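
As a rough illustration of the "best linear fit" idea, the sketch
below fits a one-feature least-squares line using only the Python
standard library. The function names (fit_line, predict) and the
spend/sales numbers are made up for this example and are not from
any particular library.

```python
# Minimal sketch: ordinary least squares with a single feature.
# All names and data here are hypothetical, chosen for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Numerical prediction for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: advertising spend vs. units sold
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(spend, sales)   # slope is about 1.97, intercept about 1.09
print(model, predict(model, 6.0))
```

The slope is the easy-to-read part of the model: here each extra unit
of spend adds roughly 1.97 units of predicted sales.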

Logistic regression

This is an adaptation of linear regression to classification problems.
Like linear regression, it is easy to understand but not powerful
enough to handle complex relationships between the variables.
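
One way to see the link with linear regression is that the model still
computes a linear score, then squashes it into a probability. The
sketch below assumes a single feature and plain batch gradient descent
on the log-loss; the names, learning rate, epoch count and the
hours/passed data are illustrative assumptions only.

```python
import math

# Minimal sketch: one-feature logistic regression trained by gradient descent.
# Names, hyperparameters and data are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            # Derivative of the log-loss with respect to the linear score w*x + b
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that x belongs to class 1."""
    w, b = model
    return sigmoid(w * x + b)


# Hypothetical data: hours studied -> passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(hours, passed)
print(predict_proba(model, 1.0), predict_proba(model, 4.0))
```

A prediction above 0.5 is read as class 1 and below 0.5 as class 0, so
the decision boundary is a single threshold on the feature.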

Principal Component Analysis

It is a data-compression tool that exploits the correlation among the
variables in the data. Its applications include anomaly detection and
prediction. It’s often combined with other tools to yield better
results.
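
To make the compression idea concrete, the sketch below handles the
two-variable case only: it finds the direction of largest variance
from the 2x2 covariance matrix, stores one score per point instead of
two coordinates, and reconstructs an approximation. The function names
and the sample measurements are assumptions made for this example.

```python
import math

# Minimal sketch: PCA for 2-D points, keeping only the first component.
# Names and data are hypothetical.

def first_principal_component(points):
    """Return (mean, direction): the unit direction of largest variance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def compress(points, mean, direction):
    """Replace each 2-D point by one score along the principal direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]


def reconstruct(scores, mean, direction):
    """Map the 1-D scores back to approximate 2-D points."""
    return [(mean[0] + s * direction[0], mean[1] + s * direction[1])
            for s in scores]


# Hypothetical, strongly correlated measurements (y is roughly 2x)
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

mean, direction = first_principal_component(data)
scores = compress(data, mean, direction)
approx = reconstruct(scores, mean, direction)
print(direction, scores)
```

Because the two variables are strongly correlated, little is lost by
keeping one number per point. A point with an unusually large
reconstruction error would be a candidate anomaly.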

Tree-based algorithms

Decision tree

This algorithm consists of a series of yes/no rules based on the data
features, forming a tree whose branches cover all the possible
outcomes of the process. It’s an easy-to-understand algorithm, but the
tree can become large when handling complex datasets.
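
The yes/no rules can be learned from data. The sketch below is one
possible way to do it, assuming numeric features, questions of the
form "is feature f <= t?" and Gini impurity as the splitting
criterion. All names and the weather-style data are hypothetical.

```python
from collections import Counter

# Minimal sketch: a tiny decision-tree learner for numeric features.
# Names and data are hypothetical; labels must not be tuples.

def gini(labels):
    """Gini impurity: 0 when all labels agree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) question that lowers impurity the most."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            yes = [y for r, y in zip(rows, labels) if r[f] <= t]
            no = [y for r, y in zip(rows, labels) if r[f] > t]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels):
    """Return a leaf label, or a node (feature, threshold, yes_branch, no_branch)."""
    split = best_split(rows, labels)
    if split is None:  # pure node, or no question improves it
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    no = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in yes], [y for _, y in yes]),
            build_tree([r for r, _ in no], [y for _, y in no]))


def predict(tree, row):
    while isinstance(tree, tuple):  # internal node: ask "is row[f] <= t?"
        f, t, yes_branch, no_branch = tree
        tree = yes_branch if row[f] <= t else no_branch
    return tree


# Hypothetical data: (temperature, humidity) -> decision
rows = [(30, 85), (27, 90), (21, 70), (18, 65), (24, 80), (20, 95)]
labels = ["stay", "stay", "play", "play", "play", "stay"]

tree = build_tree(rows, labels)
print(tree)  # (1, 80, 'play', 'stay'): one rule on humidity is enough here
```

With this small dataset a single question separates the classes. On
more complex data the same procedure keeps splitting, which is how the
tree grows large.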

Random forest

It combines many decision trees whose rules are created from the data
itself. The individual trees are merged into a powerful predictor
that typically performs better than any single tree. It tends to give
high-quality results at the cost of large models that are not easy to
understand.
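
The combination step can be sketched as bootstrap sampling plus a
majority vote. To stay short, the example below uses one-rule trees
(stumps) in place of full decision trees, which is a simplification
and not how production random forests are built. The function names,
the tree count, the seed and the data are assumptions.

```python
import random
from collections import Counter

# Minimal sketch: a "forest" of one-rule trees (stumps) with majority voting.
# Stumps stand in for full decision trees; names and data are hypothetical.

def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One yes/no rule on one feature: the threshold with the fewest errors."""
    best = None  # (errors, threshold, label_if_yes, label_if_no)
    for t in sorted(set(r[feature] for r in rows)):
        yes = [y for r, y in zip(rows, labels) if r[feature] <= t]
        no = [y for r, y in zip(rows, labels) if r[feature] > t]
        if not no:
            continue
        yes_label, no_label = majority(yes), majority(no)
        errors = (sum(y != yes_label for y in yes)
                  + sum(y != no_label for y in no))
        if best is None or errors < best[0]:
            best = (errors, t, yes_label, no_label)
    if best is None:  # all rows share one value: always predict the majority
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, t, yes_label, no_label = stump
    return yes_label if row[feature] <= t else no_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample n rows with replacement, so each tree sees different data
        idx = [rng.randrange(n) for _ in range(n)]
        # A random feature per tree makes the trees less alike
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical data: two numeric features, two classes
rows = [(1.0, 2.0), (2.0, 1.0), (2.5, 3.0), (3.0, 2.5),
        (7.0, 8.0), (8.0, 7.5), (8.5, 9.0), (9.0, 8.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (9.5, 9.5)))
```

Even in this toy version the model is a list of 25 separate rules
rather than one readable tree, which hints at why larger forests are
hard to interpret.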

Gradient boosting

It uses a sequence of simpler decision trees, each one increasingly
focused on the errors that the previous trees left on the known
(training) data. It is a high-performance tool that gives very
case-specific results: a small change in the feature set can radically
change the model.
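
The "increasingly focused" behaviour can be sketched for a regression
problem with squared error, where each new one-split tree is fitted to
the residuals of the current prediction. This is a simplified
illustration under those assumptions; the names, the number of trees,
the learning rate and the data are all hypothetical.

```python
# Minimal sketch: gradient boosting for regression with one-split trees.
# Assumes squared-error loss and at least two distinct x values.
# Names, hyperparameters and data are hypothetical.

def fit_stump(xs, residuals):
    """Best single split on x, predicting the mean residual on each side."""
    best = None  # (squared_error, threshold, left_mean, right_mean)
    for t in sorted(set(xs))[:-1]:  # skip the largest x: right side would be empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Each new tree is trained on what the current model still gets wrong
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm
                                      for t, lm, rm in stumps)


# Hypothetical data with a step-like, non-linear shape
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 5.0]

model = fit_boosting(xs, ys)
print([round(predict(model, x), 2) for x in xs])
```

Because every tree chases the remaining training error, the fit to the
known data keeps tightening, which is the source of both the high
performance and the case-specific behaviour.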

Neural networks

General neural network models

They consist of interconnected neurons that pass messages to each
other, with layers of neurons stacked on top of one another. These
models can handle extremely complex tasks but are very slow to train
and often have a complex architecture. Neural network models stand out
in image recognition and classification problems. Nonetheless, their
use as predictors is limited since it’s very hard to understand how
they reach their outcomes.
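
The layered structure can be illustrated with a forward pass through
two stacked layers. In the sketch below the weights are hand-picked
rather than trained, and the XOR task is chosen for this example
because no single linear model can represent it. The names and
weights are assumptions.

```python
# Minimal sketch: forward pass of a two-layer network with hand-picked weights.
# The weights are NOT trained; they are chosen by hand so the network
# computes XOR. Names and values are hypothetical.

def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One layer: each neuron sums its weighted inputs, adds a bias, then activates."""
    return [activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]   # two hidden neurons, two inputs each
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]              # one output neuron reading both hidden neurons
OUTPUT_B = [0.0]


def forward(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    output = dense(hidden, OUTPUT_W, OUTPUT_B, identity)
    return output[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, forward(a, b))  # 0.0, 1.0, 1.0, 0.0
```

Even with six weights and three biases it is not obvious from the
numbers alone why the output is XOR, which gives a small taste of the
interpretability problem in larger networks.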
