ML & AI-Introduction To Data-Science Tools

This document provides an introduction to common data science tools used for extracting knowledge from large volumes of structured and unstructured data. It discusses linear algorithms like linear and logistic regression, principal component analysis, and tree-based algorithms like decision trees, random forests and gradient boosting. It also mentions neural networks and their use in problems like image recognition.

Uploaded by

san_misus

Francisco Villarreal-Valderrama

Dec 15, 2021


Introduction to data-science tools

Data science is an interdisciplinary approach to extracting
knowledge from large volumes of noisy, structured and unstructured
data. It encompasses preparing data for analysis and processing,
performing advanced data analysis, and presenting the results to
reveal patterns.

The process of data mining and analysis involves applying
mathematics, statistics, computer science, information science, and
domain knowledge to tell stories that clearly convey the meaning of
results to decision-makers and stakeholders at every level of
technical knowledge and understanding. This shows the role of a data
scientist: someone who writes code and combines it with statistical
knowledge to explain how the results can be used to solve business
problems.

As a scientific field, data science unifies scientific methods,
processes, algorithms and systems into a set of tools based on
statistics, data analysis and informatics. Data science is
closely related to data mining, machine learning and big data. The
most common tools include:

Linear algorithms

Linear regression

It produces numerical predictions from the best linear fit to a
data set. The resulting model is easy to understand and shows the
biggest drivers of the results. Nonetheless, it can be too simple to
capture more complex relationships among the variables.
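
As a minimal sketch of the idea, the snippet below fits a line by ordinary least squares with NumPy. The synthetic data, the coefficient values and the variable names are assumptions made for illustration; the article itself does not prescribe a library or a data set.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends strongly on
# the first feature, weakly on the second, plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 2.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the fit also learns an intercept.
A = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: the coefficients minimizing ||A @ coef - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
slopes, intercept = coef[:2], coef[2]

# The largest slope in magnitude marks the biggest driver of y
# (comparable here because both features share the same scale).
biggest_driver = int(np.argmax(np.abs(slopes)))
print("slopes:", slopes, "intercept:", intercept)
print("biggest driver: feature", biggest_driver)
```

Reading the fitted slopes directly is what makes the model easy to interpret, although comparing them is only meaningful when the features are on comparable scales.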

Logistic regression

This is an adaptation of linear regression to classification problems.
Like linear regression, it is easy to understand but not powerful
enough to handle complex relationships between the variables.
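
A small sketch of this, assuming scikit-learn is available (the article names no library) and using synthetic labels that follow a linear rule:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): the label is 1 when a
# linear combination of the two features is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)

# The sign and size of each coefficient show how a feature pushes the
# predicted probability toward class 1 or class 0.
accuracy = clf.score(X, y)
probs = clf.predict_proba(X[:3])  # one row per sample: [P(class 0), P(class 1)]
print("coefficients:", clf.coef_[0])
print("training accuracy:", accuracy)
print("class probabilities for three samples:\n", probs)
```

Because the decision boundary is a straight line in feature space, this works well here; labels produced by a curved or interacting rule would need extra features or a more flexible model.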

Principal Component Analysis

It is a data-compression tool based on the correlation among the
data variables: correlated variables are replaced by a smaller set of
uncorrelated components. Its applications include anomaly detection
and prediction. It is often combined with other tools to yield better
results.
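
The compression step can be sketched with NumPy's singular value decomposition. The three correlated variables below are synthetic and the names are illustrative assumptions, not part of the original article.

```python
import numpy as np

# Synthetic data (an assumption for illustration): three correlated
# variables driven by a single hidden factor t, plus small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2.0 * t, -t]) + rng.normal(scale=0.05, size=(500, 3))

# Center the data, then take the SVD; the rows of Vt are the principal
# directions, ordered by how much variance they capture.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Compress three variables into one component, then reconstruct.
scores = Xc @ Vt[:1].T            # shape (500, 1)
X_approx = scores @ Vt[:1] + mean

# Rows that reconstruct poorly do not follow the dominant correlation,
# which is the basis for using PCA in anomaly detection.
recon_error = np.linalg.norm(X - X_approx, axis=1)
print("explained variance ratio:", explained)
print("largest reconstruction error:", recon_error.max())
```

The compressed `scores` can then be passed to another model, which is one way PCA is combined with other tools.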

Tree-based algorithms

Decision tree

This algorithm is composed of a series of yes/no rules based on the
data features, forming a decision tree that covers all the possible
outcomes of the process. It is an easy-to-understand algorithm, but
the tree can become large when handling complex data sets.
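
To make the yes/no rules concrete, here is a short sketch assuming scikit-learn and its bundled Iris data set (both are choices made for this example, not taken from the article). It prints the rules of a shallow tree and compares its size with an unrestricted one.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree: at most two levels of yes/no questions.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(iris.data, iris.target)

# export_text renders the learned rules as indented threshold tests.
rules = export_text(small_tree, feature_names=list(iris.feature_names))
print(rules)
print("training accuracy:", small_tree.score(iris.data, iris.target))

# Without a depth limit the tree keeps splitting until it fits the
# training data, so it ends up with many more nodes.
full_tree = DecisionTreeClassifier(random_state=0)
full_tree.fit(iris.data, iris.target)
print("nodes (depth <= 2):", small_tree.tree_.node_count)
print("nodes (unrestricted):", full_tree.tree_.node_count)
```

Limiting `max_depth` is one common way to keep a tree readable, usually at some cost in accuracy.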

Random forest

It takes advantage of many decision trees, each with rules learned
from the data itself. The individual trees are combined to form a
powerful predictor with better overall performance. It tends to give
high-quality results at the cost of large models that are not easy to
understand.
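
The sketch below, assuming scikit-learn and a synthetic data set from `make_classification` (both illustrative assumptions), trains a single tree and a forest on the same split and counts how many nodes the forest contains.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification problem (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One tree versus an ensemble of 100 trees, each grown on a bootstrap
# sample of the training data with random feature subsets at each split.
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

single_acc = single.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)

# The price of the ensemble: far more rules than anyone can read.
total_nodes = sum(est.tree_.node_count for est in forest.estimators_)
print("single tree accuracy:", single_acc)
print("forest accuracy:", forest_acc)
print("nodes in single tree:", single.tree_.node_count)
print("nodes in forest:", total_nodes)
```

On problems like this the forest typically scores higher than the single tree on held-out data, while the node count shows why the resulting model is hard to inspect by hand.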

Gradient boosting

It builds a sequence of simpler decision trees, each one focused on
the examples that the previous trees predicted poorly. It is a
high-performance tool, but its results are very specific to the
training data: a small change in the feature set can produce radical
changes in the model.
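
A brief sketch of the sequential behaviour, assuming scikit-learn's `GradientBoostingClassifier` and synthetic data (illustrative choices, not the author's setup). It tracks training accuracy as shallow trees are added one at a time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification problem (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 100 shallow trees (depth 2), each fitted to correct the errors left
# by the trees that came before it.
gbm = GradientBoostingClassifier(
    n_estimators=100, max_depth=2, learning_rate=0.1, random_state=0
)
gbm.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each added tree.
staged_acc = [
    float((pred == y_train).mean()) for pred in gbm.staged_predict(X_train)
]
test_acc = gbm.score(X_test, y_test)
print("training accuracy after 1 tree:", staged_acc[0])
print("training accuracy after 100 trees:", staged_acc[-1])
print("test accuracy:", test_acc)
```

Comparing the training and test figures gives a rough sense of how closely the model has adapted to the data it was trained on.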

Neural networks

General neural network models

They consist of interconnected neurons that pass messages to each
other, with layers of neurons stacked on top of one another. These
models can handle extremely complex tasks but are very slow to
train and often have a complex architecture. Neural network models
stand out in image recognition and classification problems.
Nonetheless, their use as predictors is limited, since it is very
hard to understand how they arrive at their outputs.
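
As a small illustration of stacked layers applied to image recognition, the sketch below assumes scikit-learn's `MLPClassifier` and its bundled 8x8 handwritten-digits data set. The layer sizes are arbitrary choices for the example, not recommendations from the article.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images, flattened to 64 pixel values in 0..16.
digits = load_digits()
X = digits.data / 16.0  # scale pixels to the range [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0
)

# Two hidden layers stacked between the 64 inputs and the 10 outputs.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
test_acc = net.score(X_test, y_test)

# Every connection and bias is a learned number; none of them maps to a
# human-readable rule, which is why the model is hard to interpret.
n_weights = sum(w.size for w in net.coefs_) + sum(b.size for b in net.intercepts_)
print("test accuracy:", test_acc)
print("learned parameters:", n_weights)
```

Even this tiny network has 6,570 learned parameters, compared with a handful of coefficients in a linear model, which gives a sense of why such models are slower to train and harder to explain.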
