Machine Learning - course

The document provides an overview of machine learning (ML) and its relationship with artificial intelligence (AI), detailing various types of ML, including supervised, unsupervised, reinforcement, and transfer learning. It discusses the importance of data collection, modeling, and deployment in ML projects, as well as the significance of feature engineering and evaluation metrics. Additionally, it explains the role of tools like Anaconda and Conda in managing environments for data science projects.

Machine Learning - with machines, computers can do tasks for us, but how do we tell
a computer how to describe things? Things that computers couldn't do before, they
can do now with machine learning. Put simply, the goal of machine learning is to
make machines act more like humans, based on data and algorithms

AI - human intelligence exhibited by machines; a machine that acts like a human


narrow AI - machines can be better than humans at specific tasks, but only at one
thing at a time
machine learning is a subset of AI, an approach to achieving AI through data; the
science of getting computers to act without being explicitly programmed
deep learning - one technique for implementing machine learning, a type of
algorithm

data science - overlaps with ML; analyzing data and doing something with it

machine learning facilitates business decision making: machines are better at
making business decisions and predictions based on massive amounts of data

ML - predicting results based on incoming data

TYPES OF MACHINE LEARNING


Supervised -- classification (deciding which images are cats and which are dogs,
i.e. deciding something when we already have groups), regression (predicting stock
prices) --> the data already has categories

Unsupervised -- clustering (we -- or rather the machines -- have to create the
groups based on the similarity of the data), association rule learning (associating
different things to make predictions, based on relationships between variables in
the data) --> data without labels, e.g. a csv without column names

Reinforcement --> teaching machines through trial and error; the machine, for
example, plays a game millions of times to learn how to win

ML - uses algorithms to find patterns in data and learns from them to predict
something in the future
an ML algorithm starts with an input and an ideal output; it looks at the input and
the output and tries to figure out the set of instructions in between the two

find patterns in data so that we can use them in the future to predict something
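a tiny sketch of that idea - the hidden rule, the toy data and the helper name
fit_line are all made up; the "set of instructions" here is just a line
y = w*x + b recovered by least squares:

```python
# Toy illustration: given inputs and ideal outputs, "learn" the rule between them.
# Here the hidden rule is y = 2x + 1; least squares recovers it from the examples.

def fit_line(xs, ys):
    """Find w and b minimising squared error for y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

inputs = [1, 2, 3, 4, 5]
ideal_outputs = [3, 5, 7, 9, 11]   # produced by the hidden rule y = 2x + 1
w, b = fit_line(inputs, ideal_outputs)
print(w, b)                        # recovers roughly w = 2.0, b = 1.0
print(w * 10 + b)                  # the learned rule applied to unseen input
```

once the rule is learned, it can be applied to inputs the algorithm has never seen - which is the whole point of finding the patterns.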

ML framework - a template we can use to think about how to solve problems

////////////////////////////////////////////////////////////////////////////////

Data Collection - Data Modelling - Deployment

1. Create a framework
2. Match the framework to viable ML/DS tools
3. Learn by doing

Modelling - using ML algorithms to find insights within a dataset


Deployment - taking your set of instructions and using it in an app. This can be
anything from recommending products to customers on your online store to a hospital
trying to better predict disease presence.

Supervised learning, is called supervised because you have data and labels. A
machine learning algorithm tries to learn what patterns in the data lead to the
labels.

Unsupervised learning is when you have data but no labels. The data could be the
purchase history of your online video game store customers. Using this data, you
may want to group similar customers together so you can offer them specialised
deals. You could use a machine learning algorithm to group your customers by
purchase history.

After inspecting the groups, you provide the labels. There may be a group
interested in computer games, another group who prefer console games and another
which only buys discounted older games. This is called clustering.

What’s important to remember here is the algorithm did not provide these labels. It
found the patterns between similar customers and using your domain knowledge, you
provided the labels.
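a minimal sketch of that clustering idea, assuming made-up customer numbers
(games bought, average spend) and a hand-rolled k-means instead of a library one -
note the algorithm only forms the groups; naming them is still up to you:

```python
# Minimal k-means sketch on hypothetical purchase data.
def kmeans(points, k, iters=10):
    centroids = points[:k]                       # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

customers = [(1, 5), (2, 4), (1, 6),        # low-volume buyers
             (9, 40), (10, 42), (8, 38)]    # high-volume buyers
centroids, clusters = kmeans(customers, k=2)
print(clusters)   # the two groups emerge from the data alone, unlabeled
```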

Transfer learning is when you take the information an existing machine learning
model has learned and adjust it to your own problem.

https://www.mrdbourke.com/a-6-step-field-guide-for-building-machine-learning-projects/

Structured data — Think a table of rows and columns, an Excel spreadsheet of
customer transactions, a database of patient records. Columns can be numerical,
such as average heart rate, categorical, such as sex, or ordinal, such as chest
pain intensity.

Unstructured data — Anything not immediately able to be put into row and column
format, images, audio files, natural language text.

Static data — Existing historical data which is unlikely to change. Your company's
customer purchase history is a good example.

Streaming data — Data which is constantly updated: older records may be changed,
newer records are constantly being added, e.g. news headlines.

1. Problem definition - what problem are we trying to solve


2. Data - what kind of data do we have
3. Evaluation - what defines success for us
4. Features - what do we already know about the data (body weight etc. our goal is
to turn those features into patterns to make predictions)
5. Modelling - what ML model should we use
6. Experimentation - How could we improve, what can we try next

ML isn't a solution for everything


Will a simple hand-coded instruction based system work?

Supervised/Unsupervised/Transfer/Reinforcement Learning
data + labels; the ML algorithm uses the data to match the labels
classification - predicting whether one thing is one category or another; with more
than two options it's multi-class classification
regression - trying to predict a number

data with no labels
e.g. the purchase history of your clients: find patterns in the data and match
similar things together
a clustering algorithm matches similar data based on patterns, and based on that
you label the groups

Transfer Learning - reusing what one ML model has learned in another ML model


we can find a model that distinguishes cars and modify it to recognise dog breeds,
using the foundational patterns from the first model in the second one

Reinforcement Learning - learning through trial and error, until it wins, for
example, in a game

SL: I know the inputs and outputs


UL: I'm not sure of the outputs but I have inputs
TL: I think my problem may be similar to something else - can I use another ML
model and apply it to my problem?

Different types of evaluation metrics


Classification            Regression                   Recommendation
Accuracy                  Mean absolute error          Precision at K (top K predictions)
Precision                 Mean squared error
Recall                    Root mean squared error
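plain-Python sketches of the metrics above (binary labels, with 1 meaning the
positive class; the function names are just for illustration):

```python
import math

def accuracy(y_true, y_pred):
    # fraction of predictions that match the true labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    # of everything predicted positive, how much really was positive?
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / sum(p == 1 for p in y_pred)

def recall(y_true, y_pred):
    # of everything actually positive, how much did we catch?
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / sum(t == 1 for t in y_true)

def mae(y_true, y_pred):
    # mean absolute error: average size of the misses
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # root mean squared error: punishes big misses more than small ones
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
print(mae([3.0, 5.0], [2.0, 7.0]))            # 1.5
```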

Features ------>>>>> the different kinds of data within our structured/unstructured
data
feature variables
we wanna use the feature variables to predict the target variables
numerical features
categorical features (one thing or another, like sex etc.)

Feature engineering - looking at different features of the data and creating new
ones/altering existing ones
unstructured data has features as well: dog images, for example, have 4 legs,
rectangular shapes etc.

what features should you use? - only ones where samples carry similar information

Feature coverage - how many samples have each feature? Ideally, every sample
has the same features
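a quick sketch of checking feature coverage, with made-up patient records where
None marks a missing value:

```python
# Hypothetical patient records; None = feature missing for that sample.
samples = [
    {"weight": 70, "sex": "M", "chest_pain": 2},
    {"weight": 62, "sex": "F", "chest_pain": None},
    {"weight": None, "sex": "F", "chest_pain": 1},
    {"weight": 81, "sex": "M", "chest_pain": 3},
]

def feature_coverage(rows):
    """For each feature, the fraction of samples that actually have a value."""
    features = {key for row in rows for key in row}
    return {
        f: sum(row.get(f) is not None for row in rows) / len(rows)
        for f in sorted(features)
    }

print(feature_coverage(samples))
# chest_pain and weight each have one missing value -> 0.75 coverage; sex -> 1.0
```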

Modelling - 3 sets
based on our problem and data, what ML model should we use?

1. Choosing and training a model -> training data
2. Tuning a model -> validation data
3. Model comparison -> test data

The most important concept in ML: the training, validation and test sets (3 sets -
1 for training, 1 for validation and tuning, 1 for testing and comparing)

Course materials (training set) -> Practice exam (validation set) -> Final exam
(test set)
Generalization - the ability of an ML model to perform well on data it hasn't seen
before
we want to avoid memorization, i.e. training our model on data it's seen before, so
we split the data:
70-80% training split, 10-15% validation split (where we check results, tune the
model and improve it), 10-15% test split
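a minimal sketch of that split, assuming a 70/15/15 ratio and a fixed random seed
for reproducibility:

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    data = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)   # fixed seed -> reproducible split
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = data[:n_test]                # carved out first, never touched again
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```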

Structured data ---> CatBoost, XGBoost (dmlc), gradient boosting, random forests
tend to work best
Unstructured data ---> deep learning, transfer learning, neural networks work best

minimize the time between experiments


try new things
start small and build up

tuning can happen on both the training set (if there's no validation set) and the
validation set
ml models have hyperparameters we can adjust
a model's first results aren't its last
improve the model

underfitting: training 64%, test 47%


overfitting: training 93%, test 99%
if there's a drastic difference between the training and the test results, we have
underfitting or overfitting
a good model fits just right - balanced, the goldilocks zone

training 98%
test 86% accuracy --> that's ok
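a rough rule-of-thumb version of that check - the thresholds here are illustrative
guesses, not standard values:

```python
def diagnose_fit(train_acc, test_acc, low=0.7, gap=0.1):
    """Crude heuristic for spotting fitting problems from two accuracy scores."""
    if train_acc < low and test_acc < low:
        return "underfitting"            # poor on both sets
    if train_acc - test_acc > gap:
        return "overfitting"             # great on training, much worse on test
    if test_acc - train_acc > gap:
        return "check for data leakage"  # test suspiciously better than training
    return "balanced"

print(diagnose_fit(0.64, 0.47))  # underfitting
print(diagnose_fit(0.99, 0.80))  # overfitting
print(diagnose_fit(0.98, 0.93))  # balanced
```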

why under/overfitting?
data leakage - test data leaks into training data
the test set always stays the same - don't touch the test set!!!

data mismatch - when the data we test on is different from the data we train on,
e.g. having different features in the testing and training data

the data should have the same features, the same kind of data


how to fix it (underfitting)?
try a more advanced model
increase model hyperparameters
reduce the amount of features
train longer

overfitting
collect more data
try a less advanced model

when comparing 2 models, we compare apples to apples, oranges to oranges - the same
data
is the extra accuracy worth the longer training time? etc.

head towards generality


keep the test set separate at all costs
compare apples to apples - models created in the same sort of environment
one best performance metric does not equal the best model
anaconda -> jupyter notebook, numpy etc.

WHAT IS CONDA?
ANACONDA/MINICONDA/CONDA
HARDWARE STORE/WORKBENCH/PERSONAL ASSISTANT
ANACONDA AND MINICONDA ARE SOFTWARE DISTRIBUTIONS; THEY COME WITH PACKAGES - CODE
OTHER PEOPLE HAVE WRITTEN

ANACONDA ~ 3GB - it installs all of the major data science packages, many of which
don't get used
MINICONDA - just the most useful packages, minimal requirements, ~ 200MB
CONDA - comes with both; it can help me set up my tools and my workbench etc.
conda is a package manager: it helps me download, install and manage packages, and
has the ability to create environments

miniconda + conda = python + matplotlib, numpy, pandas, scikit-learn, tensorflow


etc.

environment ---- collection of tools and packages we might want to use in our
project

conda is what we use to create the environment and install and update the tools we
wanna use

conda allows us to share the same tools and packages with others, we can just send
the project folder with the environment etc.

we don't wanna run into dependency issues

base is the default environment

./env within the sample project folder
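a possible workflow for such a project-local environment (the package list and the
./env path are just examples):

```shell
# create an environment inside the project folder (./env is just a convention)
conda create --prefix ./env pandas numpy matplotlib scikit-learn

# activate it - the prompt changes from (base) to the env path
conda activate ./env

# when you're done, go back to the base environment
conda deactivate
```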

that's where option 2 comes in: a .yml file is basically a text file with
instructions telling Conda how to set up an environment.

For example, to export the environment we created earlier at


/Users/daniel/Desktop/project_1/env as a YAML file called environment.yml we can
use the command:

conda env export --prefix /Users/daniel/Desktop/project_1/env > environment.yml

Finally, to create an environment called env_from_file from a .yml file called


environment.yml, you can run the command:

conda env create --file environment.yml --name env_from_file

jupyter notebook keyboard shortcuts (press esc first to enter command mode):
m - turn a cell into markdown
y - turn a cell into code
enter - go back into the cell (edit mode)
a - add a cell above
b - add a cell below
dd - delete a cell

when we exit a notebook and open it again, all the cells have to be executed again

ML - subdomain of CS, subset of AI that focuses on algorithms which help a computer


learn from data without explicit programming;
a subset of AI that tries to solve a specific problem and make predictions using data

AI - area of CS, where the goal is to enable computers to act like humans & perform
human-like tasks

DS - field that attempts to find insights from data(might use ML)

SL - uses labeled inputs (each input has a corresponding output label) to train
models and learn outputs
UL - uses unlabeled data to learn about patterns in the data and group similar data
together (e.g. clustering)
RL - an agent learning in an interactive environment based on rewards and penalties

feature vector - inputs

qualitative - categorical data (a finite number of categories or groups), gender etc.


nominal data (no inherent order) -> one-hot encoding: "1" if it matches the
category, "0" otherwise

ordinal data (inherent order), e.g. bad, not so good, mediocre, good, great -
give each a number: 1 = bad ... 5 = great
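tiny sketches of both encodings - the helper name one_hot and the rating scale
mapping are just illustrations:

```python
def one_hot(value, categories):
    """Nominal data: no order, so each category gets its own 0/1 slot."""
    return [1 if value == c else 0 for c in categories]

# Ordinal data: the order matters, so a plain integer scale works.
RATING_SCALE = {"bad": 1, "not so good": 2, "mediocre": 3, "good": 4, "great": 5}

print(one_hot("cat", ["cat", "dog", "lizard"]))  # [1, 0, 0]
print(RATING_SCALE["great"])                     # 5
```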

quantitative data - numerical valued data (discrete or continuous)

TYPES OF PREDICTIONS
SL:
classification - predict discrete classes (this is a dog, this is a cat etc.)
multi-class classification - cat/dog/lizard etc. (more than two options)
binary classification - hot dog/not hot dog, cat/dog, positive/negative

regression - predict continuous values, f.e. the price of a house etc.

each row is a different sample in the data


each column represents a different feature

ideal outputs - targets vector

whole data set - features matrix
