Homework: Prediction Methods and Machine Learning
Pierre Michel
• Different optimizers available (e.g. stochastic gradient descent “sgd”).
MNIST
The MNIST dataset is built from a subset of the handwritten-digit databases of NIST (National Institute of
Standards and Technology). It is composed of 70000 images of handwritten digits (written by a panel of people),
each labelled with the digit it represents. The database is divided into a training set of 60000 images and a
test set of 10000 images. The small size of the images (28 × 28 pixels, in grayscale) makes it possible to
quickly train neural networks that reach up to 99% accuracy.
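Before feeding MNIST to a dense network, images are usually scaled and flattened, and labels one-hot encoded. The sketch below shows these steps with NumPy only. It uses a small random stand-in array with MNIST's image shape (an assumption, to stay self-contained); in practice the arrays would come from keras.datasets.mnist.load_data(). The helper name preprocess is hypothetical.

```python
import numpy as np

# Hypothetical stand-in data with MNIST's per-image shape (28 x 28, uint8 pixels
# in 0..255). The real arrays would come from keras.datasets.mnist.load_data(),
# whose training split has 60000 images rather than the 1000 used here.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=1000)


def preprocess(images, labels, num_classes=10):
    """Scale pixels to [0, 1], flatten each 28x28 image into a 784-vector,
    and one-hot encode the integer labels."""
    x = images.astype("float32") / 255.0
    x = x.reshape(len(x), -1)                          # (n, 28, 28) -> (n, 784)
    y = np.eye(num_classes, dtype="float32")[labels]   # (n,) -> (n, 10)
    return x, y


x, y = preprocess(x_train, y_train)
print(x.shape, y.shape)  # (1000, 784) (1000, 10)
```

Keras also provides keras.utils.to_categorical for the one-hot step; the NumPy version just makes the transformation explicit.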
Matplotlib
The Matplotlib library is very useful for drawing graphs and viewing images in Python. In deep learning
applications, its main uses are to visualize:
• images from the training set
• classifier predictions
• convolution filters learned by neural networks
• images produced by generative networks
• etc.
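As an illustration of the second use, the sketch below draws a grid of images titled with their true and predicted labels, in green when they match and red otherwise. The images and labels are random placeholders (an assumption); with a trained model they would be test images and the argmax of the network's outputs. The function name show_predictions is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np


def show_predictions(images, y_true, y_pred, n_cols=5):
    """Plot images in a grid, titling each with its true and predicted label."""
    n = len(images)
    n_rows = (n + n_cols - 1) // n_cols
    # squeeze=False always returns a 2-D array of axes, even for a single row
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(2 * n_cols, 2 * n_rows),
                             squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks and frames, including on unused cells
    for ax, img, t, p in zip(axes.flat, images, y_true, y_pred):
        ax.imshow(img, cmap="gray")  # cmap is ignored for RGB images
        ax.set_title(f"true {t} / pred {p}",
                     color="green" if t == p else "red", fontsize=8)
    fig.tight_layout()
    return fig


# Placeholder data: ten 28x28 images, with one deliberately wrong prediction
rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))
y_true = rng.integers(0, 10, size=10)
y_pred = y_true.copy()
y_pred[3] = (y_pred[3] + 1) % 10
fig = show_predictions(images, y_true, y_pred)
fig.savefig("predictions.png")
```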
Exercise 3: Practice on other datasets
You will use the skills acquired in the previous exercises to perform classification tasks on datasets more
complex than MNIST. For example, use images from CIFAR10 to classify color images (beware of the shape of the
input data!), or images from the Fashion-MNIST dataset, which let you apply neural networks to grayscale images
with more complex content.
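The shape warning is concrete: Fashion-MNIST images arrive as (n, 28, 28) arrays, while CIFAR10 images are (n, 32, 32, 3), and Keras Conv2D layers expect (n, height, width, channels) by default ("channels_last"). A minimal sketch, assuming channels-last input and using zero arrays as placeholders; the helper name ensure_channels_last is hypothetical:

```python
import numpy as np


def ensure_channels_last(images):
    """Return images as (n, height, width, channels).

    Grayscale arrays (n, h, w) get a channel axis of size 1;
    colour arrays (n, h, w, 3) are returned unchanged.
    """
    if images.ndim == 3:
        return images[..., np.newaxis]
    if images.ndim == 4:
        return images
    raise ValueError(f"expected a 3-D or 4-D array, got shape {images.shape}")


fashion = np.zeros((5, 28, 28), dtype=np.uint8)   # Fashion-MNIST-like: grayscale
cifar = np.zeros((5, 32, 32, 3), dtype=np.uint8)  # CIFAR10-like: RGB

for name, batch in [("Fashion-MNIST-like", fashion), ("CIFAR10-like", cifar)]:
    x = ensure_channels_last(batch)
    # A dense network needs flat vectors: 28*28*1 = 784 vs 32*32*3 = 3072 features
    print(name, x.shape, "->", x.reshape(len(x), -1).shape[1], "features when flattened")
```

Labels can differ too: keras.datasets.cifar10.load_data() returns labels of shape (n, 1), which y.ravel() turns into the flat (n,) vector that most losses and metrics expect.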
You can also look for other datasets, consider a task other than classification, try to detect the position of
a particular object in an image, or classify or predict events from data other than images (think of text, or
big data problems, for example)... it's up to you to define your objective!
In this context, you will have to:
• Present your objective clearly: which task are you performing?
• Present the dataset used (dimensions, predictors, target, and some descriptive statistics).
• Present the architecture of your neural network and illustrate it.
• Illustrate the prediction performance and results of your network.
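To illustrate prediction performance beyond a single accuracy figure, a confusion matrix shows which classes the network confuses. The sketch below builds one with NumPy from integer labels; the labels are made-up placeholders, and with a Keras model the predictions would typically be model.predict(x_test).argmax(axis=1). The function name confusion_matrix here is our own, not a library call.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix C where C[i, j] is the number of samples of true class i
    predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated pairs
    return cm


# Placeholder labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()                # correct predictions / all
per_class_recall = np.diag(cm) / cm.sum(axis=1)   # correct / true count per class
print(cm)
print(f"accuracy = {accuracy:.3f}")
print("recall per class:", per_class_recall.round(3))
```

For the dataset description, np.bincount(y_train) gives the number of examples per class, a quick check of class balance.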
CIFAR10
The CIFAR10 database (available via Keras) is, like CIFAR100, a labelled subset of the 80 million tiny images
dataset; both were collected by Alex Krizhevsky, Vinod Nair and Geoffrey Hinton (source:
https://www.cs.toronto.edu/~kriz/cifar.html). It contains 60000 color images of size 32 × 32 pixels, labelled
according to 10 classes (6000 images per class). 50000 images are used for the training set and 10000 for the
test set.
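As a starting point for CIFAR10, the sketch below defines a small convolutional network whose input matches the 32 × 32 × 3 images and whose output has one softmax unit per class. It assumes TensorFlow 2.x with its bundled Keras; the number of layers and their widths are illustrative choices, not a tuned or recommended architecture. It is compiled with the "sgd" optimizer and a loss suited to integer labels 0 to 9.

```python
from tensorflow import keras

# Small CNN for 32x32 RGB images and 10 classes (illustrative sizes only)
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Rescaling(1.0 / 255),            # uint8 pixels -> [0, 1]
    keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    keras.layers.MaxPooling2D(),                  # 32x32 -> 16x16
    keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    keras.layers.MaxPooling2D(),                  # 16x16 -> 8x8
    keras.layers.Flatten(),                       # 8*8*64 = 4096 features
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"), # one probability per class
])

# "sgd" selects stochastic gradient descent with default settings;
# integer labels pair with the sparse categorical cross-entropy loss.
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

model.summary() lists each layer's output shape and parameter count, which is a convenient way to present the architecture; by our count this model has 545,098 trainable parameters, most of them in the first dense layer.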