
Appendix I

Humanoid robots
The slides are designed for first-year students, including non-STEM
students.
(Please note that many details have been simplified or omitted for
clarity and ease of understanding.)
Humanoid robots

- One multimodal neural network that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics.
- Runs entirely on embedded low-power GPUs, making it more feasible for real-world applications.

6 August 2024 – Figure 02 aims to enhance productivity in industrial settings, tested at BMW's Spartanburg plant.

21 February 2025 – The Figure company introduced Helix AI, a technology that enables humanoid robots to autonomously identify and manipulate a wide range of unfamiliar household objects in real time.
6 levels of humanoid robot technologies
Level | Name | Description | Example Tasks
L0 | Manual Control | The humanoid robot is entirely controlled by a human operator. | Remote-controlled walking, lifting objects, or performing tasks under direct human input.
L1 | Assisted Operation | The robot can assist with specific tasks but requires constant human oversight. | Balancing while walking, following simple commands (e.g., "pick up this object") with human guidance.
L2 | Semi-Autonomous | The humanoid robot can perform predefined tasks autonomously in constrained environments. | Walking on flat surfaces, performing repetitive tasks like sorting items, or basic object recognition.
L3 | Context-Aware Automation | The robot can perform tasks autonomously but needs human intervention in complex scenarios. | Navigating dynamic environments, carrying items, interacting with people under supervision.
L4 | Highly Autonomous | The robot performs almost all tasks independently in most environments but may still require human input in rare cases. | Complex household chores, assisting the elderly, dynamic problem-solving, or performing skilled labor.
L5 | Fully Autonomous | The humanoid robot can operate fully autonomously in any environment and handle unexpected situations. | Full independence in caregiving, construction, customer service, or any other human-like activity.
Appendix II
High-level ideas of CNN
The illustrations are designed for first-year students, including non-STEM students.

- Out of syllabus -

(Please note that many details have been simplified or omitted for clarity and ease
of understanding.)
From the idea to implement (simplified)

Input (6x6):
0 0 1 0 0 1
0 0 1 0 1 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 1 0 1 0
0 0 1 0 0 1

Filter 1 (3x3):
0 1 0
0 1 0
0 1 0

The sub-area of the input (top-left 3x3):
0 0 1
0 0 1
0 0 1

Element-wise multiplication with Filter 1:
0*0 0*1 1*0
0*0 0*1 1*0
0*0 0*1 1*0
Sum = 0 → feature not present.

A simplified way to do it in math is to multiply each sub-area value by the corresponding filter value and sum the results. What we want is for a higher value in the feature map to indicate higher activation, meaning the feature is more strongly present in that region of the image.
From the idea to implement (simplified)

The next sub-area of the input (rows 1-3, columns 2-4):
0 1 0
0 1 0
0 1 1

Element-wise multiplication with Filter 1:
0*0 1*1 0*0
0*0 1*1 0*0
0*0 1*1 1*0
Sum = 3 → feature strongly present.
From the idea to implement (simplified)

The next sub-area of the input (rows 1-3, columns 3-5):
1 0 0
1 0 1
1 1 0

Element-wise multiplication with Filter 1:
1*0 0*1 0*0
1*0 0*1 1*0
1*0 1*1 0*0
Sum = 1 → feature present a little bit.
From the idea to implement (simplified)

Sliding the 3x3 sub-area across the whole input yields the complete feature map of Filter 1 (4x4):
0 3 1 1
0 3 2 1
0 3 2 1
0 3 1 1

The feature map for Filter 1 is computed, and the highest value indicates where the pattern is most strongly present.
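The sliding-window computation above takes only a few lines of code. Here is a minimal sketch in Python with NumPy (the slides provide no code, so the function name feature_map and all variable names are our own); it reproduces the 4x4 feature map of Filter 1 shown above.

import numpy as np

# The 6x6 binary input image from the slides.
image = np.array([
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
])

# Filter 1: a 3x3 vertical-line detector.
filter1 = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])

def feature_map(image, filt):
    """Slide the filter over the image: multiply each sub-area
    element-wise with the filter and sum the results."""
    fh, fw = filt.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            sub_area = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(sub_area * filt)
    return out

print(feature_map(image, filter1))
# [[0 3 1 1]
#  [0 3 2 1]
#  [0 3 2 1]
#  [0 3 1 1]]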
Neural network? (simplified)

Okay, but how is this possibly related to a neural network?

The 6x6 input is flattened into 36 input values, numbered slot 1 to slot 36, row by row:
1: 0
2: 0
3: 1
4: 0
...
7: 0
8: 0
9: 1
10: 0
...
13: 0
14: 0
15: 1
16: 1
...
36: 1

Each pixel of the input image corresponds to one node in the input layer. There are 36 nodes in the input layer.
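The flattening step can be sketched in one line, continuing the NumPy example above (the 1-based slot numbering follows the slides):

# Flatten the 6x6 image row by row into the 36 input values.
inputs = image.flatten()                  # shape (36,)

# Print each slot number (1-based, as in the slides) with its value.
for slot, value in enumerate(inputs, start=1):
    print(f"{slot}: {value}")             # 1: 0, 2: 0, 3: 1, ..., 36: 1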
Neural network? (simplified)

[Diagram: the 36-node input layer alongside the feature map of Filter 1]

A neuron corresponds to a slot in the feature map of a filter and computes its activation by performing element-wise multiplication and summing the results.
Neural network? (simplified)

[Diagram: one neuron linked to a 3x3 red-marked sub-area of the input]

This neuron focuses on the 3x3 red-marked sub-area and is linked to the corresponding 9 input pixels (instead of linking to all 36 inputs).
Neural network? (simplified)

[Diagram: the neuron's 9 links, labelled with the values of Filter 1]

The filter values are the weights of the neural network (they are obtained through training).
Neural network? (simplified)

[Diagram: the neuron computing 0*0 + 0*1 + 1*0 + 0*0 + 0*1 + 1*0 + 0*0 + 0*1 + 1*0 = 0 for the top-left sub-area]

The activation value of a neuron corresponds to a value in the feature map, computed as the sum of the input values multiplied by their respective weights.
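In code, one neuron's activation is just a dot product between the 9 input pixels it is linked to and its 9 weights. A small sketch continuing the example above:

# The neuron's weights = the values of Filter 1, as a flat vector.
weights = filter1.flatten()               # shape (9,)

# The top-left 3x3 sub-area this neuron is linked to.
sub_area = image[0:3, 0:3].flatten()      # shape (9,)

# Activation = sum of the input values multiplied by their weights.
activation = np.dot(sub_area, weights)
print(activation)                         # 0 -> first slot of the feature map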
Neural network? (simplified)

[Diagram: a second neuron linked to the next 3x3 red-marked sub-area of the input]

As another example, this neuron focuses on the next 3x3 red-marked sub-area and is linked to the corresponding 9 input pixels.
Neural network? (simplified)

[Diagram: the second neuron's 9 links, labelled with the same filter values]

The filter values are the weights of the neural network. This neuron uses the same set of weights, as they also come from Filter 1.
Neural network? (simplified)

[Diagram: the second neuron computing 0*0 + 1*1 + 0*0 + 0*0 + 1*1 + 0*0 + 0*0 + 1*1 + 0*0 = 3]

The activation value of a neuron corresponds to a value in the feature map, computed as the sum of the input values multiplied by their respective weights.
Neural network? (simplified)

Now, there are two very special properties here that make this different from the fully connected neural network that you learned about in Lecture 3. Can you spot the differences?

(There is one such neuron for each slot in the feature map, and for each filter.)
Special property 1 – Local connectivity

Not every input links to every neuron! This is because we focus on recognizing patterns in sub-areas ☺.

Biological Inspiration: this concept is inspired by the visual cortex in the brain, where neurons respond primarily to stimuli in their local receptive fields.
Special property 2 – Weight sharing

Some neurons share exactly the same set of weights! This is because the same filter is applied across different sub-areas of the input image.

In total there are 16 neurons, corresponding to the 16 slots in the feature map of Filter 1, sharing the same set of 9 weights (all detecting the same pattern).
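Both properties can be made concrete by writing the whole layer as the 16x36 weight matrix a fully connected layer would use. In this sketch (our own construction, continuing the NumPy example), each of the 16 rows is one neuron: only the 9 entries for its linked pixels are filled in (local connectivity), and every row reuses the same 9 filter values (weight sharing).

# Build the 16x36 weight matrix equivalent to sliding Filter 1
# over the 6x6 image (one row per feature-map slot).
W = np.zeros((16, 36))
for i in range(4):                        # feature-map row
    for j in range(4):                    # feature-map column
        neuron = i * 4 + j
        for fi in range(3):               # filter row
            for fj in range(3):           # filter column
                pixel = (i + fi) * 6 + (j + fj)   # 0-based flattened slot
                W[neuron, pixel] = filter1[fi, fj]

# Multiplying by the flattened input reproduces the feature map:
print((W @ image.flatten()).reshape(4, 4))
# [[0. 3. 1. 1.]
#  [0. 3. 2. 1.]
#  [0. 3. 2. 1.]
#  [0. 3. 1. 1.]]

Each row touches only 9 of the 36 inputs, and all 16 rows are shifted copies of the same 9 numbers, which is exactly what the two special properties say.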
Making good use of these properties

Property 1. Local connectivity
- Reduces the number of trainable parameters, making training more efficient.
- Allows for hierarchical feature learning, where simple features are detected first and combined into more complex patterns.

Property 2. Parameter sharing
- Further reduces the number of parameters in the network. Instead of learning a unique set of weights for every connection (as in fully connected layers), the same set of weights (the filter) is used across different sub-areas of the input (see the count sketched below).
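To put numbers on the saving for this toy example (our own arithmetic, not from the slides): a fully connected layer from the 36 inputs to the 16 feature-map slots needs a separate weight per connection, while the convolutional layer needs only the 9 filter values.

# Fully connected: each of the 16 neurons has its own 36 weights.
fully_connected_params = 16 * 36          # 576 weights (biases ignored)

# Convolutional: all 16 neurons share the same 3x3 filter.
conv_params = 3 * 3                       # 9 weights (biases ignored)

print(fully_connected_params, conv_params)   # 576 9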
One more idea - Pooling

Pooling: even if we shrink the image by picking only the most important parts, we can still recognize the object.

[Diagram: an image shrunk by pooling, with the object still recognizable]
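A common way to "pick only the most important parts" is max pooling: split the feature map into small windows and keep only the largest value in each. A minimal sketch continuing the NumPy example (the 2x2 window size is our assumption; the slides do not fix one):

def max_pool(fmap, size=2):
    """Keep only the maximum value in each size x size window."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size), dtype=fmap.dtype)
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = fmap[i:i + size, j:j + size].max()
    return out

fmap = feature_map(image, filter1)        # the 4x4 map from before
print(max_pool(fmap))
# [[3 2]
#  [3 2]]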
Convolutional Neural Networks (CNNs)

A CNN incrementally extracts more complex and abstract features, then performs classification with a fully connected neural network on the extracted features.

Input → convolution and pooling layers → fully connected classifier → Output (e.g., 97% Cat, ... 2% Ocelot, 1% Mongoose)
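Putting the pieces together, here is a minimal sketch of such a CNN in PyTorch (our own illustration; the slides name no framework, and the layer sizes, input shape, and class count are assumptions): convolution layers extract features, pooling shrinks them, and a fully connected layer turns the extracted features into class probabilities.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=3):         # e.g., cat / ocelot / mongoose
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3),    # filters = trained weights
            nn.ReLU(),
            nn.MaxPool2d(2),                   # pooling: keep the important parts
            nn.Conv2d(8, 16, kernel_size=3),   # deeper layer: more abstract features
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(num_classes),        # fully connected classification
            nn.Softmax(dim=1),                 # class probabilities (e.g., 97% cat)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyCNN()
probs = model(torch.randn(1, 1, 28, 28))       # one 28x28 grayscale image
print(probs)                                   # three probabilities summing to 1

(In real training one would drop the Softmax and use CrossEntropyLoss, but for illustrating the idea the probabilities are the point.)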
