AI Report
Abstract—The Artificial Intelligence Lab report presents a comprehensive exploration and discussion of solutions to Artificial Intelligence challenges using established software libraries and techniques that have been the focal point of our coursework. This report delves into the practical application of these concepts, showcasing how they can address real-world problems. It covers a wide range of topics, including graph search, heuristic functions, the Travelling Salesman Problem, non-deterministic search, simulated annealing, MINIMAX, alpha-beta pruning, Bayesian networks, hidden Markov models, decision trees, Hopfield networks, and the n-armed bandit problem. In this report, we also discuss and develop algorithms and design solutions using popular Python libraries.
In summary, the Artificial Intelligence Lab Report offers a valuable resource for understanding the practical application of Artificial Intelligence concepts and tools. It sheds light on the challenges and opportunities in this field, emphasizing the relevance of coursework materials in addressing real-world problems.

I. INTRODUCTION

ARTIFICIAL Intelligence research involves developing systems that can mimic human intelligence to perform tasks such as understanding natural language, recognizing patterns, making decisions, and learning from data. An AI lab focuses on experimenting with various AI techniques and algorithms to address challenges in creating intelligent systems.

This report provides a summary of experiments conducted in an AI lab, aimed at addressing the key challenges faced by AI systems and implementing solutions to overcome them. These experiments cover a range of topics within AI, including machine learning, natural language processing, computer vision, robotics, and more.

The experiments in the AI lab involve tasks such as importing and preprocessing data, training and evaluating machine learning models, designing algorithms for decision-making and problem-solving, and integrating AI systems into real-world applications. The goal is to develop AI systems that can effectively understand and respond to user needs, while also addressing challenges such as accuracy, efficiency, scalability, and adaptability to new data and environments.

II. EXPERIMENT-7: MATCHBOX EDUCABLE NOUGHTS AND CROSSES ENGINE (WEEK 7)

A. Objective
• Maximize the expected reward by balancing exploration (trying new arms) and exploitation (selecting known high-reward arms) in the context of a binary bandit with two rewards.
• Use an epsilon-greedy algorithm to decide upon the action to take for maximizing the expected reward.
• Develop a modified epsilon-greedy agent to track non-stationary rewards in a 10-armed bandit and show whether it is able to latch onto correct actions or not after at least 10,000 time steps.

B. Introduction
The lab assignment explores the N-armed bandit problem, a classic reinforcement learning problem. The goal is to identify which lever to pull in order to maximize the cumulative reward over repeated trials. This section discusses the exploration vs. exploitation dilemma: exploration involves trying different arms to learn about their reward distributions, while exploitation involves choosing the greedy action, i.e., selecting the arm believed to have the highest expected reward based on the available information. The assignment also involves developing a 10-armed bandit in which all ten mean rewards start out equal and then take independent random walks, implementing an epsilon-greedy agent to track the resulting non-stationary rewards, and analyzing the results after at least 10,000 time steps.

C. Problem statement
1) Read the reference on MENACE by Michie and check for its implementations. Pick the one that you like the most and go through the code carefully. Highlight the parts that you feel are crucial. If possible, try to code MENACE in any programming language of your liking.

METHODOLOGY

The MENACE experiment employs a reinforcement learning framework using a physical representation (matchboxes and beads) to learn the game of Noughts and Crosses (Tic-Tac-Toe). The primary components of the methodology include:
1) Physical Model:
• Each matchbox represents a unique game state, with colored beads signifying potential moves.
• The number of beads for each move indicates the likelihood of choosing that move.
2) Learning Mechanism:
• MENACE learns via trial and error, where it rewards successful moves by adding beads and penalizes unsuccessful ones by removing beads.
• The agent starts with an equal probability of selecting any move and adjusts its strategy based on past game outcomes.
3) State Representation and Action Selection:
• Game states are encoded into matchboxes, each containing bead counts for available moves.
• Moves are selected randomly based on the weighted counts of beads (see the sketch after this list).
4) Reinforcement Process:
• After each game, MENACE updates the bead counts in the relevant matchbox based on whether it won or lost.
5) Simulation:
• After testing the physical model, MENACE was implemented in a digital format on a Pegasus 2 computer, allowing for programmatic handling of states and moves.
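The bead-and-matchbox scheme above maps naturally onto a dictionary keyed by the board state. The short Python sketch below shows one possible digital representation together with the weighted move selection of item 3; it is a minimal illustration rather than MENACE's original code, and the class name Menace, the string board encoding, and the initial bead count of four are assumptions made here.

import random

class Menace:
    """Minimal sketch of a digital MENACE: one 'matchbox' of beads per board state."""

    def __init__(self, initial_beads=4):
        # Maps board state -> {move: bead count}; boxes are created lazily.
        self.matchboxes = {}
        self.initial_beads = initial_beads

    def _get_matchbox(self, state, legal_moves):
        # Every legal move starts with the same number of beads,
        # so all moves are initially equally likely.
        if state not in self.matchboxes:
            self.matchboxes[state] = {m: self.initial_beads for m in legal_moves}
        return self.matchboxes[state]

    def choose_move(self, state, legal_moves):
        # Draw a move with probability proportional to its bead count.
        box = self._get_matchbox(state, legal_moves)
        moves = list(box)
        weights = [box[m] for m in moves]
        return random.choices(moves, weights=weights, k=1)[0]

# Example: empty Tic-Tac-Toe board as a 9-character string, cells 0-8 as moves.
agent = Menace()
print("Opening move:", agent.choose_move("." * 9, legal_moves=list(range(9))))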
EXECUTION STEPS

1) Initialization:
• Set up the matchboxes for all possible game states with initial bead counts representing potential moves.
2) Game Simulation:
• Run multiple games of Noughts and Crosses, allowing MENACE to select moves based on the bead counts and adjusting the counts after each game.
3) Move Selection:
• For each game state, MENACE randomly selects a move from the matchbox, weighted by the number of beads for each move.
4) Updating Matchboxes:
• After each game, update the bead counts in the matchbox corresponding to the game state based on the game outcome (win or loss); a code sketch of this update follows the list.
5) Data Collection:
• Track the performance of MENACE over multiple games to analyze its learning process and effectiveness.
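Step 4 above, the per-game reinforcement, can be sketched as a standalone function operating on the same matchbox dictionary used in the earlier sketch. The bead amounts (add three for a win, one for a draw, remove one for a loss) follow the scheme commonly attributed to Michie's MENACE, but the function name, the board encoding, and the floor of one bead are assumptions made for illustration.

def reinforce(matchboxes, history, outcome, win_beads=3, draw_beads=1, loss_beads=1):
    """Adjust bead counts after one finished game.

    matchboxes: dict mapping board state -> {move: bead count}
    history:    list of (state, move) pairs MENACE played during the game
    outcome:    "win", "draw", or "loss" from MENACE's point of view
    """
    for state, move in history:
        box = matchboxes[state]
        if outcome == "win":
            box[move] += win_beads        # reward every move along the winning game
        elif outcome == "draw":
            box[move] += draw_beads       # small reward for a draw
        else:
            # Punish by removing a bead, but keep at least one so the move
            # is never ruled out entirely (a practical safeguard).
            box[move] = max(1, box[move] - loss_beads)

# Example: reinforce a single recorded move after a win (assumed board encoding).
matchboxes = {"." * 9: {0: 4, 4: 4, 8: 4}}
reinforce(matchboxes, history=[("." * 9, 4)], outcome="win")
print(matchboxes["." * 9])   # move 4 now holds 7 beads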
RESULTS

1) Performance Improvement:
• Over time, MENACE demonstrated an ability to select winning moves more frequently as it accumulated experience.
• The number of winning games increased, reflecting a successful adaptation to the game dynamics through reinforcement learning.
2) Bead Count Adjustments:
• The count of beads for winning moves increased, making them more likely to be selected in subsequent games, while losing moves saw a reduction in their bead counts.
3) Learning Behavior:
• Initial exploration was observed, with moves selected more randomly. As MENACE gained experience, it began to exploit the knowledge gained from previous games, favoring successful strategies.

CONCLUSION

The MENACE experiment serves as a pioneering example of reinforcement learning implemented in a physical format, illustrating how trial-and-error learning can lead to improved decision-making over time. By leveraging a simple yet effective method of reward and punishment through bead counts in matchboxes, MENACE effectively learns to play Noughts and Crosses. The simulation on the Pegasus 2 computer demonstrated that reinforcement learning principles can be applied digitally, providing insights into the learning process of intelligent agents. This foundational work laid the groundwork for future advancements in machine learning and artificial intelligence, highlighting the significance of exploration and exploitation in adaptive systems.

D. Problem statement
1) Consider a binary bandit with two rewards: 1 (success) and 0 (failure). The bandit returns 1 or 0 for the action that you select, i.e. 1 or 2. The rewards are stochastic (but stationary). Use an epsilon-greedy algorithm discussed in class and decide upon the action to take for maximizing the expected reward. Two binary bandits, named binaryBanditA.m and binaryBanditB.m, are waiting for you.
2) Develop a 10-armed bandit in which all ten mean rewards start out equal and then take independent random walks (by adding a normally distributed increment with mean zero and standard deviation 0.01 to all mean rewards on each time step).
3) The 10-armed bandit that you developed is difficult to crack with a standard epsilon-greedy algorithm since the rewards are non-stationary. We did discuss how to track non-stationary rewards in class. Write a modified epsilon-greedy agent and show whether it is able to latch onto correct actions or not (see the sketch after this list).
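Items 2 and 3 of the problem statement can be prototyped directly. The Python sketch below is a minimal illustration under a few assumptions made here: rewards are drawn with unit-variance Gaussian noise around the drifting means (the standard textbook setup), the constant step size alpha = 0.1 is one common choice for tracking non-stationary rewards rather than a value prescribed by the assignment, and all function and variable names are invented for this sketch.

import random

def run_nonstationary_bandit(steps=10000, k=10, epsilon=0.1, alpha=0.1, sigma=0.01):
    """Modified epsilon-greedy agent on a k-armed bandit with drifting mean rewards."""
    true_means = [0.0] * k      # all ten mean rewards start out equal
    q = [0.0] * k               # the agent's action-value estimates
    optimal_picks = 0

    for _ in range(steps):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.randrange(k)
        else:
            action = max(range(k), key=lambda a: q[a])

        # Reward drawn around the current (drifting) mean of the chosen arm.
        reward = random.gauss(true_means[action], 1.0)

        # Constant step-size update: recent rewards dominate, so the estimate
        # can follow a non-stationary mean (unlike a plain sample average).
        q[action] += alpha * (reward - q[action])

        if action == max(range(k), key=lambda a: true_means[a]):
            optimal_picks += 1

        # Independent random walk of every mean (increment: mean 0, std 0.01).
        for a in range(k):
            true_means[a] += random.gauss(0.0, sigma)

    return optimal_picks / steps

print("Fraction of steps on the currently best arm:",
      round(run_nonstationary_bandit(), 2))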
E. Methodology

The methodology section details the operational aspects and procedures inherent in the Bandit problem:
1) Exploration vs Exploitation: Exploration entails venturing into unfamiliar territories, options, or strategies to discover potential benefits or opportunities, whereas exploitation sticks with the option currently believed to give the highest reward.
2) N-arm Bandit: The problem involves repeatedly choosing between N different options, each with an unknown reward distribution. The approach typically involves a balance between exploration (trying different options to learn their rewards) and exploitation (choosing the currently best-known option to maximize short-term rewards), often implemented using algorithms like epsilon-greedy, UCB (Upper Confidence Bound), or Thompson sampling.
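For context on the alternatives named above, the snippet below sketches UCB1-style action selection, which adds an optimism bonus that shrinks as an arm is sampled more often. It is illustrative only; the experiments in this report use epsilon-greedy, and the function name and the exploration constant c = 2.0 are arbitrary choices made here.

import math

def ucb_select(q, counts, t, c=2.0):
    """Pick the arm maximizing Q(a) + c * sqrt(ln t / N(a)).

    q:      list of estimated action values
    counts: list of pull counts per arm
    t:      current (1-based) time step
    """
    # Try every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(q)),
               key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]))

Thompson sampling, by contrast, maintains a posterior distribution over each arm's reward and samples from those posteriors when choosing an action.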
F. Execution Steps

The following steps outline the execution process for comparing the binary bandits with different exploration-exploitation strategies (a Python sketch of the procedure follows the list):
1) Initialize parameters:
• Set the number of time steps T = 50.
• Define the exploration rates ϵ = [0.01, 0.1, 0.3].
2) Define Bandit A and Bandit B:
• For Bandit A: p = [0.1, 0.2].
• For Bandit B: p = [0.8, 0.9].
3) For each exploration rate ϵ:
• Initialize empty arrays to store rewards for Bandit A and Bandit B.
4) At each time step t from 1 to 50:
• For each bandit:
– Choose an action:
∗ With probability ϵ, choose a random action.
∗ Otherwise, choose the action with the highest estimated reward.
– Receive a reward based on the chosen action's success probability.
– Update the estimated reward for the chosen action.
– Store the obtained reward in the respective array.
5) Calculate average rewards:
• Calculate the average reward for Bandit A and Bandit B across all time steps for each exploration rate.
6) Compare results:
• Analyze the average rewards obtained for Bandit A and Bandit B under different exploration-exploitation strategies to determine their effectiveness.
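A compact Python sketch of this procedure is given below. It assumes Bernoulli stand-ins for binaryBanditA.m and binaryBanditB.m with the success probabilities from step 2; the function name and the sample-average value estimates are choices made for this sketch rather than details prescribed by the lab.

import random

def epsilon_greedy_run(p_success, epsilon, steps=50):
    """Average reward of an epsilon-greedy agent on one binary bandit."""
    k = len(p_success)
    q = [0.0] * k                  # estimated reward of each action
    n = [0] * k                    # pull counts
    rewards = []
    for _ in range(steps):
        if random.random() < epsilon:
            action = random.randrange(k)                 # explore
        else:
            action = max(range(k), key=lambda a: q[a])   # exploit
        reward = 1 if random.random() < p_success[action] else 0
        n[action] += 1
        q[action] += (reward - q[action]) / n[action]    # sample-average update
        rewards.append(reward)
    return sum(rewards) / len(rewards)

# Compare Bandit A and Bandit B for each exploration rate over T = 50 steps.
bandits = {"Bandit A": [0.1, 0.2], "Bandit B": [0.8, 0.9]}
for eps in [0.01, 0.1, 0.3]:
    for name, p in bandits.items():
        print(f"{name}, epsilon = {eps}: average reward = "
              f"{epsilon_greedy_run(p, eps):.2f}")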
G. Results
• The average rewards obtained for Bandit A and Bandit B under the different exploration-exploitation strategies are summarized below.
• For Bandit B, the probability of success for each arm is p = [0.8, 0.9].

H. Conclusion
• The N-arm bandit experiment provided valuable insights into the effectiveness of different exploration-exploitation strategies.
• Through experimentation with various exploration rates (ϵ), we observed distinct trade-offs between exploration and exploitation.
• The results indicate that a balanced approach (ϵ = 0.1) often yields competitive performance, leveraging both exploration and exploitation effectively.
• High exploration rates (ϵ = 0.3) showed potential for discovering new options but at the expense of immediate rewards, while low exploration rates (ϵ = 0.01) leaned heavily towards exploiting known options, risking overlooking better alternatives.
• Overall, the N-arm bandit experiment underscores the importance of carefully tuning exploration-exploitation strategies to optimize cumulative rewards over time.

I. References
1) D. Michie, "Experiments on the mechanization of game-learning. Part I: Characterization of the model and its parameters," The Computer Journal, vol. 6, no. 3, 1963.