
MDP - Thesis - Assignment-Ankit Khatri

The document describes a proposed experiment to test whether human metacognitive learning follows reinforcement learning mechanisms. The experiment would use a Mouselab MDP paradigm where participants plan routes between nodes to maximize rewards. Data on cumulative rewards would be collected across 12 trials with varying reward structures to test for effects of scarcity and misalignment. The data would be analyzed by plotting reward trends to compare performance under different conditions and test predictions from reinforcement learning. A sample size would be determined using confidence intervals and would include participants of different ages and experience levels.

Uploaded by

ANKIT KHATRI
Copyright
© All Rights Reserved
Assignment

Understanding human metacognitive learning at the Max Planck Institute for Intelligent Systems, Rationality Enhancement Group

Submitted by: Ankit Khatri

Task 1: We want to test whether human metacognitive learning follows a reinforcement learning
mechanism by testing the three predictions (scarcity, delay, misalignment). Please select one of them and
describe how you would test it by answering the following questions:

1. Which experimental design would you use and what would be the procedures and materials
of the experiment?

Ans. I have considered two situations for this assignment. The first is scarcity, where many decisions go unrewarded. The second is misalignment, where rewards are determined more by luck than by the quality of the plan. Reinforcement learning struggles to learn in both scenarios. I have designed the experiment described below to test whether these situations affect human metacognitive learning in the same way, which would indicate that it follows a reinforcement learning mechanism. I aim to show that in the scarcity and misalignment conditions, rewards grow less prominently as the participant continues to play.

Experiment. People's metacognitive behavior and learning patterns are very difficult to observe directly. Hence, to externalize the mental representations that people use for planning, I have used the Mouselab-MDP paradigm. This process-tracing paradigm renders people's behavior highly diagnostic of their planning strategies. In this paradigm, each participant plans a route from a starting node to a final node. Each node along the route carries a gain or a loss, and the goal is to choose the path that maximizes the overall score. The node values are initially hidden, but each one can be revealed by clicking on it for a fixed price. This cost encourages participants to reveal only the information they need rather than clicking on every node at once.
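To make the scoring rule concrete, here is a minimal Python sketch of how one round might be scored under the rule described above. The rewards of the nodes on the chosen path are summed, and the click cost is subtracted once for every node that was revealed. The function name `episode_score`, the string node labels and the example values are hypothetical and are not taken from the Mouselab-MDP code base. Charging each node only once assumes that a revealed node cannot be paid for again.

```python
from typing import Dict, Sequence


def episode_score(
    node_rewards: Dict[str, float],
    clicked: Sequence[str],
    path: Sequence[str],
    click_cost: float,
) -> float:
    """Net score of one round (hypothetical helper, not the Mouselab-MDP API)."""
    # Gains and losses collected on the nodes the participant moves through.
    path_reward = sum(node_rewards[node] for node in path)
    # Each distinct revealed node is paid for once (assumption).
    inspection_cost = click_cost * len(set(clicked))
    return path_reward - inspection_cost


# Example: reveal three nodes at a cost of 1 each, then move through A and C.
rewards = {"A": 4.0, "B": -2.0, "C": 8.0, "D": -8.0}
print(episode_score(rewards, clicked=["A", "C", "D"], path=["A", "C"], click_cost=1.0))  # 9.0
```

The example pays 3 points for three clicks and collects 4 + 8 on the path, which gives a net score of 9.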

Materials. Using the Mouselab-MDP paradigm, I created an experiment with a total of 12 rounds per participant. In the first four rounds, the nodes contain random reward values. Two of these rounds have a low click cost and two have a high click cost. When a node is clicked, it displays an emoticon mapped to a random reward value, and this value is re-randomized in every round so that participants cannot memorize it. In the next four rounds, participants can click on a node to display its associated reward value and plan their path accordingly. Again, two of these rounds have a low click cost and two have a high click cost. In the last four rounds, the maze contains certain nodes with a reward value of zero. These nodes are selected randomly in each round so that participants cannot memorize their locations. These four rounds are used to test the scarcity prediction, that is, to measure how strongly zero rewards (actions that go unrewarded) affect human metacognitive learning.
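The round structure described above can be written down as a simple schedule. The Python sketch below is one way to do this. Several of its details are assumptions that are not stated in the source: a hypothetical 12-node maze, three zero-reward nodes per scarcity round, a fixed low-then-high order of click costs within each block, and numeric values (rather than emoticons) being revealed in the scarcity rounds. `RoundConfig` and `build_schedule` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoundConfig:
    index: int        # round number, 1-12
    block: str        # "misalignment", "visible" or "scarcity"
    reveal: str       # what a click shows: "emoticon" or "value"
    click_cost: str   # "low" or "high"
    zero_nodes: List[int] = field(default_factory=list)  # only used in the scarcity block


def build_schedule(num_nodes: int = 12, num_zero: int = 3, seed: int = 0) -> List[RoundConfig]:
    """Hypothetical 12-round schedule mirroring the three blocks described above."""
    rng = random.Random(seed)
    # Scarcity rounds revealing numeric values is an assumption.
    blocks = [("misalignment", "emoticon"), ("visible", "value"), ("scarcity", "value")]
    schedule: List[RoundConfig] = []
    for b, (block, reveal) in enumerate(blocks):
        # Two low-cost and two high-cost rounds per block (fixed order is an assumption).
        for i, cost in enumerate(["low", "low", "high", "high"]):
            zero_nodes: List[int] = []
            if block == "scarcity":
                # Fresh zero-reward locations every round so they cannot be memorized.
                zero_nodes = sorted(rng.sample(range(1, num_nodes + 1), num_zero))
            schedule.append(RoundConfig(4 * b + i + 1, block, reveal, cost, zero_nodes))
    return schedule


for cfg in build_schedule():
    print(cfg.index, cfg.block, cfg.reveal, cfg.click_cost, cfg.zero_nodes)
```

Seeding the random generator keeps a participant's schedule reproducible. The zero-reward locations still change from one scarcity round to the next.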

Procedure. After recruitment (the selection procedure is discussed in the next section), each participant completes all 12 rounds of the experiment. The score is displayed on the screen and updated after each action (click or move).

2. How would you determine the sample size?

Ans. The sample size is the number of observations taken from a population for a survey or experiment. It should be chosen to suit the experiment at hand. It depends on various factors such as the nature of the population (homogeneous or heterogeneous), the available resources, the sampling method used, and the required degree of accuracy. There are several methods for determining an appropriate sample size:

● Arbitrary approach (5% or 10% of the population)
● Conventional approach (sample size of similar studies)
● Cost-benefit analysis
● Confidence interval approach (Cochran's formula and Slovin's formula)
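The following Python sketch illustrates the confidence interval approach. It implements Cochran's formula, n0 = z^2 p (1 - p) / e^2, for an infinite population, then applies the finite-population correction n = n0 / (1 + (n0 - 1) / N). It also implements Slovin's formula, n = N / (1 + N e^2). Several values are illustrative assumptions rather than values from this study: the defaults z = 1.96 (95% confidence), p = 0.5 (maximum variability) and e = 0.05 (5% margin of error), and the population size of 1,000 used in the example.

```python
import math


def cochran_infinite(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> float:
    """Cochran's sample size n0 for a very large (infinite) population."""
    return z ** 2 * p * (1 - p) / e ** 2


def cochran_finite(population: int, z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Adjust Cochran's n0 to a finite population of size N and round up."""
    n0 = cochran_infinite(z, p, e)
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def slovin(population: int, e: float = 0.05) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * e ** 2))


print(math.ceil(cochran_infinite()))  # 385
print(cochran_finite(1000))           # 278
print(slovin(1000))                   # 286
```

Under the plan described here, the same calculation would be repeated for each age group using that group's population size.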

Ideally, we would first calculate the sample size for an infinite population and then adjust it to the actual population size using the confidence interval approach. The target population may consist of participants from different age groups, and I have divided it into three age groups:

● The first group contains individuals aged 14-18 with a high school percentage above 85%.
● The second group contains individuals aged 21-35 who have an AMCAT (All India level) score of at least 500 out of 900 or who have cleared other exams such as JEE, CDS or SSC.
● The third group contains participants over 50 years of age with at least 15 years of experience in their respective fields.

The sample size can then be determined separately for each age group using the confidence interval approach. As in similar studies [1], the target population may also include participants who have completed 100 or more Human Intelligence Tasks (HITs), have a HIT approval rate of at least 90%, and are located in the United States.

3. What data would you collect?

Ans. The data to be collected is the average cumulative reward for each trial and each age group.
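Below is a minimal Python sketch of how these averages could be computed from raw logs. It assumes a hypothetical record format of (participant_id, age_group, trial, cumulative_reward), where cumulative_reward is the participant's net score at the end of that trial. Neither this format nor the function name comes from the original experiment code.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (participant_id, age_group, trial, cumulative_reward)
Record = Tuple[str, str, int, float]


def average_reward_table(records: Iterable[Record]) -> Dict[str, Dict[int, float]]:
    """Mean cumulative reward per age group and trial, with trials sorted."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for _participant, group, trial, reward in records:
        buckets[(group, trial)].append(reward)

    table: Dict[str, Dict[int, float]] = defaultdict(dict)
    for (group, trial), rewards in buckets.items():
        table[group][trial] = mean(rewards)
    # Sort trials so each group's series can be plotted directly as a line.
    return {group: dict(sorted(by_trial.items())) for group, by_trial in table.items()}


logs = [
    ("p1", "14-18", 1, 10.0),
    ("p2", "14-18", 1, 20.0),
    ("p1", "14-18", 2, 30.0),
    ("p3", "21-35", 1, 5.0),
]
print(average_reward_table(logs))  # {'14-18': {1: 15.0, 2: 30.0}, '21-35': {1: 5.0}}
```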
4. How would you analyze the collected data?

Ans. As discussed above, the collected data is the average cumulative reward for each trial and each age group. It is used to test the hypothesis that human metacognitive learning follows a reinforcement learning mechanism. Trials 1-4 test the misalignment prediction, and trials 9-12 test the scarcity prediction. The reward values collected in these trials will be plotted on a line graph for each age group. The trends of these two conditions will then be compared with the results of trials 5-8 to test the hypothesis. Concretely, we predict that in the trials with misalignment (random rewards) and scarcity (unrewarded decisions), the average cumulative reward will grow less than in trials 5-8. In trials 5-8, participants can plan better because they can clearly see the rewards of future states and use them to maximize the cumulative reward. We therefore expect trials 5-8 to show a more prominent upward trend than the other trials.
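One simple way to quantify "less prominent growth" is to fit a line to the average reward within each four-trial block and compare the slopes. The Python sketch below computes the ordinary least-squares slope by hand. The block labels and the example averages are hypothetical, and comparing slopes is only a descriptive summary of the plotted trends. It is not a statistical test of the hypothesis.

```python
from typing import Dict, List

# Trial ranges of the three blocks described above (labels are hypothetical).
BLOCKS = {"misalignment": range(1, 5), "visible": range(5, 9), "scarcity": range(9, 13)}


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def block_slopes(avg_by_trial: Dict[int, float]) -> Dict[str, float]:
    """Slope of average reward against trial number, per block."""
    return {
        name: ols_slope([float(t) for t in trials], [avg_by_trial[t] for t in trials])
        for name, trials in BLOCKS.items()
    }


def growth_prediction_holds(slopes: Dict[str, float]) -> bool:
    """True if the visible-reward block grows faster than both other blocks."""
    return slopes["visible"] > max(slopes["misalignment"], slopes["scarcity"])


# Made-up averages for one age group, for illustration only.
example = {1: 10, 2: 11, 3: 12, 4: 13, 5: 10, 6: 13, 7: 16, 8: 19, 9: 10, 10: 10.5, 11: 11, 12: 11.5}
slopes = block_slopes(example)
print(slopes, growth_prediction_holds(slopes))
```

The same comparison would be run separately for each age group's series.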

Task 2

Hosted Web App: https://ankitknitj.github.io/mcl_experiment/

GitHub Repository: https://github.com/ankitknitj/Masters_Thesis_MDPAssignment

References

1. He, R., Lieder, F., & Jain, Y. R. (2021, July). Measuring and modeling how people learn how to plan and how people adapt their planning strategies to the structure of the environment.
2. He, R., Jain, Y. R., & Lieder, F. (2022). Have I done enough planning or should I plan more? arXiv preprint arXiv:2201.00764.
3. jsPsych: https://www.jspsych.org
4. Mouselab-MDP: https://github.com/fredcallaway/Mouselab-MDP
