Guidelines
Updated: [11/25/24]
Version 2.0
Table of Contents
Project Overview
Terminology
Workflow
Step 1: Identify Expected Image and Prompt Information
Step 2: Identify a Suitable Image and Upload
Step 3: Paste in the Image Source URL
Step 4: Write a Prompt that Requires Your Selected Image
Step 5: Write Out your Solution Steps and Final Answer
Step 6: Provide an Answer to the Prompt
Step 7: Write an Open-World Prompt and Answer
More About LLMs & Reasoning
Visual Category Example Images
Capability Categories and Examples
Difficulty Definitions
Math Definitions
Language Category Definitions
Complete Prompt Examples
Tips for Writing Good Prompts
Project Overview
Large Language Models (LLMs) often lack strong reasoning capabilities, such as those required to
explain how to solve a math problem or interpret unique graphical data. We can improve an LLM’s
reasoning performance by providing structured examples that train the LLM to generate accurate and
logical replies.
For this project, you will source images, from either personal photographs or online sources, that
adhere to a document category and meet the image quality guidelines. You will then write a prompt that asks a
single question which requires visual interpretation of information within the image based on the
See the More About LLMs & Reasoning section to learn more.
Terminology
Term: Definition
Annotation: Documenting and correcting the errors identified in the model’s responses
Capability Category: The capability of the model that is being tested by the prompt
Domain: The image category: Structured Documents or Math
L2 Visual Category: The type of image within the provided domain
Language Category: The specific combination of languages to be used in the images and prompts
Math Definition: A specialty for the math prompt within the capabilities
Open-World: A prompt type that does not fall into the predefined capability categories
Prompt: The initial query or instruction given to the model
Workflow
Step 1: Identify Expected Image and Prompt Information
1. Requirements for your image and prompt will include the following fields:
• L1 Visual Category (Domain)
• Visual Category
• Capability Category
• Math Definition, if applicable
• Language Category, if applicable
• Difficulty
a. The interface for the “Advanced Search” feature highlighting the inclusion of imgur.com
as a restriction is shown here:
4. Look for images in the “Images” tab of the search that meet your needs. Note that adding
“site:imgur.com” to the search query has the same result as restricting the domain using the Advanced
Search approach.
Once you have saved a suitable image, upload it using the file upload feature, either by dragging
and dropping the image or by using “Browse” to find the image on your local filesystem.
• Search the domain list above for the domain to confirm that it’s allowed!
Are not links to web pages that happen to contain the image.
If your image is a personal photograph, upload it to an image hosting service (e.g. imgur.com) so that
a source URL can be provided.
You will receive feedback for your prompt and image, which you should take into consideration to fine-
tune your prompt to ensure that it relates to and requires the image.
You should use this provided Explanation as a basis, making edits as necessary to ensure both the
accuracy of the explanation and its adherence to the expected style. All
prompt explanations must end with the prompt answer as a standalone line of text.
For math prompts, explanations must additionally adhere to the following requirements:
• Provide a step-by-step approach to solve the prompt and come to the final answer.
• Include step numbers in the explanation.
• Do not skip any operations or processes that would be necessary to solve the prompt.
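The two mechanical requirements above (numbered steps, and the answer as a standalone final line) can be checked with a short script. This is a minimal sketch rather than part of the project tooling: the function name `check_math_explanation` and the accepted step formats ("Step 1:", "1.", "1)") are assumptions made for illustration. Whether any operation was skipped still needs a human review.

```python
import re

def check_math_explanation(explanation, answer):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: the accepted step formats are an assumption.
    """
    problems = []
    # Keep only non-empty lines, stripped of surrounding whitespace.
    lines = [line.strip() for line in explanation.splitlines() if line.strip()]

    # The explanation must end with the prompt answer as a standalone line.
    if not lines or lines[-1] != answer.strip():
        problems.append("final line is not the standalone answer")

    # Look for step numbers at the start of each line before the final one.
    step_pattern = re.compile(r"^(?:Step\s+)?(\d+)[.:)]", re.IGNORECASE)
    numbers = [int(m.group(1)) for line in lines[:-1]
               if (m := step_pattern.match(line))]
    if not numbers:
        problems.append("no numbered steps found")
    elif numbers != list(range(1, len(numbers) + 1)):
        problems.append("step numbers are not consecutive from 1")
    return problems

good = "Step 1: Add 2 and 3 to get 5.\nStep 2: Multiply 5 by 4 to get 20.\n20"
print(check_math_explanation(good, "20"))
```

An empty list only means the format checks passed; it says nothing about whether the math itself is correct.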
The generated explanation may contain errors and will likely require edits! Please review the
generated explanation and make sure that it adheres to proper grammar and other constraints, has a
natural flow to the language, and is entirely accurate.
Please take this feedback into consideration for ways to enhance your explanation and to
identify any elements which may have been missed in the explanation process. Once you have
validated your explanation, you can continue to either submit an Open-World Prompt for your
image (if required: Step 9) or move on to Translate your prompt, explanation, and answer (Step
10)!
For these prompts, you will also be expected to provide a Prompt Answer, in line with the normal
expectations for prompt answers based on Step 6:
Once your translation has been presented to you, you have completed the prompt. Select
the blue “Submit & Continue” in the bottom right-hand corner to confirm the fields adhere to the
guidelines, then submit the task and move on to a fresh unit. Well done!
Visual category example images (images not reproduced): Text Documents, Financial Tables, Line Graph,
Bubble Chart, Population Pyramid, Dot Plot, Stem-and-Leaf Plot, Pie Chart
Photo of Computer Monitor: These should be photos taken of a computer monitor displaying the plot.
These images should NOT be a screenshot.
• The background surrounding the monitor can be in or out of focus, and can have
multiple colors, as long as the image on the monitor can be visually assessed.
Other: For Images in the Math domain, images not taken of a computer monitor can be any
other type of image that is appropriate to the capability and math definition.
Difficulty Definitions
Easy
Structured Documents (Must Meet All):
• Trivial visual aspects of a chart/table/infographic.
• Requires NO fine-grained object recognition (titles, large letters, easy trends).
• Requires NO complicated format understanding.
Math:
• The prompt requires 3 or fewer steps to solve
• An average high school student could solve the problem in under 2 minutes
• There are minimal constraints on how the model should solve the problem
Medium
Structured Documents:
• Neither easy nor hard.
Math:
• The prompt requires 4-7 steps to solve
• An average high school student could solve the problem in 3-5 minutes.
• Some constraints are present (e.g. audience, tone, etc.)
Hard
Structured Documents (Meets Any):
• Requires complicated format for language generation (e.g. multi-level bullets,
specific order of listing).
• Requires careful association of different visual aspects (multiple figures in the
same image, a flowchart together with a plot, etc).
• Requires visual-related professional knowledge (the professional knowledge
MUST be visual in nature for the prompt to be hard).
Math:
• The prompt requires 8 or more steps to solve
• An average high school student would take over 5 minutes to solve the problem.
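The step-count thresholds for Math prompts can be expressed as a small lookup. This is a minimal sketch that uses only the number of solution steps. The function name `math_difficulty` is hypothetical, and the solve-time and constraint criteria listed above still need to be judged by the writer.

```python
def math_difficulty(step_count):
    """Map a number of solution steps to a Math difficulty label.

    Only the step-count criterion is modelled here (an assumption made
    for illustration); solve time and constraints are not considered.
    """
    if step_count < 1:
        raise ValueError("step_count must be at least 1")
    if step_count <= 3:   # 3 or fewer steps
        return "Easy"
    if step_count <= 7:   # 4-7 steps
        return "Medium"
    return "Hard"         # 8 or more steps

for steps in (2, 5, 9):
    print(steps, math_difficulty(steps))
```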
Math Definitions
Number and Operations
Integers and Rational Numbers: Properties of integers, fractions, decimals, and number lines.
Factors and Multiples: Prime numbers, least common multiple (LCM), greatest common divisor (GCD).
Operations with Real Numbers: Addition, subtraction, multiplication, and division of integers,
fractions, and decimals.
Exponents and Roots: Laws of exponents, square roots, cube roots.
Scientific Notation: Expressing large and small numbers.
Absolute Value: Understanding and applying the concept of absolute value.
Percents: Converting between percents, fractions, and decimals; percentage increases and
decreases.
Ratio and Proportion: Solving problems involving ratios, rates, and proportional relationships.
Calculus
Limits and Continuity: Understanding the concept of limits and when functions are continuous.
Derivatives: Definition and application of derivatives, product rule, quotient rule, and chain rule.
Applications of Derivatives: Understanding rates of change, slopes, optimization problems, and
motion analysis.
Integrals: Definite and indefinite integrals, the Fundamental Theorem of Calculus.
Applications of Integrals: Areas under curves, volumes of solids of revolution, accumulation
problems.
Algebra and Functions
Linear Equations and Inequalities: Solving and graphing single-variable equations and inequalities.
Systems of Equations: Solving systems of linear equations using substitution, elimination, and
graphical methods.
Quadratic Equations: Solving quadratics using factoring, completing the square, and the quadratic
formula.
Polynomials: Adding, subtracting, multiplying, and factoring polynomials.
Exponential and Logarithmic Functions: Properties of exponents, logarithms, and solving related
equations.
Functions and Their Graphs: Understanding domain and range, graphing different types of functions
(linear, quadratic, etc.).
Absolute Value Functions: Solving and graphing absolute value equations.
Rational Expressions and Equations: Simplifying and solving rational expressions.
Piecewise Functions: Understanding and graphing piecewise-defined functions.
Inequalities: Solving inequalities.
Data Analysis, Statistics, and Probability
Measures of Central Tendency: Mean, median, mode, and range.
Measures of Dispersion: Variance, standard deviation, interquartile range.
Data Representation: Interpreting data from tables, histograms, bar graphs, box plots, scatterplots.
Probability Rules: Basic probability, conditional probability, independence, and the Law of Total
Probability.
Combinations and Permutations: Counting techniques, factorials, and using them to solve probability
problems.
Normal Distribution: Understanding the bell curve, z-scores, and using normal distributions.
Regression Analysis: Linear regression, correlation coefficients, and fitting data to models.
Geometry
Basic Geometric Shapes: Properties of triangles, quadrilaterals, polygons, and circles.
Congruence and Similarity: Criteria for congruent and similar figures, scale factors.
Coordinate Geometry: Slope, distance, midpoint, and equations of lines and circles.
3D Geometry: Cubes, spheres, cones, and cylinders, as well as the calculation of their surface areas
and volumes.
Prompt Analyze the menu and identify the most expensive and least expensive
menu items.
Answer Most expensive: Grilled Chicken and Avocado
Least expensive: Bottle of water
Comments The prompt is asking for a comparison across several values, which fits the
“Factoid or Complex Question Answering” competency. The prompt has a
short answer that is objective, and can be assessed without complex spatial
relationships or detailed reasoning.
Prompt Generate a Markdown table that shows the total number of dogs in each
continent, based on summation of countries that are on the same
continent from the graphic.
Answer | Continent | Number of Dogs |
| ----- | ----- |
| Asia | 154M |
| North America | 89M |
Comments This prompt requests a Markdown table, which is a required
element for the “Document and Graph Understanding” capability
category. There is a minor amount of logic and external knowledge
required for the prompt: combining countries from the same
continent. However, the output itself is very simple,
and the visual elements used for assessment do not require
complicated spatial relationships, so this prompt falls in the
Medium difficulty category.
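When verifying an answer like the one above, the per-continent summation and the Markdown output can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the country names, their continent assignments, and the values (in millions) are hypothetical placeholders and are NOT the figures from the example graphic. Sorting rows from largest to smallest total is also an assumption.

```python
from collections import defaultdict

# Hypothetical per-country values in millions; NOT taken from the graphic.
dogs_by_country = {"Country A": 10, "Country B": 5, "Country C": 7}
# Hypothetical country-to-continent mapping (external knowledge).
continent_of = {"Country A": "Asia", "Country B": "Asia",
                "Country C": "North America"}

def continent_table(counts, continents):
    """Sum country values per continent and render a Markdown table."""
    totals = defaultdict(int)
    for country, count in counts.items():
        totals[continents[country]] += count
    rows = ["| Continent | Number of Dogs |", "| ----- | ----- |"]
    # Largest total first (a presentation choice, not a stated requirement).
    for continent in sorted(totals, key=totals.get, reverse=True):
        rows.append(f"| {continent} | {totals[continent]}M |")
    return "\n".join(rows)

print(continent_table(dogs_by_country, continent_of))
```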
Prompt Treating the blue and orange lines as independent trends over time
with matching units in both the x and y axes, determine the
equation of best linear fit for each trend, the R-squared value
rounded to the nearest thousandth for each fit, and identify which
fit has greater confidence based on these results. Present your
answer in paragraph format with proper sentences, using LaTeX for
equations. Assume that each tick on the x axis is separated by 5
units.
Answer The linear fit for the orange trend is \(y=-6.5x+187.5\) and the R-
squared is 0.966. The linear fit for the blue trend is \(y=-3.5x+150\)
and the R-squared is 0.28. The fit for the orange trend has a higher
confidence due to the greater R-squared value.
Comments Identification of trends is a clear use case for the “Diagram
Reasoning” capability. Requesting multiple calculations
(2 different linear fits and their R-squared values) makes the
prompt more computationally demanding, and the request for LaTeX
adds an additional constraint on presentation.
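To check an answer of this kind, a least-squares line and its R-squared value can be computed directly from the points read off the chart. This is a minimal sketch using ordinary least squares in plain Python. The data points below are hypothetical and are NOT the values from the example image, and the function name `linear_fit` is chosen for illustration only. It assumes at least two distinct x values and non-constant y values.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical points read from a chart, with x ticks spaced 5 units apart.
xs = [0, 5, 10, 15]
ys = [185, 160, 120, 95]
slope, intercept, r_squared = linear_fit(xs, ys)
# Format the line as inline LaTeX and round R-squared to the nearest thousandth.
print(f"\\(y={slope:.1f}x{intercept:+.1f}\\), R-squared = {r_squared:.3f}")
```

Running the same function on each trend and comparing the two R-squared values gives the basis for stating which fit has greater confidence.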
Prompt Based on the above invoice, assuming that the buyer and seller are
both in the state of Massachusetts, what is the percentage of sales
tax being collected, to the nearest 0.1%?
Answer Based on the invoice and assuming that the transaction is in the
state of Massachusetts, the sales tax is 27.1%.
Comments The prompt is asking for a minor factoid based on the information
presented in the invoice. It requires professional knowledge that
sales tax is not collected on shipping in the state of
Massachusetts, which pushes this into the “Hard” prompt
category.
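The calculation behind this kind of answer divides the tax collected by the taxable amount, which per the comment above excludes shipping. This is a minimal sketch: the invoice values are hypothetical and are NOT taken from the example image, and the function name `sales_tax_percent` is an assumption made for illustration.

```python
def sales_tax_percent(tax_collected, taxable_amount):
    """Return the tax rate as a percentage, rounded to the nearest 0.1."""
    if taxable_amount <= 0:
        raise ValueError("taxable_amount must be positive")
    return round(tax_collected / taxable_amount * 100, 1)

# Hypothetical invoice values (NOT taken from the example image).
items_subtotal = 200.00
shipping = 15.00
tax_collected = 16.00

# Shipping is excluded from the taxable amount, per the comment above.
rate = sales_tax_percent(tax_collected, items_subtotal)
# Including shipping by mistake gives a different, lower rate.
rate_with_shipping = sales_tax_percent(tax_collected, items_subtotal + shipping)
print(rate, rate_with_shipping)
```

The gap between the two results shows why knowing how shipping is treated changes the answer.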
Prompt Based on the nested donut chart above, what is the relative change
in the likelihood that a randomly-selected individual from the
referenced event was in 3rd class versus 1st class?
Answer The relative change in the likelihood that a random individual was
in 3rd class versus 1st class is +1.18.
Comments By asking for a combination of information across the nested
charts to be used for calculation, this graph can then be
interpreted as 2 independent but connected graphs. The
requirement for a challenging reasoning step to determine the
relative change makes this a “Hard” prompt.
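One way to read the calculation, assuming 1st class is taken as the baseline, is as a relative change between two probabilities. The symbols \(n_1\), \(n_3\), and \(N\) below are hypothetical names introduced for illustration: the 1st-class count, the 3rd-class count, and the total number of individuals read from the chart.

```latex
% n_1, n_3, N are hypothetical symbols: 1st-class count, 3rd-class count, total.
\[
\text{relative change}
  = \frac{P(\text{3rd class}) - P(\text{1st class})}{P(\text{1st class})}
  = \frac{\frac{n_3}{N} - \frac{n_1}{N}}{\frac{n_1}{N}}
  = \frac{n_3 - n_1}{n_1}
\]
```

Because the total \(N\) cancels, only the two class counts are needed. For example, with hypothetical counts \(n_1 = 100\) and \(n_3 = 250\), the relative change is \((250 - 100)/100 = +1.50\).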