Prompt Engineering
Prompt Engineering
This guide shares strategies and tactics for getting better results from large language
models (sometimes referred to as GPT models) like GPT-4. The methods described
here can sometimes be deployed in combination for greater effect. We encourage
experimentation to find the methods that work best for you.
Some of the examples demonstrated here currently work only with our most capable
model, gpt-4. In general, if you find that a model fails at a task and a more capable
model is available, it's often worth trying again with the more capable model.
Tactics:
Tactics:
Tactics:
Use intent classification to identify the most relevant instructions for a user
query
For dialogue applications that require very long conversations, summarize or
filter previous dialogue
Summarize long documents piecewise and construct a full summary
recursively
Tactics:
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model's reasoning
process
Ask the model if it missed anything on previous passes
Tactics:
Tactic:
Tactics
Each of the strategies listed above can be instantiated with specific tactics. These
tactics are meant to provide ideas for things to try. They are by no means fully
comprehensive, and you should feel free to try creative ideas not represented here.
Who’s president? Who was the president of Mexico in 2021, and how frequently
are elections held?
SYSTEM
When I ask for help to write something, you will reply with a document that contains
at least one joke or playful comment in every paragraph.
USER
Write a thank you note to my steel bolt vendor for getting the delivery in on time and
in short notice. This made it possible for us to deliver an important order.
Open in Playground
Tactic: Use delimiters to clearly indicate distinct parts of the input
Delimiters like triple quotation marks, XML tags, section titles, etc. can help
demarcate sections of text to be treated differently.
USER
Open in Playground
SYSTEM
You will be provided with a pair of articles (delimited with XML tags) about the same
topic. First summarize the arguments of each article. Then indicate which of them
makes a better argument and explain why.
USER
Open in Playground
SYSTEM
You will be provided with a thesis abstract and a suggested title for it. The thesis title
should give the reader a good idea of the topic of the thesis but should also be
eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
USER
Open in Playground
For straightforward tasks such as these, using delimiters might not make a difference
in the output quality. However, the more complex a task is the more important it is to
disambiguate task details. Don’t make the model work to understand exactly what
you are asking of them.
SYSTEM
Step 1 - The user will provide you with text in triple quotes. Summarize this text in
one sentence with a prefix that says "Summary: ".
Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says
"Translation: ".
USER
Open in Playground
USER
ASSISTANT
The river that carves the deepest valley flows from a modest spring; the grandest
symphony originates from a single note; the most intricate tapestry begins with a
solitary thread.
USER
Open in Playground
USER
USER
Open in Playground
USER
Open in Playground
SYSTEM
Use the provided articles delimited by triple quotes to answer questions. If the
answer cannot be found in the articles, write "I could not find an answer."
USER
Given that all models have limited context windows, we need some way to
dynamically lookup information that is relevant to the question being asked.
Embeddings can be used to implement efficient knowledge retrieval. See the tactic
"Use embeddings-based search to implement efficient knowledge retrieval" for more
details on how to implement this.
SYSTEM
You will be provided with a document delimited by triple quotes and a question. Your
task is to answer the question using only the provided document and to cite the
passage(s) of the document used to answer the question. If the document does not
contain the information needed to answer this question then simply write:
"Insufficient information." If an answer to the question is provided, it must be
annotated with a citation. Use the following format for to cite relevant passages
({"citation": …}).
USER
Open in Playground
Suppose for example that for a customer service application, queries could be
usefully classified as follows:
SYSTEM
You will be provided with customer service queries. Classify each query into a
primary category and a secondary category. Provide your output in json format with
the keys: primary and secondary.
USER
Open in Playground
Based on the classification of the customer query, a set of more specific instructions
can be provided to a model for it to handle next steps. For example, suppose the
customer requires help with "troubleshooting".
SYSTEM
You will be provided with customer service inquiries that require troubleshooting in a
technical support context. Help the user by:
- Ask them to check that all cables to/from the router are connected. Note that it is
common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model
they are using
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it
for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5
minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes,
connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if
they would like to end the current chat about troubleshooting and classify their
request according to the following scheme:
Open in Playground
Notice that the model has been instructed to emit special strings to indicate when the
state of the conversation changes. This enables us to turn our system into a state
machine where the state determines which instructions are injected. By keeping
track of state, what instructions are relevant at that state, and also optionally what
state transitions are allowed from that state, we can put guardrails around the user
experience that would be hard to achieve with a less structured approach.
SYSTEM
USER
Problem Statement: I'm building a solar power installation and I need help working
out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and
an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of
square feet.
ASSISTANT
Open in Playground
But the student's solution is actually not correct! We can get the model to
successfully notice this by prompting the model to generate its own solution first.
SYSTEM
First work out your own solution to the problem. Then compare your solution to the
student's solution and evaluate if the student's solution is correct or not. Don't decide
if the student's solution is correct until you have done the problem yourself.
USER
Problem Statement: I'm building a solar power installation and I need help working
out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and
an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of
square feet.
ASSISTANT
The student's solution is incorrect. They made an error in the maintenance cost
calculation, using 100x instead of 10x. The correct total cost for the first year of
operations is 360x + 100,000.
Open in Playground
Inner monologue is a tactic that can be used to mitigate this. The idea of inner
monologue is to instruct the model to put parts of the output that are meant to be
hidden from the user into a structured format that makes parsing them easy. Then
before presenting the output to the user, the output is parsed and only part of the
output is made visible.
SYSTEM
Follow these steps to answer the user queries.
Step 1 - First work out your own solution to the problem. Don't rely on the student's
solution since it may be incorrect. Enclose all your work for this step within triple
quotes (""").
Step 2 - Compare your solution to the student's solution and evaluate if the student's
solution is correct or not. Enclose all your work for this step within triple quotes (""").
Step 3 - If the student made a mistake, determine what hint you could give the
student without giving away the answer. Enclose all your work for this step within
triple quotes (""").
Step 4 - If the student made a mistake, provide the hint from the previous step to the
student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:".
USER
Open in Playground
Alternatively, this can be achieved with a sequence of queries in which all except the
last have their output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query
doesn't require the student’s solution, it can be omitted. This provides the additional
advantage that there is no chance that the model’s solution will be biased by the
student’s attempted solution.
USER
Open in Playground
Next, we can have the model use all available information to assess the correctness
of the student’s solution.
SYSTEM
Compare your solution to the student's solution and evaluate if the student's solution
is correct or not.
USER
Open in Playground
Finally, we can let the model use its own analysis to construct a reply in the persona
of a helpful tutor.
SYSTEM
You are a math tutor. If the student made an error, offer a hint to the student in a way
that does not reveal the answer. If the student did not make an error, simply offer
them an encouraging comment.
USER
Open in Playground
SYSTEM
You will be provided with a document delimited by triple quotes. Your task is to select
excerpts which pertain to the following question: "What significant paradigm shifts
have occurred in the history of artificial intelligence."
Ensure that excerpts contain all relevant context needed to interpret them - in other
words don't extract small snippets that are missing important context. Provide output
in JSON format as follows:
[{"excerpt": "..."},
...
{"excerpt": "..."}]
USER
ASSISTANT
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that
excerpts contain all relevant context needed to interpret them - in other words don't
extract small snippets that are missing important context.
Open in Playground
A text embedding is a vector that can measure the relatedness between text strings.
Similar or relevant strings will be closer together than unrelated strings. This fact,
along with the existence of fast vector search algorithms means that embeddings
can be used to implement efficient knowledge retrieval. In particular, a text corpus
can be split up into chunks, and each chunk can be embedded and stored. Then a
given query can be embedded and vector search can be performed to find the
embedded chunks of text from the corpus that are most related to the query (i.e.
closest together in the embedding space).
Example implementations can be found in the OpenAI Cookbook. See the tactic
“Instruct the model to use retrieved knowledge to answer queries” for an example of
how to use knowledge retrieval to minimize the likelihood that a model will make up
incorrect facts.
SYSTEM
You can write and execute Python code by enclosing it in triple backticks, e.g.
```code goes here```. Use this to perform calculations.
USER
Find all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x -
10.
Open in Playground
Another good use case for code execution is calling external APIs. If a model is
instructed in the proper use of an API, it can write code that makes use of it. A model
can be instructed in how to use an API by providing it with documentation and/or
code samples showing how to use the API.
SYSTEM
You can write and execute Python code by enclosing it in triple backticks. Also note
that you have access to the following module to help users send messages to their
friends:
```python
import message
Open in Playground
WARNING: Executing code produced by a model is not inherently safe and
precautions should be taken in any application that seeks to do this. In
particular, a sandboxed code execution environment is needed to limit the
harm that untrusted code could cause.
Evaluation procedures (or "evals") are useful for optimizing system designs. Good
evals are:
30% ~10
10% ~100
3% ~1,000
1% ~10,000
Model-based evals can be useful when there exists a range of possible outputs that
would be considered equally high in quality (e.g. for questions with long answers).
The boundary between what can be realistically evaluated with a model-based eval
and what requires a human to evaluate is fuzzy and is constantly shifting as models
become more capable. We encourage experimentation to figure out how well
model-based evals can work for your use case.
SYSTEM
You will be provided with text delimited by triple quotes that is supposed to be the
answer to a question. Check if the following pieces of information are directly
contained in the answer:
Finally, provide a count of how many "yes" answers there are. Provide this count as
{"count": <insert count here>}.
SYSTEM
USER
"""Neil Armstrong is famous for being the first human to set foot on the Moon. This
historic event took place on July 21, 1969, during the Apollo 11 mission."""
Open in Playground
SYSTEM
USER
"""Neil Armstrong made history when he stepped off the lunar module, becoming the
first person to walk on the moon."""
Open in Playground
Here's an example input where none are satisfied:
SYSTEM
USER
Open in Playground
There are many possible variants on this type of model-based eval. Consider the
following variation which tracks the kind of overlap between the candidate answer
and the gold-standard answer, and also tracks whether the candidate answer
contradicts any part of the gold-standard answer.
SYSTEM
Use the following steps to respond to user inputs. Fully restate each step before
proceeding. i.e. "Step 1: Reason...".
Step 1: Reason step-by-step about whether the information in the submitted answer
compared to the expert answer is either: disjoint, equal, a subset, a superset, or
overlapping (i.e. some intersection but not subset/superset).
Step 2: Reason step-by-step about whether the submitted answer contradicts any
aspect of the expert answer.
SYSTEM
USER
Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on
the moon. This historic event occurred on July 21, 1969."""
Open in Playground
Here's an example input with answer that directly contradicts the expert answer:
SYSTEM
USER
Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time."""
Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second
person to walk on the moon, following after Buzz Aldrin."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on
the moon. This historic event occurred on July 21, 1969."""
Open in Playground
Here's an example input with a correct answer that also provides a bit more detail
than is necessary:
SYSTEM
USER
Question: """What event is Neil Armstrong most famous for and on what date did it
occur? Assume UTC time."""
Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong
became the first human to set foot on the lunar surface, marking a monumental
achievement in human history."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on
the moon. This historic event occurred on July 21, 1969."""
Open in Playground