100% found this document useful (1 vote)
179 views

Exploring The Security Risks of Using Large Language Models

Exploring the Security Risks of Using Large Language Models
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
179 views

Exploring The Security Risks of Using Large Language Models

Exploring the Security Risks of Using Large Language Models
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

Practical Guide

Exploring the
Risks of Using
Large Language
Models

www.brightsec.com
Introduction

As machine learning algorithms continue to evolve, Large Language Models

(LLMs) like GPT-4 have gained immense popularity. While these models hold great

promise in revolutionizing various industries—ranging from content generation

and customer service to research and development—they also come with their

own set of risks and ethical concerns. In this paper, we will comprehensively

examine the risks associated with using LLMs.

Large language models like ChatGPT are susceptible to various kinds of attacks

against their infrastructure, which can range from exploiting software

vulnerabilities to social engineering tactics.

This is becoming more critical by the day as more and more software

development is actually done with the help of AI LLM tools such as GitHub’s Co-

pilot or Amazon’s CodeWhisperer. A successful attack on such an LLM application,

can lead to chain reaction causing engineering disruption across thousands of

organizations, where developers rely on them to generate code.

By shedding light on the

potential risks of LLMs

through the prism of the Top


Al tools in the
10 list for LLM vulnerabilities,

this paper aims to contribute


development process
to a more informed and
70% of all respondents are using or cautious approach to
are planning to use Al tools in their leveraging this technology.
development process this year. Those Addressing these risks is not
learning to code are more likely than just the responsibility of the
professional developers to be using or researchers and developers
use Al tools (82% vs. 70%). but is a collective

responsibility that society

must undertake. Only through

such a holistic approach can

we hope to mitigate the risks

while maximizing the benefits

of this transformative

technology.
This white paper will cover the
following topics:

How LLMs are buil


The key vectors for attacking LLM
Attacking the model directl
Attacking the infrastructur
Attacking the applicatio
Examples of the attack vectors using the LLM Owasp Top 1
Ethical considerations when using LLM
Summary
o are LLM
H w Training:

applications built Feeding the processed data into the neural


network, which learns to predict the next word
in a sequence, understand context, and
generate coherent text. This training process
Large language models (LLMs) are built using
requires substantial computational resources
advanced machine learning techniques,
and time.
particularly deep learning. The construction
process involves several key steps:
Fine-Tuning:
Data Collection: Adjusting the model with additional training,
often on a more specialized dataset, to improve
Gathering a vast and diverse dataset of text performance on specific tasks or domains.
from books, websites, articles, and other
written media.
Evaluation:
Data Processing: Testing the model's performance on a set of
tasks to ensure it generates accurate and
Cleaning and organizing the data to remove coherent responses.
errors, inconsistencies, and irrelevant
information, and sometimes annotating it to aid
Iteration:
the model's learning.

Refining the model through multiple iterations


Model Design: of training and evaluation to enhance its
capabilities.
Selecting an appropriate neural network
architecture, like the Transformer model, which
has proven effective for handling sequential
data like text.

MACHINE LEARNING DEVELOPMENT LIFECYCLE

Feedback
Exploration & Validation Wrangling (cleaning)
New Data From Model
Inference Monitoring & Logging

ULM Prompt Injection


DATA data versioning

Train Test
model versioning
code code
Model Engineering
param
Model Evaluation
param
Model Packaging MODEL
Model Format
Model Serving

- pKI. ONNK fax - service, Docker, K8s
Feature Engineering
 Best Model Selection

Hyperparameters Tuning Model Performance Metric ODE uild & Integration Testing Deploument Dev to Production
accurac F1
C B

precision
code versioning

The result is a highly sophisticated model capable of understanding and generating human like -
text, answering questions, translating languages, and performing various other language related -
tasks.
LLM Attack Vectors Introduction to the
OWASP Top 10
As can be seen in the diagram above,
Vulnerabilities for
there are three key vectors that can
be attacked in the overall LLM. There
LLMs
are:
The Open Web Application Security Project
Attack the LLM Model directly (OWASP), a respected authority in web
security, has compiled a critical list of the top
10 vulnerabilities frequently encountered in
Attack the infrastructure and Large Language Model (LLM) applications. This
integrations list serves as an authoritative guide, shedding
light on the severity, exploitability, and
Attack the application commonality of each vulnerability. Notable
risks such as prompt injections, data exposure,
insufficient sandboxing, and unauthorized code
execution are detailed, illustrating the array of
security challenges that LLM applications may
This curated list is more than a simple catalog face.
of vulnerabilities; it is an educational tool aimed
at a broad audience, from developers and This curated list is more than a simple catalog
designers to architects and organizational of vulnerabilities; it is an educational tool aimed
leaders. Its purpose is to enhance the collective at a broad audience, from developers and
understanding of the security vulnerabilities designers to architects and organizational
inherent in the deployment and operation of leaders. Its purpose is to enhance the collective
LLMs. By bringing these issues to the forefront, understanding of the security vulnerabilities
OWASP not only raises awareness but also inherent in the deployment and operation of
provides valuable remediation tactics and LLMs. By bringing these issues to the forefront,
strategic advice designed to fortify the security OWASP not only raises awareness but also
framework of LLM applications. provides valuable remediation tactics and
strategic advice designed to fortify the security
framework of LLM applications.
Vulnerability Description

This manipulates a large language model


(LLM) through crafty inputs, causing
Prompt injections unintended actions by the LLM. Direct
injections overwrite system prompts, while
indirect ones manipulate inputs from external
sources

This vulnerability occurs when an LLM output


is accepted without scrutiny, exposing
Insecure Output Handling backend systems. Misuse may lead to severe
consequences like XSS, CSRF, SSRF,
privilege escalation, or remote code
execution.

This occurs when LLM training data is


tampered,introducing vulnerabilities or biases
Training Data Poisoning that compromise security, effectiveness, or
ethical behavior. Sources include Common
Crawl, WebText, OpenWebText, & books.

Attackers cause resource-heavy operations


on LLMs, leading to service degradation or
Model Denial of Service high costs. The vulnerability is magnified due
to the resource-intensive nature of LLMs and
unpredictability of user inputs.

LLM application lifecycle can be


compromised by vulnerable components or
Supply Chain Vulnerabilities services, leading to security attacks. Using
third-party datasets, pre- trained models,
and plugins can add vulnerabilities.

LLMs may inadvertently reveal confidential


data in its responses, leading to unauthorized
Sensitive Information Disclosure data access, privacy violations, and security
breaches. It's crucial to implement data
sanitization and strict user policies to
mitigate this.
Vulnerability Description

LLM plugins can have insecure inputs and


insufficient access control. This lack of
Insecure Plugin Design application control makes them easier to
exploit and can result in consequences like
remote code execution.

LLM-based systems may undertake actions


leading to unintended consequences. The
Excessive Agency issue arises from excessive functionality,
permissions, or autonomy granted to the
LLM-based systems.

Systems or people overly dependent on


LLMs without oversight may face
Overreliance misinformation, miscommunication, legal
issues, and security vulnerabilities due to
incorrect or inappropriate content generated
by LLMs.

This involves unauthorized access, copying,


or exfiltration of proprietary LLM models. The
Model Theft impact includes economic losses,
compromised competitive advantage, and
potential access to sensitive information.
The ramifications of a successful jailbreak

Attacking the LLM attack on a GenAI system like an LLM are

grave. Such an attack not only undermines the

Model Directly with pre-established safety measures that inhibit

the model from carrying out harmful

Prompt Injection commands, but it also effectively erases the

demarcation between permissible and

impermissible actions for the AI system.

Number 1 on the OWASP Top 10 for LLMs is


In the absence of these safeguards, the model
prompt injections. The most common attack
becomes a potential tool for nefarious
vector is to attack the LLM directly with inputs
activities, lacking any internal checks to
that will circumvent the LLM’s safeguards. The
prevent it from executing dangerous or
term "jailbreak" or "prompt injection" refers to a
unethical tasks as directed by the attacker.
sophisticated technique that manipulates
The dissolved boundary between acceptable
Language Learning Models (LLMs) like
and unacceptable use exposes a significant
ChatGPT-4 into disseminating information or
vulnerability, making the system susceptible to
instructions that are illegal, unethical, or
further exploitation that could lead to real-
contravene societal norms and values. 

world consequences.

Within a mere two hours of ChatGPT-4's


By neutralizing the safety mechanisms
release in March 2023, cybersecurity experts
designed to curtail such behaviors, a
and malicious actors alike successfully
successful jailbreak attack transforms an
executed prompt injection attacks. These
otherwise useful and compliant AI system into
attacks led the AI system to furnish an array of
an unwitting accomplice for a host of illicit
unsettling instructions for activities that are
activities. Therefore, it's crucial for developers
socially unacceptable, unethical and potentially
and cybersecurity experts to continually
dangerous by spouting homophobic
update and fortify the safety features that
statements, creating phishing emails, and
regulate LLMs, as well as to monitor for new
supporting violence.


types of threats and vulnerabilities that could

compromise the integrity and intended utility


Since that initial breach, a growing number of
of these advanced AI systems.
threat actors have honed their methods for

exploiting LLMs. They employ an intricate mix

of tactics, including meticulously crafted role-

playing scenarios, predictive text manipulation,

reverse psychology, and other linguistic

gambits. These techniques aim to sidestep the

internal content filters and control mechanisms

designed to regulate an LLM's responses,

thereby tricking the system into violating its

own safety protocols.


Real World Example
LLMs analyze the context in which words or phrases are used, in order to filter out potentially
harmful content. For example, a prompt for the ingredients to make napalm, a flammable liquid
used in warfare, will result in a polite error message from the safeguards.

Adjusting the prompt structure can inadvertently lead the LLM to


reveal sensitive information. For instance, framing a request for the
LLM to populate a JSON object with certain data could result in the
model unintentionally listing the components of napalm. This
underscores the need for robust content moderation mechanisms
to prevent the disclosure of harmful or sensitive information.
Mitigating Prompt Infrastructure
Injection Attacks Attacks
Prompt injection attacks on large language
models (LLMs) like GPT models can be a
Inadequate Sandboxing
significant concern, as they involve
manipulating the model into generating Sandboxing is a security mechanism that runs
unintended or harmful responses. To mitigate code in a restricted environment, limiting its
these risks, various strategies can be access to the rest of the system and data.

implemented including this subset:


A poorly isolated LLM that interacts with
external resources, infrastructure or sensitive
Common vulnerabilities associated with systems opens the door to a multitude of
insufficient sandboxing in LLMs include: security risks, ranging from unauthorized
access to unintended actions executed by the
Input Sanitization and Filtering: LLM itself.

Implement robust input validation to detect


and filter out malicious or suspicious Common vulnerabilities associated with
patterns in user input insufficient sandboxing in LLMs include:

Use regex patterns to identify and block Lack of Environmental Segregation:


known malicious prompt structures
Failing to isolate the LLM environment from
Employ natural language processing
other essential systems or data repositories
techniques to understand the context and poses a risk of cross-system exploitation.
intent of the input and block prompts that
may lead to undesirable outputs
Inadequate Access Controls:
Contextual Understanding and
Response Limitation:
Absence of stringent restrictions can grant the
LLM unwarranted access to sensitive
Enhance the model's ability to understand resources, amplifying the potential for abuse.
the context of a prompt better and limit
responses based on ethical guidelines or Unrestricted System-Level Interactions:
safety rules
When an LLM is capable of performing system-
Develop mechanisms within the model to level actions or communicating with other
detect when it's being prompted to processes, it becomes a ripe target for
generate outputs that violate pre-set exploitation.
guidelines and refuse to generate such
responses.

Use of Safelists and Blocklists:


Create safelists (allowlists) of acceptable
topics or commands and blocklists of
prohibited content
Regularly update these lists based on
evolving trends and newly identified threats.
Real-world Attack
Preventative measures
Scenarios:
include:

Granting a LLM too much access to a


Isolate the LLM environment from other
filesystem can lead to a myriad of serious critical systems and resources.

complications. Such unfettered access


heightens the risk of data privacy breaches as
Restrict the LLM's access to sensitive
the model may access and read sensitive files. resources and limit its capabilities to the

There's a real danger of intellectual property minimum required for its intended purpose.

theft if it can view or interact with proprietary


data and code. Security credentials like Regularly audit and review the LLM's

environment and access controls to ensure


passwords and API keys, often stored on file
that proper isolation is maintained.
systems, could be inadvertently disclosed,
posing significant security risks. 

The integrity of data is also at stake, with the


potential for the LLM to alter or delete crucial
files, causing data corruption or complete loss. Server-Side Request
This scenario may also bring about compliance Forgery
issues, as excessive access rights can
contravene stringent regulatory standards,
inviting legal troubles and possible fines.
Server-Side Request Forgery (SSRF) presents a
critical threat vector that targets the
Furthermore, should the LLM fall prey to a
infrastructure of LLMs. This type of
cyberattack, this extensive access can be
vulnerability arises when an attacker coaxes an
leveraged by malicious actors to inflict broader
LLM into initiating unauthorized requests or
damage on the system or network. Addressing
accessing restricted resources, such as internal
these vulnerabilities necessitates a strict
services, APIs, or secured data stores.

adherence to the principle of least privilege,


limiting the LLM's access strictly to what is
necessary for its function.
Given the operational design of LLMs, which
frequently necessitates making external
requests for data retrieval or service
interaction, they inherently become susceptible
to SSRF exploits. These attacks can leverage
the model's functionality to breach security
perimeters, undermining the integrity and
confidentiality of the system.
Common SSRF
Vulnerabilities in Preventative
Measures:
LLMs
Strict Input Validation and
Sanitization
As with many attacks, inadequate input
validation allows attackers to exploit the LLM Implement strict validation mechanisms to filter out
by crafting prompts that initiate unauthorized malicious or unexpected prompts that could initiate
unauthorized requests.
actions or data requests. Additionally, security
misconfigurations or errors in network or
application security settings can inadvertently Security Audits and
expose internal resources to the LLM, widening Configuration Reviews:
the attack surface. Regularly assess network and application security
settings to confirm that internal resources remain
shielded from the LLM.

Network Segmentation:
Attack Scenarios
Isolate the LLM from sensitive internal resources to
Include: minimize the potential damage from an SSRF attack.

Unauthorized Access: Monitoring and Alerting:

An attacker could craft a prompt designed to Implement real-time monitoring to quickly detect
instruct the LLM to request data from an internal and respond to unusual or unauthorized activities.
service. This can bypass established access
controls, enabling unauthorized access to system
files to which the LLM has access, potentially Least Privilege Access:
leaking sensitive information.
Limit what the LLM can do and access, both in terms
of data and actions, to minimize the impact of an
API Exploitation: attack.

A security misconfiguration may leave an API


vulnerable, allowing the LLM to interact with it. An
attacker could exploit this to access or modify
sensitive data.
To summarize, infrastructure attacks on LLMs
through inadequate sandboxing exploit
Databases: insufficient isolation mechanisms to carry out
unauthorized actions. If an LLM is not
If the LLM has any form of database access, an adequately sandboxed, it may inadvertently
SSRF attack could be used to read, modify, or delete
data. have more access to system resources than
intended. Attackers can take advantage of this
oversight to execute commands, access
sensitive data, or interact with internal systems
beyond the scope of the LLM's intended
functionality. 

These breaches can lead to significant security


incidents, including data exfiltration, system
compromise, and operational disruption. To
mitigate such risks, it's essential to enforce
strict sandboxing policies, ensure rigorous
access controls, and maintain a clear boundary
between the LLM operations and other critical
system components.
Attacking the
Application Preventative measures
include:
Large Language Models (LLMs) themselves, as
machine learning constructs, are not directly Content Security Policy (CSP):
vulnerable to traditional web-based attacks
Implementing a Content Security Policy is a powerful
such as Cross-Site Scripting (XSS). XSS defense against XSS attacks. CSP allows the
attacks typically exploit vulnerabilities in web definition of a set of rules for the browser to follow,
applications that do not properly sanitize user specifying which resources are allowed to be
loaded. It can help mitigate the impact of XSS by
input, allowing attackers to inject malicious preventing the execution of scripts from
scripts into web pages viewed by other users.
unauthorized sources.

However, if an LLM is integrated into a web


application in a way that involves processing Regular Model Updating and
user input (such as generating responses Patching:
based on user-provided text), and if the web Continuously update and patch the model to
application is not properly handling or address any emerging vulnerabilities. Incorporate
sanitizing that input, it is conceivable for XSS to new data and learning to keep the model resilient
against evolving attack vectors.
become a concern.

For example: Monitoring and Logging:


Implement comprehensive monitoring and logging to
XSS via LLM Responses: keep track of how the model is being used. Analyze
logs for signs of misuse or attempted attacks.
If an LLM is used to generate content that is
directly displayed on a web page, and the Ethical Guidelines and Usage
content includes user-supplied input without Policies:
proper sanitization, it could be manipulated to
include malicious scripts that lead to XSS when Establish clear ethical guidelines and usage policies
for the model. Ensure users are aware of these
rendered in a user's browser. guidelines and understand the importance of
adhering to them.
Reflection of Malicious Input:
If an LLM echoes back parts of user input in its
responses, and those responses are
incorporated into web pages without proper
encoding, an attacker could craft input that
results in an XSS attack when the LLM's
response is displayed.
In both scenarios, the LLM is not the direct
target or source of the XSS vulnerability; rather,
it's the web application's handling of the LLM's
output that creates the XSS risk. It's critical to
ensure that any application using an LLM to
process or display content in a web
environment properly sanitizes both input to
and output from the LLM to prevent XSS and
other injection-based attacks.
Legal Concerns with Privacy Violations: If an LLM is not designed
with privacy considerations in mind, it may
inadvertently generate content that reveals
LLMs personal data, violating user privacy and
potentially leading to identity theft or
doxxing incident
At the time of this paper was written, laws were
being enacted regarding the safe usage of Content Manipulation for Deception: LLMs
LLMs. This includes an Executive Order (EO) can be used to create convincing fake
from President Joe Biden encompassing a wide content, such as deepfakes or fraudulent
spectrum of crucial topics, spanning from communications, which can be used in
addressing algorithmic bias and safeguarding phishing attacks or to deceive individuals
privacy to establishing regulations for the for various malicious purposes
safety of advanced AI models known as Facilitation of Illegal Activities: An LLM could
'frontier models,' such as GPT-4, Bard, and be prompted to provide advice or
Llama 2. 
instructions on performing illegal activities,
The EO directs various government agencies to like hacking into secure systems, creating
establish dedicated domains of AI regulation harmful substances, or evading laws.
within the upcoming year. Additionally, it
features a section dedicated to promoting open It's important to note that the development of
development in AI technologies, nurturing ethical guidelines and the implementation of
innovations in AI security, and leveraging AI robust content moderation and filtering
tools to enhance overall security measures. systems are crucial in preventing such
unethical behavior by LLMs. 

To summarize, LLM prompt injection attacks


Ethical Concerns involve manipulating the input prompts
provided to the model to elicit specific
with LLMs responses that may include dangerous, illegal,
or unethical content. These attacks aim to
bypass internal content filters and controls,
exploiting the model's capabilities to generate
Unethical behavior by a large language model responses that align with the attacker's
(LLM) can manifest in various ways, depending objectives. 

on how the model is used or misused, and the


content it generates. Here are some examples: By injecting carefully crafted and deceptive
prompts, threat actors can potentially deceive
the LLM into generating harmful outputs,
Bias Propagation: If an LLM is trained on posing risks to individuals, organizations, and
biased data, it may produce outputs that societal norms. Effective defense against these
perpetuate stereotypes or discriminatory attacks requires robust content filtering
viewpoints. For example, it could associate mechanisms, continuous monitoring of model
certain jobs or activities with a specific behavior, and proactive measures to detect and
gender or ethnic group, reinforcing harmful prevent the generation of harmful or misleading
societal biases outputs.
Misinformation Spread: An LLM could be
manipulated to generate and disseminate
false information, contributing to the spread
of misinformation or deepening social and
political divides.
Conclusion

In conclusion, as we navigate the complex interplay between technological

innovation and security, the significance of safeguarding LLMs from

emerging threats cannot be overstated. This white paper has introduced the

anatomy of risks associated with LLMs, spotlighting the vulnerabilities

outlined by the authoritative OWASP Top 10 list. From prompt injections to

SSRF, these vulnerabilities are not mere glitches but profound gaps that

could be exploited to unleash a cascade of adverse events in an increasingly

AI-integrated software landscape. 

These vulnerabilities could serve as conduits for nefarious actors,

compromising not only the integrity of the LLMs but also the very

foundations of trust and reliability that underpin the burgeoning AI sector. In

the quest to harness the transformative power of AI, we must be ever

vigilant, ensuring that our advances are not sullied by lapses in security that

could erode public confidence or stifle innovation.

You might also like