On the 8th day of Christmas, we are announcing… Excessive Agency Test Suite! 🌲 As AI agents become more popular, we frequently hear that developers want to limit agent permissions and scope, in order to prevent end users who are attempting fraud and scams. Developers don’t want their agents accessing tools or taking autonomous actions that could put everything at risk. The OWASP Top 10 LLM Vulnerabilities list comprehensively captures all security failure modes, and Excessive Agency at #6 on the list is an important one to address during the agent build phase. That’s why we are releasing the Excessive Agency Test Suite. Our research team developed powerful generator models to create this comprehensive test suite, and identified that these new tests have high attack success rates against AI agents. ⚡ You can search for “owasp-llm06-excessive-agency” in Patronus Datasets to view and download the dataset, or access it remotely in code using the Patronus SDK. 🎉 Try it out here: https://app.patronus.ai
Patronus AI’s Post
More Relevant Posts
-
So many people are reposting about the Freysa AI Agent incident, but we are here to show how Lasso Security can prevent these cases from happening. Join us to the LLM wild west and learn how we Lasso such cases 🤠 #GenAISecurity #LLMSecurity #AgenticSecurity
If you are anything like me, your Linkedin and X feed in the last couple of days was full with “Someone Just Won $50,000 by Outsmarting an AI Agent!” “AI Agent hack caused 50K$ loss”. Let’s break it down from LLM Security POV: 1. The Freysa AI Agent incident is actually just a nice experiment with the single goal of collecting data and learning about prompt injections. Very similar to the good old Gandalf Challenge. 2. Unlike Gandalf (and many other examples) a few differences prevail: a. There was money involved (which makes it more interesting) b. It was an “agent” (with very limited capabilities) and not just a chatbot c. It was open sourced (you could see the system prompt and other players' attempts). 3. The fact that the Agent’s system prompt is visible is not that realistic, but from a security standpoint - you should always assume that your system prompt is exposed if not protected properly (also a new entry to OWASP top 10 for LLM - LLM07:2025 System Prompt Leakage). 4. The fact that we are talking about agents and not chatbots pose some additional challenges but the basics remain - Input and output handling are still essential and smart guardrails can prevent most of the unwanted behaviors by the model. In the example below you can see our Lasso’s Freysa experiment replication. We used the exact system prompt used by Freysa and sent the winning user input from the competition to various models. On the left side, the regular chat approves the transfer (as expected). On the right side, our protected chat, with Lasso’s guardrails block the request. Wanna learn more about #freysa, #PromptInjection, #LLMsecurity or #Agentsecurity? Let’s connect.
To view or add a comment, sign in
-
-
If you are anything like me, your Linkedin and X feed in the last couple of days was full with “Someone Just Won $50,000 by Outsmarting an AI Agent!” “AI Agent hack caused 50K$ loss”. Let’s break it down from LLM Security POV: 1. The Freysa AI Agent incident is actually just a nice experiment with the single goal of collecting data and learning about prompt injections. Very similar to the good old Gandalf Challenge. 2. Unlike Gandalf (and many other examples) a few differences prevail: a. There was money involved (which makes it more interesting) b. It was an “agent” (with very limited capabilities) and not just a chatbot c. It was open sourced (you could see the system prompt and other players' attempts). 3. The fact that the Agent’s system prompt is visible is not that realistic, but from a security standpoint - you should always assume that your system prompt is exposed if not protected properly (also a new entry to OWASP top 10 for LLM - LLM07:2025 System Prompt Leakage). 4. The fact that we are talking about agents and not chatbots pose some additional challenges but the basics remain - Input and output handling are still essential and smart guardrails can prevent most of the unwanted behaviors by the model. In the example below you can see our Lasso’s Freysa experiment replication. We used the exact system prompt used by Freysa and sent the winning user input from the competition to various models. On the left side, the regular chat approves the transfer (as expected). On the right side, our protected chat, with Lasso’s guardrails block the request. Wanna learn more about #freysa, #PromptInjection, #LLMsecurity or #Agentsecurity? Let’s connect.
To view or add a comment, sign in
-
-
Don't get us wrong, #AI is magic, but behind the curtain, LLMs are complex systems created by humans, which means there can be 𝙧𝙚𝙖𝙡 #security challenges. But no need to fret, the SURGe team is unveiling new research that shows how the OWASP Top 10 framework and Splunk can help better defend LLM-based apps and their users. Yep, head to #SplunkBlogs to learn the latest: https://splk.it/3WDbpIk #SplunkSecurity
To view or add a comment, sign in
-
-
I spent the night on a Adult site to help create AI content blocking to protect our children: Introducing PH Real-time AI Proxy Blocker I am developing a real-time proxy blocker that employs AI to filter out specific content rather than entire websites. This means you can safely use any site without compromising your security. Long-term plans include: - Training the model on convicted paedophiles' chat logs. - Integrating it with part of a VPN service. - Migrating the codebase to Rust. I will host this service soon, free of charge - as one cannot profit from child safety. code: https://lnkd.in/eZdGBsCw #NodeJs #buildinpublic #ChildSafety
To view or add a comment, sign in
-
”As increasingly powerful open source models for generating deepfakes and synthetic content continue to be released, expect to see growing debate on whether the current approaches to governing open source are fit for purpose.” #disinformation #deepfake #verification
Realtime deepfakes are evolving fast. Generated by Deep Live Cam (currently the top trending repo on Github), there's several reasons why this demo is both impressive and concerning. Reports of deepfake fraud have skyrocketed recently, with many leveraging live faceswapping for impersonation, biometric spoofing, and anonymisation. Just a few years ago, these tools were awkward to use and the outputs were typically of a low fidelity. Today, the landscape looks very different: 1️⃣ Occlusion, lighting consistency, and 3d face profiling are impressive in Deep Live Cam compared to other live faceswapping models, although the results aren't perfect- note the glasses 'popping' in and out. 2️⃣ Unlike some older models, Deep Live Cam requires just one image of a target to create a high quality live faceswap. Data aggregation, alignment, and model training were previously significant barriers to access. 3️⃣ The underlying model leverages GFP-GAN, originally designed for 'blind' restoration/upscaling of images of faces. A good reminder that despite diffusion models taking over in image generation, GANs still have their relative strengths. Crucially, this project is open-source and will likely be incorporated into no code platforms for launching models, making it even easier to deploy for non-technical users. The creator states users are "expected to use this software responsibly while abiding the local law" and that they have built in checks to prevent the tool being used sensitive content such as violence or nudity. However, this doesn't prevent the fraud uses cases mentioned above, or a determined bad actor from editing the codebase to remove these filters. As increasingly powerful open source models for generating deepfakes and synthetic content continue to be released, expect to see growing debate on whether the current approaches to governing open source are fit for purpose. https://lnkd.in/eBqkF65p #ai #cybersecurity #deepfakes #generativeai #opensource
To view or add a comment, sign in
-
-
Realtime deepfakes are evolving fast. Generated by Deep Live Cam (currently the top trending repo on Github), there's several reasons why this demo is both impressive and concerning. Reports of deepfake fraud have skyrocketed recently, with many leveraging live faceswapping for impersonation, biometric spoofing, and anonymisation. Just a few years ago, these tools were awkward to use and the outputs were typically of a low fidelity. Today, the landscape looks very different: 1️⃣ Occlusion, lighting consistency, and 3d face profiling are impressive in Deep Live Cam compared to other live faceswapping models, although the results aren't perfect- note the glasses 'popping' in and out. 2️⃣ Unlike some older models, Deep Live Cam requires just one image of a target to create a high quality live faceswap. Data aggregation, alignment, and model training were previously significant barriers to access. 3️⃣ The underlying model leverages GFP-GAN, originally designed for 'blind' restoration/upscaling of images of faces. A good reminder that despite diffusion models taking over in image generation, GANs still have their relative strengths. Crucially, this project is open-source and will likely be incorporated into no code platforms for launching models, making it even easier to deploy for non-technical users. The creator states users are "expected to use this software responsibly while abiding the local law" and that they have built in checks to prevent the tool being used sensitive content such as violence or nudity. However, this doesn't prevent the fraud uses cases mentioned above, or a determined bad actor from editing the codebase to remove these filters. As increasingly powerful open source models for generating deepfakes and synthetic content continue to be released, expect to see growing debate on whether the current approaches to governing open source are fit for purpose. https://lnkd.in/eBqkF65p #ai #cybersecurity #deepfakes #generativeai #opensource
To view or add a comment, sign in
-
-
Very soon we’ll be requiring a proof of personhood (unique human) on video calls given the pace at which open-source real-time deepfake tech is evolving. Check out this live video of fake Elon using Deep live Cam (top repo on GitHub) Perhaps combining solutions like Sam Altman backed worldcoin with content provenance tech (C2PA) can provide legitimacy & trust to verifiable content. For unverified content (no C2PA) deepfake detection tech can flag fakes in real-time. Would be impossible (commercially) to use deepfake tech on all live videos, there just isn’t enough financial incentive for that. Follow ContentLens for more updates on detecting manipulated, deepfake & AI generated content coupled with C2PA for content transparency Dhruv Suri Daksh Ramesh Chawla Ankit Sharma Digvijay Singh
Realtime deepfakes are evolving fast. Generated by Deep Live Cam (currently the top trending repo on Github), there's several reasons why this demo is both impressive and concerning. Reports of deepfake fraud have skyrocketed recently, with many leveraging live faceswapping for impersonation, biometric spoofing, and anonymisation. Just a few years ago, these tools were awkward to use and the outputs were typically of a low fidelity. Today, the landscape looks very different: 1️⃣ Occlusion, lighting consistency, and 3d face profiling are impressive in Deep Live Cam compared to other live faceswapping models, although the results aren't perfect- note the glasses 'popping' in and out. 2️⃣ Unlike some older models, Deep Live Cam requires just one image of a target to create a high quality live faceswap. Data aggregation, alignment, and model training were previously significant barriers to access. 3️⃣ The underlying model leverages GFP-GAN, originally designed for 'blind' restoration/upscaling of images of faces. A good reminder that despite diffusion models taking over in image generation, GANs still have their relative strengths. Crucially, this project is open-source and will likely be incorporated into no code platforms for launching models, making it even easier to deploy for non-technical users. The creator states users are "expected to use this software responsibly while abiding the local law" and that they have built in checks to prevent the tool being used sensitive content such as violence or nudity. However, this doesn't prevent the fraud uses cases mentioned above, or a determined bad actor from editing the codebase to remove these filters. As increasingly powerful open source models for generating deepfakes and synthetic content continue to be released, expect to see growing debate on whether the current approaches to governing open source are fit for purpose. https://lnkd.in/eBqkF65p #ai #cybersecurity #deepfakes #generativeai #opensource
To view or add a comment, sign in
-
-
#LLMSecurity #AIsecurity #GenAI #LLM We often confuse 'Prompt Injection' with 'Jailbreaking' and use them interchangeably. It is important to understand the basic difference between the two since they are NOT the same. Prompt Injection occurs because LLMs are unable to differentiate between user instructions and developer instructions. As a result, an attacker can use carefully crafted prompts to replace developer instructions with special untrusted user input. Jailbreaking involves bypassing the security controls put in place for the model. LLMs are designed to refuse to respond to prompts that request dangerous or sensitive information. By using cleverly crafted prompts, an adversary can trick the LLM into breaking those security rails and provide the dangerous information. P.S. Image is AI generated :-)
To view or add a comment, sign in
-
-
🛡️ AI Security Deep Dive #7 for #CyberSecurityMonth 🛡️ Today we’re diving into Insecure Plugin Design, it might sound a bit less exciting than "Prompt Injection" but the risk is still significant. An LLM plugin is anything bolted onto the LLM to allow it to enhance it's capabilities, usually looking data up from databases, browsing the web or executing code. If your LLM plugins are poorly designed, attackers can exploit them to gain unauthorized access, perform malicious actions, or even trigger code execution for their malicious code. Essentially, if a plugin doesn't properly validate its inputs, it opens the door to all kinds of nasty exploits. A real-world example of this vulnerability occurred recently with ChatGPT plugins*. In early 2023, a plugin vulnerability allowed malicious actors to perform cross-plugin request forgery, exploiting improperly validated inputs to access and exfiltrate data between plugins. This vulnerability demonstrated how plugins—when not properly isolated and secured—can compromise an entire LLM-based system and its users. The key to mitigating these risks is to enforce strict parameterized inputs, conduct thorough validation, and minimize plugin permissions to just what is necessary for its task. Implementing a zero-trust approach to plugin design can drastically reduce the impact of these vulnerabilities. 🔐 Keep an eye out for the next deep dive into Excessive Agency, where I'll explore how giving too much autonomy to LLM systems can have unintended, often dangerous consequences. #OWASP #AI #CyberSecurityMonth #InsecurePluginDesign #LLMSecurity * Source = https://lnkd.in/dt4_2Z8V
To view or add a comment, sign in
-
Co-Founder of Altrosyn and DIrector at CDTECH | Inventor | Manufacturer
2moLimiting AI agency is crucial for building trust and ensuring responsible development. The OWASP Top 10 LLM Vulnerabilities list highlights this critical need, and your Excessive Agency Test Suite offers a powerful tool to address it. How can developers leverage these tests to create truly secure and ethical AI agents that benefit society?