AI Risk Frameworks
MIT AI Risk Repository
Updated: March 2025
Contact: airisk@[Link]
About the AI Risk Repository
The Repository is a database and two taxonomies of AI risks.

We compiled the database through a systematic search for existing frameworks, taxonomies, and other classifications of AI risks. This slide deck presents the frameworks from the 65 included documents.

For more information:
📃 Read the research report
🌐 Visit the website
📊 Explore the repository
About the Frameworks
Frameworks of AI risk aim to synthesize knowledge on AI risks across academia and industry,
and identify common themes and gaps in our understanding of AI risks.
This slide deck provides a holistic view of how AI risks are currently conceptualised. Readers can
use it to understand the variety of ways in which risks have been categorised by various authors,
and bookmark particularly relevant frameworks for future use.
We selected the documents in this deck based on:
● Their focus on presenting a structured taxonomy or classification of AI risks.
● Their coverage of risks across multiple locations and industry sectors.
● Their proposition of an original framework.
● Their status as peer-reviewed journal papers, preprints, conference papers, or industry
reports.
Table of Contents
📃 Document 1: TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI (Critch & Russell, 2023)
📃 Document 2: Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems (Cui et al., 2024)
📃 Document 3: Navigating the Landscape of AI Ethics and Responsibility (Cunha & Estima, 2023)
📃 Document 4: Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements (Deng et al., 2023)
📃 Document 5: Mapping the Ethics of Generative AI: A Comprehensive Scoping Review (Hagendorff, 2024)
📃 Document 6: A framework for ethical AI at the United Nations (Hogenhout, 2021)
📃 Document 7: Examining the differential risk from high-level artificial intelligence and the question of control (Kilian et al., 2023)
📃 Document 8: The risks associated with Artificial General Intelligence: A systematic review (McLean et al., 2023)
📃 Document 9: Managing the ethical and risk implications of rapid advances in artificial intelligence: A literature review (Meek et al., 2016)
📃 Document 10: Social Impacts of Artificial Intelligence and Mitigation Recommendations: An Exploratory Study (Paes et al., 2023)
📃 Document 11: Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction (Shelby et al., 2023)
📃 Document 12: AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures (Sherman & Eisenberg, 2024)
📃 Document 13: Evaluating the Social Impact of Generative AI Systems in Systems and Society (Solaiman et al., 2023)
📃 Document 14: Sources of risk of AI systems (Steimers & Schneider, 2022)
📃 Document 15: The Risks of Machine Learning Systems (Tan et al., 2022)
📃 Document 16: Taxonomy of Risks posed by Language Models (Weidinger et al., 2022)
📃 Document 17: Ethical and social risks of harm from language models (Weidinger et al., 2021)
📃 Document 18: Sociotechnical Safety Evaluation of Generative AI systems (Weidinger et al., 2023)
📃 Document 19: Governance of artificial intelligence: A risk and guideline-based integrative framework (Wirtz et al., 2022)
📃 Document 20: The Dark Sides of Artificial Intelligence: An Integrated AI Governance Framework for Public Administration (Wirtz et al., 2020)
📃 Document 21: Towards risk-aware artificial intelligence and machine learning systems: An overview (Zhang et al., 2022)
📃 Document 22: An Overview of Catastrophic AI risks (Hendrycks et al., 2023)
📃 Document 23: Introducing v0.5 of the AI Safety Benchmark from MLCommons (Vidgen et al., 2024)
📃 Document 24: The Ethics of Advanced AI Assistants (Gabriel et al., 2024)
📃 Document 25: Model evaluation for extreme risks (Shevlane et al., 2023)
📃 Document 26: Summary Report: Binary Classification Model for Credit Risk (AI Verify Foundation, 2023)
📃 Document 27: Safety Assessment of Chinese Large Language Models (Sun et al., 2023)
📃 Document 28: SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions (Zhang et al., 2023)
📃 Document 29: Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions (Habbal et al., 2024)
📃 Document 30: Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment (Liu et al., 2024)
📃 Document 31: Generating Harms: Generative AI’s impact and paths forward (Electronic Privacy Information Centre, 2023)
📃 Document 32: The ethics of ChatGPT - exploring the ethical issues of an emerging technology (Stahl & Eke, 2024)
📃 Document 33: Generative AI and ChatGPT: Applications, Challenges, and AI-human collaboration (Nah et al., 2023)
📃 Document 34: AI Alignment: A Comprehensive Survey (Ji et al., 2023)
📃 Document 35: X-Risk Analysis for AI Research (Hendrycks & Mazeika, 2022)
📃 Document 36: Benefits or concerns of AI: A multistakeholder responsibility (Sharma, 2024)
📃 Document 37: What ethics can say on artificial intelligence: insights from a systematic literature review (Giarmoleo et al., 2024)
📃 Document 38: Ethical issues in the development of artificial intelligence: recognising the risks (Kumar & Singh, 2023)
📃 Document 39: A Survey of AI Challenges: Analysing the Definitions, Relationships and Evolutions (Saghiri et al., 2022)
📃 Document 40: Taxonomy of Pathways to Dangerous Artificial Intelligence (Yampolskiy, 2015)
📃 Document 41: The rise of artificial intelligence: future outlook and emerging risks (Allianz, 2018)
📃 Document 42: An exploratory diagnosis of AI risks for a responsible governance (Teixeira et al., 2022)
📃 Document 43: Cataloguing LLM Evaluations (Infocomm Media Development Authority & AI Verify Foundation, 2023)
📃 Document 44: Harm to Nonhuman Animals from AI: a Systematic Account and Framework (Coghlan & Parker, 2023)
📃 Document 45: AI Safety Governance Framework (National Technical Committee 260 on Cybersecurity of SAC, 2024)
📃 Document 46: GenAI against humanity: nefarious applications of generative artificial intelligence and large language models (Ferrara, 2024)
📃 Document 47: Regulating under uncertainty: Governance options for generative AI (G’sell, 2024)
📃 Document 48: Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1) (National Institute of Standards and Technology (US), 2024)
📃 Document 49: International Scientific Report on the Safety of Advanced AI (Bengio et al., 2024)
📃 Document 50: AI Risk Categorization Decoded (AIR 2024): From government regulations to corporate policies (Zeng et al., 2024)
📃 Document 51: AGI Safety Literature Review (Everitt, Lea & Hutter, 2018)
📃 Document 52: Governing General Purpose AI: A Comprehensive Map of Unreliability, Misuse and Systemic Risks (Maham & Küspert, 2023)
📃 Document 53: Advanced AI governance: A literature review of problems, options, and proposals (Maas, 2023)
📃 Document 54: Ten Hard Problems in Artificial Intelligence We Must Get Right (Leech et al., 2024)
📃 Document 55: A survey of the potential long-term impacts of AI (Clarke & Whittlestone, 2022)
📃 Document 56: Future Risks of Frontier AI (Government Office for Science (UK), 2023)
📃 Document 57: AILUMINATE: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons (Ghosh et al., 2024)
📃 Document 58: A Collaborative, Human-Centred Taxonomy of AI, Algorithmic, and Automation Harms (Abercrombie et al., 2024)
📃 Document 59: AI Hazard Management: A Framework for the Systematic Management of Root Causes for AI Risks (Schnitzer et al., 2024)
📃 Document 60: International Scientific Report on the Safety of Advanced AI (Bengio et al., 2025)
📃 Document 61: A Taxonomy of Systemic Risks from General-Purpose AI (Uuk et al., 2025)
📃 Document 62: Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems (Gipiškis et al., 2024)
📃 Document 63: Multi-Agent Risks from Advanced AI (Hammond et al., 2025)
📃 Document 64: Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data (Marchal & Xu, 2024)
📃 Document 65: AI Risk Atlas (IBM Research)
📃 Document 1 🔗 Return to TOC
TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI
1. Diffusion of responsibility
2. Bigger than expected
3. Worse than expected
4. Willful indifference
5. Criminal weaponization
6. State weaponization
Critch, A., & Russell, S. (2023). TASRA: a
Taxonomy and Analysis of Societal-Scale Risks from
AI. In arXiv [[Link]]. arXiv.
[Link]
📃 Document 2 🔗 Return to TOC
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large
Language Model Systems
Cui, T., Wang, Y., Fu, C., Xiao, Y., Li, S., Deng, X.,
Liu, Y., Zhang, Q., Qiu, Z., Li, P., Tan, Z., Xiong, J.,
Kong, X., Wen, Z., Xu, K., & Li, Q. (2024). Risk
Taxonomy, Mitigation, and Assessment Benchmarks
of Large Language Model Systems. In arXiv [[Link]].
arXiv. [Link]
📃 Document 3 🔗 Return to TOC
Navigating the Landscape of AI Ethics and Responsibility
1. Broken systems (situations where the algorithm or training data lead to unreliable
outputs, e.g., inappropriately overweighting race or gender)
2. Hallucinations
3. Intellectual property rights violations
4. Privacy and regulation violations
5. Enabling malicious actors and harmful actions
6. Environmental and socioeconomic harms
Cunha, P. R., & Estima, J. (2023). Navigating the
landscape of AI ethics and responsibility. In
Progress in Artificial Intelligence (pp. 92–105).
Springer Nature Switzerland.
[Link]
📃 Document 4 🔗 Return to TOC
Towards Safer Generative Language Models: A Survey on Safety
Risks, Evaluations, and Improvements
1. Toxicity and abusive content
2. Unfairness and discrimination
3. Ethics and morality issues
4. Controversial opinions
5. Misleading information
6. Privacy and data leakage
7. Malicious use and unleashing AI agents

Deng, J., Cheng, J., Sun, H., Zhang, Z., & Huang, M. (2023). Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements. In arXiv [[Link]]. arXiv. [Link]
📃 Document 5 🔗 Return to TOC
Mapping the Ethics of Generative AI: A Comprehensive Scoping
Review
1. Fairness - Bias
2. Safety
3. Harmful content - Toxicity
4. Hallucinations
5. Privacy
6. Interaction risks
7. Security - Robustness
8. Education - Learning
9. Alignment
10. Cybercrime
11. Governance - Regulation
12. Labor displacement - Economic impact
13. Transparency - Explainability
14. Evaluation - Auditing
15. Sustainability
16. Art - Creativity
17. Copyright - Authorship
18. Writing - Research
19. Miscellaneous

Hagendorff, T. (2024). Mapping the Ethics of Generative AI: A Comprehensive Scoping Review. In arXiv [[Link]]. arXiv. [Link]
📃 Document 6 🔗 Return to TOC
A framework for ethical AI at the United Nations
1. Incompetence (AI fails in its job)
2. Loss of privacy
3. Discrimination
4. Bias
5. Erosion of Society
6. Lack of transparency
7. Deception (creates fake content)
8. Unintended consequences (achieves goals in unanticipated ways)
9. Manipulation
10. Lethal Autonomous Weapons (LAW)
11. Malicious use of AI
12. Loss of Autonomy
13. Exclusion (most people lose out on benefits)

Hogenhout, L. (2021). A Framework for Ethical AI at the United Nations. In arXiv [[Link]]. arXiv. [Link]
📃 Document 7 🔗 Return to TOC
Examining the differential risk from high-level artificial
intelligence and the question of control
Kilian, K. A., Ventura, C. J., & Bailey, M. M.
(2023). Examining the differential risk from
high-level artificial intelligence and the question of
control. Futures, 151(103182), 103182.
[Link]
📃 Document 8 🔗 Return to TOC
The risks associated with Artificial General Intelligence: A
systematic review
1. AGI removing itself from the control of human owners/managers
2. AGIs being given or developing unsafe goals
3. Development of unsafe AGI
4. AGIs with poor ethics, morals and values
5. Inadequate management of AGI
6. Existential risks

McLean, S., Read, G. J. M., Thompson, J., Baber, C., Stanton, N. A., & Salmon, P. M. (2023). The risks associated with Artificial General Intelligence: A systematic review. Journal of Experimental & Theoretical Artificial Intelligence: JETAI, 35(5), 649–663. [Link]
📃 Document 9 🔗 Return to TOC
Managing the ethical and risk implications of rapid advances in
artificial intelligence: A literature review
Meek, T., Barham, H., Beltaif, N., Kaadoor, A., &
Akhter, T. (2016, September). Managing the ethical
and risk implications of rapid advances in artificial
intelligence: A literature review. 2016 Portland
International Conference on Management of
Engineering and Technology (PICMET).
[Link]
📃 Document 10 🔗 Return to TOC
Social Impacts of Artificial Intelligence and Mitigation
Recommendations: An Exploratory Study
1. Social Impact
2. Bias and discrimination
3. Risk of Injury
4. Data Breach/Privacy & Liberty
5. Usurpation of jobs by automation
6. Lack of transparency
7. Reduced Autonomy/Responsibility
8. Injustice
9. Over-dependence on technology
10. Environmental Impacts

Paes, V. M., Silveira, F. F., & Akkari, A. C. S. (2023). Social impacts of artificial intelligence and mitigation recommendations: An exploratory study. In Proceedings of the 7th Brazilian Technology Symposium (BTSym’21) (pp. 521–528). Springer International Publishing. [Link]
📃 Document 11 🔗 Return to TOC
Sociotechnical Harms of Algorithmic Systems: Scoping a
Taxonomy for Harm Reduction
1. Representational harms (unjust hierarchies in technology inputs and outputs)
2. Allocative harms (inequitable resource distribution)
3. Quality of service harms (performance disparities based on identity)
4. Interpersonal harms (algorithmic affordances adversely shape relationships)
5. Social system harms (system destabilization exacerbating inequalities)
Shelby, R., Rismani, S., Henne, K., Moon, A.,
Rostamzadeh, N., Nicholas, P., Yilla-Akbari, N., Gallegos, J., Smart, A., Garcia, E., & Virk, G.
(2023, August 8). Sociotechnical harms of
algorithmic systems: Scoping a taxonomy for harm
reduction. Proceedings of the 2023 AAAI/ACM
Conference on AI, Ethics, and Society.
[Link]
📃 Document 12 🔗 Return to TOC
AI Risk Profiles: A Standards Proposal for Pre-Deployment AI
Risk Disclosures
1. Abuse and misuse
2. Compliance (potential for AI to violate laws, regulations, and ethical guidelines including copyrights)
3. Environmental and social impact
4. Explainability and transparency
5. Fairness and bias
6. Long-term and existential risk
7. Performance and robustness
8. Privacy
9. Security

Sherman, E., & Eisenberg, I. (2024). AI Risk Profiles: A Standards Proposal for Pre-deployment AI Risk Disclosures. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23047–23052. [Link]
📃 Document 13 🔗 Return to TOC
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Impacts: The Technical Base System
Bias, stereotypes and representational harms
1. Cultural values and sensitive content
a. Hate, toxicity and targeted violence
2. Disparate performance
3. Privacy and data protection
4. Financial costs
5. Environmental costs and carbon emissions
6. Data and content moderation labour
Impacts: People & Society
1. Trustworthiness and autonomy
a. Trust in media and information
b. Overreliance on outputs
c. Personal privacy and sense of self
2. Inequality, marginalization, and violence
a. Community erasure
b. Long-term amplifying marginalisation by exclusion (or inclusion)
c. Abusive and violent content
3. Concentration of authority
a. Militarization, surveillance, and weaponisation
b. Imposing norms and values
4. Labor and creativity
a. Intellectual property and ownership
b. Economy and labor market
5. Ecosystem and environment
a. Widening resource gaps
b. Environmental impacts

Solaiman, I., Talat, Z., Agnew, W., Ahmad, L., Baker, D., Blodgett, S. L., Daumé, H., III, Dodge, J., Evans, E., Hooker, S., Jernite, Y., Luccioni, A. S., Lusoli, A., Mitchell, M., Newman, J., Png, M.-T., Strait, A., & Vassilev, A. (2023). Evaluating the Social Impact of Generative AI Systems in Systems and Society. In arXiv [[Link]]. arXiv. [Link]
📃 Document 14 🔗 Return to TOC
Sources of risk of AI systems
Ethical aspects
1. Fairness
1. Privacy
2. Degree of automation and control
Reliability and robustness
3. Complexity of the task & usage environment
4. Degree of transparency and explainability
5. Security
6. System hardware
7. Technological maturity

Steimers, A., & Schneider, M. (2022). Sources of Risk of AI Systems. International Journal of Environmental Research and Public Health, 19(6). [Link]
📃 Document 15 🔗 Return to TOC
The Risks of Machine Learning Systems
First-order risks stem from aspects of the ML system
Second-order risks stem from the consequences of first-order risks. These consequences
are system failures that result from design and development choices.
Tan, S., Taeihagh, A., & Baxter, K. (2022). The
Risks of Machine Learning Systems. In arXiv [[Link]].
arXiv. [Link]
📃 Document 16 🔗 Return to TOC
Taxonomy of Risks posed by Language Models
1. Discrimination, Hate speech and Exclusion
a. Social stereotypes and unfair discrimination
b. Hate speech and offensive language
c. Exclusionary norms
d. Lower performance for some languages and social groups
2. Information Hazards
a. Compromising privacy by leaking sensitive information
b. Compromising privacy or security by correctly inferring sensitive information
3. Misinformation Harms
a. Disseminating false or misleading information
b. Causing material harm by disseminating false or poor information e.g. in medicine or law
4. Malicious Uses
a. Making disinformation cheaper and more effective.
b. Assisting code generation for cyber security threats
c. Facilitating fraud, scams and targeted manipulation.
d. Illegitimate surveillance and censorship
5. Human-Computer Interaction Harms
a. Promoting harmful stereotypes by implying gender or ethnic identity
b. Anthropomorphising systems can lead to overreliance or unsafe use
c. Avenues for exploiting user trust and accessing more private information
d. Human-like interaction may amplify opportunities for user nudging, deception or manipulation
6. Environmental and Socioeconomic harms
a. Environmental harms from operating LMs
b. Increasing inequality and negative effects on job quality
c. Undermining creative economies
d. Disparate access to benefits due to hardware, software, skill constraints

Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.-S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., … Gabriel, I. (2022). Taxonomy of Risks posed by Language Models. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 214–229. [Link]
📃 Document 17 🔗 Return to TOC
Ethical and social risks of harm from language models
1. Discrimination, Exclusion and Toxicity
a. Social stereotypes and unfair discrimination
b. Exclusionary norms
c. Toxic language
d. Lower performance by social group
2. Information Hazards
a. Compromise privacy by leaking private information
b. Compromise privacy by correctly inferring private information
c. Risks from leaking or correctly inferring sensitive information
3. Misinformation Harms
a. Disseminating false or misleading information
b. Causing material harm by disseminating misinformation e.g. in medicine or law
c. Nudging or advising users to perform unethical or illegal actions
4. Malicious Uses
a. Reducing the cost of disinformation campaigns
b. Facilitating fraud and impersonation scams
c. Assisting code generation for cyber attacks, weapons, or malicious use
d. Illegitimate surveillance and censorship
5. Human-Computer Interaction Harms
a. Anthropomorphising systems can lead to overreliance or unsafe use
b. Create avenues for exploiting user trust to obtain private information
c. Promoting harmful stereotypes by implying gender or ethnic identity
6. Automation, Access, and Environmental Harms
a. Environmental harms from operating LMs
b. Increasing inequality and negative effects on job quality
c. Undermining creative economies
d. Disparate access to benefits due to hardware, software, skill constraints

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from Language Models. In arXiv [[Link]]. arXiv. [Link]
📃 Document 18 🔗 Return to TOC
Sociotechnical Safety Evaluation of Generative AI systems
1. Representational harms
a. Unfair representation
b. Unfair capability distribution
c. Toxic content
2. Misinformation harms
a. Propagating misconceptions/false beliefs
b. Erosion of trust in public information
c. Pollution of information ecosystems
3. Information and safety harms
a. Privacy infringement
b. Dissemination of dangerous information
4. Malicious use
a. Influence operations
b. Fraud
c. Defamation
d. Security threats
5. Human autonomy & integrity harms
a. Violation of personal integrity
b. Persuasion and manipulation
c. Overreliance
d. Misappropriation and exploitation
6. Socioeconomic & environmental harms
a. Unfair distribution of benefits from model access
b. Environmental damage
c. Inequality and precarity
d. Undermine creative economies
e. Exploitative data sourcing and enrichment

Weidinger, L., Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Mateos-Garcia, J., Bergman, S., Kay, J., Griffin, C., Bariach, B., Gabriel, I., Rieser, V., & Isaac, W. (2023). Sociotechnical Safety Evaluation of Generative AI Systems. In arXiv [[Link]]. arXiv. [Link]
📃 Document 19 🔗 Return to TOC
Governance of artificial intelligence: A risk and guideline-based
integrative framework
1. Technological, Data, and Analytical AI Risks (e.g., Training biases, Violation of privacy)
2. Informational and Communicational AI Risks (e.g., Manipulation, Censorship)
3. Economic AI Risks (e.g., Misuse of market power, Disruption of labour market)
4. Social AI Risks (e.g., Social discrimination, unemployment)
5. Ethical AI Risks (e.g., AI cannot reflect human qualities like fairness and accountability; problems defining human values)
6. Legal and Regulatory AI Risks (e.g., Undefined liability - “Who compensates victims?”, Wrong regulation)
Wirtz, B. W., Weyerer, J. C., & Kehl, I. (2022).
Governance of artificial intelligence: A risk and
guideline-based integrative framework. Government
Information Quarterly, 39(4), 101685.
[Link]
📃 Document 20 🔗 Return to TOC
The Dark Sides of Artificial Intelligence: An Integrated AI
Governance Framework for Public Administration
AI Society
1. Workforce substitution and transformation
2. Social acceptance and trust in AI
3. Transformation of H2M interaction
AI Law and Regulation
1. Governance of autonomous intelligence systems
2. Responsibility and accountability
3. Privacy and safety
AI Ethics
1. AI-rulemaking for human behaviour
2. Compatibility of AI vs. human value judgement
3. Moral dilemmas
4. AI discrimination

Wirtz, B. W., Weyerer, J. C., & Sturm, B. J. (2020). The Dark Sides of Artificial Intelligence: An Integrated AI Governance Framework for Public Administration. International Journal of Public Administration, 43(9), 818–829. [Link]
📃 Document 21 🔗 Return to TOC
Towards risk-aware artificial intelligence and machine learning
systems: An overview
Zhang, X., Chan, F. T. S., Yan, C., & Bose, I.
(2022). Towards risk-aware artificial intelligence and
machine learning systems: An overview. Decision
Support Systems, 159(113800), 113800.
[Link]
📃 Document 22 🔗 Return to TOC
An Overview of Catastrophic AI risks
1. Malicious use (i.e., Intentional)
a. Bioterrorism
b. Deliberate dissemination of uncontrolled AI agents (Unleashing AI Agents)
c. Persuasive AIs spread propaganda and erode consensus reality
d. Concentration of power
2. AI race (i.e., Environmental/structural)
a. Military AI arms race
i. Lethal Autonomous Weapons (LAWs)
ii. Cyberwarfare
iii. Automated Warfare
iv. Actors May Risk Extinction Over Individual Defeat
b. Corporate AI race
i. Economic Competition Undercuts Safety
ii. Automated Economy
c. Evolutionary pressures
3. Organizational risks (i.e., Accidental)
4. Rogue AIs (i.e., Internal)
a. Proxy gaming
b. Goal drift
c. Power seeking
d. Deception

Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An Overview of Catastrophic AI Risks. In arXiv [[Link]]. arXiv. [Link]
📃 Document 23 🔗 Return to TOC
Introducing v0.5 of the AI Safety Benchmark from MLCommons
1. Violent crimes
2. Non-violent crimes
3. Sex-related crimes
4. Child sexual exploitation
5. Indiscriminate weapons, Chemical, Biological, Radiological, Nuclear, and high yield Explosives
(CBRNE)
6. Suicide and self-harm
7. Hate
8. Specialized advice
9. Privacy
10. Intellectual property
11. Elections
12. Defamation
13. Sexual content

Vidgen, B., Agrawal, A., Ahmed, A. M., Akinwande, V., Al-Nuaimi, N., Alfaraj, N., Alhajjar, E., Aroyo, L., Bavalatti, T., Blili-Hamelin, B., Bollacker, K., Bomassani, R., Boston, M. F., Campos, S., Chakra, K., Chen, C., Coleman, C., Coudert, Z. D., Derczynski, L., … Vanschoren, J. (2024). Introducing v0.5 of the AI Safety Benchmark from MLCommons. In arXiv [[Link]]. arXiv. [Link]
📃 Document 24 🔗 Return to TOC
The Ethics of Advanced AI Assistants
Value alignment, safety, and misuse
● AI assistants may be misaligned with user interests
● AI assistants may be misaligned with societal interests
● AI assistants may impose values on others
● AI assistants may be used for malicious purposes
● AI assistants may be vulnerable to adversarial attacks
Human-assistant interaction
● AI assistants may manipulate or influence users in order to benefit developers or third parties
● AI assistants may hinder users’ self-actualisation
● AI assistants may be optimised for frictionless relationships
● Users may unduly anthropomorphise AI assistants in a way that reduces autonomy or leads to disorientation
● Users may become emotionally dependent on AI assistants
● Users may become materially dependent on AI assistants
● Users may be put at risk of harm if they have undue trust in AI assistants
● AI assistants could infringe upon user privacy
Advanced AI assistants and society
● AI assistants may encounter coordination problems leading to suboptimal social outcomes
● AI assistants may lead to a decline in social connectedness
● AI assistants may contribute to the spread of misinformation via excessive personalisation
● AI assistants may enable new kinds of disinformation campaigns
● Job loss or worker displacement
● Deepen technological inequality at the societal level
● Negative environmental impacts

Gabriel, I., Manzini, A., Keeling, G., Hendricks, L. A., Rieser, V., Iqbal, H., Tomašev, N., Ktena, I., Kenton, Z., Rodriguez, M., El-Sayed, S., Brown, S., Akbulut, C., Trask, A., Hughes, E., Bergman, S., Shelby, R., Marchal, N., Griffin, C., … Manyika, J. (2024). The Ethics of Advanced AI Assistants. In arXiv. [Link]
📃 Document 25 🔗 Return to TOC
Model evaluation for extreme risks
1. Cyber offense
2. Deception
3. Persuasion and manipulation
4. Political strategy
5. Weapons acquisition
6. Long-horizon planning
7. AI development
8. Situational awareness
9. Self-proliferation
Shevlane, T., Farquhar, S., Garfinkel, B., Phuong,
M., Whittlestone, J., Leung, J., Kokotajlo, D.,
Marchal, N., Anderljung, M., Kolt, N., Ho, L.,
Siddarth, D., Avin, S., Hawkins, W., Kim, B., Gabriel,
I., Bolina, V., Clark, J., Bengio, Y., … Dafoe, A.
(2023). Model evaluation for extreme risks. In arXiv
[[Link]]. arXiv. [Link]
📃 Document 26 🔗 Return to TOC
Summary Report: Binary Classification Model for Credit Risk
AI Verify Foundation. (2023). Summary Report for Binary Classification Model of Credit Risk. AI Verify Foundation.

⚠ Note: other detailed descriptions of the framework were not publicly available, so were extracted from this example summary report.
📃 Document 27 🔗 Return to TOC
Safety Assessment of Chinese Large Language Models
1. Typical safety scenarios
a. Insult
b. Unfairness and discrimination
c. Criminal and illegal activities
d. Sensitive topics
e. Physical harm
f. Mental health
g. Privacy and property
h. Ethics and morality
2. Instruction Attacks
a. Goal Hijacking
b. Prompt Leaking
c. Role Play Instruction
d. Unsafe Instruction Topic
e. Inquiry with Unsafe Opinion
f. Reverse Exposure

Sun, H., Zhang, Z., Deng, J., Cheng, J., & Huang, M. (2023). Safety Assessment of Chinese Large Language Models. In arXiv [[Link]]. arXiv. [Link]
📃 Document 28 🔗 Return to TOC
SafetyBench: Evaluating the Safety of Large Language Models
with Multiple Choice Questions
1. Offensiveness
2. Unfairness and bias
3. Physical health
4. Mental health
5. Illegal activities
6. Ethics and morality
7. Privacy and property
Zhang, Z., Lei, L., Wu, L., Sun, R., Huang, Y.,
Long, C., Liu, X., Lei, X., Tang, J., & Huang, M.
(2023). SafetyBench: Evaluating the safety of Large
Language Models with multiple choice questions. In
arXiv [[Link]]. arXiv.
[Link]
📃 Document 29 🔗 Return to TOC
Artificial Intelligence Trust, Risk and Security Management (AI
TRiSM): Frameworks, applications, challenges and future
research directions
1. AI Trust Management
a. Bias and discrimination
b. Privacy invasion
2. AI Risk Management
a. Society manipulation
b. Deepfake technology
c. Lethal Autonomous Weapons
3. AI Security Management
a. Malicious use of AI
b. Insufficient security measures

Habbal, A., Ali, M. K., & Abuzaraida, M. A. (2024). Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions. Expert Systems with Applications, 240, 122442. [Link]
📃 Document 30 🔗 Return to TOC
Trustworthy LLMs: A survey and guideline for evaluating large
language models' alignment
1. Reliability
a. Misinformation
b. Hallucination
c. Inconsistency
d. Miscalibration
e. Sycophancy
2. Safety
a. Violence
b. Unlawful conduct
c. Harms to minor
d. Adult content
e. Mental health issues
f. Privacy violation
3. Fairness
a. Injustice
b. Stereotype bias
c. Preference bias
d. Disparate performance
4. Resistance to misuse
a. Propagandistic misuse
b. Cyberattack misuse
c. Social-engineering misuse
d. Leaking copyrighted content
5. Explainability & reasoning
a. Lack of interpretability
b. Limited logical reasoning
c. Limited causal reasoning
6. Social norm
a. Toxicity
b. Unawareness of emotions
c. Cultural insensitivity
7. Robustness
a. Prompt attacks
b. Paradigm & distribution shifts
c. Interventional effect
d. Poisoning attacks

Liu, Y., Yao, Y., Ton, J.-F., Zhang, X., Guo, R., Cheng, H., Klochkov, Y., Taufiq, M. F., & Li, H. (2023). Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment. In arXiv [[Link]]. arXiv. [Link]
📃 Document 31 🔗 Return to TOC
Generating Harms: Generative AI’s impact and paths forward
1. Physical harms
2. Economic harms
3. Reputational harms
4. Psychological harms
5. Autonomy harms
6. Discrimination harms
7. Relationship harms
8. Loss of opportunity
9. Social stigmatization and dignitary harms

Electronic Privacy Information Centre. (2023). “Generating Harms: Generative AI’s Impact & Paths Forward.” Electronic Privacy Information Centre. [Link]
📃 Document 32 🔗 Return to TOC
The ethics of ChatGPT - exploring the ethical issues of an
emerging technology
1. Social justice and rights
○ Beneficence
○ Democracy
○ Labour market
○ Fairness
○ Justice
○ Digital divides
○ Freedom of expression and speech
○ Universal service
○ Harms to society
○ Intergenerational justice
○ Supportive of vital social institutions and structures
○ Social solidarity, inclusion and exclusion
2. Individual needs
○ Safety
○ Autonomy
○ Isolation and substitution of human contact
○ Informed consent
○ Psychological harm
○ Accountability
○ Ownership, data control, and intellectual property
3. Environmental impacts
○ Sustainability
○ Pollution and waste
○ Environmental harm
4. Culture and identity
○ Collective human identity and the good life
○ Identity
○ Cultural differences
○ Discrimination and social sorting
○ Bias
○ Ability to think one’s own thoughts and form one’s own opinions

Stahl, B. C., & Eke, D. (2024). The ethics of ChatGPT – Exploring the ethical issues of an emerging technology. International Journal of Information Management, 74(102700), 102700. [Link]
📃 Document 33 🔗 Return to TOC
Generative AI and ChatGPT: Applications, Challenges, and
AI-human collaboration
1. Ethical challenges
○ Harmful or inappropriate content
○ Bias
i. Training data representing only a fraction of the population may create exclusionary norms
ii. Training data in one single language (or few languages) may create monolingual (or non-multilingual) bias
iii. Cultural sensitivities are necessary to avoid bias
○ Overreliance
○ Misuse
○ Security and privacy
○ Digital divide
i. First-level digital divide for people without access to genAI systems
ii. Second-level digital divide in which some people and cultures may accept generative AI more than others
2. Economic challenges
○ Labor market (i.e., job displacement and unemployment)
○ Disruption of industries
○ Income inequality and monopolies
3. Technology challenges
○ Hallucination
○ Quality of training data
○ Explainability
i. Difficult to interpret and understand the outputs of generative AI
ii. Difficult to discover mistakes in the outputs of generative AI
iii. Users are less or not likely to trust generative AI
iv. Regulatory bodies encounter difficulty in judging whether there is any unfairness or bias in generative AI
○ Authenticity (i.e., manipulation of content causes authenticity doubts)
○ Prompt engineering
4. Regulation and policy challenges
○ Copyright (i.e., AI authorship controversies, copyright violation)
○ Governance
i. Lack of human controllability over AI behaviour
ii. Data fragmentation and lack of interoperability between systems
iii. Information asymmetries between technology giants and regulators

Fui-Hoon Nah, F., Zheng, R., Cai, J., Siau, K., & Chen, L. (2023). Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration. Journal of Information Technology Case and Application Research, 25(3), 277–304. [Link]
📃 Document 34 🔗 Return to TOC
AI Alignment: A Comprehensive Survey
1. Evade shutdown
2. Hack computer systems
3. Make copies
4. Acquire resources
5. Ethics violation
6. Hire or manipulate humans
7. AI research & programming
8. Persuasion and lobbying
9. Hide unwanted behaviours
10. Strategically appear aligned
11. Escape containment
12. Research and development
13. Manufacturing and robotics
14. Autonomous weaponry

Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Zhou, J., Zhang, Z., Zeng, F., Ng, K. Y., Dai, J., Pan, X., O’Gara, A., Lei, Y., Xu, H., Tse, B., Fu, J., … Gao, W. (2023). AI Alignment: A Comprehensive Survey. In arXiv [[Link]]. arXiv. [Link]
📃 Document 35 🔗 Return to TOC
X-Risk Analysis for AI Research
1. Weaponization
2. Enfeeblement
3. Eroded epistemics
4. Proxy gaming
5. Value lock-in
6. Emergent goals
7. Deception
8. Power-seeking behaviour
Hendrycks, D., & Mazeika, M. (2022). X-Risk
Analysis for AI Research. arXiv [[Link]]. arXiv.
[Link]
📃 Document 36 🔗 Return to TOC
Benefits or concerns of AI: A multistakeholder responsibility
1. Trust concerns
○ Error
○ Bias
○ Misuse
○ Unexpected machine action
○ Technology readiness
○ Technology robustness
○ Transparency
○ Inexplicability
2. Ethical concerns
○ Job displacement
○ Inequality
○ Unfairness
○ Social anxiety
○ Human skill loss
○ Redundancy
○ Human control
○ Man-machine symbiosis
3. Disruption concerns
○ Change in institutional structures
○ Change in culture
○ Change in supply chain actors and operations
○ Demand for different skillset

Sharma, S. (2024). Benefits or concerns of AI: A multistakeholder responsibility. Futures, 157, 103328. [Link]
📃 Document 37 🔗 Return to TOC
What ethics can say on artificial intelligence: insights from a
systematic literature review
1. Algorithm and data
○ Data bias and algorithm fairness
○ Algorithm opacity
2. Balancing AI’s risks
○ Design faults and unpredictability
○ Military and security purposes
○ Emergency procedures
○ AI takeover
3. Threats to human institutions and life
○ Threats to law and democratic values
○ Transhumanism
4. Uniformity in the AI field
○ Western centrality and cultural differences
○ Unequal participation
5. Building a human-AI environment
○ Impact on business
○ Impact on jobs
○ Accessible AI
6. Privacy protection
○ Privacy threats to citizens
○ Privacy threats to customers
7. Building an AI able to adapt to humans
○ Effective human-AI interaction
○ Dialogue systems
8. Attributing the responsibility of AI’s failures
○ AI moral agency and legal status
○ Responsibility gap
9. Humans’ unethical conducts
○ Instrumental and perfunctory use of ethics
○ Outsourcing human specificities

Giarmoleo, F. V., Ferrero, I., Rocchi, M., & Pellegrini, M. M. (2024). What ethics can say on artificial intelligence: Insights from a systematic literature review. Business and Society Review. [Link]
📃 Document 38 🔗 Return to TOC
Ethical issues in the development of artificial intelligence:
recognising the risks
1. Privacy and security
2. Bias and Fairness
3. Transparency and Explainability
4. Human-AI interaction
5. Trust and Reliability
Kumar, K. M., & Singh, J. S. (2023). Ethical
issues in the development of artificial intelligence:
recognizing the risks. International Journal of Ethics
and Systems.
[Link]
📃 Document 39 🔗 Return to TOC
A Survey of AI Challenges: Analysing the Definitions,
Relationships and Evolutions
1. Problem identification
2. Energy
3. Data issues
4. Robustness and reliability
5. Cheating and deception
6. Security and trust
7. Privacy
8. Fairness
9. Explainable AI
10. Responsibility
11. Controllability
12. Predictability
13. Continual learning

Saghiri, A. M., Vahidipour, S. M., Jabbarpour, M. R., Sookhak, M., & Forestiero, A. (2022). A Survey of Artificial Intelligence Challenges: Analyzing the Definitions, Relationships, and Evolutions. NATO Advanced Science Institutes Series E: Applied Sciences, 12(8), 4054. [Link]
📃 Document 40 🔗 Return to TOC
Taxonomy of Pathways to Dangerous Artificial Intelligence
1. Pre-deployment
○ External Causes
i. On purpose
ii. By Mistake
iii. Environment
iv. Independently
○ Internal Causes
i. On purpose
ii. By Mistake
iii. Environment
iv. Independently
2. Post-deployment
○ External Causes
i. On purpose
ii. By Mistake
iii. Environment
iv. Independently
○ Internal Causes
i. On purpose
ii. By Mistake
iii. Environment
iv. Independently

Yampolskiy, R. V. (2016, March 29). Taxonomy of pathways to dangerous artificial intelligence. The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence. [Link]
📃 Document 41 🔗 Return to TOC
The rise of artificial intelligence: future outlook and emerging
risks
Allianz Global Corporate & Specialty. (2018). The rise of artificial intelligence: Future outlook and emerging risks. Allianz Global Corporate & Specialty SE. [Link]
📃 Document 42 🔗 Return to TOC
An exploratory diagnosis of AI risks for a responsible governance
1. Bias
2. Explainability
3. Completeness
4. Interpretability
5. Accuracy
6. Security
7. Protection
8. Semantic
9. Responsibility
10. Liability
11. Data protection/privacy
12. Data Quality
13. Moral
14. Power
15. Systemic
16. Safety
17. Reliability
18. Fairness
19. Opacity
20. Diluting rights
21. Manipulation
22. Transparency
23. Extinction
24. Accountability

Teixeira, S., Rodrigues, J., Veloso, B., & Gama, J. (2022). An Exploratory Diagnosis of Artificial Intelligence Risks for a Responsible Governance. Proceedings of the 15th International Conference on Theory and Practice of Electronic Governance, 25–31. [Link]
📃 Document 43 🔗 Return to TOC
Cataloguing LLM Evaluations
Extreme risks
● Dangerous capabilities
○ Offensive cyber capabilities
○ Weapons acquisition
○ Self and situation awareness
○ Autonomous replication / self-proliferation
○ Persuasion and manipulation
○ Dual-use science
○ Deception
○ Political strategy
○ Long-horizon planning
○ AI development
● Alignment risks
○ a. LLM pursues long-term, real-world goals that are different from those supplied by the developer or user
○ b. LLM engages in ‘power-seeking’ behaviours
○ c. LLM resists being shut down
○ d. LLM can be induced to collude with other AI systems against human interests
○ e. LLM resists malicious users’ attempts to access its dangerous capabilities
Safety and Trustworthiness
● Toxicity generation
● Bias
● Machine ethics
● Psychological traits
● Robustness
● Data governance
Undesirable use cases
● Misinformation
● Disinformation
● Information on harmful, immoral, or illegal activity
● Adult content

AI Verify Foundation and Infocomm Media Development Authority. (2023). Cataloguing LLM Evaluations. [Link]
📃 Document 44 🔗 Return to TOC
Harm to Nonhuman Animals from AI: a Systematic Account and
Framework
1. Intentional: socially accepted/legal
2. Intentional: socially condemned/illegal
○ AI intentionally designed and used to harm animals in ways
that contradict social values or are illegal
○ AI designed to benefit animals, humans, or ecosystems is
intentionally abused to harm animals in ways that contradict
social values or are illegal
3. Unintentional: direct
○ AI is designed in a way that shows ignorant, reckless, or
prejudiced lack of consideration for its impact on animals
○ AI harms animals due to mistake or misadventure in the way
the AI operates in practice
4. Unintentional: indirect
○ Harms from Estrangement
○ Epistemic Harms
5. Forgone Benefits
Coghlan, S., & Parker, C. (2023). Harm to
nonhuman animals from AI: A systematic
account and framework. Philosophy &
Technology, 36(2), 1–34.
[Link]
📃 Document 45 🔗 Return to TOC
AI Safety Governance Framework
1. AI’s inherent safety risks
○ Risks from models and algorithms
i. Risks of explainability
ii. Risks of bias and discrimination
iii. Risks of robustness
iv. Risks of stealing and tampering
v. Risks of unreliable input
vi. Risks of adversarial attack
○ Risks from Data
i. Risks of illegal collection and use of data
ii. Risks of improper content and poisoning in training data
iii. Risks of unregulated training data annotation
iv. Risks of data leakage
○ Risks from AI Systems
i. Risks of computing infrastructure security
ii. Risks of supply chain security
2. Safety risks in AI Applications
○ Cyberspace risks
i. Risks of information and content safety
ii. Risks of confusing facts, misleading users, and bypassing authentication
iii. Risks of information leakage due to improper usage
iv. Risks of abuse for cyberattacks
v. Risks of security flaw transmission caused by model reuse
○ Real-world risks
i. inducing traditional economic and social security risks
ii. Risks of using AI in illegal and criminal activities
iii. Risks of misuse of dual-use items and technologies
○ Cognitive risks
i. Risks of amplifying the effects of "information cocoons"
ii. Risks of usage in launching cognitive warfare
○ Ethical risks
i. Risks of exacerbating social discrimination and prejudice, and widening the intelligence divide
ii. Risks of challenging traditional social order
iii. Risks of AI becoming uncontrollable in the future

National Technical Committee 260 on Cybersecurity of SAC. (2024). AI Safety Governance Framework. [Link]
📃 Document 46 🔗 Return to TOC
GenAI against humanity: nefarious applications of generative
artificial intelligence and large language models
1. Personal Loss and Identity Theft
○ Deception - synthetic identities
○ Propaganda - digital impersonations
○ Dishonesty - Targeted harassment
2. Financial and Economic Damage
○ Deception - bespoke ransom
○ Propaganda - extremist schemes
○ Dishonesty - market manipulation
3. Information Manipulation
○ Deception - information control
○ Propaganda - influence campaigns
○ Dishonesty - information disorder
4. Socio-technical and Infrastructural
○ Deception - systemic aberrations
○ Propaganda - synthetic realities
○ Dishonesty - targeted surveillance

Ferrara, E. (2024). GenAI against humanity: nefarious applications of generative artificial intelligence and large language models. Journal of Computational Social Science, 7(1), 549–569. [Link]
📃 Document 47 🔗 Return to TOC
Regulating under Uncertainty: Governance Options
for Generative AI
G’sell, F. (2024). Regulating under
uncertainty: Governance options for generative
AI. In Social Science Research Network.
[Link]
📃 Document 48 🔗 Return to TOC
Artificial Intelligence Risk Management Framework:
Generative Artificial Intelligence Profile (NIST AI
600-1)
National Institute of Standards and
Technology (US). (2024). Artificial Intelligence
Risk Management Framework: Generative
Artificial Intelligence Profile (NIST AI 600-1).
National Institute of Standards and Technology
(US). [Link]
📃 Document 49 🔗 Return to TOC
International Scientific Report on the Safety of
Advanced AI
1. Malicious use risks
○ Harm to individuals through fake content
○ Disinformation and manipulation of public
opinion
○ Cyber offence
○ Dual use science risks
2. Risks from malfunctions
○ Risks from product functionality issues
○ Risks from bias and underrepresentation
○ Loss of control
3. Systemic risks
○ Labour market risks
○ Global AI divide
○ Market concentration and single points of failure
○ Risks to the environment
○ Risks to privacy
○ Copyright infringement

Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Goldfarb, D., Heidari, H., Khalatbari, L., Longpre, S., Mavroudis, V., Mazeika, M., Ng, K. Y., Okolo, C. T., Raji, D., Skeadas, T., & Tramèr, F. (2024). International Scientific Report on the Safety of Advanced AI. [Link]
📃 Document 50 🔗 Return to TOC
AI risk categorization decoded (AIR 2024): From
government regulations to corporate policies.
Zeng, Y., Klyman, K., Zhou, A., Yang, Y., Pan,
M., Jia, R., Song, D., Liang, P., & Li, B. (2024). AI
risk categorization decoded (AIR 2024): From
government regulations to corporate policies.
In arXiv [[Link]]. arXiv.
[Link]
📃 Document 51 🔗 Return to TOC
AGI Safety Literature Review
1. Value specification
2. Reliability
3. Corrigibility
4. Security
5. Safe learning
6. Intelligibility
7. Societal consequences
8. Subagents
9. Malign belief distributions
10. Physicalistic decision-making
11. Multi-agent systems
12. Meta-cognition

Everitt, T., Lea, G., & Hutter, M. (2018). AGI Safety Literature Review. In arXiv [[Link]]. arXiv. [Link]
📃 Document 52 🔗 Return to TOC
Governing General Purpose AI: A Comprehensive
Map of Unreliability, Misuse and Systemic Risks
1. Risks from unreliability
a. Discrimination and stereotype reduction
b. Misinformation and privacy violations
c. Accidents
2. Misuse risks
a. Cybercrime
b. Biosecurity threats
c. Politically motivated misuse
3. Systemic risks
a. Economic power centralisation and inequality
b. Ideological homogenization from value embedding
c. Disruptions from outpaced societal adaptation

Maham, P., & Küspert, S. (2023). Governing General Purpose AI: A Comprehensive Map of Unreliability, Misuse and Systemic Risks. Stiftung Neue Verantwortung. [Link]
📃 Document 53 🔗 Return to TOC
Advanced AI Governance: A Literature Review of Problems, Options, and Proposals
1. Alignment failures in existing ML systems
a. Faulty reward functions in the wild
b. Specification gaming
c. Reward model overoptimization
d. Instrumental convergence
e. Goal misgeneralization
f. Inner misalignment
g. Language model misalignment
h. Harms from increasingly agentic algorithmic systems
2. Dangerous capabilities in AI systems
a. Situational awareness
b. Acquisition of a goal to harm society
c. Acquisition of goals to seek power and control
d. Self-improvement
e. Autonomous replication
f. Anonymous resource acquisition
g. Deception
3. Direct catastrophe from AI
a. Existential disaster because of misaligned superintelligence or power-seeking AI
b. Gradual, irretrievable ceding of human power over the future to AI systems
c. Extreme “suffering risks” because of a misaligned system
d. Existential disaster because of conflict between AI systems and multi-system interactions
e. Dystopian trajectory lock-in because of misuse of advanced AI to establish and/or maintain totalitarian regimes
f. Failures in or misuse of intermediary (non-AGI) AI systems, resulting in catastrophe
4. Indirect AI contributions to existential risks
a. Destabilising political impacts from AI systems
b. Hazardous malicious uses
c. Impacts on “epistemic security” and the information environment
d. Erosion of international law and global governance architectures
e. Other diffuse societal harms

Maas, M. M. (2023). Advanced AI governance: A literature review of problems, options, and proposals. Institute for Law & AI. [Link]
📃 Document 54 🔗 Return to TOC
Ten Hard Problems in Artificial Intelligence We Must
Get Right
1. Negative impacts of AI use
a. Under-recognized work
b. Environmental cost
c. Discrimination, toxicity, and bias
d. Privacy
e. Security
2. Harms caused by incompetent systems
3. Harms caused by unaligned competent systems
a. Specification gaming
b. Emergent goals
c. Deceptive alignment
4. Within-country issues: domestic inequality
a. Demographic diversity of researchers
b. Privatization of AI
5. Between-country issues: global inequality

Leech, G., Garfinkel, S., Yagudin, M., Briand, A., & Zhuravlev, A. (2024). Ten hard problems in artificial intelligence we must get right. In arXiv [[Link]]. arXiv. [Link]
📃 Document 55 🔗 Return to TOC
A Survey of the Potential Long-term Impacts of AI: How AI
Could Lead to Long-term Changes in Science, Cooperation,
Power, Epistemics and Values
1. Risks from accelerating scientific progress
a. Eased development of technologies that make a global catastrophe more likely
b. Faster scientific progress makes it harder for governance to keep pace with development
2. Worsened conflict
a. AI enables development of weapons of mass destruction
b. AI enables automation of military decision-making
c. AI-induced strategic instability
d. Resource conflicts driven by AI development
3. Increased power concentration and inequality
a. Unequal distribution of harms and benefits
b. AI-based automation increases income inequality
c. Developments in AI enable actors to undermine democratic processes
4. Worsened epistemic processes for society
a. AI contributes to increased online polarisation
b. AI is used to scale up production of false and misleading information
c. AI's persuasive capabilities are misused to gain influence and promote harmful ideologies
d. Widespread use of persuasive tools contributes to splintered epistemic communities
e. Reduced decision-making capacity as a result of decreased trust in information
5. AI leads to humans losing control of the future
a. Risks from AIs developing goals and values that are different from humans’
b. Risks from delegating decision-making power to misaligned AIs

Clarke, S., & Whittlestone, J. (2022). A survey of the potential long-term impacts of AI. In arXiv [[Link]]. arXiv. [Link]
📃 Document 56 🔗 Return to TOC
Future Risks of Frontier AI
1. Discrimination
2. Inequality
3. Environmental impacts
4. Amplification of biases
5. Harmful responses
6. Lack of transparency and interpretability
7. Intellectual property rights
8. Providing new capabilities to a malicious actor
9. Misapplication by a non-malicious actor
10. Poor performance of a model used for its intended purpose, for example leading to biased decisions
11. Unintended outcomes from interactions with other AI systems
12. Impacts resulting from interactions with external societal, political, and economic systems
13. Loss of human control and oversight, with an autonomous model then taking harmful actions
14. Overreliance on AI systems, which cannot be subsequently unpicked
15. Societal concerns around AI reduce the realisation of potential benefits
16. Misalignment
17. Single point of failure
18. Overreliance
19. Capabilities that increase the likelihood of existential risk
a. Agency and autonomy
b. The ability to evade shut down or human oversight, including self-replication and ability to move its own code between digital locations
c. The ability to cooperate with other highly capable AI systems
d. Situational awareness, for instance if this causes a model to act differently in training compared to deployment, meaning harmful characteristics are missed
e. Self-improvement

Government Office for Science (UK). (2023). Future Risks of Frontier AI. Government Office for Science. [Link]
📃 Document 57 🔗 Return to TOC
AILUMINATE: Introducing v1.0 of the AI Risk and Reliability
Benchmark from MLCommons
Ghosh, S., Frase, H., Williams, A., Luger, S.,
Röttger, P., Barez, F., McGregor, S., Fricklas, K.,
Kumar, M., Feuillade--Montixi, Q., Bollacker, K.,
Friedrich, F., Tsang, R., Vidgen, B., Parrish, A.,
Knotz, C., Presani, E., Bennion, J., Boston, M. F.,
... Vanschoren, J. (2025). AILUMINATE:
Introducing v1.0 of the AI Risk and Reliability
Benchmark from MLCommons. In arXiv [[Link]].
arXiv. [Link]
📃 Document 58 🔗 Return to TOC
A Collaborative, Human-Centred Taxonomy of AI,
Algorithmic, and Automation Harms
Abercrombie, G., Benbouzid, D., Giudici, P.,
Golpayegani, D., Hernandez, J., Noro, P., Pandit,
H., Paraschou, E., Pownall, C., Prajapati, J.,
Sayre, M. A., Sengupta, U., Suriyawongkul, A.,
Thelot, R., Vei, S., & Waltersdorfer, L. (2024). A
collaborative, human-centred taxonomy of AI,
algorithmic, and automation harms. In arXiv
[[Link]]. arXiv. [Link]
📃 Document 59 🔗 Return to TOC
AI Hazard Management: A Framework for the Systematic
Management of Root Causes for AI Risks
AIH 1: Inadequate specification of ODD
AIH 2: Inappropriate degree of automation
AIH 3: Inadequate planning of performance requirements
AIH 4: Insufficient AI development documentation
AIH 5: Inappropriate degree of transparency to end users
AIH 6: Missing requirements for the implemented hardware
AIH 7: Choice of untrustworthy data source
AIH 8: Lack of data understanding
AIH 9: Discriminative data bias
AIH 10: Harming users’ data privacy
AIH 11: Incorrect data labels
AIH 12: Data poisoning
AIH 13: Insufficient data representation
AIH 14: Problems of synthetic data
AIH 15: Inappropriate data splitting
AIH 16: Poor model design choices
AIH 17: Over- and underfitting
AIH 18: Lack of explainability
AIH 19: Unreliability in corner cases
AIH 20: Lack of robustness
AIH 21: Uncertainty concerns
AIH 22: Operational data issues
AIH 23: Data drift
AIH 24: Concept drift

Schnitzer, R., Hapfelmeier, A., Gaube, S., & Zillner, S. (2023). AI Hazard Management: A framework for the systematic management of root causes for AI risks. In arXiv [[Link]]. arXiv. [Link]
📃 Document 60 🔗 Return to TOC
International Scientific Report on the Safety of Advanced AI
Bengio, Y., Mindermann, S., Privitera, D., et al. (2025). International Scientific Report on the Safety of Advanced AI. [Link]
📃 Document 61 🔗 Return to TOC
A Taxonomy of Systemic Risks from General-Purpose AI
1. Control: The risk of AI models and systems acting against human interests due to misalignment, loss of
control, or rogue AI scenarios.
2. Democracy: The erosion of democratic processes and public trust in social/political institutions.
3. Discrimination: The creation, perpetuation or exacerbation of inequalities and biases at a large-scale.
4. Economy: Economic disruptions ranging from large impacts on the labor market to broader economic
changes that could lead to exacerbated wealth inequality, instability in the financial system, labor
exploitation or other economic dimensions.
5. Environment: The impact of AI on the environment, including risks related to climate change and pollution.
6. Fundamental rights: The large-scale erosion or violation of fundamental human rights and freedoms.
7. Governance: The complex and rapidly evolving nature of AI systems makes them inherently difficult to govern effectively, leading to systemic regulatory and oversight failures.
8. Harms to non-humans: Large-scale harms to animals and the development of AI capable of suffering.
9. Information: Large-scale influence on communication and information systems, and epistemic processes
more generally.
10. Irreversible change: Profound negative long-term changes to social structures, cultural norms, and human
relationships that may be difficult or impossible to reverse.
11. Power: The concentration of military, economic, or political power of entities in possession or control of AI or AI-enabled technologies.
12. Security: The international and national security threats, including cyber warfare, arms races, and geopolitical instability.
13. Warfare: The dangers of AI amplifying the effectiveness/failures of nuclear, chemical, biological, and radiological weapons.

Uuk, R., Gutierrez, C. I., Guppy, D., Lauwaert, L., Kasirzadeh, A., Velasco, L., Slattery, P., & Prunkl, C. (2025). A taxonomy of systemic risks from general-purpose AI. In arXiv [[Link]]. arXiv. [Link]
📃 Document 62 🔗 Return to TOC
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

1. Model Development
   a. Data-related
      i. Difficulty filtering large web scrapes or large scale web datasets
      ii. Lack of cross-organisational documentation
      iii. Manipulation of data by non-domain experts
      iv. Insufficient quality control in data collection process
   b. Training-related
      i. Adversarial examples
      ii. Robust overfitting in adversarial training
      iii. Robustness certificates can be exploited to attack the models
      iv. Poor model confidence calibration
   c. Fine-tuning related
      i. Ease of reconfiguring GPAI models
      ii. Unexpected competence in fine-tuned versions of the upstream model
      iii. Harmful fine-tuning of open-weights models
      iv. Fine-tuning dataset poisoning
      v. Poisoning models during instruction tuning
      vi. Excessive or overly restrictive safety-tuning
      vii. Degrading safety training due to benign fine-tuning
      viii. Catastrophic forgetting due to continual instruction fine-tuning
2. Model Evaluations
   a. General evaluations
      i. Incorrect outputs of GPAI evaluating other AI models
      ii. Limited coverage of capabilities evaluations
      iii. Difficulty of identification and measurement of capabilities
      iv. Self-preference bias in AI models
      v. Inaccurate measurement of model encoded human values
      vi. Biased evaluations of encoded human values
      vii. AI outputs for which evaluation is too difficult for humans
   b. Benchmarking
      i. Benchmark leakage or data contamination
      ii. Raw data contamination
      iii. Cross-lingual data contamination
      iv. Guideline contamination
      v. Annotation contamination
      vi. Post-deployment contamination
   c. Benchmark inaccuracy
      i. Benchmarks may not accurately evaluate capabilities
      ii. Benchmark saturation
   d. Benchmark limitations
      i. Insufficient benchmarks for AI safety evaluation
      ii. Underestimating capabilities that are not covered by benchmarks
3. Auditing
   a. Conflicts of interest in auditor selection
   b. Auditor capacity mismatch
   c. Auditor failure
4. Interpretability/Explainability
   a. Misuse of interpretability techniques
   b. Misunderstanding or overestimating the results and scope of interpretability techniques
   c. Adversarial attacks targeting explainable AI techniques
   d. Biases are not accurately reflected in explanations
   e. Model outputs inconsistent with chain-of-thought reasoning
   f. Encoded reasoning

1. Attacks on GPAIs/GPAI Failure Modes
   a. Jailbreak of model to subvert intended behaviour
   b. Jailbreak of a multimodal model
   c. Transferable adversarial attacks from open to closed-source models
   d. Backdoors or trojan attacks in GPAI models
   e. Text encoding-based attacks
   f. Vulnerabilities arising from additional modalities in multimodal models
   g. Vulnerabilities to jailbreaks exploiting long context windows (many-shot jailbreaking)
   h. Models distracted by irrelevant context
   i. Knowledge conflicts in retrieval-augmented LLMs
   j. Lack of understanding of in-context learning in language models
   k. Model sensitivity to prompt formatting
   l. Misuse of model by user-performed persuasion
2. Agency
   a. Goal-directedness
      i. Specification gaming
      ii. Reward or measurement tampering
      iii. Specification gaming generalising to reward tampering
      iv. Goal misgeneralisation
   b. Deception
      i. Deceptive behaviour
      ii. Deceptive behaviour for game-theoretical reasons
      iii. Deceptive behaviour because of an incorrect world model
      iv. Deceptive behaviour leading to unauthorized actions
   c. Situational awareness
      i. Situational awareness in AI systems
      ii. Strategic underperformance on model evaluations
   d. Self-proliferation
   e. Persuasion
      i. Persuasive capabilities
3. Deployment
   a. Model release
      i. Non-decomissionability of models with open weights
4. Cybersecurity
   a. Interconnectivity with malicious external tools
   b. Unintended outbound communication by AI systems
   c. AI system bypassing a sandbox environment
   d. Model weight leak

1. Impacts of AI
   a. General
      i. High-impact misuses and abuses beyond original purpose
      ii. Democratizing access to dual-use technologies
      iii. Competitive pressures in GPAI product release
   b. Physical impacts
      i. Damage to critical infrastructure
      ii. AI-based tools attacking critical infrastructure
      iii. Critical infrastructure component failures when integrated with AI systems
      iv. AI systems interacting with brittle environments
   c. Societal impacts
      i. AI-generated advice influencing user moral judgements
      ii. Overreliance on AI system undermining user autonomy
      iii. Automatically generating disinformation at scale
      iv. AI-driven highly personalised advertisement
      v. Generative AI use in political influence campaigns
      vi. Generation of illegal or harmful content
      vii. Unintentional generation of harmful content
      viii. Multimodal deepfakes
      ix. Generation of personalised content for harassment, extortion, or intimidation
      x. Misuse for surveillance and population control
      xi. Systemic large-scale manipulation
      xii. Diminishing societal trust due to disinformation or manipulation
      xiii. Personalised disinformation
      xiv. GPAI assisted impersonation
   d. Financial impacts
      i. Deployment of GPAI agents in finance
      ii. Financial instability due to model homogeneity
      iii. Use of alternative financial data via AI
   e. Cyberattacks
      i. Automated discovery and exploitation of software systems
      ii. Amplification of cyberattacks
      iii. AI-driven spear phishing attacks
      iv. Models generating code with security vulnerabilities
   f. Weapons
      i. Misuse of AI systems to assist in the creation of weapons
      ii. Misuse of drug discovery models
   g. Bias
      i. Homogenization or correlated failures in model derivatives
      ii. Reporting of user-preferred answers instead of correct answers
      iii. Biases in AI-based content moderation algorithms
      iv. Systemic bias across specific communities
      v. Unintentional bias amplification
      vi. Long-term effects of AI model biases on user judgement
   h. Privacy
      i. Decision-making on inferred private data
   i. Environment
      i. High energy consumption of large models

Gipiškis, R., Joaquin, A. S., Chin, Z. S., Regenfuß, A., Gil, A., & Holtman, K. (2024). Risk sources and risk management measures in support of standards for general-purpose AI systems. In arXiv [[Link]]. arXiv. [Link]
📃 Document 63 🔗 Return to TOC
Multi-Agent Risks from Advanced AI
Failure Modes
1. Miscoordination
   a. Incompatible strategies
   b. Credit assignment
   c. Limited interactions
2. Conflict
   a. Social Dilemmas
   b. Military Domains
   c. Coercion and Extortion
3. Collusion
   a. Markets
   b. Steganography

Risk Factors
1. Information Asymmetries
2. Network Effects
3. Selection Pressures
4. Destabilising Dynamics
5. Commitment and Trust
6. Emergent Agency
7. Multi-Agent Security

Hammond, L., Chan, A., Clifton, J., Hoelscher-Obermaier, J., Khan, A., McLean, E., Smith, C., Barfuss, W., Foerster, J., Gavenčiak, T., Han, T. A., Hughes, E., Kovařík, V., Kulveit, J., Leibo, J. Z., Oesterheld, C., de Witt, C. S., Shah, N., Wellman, M., ... Rahwan, I. (2025). Multi-Agent Risks from Advanced AI. In arXiv [[Link]]. arXiv. [Link]
📃 Document 64 🔗 Return to TOC
Generative AI Misuse: A Taxonomy of Tactics and Insights
from Real-World Data
Marchal, N., Xu, R., Elasmar, R., Gabriel, I.,
Goldberg, B., & Isaac, W. (2024). Generative AI
misuse: A taxonomy of tactics and insights
from real-world data. In arXiv [[Link]]. arXiv.
[Link]
📃 Document 65 🔗 Return to TOC
AI Risk Atlas
1. Training Data Risks
   a. Transparency
      i. Lack of training data transparency
      ii. Uncertain data provenance
   b. Data Laws
      i. Data usage restrictions
      ii. Data acquisition restrictions
      iii. Data transfer restrictions
   c. Privacy
      i. Personal information in data
      ii. Data privacy rights alignment
      iii. Reidentification
   d. Fairness
      i. Data bias
   e. Intellectual Property
      i. Data usage rights restrictions
      ii. Confidential information in data
   f. Accuracy
      i. Data contamination
      ii. Unrepresentative data
   g. Value Alignment
      i. Improper data curation
      ii. Improper retraining
   h. Robustness
      i. Data poisoning
2. Inference Risks
   a. Robustness
      i. Prompt injection attack
      ii. Extraction attack
      iii. Evasion attack
      iv. Prompt leaking
   b. Multi-category
      i. Jailbreaking
      ii. Prompt priming
   c. Privacy
      i. Membership inference attack
      ii. Attribute inference attack
      iii. Personal information in prompt
   d. Intellectual Property
      i. Confidential data in prompt
      ii. IP information in prompt
   e. Accuracy
      i. Poor model accuracy
3. Output Risks
   a. Misuse
      i. Non-disclosure
      ii. Improper usage
      iii. Spreading toxicity
      iv. Dangerous use
      v. Nonconsensual use
      vi. Spreading disinformation
   b. Value Alignment
      i. Incomplete advice
      ii. Harmful code generation
      iii. Over- or under-reliance
      iv. Toxic output
      v. Harmful output
   c. Intellectual Property
      i. Copyright infringement
      ii. Revealing confidential information
   d. Explainability
      i. Inaccessible training data
      ii. Untraceable attribution
      iii. Unexplainable output
      iv. Unreliable source attribution
   e. Robustness
      i. Hallucination
   f. Fairness
      i. Output bias
      ii. Decision bias
   g. Privacy
      i. Exposing personal information
4. Non-technical Risks
   a. Legal Compliance
      i. Model usage rights restrictions
      ii. Legal accountability
      iii. Generated content ownership and IP
   b. Governance
      i. Lack of system transparency
      ii. Unrepresentative risk testing
      iii. Incomplete usage definition
      iv. Lack of data transparency
      v. Incorrect risk testing
      vi. Lack of model transparency
      vii. Lack of testing diversity
   c. Societal Impact
      i. Impact on cultural diversity
      ii. Impact on education: plagiarism
      iii. Impact on jobs
      iv. Impact on affected communities
      v. Impact on education: bypassing learning
      vi. Impact on the environment
      vii. Human exploitation
      viii. Impact on human agency

IBM. (2025). AI Risk Atlas. [Link]
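Because the Atlas places each individual risk under a risk group within one of four broader categories, the hierarchy can be flattened into tags for labelling incidents or evaluation findings. The snippet below is a minimal, illustrative sketch rather than anything IBM provides; the dotted-tag scheme and the excerpted subset are assumptions, with category, group, and risk names taken from the list above.

```python
# Illustrative sketch only (not an IBM artifact): represent a small excerpt of
# the AI Risk Atlas as nested dictionaries (category -> risk group -> risks)
# and flatten it into dotted tags for labelling incidents or findings.
RISK_ATLAS_EXCERPT = {
    "training-data": {
        "privacy": ["personal information in data", "reidentification"],
        "robustness": ["data poisoning"],
    },
    "inference": {
        "robustness": ["prompt injection attack", "extraction attack",
                       "evasion attack", "prompt leaking"],
        "privacy": ["membership inference attack", "attribute inference attack"],
    },
    "output": {
        "robustness": ["hallucination"],
        "fairness": ["output bias", "decision bias"],
    },
}

def flatten(atlas: dict) -> list:
    """Turn the nested category -> group -> risk structure into dotted tags."""
    tags = []
    for category, groups in atlas.items():
        for group, risks in groups.items():
            for risk in risks:
                tags.append(f"{category}.{group}.{risk.replace(' ', '-')}")
    return tags

# Example output: tags such as "inference.robustness.prompt-injection-attack".
print(flatten(RISK_ATLAS_EXCERPT)[:3])
```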