Building evaluation frameworks for trustworthy AI systems.
Pinned
- agent-eval-toolkit (Public, Python)
  Decision-oriented evaluation toolkit for LLM and agent systems, focusing on trust, failure modes, and deployment readiness in enterprise environments.
- spk_balance (Public, Python)
  Evaluation-oriented MVP exploring speaking–writing feedback loops for agent and LLM communication quality.
- agent-accountability-eval (Public)
  An evidence-based system for evaluating agentic AI trustworthiness through accountability, continuous evaluation, and human-in-the-loop governance.