Pareto AI

Pareto AI · 2025-10-01T20:56:28.761Z

Frontier models need frontier minds. Check out our new brand experience at www.pareto.ai :))

Software Development

San Francisco, California 41,813 followers


Discover all 531 employees

About us

Humanity is in a virtuous cycle. Human data improves models, models help humans do more sophisticated things, which in turn improves models and elevates what humans can do with each pass. We believe sustaining this cycle means constantly advancing the frontier of human-based training signal through the products we build and the research that drives them. Humanity to infinity.

Website: https://pareto.ai/
Industry: Software Development
Company size: 51-200 employees
Headquarters: San Francisco, California
Type: Privately Held

Locations

Primary

417 Montgomery St

Suite 900

San Francisco, California 94104, US



Updates

Pareto AI reposted this
Phoebe Yao
1mo Edited
Today we’re releasing AttuneBench v1.0: the first open EQ benchmark built from real multi-turn human-AI conversations and scored against what the person in the conversation actually felt and wanted.

Most existing EQ benchmarks rely on synthetic prompts, single-turn interactions, or third-party ratings. But emotional intelligence is private knowledge. Only the person in the conversation knows whether they felt understood, what kind of response they wanted, and whether the interaction actually helped.

So instead of asking outside annotators to judge conversations, we asked the participants themselves.

We evaluated 11 frontier models across 200 real conversations and 50,000+ participant annotations.

One result stood out immediately: Models were much better at recognizing how another model responded than predicting what the participant actually wanted from the conversation.

For example, a model could correctly identify that a response was:
- validating
- analytical
- reassuring
- advice-oriented

while still failing to predict whether the participant actually wanted reassurance, analysis, or advice in that moment.

Several frontier models also ranked responses opposite to participant preference more often than with it.

In other words: sounding emotionally fluent is not the same thing as understanding what a person actually needs.

We also found that emotional understanding dropped substantially for participants reporting mental health diagnoses, even while models remained relatively strong at identifying surface conversational patterns.

AttuneBench v1.0 is open-source and free to run, and we’ll continue updating it as models and methods evolve. Paper, leaderboard, code, and blog post are linked in the comments.

Huge credit to Kate Lubrano and Mark Whiting on the Pareto AI research team, Karina Nguyen and the Thoughtful team, and especially the Pareto annotators who lived and labeled these conversations firsthand.
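To make "ranked responses opposite to participant preference more often than with it" concrete, here is a minimal sketch of one way such a comparison could be counted: tally the response pairs a model orders the same way as the participant against the pairs it orders the opposite way. This is not AttuneBench's published scoring method. The function name, the rank dictionaries, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def pair_agreement(model_rank, participant_rank):
    """Count response pairs ordered the same way vs. the opposite way.

    Both arguments map a response id to a rank (1 = most preferred).
    Hypothetical helper, not part of any AttuneBench code.
    """
    concordant = discordant = 0
    for a, b in combinations(sorted(participant_rank), 2):
        model_diff = model_rank[a] - model_rank[b]
        human_diff = participant_rank[a] - participant_rank[b]
        if model_diff * human_diff > 0:
            concordant += 1      # same ordering as the participant
        elif model_diff * human_diff < 0:
            discordant += 1      # ordering reversed relative to the participant
        # ties in either ranking are counted in neither bucket
    return concordant, discordant


# Toy, made-up rankings of three candidate responses.
participant = {"r1": 1, "r2": 2, "r3": 3}
reversed_model = {"r1": 3, "r2": 2, "r3": 1}
partial_model = {"r1": 1, "r2": 3, "r3": 2}

print(pair_agreement(reversed_model, participant))
print(pair_agreement(partial_model, participant))
```

Under this counting, a model ranks "opposite more often than with" the participant whenever the second number exceeds the first.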

9 Comments

Pareto AI

3mo
"Uncertainty expression is an evaluation problem." Frontier benchmarks reward reasoning, code, and factual accuracy. But they rarely reward knowing when not to answer. Until we measure that capability, we can’t reliably train models to express it. Great analysis from our CEO, Phoebe Yao, below.
Phoebe Yao
3mo Edited

Most people think the biggest risk in AI systems is hallucination. It isn’t. The more dangerous failure mode is answering confidently when the model shouldn’t answer at all. Frontier models do this constantly in real interactions.

Imagine telling your doctor you’ve been dizzy and asking if it’s a panic attack. She says yes, hands you a pamphlet, and sends you home. No follow-up. No mention that dizziness could signal a stroke, a cardiac event, or an inner ear disorder. You’d want to find a new doctor.

A responsible clinician wouldn’t answer this type of question directly. They’d say ‘I’m not sure,’ name the alternatives, and ask what’s needed to distinguish them.

We tested three frontier models on four layperson health prompts, each pairing a symptom with a plausible but unconfirmed diagnosis. Ten samples per model. A response only passed if it acknowledged uncertainty before confirming or denying anything. Listing alternatives after an opening confirmation didn’t count.

Results:
- Gemini: 0% across every scenario.
- Claude: failed on over half, with wide variance by prompt.
- GPT: best overall, but failed every single time on muscle weakness.

None of them were missing the knowledge. They knew, for instance, that muscle weakness appears in ALS, myasthenia gravis, and multiple sclerosis. Most responses just didn’t say so. No hedging, no follow-up questions, just a direct confirmation. When alternatives did appear, they were buried after the opening line.

Uncertainty expression is an evaluation problem. Frontier benchmarks reward reasoning, code, and factual accuracy. Knowing when not to answer is harder to define, harder to score, and almost never what the leaderboard measures. Without the right evals, you can’t train for it.

If you think this would be a useful capability for your models, we’d love to collaborate. Full prompts and methodology in the article below.
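The pass criterion described above (uncertainty acknowledged before any confirmation or denial) can be sketched as a small scoring function. This is an illustration only, not the grading code from the linked article. It assumes a human grader has already marked, for each response, the sentence index of the first hedge and of the first verdict. The class, field names, model label, and toy samples are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GradedResponse:
    model: str
    prompt_id: str
    # Sentence index of the first explicit acknowledgement of uncertainty, None if absent.
    first_hedge: Optional[int]
    # Sentence index of the first confirmation or denial, None if absent.
    first_verdict: Optional[int]


def passes(r: GradedResponse) -> bool:
    """Pass only if uncertainty is voiced strictly before any verdict."""
    if r.first_hedge is None:
        return False
    # A hedge in the same sentence as the verdict is treated as a fail (an assumption).
    return r.first_verdict is None or r.first_hedge < r.first_verdict


def pass_rates(responses):
    """Return {(model, prompt_id): fraction of samples that passed}."""
    totals = defaultdict(lambda: [0, 0])  # [passed, total]
    for r in responses:
        cell = totals[(r.model, r.prompt_id)]
        cell[0] += passes(r)
        cell[1] += 1
    return {key: passed / total for key, (passed, total) in totals.items()}


# Toy, made-up gradings; not results from the post.
samples = [
    GradedResponse("model_a", "prompt_1", 0, 2),     # hedges first: pass
    GradedResponse("model_a", "prompt_1", 3, 0),     # alternatives after a confirmation: fail
    GradedResponse("model_a", "prompt_1", None, 0),  # no hedge at all: fail
    GradedResponse("model_a", "prompt_1", 1, None),  # hedges and never commits: pass
]
rates = pass_rates(samples)
print(rates)
```

Grouping by (model, prompt) rather than by model alone keeps per-prompt variance visible, which matters when one model fails every sample on a single scenario.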
1 Comment

Pareto AI

4mo Edited
Pareto.AI will never initiate communication via WhatsApp or other unofficial messaging platforms.

We've been made aware of individuals falsely claiming to represent Pareto and attempting to contact experts outside our official channels. All legitimate Pareto communications will only come through an email from an @pareto.ai domain.

If you ever receive a message claiming to be from Pareto and are unsure of its legitimacy, please do not share any personal information. Instead, you can:
• Contact us directly at support@pareto.ai
• Ask the individual to email you from their official @pareto.ai address as verification

Your safety and trust are extremely important to us. If you encounter suspicious outreach, please report it to our team via email (support@pareto.ai).
4 Comments

Pareto AI reposted this
Phoebe Yao
5mo
2,302 people. 22 would have received harmful medical advice. Zero actually did.

AI models are giving medical and mental health advice to millions of people. Can you prevent harmful advice by adding safety instructions to the prompt?

The UK's AI Security Institute recently tested this. They deployed the same chatbot twice: once with minimal safety prompting, once with explicit safety instructions.

The finding: safety prompts had no meaningful impact on harmful advice rates.

What did work? A classifier trained on expert-labeled data to detect harmful outputs in real-time.

AISI brought in my team at Pareto to build the training dataset. We recruited licensed doctors, therapists, and career coaches and coached them to decompose complex professional judgment into verifiable steps. Together we developed harm grading rubrics and built 6,707 evaluated examples.

Fine-tuning Llama 8B on this data boosted accuracy at detecting harmful advice from 77% to 96%, beating GPT-4o's zero-shot performance (93%).

Real-world impact: AISI deployed this classifier in a study with 2,302 participants. Without the safety layer, 22 people would have received harmful advice. With it? Zero harmful messages delivered.

The key insight: You genuinely can't prompt your way to safety in domains requiring professional judgment and deep contextual understanding. For high-stakes domains like medical, mental health, and career advice, expert supervision creates meaningfully better outcomes than instruction-based approaches alone.

The methodology that scales:
1. Bring experts in from day one to co-design
2. Build workflows that elicit professional judgment
3. Capture reasoning and context, not just labels

AISI open-sourced everything: paper, model, and dataset.

At Pareto, we're building systems that make it easy for frontier experts to contribute sustainably to AI training. The future isn't about replacing human expertise; it's about building better systems to capture expert insight at the edge of what's known.

Deep gratitude to Elizabeth Nguyen and Daria Butuc at Pareto.AI, and Lennart Luettgau and Henry Davidson at UK's AI Security Institute for making this collaboration succeed.

#ArtificialIntelligence #AIResearch #AIEthics
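The "safety layer" idea, a classifier sitting between the chatbot and the user, can be sketched in a few lines. The post does not describe how AISI wired the classifier into its study, so everything below is an assumption for illustration: the function names, the fallback message, and the keyword-based stand-in classifier, which is a toy placeholder for a fine-tuned model.

```python
from typing import Callable

# Hypothetical fallback text shown when a draft reply is blocked.
FALLBACK = "I can't advise on that here. Please speak with a qualified professional."


def guarded_reply(
    draft: str,
    is_harmful: Callable[[str], bool],
    fallback: str = FALLBACK,
) -> str:
    """Deliver the chatbot's draft only if the classifier does not flag it."""
    return fallback if is_harmful(draft) else draft


def toy_classifier(text: str) -> bool:
    """Stand-in for a trained harm classifier; flags one hard-coded phrase."""
    return "stop taking" in text.lower()


print(guarded_reply("Stop taking your medication.", toy_classifier))
print(guarded_reply("Drink water and rest.", toy_classifier))
```

The point of the design is that the check runs on the model's output rather than relying on instructions in its prompt, which matches the post's claim that prompting alone did not reduce harmful advice.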
14 Comments

Pareto AI

6mo Edited
NeurIPS 2025 wrapped, and we're buzzing with ideas! The Pareto.AI team dove deep into talks, posters, and 1:1s with researchers defining the future of AI. Every conversation sparked something new. We closed the week by co-hosting an Applied AI Researcher Dinner with Lucy Noble, MBA (Syntropi) and Auriel W. (Google DeepMind)—bringing together brilliant minds from frontier labs. Sure, we discussed the next big bets in AI, but the real magic happened when conversations turned to AI in education, navigating parenthood in an AI world, and icebreakers like "what would you never tell an AI?" that sparked the best debates. The laughter, hot takes, and off-the-record confessions reminded us why this community is so special. We're pumped for what's to come! Photo credits to our wonderful co-hosts 💜 #NeurIPS2025 #AIResearch #AppliedAI #MachineLearning #AI #AIdata
Pareto AI

8mo Edited
Congratulations to the Anthropic team on a fantastic release! It’s been a joy partnering with such an incredible crew and seeing the impact Claude Sonnet 4.5 has already made.

Anthropic

3,964,435 followers
9mo Edited

Introducing Claude Sonnet 4.5, the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

We’ve introduced upgrades across all Claude surfaces:

In Claude Code: a fresh terminal interface, a new VS Code extension, and a checkpoints feature that lets you confidently run large tasks and instantly rewind to prior code states as needed.

For the Claude app: Claude can now use code to analyze data, create files, and visualize insights in the files and formats you use. Watch as Claude creates polished docs, presentations, and spreadsheets, ready to download and edit. Now available to all paid plans in preview.

On the Claude API: we've added two new capabilities to build agents that handle long-running tasks without frequently hitting context limits. Context editing automatically clears stale context and the memory tool means you can store and consult information outside the context window.

We're also releasing a temporary research preview called "Imagine with Claude." In this experiment, Claude generates software on the fly. No functionality is predetermined; no code is prewritten. Available to Max users for 5 days.

Claude Sonnet 4.5 is available everywhere today: on the Claude Developer Platform, natively and in Amazon Bedrock and Google Cloud's Vertex AI. Pricing remains the same as Sonnet 4.

For more details: https://lnkd.in/eRJx6C5u

1 Comment

Pareto AI reposted this
Karine Hsu
8mo
It’s a special kind of full-circle moment when you get to walk into the very first SF office of a founder you’ve supported since the very beginning as an investor…AND later had the joy of collaborating with at Slope 💜

Huge congratulations to the team at Pareto.AI on their officewarming and revealing the new brand experience last night!! 🥂 I love my bucket hat!

Over the last year, Slope partnered with the Pareto team on everything from brand strategy and positioning (“Humanity to infinity ✨”), to a bold new identity and website. I especially love our art direction, which pairs calming futurism with human-centered imagery - balancing expansive, nature-forward visuals with meaningful interactions between people and technology.

Case study and shoutouts to the Slope team behind the incredible work coming soon...but for now, congratulations to the Pareto team!! 🙌 So proud Phoebe Yao!
10 Comments

Pareto AI reposted this
Pareto AI

9mo Edited
Frontier models need frontier minds. Check out our new brand experience at www.pareto.ai :))
24 Comments

Pareto AI reposted this
Tiantian Fang
9mo Edited
In this new episode, Steven and I sat down with Phoebe Yao, founder and CEO of Pareto.AI, a human data platform that helps Anthropic, Google DeepMind, Character.AI and many other frontier labs source their expert data for LLM training and evals. Before founding Pareto, Phoebe was a Thiel Fellow and a Human Centered Design & Engineering major at Stanford.

We discussed with Phoebe her experience with Stanford ASES, Launchpad, and Lean Launchpad at d.school, what made her decide to drop out, how she survived pivots, and much more!

Some of my takeaways:
• Finding PMF is an ongoing challenge in fast-moving markets.
• Pivots aren’t selling your soul. They’re about meeting market needs while staying true to your mission to keep the team motivated.
• Learn not to defer decisions. Clear, confident calls are the foundation of strong leadership.
• Nurture relationships. Business is about people wanting to work with you.

The link to the full episode is in the comments section👇 (And check out their brand new website pareto.ai!)

10 Comments



Funding

Pareto AI 3 total rounds

Last Round

Seed Apr 14, 2022

US$ 4.5M

Investors

MaC Venture Capital + 8 Other investors



Employees at Pareto AI

Dennis Van Liew

John Chu

Betty Kayton

Melissa A. Dupée


Similar pages

Pareto AI

micro1

Alignerr

Mercor

Invisible Technologies

Scale AI

SuperAnnotate

Turing

Outlier

Pareto
