Shadi Copty’s Post

Sr Director Llama Partner Engineering @ Meta | Founder @ Minorio

Evaluating my fine-tuned 3B model with an LLM judge - many of you have asked, so here goes :-). TL;DR: 91% accuracy vs. 47% for the non-fine-tuned model, with only 140 examples and 3 minutes of training (https://lnkd.in/gHw993ZS)

I started out by creating a new synthetic test dataset that the fine-tuned models haven't seen, using the same code as for the training set (data here: https://lnkd.in/gcCSdT5e). I used Comet's Opik per Armand's recommendation (you should follow him if you aren't already :-)).

Then I built my evaluation to test for (1) JSON schema compliance, (2) distance from the reference answer, and (3) the LLM judge, which scores first on JSON format, then on compliance with the schema, then on the entities being detected, then on not detecting more than necessary.

I tried this out with a bunch of LLMs that can run on this laptop; as you can see, the 3B fine-tuned one performed the best, followed by the 1B, with all of the out-of-the-box ones pretty similar and not so great.

Code for you to enjoy: https://lnkd.in/guhzdUkf You'll need a Comet account, which is shockingly free. Enjoy :-)

Next I'm going to try to beef up the data gen with Distilabel and CrewAI - let me know if you find that interesting or have other ideas.

[Attached: results table]
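For readers asking what the checks look like in code, here is a minimal sketch of the two heuristic metrics (JSON-schema compliance and overlap with the reference answer) written against Opik's custom-metric interface (base_metric.BaseMetric returning a score_result.ScoreResult). The class names, the "entities" field, and the assumption that entities are plain strings are illustrative assumptions, not the exact code from the linked repo.

```python
# Sketch of two heuristic evaluation metrics as Opik custom metrics.
# Class names, the expected JSON shape ({"entities": [...]}) and the
# "reference" dataset field are assumptions for illustration only.
import json

from opik.evaluation.metrics import base_metric, score_result


class JsonSchemaCompliance(base_metric.BaseMetric):
    """Scores 1.0 if the output parses as JSON and has an 'entities' list."""

    def __init__(self, name: str = "json_schema_compliance"):
        self.name = name

    def score(self, output: str, **ignored_kwargs) -> score_result.ScoreResult:
        try:
            parsed = json.loads(output)
            ok = isinstance(parsed, dict) and isinstance(parsed.get("entities"), list)
        except json.JSONDecodeError:
            ok = False
        return score_result.ScoreResult(value=1.0 if ok else 0.0, name=self.name)


class ReferenceEntityOverlap(base_metric.BaseMetric):
    """F1-style overlap between predicted entities and the reference answer."""

    def __init__(self, name: str = "reference_entity_overlap"):
        self.name = name

    def score(self, output: str, reference: str, **ignored_kwargs) -> score_result.ScoreResult:
        try:
            predicted = {str(e).lower() for e in json.loads(output).get("entities", [])}
            expected = {str(e).lower() for e in json.loads(reference).get("entities", [])}
        except (json.JSONDecodeError, AttributeError):
            return score_result.ScoreResult(value=0.0, name=self.name)
        if not predicted or not expected:
            return score_result.ScoreResult(value=0.0, name=self.name)
        precision = len(predicted & expected) / len(predicted)
        recall = len(predicted & expected) / len(expected)
        f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
        return score_result.ScoreResult(value=f1, name=self.name)
```

In a typical Opik run these instances would be passed to opik.evaluation.evaluate() via scoring_metrics, alongside the LLM-judge metric, where the judge is a stronger model prompted with the rubric described in the post (JSON format first, then schema compliance, then entities detected, then no over-detection).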
Shep ⚡️ Bryan

A top follow for AI, ontology, and cross-domain explorations.

2mo

Did you test how it stands up to a larger-parameter model like the 70B that helped make the synthetic data? I know the intent is to have a model you can run locally, but I'm also curious how a base Llama 3.3 and a tuned 3.3 would compare if you followed the same steps. Claude 3.5 handles my entity extraction work like a champ, but obviously it's hundreds of billions of parameters larger and more $ to use.

Prakash Muralidharan

AI Builder and Partner for Insurance Companies | AWS Machine Learning Certified, AWS Cloud Quest: Generative AI, AWS Associate Architect

2mo

Great going and Merry Xmas! If it's not too much, could you list out or point me to your complete stack for the desktop LLM? Just trying to get all the big pieces.

Odis Ureña

⤷ GenAI-Powered Growth Marketer | Scaling Lead Generation & Multichannel Success | Co-Founder Synthwave Solutions

2mo

Amazing results! Achieving 91% accuracy with 140 examples and 3 minutes is impressive. What’s your next step?


Shahid H.

Chief Operating Officer at Tilli Software | Ex - Accenture, Wipro

2mo

Impressive results! It’s amazing to see how fine-tuning with just a small dataset can lead to such a big accuracy boost. The use of synthetic test data and a structured evaluation process definitely seems like a smart approach for pushing the performance of the models.
John Milinovich

Head of GenAI Product at Canva

2mo

This is wild! Thanks for sharing. Going to dig into this and model university soon!

Alen Joses R

SDE II @Comcast | LLM | Gen AI Enthusiasts | Python | DevOps ♾️

2mo

Very informative

Shahid H.

Chief Operating Officer at Tilli Software | Ex - Accenture, Wipro

2mo

Thanks for sharing Shadi Copty

Yaroslav Boiko

Senior Frontend Developer | React, Typescript

2mo

Love this 👍 thx!
