🚀 Llama 3.1 is a game-changer! I had to blink today when I ran some tool-calling experiments (in Dutch): the 70B version performed perfectly on tough cases, just like GPT-4o. And officially, Llama 3.1 doesn't even speak Dutch. I ran it on a dual RTX 4090 setup for 80 ct/h (20 euros a day)!

'Tool calling' is the glue between human speech and the system executing the spoken request. Until now, self-hosted LLMs were not strong enough to do this reliably, even after fine-tuning, so an external LLM like OpenAI's GPT-4 was always needed. No more! Banks, insurance companies, and governments worried about the privacy and security risks of external LLMs can now run a high-quality model on 4,000 euros' worth of graphics cards. The limit is gone!

Last but not least, the 8B version (which you can host for 15 ct an hour) is also really good at tool calling. It has minor glitches, but if your use case is not too demanding, it is a blazingly fast-responding little beast.
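To make the "glue" concrete, here is a minimal sketch of the tool-calling loop on the application side: you declare a function schema, the model picks a function and fills in JSON arguments, and your code dispatches the call. The `set_thermostat` tool and the example arguments are hypothetical, not from my experiments; the schema follows the widely used OpenAI-compatible function-calling format that self-hosted Llama 3.1 servers typically accept.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "set_thermostat",
        "description": "Set the thermostat to a target temperature",
        "parameters": {
            "type": "object",
            "properties": {
                "celsius": {"type": "number", "description": "Target temperature"}
            },
            "required": ["celsius"],
        },
    },
}]

def set_thermostat(celsius: float) -> str:
    # In a real system this would talk to the device; here it just reports.
    return f"Thermostat set to {celsius}"

# Map the tool names the model may emit to the functions that implement them.
DISPATCH = {"set_thermostat": set_thermostat}

def execute_tool_call(tool_call: dict) -> str:
    """Run the function the model selected, with the JSON arguments it produced."""
    fn = DISPATCH[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# A tool call as the model might return it for "Zet de verwarming op 21 graden":
model_output = {"name": "set_thermostat", "arguments": '{"celsius": 21}'}
print(execute_tool_call(model_output))  # Thermostat set to 21
```

The point of the experiments above is exactly this last step: whether the model reliably produces the right function name and well-formed arguments from natural (Dutch) speech.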
Yes sir! That’s what I’m offering now for a small fee: https://onderwijsorakel.nl/zorgen-over-privacy-omdat-ai-in-de-cloud-draait-de-ai-act-lijkt-alarmerend-ik-neem-de-zorgen-graag-weg-%f0%9f%98%8e/ Let us do this together 😎
Nice insights, Hans van Dam.
AI architect Apps | Generative AI | PhD | LLM and UX expert
Unfortunately, it is not as impressive as I thought at first glance. After trying more examples, it is not as easily steerable as GPT-4o and understands less complex instructions in Dutch examples. For English it is much better (amazing, actually, and better than anything before), but still not as good as I first thought.

Update 31-7-2024: After a couple of days I noticed that function calling works fairly well in Dutch too, but the function definitions need to be in the same language as the user messages to work properly. Then even the 8B model seems good enough for my current purposes; and it is sooo fast.
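The observation that function definitions should match the user's language can be illustrated with a sketch: the same hypothetical thermostat tool, but with the name and descriptions written in Dutch so they sit in the same language as the Dutch user messages. The schema and names here are illustrative, not a documented requirement.

```python
# Hypothetical Dutch-language variant of a tool schema: name and descriptions
# are in Dutch, matching the language of the incoming user messages.
GEREEDSCHAP = [{
    "type": "function",
    "function": {
        "name": "zet_thermostaat",
        "description": "Zet de thermostaat op een doeltemperatuur in graden Celsius",
        "parameters": {
            "type": "object",
            "properties": {
                "celsius": {"type": "number", "description": "Doeltemperatuur"}
            },
            "required": ["celsius"],
        },
    },
}]
```

In practice you would pass this list as the `tools` parameter of the chat request, instead of an English-language schema, when the conversation is in Dutch.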