🚀 Llama 3.1 is a game-changer! I had to blink today when I ran some tool-calling experiments (in Dutch): the 70B version performed perfectly on tough cases, just like GPT-4o. And officially, Llama 3.1 doesn't even speak Dutch. I ran it on a dual RTX 4090 setup for 80 ct/h (20 euros a day)!

'Tool calling' is the glue between human speech and the system executing the spoken request. Until now, self-hosted LLMs were not strong enough to do this reliably, even after fine-tuning, so an external LLM like OpenAI's GPT-4 was always needed. No more! Banks, insurance companies, and governments worried about the privacy and security risks of external LLMs can now run a high-quality model on 4,000 euros' worth of graphics cards. The limit is gone!

Last but not least, the 8B version (which you can host for 15 ct an hour) is also really good at tool calling. It has minor glitches, but if your use case is not too demanding, it is a blazingly fast-responding little beast.
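To make the "glue" concrete, here is a minimal sketch of the tool-calling loop on the application side: you declare a function schema, the model picks a function and fills in JSON arguments, and your code dispatches the call. The `set_thermostat` tool and the example arguments are hypothetical, not from my experiments; the schema follows the widely used OpenAI-compatible function-calling format that self-hosted Llama 3.1 servers typically accept.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "set_thermostat",
        "description": "Set the thermostat to a target temperature",
        "parameters": {
            "type": "object",
            "properties": {
                "celsius": {"type": "number", "description": "Target temperature"}
            },
            "required": ["celsius"],
        },
    },
}]

def set_thermostat(celsius: float) -> str:
    # In a real system this would talk to the device; here it just reports.
    return f"Thermostat set to {celsius}"

# Map the tool names the model may emit to the functions that implement them.
DISPATCH = {"set_thermostat": set_thermostat}

def execute_tool_call(tool_call: dict) -> str:
    """Run the function the model selected, with the JSON arguments it produced."""
    fn = DISPATCH[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# A tool call as the model might return it for "Zet de verwarming op 21 graden":
model_output = {"name": "set_thermostat", "arguments": '{"celsius": 21}'}
print(execute_tool_call(model_output))  # Thermostat set to 21
```

The point of the experiments above is exactly this last step: whether the model reliably produces the right function name and well-formed arguments from natural (Dutch) speech.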
Yes sir! That’s what I’m offering now for a small fee: https://onderwijsorakel.nl/zorgen-over-privacy-omdat-ai-in-de-cloud-draait-de-ai-act-lijkt-alarmerend-ik-neem-de-zorgen-graag-weg-%f0%9f%98%8e/ Let us do this together 😎
Nice insights, Hans van Dam.
AI architect Apps | Generative AI | PhD | LLM and UX expert
Unfortunately, it is not as impressive as I thought at first glance. After trying more examples, it is not as easily steerable as GPT-4o and understands less complex instructions in Dutch examples. For English it is much better (amazing, actually, and better than anything before), but still not as good as I first thought.

Update 31-7-2024: After a couple of days I noticed that function calling works fairly well in Dutch too, but the function definitions need to be in the same language as the user messages to work properly. Then even the 8B model seems good enough for my current purposes; and it is sooo fast.
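The observation that function definitions should match the user's language can be illustrated with a sketch: the same hypothetical thermostat tool, but with the name and descriptions written in Dutch so they sit in the same language as the Dutch user messages. The schema and names here are illustrative, not a documented requirement.

```python
# Hypothetical Dutch-language variant of a tool schema: name and descriptions
# are in Dutch, matching the language of the incoming user messages.
GEREEDSCHAP = [{
    "type": "function",
    "function": {
        "name": "zet_thermostaat",
        "description": "Zet de thermostaat op een doeltemperatuur in graden Celsius",
        "parameters": {
            "type": "object",
            "properties": {
                "celsius": {"type": "number", "description": "Doeltemperatuur"}
            },
            "required": ["celsius"],
        },
    },
}]
```

In practice you would pass this list as the `tools` parameter of the chat request, instead of an English-language schema, when the conversation is in Dutch.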