Embedded LLM

Software Development

Creator of JamAI Base: The collaborative spreadsheet where AI ideas flow, chaining cells into powerful pipelines.

About us

Your open-source AI ally. We specialize in integrating LLMs into your business. Creator of JamAI Base.

Industry
Software Development
Company size
11-50 employees
Headquarters
Singapore
Type
Privately Held
Founded
2023
Specialties
Artificial intelligence, AI, Generative AI, LLM, Large language model, HIP, ROCm, CUDA, Enterprise AI, Autopilot, Copilot, GPT, On-Device AI, AI Consultancy, Embedded AI, Open Source, and On-Premises

Updates

  • vLLM Now Supports Running GGUF on AMD Radeon GPU 🚀

    Exciting news! We've ported vLLM's GGUF kernel to AMD ROCm, unlocking impressive performance gains on AMD Radeon GPUs.

    📊 In our benchmarks on the ShareGPT dataset with an AMD Radeon RX 7900 XTX, vLLM outperformed Ollama, even at the batch sizes where Ollama traditionally excels.

    💪 This is a game-changer for anyone running LLMs on AMD hardware, especially with quantized models (5-bit, 4-bit, or even 2-bit). With over 60,000 GGUF models available on Hugging Face, the possibilities are endless.

    💡 Key benefits:
    - Superior performance: vLLM delivers faster inference speeds than Ollama on AMD GPUs.
    - Wider model support: Run a vast collection of GGUF-quantized models.
    - Efficient execution: Optimized for AMD ROCm, maximizing hardware utilization.

    🔗 Learn more and get started: https://lnkd.in/g5qvUi8t (a usage sketch follows the benchmark image below)

    We'd love to hear your feedback! Have you experimented with vLLM, or with Llama.cpp on Vulkan? Which inference engine do you prefer for LLM tasks on AMD GPUs? What features or optimizations would you like to see in vLLM for AMD GPUs? #vLLM #AMD #ROCm #LLM #AI #GGUF

    • Image: vLLM vs Ollama benchmark chart
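
    For anyone who wants to try this, here's roughly what running a GGUF checkpoint through vLLM's offline API looks like. A minimal sketch, assuming a ROCm-enabled vLLM build and a GGUF file already downloaded locally; the file name and tokenizer repo below are placeholders:

      # Minimal sketch: offline inference on a local GGUF file with vLLM.
      # The .gguf path and tokenizer repo are placeholder examples.
      from vllm import LLM, SamplingParams

      llm = LLM(
          model="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # local GGUF file
          tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # matching base tokenizer
      )
      params = SamplingParams(temperature=0.7, max_tokens=128)
      outputs = llm.generate(["What is GGUF quantization?"], params)
      print(outputs[0].outputs[0].text)
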
  • AMD ROCm releases are getting seriously interesting. Here's what has me excited about ROCm 6.3:
    - Re-engineered FlashAttention-2: Up to 3X speedups and support for longer sequence lengths.
    - SGLang Integration: Get started here: https://lnkd.in/gv_cJz3n (a quick-start sketch follows the link card below)
    - Fortran Compiler with OpenMP Offloading: Legacy Fortran codebases can now leverage GPU acceleration without extensive refactoring.
    - Multi-Node FFTs: Distributed workloads across multiple Instinct accelerators are now supported.
    - Computer Vision Enhancements: Library updates bring support for the AV1 codec, GPU-accelerated JPEG decoding, and audio preprocessing.
    Blog: https://lnkd.in/gQSA7xQv

    SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs

    rocm.blogs.amd.com
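
    Since SGLang's server speaks the OpenAI-compatible API, getting started is only a few lines. A minimal sketch; the model name and port are illustrative, not from the post:

      # Launch the SGLang server first, e.g.:
      #   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
      # Then query it with any OpenAI-compatible client:
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
      resp = client.chat.completions.create(
          model="meta-llama/Llama-3.1-8B-Instruct",
          messages=[{"role": "user", "content": "Say hello from an AMD GPU."}],
      )
      print(resp.choices[0].message.content)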

  • 75% of training time can be wasted on communication overhead between GPUs. Even on H100 systems, communication can eat up to 43% of training time! For massive models like Llama-3 405B, that translates to a staggering 25 days spent just on communication. 🤯

    But there's good news: DeepSpeed Domino comes to the rescue. This new tensor parallelism (TP) engine minimizes communication overhead, unlocking faster and more efficient LLM training for both single-node and multi-node setups.
    - Near-complete communication hiding: Domino cleverly overlaps communication with computation, dramatically reducing wasted time.
    - Novel multi-node scalable TP solution: Domino is designed to excel in both single-node and multi-node environments, enabling efficient scaling for even the largest models.

    Learn more about DeepSpeed Domino:
    Blog: https://lnkd.in/eVEd5GwU
    Paper: https://lnkd.in/ecHc4bph

  • Embedded LLM reposted this

    Mei Ling Leung

    Member of Technical Staff at Embedded LLM

    An ML engineer who only wants to train models is like a carpenter who only wants to hammer nails. 🔨

    This highlights a fundamental flaw in AI engineering education today. We're churning out graduates who are experts in model training but lack the essential skills to solve real-world problems. It's like a carpentry school that only teaches hammering techniques. Sure, you'll learn how to drive a nail, but what about the rest of the craft? 🤔

    Here's the problem: most AI/ML courses focus heavily on algorithms, frameworks (like PyTorch), and model training. Students are given clean datasets and clear objectives. They become proficient in tuning hyperparameters and optimizing for accuracy. But the real world is messy:
    - Data is rarely readily available: You need to identify the right data sources, collect and clean the data, and deal with missing values, inconsistencies, and biases.
    - Problems are complex and ambiguous: You need to define the problem, frame it correctly, and choose the right approach.
    - Solutions require more than just models: You need to consider deployment, monitoring, scalability, and ethical implications.

    We need a new approach to AI education:
    - Problem-first learning: Start with real-world problems and teach students how to break them down into manageable steps.
    - Focus on the entire lifecycle: Cover the whole ML workflow, from data collection and preparation to model deployment and monitoring.
    - Develop critical thinking and problem-solving skills: Encourage students to think critically, analyze data, and evaluate solutions.
    - Emphasize communication and collaboration: Foster strong communication and interpersonal skills through group projects, presentations, and interactions with domain experts.
    - Prioritize domain knowledge: Integrate industry case studies, real-world projects, and collaborations with businesses into the curriculum.

    Let's empower the next generation of engineers to build real-world solutions, not just train models in isolation. Remember what my boss once told me: "Remember, the real value you bring is your ability to solve problems, not just your knowledge of specific tools. XGBoost and SVM are great, but they're just means to an end. Your creativity, critical thinking, and understanding of the business are what truly make a difference."

  • 🔥 Pixtral Large is now supported on vLLM! 🔥

    Run Pixtral Large with multiple input images from day 0 using vLLM.

    Install vLLM:
    pip install -U vllm

    Run Pixtral Large:
    vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8

    (A sketch of querying the server follows this post.)

    About Pixtral Large:
    * Built upon Mistral Large 2, preserving its exceptional text performance.
    * State-of-the-art on MathVista, DocVQA, and VQAv2.
    * 123B multimodal decoder with a 1B-parameter vision encoder.
    * 128K context window: fits a minimum of 30 high-resolution images.
    * Licensed under the MRL.

    🤗 https://lnkd.in/gixc5mE4

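    Once the server is up, multimodal requests go through vLLM's OpenAI-compatible endpoint. A minimal sketch of a query with one image; the port, placeholder API key, and image URL are illustrative assumptions:

      # Query the Pixtral Large server started above via the OpenAI-compatible API.
      # base_url/port and the image URL are placeholders.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
      response = client.chat.completions.create(
          model="mistralai/Pixtral-Large-Instruct-2411",
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "Describe this image in one sentence."},
                  {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
              ],
          }],
      )
      print(response.choices[0].message.content)
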
  • vLLM v0.6.4 brings expanded model support, Intel AI Gaudi support, and significant progress on the vLLM V1 core engine and torch.compile support. What's new:
    - New LLMs and VLMs: Idefics3 (VLM), H2OVL-Mississippi (VLM for OCR and Document AI), Qwen2-Audio (audio LLM), FalconMamba (Mamba LLM), Florence-2 (VLM)
    - New encoder-decoder embedding models: BERT, RoBERTa, XLM-RoBERTa
    - Expanded task support:
      - Text classification: Qwen2 classification
      - Embeddings: Llama embeddings, Math-Shepherd, Qwen2 embeddings
      - VLM embeddings: VLM2Vec, E5-V, Qwen2-VL embeddings
      - Task parameter: --task to specify generation or embedding tasks (a sketch follows the release link below)
    - Chat-based embeddings API: Pass multi-modal conversations to embedding models.
    - Tool-calling parsers: Granite 3.0, Jamba, granite-20b-functioncalling
    - LoRA support: Granite 3.0 MoE, Idefics3, Llama embeddings, Qwen, Qwen2-VL
    - BNB (bitsandbytes) quantization: Idefics3, Mllama, Qwen2, MiniCPMV
    - Hardware support:
      - Intel Gaudi (HPU) backend: A key advantage of Gaudi is massive scalability with standard Ethernet.
      - CPU support for embedding models: Deploy embedding models on CPUs.
    - Performance enhancements: Chunked prefill combined with speculative decoding, plus improved fused_moe performance.

    Explore the full release notes for detailed information: https://lnkd.in/gTXXmEWG #vLLM #Intel #Gaudi3 #Idefics3

    Release v0.6.4 · vllm-project/vllm

    github.com
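
    As a taste of the new --task parameter, offline embedding extraction looks roughly like this. A minimal sketch; the model name is just one example of a supported embedding model:

      # Sketch: vLLM v0.6.4's task parameter selecting embedding mode.
      # The model ID is an example embedding model, not prescriptive.
      from vllm import LLM

      llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embedding")
      outputs = llm.encode(["What is the capital of France?"])
      print(len(outputs[0].outputs.embedding))  # prints the embedding dimension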

  • 📢 Calling all robotics enthusiasts! 🤖 Meet Iris! Iris can now speak 🎙️ Follow our talented Robotics Engineer, Wessam Hamid, as he tackles the challenges and breakthroughs of building and programming cutting-edge humanoid robots like Iris. Stay tuned for exclusive updates, behind-the-scenes insights, and a glimpse into the future of robotics. 🚀 #robotics #humanoidrobot #LLM

    Wessam Hamid

    I design robots

    Iris update: It speaks! Made some cool progress with Iris – it can talk now! 🎙️

    This is my first shot at speech-to-speech reasoning, and it can switch between different language models and voices on command. It's making API calls for voice transcription, responses, and text-to-speech, though it's still a bit slow (I sped up some parts of the video and added a timer to show that).

    Next step: speed things up and get the RealSense integrated. I'm also thinking of running the models locally on a dedicated AI PC – now I just need to figure out how to fund that piece. Excited to see where this goes! Do let me know if you have any suggestions.
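
    The transcribe-reason-speak loop described above maps onto three API calls. A minimal sketch of that pattern using OpenAI-hosted models; the post doesn't say which APIs Iris uses, so the model names and file paths here are purely illustrative:

      # Illustrative speech-to-speech loop: transcription -> LLM -> text-to-speech.
      # Model names and file paths are placeholders, not Iris's actual stack.
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      with open("mic_capture.wav", "rb") as audio:
          text = client.audio.transcriptions.create(model="whisper-1", file=audio).text

      reply = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{"role": "user", "content": text}],
      ).choices[0].message.content

      speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
      speech.write_to_file("iris_reply.mp3")  # play this back through the robot's speaker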

  • ⚡️ Huge performance gains for LLM training on AMD GPUs! 🐯 Liger Kernel v0.4.0 now fully supports AMD GPUs!

    Thanks to our collaboration with the Liger Kernel team, you can now enjoy a 26% speed boost and a massive 60% reduction in memory usage when training LLMs on AMD GPUs.

    Check out the benchmarks in our blog: https://lnkd.in/gU2cPX_m

    A big thank you to Hot Aisle Inc. for sponsoring the #MI300X and to Pin-Lun (Byron) Hsu for the collaboration!

    Pin-Lun (Byron) Hsu

    Building Liger-Kernel @Linkedin | Committer @flyteorg @theASF

    Liger Kernel v0.4.0 has arrived! https://lnkd.in/gR73PfFh

    1. Full AMD Support: We have partnered with https://embeddedllm.com to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% lower memory usage on AMD. See the full blog post at https://lnkd.in/gUBF9Ur6. Embedded LLM Hot Aisle Inc. Jon Stevens Pin Siang Tan Tun Jian Tan

    2. Modal CI Migration: We have moved our entire GPU CI stack to Modal! Thanks to intelligent Docker layer caching and blazingly fast container startup and scheduling, we have reduced CI overhead by over 10x (from minutes to seconds). Modal Charles Frye Alec Powell Erik Bernhardsson

    3. LLaMA 3.2-Vision Model: We have added kernel support for the LLaMA 3.2-Vision model. You can easily use `liger_kernel.transformers.apply_liger_kernel_to_mllama` to patch the model (a usage sketch follows the release card below). Tyler Romero Shivam Sahni

    4. HuggingFace Gradient Accumulation Fixes: We have fixed the notorious HuggingFace gradient accumulation issue (https://lnkd.in/gCKtftbw) by carefully adjusting the cross-entropy scalar. You can now safely use v0.4.0 with the latest HuggingFace gradient accumulation fixes (transformers>=4.46.2)! Wing Lian Arthur Zucker

    5. JSD Kernel: We have added the JSD kernel for distillation, which also comes with a chunked version! Chun-Chih Tseng Yun Dai Qingquan Song

    6. Technical Report: We have published a technical report on arXiv (https://lnkd.in/gwHq6_7c) with abundant details. Yanning Chen Haowen Ning Animesh Singh Kapil Surlaker

    Release v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision! · linkedin/Liger-Kernel

    github.com
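
    For reference, patching Llama-3.2-Vision with the function named in point 3 looks roughly like this. A minimal sketch; the checkpoint ID is an example, and the patch must run before the model is instantiated:

      # Sketch: apply Liger kernels to Llama-3.2-Vision (Mllama) before loading.
      import torch
      from liger_kernel.transformers import apply_liger_kernel_to_mllama
      from transformers import MllamaForConditionalGeneration

      apply_liger_kernel_to_mllama()  # monkey-patches HF's Mllama modules in place

      model = MllamaForConditionalGeneration.from_pretrained(
          "meta-llama/Llama-3.2-11B-Vision-Instruct",  # example checkpoint
          torch_dtype=torch.bfloat16,
      )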

  • Liger Kernels Leap the CUDA Moat: A Case Study with Liger, LinkedIn's SOTA Training Kernels, on AMD GPUs 🚀

    Exciting news! We've partnered with the LinkedIn Liger-Kernel team to fully support AMD GPUs in their latest v0.4.0 release. This brings significant performance improvements to Large Language Model (LLM) training on AMD hardware.

    Key benefits:
    - Faster training: Up to 26% higher multi-GPU training throughput.
    - Reduced memory usage: Train larger models and use bigger batch sizes with up to 60% less memory.
    - Longer context lengths: Explore new possibilities with support for up to 8x longer context lengths.

    Check out the benchmarks on our blog: https://lnkd.in/gU2cPX_m
    Check out the v0.4.0 release: https://lnkd.in/gJxXK8cy

    A big thank you to Hot Aisle Inc. for sponsoring the MI300X and to Pin-Lun (Byron) Hsu for the collaboration! #LLM #AI #AMD #ROCm #LigerKernels

  • Malaysia's AI Scene is Electrifying! ⚡️ The Future is Being Shaped in SEA.

    Malaysia is rapidly becoming an AI powerhouse. With a data center boom projected to increase capacity ninefold, global tech giants like Microsoft, NVIDIA, Amazon, Google, and Oracle are investing heavily in its digital infrastructure.

    At MDX 2024, we witnessed incredible enthusiasm and groundbreaking AI applications. The energy was palpable, and the future possibilities seemed limitless. Embedded LLM is proud to be a part of this revolution: alongside AMD and the Selangor Human Resource Development Centre (SHRDC), we're empowering Malaysian businesses with cutting-edge AI solutions.

    Malaysia's commitment to AI is evident. We were honored to host Malaysia's Digital Minister at our booth, a clear sign of strong government support for AI innovation. The future is bright, and we're eager to learn from and contribute to this exciting journey.
