Phi-3 Vision is now available on Hugging Face. Microsoft's Phi series models are smaller than comparable models yet achieve higher accuracy for their size range because they are trained on curated, high-quality datasets. From the model card:

"The model is intended for broad commercial and research use in English. The model provides uses for general purpose AI systems and applications with visual and text input capabilities which require: memory/compute constrained environments; latency bound scenarios; general image understanding; OCR; chart and table understanding. Our model is designed to accelerate research on efficient language and multimodal models, for use as a building block for generative AI powered features."

Mithilesh Thakkar
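Since the checkpoint is hosted on Hugging Face, here is a minimal sketch of trying it locally with the transformers library. The model ID, prompt template, and <|image_1|> placeholder follow the public model card; the image URL is just an illustrative placeholder, so treat this as a starting point rather than a verified recipe.

```python
# Minimal sketch: running Phi-3 Vision from Hugging Face.
# Assumes a CUDA GPU and a transformers version that supports trust_remote_code.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # ID from the Hugging Face model card
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # per the model card, use "flash_attention_2" if installed
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# One image plus a question about it; <|image_1|> marks where the image is inserted.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)  # placeholder URL
prompt = "<|user|>\n<|image_1|>\nWhat does this chart show?<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens before decoding so only the model's answer is printed.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```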
This weekend, I'm diving into the architecture and components of generative AI. After some research, I found this picture to be a great starting point for understanding how generative AI performs inference. Once you grasp this diagram, you can explore each phase more deeply. Here's a brief explanation:

Models: There are commercial and open-source models. Commercial models, like ChatGPT and Gemini, are accessed through APIs. Open-source models come from vendors like Meta and Microsoft and include Llama and BitNet. These models are trained on general-purpose datasets.

Fine-tuning: This technique adds domain-specific knowledge, such as medical, financial services, or public sector information, to a general LLM, customizing it for a particular domain.

Prompt engineering: This is how users interact with generative AI by crafting queries. You can steer the AI's responses through the prompt text.

RAG: Retrieval-augmented generation uses a search engine to fetch external documents and incorporates the results into the prompt, improving the responses of large language models (LLMs). This fills gaps in the model's knowledge and improves the accuracy of its answers.

REF: https://lnkd.in/gg-vDcPP
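To make the RAG step concrete, here is a minimal sketch of how retrieved documents get stitched into the prompt before the model is called. The retrieve() and generate() callables are hypothetical stand-ins for whatever search backend and LLM client you actually use.

```python
# Minimal RAG sketch: fetch documents for a query, then ground the LLM's answer in them.
# retrieve() and generate() are hypothetical placeholders for your search engine and model.
from typing import Callable, List


def build_rag_prompt(question: str, documents: List[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the retrieved context."""
    context = "\n\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # e.g. a vector store or web search
    generate: Callable[[str], str],             # e.g. a call to a hosted or local LLM
    top_k: int = 3,
) -> str:
    # Retrieval fills gaps in the LLM's knowledge; generation stays grounded in the docs.
    docs = retrieve(question, top_k)
    return generate(build_rag_prompt(question, docs))
```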
🌐 Industry news: Zyphra has released Zamba2-7B, a state-of-the-art small language model with 7 billion parameters, outperforming leading models such as Mistral-7B, Google's Gemma-7B, and Meta's Llama3-8B. Zamba2-7B offers superior inference efficiency, with 25% faster time to first token, a 20% improvement in tokens per second, and reduced memory usage. Trained on 3 trillion tokens with advanced pretraining techniques, Zamba2-7B is designed for natural-language tasks on consumer GPUs and in enterprise applications. Read more: https://lnkd.in/db46_c_F

📊 Trending dataset: The Notable AI Models 2024 dataset on Kaggle includes machine learning models that meet one or more key criteria: state-of-the-art improvements, historical significance, or high citation counts (over 1,000 citations). It serves as a resource for anyone researching cutting-edge AI advancements or exploring impactful models from the past, and it is particularly useful for students, developers, and researchers aiming to understand AI trends. Explore the dataset: https://lnkd.in/dbRFrPei

🛠️ Top tool: Question Base, featured on Product Hunt, is an AI-powered tool that simplifies knowledge management by automatically documenting conversations in Slack and answering employee questions. It integrates with Notion, Zendesk, and Intercom, making it easier for companies to keep documentation up to date without manual effort. The tool is designed to reduce the time spent managing internal knowledge while ensuring quick access to information. Check it out: https://lnkd.in/diHuwbgD
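The latency metrics quoted above (time to first token, tokens per second) are easy to measure yourself. Below is a rough sketch using transformers' streaming generation; the model ID is a placeholder rather than any specific checkpoint, and chunk counts only approximate token counts, so use it for relative comparisons, not absolute benchmarks.

```python
# Rough sketch: measure time-to-first-token and tokens/sec for any Hugging Face causal LM.
# The model ID below is a placeholder -- swap in whichever checkpoint you want to test.
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "your-org/your-7b-model"  # placeholder, not a specific released checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

inputs = tok("Explain small language models in one paragraph.", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tok, skip_prompt=True)
thread = Thread(target=model.generate, kwargs=dict(**inputs, max_new_tokens=128, streamer=streamer))

start = time.perf_counter()
thread.start()
first_token_time = None
n_chunks = 0
for _chunk in streamer:  # decoded text chunks arrive as tokens are generated
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    n_chunks += 1
thread.join()
total = time.perf_counter() - start

print(f"time to first token: {first_token_time:.3f}s")
print(f"approx tokens/sec: {n_chunks / total:.1f}")  # chunk count roughly tracks token count
```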
DeepSeek unveils R1-Lite-Preview: a transparent reasoning AI model rivaling OpenAI's o1-preview.

What's new: Chinese AI research firm DeepSeek has introduced R1-Lite-Preview, a next-generation AI model focused on transparent reasoning and chain-of-thought (CoT) processes. The model challenges OpenAI's o1-preview, delivering competitive performance with an added emphasis on real-time reasoning visualization.

Key features:
Reasoning transparency: real-time visualization of chain-of-thought processes.
Benchmark results: exceptional accuracy across AIME, MATH, GPQA Diamond, Codeforces, LiveCodeBench, and ZebraLogic.
Scalability: accuracy improves with extended reasoning, reaching up to 52.5% on AIME with 100,000 reasoning tokens.
Infrastructure: reportedly operates on 50,000 H100 GPUs, matching the resources of leading global AI labs.

Performance highlights:
AIME 2024: 52.5% accuracy, surpassing o1-preview's 44.6%, excelling in multi-step reasoning.
MATH: 91.6% accuracy, exceeding o1-preview by 6.1 percentage points.
Codeforces: Elo rating of 1450, slightly ahead of o1-preview's 1428 in algorithmic problem solving.

Transparent reasoning: R1-Lite-Preview exposes its chain of thought in real time, letting users trace intermediate steps. This improves the interpretability of its outputs and highlights DeepSeek's commitment to transparency.

Accessibility: The model is available via DeepSeek Chat, offering unlimited basic chat access, premium reasoning features capped at 50 messages per day, and plans to open-source the full R1 model for broader adoption.

🌐 Try it now at https://chat.deepseek.com
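For readers new to the idea, "transparent reasoning" means the intermediate chain of thought is visible alongside the final answer. The illustrative sketch below shows one generic way to get that behaviour from any chat model by asking for tagged reasoning and parsing it out; the chat() callable is a hypothetical client, not DeepSeek's actual interface, which exposes the trace natively.

```python
# Illustrative sketch of chain-of-thought visibility: ask a chat model to expose its
# reasoning in a tagged block, then separate the trace from the final answer.
# `chat` is a hypothetical client function, not DeepSeek's actual API.
import re
from typing import Callable, Tuple

COT_INSTRUCTION = (
    "Think through the problem step by step inside <reasoning>...</reasoning> tags, "
    "then give only the final result inside <answer>...</answer> tags."
)


def solve_with_visible_reasoning(problem: str, chat: Callable[[str], str]) -> Tuple[str, str]:
    """Return (reasoning_trace, final_answer) so intermediate steps stay inspectable."""
    reply = chat(f"{COT_INSTRUCTION}\n\nProblem: {problem}")
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", reply, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else "",
        answer.group(1).strip() if answer else reply.strip(),
    )
```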
Hi! I am working with LLMs and would like to share some insights about two popular model families: Llama 🦙 and Qwen. Both are powerful tools in the world of artificial intelligence, but they have different strengths 💪, especially when it comes to speed and memory usage. If you're deciding between them for your projects, here's a quick comparison based on their performance.

Comparing LLMs: Qwen vs. Llama in speed and memory

Large language models (LLMs) like Qwen and Llama are revolutionizing AI, but when it comes to deployment, speed and memory usage are key factors that influence performance.

Qwen models, optimized for efficiency, typically offer faster response times with lower memory footprints. For example, Qwen 2.5 can process queries in approximately 200 ms per token while consuming about 2 GB of VRAM during inference on a standard setup, making it well suited to applications that require speed and resource efficiency.

Llama 🦙 models, developed by Meta, provide state-of-the-art accuracy, but at the cost of higher resource requirements. Llama 2 models (e.g., Llama 2 13B) may take around 350 ms per token, with memory consumption reaching 8 GB of VRAM during inference. This makes Llama suitable for more complex tasks where precision is a higher priority than speed.

In summary 🫨:
Qwen offers faster inference (about 200 ms per token) and lower memory usage (about 2 GB of VRAM), making it more efficient for real-time applications.
Llama focuses on accuracy but consumes more resources (about 8 GB of VRAM) and is slower (about 350 ms per token), making it better suited to high-accuracy tasks where computational power is less of a concern.

Choosing between them depends on your application's priorities: Qwen for speed and efficiency, Llama for cutting-edge accuracy.
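If you want to reproduce this kind of comparison on your own hardware, a rough sketch like the one below measures average per-token latency and peak VRAM for any Hugging Face causal LM. The model IDs in the loop are illustrative placeholders (swap in the exact Qwen and Llama checkpoints you care about), and the numbers you get will vary with GPU, dtype, quantization, and batch size.

```python
# Rough sketch: compare per-token latency and peak VRAM across causal LMs on one GPU.
# The model IDs are illustrative placeholders; results depend heavily on your setup.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def profile_model(model_id: str, prompt: str, new_tokens: int = 64) -> None:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="cuda")
    inputs = tok(prompt, return_tensors="pt").to(model.device)

    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    # Force exactly `new_tokens` tokens so the timing is comparable across models.
    model.generate(**inputs, max_new_tokens=new_tokens, min_new_tokens=new_tokens, do_sample=False)
    elapsed = time.perf_counter() - start

    ms_per_token = 1000 * elapsed / new_tokens
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{model_id}: {ms_per_token:.0f} ms/token, peak VRAM {peak_gb:.1f} GB")


# Placeholder checkpoints -- substitute whichever Qwen and Llama variants you want to compare.
for mid in ["Qwen/Qwen2.5-7B-Instruct", "meta-llama/Llama-2-13b-chat-hf"]:
    profile_model(mid, "Summarize the benefits of small language models.")
```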
NotebookLM: an AI notebook for everyone
NotebookLM: How to try Google’s experimental AI-first notebook
google.smh.re