Eduardo Muñoz’s Post

Software Architecture | Data Engineer | Project Management Lead | Machine Learning | NLP | ITSM Manager

💥 New Qwen2 series

The developers behind the Qwen series have unveiled the next-generation Qwen2 models, a significant step up from Qwen1.5. The new models bring greater size diversity, multilingual proficiency, extended context handling, and state-of-the-art performance across various benchmarks.

👏 Diverse model sizes: The Qwen2 series comes in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, each available as both a base and an instruction-tuned version.

🌏 Multilingual training: Qwen2 models were trained on data spanning 27 additional languages besides English and Chinese. This broad linguistic coverage improves generalization and understanding across languages, setting a new standard for multilingual LLM performance.

📥 Enhanced context length: Qwen2-7B-Instruct and Qwen2-72B-Instruct now support context lengths of up to 128K tokens, while all base models were pretrained on data with a 32K-token context length. Handling such extensive contexts lets these models work with lengthy documents and conversations far more effectively.

🧐 Grouped Query Attention (GQA): GQA is applied across all Qwen2 model sizes, speeding up inference and reducing memory usage for both small and large models.

🎢 Improved coding and mathematics: The Qwen2 series shows significantly better performance on coding and mathematical tasks, reflecting advances in model architecture and training.

📢 Open-source availability: The Qwen2 models have been open-sourced on Hugging Face and ModelScope (a minimal loading sketch follows below).

🛠 Instruction-tuned models: Their ability to manage long contexts is assessed with tasks like "Needle in a Haystack," showing that the instruction-tuned models can handle context lengths up to 128K tokens, especially when augmented with YaRN.

📚 Dataset augmentation: Extensive work went into expanding and improving the pretraining and instruction-tuning datasets, increasing both the volume and quality of data across languages and strengthening the models' competencies beyond the default English and Chinese.

Link to the model repo in the comments.

#llm #nlp #machinelearning https://lnkd.in/disrdyrF
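Since the weights are published on Hugging Face, a minimal way to try an instruction-tuned checkpoint is through the transformers library. The sketch below is an illustration, not part of the official announcement: it assumes the repo id Qwen/Qwen2-7B-Instruct and a recent transformers release with Qwen2 support; adjust the model id, dtype, and device map for your hardware.

```python
# Minimal sketch: chat with a Qwen2 instruct checkpoint via Hugging Face transformers.
# Assumptions: repo id "Qwen/Qwen2-7B-Instruct" and a transformers version with Qwen2 support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # swap for another size (0.5B/1.5B/72B/57B-A14B) as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically when available
    device_map="auto",    # spread layers across available GPUs/CPU
)

# Build a chat prompt with the model's own chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what Grouped Query Attention does."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short reply and decode only the newly generated tokens.
output_ids = model.generate(inputs, max_new_tokens=256)
reply = tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)
```

Note that the 128K-token figure applies to the instruct models; per the announcement, going beyond the 32K pretraining context reportedly relies on YaRN-style rope scaling in the serving stack, which this plain sketch does not enable.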

GitHub - QwenLM/Qwen2: Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.

