Models / Libraries / Frameworks

Jul 22, 2025

Kimi-K2-Instruct Now Available as NVIDIA NIM

Try the new 1T-parameter open-source MoE LLM today.

1 MIN READ

Jul 17, 2025

Feature Engineering at Scale: Optimizing ML Models in Semiconductor Manufacturing with NVIDIA CUDA‑X Data Science

In our previous post, we introduced the setup of predictive modeling in chip manufacturing and operations, highlighting common challenges such as imbalanced...

6 MIN READ

Jul 16, 2025

CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design

GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...

12 MIN READ

Jul 16, 2025

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels

In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models...

12 MIN READ

Jul 14, 2025

Enabling Fast Inference and Resilient Training with NCCL 2.27

As AI workloads scale, fast and reliable GPU communication becomes vital, not just for training, but increasingly for inference at scale. The NVIDIA Collective...

9 MIN READ

Jul 14, 2025

Enhancing Multilingual Human-Like Speech and Voice Cloning with NVIDIA Riva TTS

While speech AI is used to build digital assistants and voice agents, its impact extends far beyond these applications. Core technologies like text-to-speech...

10 MIN READ

Jul 11, 2025

Forecasting the Weather Beyond Two Weeks Using NVIDIA Earth-2

Being able to predict extreme weather events is essential as such conditions become more common and destructive. Subseasonal climate forecasting—predicting...

9 MIN READ

Jul 09, 2025

Reinforcement Learning with NVIDIA NeMo-RL: Reproducing a DeepScaleR Recipe Using GRPO

Reinforcement learning (RL) is the backbone of interactive AI. It is fundamental for teaching agents to reason and learn from human preferences, enabling...

5 MIN READ

Jul 09, 2025

Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python

C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code...

5 MIN READ

Jul 07, 2025

Think Smart and Ask an Encyclopedia-Sized Question: Multi-Million Token Real-Time Inference for 32X More Users

Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...

8 MIN READ

Jul 07, 2025

NVIDIA cuQuantum Adds Dynamics Gradients, DMRG, and Simulation Speedup

NVIDIA cuQuantum is an SDK of optimized libraries and tools that accelerate quantum computing emulations at both the circuit and device level by orders of...

5 MIN READ

Jul 02, 2025

Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX

As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...

11 MIN READ

Jul 01, 2025

Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the...

10 MIN READ

Jun 26, 2025

Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX

As of today, NVIDIA supports the general availability of Gemma 3n on NVIDIA RTX and Jetson. Gemma, previewed by Google DeepMind at Google I/O last month,...

4 MIN READ

Jun 25, 2025

Join Us at We Are Developers World Congress 2025

Join us at We Are Developers World Congress from July 9 to 11 to attend our workshops and connect with experts.

1 MIN READ

Jun 18, 2025

Compiler Explorer: An Essential Kernel Playground for CUDA Developers

Have you ever wondered exactly what the CUDA compiler generates when you write GPU kernels? Ever wanted to share a minimal CUDA example with a colleague...

7 MIN READ
