nvidia cuda free download

Showing 94 open source projects for "nvidia cuda"

1

CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library

CV-CUDA is an open-source project that enables building efficient cloud-scale Artificial Intelligence (AI) imaging and computer vision (CV) applications. It uses graphics processing unit (GPU) acceleration to help developers build highly efficient pre- and post-processing pipelines. CV-CUDA originated as a collaborative effort between NVIDIA and ByteDance.

Downloads: 5 This Week

Last Update: 2025-11-15
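CV-CUDA's own operators are not reproduced here. As a rough CPU reference for what one pre-processing stage in such a pipeline computes, the sketch below does a nearest-neighbor resize, per-channel normalization, and an HWC to CHW layout change on a nested-list image. The function name `preprocess`, the list-based image layout, and the mean/std values are illustrative assumptions, not CV-CUDA's API. On a GPU, each output pixel would typically be computed by its own thread.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C as nested lists (assumed layout)


def preprocess(img: Image, out_h: int, out_w: int,
               mean: List[float], std: List[float]) -> List[List[List[float]]]:
    """Hypothetical CPU reference: nearest resize, normalize, HWC -> CHW."""
    in_h, in_w, channels = len(img), len(img[0]), len(img[0][0])
    # Output is C x H x W, the layout many inference engines expect.
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for y in range(out_h):
        src_y = y * in_h // out_h  # nearest-neighbor source row
        for x in range(out_w):
            src_x = x * in_w // out_w  # nearest-neighbor source column
            for c in range(channels):
                out[c][y][x] = (img[src_y][src_x][c] - mean[c]) / std[c]
    return out


if __name__ == "__main__":
    # 2x2 RGB image upscaled to 4x4, mapping [0, 255] to [-1, 1].
    img = [[[255, 0, 0], [0, 255, 0]],
           [[0, 0, 255], [255, 255, 255]]]
    chw = preprocess(img, 4, 4, mean=[127.5] * 3, std=[127.5] * 3)
    print(len(chw), len(chw[0]), len(chw[0][0]))
```

The same per-pixel arithmetic is what makes this kind of stage a natural fit for GPU offload: every output element is independent of the others.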
2

CUDA.jl

CUDA programming in Julia

High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher. If you are using an older version of Julia, you need to use a previous version of CUDA.jl. This will...

Downloads: 1 This Week

Last Update: 2026-01-03
3

CUDA API Wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

CUDA API Wrappers is a C++ library providing high-level, modern wrappers for NVIDIA’s CUDA runtime and driver APIs, enhancing usability and efficiency. It is intended for those who would otherwise use these APIs directly, making working with them more intuitive and consistent through modern C++ language capabilities, programming idioms, and best practices. In a nutshell: making work with the CUDA APIs more fun.

Downloads: 1 This Week

Last Update: 5 days ago
4

NVIDIA GPU Operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

...These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.

Downloads: 1 This Week

Last Update: 2025-12-03
5

CuPy

A NumPy-compatible array library accelerated by CUDA

CuPy is an open-source implementation of NumPy-compatible multi-dimensional arrays accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class, and many functions that operate on it. CuPy offers GPU-accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can speed up some operations by more than 100x. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. ...

Downloads: 23 This Week

Last Update: 2025-08-18
6

KeyKiller-Cuda

Solving the Satoshi Puzzle

KeyKiller-Cuda is a GPU-accelerated version of the KeyKiller project, designed to achieve extreme performance in solving Satoshi Nakamoto's puzzles on modern NVIDIA GPUs. It pushes the limits of cryptographic key search performance by leveraging CUDA, thread-beam parallelism, and batch EC operations. The command-line version is open-source and free to use. For the paid advanced graphics version, please visit: https://gitlab.com/8891689/KeyKiller-Cuda/

Downloads: 26 This Week

Last Update: 2025-12-13
7

Tiny CUDA Neural Networks

Lightning fast C++/CUDA neural network framework

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...

Downloads: 0 This Week

Last Update: 2025-07-08
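The multiresolution hash encoding indexes each grid level with a spatial hash. In the Instant-NGP paper, h(x) = (x_1·π_1 XOR ... XOR x_d·π_d) mod T, with π_1 = 1, π_2 = 2654435761, π_3 = 805459861, and the per-level grid resolution grows geometrically between N_min and N_max. The Python below is a minimal sketch of those two formulas under that reading of the paper; it is not tiny-cuda-nn's API, and the table size and resolution range in the usage lines are arbitrary choices.

```python
import math
from typing import List, Sequence

# Per-dimension primes from the Instant-NGP paper (pi_1 = 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(coords: Sequence[int], table_size: int) -> int:
    """XOR of coordinate * prime products (wrapped to 32 bits), mod T."""
    h = 0
    for x, p in zip(coords, PRIMES):
        h ^= (x * p) & 0xFFFFFFFF  # emulate uint32 overflow as on the GPU
    return h % table_size


def level_resolutions(n_min: int, n_max: int, levels: int) -> List[int]:
    """N_l = floor(N_min * b**l), with b chosen so level L-1 reaches ~N_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1))
    return [math.floor(n_min * b ** l) for l in range(levels)]


if __name__ == "__main__":
    print(level_resolutions(16, 2048, 16))
    # Hash a 2-D grid vertex, as in the (x, y) -> (R, G, B) image example.
    print(spatial_hash((3, 5), 2 ** 14))
```

Because floating-point rounding can leave the finest level at N_max - 1, code that needs the exact endpoint should clamp it explicitly.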
8

Nvitop

An interactive NVIDIA-GPU process viewer and beyond

nvitop is an interactive NVIDIA device and process monitoring tool. It has a colorful and informative interface that continuously updates the status of the devices and processes. As a resource monitor, it includes many features and options, such as tree-view, environment variable viewing, process filtering, process metrics monitoring, etc. Beyond that, the package also ships a CUDA device selection tool nvisel for deep learning researchers.

Downloads: 4 This Week

Last Update: 2026-01-27
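nvisel's actual command-line options are not covered here. The sketch below shows the general idea behind such a device picker under a hypothetical policy: keep GPUs with enough free memory, prefer the least busy ones, and export the chosen indices as `CUDA_VISIBLE_DEVICES`. The `DeviceStat` records and thresholds are made-up inputs; a real tool would read them from the driver (for example through NVML). `CUDA_VISIBLE_DEVICES` only takes effect if it is set before the process initializes CUDA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceStat:
    index: int
    free_mib: int     # free device memory in MiB
    utilization: int  # GPU utilization in percent


def select_devices(stats: List[DeviceStat], count: int,
                   min_free_mib: int) -> str:
    """Hypothetical policy: enough free memory, least busy first,
    formatted as a CUDA_VISIBLE_DEVICES value."""
    eligible = [d for d in stats if d.free_mib >= min_free_mib]
    eligible.sort(key=lambda d: (d.utilization, -d.free_mib))
    chosen = sorted(d.index for d in eligible[:count])
    return ",".join(str(i) for i in chosen)


if __name__ == "__main__":
    import os

    stats = [DeviceStat(0, 2_000, 90), DeviceStat(1, 20_000, 5),
             DeviceStat(2, 16_000, 0), DeviceStat(3, 30_000, 50)]
    # Must happen before any CUDA library is initialized in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = select_devices(stats, 2, 8_000)
    print(os.environ["CUDA_VISIBLE_DEVICES"])
```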
9

TensorRT

C++ library for high performance inference on NVIDIA GPUs

...TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and enables you to optimize inference leveraging libraries, development tools, and technologies in CUDA-X™ for artificial intelligence, autonomous machines, high-performance computing, and graphics. With new NVIDIA Ampere Architecture GPUs, TensorRT also leverages sparse tensor cores providing an additional performance boost.

Downloads: 30 This Week

Last Update: 2026-02-03
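The sparse tensor cores in Ampere GPUs accelerate 2:4 fine-grained structured sparsity: in every aligned group of four consecutive weights, at most two are non-zero. The magnitude-based rule below (drop the two smallest |w| per group) is one simple way to produce that pattern and to check it. It is a sketch, not TensorRT's API, and in practice a pruned network is usually fine-tuned afterwards to recover accuracy.

```python
from typing import List


def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    out = list(row)
    for g in range(0, len(row), 4):
        # Indices of this group, smallest magnitude first; drop two of them.
        for i in sorted(range(g, g + 4), key=lambda j: abs(row[j]))[:2]:
            out[i] = 0.0
    return out


def is_2_4_sparse(row: List[float]) -> bool:
    """True if every aligned group of four has at most two non-zeros."""
    return all(sum(1 for w in row[g:g + 4] if w != 0) <= 2
               for g in range(0, len(row), 4))


if __name__ == "__main__":
    row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
    pruned = prune_2_4(row)
    print(pruned, is_2_4_sparse(pruned))
```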
10

XMRig

RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner

XMRig is a high-performance, open-source, cross-platform RandomX, KawPow, CryptoNight, and AstroBWT unified CPU/GPU miner, RandomX benchmark, and stratum proxy. Official binaries are available for Windows, Linux, macOS, and FreeBSD. The preferred way to configure the miner is the JSON config file, as it is more flexible and human-friendly. The command-line interface...

1 Review

Downloads: 107 This Week

Last Update: 2025-12-23
11

Torch-TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA’s TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch’s Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate...

Downloads: 8 This Week

Last Update: 2025-10-17
12

OuteTTS

Interface for OuteTTS models

...The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, vLLM, and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.

Downloads: 1 This Week

Last Update: 2025-11-28
13

Jan.ai

Open source alternative to ChatGPT that runs 100% offline

Jan.ai is an open-source, privacy-focused AI assistant that serves as an alternative to ChatGPT, running completely locally on your device. It allows you to download and run LLMs (large language models) offline while also offering optional integration with cloud-based model providers, giving you full control over your data and AI interactions. Download and run LLMs (Llama, Gemma, Qwen, GPT-oss, etc.) from Hugging Face. Connect to GPT models via OpenAI, Claude models via Anthropic, Mistral,...

Downloads: 37 This Week

Last Update: 3 days ago
14

cuDF

GPU DataFrame Library

...It relies on NVIDIA® CUDA® primitives for low-level compute optimization while exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Downloads: 0 This Week

Last Update: 2026-02-05
15

Simple StyleGan2 for Pytorch

Simplest working implementation of Stylegan2

Simple Pytorch implementation of Stylegan2 that can be completely trained from the command-line, no coding needed. You will need a machine with a GPU and CUDA installed. You can also specify the location where intermediate results and model checkpoints should be stored. You can increase the network capacity (which defaults to 16) to improve generation results, at the cost of more memory. By default, if the training gets cut off, it will automatically resume from the last checkpointed file....

Downloads: 4 This Week

Last Update: 2025-01-12
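Resuming "from the last checkpointed file" comes down to finding the newest checkpoint on disk. The sketch below assumes a hypothetical naming scheme, `model_<step>.pt`, and compares step numbers numerically so that `model_10.pt` wins over `model_9.pt` (a plain lexical sort would pick the wrong file). It is not the project's code; loading the returned file is left to whatever framework wrote it.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: model_<step>.pt inside the checkpoint directory.
CKPT_PATTERN = re.compile(r"^model_(\d+)\.pt$")


def latest_checkpoint(ckpt_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in ckpt_dir.glob("model_*.pt"):
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

A training loop would call `latest_checkpoint` once at startup and start from scratch only when it returns `None`.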
16

InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models

...InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products. This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.

1 Review

Downloads: 27 This Week

Last Update: 2026-02-06
17

ParallelStencil.jl

Package for writing high-level code for parallel stencil computations

ParallelStencil empowers domain scientists to write architecture-agnostic high-level code for parallel high-performance stencil computations on GPUs and CPUs. Performance similar to CUDA C / HIP can be achieved, which is typically a large improvement over the performance reached when using only CUDA.jl or AMDGPU.jl GPU Array programming. For example, a 2-D shallow ice solver presented at JuliaCon 2020 [1] achieved nearly 20 times the performance of a corresponding GPU Array programming implementation; in absolute terms, it reached 70% of the theoretical upper performance bound of the Nvidia P100 GPU used, as defined by the effective throughput metric, T_eff. ...

Downloads: 0 This Week

Last Update: 2026-01-27
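The effective throughput metric divides the memory that one stencil iteration must move by the time per iteration, T_eff = A_eff / t_it, where A_eff counts every field that is both read and written twice and every read-only field once, per grid cell. Dividing T_eff by the GPU's peak memory throughput gives a percentage like the 70% quoted above. The grid size, field counts, timing, and peak value in this sketch are hypothetical.

```python
def effective_throughput(nx: int, ny: int, n_rw: int, n_ro: int,
                         bytes_per_elem: int, t_it: float) -> float:
    """T_eff = A_eff / t_it in GB/s for a 2-D grid.

    A_eff counts each read-and-written field twice and each read-only
    field once, for every grid cell.
    """
    a_eff = (2 * n_rw + n_ro) * nx * ny * bytes_per_elem  # bytes per iteration
    return a_eff / t_it / 1e9


if __name__ == "__main__":
    # Hypothetical run: 4096^2 grid, one updated field, one read-only field,
    # double precision, 1.2 ms per iteration.
    t_eff = effective_throughput(4096, 4096, 1, 1, 8, 1.2e-3)
    peak = 500.0  # hypothetical peak memory throughput in GB/s
    print(f"T_eff = {t_eff:.1f} GB/s ({100 * t_eff / peak:.0f}% of peak)")
```

Because A_eff ignores redundant or cached reads, T_eff measures useful memory traffic only, which is why it is a meaningful upper-bound comparison for memory-bound stencils.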
18

GoCV

Go package for computer vision using OpenCV 4 and beyond

GoCV gives programmers who use the Go programming language access to the OpenCV 4 computer vision library. The GoCV package supports the latest releases of Go and OpenCV v4.5.4 on Linux, macOS, and Windows. Our mission is to make the Go language a “first-class” client compatible with the latest developments in the OpenCV ecosystem. Computer Vision (CV) is the ability of computers to process visual information and perform tasks normally performed by humans. CV software...

Downloads: 0 This Week

Last Update: 2026-01-05
19

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...

Downloads: 1 This Week

Last Update: 5 hours ago
20

OpenFace Face Recognition

Face recognition with deep neural networks

OpenFace is a Python and Torch implementation of face recognition with deep neural networks and is based on the CVPR 2015 paper FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google. Torch allows the network to be executed on a CPU or with CUDA. This research was supported by the National Science Foundation (NSF) under grant number CNS-1518865. Additional support was provided by the Intel Corporation, Google, Vodafone, NVIDIA, and the Conklin Kistler family fund. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and should not be attributed to their employers or funding sources. ...

Downloads: 0 This Week

Last Update: 2024-07-20
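FaceNet-style models map each face to an embedding on the unit hypersphere, so verification reduces to comparing the squared L2 distance between two normalized embeddings with a threshold, and training uses a triplet loss that pushes an anchor closer to a positive (same identity) than to a negative by a margin. The sketch below shows both computations in plain Python; the threshold and margin defaults are assumptions to be tuned, and the functions are not OpenFace's API.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_person(a: Sequence[float], b: Sequence[float],
                threshold: float = 0.99) -> bool:
    """Compare unit-length embeddings; the threshold is a hypothetical
    value that would normally be tuned on a validation set."""
    return squared_distance(l2_normalize(a), l2_normalize(b)) < threshold


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(||a - p||^2 - ||a - n||^2 + margin, 0) on normalized embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)
```

On unit vectors the squared distance ranges from 0 (identical direction) to 4 (opposite), so the threshold is chosen within that interval.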
21

Dogecoin Mining - Software

This is a multi-threaded multi-pool FPGA and ASIC miner for DOGECOIN

Dogecoin Mining - Software is an open source miner for ASIC, GPU and FPGA. It works on Windows, Linux and macOS. This miner is extremely flexible in terms of platform and can work with a variety of hardware miners and GPUs including AMD, CUDA and NVIDIA platforms.

3 Reviews

Downloads: 56 This Week

Last Update: 5 days ago
22

bitResurrector

Bitcoin private key recovery tool with Bloom Filter & CUDA

...Technological Stack:
- Zero-Latency Bloom Filter: Real-time matching against 58M+ active addresses (Loyce Club data).
- Turbo Core: C++/AVX-512 optimization with processor affinity.
- GPU Mode: Massive parallel computation via NVIDIA CUDA.
- Memory Management: Zero disk I/O lag using mmap (Memory-Mapped Files).
BitResurrector is a closed-source, proprietary tool for researchers and professionals.

Downloads: 12 This Week

Last Update: 2026-02-02
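A Bloom filter answers "is this item in the set?" with no false negatives and a tunable false-positive rate, which is why it suits fast membership checks against a large, fixed list. The tool itself is closed-source, so the sketch below is a generic Bloom filter, not its implementation: it derives k bit positions per item from SHA-256 by double hashing and sizes the bit array with the textbook formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter; k bit positions per item via double hashing."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Textbook sizing for n items at false-positive rate p.
        self.m = math.ceil(-expected_items * math.log(fp_rate)
                           / math.log(2) ** 2)
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force a non-zero step
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Every positive answer still has to be confirmed against the real set, since only negatives are guaranteed.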
23

VanitySearch

VanitySearch is a Bitcoin address prefix lookup tool.

VanitySearch is a Bitcoin address prefix lookup tool. If you want to generate a secure private key, use the `-s` option to enter your passphrase, which will be used to generate the base key conforming to the BIP38 standard (e.g., `VanitySearch.exe -s "my passphrase" 1MyPrefix`). You can also use `VanitySearch.exe -ps "my passphrase"`, which adds a cryptographically secure seed to your passphrase. Fixed custom address matching errors and private key conversion errors, changed the randomizer,...

Downloads: 40 This Week

Last Update: 2025-12-03
24

HanoiVM

HanoiVM is a recursive, AI-augmented ternary virtual machine

HanoiVM is a recursive, AI-augmented ternary virtual machine built on a symbolic base-81 architecture. It is the execution core of the Axion + T81Lang ecosystem, enabling stack-tier promotion, symbolic AI opcodes, and entropy-aware transformations across three levels of logic:
- `T81`: 81-bit operand logic (register-like)
- `T243`: Symbolic BigInt + FSM state logic
- `T729`: Tensor-based AI macros with semantic...

Downloads: 0 This Week

Last Update: 2025-05-31
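Since 81 = 3^4, one base-81 digit holds exactly four ternary digits (trits). The sketch below shows that packing for non-negative integers using ordinary (unbalanced) base 3. How HanoiVM actually encodes its T81 operands is not described here, so the digit convention and both function names are assumptions for illustration.

```python
from typing import List


def to_trits(n: int) -> List[int]:
    """Little-endian base-3 digits of a non-negative integer."""
    if n < 0:
        raise ValueError("non-negative integers only")
    trits = []
    while True:
        n, r = divmod(n, 3)
        trits.append(r)
        if n == 0:
            return trits


def trits_to_base81(trits: List[int]) -> List[int]:
    """Pack groups of four trits (3**4 == 81) into little-endian base-81 digits."""
    padded = trits + [0] * (-len(trits) % 4)  # pad to a multiple of four
    return [sum(t * 3 ** i for i, t in enumerate(padded[g:g + 4]))
            for g in range(0, len(padded), 4)]


if __name__ == "__main__":
    print(to_trits(100), trits_to_base81(to_trits(100)))
```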
25

GSvit

Fast FDTD solver with graphics card support

Fast FDTD solver with graphics card support, optimized for nanoscale optics: scanning near-field optical microscopy, rough surface scattering, and solar cells. It uses CUDA to run computations on the graphics card.

1 Review

Downloads: 12 This Week

Last Update: 2023-11-18
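An FDTD solver advances the electric and magnetic fields in a leapfrog on a staggered (Yee) grid, updating each one from the spatial differences of the other. The sketch below is a minimal one-dimensional, free-space version in normalized units with a Courant number of 0.5 and a soft Gaussian source. It only illustrates the update scheme and is not GSvit's 3-D GPU implementation; the grid size, source position, and pulse parameters are arbitrary. The edge cells are never updated, so they act as reflecting walls.

```python
import math
from typing import List, Tuple


def fdtd_1d(cells: int = 200, steps: int = 100, source: int = 100,
            t0: float = 40.0, spread: float = 12.0
            ) -> Tuple[List[float], List[float]]:
    """1-D FDTD (Yee scheme) in free space, normalized units.

    E and H sit on staggered half-cells; the 0.5 factor corresponds to a
    Courant number of 0.5, which keeps the explicit update stable in 1-D.
    """
    ex = [0.0] * cells
    hy = [0.0] * cells
    for t in range(steps):
        # Update E from the spatial difference of H (ex[0] stays 0).
        for k in range(1, cells):
            ex[k] += 0.5 * (hy[k - 1] - hy[k])
        # Soft source: add a Gaussian pulse on top of the propagating field.
        ex[source] += math.exp(-0.5 * ((t0 - t) / spread) ** 2)
        # Update H from the spatial difference of E (hy[-1] stays 0).
        for k in range(cells - 1):
            hy[k] += 0.5 * (ex[k] - ex[k + 1])
    return ex, hy
```

On a GPU, each loop over `k` becomes one kernel launch with a thread per cell, since every cell's update within a half-step is independent.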