CUDA (Windows)
KeyDNN’s Windows CUDA backend loads a native DLL and relies on the CUDA runtime (and optionally cuDNN) being discoverable by the current process.
If the CUDA native libraries are unavailable, CUDA tests are skipped and CUDA execution paths raise an error, except where a fallback is explicitly documented.
Requirements
- A compatible NVIDIA GPU
- A working CUDA runtime installation
- (Optional) cuDNN, depending on which operations you use and how your native backend is configured
You can check availability from Python:
```python
from keydnn import cuda_available

print(cuda_available())
```
Environment variables
KeyDNN uses environment variables to help Windows locate CUDA runtime dependencies:
- `CUDA_PATH` (recommended): points to your CUDA install root, e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2`
- `CUDNN_PATH` (optional): points to your cuDNN root that contains `bin/`, `lib/`, `include/`, e.g. `C:\cudnn`
If you copied the cuDNN DLLs into the CUDA install (a common manual setup), you typically do not need `CUDNN_PATH`, as long as `cudnn*.dll` exists in `<CUDA_PATH>\bin`.
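The exact lookup logic lives in KeyDNN's native-loading code and is not spelled out here. As a rough illustration, the sketch below assumes a loader that derives `bin` directories from the two variables and registers them with `os.add_dll_directory`, which is how Python 3.8+ on Windows extends the search path for load-time DLL dependencies (`PATH` is no longer consulted for those). The helper names `candidate_dll_dirs` and `register_dll_dirs` are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_dll_dirs(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return existing bin directories derived from CUDA_PATH and CUDNN_PATH.

    Order (an assumption for this sketch): <CUDA_PATH>\\bin first, then
    <CUDNN_PATH>\\bin. Unset variables and missing directories are skipped.
    """
    env = os.environ if env is None else env
    dirs: List[Path] = []
    for var in ("CUDA_PATH", "CUDNN_PATH"):
        root = env.get(var)
        if not root:
            continue  # variable not set: nothing to add
        bin_dir = Path(root) / "bin"
        if bin_dir.is_dir() and bin_dir not in dirs:
            dirs.append(bin_dir)
    return dirs


def register_dll_dirs(dirs: List[Path]) -> list:
    """Add each directory to the Windows DLL search path (Python 3.8+).

    Returns the handles from os.add_dll_directory; calling close() on a handle
    removes that directory again. On non-Windows platforms os.add_dll_directory
    does not exist, so this does nothing and returns an empty list.
    """
    if not hasattr(os, "add_dll_directory"):
        return []
    return [os.add_dll_directory(str(d)) for d in dirs]
```

Registering the directories before the native library is loaded is what makes `cudart64_*.dll` and `cudnn*.dll` resolvable without editing `PATH`.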
PowerShell examples
```powershell
# For the current terminal session only:
$env:CUDA_PATH = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2"
$env:CUDNN_PATH = "C:\cudnn"  # optional (only needed if cuDNN is not in CUDA\bin)
```
Note: If you change environment variables, restart the Python process (and sometimes the terminal) before retrying.
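If you would rather not restart your main session, one option is to try new values in a child interpreter. The sketch below (the helper name `run_in_fresh_interpreter` is hypothetical) copies the current environment, overlays the values you want to test, and runs a snippet in a new Python process; your own process's `os.environ` is left untouched.

```python
import os
import subprocess
import sys
from typing import Mapping


def run_in_fresh_interpreter(code: str, extra_env: Mapping[str, str]) -> str:
    """Run `code` in a new Python process whose environment includes extra_env.

    Only the child sees the new values, just as a restarted process would.
    Returns the child's stdout with surrounding whitespace stripped.
    """
    env = dict(os.environ)  # keep SYSTEMROOT etc., which Windows child processes need
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the snippet fails
    )
    return result.stdout.strip()


# Example (requires KeyDNN to be installed):
# print(run_in_fresh_interpreter(
#     "from keydnn import cuda_available; print(cuda_available())",
#     {"CUDA_PATH": r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2"},
# ))
```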
Common issues
cuda_available() is False
- Verify your NVIDIA driver + CUDA toolkit installation.
- Confirm `CUDA_PATH` points to your CUDA install root.
- Ensure `<CUDA_PATH>\bin` contains the CUDA runtime DLLs (e.g., `cudart64_*.dll`).
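The last two checks can be scripted. This is a minimal sketch with a hypothetical function name, `diagnose_cuda_runtime`; it only inspects the filesystem, so it does not verify the NVIDIA driver or whether the DLLs actually load.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def diagnose_cuda_runtime(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a description of the first failed check, or None if all pass."""
    env = os.environ if env is None else env
    root = env.get("CUDA_PATH")
    if not root:
        return "CUDA_PATH is not set"
    if not Path(root).is_dir():
        return f"CUDA_PATH does not exist: {root}"
    bin_dir = Path(root) / "bin"
    if not bin_dir.is_dir():
        return f"missing bin directory: {bin_dir}"
    # The runtime DLL name carries the major version, e.g. cudart64_12.dll.
    if not any(bin_dir.glob("cudart64_*.dll")):
        return f"no cudart64_*.dll in {bin_dir}"
    return None


if __name__ == "__main__":
    problem = diagnose_cuda_runtime()
    print(problem or "CUDA_PATH layout looks OK")
```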
cuDNN not found
- If using cuDNN-accelerated ops, ensure `cudnn*.dll` is discoverable:
  - either present in `<CUDA_PATH>\bin`, or
  - reachable via `CUDNN_PATH` → `<CUDNN_PATH>\bin`.
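A similar check for cuDNN can be sketched as follows. It assumes the search order `<CUDA_PATH>\bin` first, then `<CUDNN_PATH>\bin`; the order KeyDNN actually uses may differ, and `find_cudnn_dlls` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def find_cudnn_dlls(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return cudnn*.dll files from the first location that has any.

    Checks <CUDA_PATH>\\bin, then <CUDNN_PATH>\\bin. An empty list means a
    loader that only searches these two directories would not find cuDNN.
    """
    env = os.environ if env is None else env
    for var in ("CUDA_PATH", "CUDNN_PATH"):
        root = env.get(var)
        if not root:
            continue
        bin_dir = Path(root) / "bin"
        if not bin_dir.is_dir():
            continue
        hits = sorted(bin_dir.glob("cudnn*.dll"))
        if hits:
            return hits
    return []
```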
Import-time CUDA failures
If importing `keydnn` triggers CUDA loading failures on machines without CUDA, prefer:
- lazy CUDA initialization in backend code paths (only load CUDA DLLs when CUDA is requested), and
- using `cuda_available()` before selecting CUDA devices.
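Lazy initialization can be sketched with a small wrapper that defers `ctypes.CDLL` until the first CUDA request, so importing the defining module never touches the DLL. `LazyNativeLibrary` and the DLL name `keydnn_cuda.dll` are hypothetical; this is not KeyDNN's actual backend code.

```python
import ctypes
import threading
from typing import Callable, Optional


class LazyNativeLibrary:
    """Load a native library on first use instead of at import time.

    `loader` runs on the first get() and its result is cached. If it raises,
    the error propagates and the next call tries again.
    """

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._lib: Optional[object] = None
        self._lock = threading.Lock()

    def get(self) -> object:
        if self._lib is None:
            with self._lock:
                if self._lib is None:  # re-check after acquiring the lock
                    self._lib = self._loader()
        return self._lib

    def available(self) -> bool:
        """Return True if the library loads, False if loading raises OSError."""
        try:
            self.get()
            return True
        except OSError:  # ctypes raises OSError when the DLL or a dependency is missing
            return False


# Hypothetical DLL name: nothing is loaded until .get() or .available() runs.
cuda_backend = LazyNativeLibrary(lambda: ctypes.CDLL("keydnn_cuda.dll"))
```

A check such as `cuda_available()` can then be built on `available()`, keeping CPU-only machines free of import-time failures.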
Recommended usage pattern
```python
from keydnn import Device, cuda_available

device = Device("cuda:0") if cuda_available() else Device("cpu")
```
Use this pattern to keep scripts runnable on both CPU-only and CUDA machines.
Notes on performance and correctness
- CUDA kernel launches are asynchronous with respect to the host by default, so wall-clock timings are only meaningful after an explicit device synchronization.
- Some CUDA features may use correctness-first fallbacks in specific paths until full kernel coverage is implemented.
- For evaluation/inference, prefer mini-batching to avoid large-tensor limits in certain kernels.
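The timing and mini-batching notes can be illustrated with plain-Python helpers. KeyDNN's device-synchronization call is not named in this section, so `timed` takes a hypothetical `synchronize` callable; `batched_apply` assumes your inputs support slicing and that `fn` returns a list per batch. Both helper names are illustrative, not part of KeyDNN's API.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def timed(fn: Callable[[], R], synchronize: Callable[[], None]) -> Tuple[R, float]:
    """Time fn() on an asynchronous device; returns (result, seconds).

    synchronize() must block until all queued device work has finished.
    Syncing before starting the clock excludes previously queued work;
    syncing before stopping it ensures the launched kernels actually ran.
    """
    synchronize()
    start = time.perf_counter()
    result = fn()
    synchronize()
    return result, time.perf_counter() - start


def batched_apply(fn: Callable[[Sequence[T]], List[R]], items: Sequence[T], batch_size: int) -> List[R]:
    """Apply fn to consecutive slices of at most batch_size items and concatenate."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    out: List[R] = []
    for start in range(0, len(items), batch_size):
        out.extend(fn(items[start:start + batch_size]))
    return out
```

For inference, `fn` would run the forward pass on one batch; pick `batch_size` small enough to stay below whatever tensor-size limit a kernel imposes.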