Quickstart
This page provides minimal, tested examples to get started with KeyDNN.
KeyDNN’s public APIs are available from the top-level keydnn package.
Minimal Tensor + autograd
from keydnn import Tensor, Device
x = Tensor(shape=(2, 3), device=Device("cpu"), requires_grad=True)
y = (x * 2.0).sum()
y.backward()
print(x.grad.to_numpy())
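As a sanity check on what the printed gradient should look like: since y is the sum of 2 * x over all elements, the derivative of y with respect to each element is 2, whatever values x holds. The sketch below confirms this with a central finite difference in plain Python. It does not use KeyDNN, and the helper names (`f`, `numerical_grad`) are illustrative only.

```python
def f(values):
    """Same scalar function as the example above: y = sum(2 * x)."""
    return sum(2.0 * v for v in values)


def numerical_grad(fn, values, eps=1e-6):
    """Central finite-difference estimate of d fn / d values[i] for each i."""
    grads = []
    for i in range(len(values)):
        plus = list(values)
        minus = list(values)
        plus[i] += eps
        minus[i] -= eps
        grads.append((fn(plus) - fn(minus)) / (2.0 * eps))
    return grads


# Six arbitrary values standing in for a flattened (2, 3) tensor.
x_flat = [0.1, -0.4, 2.0, 3.5, 0.0, -1.2]
grad = numerical_grad(f, x_flat)
print([round(g, 6) for g in grad])
```

Every entry of the estimate is 2 up to floating-point error, so `x.grad` in the KeyDNN example is expected to be a (2, 3) array filled with 2.0.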
CUDA example (device-resident ops)
This example runs on CUDA when the CUDA backend is available and falls back to the CPU otherwise.
from keydnn import Tensor, Device, cuda_available
device = Device("cuda:0") if cuda_available() else Device("cpu")
x = Tensor.rand((1024, 1024), device=device, requires_grad=True)
y = (x @ x.T).mean()
y.backward()
print("device:", str(device))
print("y:", y.item())
Reproducibility (seed + determinism)
KeyDNN provides two separate knobs for reproducibility:
- Random seeding controls random number generation (Python/NumPy).
- Determinism policy controls nondeterminism from CPU threading (BLAS/OpenMP scheduling).
Recommended order
from keydnn import seed, set_deterministic
seed(42)
set_deterministic(True) # defaults to cpu_threads=1
# build / initialize model after reproducibility is configured
Note: Thread-related environment variables may need to be set before importing NumPy (or any BLAS-backed library) to take full effect.
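The note above can be sketched as follows. This is a minimal illustration, assuming the common thread-count variables read by OpenMP, OpenBLAS, and MKL. Whether `set_deterministic` sets any of these itself is not assumed here, and `pin_blas_threads` is a hypothetical helper, not part of KeyDNN.

```python
import os

# Thread-count variables commonly read once, when the BLAS/OpenMP runtime loads.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_blas_threads(n=1):
    """Hypothetical helper: set the thread-count variables for this process."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Call this at the very top of the entry script, before NumPy is imported.
pinned = pin_blas_threads(1)
print(pinned)

# Only afterwards import the BLAS-backed libraries, for example:
#   import numpy as np
#   from keydnn import seed, set_deterministic
```

Setting the variables in the shell before launching Python achieves the same thing and avoids any dependence on import order.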
CLI demo (MNIST & CIFAR smoke tests)
KeyDNN includes small runnable training examples wired through the package CLI. MNIST smoke test:
# CPU (always available)
python -m keydnn test --train_mnist_example --device cpu --epochs 4 --limit-train 50000 --limit-test 1000
# CUDA (if CUDA backend + native libraries are available)
python -m keydnn test --train_mnist_example --device cuda:0 --epochs 4 --limit-train 50000 --limit-test 1000
CIFAR-10 CNN smoke test:
python -m keydnn test --train_cifar_example --device cuda:0 --epochs 4 --limit-train 50000 --limit-test 1000
Training example (Model.fit + callbacks)
This example trains a small XOR network using Sequential, an optimizer, and callbacks.
import numpy as np
from keydnn import (
    EarlyStopping,
    ModelCheckpoint,
    cuda_available,
    Tensor,
    Device,
    numpy_to_tensor,
    Sigmoid,
    Sequential,
    Linear,
)
def _xor_data_numpy():
    x_np = np.array(
        [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
        dtype=np.float32,
    )
    y_np = np.array([[0.0], [1.0], [1.0], [0.0]], dtype=np.float32)
    return x_np, y_np

def _accuracy_from_pred_np(y_true_np: np.ndarray, pred_np: np.ndarray) -> float:
    y_hat = (pred_np >= 0.5).astype(np.float32)
    return float((y_hat == y_true_np).mean())
if __name__ == "__main__":
    # --------------------------------------------------------------
    # Device
    # --------------------------------------------------------------
    device = Device("cuda:0") if cuda_available() else Device("cpu")

    # --------------------------------------------------------------
    # XOR dataset (repeat to form a small batchable dataset)
    # --------------------------------------------------------------
    x_base, y_base = _xor_data_numpy()
    repeats = 256
    x_np = np.repeat(x_base, repeats=repeats, axis=0)
    y_np = np.repeat(y_base, repeats=repeats, axis=0)
    x = numpy_to_tensor(x_np, device=device)
    y = numpy_to_tensor(y_np, device=device)

    # --------------------------------------------------------------
    # Model
    # --------------------------------------------------------------
    hidden_dim = 8
    model = Sequential(
        Linear(2, hidden_dim),
        Sigmoid(),
        Linear(hidden_dim, 1),
        Sigmoid(),
    )
    model.to_(device)
    model.build((1, 2), device=device)

    # --------------------------------------------------------------
    # Callbacks
    # --------------------------------------------------------------
    callbacks = [
        EarlyStopping(
            monitor="acc",
            mode="max",
            patience=5,
            min_delta=1e-4,
            restore_best_weights=True,
        ),
        ModelCheckpoint(
            filepath="xor_epoch{epoch:03d}_loss{loss:.6f}.json",
            monitor="acc",
            mode="max",
            save_best_only=True,
            verbose=1,
        ),
    ]

    # --------------------------------------------------------------
    # Training
    # --------------------------------------------------------------
    history = model.fit(
        x,
        y,
        loss="mse",
        optimizer="sgd",
        optimizer_kwargs={"lr": 1.0},
        metrics=["acc"],
        batch_size=32,
        epochs=2000,
        shuffle=True,
        callbacks=callbacks,
        verbose=1,
    )

    # --------------------------------------------------------------
    # Evaluation
    # --------------------------------------------------------------
    x_eval = numpy_to_tensor(x_base, device=device)
    pred: Tensor = model(x_eval)
    pred_np = np.asarray(pred.to_numpy(), dtype=np.float32)
    print("device:", str(device))
    print("pred:", pred_np.reshape(-1).round(3).tolist())
    print("acc:", _accuracy_from_pred_np(y_base, pred_np))
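To make the callback settings in the example above more concrete, here is a plain-Python sketch of the conventional (Keras-style) patience rule for `mode="max"`, together with how a filepath template of that shape expands under `str.format`. Both are assumptions about typical behavior, not a description of KeyDNN's internals, and `epochs_until_stop` is a hypothetical helper.

```python
def epochs_until_stop(values, patience=5, min_delta=1e-4):
    """Return the 1-based epoch where a mode="max" early stop would fire, else None.

    Assumed rule: a value counts as an improvement only if it exceeds the best
    value so far by more than min_delta; training stops after `patience`
    consecutive epochs without an improvement.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(values, start=1):
        if value > best + min_delta:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Accuracy improves for three epochs, then plateaus at 1.0.
acc_per_epoch = [0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
stop_epoch = epochs_until_stop(acc_per_epoch, patience=5, min_delta=1e-4)
print("stop at epoch:", stop_epoch)

# Assumed expansion of the checkpoint template via str.format.
template = "xor_epoch{epoch:03d}_loss{loss:.6f}.json"
filename = template.format(epoch=7, loss=0.25)
print(filename)
```

Under this rule, the XOR run would be expected to end well before `epochs=2000` once accuracy stops improving for five consecutive epochs.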
Notes on CUDA inference batching
When running on CUDA, KeyDNN currently expects evaluation to be performed in mini-batches
(e.g., batch_size=128) rather than passing an entire dataset tensor through the model at once.
If you hit a runtime error during evaluation on a large number of samples, rerun inference in mini-batches.
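One way to structure batched inference is sketched below. The helpers are hypothetical and library-agnostic: `predict_fn` is any callable that maps one batch of inputs to one batch of outputs, and `inputs` is any sliceable sequence, such as a list or a NumPy array.

```python
def batch_bounds(n, batch_size):
    """Yield (start, stop) index pairs that cover range(n) in mini-batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n, batch_size):
        yield start, min(start + batch_size, n)


def predict_in_batches(predict_fn, inputs, batch_size=128):
    """Apply predict_fn to consecutive slices of inputs; return per-batch outputs."""
    outputs = []
    for start, stop in batch_bounds(len(inputs), batch_size):
        outputs.append(predict_fn(inputs[start:stop]))
    return outputs


# Toy stand-in for a model: doubles every value in the batch.
demo = predict_in_batches(lambda batch: [2 * v for v in batch], list(range(5)), batch_size=2)
print(demo)
```

With KeyDNN, `predict_fn` could plausibly wrap the calls used in the training example, for instance `lambda xb: model(numpy_to_tensor(xb, device=device)).to_numpy()`, with the per-batch results joined afterwards using `np.concatenate(outputs, axis=0)`.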