01
Runtime
Edge AI execution, from silicon to model.
TensorRT Edge-LLM for LLMs and VLMs on Jetson / DRIVE (FP8, NVFP4, and INT4 quantization; EAGLE-3 speculative decoding). LiteRT and ONNX Runtime as the cross-platform defaults. ExecuTorch for PyTorch on mobile and microcontrollers. OpenVINO for Intel-tuned inference. WasmEdge / Wasmtime for sandboxed serverless edge, with 1-5 ms cold starts versus 100 ms to 1 s or more for containers.
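As a concrete taste of the cross-platform path, here is a minimal ONNX Runtime sketch. The model path `model.onnx` is a placeholder, and the snippet assumes a single float32 input; everything else uses the standard `onnxruntime` Python API.

```python
import numpy as np
import onnxruntime as ort

# Pick the best available execution provider; the same .onnx artifact runs
# unchanged whether the box has a GPU build or CPU-only ONNX Runtime.
providers = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
             if p in ort.get_available_providers()]
session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path

# Build a dummy feed from the model's declared input signature,
# resolving any dynamic (named) dimensions to 1.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
feed = {inp.name: np.random.rand(*shape).astype(np.float32)}

outputs = session.run(None, feed)  # None -> return every declared output
print(outputs[0].shape)
```

The design point: the model artifact stays fixed, and the execution-provider list decides at load time which silicon it binds to.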
How runtimes work