01
TensorRT Edge-LLM
NVIDIA's high-performance C++ runtime for LLMs and VLMs on Jetson and DRIVE platforms. Supports FP8, NVFP4, and INT4 quantization, plus EAGLE-3 speculative decoding and KV-cache compression. Demonstrated at CES 2026 in partner showcases from Bosch, ThunderSoft, and MediaTek. Fits Qwen3-VL, Llama 3.2, and Phi-4 within the 4-8 GB memory envelope of an Orin Nano Super.
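To see why quantization matters for a 4-8 GB memory budget, here is a rough back-of-the-envelope sketch of weight storage at different precisions. It is plain arithmetic, not part of the TensorRT Edge-LLM API. The function name `weight_footprint_gib` and the parameter counts are hypothetical, chosen only for illustration, and the estimate ignores quantization scale factors, the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB.

    Ignores per-block scale factors, KV cache, activations, and runtime
    overhead, so real memory use will be higher than this figure.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes (illustrative, not measured from any real model).
    models = [("3B-class model", 3e9), ("8B-class model", 8e9)]
    # Nominal bit widths: FP16 baseline, 8-bit (e.g. FP8), 4-bit (e.g. INT4 or NVFP4).
    formats = [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]

    for name, params in models:
        for label, bits in formats:
            gib = weight_footprint_gib(params, bits)
            print(f"{name:15s} {label:7s} ~{gib:5.2f} GiB of weights")
```

Under these assumptions, an 8B-class model needs well over 8 GiB for 16-bit weights alone, while 4-bit weights come in under 4 GiB, leaving room for the KV cache and activations.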