furcate

Capability · Runtime

Models execute at the edge of your perimeter.

Furcate composes the best open-source edge AI runtimes — different models live happily on different silicon, and the operations layer treats them all the same. The runtime is the abstraction; the hardware is your choice.

01

TensorRT Edge-LLM

NVIDIA's high-performance C++ runtime for LLMs and VLMs on Jetson and DRIVE platforms. FP8, NVFP4, and INT4 quantization with EAGLE-3 speculative decoding and KV-cache compression. Demonstrated at CES 2026 in partner showcases with Bosch, ThunderSoft, and MediaTek. Fits Qwen3-VL, Llama 3.2, and Phi-4 in the 4-8 GB envelope of an Orin Nano Super.
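
A quick sanity check on that envelope, as back-of-the-envelope arithmetic in Python (the layer and head counts below are assumptions typical of a 3B-class model, not measured Edge-LLM figures):

```python
# Rough memory footprint of a quantized LLM at the edge.
# Illustrative numbers only -- not Edge-LLM measurements.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB at a given quantization width."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bits: int = 8) -> float:
    """KV cache for one sequence: 2 (K and V) per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * context_len * bits / 8 / 1e9

# A Llama 3.2 3B-class model: INT4 weights, FP8 KV cache, 8k context.
# (Layer/head counts are assumed, typical of this model class.)
weights = weight_gb(3.2, 4)                                  # ~1.6 GB
cache = kv_cache_gb(layers=28, kv_heads=8, head_dim=128,
                    context_len=8192)                        # ~0.5 GB
print(f"~{weights + cache:.1f} GB of a 4-8 GB envelope")
```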

02

LiteRT (formerly TensorFlow Lite)

Cross-platform inference — Android, embedded Linux, microcontrollers via TFLM. Built-in quantization and compression. The default runtime for vendor-portable deployments and the only viable option on most microcontrollers.
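
A minimal invocation sketch with the tflite_runtime Python bindings (the model path is a placeholder):

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Load a .tflite model and run a single inference.
interpreter = Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Zero input matching the model's declared shape and dtype.
x = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(out["index"])
```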

03

ONNX Runtime

Cross-platform inference engine optimised across CPUs, GPUs, NPUs, FPGAs. The hardware-agnostic interchange runtime when the customer's silicon mix is heterogeneous. Strong vendor neutrality story for OEM-distributed products.
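
The heterogeneity shows up directly in the session API: you pass an ordered list of execution providers and the runtime assigns each op to the first provider that supports it. A minimal sketch (provider names are real; the model path and input shape are placeholders, and which providers are available depends on the onnxruntime build):

```python
import numpy as np
import onnxruntime as ort

# Ordered preference: accelerator first, CPU as the universal fallback.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder shape
y = session.run(None, {input_name: x})[0]
```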

04

ExecuTorch

PyTorch on microcontrollers and embedded edge — bytecode VM with AOT compilation. Pairs cleanly with NVIDIA FLARE for federated fine-tuning on mobile (Meta + NVIDIA collaboration). The pattern when your model lineage starts in a research lab on PyTorch and needs to land on a phone or wearable.
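
The lab-to-device hop is ahead-of-time: torch.export captures the graph, to_edge lowers it, and the serialized .pte is what the on-device runtime executes. A minimal sketch with a toy module (illustrative, not a production model):

```python
import torch
from executorch.exir import to_edge

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
example_inputs = (torch.randn(1, 16),)

# AOT: capture the graph, lower to the edge dialect, serialize for the VM.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("tinynet.pte", "wb") as f:
    f.write(et_program.buffer)
```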

05

OpenVINO

Intel-optimised inference for CPUs, GPUs, VPUs, FPGAs. Strong fit for industrial vision (smart cameras, intelligent retail) and any deployment standardised on Intel silicon. Native pairing with Movidius accelerators.
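
A minimal compile-and-infer sketch with the openvino Python package (paths and shapes are placeholders; the device string is the only thing that changes across CPU, GPU, and NPU targets):

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # OpenVINO IR; ONNX also works
compiled = core.compile_model(model, "CPU")  # or "GPU", "NPU", ...

x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
results = compiled([x])
y = results[compiled.output(0)]
```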

06

WasmEdge + Wasmtime

WebAssembly runtimes for sandboxed serverless edge. WASI 0.3 lands February 2026; WASI 1.0 is targeted for late 2026 / early 2027. Cold starts of 1-5 ms versus 100 ms to 1 s+ for containers, roughly two orders of magnitude faster, are what let edge inference go serverless. Cloudflare Workers serving ~10M Wasm requests per second across 300+ edge locations is the proof of scale.
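
Cold start stays in single-digit milliseconds because compilation and instantiation are separate steps: the module compiles once, and each request gets a fresh, isolated instance. A minimal sketch with the wasmtime Python bindings, using inline WAT so it is self-contained:

```python
from wasmtime import Engine, Instance, Module, Store

engine = Engine()

# Compile once (the expensive step).
module = Module(engine, """
  (module
    (func (export "add") (param i32 i32) (result i32)
      local.get 0
      local.get 1
      i32.add))
""")

# Instantiate per request (the cheap, sandboxed step).
store = Store(engine)
instance = Instance(store, module, [])
add = instance.exports(store)["add"]
print(add(store, 2, 3))  # -> 5
```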

07

Quantization, packaging, lineage

FP32 → FP8 / NVFP4 / INT8 / INT4, with quality bounds documented at every step. GGUF / GGML packaging for cross-runtime portability. Every quantization step is logged so an audit can replay the path from foundation-model checkpoint to edge-deployed binary.
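
What a replayable lineage record might look like, as a hypothetical sketch (the schema, field names, and tool string are ours for illustration, not a standard):

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass
class QuantStep:
    """One hop on the checkpoint-to-edge-binary path. Hypothetical schema."""
    src_format: str       # e.g. "FP32"
    dst_format: str       # e.g. "INT4"
    tool: str             # quantizer name + version
    artifact_sha256: str  # hash of the produced artifact
    quality_delta: float  # metric drop vs. the FP32 baseline

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Append-only log; an audit replays each hop and re-verifies each hash.
lineage = [
    QuantStep("FP32", "INT4", "example-quantizer 0.9",
              sha256_of("model-q4.gguf"),  # placeholder artifact
              quality_delta=0.012),
]
print(json.dumps([asdict(s) for s in lineage], indent=2))
```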

Stack in play

Open foundations composed at this layer.

TensorRT Edge-LLM

NVIDIA · Jetson + DRIVE

FP8/NVFP4/INT4 LLM/VLM inference at edge power budgets.

LiteRT

Google · cross-platform

Default for Android, embedded Linux, microcontrollers via TFLM.

ONNX Runtime

Microsoft · vendor-neutral

Hardware-agnostic interchange for heterogeneous silicon mixes.

ExecuTorch

Meta · PyTorch on micro / mobile

AOT bytecode VM; pairs with FLARE for federated fine-tuning on mobile.

OpenVINO

Intel · CPU/GPU/VPU/FPGA

Strong industrial-vision and Movidius integration.

WasmEdge / Wasmtime

CNCF / Bytecode Alliance

1-5 ms cold start; sandboxed serverless edge functions.