furcate

Research

What we’re actively working on.

A short list. The bar is the same as for production: open foundations, cited references, repeatable experiments, and public benchmarks where they exist.

WASM-native model serving

WasmEdge and Wasmtime cold-start in 1-5 ms, roughly 100× faster than container-based serving, but model weights are still big. We're studying GGUF/GGML packaging inside WASI components to ship LLMs as cold-startable Wasm modules.

Federated fine-tuning on mobile

ExecuTorch + NVIDIA FLARE shipped a working pattern in 2025. We're benchmarking how much per-device fine-tuning compute is justified versus aggregating deltas centrally, and what the privacy and battery trade-offs look like in real residential and wearable fleets.
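
The central-aggregation side of that trade-off can be sketched in a few lines. This is our own illustrative toy (sample-weighted averaging of weight deltas, FedAvg-style), not ExecuTorch or FLARE API code; the names `DeviceUpdate` and `aggregate_deltas` are made up for the example.

```python
# Toy sketch: server-side aggregation of per-device fine-tuning deltas,
# weighted by how many local samples each device trained on.
import numpy as np
from dataclasses import dataclass

@dataclass
class DeviceUpdate:
    delta: dict       # layer name -> np.ndarray of weight deltas
    num_samples: int  # local examples seen this round

def aggregate_deltas(updates):
    """Sample-weighted average of per-device weight deltas."""
    total = sum(u.num_samples for u in updates)
    merged = {}
    for u in updates:
        w = u.num_samples / total
        for name, d in u.delta.items():
            merged[name] = merged.get(name, 0.0) + w * d
    return merged

# Two devices nudging the same 2-weight layer in opposite directions;
# the device with 3x the data pulls the merged delta its way.
a = DeviceUpdate({"fc": np.array([0.2, -0.2])}, num_samples=100)
b = DeviceUpdate({"fc": np.array([-0.2, 0.2])}, num_samples=300)
merged = aggregate_deltas([a, b])
print(merged["fc"])  # -> [-0.1  0.1]
```

The interesting research question is exactly how much of this averaging can move on-device before it stops paying for its battery cost.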

Cross-runtime model compatibility

ONNX is the standard interchange format, but real deployments use TensorRT, OpenVINO, LiteRT, ExecuTorch, and GGUF, each with different quantization options. We're building a verifier that proves a quantized model stays within error bounds of its FP32 reference.

Mesh-resilient OTA campaigns

What's the right rollout strategy for a fleet that's intermittently online via LoRa or private-5G mesh? We're modelling canary + TPM gating with mesh self-healing as a single coupled optimisation problem.
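
A minimal version of the coupled problem can be simulated. The model below is ours, purely illustrative: each canary wave only reaches devices that happen to be online, and the rollout halts if the observed failure rate among updated devices breaches a gate threshold. All parameters (`p_online`, `p_fail`, `gate`) are made-up placeholders, and TPM attestation is abstracted into the per-device failure probability.

```python
# Toy simulation: staged canary rollout over an intermittently
# connected fleet, gated on observed failure rate per wave.
import random

def rollout(fleet_size=1000, waves=(0.01, 0.1, 1.0),
            p_online=0.6, p_fail=0.002, gate=0.01, seed=42):
    rng = random.Random(seed)
    updated, failures = 0, 0
    for frac in waves:
        target = int(fleet_size * frac) - updated
        reached = sum(1 for _ in range(target) if rng.random() < p_online)
        failed = sum(1 for _ in range(reached) if rng.random() < p_fail)
        updated += reached
        failures += failed
        if updated and failures / updated > gate:
            return ("halted", updated, failures)
    return ("complete", updated, failures)

print(rollout())
```

Even this toy shows the coupling: connectivity (`p_online`) changes how much statistical evidence each wave actually gathers, which changes how aggressive the gate can safely be.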

On-device LLM serving on Pi-class hardware

Hailo-10H runs 2B-param LLMs at ~10 tok/s on a Pi 5 for $130. We're characterising which model classes (Llama 3.2 3B, Phi-4, Qwen3) survive INT4 quantization with acceptable quality, and how to compose multi-model ensembles within the 4-8 GB RAM envelope.
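
The RAM-envelope arithmetic is worth making explicit. This is our back-of-envelope calculation, not vendor numbers; the model shape below is an assumed Llama 3.2 3B-like configuration (28 layers, 8 KV heads, head dim 128) with an fp16 KV cache.

```python
# Back-of-envelope memory budget for an INT4 model on a 4-8 GB device.
def weights_gb(params_billions, bits=4):
    return params_billions * 1e9 * bits / 8 / 1e9

def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per=2):
    # Two tensors (K and V) per layer, fp16 entries.
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per / 1e9

w = weights_gb(3.2)                 # INT4 weights
kv = kv_cache_gb(28, 8, 128, 8192)  # 8k-token context
print(round(w, 2), round(kv, 2))    # -> 1.6 0.94
```

So one 3B model at INT4 with a full 8k context already eats ~2.5 GB before runtime overhead, which is why composing multi-model ensembles in 4-8 GB is a scheduling problem, not just a packing problem.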

Sovereign control plane in air-gapped operation

Azure Local Disconnected Operations (April 2026) and Microsoft Sovereign Private Cloud (April 27 2026) prove the pattern works at scale. We're studying open-source equivalents — KubeEdge + Akri + Wasmtime + TPM attestation — that don't require an Azure subscription.

Working on something adjacent? We collaborate with universities, national labs, and OEMs on these questions. Reach out.