Runtime Infrastructure Research

The intelligence in AI
shouldn't stop at
the model.

We are building the intelligence layer that makes AI systems cheaper to run at scale.


What we're working on

Stop recomputing LLM work

Most LLM systems recompute identical work because they can't prove reuse is safe. We're building infrastructure that can prove it, turning reuse from a correctness risk into a deliberate decision.
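A minimal sketch of what "proving reuse is safe" could mean in practice (all names and the policy are illustrative, not our actual system): reuse a cached result only when the request is byte-identical under a canonical key and sampling is deterministic, so a cache hit is a provable equivalence rather than a guess.

```python
import hashlib
import json

def reuse_key(model: str, prompt: str, params: dict) -> str:
    """Deterministic key over everything that affects the output."""
    canonical = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

class ProvableCache:
    """Reuse a prior result only when the request matches exactly
    and sampling is deterministic (temperature == 0)."""

    def __init__(self):
        self._store = {}

    def lookup(self, model, prompt, params):
        if params.get("temperature", 1.0) != 0:
            return None  # nondeterministic sampling: reuse is unsafe
        return self._store.get(reuse_key(model, prompt, params))

    def record(self, model, prompt, params, result):
        self._store[reuse_key(model, prompt, params)] = result
```

The point of the sketch is the shape of the decision: reuse happens only when a stated safety condition holds, making it a deliberate policy rather than an accident of caching.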

Routing is the new runtime

Execution today defaults to the cloud, even though cost and latency vary per request. We're building systems that decide where computation should run at request time, choosing between on-device, edge, or cloud based on what each request actually needs.
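One way a request-time routing decision can be framed (the tier profiles and numbers below are hypothetical, for illustration only): given what a request needs, pick the cheapest tier that can actually serve it within its latency budget.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    latency_budget_ms: int

# Illustrative per-tier profiles: context capacity, round-trip
# latency, and cost per 1k tokens. Real numbers vary per deployment.
TIERS = {
    "on-device": {"max_tokens": 512,     "latency_ms": 20,  "cost_per_1k": 0.0},
    "edge":      {"max_tokens": 4096,    "latency_ms": 60,  "cost_per_1k": 0.2},
    "cloud":     {"max_tokens": 128_000, "latency_ms": 250, "cost_per_1k": 1.0},
}

def route(req: Request) -> str:
    """Pick the cheapest tier that fits the request and its latency budget."""
    candidates = [
        name for name, t in TIERS.items()
        if req.prompt_tokens <= t["max_tokens"]
        and t["latency_ms"] <= req.latency_budget_ms
    ]
    if not candidates:
        return "cloud"  # fallback: only the largest tier can serve it
    return min(candidates, key=lambda name: TIERS[name]["cost_per_1k"])
```

Under these toy profiles, a short interactive request stays on-device, a mid-size one lands on the edge, and only requests that genuinely need large context or heavy compute pay the cloud's cost and latency.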

Stay close to our ideas.

Research updates, new papers, and ideas from the lab.