We build small, specialized AI models so intelligence can be fast, affordable, and invisible — embedded in the products that already do the work.
Small — specialized beats general. The frontier worth racing to isn't the biggest model that can do anything. It's the smallest model that can do one thing exceptionally well. A well-trained 1B specialist often outperforms a 500B generalist on the tasks it was built for, at a fraction of the cost and latency.
Instant — latency is a feature. When AI responds in 80ms instead of 80s, it stops being a tool you visit and becomes a capability that lives inside everything else you do.
Invisible — the best AI disappears. You don't "launch" your car's ABS or "open" autocorrect. AI should work the same way: inside workflows, inside documents, inside the tools people already use.
We publish what worked and what didn't — papers, engineering write-ups, and the dead ends in between.
Models & results
- Kinetic-4B: a 4B model that outperforms Claude Haiku at tool calling. We fine-tuned a small open-source model to beat frontier models at picking the right tool with the right arguments, with p95 latency under 2s on a single GPU. (Apr 2026)
- LLM Inference at the Edge — mobile, NPU, and GPU performance/efficiency trade-offs under sustained load. (arXiv, Mar 2026)
Distributed training series
- From single-GPU to distributed training: a framework for making the right call
- DistributedDataParallel: how it actually works
- Tensor parallelism and sequence parallelism
- Pipeline parallelism: how it actually works
- ZeRO and FSDP: model sharding
- 3D parallelism: how frontier models use every GPU
Agent systems
- Hybrid lexical–semantic retrieval for tool selection in agent systems — routing natural-language intent across large API catalogs.
We're a small team in Bangalore building the model layer for a world where AI is everywhere and no one has to think about it.