The long-form pieces. How we think about getting AI to work in places where the answer matters and the rules live in people’s heads.
When an expert overrides an AI system, that correction is not noise. It is a teacher. We study how to turn corrections into rules the system can apply next time, without retraining.
A model is only as good as what you put in front of it. We study how to pick the right context — the right documents, the right examples, the right structure — so accuracy comes from what the model sees, not how big it is.
Operators rarely write down what they do. We study how to watch real work and turn it into rules an AI system can follow — no interviews, no workshops, no policy documents.
Fine-tuning is expensive and hard to undo. We study how a living knowledge graph can turn a general-purpose model into a domain expert at the moment it answers, with no model weights touched.
Every tweak to a production AI pipeline used to mean paying to re-run the model on thousands of examples. Here is how we changed that, so the team can ship a fix on a Tuesday afternoon instead of waiting two weeks for a re-run and its GPU bill.
The standard accuracy number on an AI retrieval dashboard can look fine while one slice of users gets wrong answers 7× more often than the rest. Here is what that looks like, why averaging hides it, and what we do instead.
We collaborate with researchers and teams building serious enterprise AI.
Get in touch