RAG systems
I design and tune retrieval pipelines end-to-end: chunking, embeddings, hybrid search, and reranking.
I build, measure what matters, and share what I learn. No "thought-leadership speak", just findings with numbers.
I design and tune retrieval pipelines end-to-end: chunking, embeddings, hybrid search, and reranking.
I build measurement workflows with retrieval metrics like MRR, NDCG, and Recall@K to prove what actually improved.
I ship LLM-powered features with structured outputs, tool use, and practical reliability constraints.
I spent 25 years in enterprise consulting working with companies like MarkLogic, Oracle, and RightNow. I've led technical delivery for banks, telcos, and government agencies across APAC and North America. Now I'm applying that lens to AI engineering: evaluation pipelines, RAG systems, and the infrastructure that tells you whether an LLM actually works before it ships.
Background: Technical leadership and delivery where things need to work reliably, at scale, for real users.
Focus: Measurable outcomes over demos. Retrieval metrics, failure-mode analysis, production-minded AI.
What metric matters? What's the target? What does "good enough" look like?
Before optimizing, know where you are. Can't claim improvement without a starting point.
Each experiment answers a question. The data tells you what to try next.
Four-agent pipeline scoring sentiment, discovering themes, and aligning to a roadmap across 4,742 reviews from Amazon, Yelp, and the App Store.
RAG system for 1,000 academic papers. Hybrid retrieval, cross-encoder reranking, validated against 3,045 human-authored queries.
I tuned the constants until the math passed. The deliverable was still empty, and that was the right answer.
Spec implied
≥1 high-priority gap
Corpus actually had
0 high-priority gaps