As the year comes to an end, I wanted to share a quick update on a side project we’ve been working on.
Regular readers will know that we have spent a considerable amount of time developing Preemptive AI for Domino, an end-to-end AI solution for Domino versions 12, 14, and 14.5. Along the way, we even added audio-to-text transcription. If you missed that series, you can find some of the details here
One of the biggest practical challenges with LLMs is that, for a model to understand your data, you have to provide that data securely, in a useful format, and in a way that scales.
There are several ways to do that, and one of the most common is Retrieval-Augmented Generation (RAG).
So to learn more, we built a prototype.
The goal was to build a system where we could query a knowledge store (Domino) and have an LLM respond using the most relevant source material from that store. We used three months of my email for this experiment, which made it easy to validate the results.
At a high level, the app worked like this:
1. Extract text from emails, clean it, and store it in corresponding JSON files.
- The text is split into chunks suitable for embedding (about 1,500 characters each).
- Words are not split across chunk boundaries.
- Chunks include a 5-word overlap to preserve context (a chunking sketch appears after this list).
2. Generate embeddings using the nomic-embed-text model via a local Ollama server.
- Each chunk of text is converted into a 768-dimensional vector (its embedding). Embeddings capture meaning rather than keywords. The math here is unbelievable and a big part of the magic that makes this work (an embedding sketch follows the list).
- Embeddings are stored in a local vector-enabled database (not Domino).
3. Run queries against the vector store and return the top X matches.
4. Augment the user prompt with those retrieved results, then send the expanded prompt to the LLM for final processing (a retrieval-and-prompt sketch follows the list).
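To make step 1 concrete, here is a minimal Python sketch of the chunking logic described above: roughly 1,500-character chunks, split on word boundaries, with a 5-word overlap. It is an illustration of the approach rather than our production code, and the function name is just for this example.

```python
def chunk_text(text: str, max_chars: int = 1500, overlap_words: int = 5) -> list[str]:
    """Split cleaned text into ~max_chars chunks on word boundaries with a small word overlap."""
    words = text.split()
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # running length of the chunk being built, including joining spaces

    for word in words:
        # Would adding this word (plus a joining space) push the chunk past the limit?
        if current and length + 1 + len(word) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few words into the next chunk so context is preserved.
            current = current[-overlap_words:]
            length = len(" ".join(current))
        if current:
            length += 1  # joining space
        current.append(word)
        length += len(word)

    if current:
        chunks.append(" ".join(current))
    return chunks


# Example: a long run of text becomes overlapping pieces of roughly 1,500 characters.
sample = " ".join(f"word{i}" for i in range(2000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces[:3]])
```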
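Step 2 then pushes each chunk through the embedding model. The sketch below assumes Ollama is running locally on its default port (11434) and uses its /api/embeddings endpoint with nomic-embed-text; the embed_document helper and its JSON output file are illustrative stand-ins for however the vectors are actually persisted (in our case, a vector-enabled database rather than flat files).

```python
import json

import requests  # third-party: pip install requests

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"  # default local Ollama port
EMBED_MODEL = "nomic-embed-text"  # returns 768-dimensional vectors


def embed_chunk(text: str) -> list[float]:
    """Ask the local Ollama server for the embedding vector of one chunk of text."""
    resp = requests.post(
        OLLAMA_EMBED_URL,
        json={"model": EMBED_MODEL, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def embed_document(doc_id: str, chunks: list[str], out_path: str) -> None:
    """Embed every chunk of one document and write the vectors out as JSON.

    A real implementation would insert these records into a vector store
    instead of a flat file; the record structure is what matters here.
    """
    records = [
        {"doc_id": doc_id, "chunk": i, "text": c, "embedding": embed_chunk(c)}
        for i, c in enumerate(chunks)
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
```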
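Steps 3 and 4 are retrieval and prompt augmentation. Since the post does not go into which vector database we used, this sketch simply does a brute-force cosine-similarity search over in-memory records (the same shape produced by embed_document above), then sends the augmented prompt to a local model via Ollama's /api/generate endpoint. The model name llama3 is a placeholder, and embed_chunk is reused from the previous sketch.

```python
import numpy as np  # third-party: pip install numpy
import requests

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def top_matches(query_vec: list[float], records: list[dict], k: int = 5) -> list[dict]:
    """Return the k records whose embeddings have the highest cosine similarity to the query."""
    matrix = np.array([r["embedding"] for r in records], dtype=np.float32)
    q = np.array(query_vec, dtype=np.float32)
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
    best = np.argsort(sims)[::-1][:k]
    return [records[i] for i in best]


def answer(question: str, records: list[dict], k: int = 5, llm_model: str = "llama3") -> str:
    """Retrieve the most relevant chunks, build an augmented prompt, and ask the LLM."""
    hits = top_matches(embed_chunk(question), records, k)  # embed_chunk: see previous sketch
    context = "\n\n".join(h["text"] for h in hits)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    resp = requests.post(
        OLLAMA_GENERATE_URL,
        json={"model": llm_model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

A proper vector database replaces the brute-force search above with an index, which is part of why the queries in our tests came back so quickly.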
Performance and results:
Note: A Mac Mini M4 Pro handled all the computational tasks. No efforts were made to optimise processing.
Sample set: 11,208 email messages (1.48GB)
Text extraction: 192 messages/sec, 6,488,707 words
Embedding generation + storage: 11 messages/sec
Resulting vector database size: 95.7 MB
Queries: A typical vector query takes less than one second—it’s unbelievably fast. The hit rate is excellent for the kinds of messages you’d hope it would find.
Conclusion: The results were fantastic. Embedding-based retrieval is extremely fast, and when it works well, it feels a bit magical.
So what’s next?
We learned a lot building this prototype. However, we understand that HCL has RAG support planned, and since we have no idea what that will look like, we'll wait for now to see what is included with Domino 2026. Once that is clearer, we will decide whether it is worth investing more time into this concept.
I think that's it for 2025. All the best over the holiday break, and we'll catch up in 2026.