As the year comes to an end, I wanted to share a quick update on a side project we’ve been working on.
Regular readers will know that we have spent a considerable amount of time developing Preemptive AI for Domino, an end-to-end AI solution for Domino versions 12, 14, and 14.5. Along the way, we even added audio-to-text transcription. If you missed that series, you can find some of the details here
One of the biggest practical challenges with LLMs is that, for a model to understand your data, you have to provide that data securely, in a useful format, and in a way that scales.
There are several ways to do that, and one of the most common is Retrieval-Augmented Generation (RAG).
So to learn more, we built a prototype.
The goal was to build a system where we could query a knowledge store (Domino) and have an LLM respond using the most relevant source material from that store. We used three months of my email for this experiment, which made it easy to validate the results.
At a high level, the app worked like this:
1. Extract text from emails, clean it, and store it in corresponding JSON files.
- The text is split into chunks suitable for embedding (about 1,500 characters each).
- Words are not split across chunk boundaries.
- Chunks include a 5-word overlap to preserve context (a chunking sketch appears after this list).
2. Generate embeddings using the nomic-embed-text model via a local Ollama server.
- Each chunk of text is converted into a 768-dimensional vector (its embedding). Embeddings capture meaning rather than keywords. The math here is unbelievable and a big part of the magic that makes this work (an embedding sketch follows the list).
- Embeddings are stored in a local vector-enabled database (not Domino).
3. Run queries against the vector store and return the top X matches.
4. Augment the user prompt with those retrieved results, then send the expanded prompt to the LLM for final processing (a retrieval-and-prompt sketch follows the list).
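To make step 1 concrete, here is a minimal Python sketch of the chunking logic described above: roughly 1,500-character chunks, split on word boundaries, with a 5-word overlap. It is an illustration of the approach rather than our production code, and the function name is just for this example.

```python
def chunk_text(text: str, max_chars: int = 1500, overlap_words: int = 5) -> list[str]:
    """Split cleaned text into ~max_chars chunks on word boundaries with a small word overlap."""
    words = text.split()
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # running length of the chunk being built, including joining spaces

    for word in words:
        # Would adding this word (plus a joining space) push the chunk past the limit?
        if current and length + 1 + len(word) > max_chars:
            chunks.append(" ".join(current))
            # Carry the last few words into the next chunk so context is preserved.
            current = current[-overlap_words:]
            length = len(" ".join(current))
        if current:
            length += 1  # joining space
        current.append(word)
        length += len(word)

    if current:
        chunks.append(" ".join(current))
    return chunks


# Example: a long run of text becomes overlapping pieces of roughly 1,500 characters.
sample = " ".join(f"word{i}" for i in range(2000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces[:3]])
```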
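Step 2 then pushes each chunk through the embedding model. The sketch below assumes Ollama is running locally on its default port (11434) and uses its /api/embeddings endpoint with nomic-embed-text; the embed_document helper and its JSON output file are illustrative stand-ins for however the vectors are actually persisted (in our case, a vector-enabled database rather than flat files).

```python
import json

import requests  # third-party: pip install requests

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"  # default local Ollama port
EMBED_MODEL = "nomic-embed-text"  # returns 768-dimensional vectors


def embed_chunk(text: str) -> list[float]:
    """Ask the local Ollama server for the embedding vector of one chunk of text."""
    resp = requests.post(
        OLLAMA_EMBED_URL,
        json={"model": EMBED_MODEL, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def embed_document(doc_id: str, chunks: list[str], out_path: str) -> None:
    """Embed every chunk of one document and write the vectors out as JSON.

    A real implementation would insert these records into a vector store
    instead of a flat file; the record structure is what matters here.
    """
    records = [
        {"doc_id": doc_id, "chunk": i, "text": c, "embedding": embed_chunk(c)}
        for i, c in enumerate(chunks)
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
```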
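Steps 3 and 4 are retrieval and prompt augmentation. Since the post does not go into which vector database we used, this sketch simply does a brute-force cosine-similarity search over in-memory records (the same shape produced by embed_document above), then sends the augmented prompt to a local model via Ollama's /api/generate endpoint. The model name llama3 is a placeholder, and embed_chunk is reused from the previous sketch.

```python
import numpy as np  # third-party: pip install numpy
import requests

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def top_matches(query_vec: list[float], records: list[dict], k: int = 5) -> list[dict]:
    """Return the k records whose embeddings have the highest cosine similarity to the query."""
    matrix = np.array([r["embedding"] for r in records], dtype=np.float32)
    q = np.array(query_vec, dtype=np.float32)
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
    best = np.argsort(sims)[::-1][:k]
    return [records[i] for i in best]


def answer(question: str, records: list[dict], k: int = 5, llm_model: str = "llama3") -> str:
    """Retrieve the most relevant chunks, build an augmented prompt, and ask the LLM."""
    hits = top_matches(embed_chunk(question), records, k)  # embed_chunk: see previous sketch
    context = "\n\n".join(h["text"] for h in hits)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    resp = requests.post(
        OLLAMA_GENERATE_URL,
        json={"model": llm_model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

A proper vector database replaces the brute-force search above with an index, which is part of why the queries in our tests came back so quickly.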
Performance and results:
Note: A Mac Mini M4 Pro handled all the computational tasks. No efforts were made to optimise processing.
Sample set: 11,208 email messages (1.48GB)
Text extraction: 192 messages/sec, 6,488,707 words
Embedding generation + storage: 11 messages/sec
Resulting vector database size: 95.7 MB
Queries: A typical vector query takes less than one second—it’s unbelievably fast. The hit rate is excellent for the kinds of messages you’d hope it would find.
Conclusion: The results were fantastic. Embedding-based retrieval is extremely fast, and when it works well, it feels a bit magical.
So what’s next?
We learned a lot building this prototype. However, we understand that HCL has RAG support planned, and since we have no idea what that will look like, we'll wait for now to see what is included with Domino 2026. Once that is clearer, we will decide whether it is worth investing more time into this concept.
I think that's it for 2025. All the best over the holiday break, and we'll catch up in 2026.