Friday, November 17, 2023

Indexing attachments without filters -- madness?

While building the new FTI feature of the Domino Optimizer, we had the chance to revisit the attachment filtering feature of Domino.

To recap: when you create a Full-Text index and choose to index attachments, you get a further choice about how Domino extracts words and numbers from them. It can either run each attachment through an attachment filter (Apache Tika), or just take the raw data stream and run with it.
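
To make the difference concrete, here is a toy Python sketch. It is not how Domino or Tika actually work: the `tokenize` helper, the sample text, and the use of zlib as a stand-in for a compressed attachment format are all assumptions made for illustration. It only shows why pulling "words" straight out of a stored byte stream can miss the real words, since many attachment formats (Office Open XML files are ZIP archives, for instance) do not store their text as plain bytes.

```python
import re
import zlib


def tokenize(data):
    """Return the set of lowercase ASCII alphanumeric runs of 3+ characters."""
    return {m.group().decode("ascii").lower()
            for m in re.finditer(rb"[A-Za-z0-9]{3,}", data)}


# Hypothetical attachment: plain text stored in compressed form, standing in
# for container formats that compress their payload.
plain = ("Quarterly revenue report for the Domino migration project. " * 20).encode("ascii")
attachment = zlib.compress(plain)

# "Filtered" path (assumption): decode the attachment first, then tokenize the text.
filtered_tokens = tokenize(zlib.decompress(attachment))

# "Raw" path (assumption): tokenize the stored bytes exactly as they are.
raw_tokens = tokenize(attachment)

print("filtered:", sorted(filtered_tokens))
print("raw:     ", sorted(raw_tokens))
```

In this sketch the filtered path recovers the actual words, while the raw path only sees whatever byte runs happen to look like letters and digits.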

Now, in our recent testing, the second option (indexing the raw data stream) nearly ALWAYS led to a worse outcome. Indexing took longer, the index was bigger, and I'd guess it was full of rubbish. What a waste. So, as of 2023, I think you really need to double-check why you'd ever want to select this option!

New Rule of Thumb: Always index attachments using the attachment filter process.

I'd love to hear your thoughts and experiences.

