About the idea
Searching for a phrase across multiple documents is nothing new, and many implementations exist. However, such a search usually only tells you whether, and roughly where, the phrase occurs in the documents. With Collabora Online and LibreOffice we can do better and additionally present each search result as a thumbnail of the location where the phrase was found. This makes it easier for the user to see the context in which the phrase occurs: for example, whether it is in a table, a shape, a header or footer, figure text, or the "alt" text of an image.
Thanks to the sponsor of this work, the NLnet Foundation, we are implementing this solution for Writer documents.
The solution consists of three parts:
- preparing the data for indexing,
- indexing and searching,
- rendering the result.
In this post I will describe what has been done for milestone 1.
Milestone 1 - preparing data for indexing
- indexing: indexing paragraph text with the ModelTraverser
- indexing: indexing graphics for the IndexingExport
- indexing: indexing OLE objects for the IndexingExport
- indexing: indexing shapes/text boxes for the IndexingExport
- indexing: indexing tables for the IndexingExport
- indexing: write parent index to paragraphs if possible
- indexing: indexing sections for the IndexingExport
- indexing: add test case for fontworks and footer/header paragraphs
- indexing: add indexing export as an export filter for Writer
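To give a rough idea of what "preparing data for indexing" means, here is a minimal sketch in Python. It is not the LibreOffice code: `Node`, `Entry` and `traverse` are hypothetical names made up for this illustration, and the toy document model is an assumption. It only shows the general idea behind the list above: walk the document, emit one record per paragraph or object, and remember the index of the enclosing element (the "parent index") where there is one.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a document element (not a LibreOffice class)."""
    kind: str                      # e.g. "paragraph", "table", "shape", "graphic"
    text: str = ""                 # paragraph text, or name/alt text of an object
    children: List["Node"] = field(default_factory=list)


@dataclass
class Entry:
    """One record that would be handed over to the search index."""
    index: int                     # position in traversal order
    kind: str
    text: str
    parent_index: Optional[int]    # index of the enclosing element, if any


def traverse(nodes: List[Node],
             parent_index: Optional[int] = None,
             counter: Optional[List[int]] = None) -> Iterator[Entry]:
    """Depth-first walk that numbers every node and records its parent."""
    if counter is None:
        counter = [0]              # shared mutable counter across recursion levels
    for node in nodes:
        index = counter[0]
        counter[0] += 1
        yield Entry(index, node.kind, node.text, parent_index)
        # Children (e.g. paragraphs inside a table) point back to this node.
        yield from traverse(node.children, index, counter)


if __name__ == "__main__":
    document = [
        Node("paragraph", "Introduction"),
        Node("table", "Table1", [Node("paragraph", "text in a cell")]),
        Node("graphic", "alt text of an image"),
    ]
    for entry in traverse(document):
        print(entry)
```

With records like these, a search hit can later be mapped back to a concrete place in the document, which is what the rendering step needs in order to produce the thumbnail.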