Thursday, July 1, 2021

Document searching and indexing export - Part 1

About the idea

Searching for a phrase in multiple documents is not a new thing and many implementations exist, however such searching will usually only provide you if and roughly where in the documents a searched phrase exists. With Collabora Online and LibreOffice we can do better than this and in addition provide the search result in form of a thumbnail of the search location. In this way it is easier for the user to see the context, where the searched phrase is located. For example, if it is located in a table, shape, footer/header, or is it figure text or maybe "alt" text of an image. 

Thanks to the sponsor of the work - NLnet Foundation, we are implementing this solution for Writer documents.

The solution to this consist in 3 parts:

  • preparing the data for indexing, 
  • indexing and searching 
  • rendering of the result
Preparing the data for indexing and rendering of the search result is done in LibreOffice core, while the actual indexing and searching is delegated to one of the existing indexing and searching databases / frameworks (we will provide support for Apache Solr). 

In this post I will describe what has been done for milestone 1.

Milestone 1 - preparing data for indexing

Indexing data usually consists of (enriched) text, however in our case we also need to provide additional internal information, where the text is located, so it is possible to later go to the search result location and create a thumbnail of the document. In Writer we can provide a node index of the paragraph, with which it is possible to quickly identify the text in the document model and generate a thumbnail of the area around the text.

The data for indexing is provided by a "indexing export" filter in LibreOffice, which creates a XML document with a custom structure. The root element is <indexing> and the child elements are paragraphs with index and text, which can be nested in sub-elements (like image, shape, table, section) depending on where the paragraph is located. 

For example:

 <?xml version="1.0" encoding="UTF-8"?>
<indexing>
 <paragraph index="6">Drawing : Just a Diamond</paragraph>
 <paragraph index="12"></paragraph>
 <shape name="Circle" alt="" description="">
  <paragraph index="0">This is a circle</paragraph>
  <paragraph index="1">This is a second paragraph</paragraph>
 </shape>
 <shape name="Diamond" alt="" description="">
  <paragraph index="0">This is a diamond</paragraph>
 </shape>
 <shape name="Text Frame 1" alt="" description="">
  <paragraph index="0">This is a TextBox - Para1</paragraph>
  <paragraph index="1">Para2</paragraph>
  <paragraph index="2">Para3</paragraph>
 </shape>
</indexing>

The indexing export is build upon a ModelTraverser class, which was created for the indexing purpose, but can be reused for other purposes (it is similar to what AccessibilityCheck does, but generalised, so AccessibilityCheck can in the future be refactored to use it). 

The purpose of ModelTraverser is to traverse through the Writer document model, and provide SwNode and SdrObjects to the consuming objects - in our case IndexingExport class, which extracts the text from those objects (depending on the object type) and with help of a XmlWriter, writes the indexing data to the XML file.

Indexing export filter can be tested with the LibreOffice command line "convert-to" tool in the following way:

soffice --convert-to xml:writer_indexing_export <Writer document file path>


The commits implementing this milestone 1 functionality:

In the next milestone, we will render the thumbnail with the provided search result data.


To be continued...

Thursday, May 13, 2021

Command Popup HUD for LibreOffice

Command Popup is a pop-up window that lets you search for commands that are present in the main menu and run them. This was requested in bug tdf#91874 and over-time accumulated over 14 duplicated bugs reports, so it was a very requested feature.

I'm intrigued by similar functionality in other programs, because it enables very quick access to commands (or programs) and at the same time don't need to move your hand off the keyboard. It also makes it easy to search for commands - especially in an application like LibreOffice with humongous main menu. So I decided to try to implement it for LibreOffice.

Figure 1: Command Popup window

I was working on it here and there in my free time and managed to make it work as I imagined, however it was very rough around the edges and needed a lot of polish. Luckily in April, we had a hack week at Collabora, where I decided to use some time to work on finishing the command popup. I dusted up the old code and converted it to use the weld framework for widgets and fixed the many bugs, but I didn't manage to finish it completely so it took until recently that I actually pushed the code upstream into master.

The main UX focus is to easy search and navigate with the keyboard. When the Command pop-up is focused, all keyboard events should go to the search edit box, so it is possible to change the search term, however hitting up/down should change the selection in the tree view, where the search results are shown, and enter should execute the command. To get this working correctly was quite a challenge, but I found the correct formula eventually after trying some different ideas. Of course using the mouse should still work as well. 

To show the Command Popup, there is a menu entry in "Help > Search Commands" and is by default bind to "Ctrl+F1" shortcut (however this may change). 

The Command Popup will be available in LibreOffice 7.2, but if you want to try it out, you can get the current daily build, or wait for the LibreOffice 7.2 Alpha1. Any suggestions and comments are welcome. If you find a bug, please report it in the LibreOffice bugzilla page.


Wednesday, March 24, 2021

Built-in "Xray" like UNO object inspector – Part 3

DevTools implementation has been completed and this is the third and final part of the mini-series. The focus of this part is on the object inspector, but I have also improved or changed other DevTools parts, so first I will briefly mention those.

New menu position


Figure 1: Development Tools location in the menu

The position in menu has changed from "Help / Development Tools" to "Tools / Development Tools". The new position fits better as it is near the position of macro editor and they go hand in hand with each-other.

Updates to the document model tree view



Figure 2: Document model tree view and "Current Selection" toggle button

The left-hand side document model tree view has been changed to use the same approach than the object inspector tree view uses, where the object attached tree view (in this case DocumentModelTreeEntry) has all the behavioural logic for a specific tree view node. This makes it easier to manipulate the tree view and add new nodes (if necessary).

In the document object tree view I have added a top toolbar with "Refresh" (icon) button and I changed the existing “Current Selection” button to a toolbar toggle button, so it looks more consistent as the is only the toolbar now (see figure 2).

In Writer, each paragraph tree view node now has text portion child nodes (as shown in figure 2), to make it possible to quickly inspect all the individual parts of a paragraph.

The object inspector tree view

Figure 3: Top toolbar and tab bar in object inspector


The object inspector shows various information about an object. Each object has a implementation class, which is always shown for the current object (see figure 3).

The other information that the object inspector shows are divided into four main categories:
  • "Interfaces" - the interfaces that the current object implements
  • "Services" - the services that the current object supports
  • "Properties" - the properties of the current object
  • "Methods" - the combined methods that can be called on the current object
On the user interface, the categories are divided with a tab bar (see figure 3). Each tab represents a different categories. The tabs are filled on "entry" - when the user clicks on the tab.

In the code the tabs and tree views are all handled by the ObjectInspectorTreeHandler and the hierarchy of objects attached the tree view that implement the ObjectInspectorNodeInterface (see include/sfx2/devtools/ObjectInspectorTreeHandler.hxx). 

The two major categories are "Properties" and "Methods", which I describe in more detail next.

Properties

Figure 4: Object inspector "Properties" tab


There are three types of properties of an object:
  • Properties accessible view XPropertySet.
  • Properties defined as an attribute (marked "[attribute]" in IDL). 
  • Pseudo properties defined by a get and set method. 
All the three types are represented in the properties tab and can be identified by the "Info" column. Attribute properties have an "attribute" flag, pseudo properties have either a "get", "set" or both flags. If neither flags exists, it represents a property that is from XPropertySet. 

In the tree view there are four columns for each property - "Name", "Value", "Type" and already mentioned "Info". The "Name" of the property is always available, but it is possible that two properties have the same name (because of multiple property types). The "Value" shows the value of the property, which is converted to the string, if this is possible (it should be if the property is basic), otherwise a representation string is created. The representation string for objects shows the implementation name ("<Object@SwXTextRange>"), for sequences it shows the size ("<Sequence [5]>") and for structs it just mentions the type ("<Struct>").

A node in the “Properties” tree view can be expanded (if offered) so it is possible to recursively inspect the the objects further. In case the property is a struct, it shows the struct members, and if it is a sequence, it shows the indices each object has in the sequence.

There are special properties, which names start with "@". This are added for convenience, so it is possible to inspect objects, that implement XIndexContainer, XNameContainer or XEnumeration interfaces. When such an object is found, the entries gathered using those interfaces are added to the tree view, and are prefixed with "@". For example "@3" (XIndexContainer)   "@PageStyles" (XNameContainer or XEnumeration). The type of the special property is written in the "Info" column ("index container", "name container" and "enumeration" flags). Note that this functionality is not present in Xray or the Macro editor's debugger, but was added for convenience.

Figure 5: "Properties" tab and the text view

On the bottom of the "Properties" tab there is a text view, which shows the full value of the current selected property. In the tree view, the value shown is always using a short form (shortened to 60 chars with new-line characters removed) to not make the tree view too wide. The full value therefor is written in the text view, where it can be inspected in full and has also a working copy/paste (see figure 5).

Object Stack

Related to the "Properties" is the object stack. It is possible to select an object in the tree view and inspect the object (either using the context menu or the toolbar "Inspect" action). In this case the object in the object inspector will change to the selected one. This is convenient when you are only interested in one object down in the tree view hierarchy and want to inspect only that. In that case the previous object will be added to the stack, and can be returned to with hitting the "Back" button in the toolbar.

Note that going to another object (not using "Inspect" action) will always remove the object stack. 

Methods

Figure 6: Object inspector "Methods" tab


The "Methods" tab contains a tree view that shows all the methods, that can be called for the current object. Each method is represented by four columns (see figure 6):
  • "Method" - name of the method
  • "Return type" - the return (simplified) type of the method
  • "Parameters" - list of input parameters, where each one lists the direction ("in", "out" or "in/out"), the parameter name and the simplified type 
  • "Implementation Class" - class/interface where the method is implemented
Currently the types of parameters and return types are simplified, with only basic types, "void", "any", "sequence" and "object" that represents all the objects and the type of the object isn't written. The reason for this is to make it easier to read. 

Future ideas

There are many improvements that can still be made, but aren't included in the current implementation. 

I think it would be quite convenient to have the ability to open a object inspector in mode-less dialog separate to the DevTools, just to quickly look up a property. 

Another big upgrade would also be the ability to change values of basic types for the properties and structs, so it is possible to quickly see what effect the change would have. Similar to changing property values is to call methods with defining the parameters, but only if the parameters are basic types.

My initial vision of DevTools was not that it will be only one tool (object inspector), but more tools just like the development tools in the browser, so I'm sure there will be more useful things integrated over time.

I think there are a lot of ideas you may also have, so please tell me if you have a good one. Of course if you find something that is not working as expected, please let me know.

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible.

Monday, March 1, 2021

Built-in "Xray" like UNO object inspector – Part 2

Since my last blog post I've been continuing the work on DevTools and since then a lot of things have progressed. Point & click has been implemented and the object inspector view has been greatly improved to show current object’s properties and methods. In this part I will mainly talk about the point & click and a bit about the current state, and in the next blog I will extensively talk about the object inspector.

Point & click

Figure 1: Current selection button


The idea of this functionality is to provide a way to inspect selected objects in the document, for example an image or a shape. For this, I have implemented a selection change listener (sfx2/source/devtools/SelectionChangeHandler.hxx), whose purpose is to listen to the selection changes that happen in the document and store the latest selection object. It is started when the DevTools docking window is instantiated and shown. I have added a new toggle button “Current Selection” (see Figure 1) to the UI. When the button is selected, it automatically shows the current selected object (gathered with the selection change listener) in the object inspector. 

Figure 2: Current selected shape's properties shown in the object inspector

In the example shown in Figure 2, we can see the shape is selected in the document and its properties are shown in the object inspector. If the "Current Selection" button wouldn't be toggled, then the document top-level object would be shown in the object inspector or the selected object in the DOM tree view.

While the "Current Selection" button is toggled, selecting any object in the DOM tree view (left-hand side tree view) has no effect, however if the current selected object is also present in the current DOM tree view, it will be selected. Note that if the object is not present in the tree, it won't be selected, because the DOM tree view will not force creation of on-demand object because of performance considerations.

Figure 3: "Inspect Object" command in "Customize" dialog 


In addition to showing the selected object, I have added a UNO command named “Inspect Object” (.uno:InspectSelectedObject), which can be added to context menus for objects (See Figure 3). The purpose of this command is to provide a way to open the DevTools docking window and automatically show the current selected object. If a user regularly uses the object inspector, this may be a more convenient way for them to access DevTools. Note that by default the command isn't added to any context menu, this is up to the user. However, if there will be demand to add this to context menus, it can be easily added. 

Figure 4: "Inspect Object" context menu entry on a shape object

The example in Figure 4 shows the context menu of a shape object, where the selected entry is the added "Inspect Object". 

From the implementation standpoint, it was necessary to move the whole DevTools from svx to sfx2 module. This was mainly necessary to get .uno:InspectSelectedObject to work, because we need to signal to DevTools that we want to show the current selection and not the document root in the object inspector. Because the svx depends on sfx2 module, it is not possible to access svx from sfx2 (only the other way around). 

Improvements to object inspector

The object inspector was previously a single tree view only, which had services, interfaces, properties and methods categories as root tree entries. This has now been changed so that the categories are now pages in a tab view, and each category has its own tree view (can be seen in Figure 2). The main problem with one tree view is that columns for each of the categories are different. For example, the properties category has object, value and type categories but the same columns make no sense for methods (which has return type and input parameters). 

For methods it now shows the method name, return type and parameters. The types are currently simplified types, which are easier to read (instead of exact type name of the object it just writes "object"), but the user will want to know the exact type too, so this is a WIP.

For properties it shows the type and value of the property, and it is possible to expand a property if the type is a complex type (object, struct) so it lists nested properties. If the value is an enum, then we get the name of the enum value automatically and show the name instead. 

Support for sequences was also added, so the sequence can be expanded and a list of indices and values is presented. If the current object supports XNameAccess or XIndexAccess, the names and indices are added into the property list, so the user can navigate to those. 

With this additions, it is already easier to inspect objects than it previously was using the Xray tool, and I'm sure it will get even better when it is finished. 

Next steps

The object inspector is already in a very good shape so I encourage everyone to try it and give feedback, what can be improved, changed or added - especially if you use Xray or MRI regularly. 

For the next steps the major focus will be to fix a couple of bugs and crashes (mainly due to missing checks if objects are available), work on the UI, object stack (so it is possible to go back to the previous object) and finalizing all the features of the object inspector. 

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible. 

To be continued...

Thursday, January 21, 2021

Built-in "Xray" like UNO object inspector – Part 1

When developing macros and extensions in LibreOffice it is very useful to have an object inspector. With that tool you inspect the object tree and the methods, properties and interfaces of individual objects to understand its structure. There are quite some different object inspectors available for LibreOffice as extension. Probably the best known is called “XrayTool”, but there were also others like MRI, which was build for a similar purpose and various other more simple object inspectors (for example one is provided as an code example in ODK).

As a tool like this is very valuable it makes sense that it would be provided out-of-the-box by LibreOffice without the need for the user to get it separately. Also the user could even be unaware the such a tool exists.  For this reasons The Document Foundation (TDF) put up a tender to create a built-in Xray like UNO object inspector, which was awarded to Collabora and we are now in the process of implementing it.

Thank you TDF and the donating users of LibreOffice to make the work on this tool possible.

The Plan

The major gripe with Xray and other tools is that the user needs to go into the macro editor and run a the script with the inspecting object as the parameter. 

For example to inspect the root UNO object:

Xray ThisComponent

or for example to inspect a specific (first) Calc sheet:

Document = ThisComponent
AllSheets = Document.getSheets()
MySheet = AllSheets.getByIndex(0)
Xray MySheet

Figure 1: Watch window in the macro editor

We can do much better than this. So the idea is to have a object inspector tool as a dockable bottom widget, very similar to “developer tools” that you can find in popular web browsers.

The left hand side of the tool will have a subset of the document model (DOM) presented as a tree and on the right hand side a object inspector view that will show the current UNO object’s properties, methods in a tree view that is taken from the Watch window in the LibreOffice macro editor (see Figure 1). When the user would select an UNO object in the DOM tree view (left-hand side), the object inspector (right-hand side) would show it.

An additional functionality that is present in developer tools in web browsers is the ability to point & click an object directly in the document, which can then be inspected in the developer tool. A similar functionality to this will be possible in LibreOffice, where you will be able to select an object in the document, that is then be shown in the object inspector (right-hand side).

Implementation so-far

The implementation has been divided into 4 parts:

  1. Introduce a new dockable window on the bottom of the UI.
  2. DOM tree view (left-hand side)
  3. Implement point&click functionality.
  4. Object inspector view (right-hand side)

Currently the 1. and 2. part have been implemented, so this blog post serves as a report of the work that has been done until now. For 3. and 4. part, a limited subset of the functionality has been  already implemented, but it is not yet expected to work correctly and may change a lot in the future.

Dockable window

Figure 2: Development Tool in the menu

The dockable window has been added to all components of LibreOffice, which can be enabled in the menu (see Figure 2) under Help / Development tool (this location and the name can still change). The development tool code is all contained under svx/source/devtools/ where the docking window is implemented in DevelopmentToolDockingWindow.cxx.

DOM tree view (Left-hand)

Figure 3: Development tool dockable window

The idea of the DOM tree view is to provide a useful subset of the DOM objects, that the user can quickly access and inspect. Without this, the users would have to traverse to the objects they are interested in inside the object inspector itself, which may not be as straight forward to the users that aren’t familiar with the DOM, and even if they are, the DOM subset still makes it easier.

The DOM tree view as currently implemented (see Figure 3), has a root “Document” element always available, which represents the root document UNO object, from where you can traverse to any other relevant UNO object in the DOM tree. Other available objects depend on the component.

In Writer the available objects are:

  • Paragraphs
  • Shapes
  • Tables
  • Frames
  • Graphic Objects
  • Embedded Objects (OLE)
  • Style Families & Styles

In Calc:

  • Sheets
  • Shapes (per sheet)
  • Charts (per sheet)
  • Pivot Tables (per sheet)
  • Style Families & Styles

In Impress and Draw:

  • Pages / Slides
  • Shapes (per page / slide)
  • Master Slides
  • Style Families & Styles

If there are other object(s) you would like to see in the DOM tree view, please let me know.

The main implementation file for the left-side DOM tree view is DocumentModelTreeHandler.cxx in the svx/source/devtools/ folder. For each node of the tree view, there is an object attached (subclass  DocumentModelTreeEntry) to the node, which is responsible to fill the child nodes and provide the UNO object of the current node. The whole DOM tree view is populated on-demand, when the tree is expanded. This makes sure that we don’t take a lot of time inserting the whole object tree before-hand and have a more up-to-date view of the UNO objects. 

All the code is present in the LibreOffice master repository. Please try out the development tools in a recent daily build. Any comments or suggestion are welcome. 

Next part - point&click functionality

Partial support for the point&click has already been added. There is a selection listener added, which remembers what the selected object in the document is. This object is the available in the DOM tree view under the name “Current Selection”. If the user clicks on the node in the tree view, then the right-hand object inspector shows the UNO object.

To be continued..