Monday, March 7, 2022

Sparklines in Calc

Sparklines are mini charts available in OOXML (XLSX) documents, but until now they were not supported by LibreOffice Calc. Thanks to funding from NGI, this missing feature is now being implemented.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871498






To add support for sparklines in LibreOffice, we first need to read them into the LibreOffice data model, but a data model for sparklines doesn't exist yet, so we need to create that first. A sparkline is defined for one cell, but multiple sparklines can be combined into a group, which shares the same rendering properties. The only data that is unique to an individual sparkline is the data range that it uses for rendering.

With this in mind, we create a data model that consists of the classes SparklineCell -> Sparkline -> SparklineGroup, where SparklineCell is "added" to a cell and just holds a pointer to the Sparkline.
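The relationship between the three classes can be illustrated with a minimal Python sketch. The real implementation is C++ inside LibreOffice; the class names follow the text, but all attribute names here (type, markers, data_range and so on) are hypothetical and only stand in for the shared rendering properties and the per-sparkline data range.

```python
from dataclasses import dataclass
from enum import Enum


class SparklineType(Enum):
    # The three sparkline types that OOXML supports.
    LINE = "line"
    COLUMN = "column"
    STACKED = "stacked"


@dataclass
class SparklineGroup:
    """Rendering properties shared by every sparkline in the group (hypothetical fields)."""
    type: SparklineType = SparklineType.LINE
    markers: bool = False
    right_to_left: bool = False


@dataclass
class Sparkline:
    """One sparkline: its own data range plus a reference to its group."""
    data_range: str
    group: SparklineGroup


@dataclass
class SparklineCell:
    """Attached to a cell; it only points at the sparkline."""
    sparkline: Sparkline


# Two cells whose sparklines share one group but use different data ranges.
group = SparklineGroup(type=SparklineType.COLUMN)
cells = {
    "A2": SparklineCell(Sparkline("B2:F2", group)),
    "A3": SparklineCell(Sparkline("B3:F3", group)),
}

# Changing a group property affects every sparkline in that group.
group.markers = True
```

Because both sparklines reference the same group object, a property change is made once and is seen by all members, while each sparkline keeps its own data range.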

There are 3 types of sparklines supported in OOXML: line, column and stacked. The "line" type renders a line for the data range, the "column" type shows each data point as a column bar, and "stacked" is a win/loss column bar that shows whether each data point is positive or negative.


Figure 1: "Line" and "Column" sparklines in LibreOffice Calc

There are many sparkline properties that can be customised by the user. These are some of them:
  • First Point, Last Point - if enabled, it shows the first and/or last point in a different custom color (See Figure 1 - "column" sparkline in A3 cell, first is green and last is blue).
  • High Point, Low Point - if enabled, it shows the highest point and/or lowest point in a different custom color (See Figure 1 - "column" sparkline at A6 cell, low points are in yellow).
  • Negative Point - if enabled, it shows the negative points in a different custom color (See Figure 1 - "column" sparkline at A6 cell, negative points are red).
  • Markers - if enabled, it shows the markers (only for "line" type) (See Figure 1 - "line" sparkline at A2 cell - shows markers).
  • Axis - if enabled, it shows the axis line
  • Right-to-left - if enabled, it shows the data in the right to left order
  • ...

Figure 2: Horizontal and vertical sparklines of all 3 types


Once we have the data model ready, we can render the sparklines in the cell area. Figure 1 and Figure 2 show how this currently looks in Calc. Figure 1 shows examples of "line" and "column" sparklines (2 of each), and Figure 2 shows all three types for a block of (random) data (horizontally and vertically).

Currently the code for this is in a feature branch (feature/sparklines), but it is in the process of being upstreamed to master. The feature will be available in LibreOffice 7.4.

Next step

With the current implementation, we can open an XLSX document with sparklines and they will be rendered in LibreOffice Calc, but we can't save the document and preserve the sparklines yet. It is also not possible to create new sparklines from scratch or change any of the properties yet (there is no UI for it). ODF support is also missing.

These things will be implemented in the following weeks, and when they are ready, I will blog about them again.

Monday, September 13, 2021

Document searching and indexing export - Part 3

Milestone 3 - Gluing everything together into a search solution

In part 1 we looked into the indexing XML export, and in part 2 into rendering a search result as an image. In this part we will glue both together with an indexing search engine (Solr) into a full search solution and introduce a "proof of concept" application for searching documents.

Thanks to NLnet Foundation for sponsoring this work.

Solr search platform

Apache Solr is a popular search platform, and the idea is to use it as our search and indexing engine. First we need to figure out how to put the indexing data from our indexing XML into Solr. Solr uses the concept of documents (not to be confused with LibreOffice documents): a document is an entry in the database and can contain multiple fields. To add documents to the database, we can use a specially structured Solr XML file (other formats, like JSON, are also supported) and simply send it using an HTTP POST request.

So we need to convert our indexing XML into the Solr structure, which is done like this:
  • Each paragraph or object is a Solr document.
  • Each attribute of a paragraph or object is a field of the Solr document.
  • The paragraph text is stored in a "content" field.
  • An additional field is "filename", which is the name of the source (Writer) document.
For example:
    <paragraph index="9" node_type="writer">Lorem ipsum</paragraph>

transforms to:
    <add>
      <doc>
        <field name="filename">Lorem.odt</field>
        <field name="type">paragraph</field>
        <field name="index">9</field>
        <field name="node_type">writer</field>
        <field name="content">Lorem ipsum</field>
      </doc>
      ...
    </add>
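The mapping above can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is only an illustration and not the code used in the web app: the function name indexing_to_solr is hypothetical, and for brevity it only handles paragraph elements and ignores the enclosing objects.

```python
import xml.etree.ElementTree as ET


def indexing_to_solr(indexing_xml: str, filename: str) -> str:
    """Convert indexing XML into a Solr <add> document (paragraphs only)."""
    root = ET.fromstring(indexing_xml)
    add = ET.Element("add")
    for node in root.iter("paragraph"):
        doc = ET.SubElement(add, "doc")
        # Field order: source file name, element type, the element's
        # attributes, and finally the paragraph text as "content".
        fields = {
            "filename": filename,
            "type": node.tag,
            **node.attrib,
            "content": node.text or "",
        }
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


sample = (
    '<indexing>'
    '<paragraph index="9" node_type="writer">Lorem ipsum</paragraph>'
    '</indexing>'
)
solr_xml = indexing_to_solr(sample, "Lorem.odt")
```

The resulting string is what would be sent to Solr in the body of the HTTP POST request.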

Searching using Solr

Solr has an extensive API for querying/searching, but for our needs we only need a small subset of it. Searching is done by sending an HTTP GET request to the Solr server. For example, with the following URL in the browser:

http://localhost:8983/solr/documents/select?q=content:Lorem*

"documents" in the URL is the name of the collection (where we put our index data), "q" parameter is the query string, "content" is the field we want to search in (we put the paragraphs text in "content" field) and "Lorem*" is the expression we want to search for.
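As a small illustration, the same URL can be assembled in Python from its parts. The helper name solr_select_url is hypothetical; only the URL layout (collection name, "q" parameter, field and expression) comes from the example above.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, collection: str, field: str, expression: str) -> str:
    """Build a Solr "select" URL that searches one field for an expression."""
    # Keep ":" and "*" readable; everything else is percent-encoded as usual.
    query = urlencode({"q": f"{field}:{expression}"}, safe=":*")
    return f"{base_url}/solr/{collection}/select?{query}"


url = solr_select_url("http://localhost:8983", "documents", "content", "Lorem*")
```

The URL can then be fetched with any HTTP client, for example urllib.request.urlopen, when a Solr server is running at that address.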


Proof of concept web application

Figure 1: Search "proof of concept" web application


The application is written in Python on the server side (processing and HTTP server) and in HTML+JavaScript on the client side, using AngularJS (for data binding, REST services) and Bootstrap (UI). The purpose of the web app is to demonstrate how to implement searching and rendering in other web applications.

The web app (see Figure 1) shows a list of documents in a configurable folder, where each document can be opened in a Collabora Online instance. At the top there is an edit field and the "Search" button, with which we can search the documents, and a "Re-Index Documents" button, which triggers re-indexing of all the documents.
Figure 2: Search "proof of concept" web application - Search Results

After we enter a search expression and click the "Search" button, we get a page with the search results. It is a table with the document filename and the rendered image of the place in the document where the search result has been found. See Figure 2 for an example.
There is a "Clear" button at the bottom, which clears the search results and shows the initial list of documents again.

About Server.py - REST and HTTP server

The server has the following services:
  • Provide the HTML and JS documents to the browser, so the web app can be shown
  • GET service "/document" - returns a list of documents
  • POST service "/search" - triggers a query in Solr and returns the result
  • POST service "/reindex" - triggers the re-indexing process
  • POST service "/image" - triggers rendering of an image for the input search result, and returns the image as base64 encoded string

Re-indexing service

Re-indexing glues together three steps: calling the "convert-to" service of the Collabora Online server to get the indexing XML for an input document, converting the indexing XML to Solr supported XML, and updating the entries in the Solr server.

Search service

The search service uses the Solr query REST service to search, and transforms the result into a JSON format that we can use in the web app and that can also be used as the input for rendering a search result.

Image service

The search result and the document are sent to the "render-search-result" HTTP POST service on the Collabora Online server, which renders the image of the search result and sends it back. For easier use in the web client, the image is converted to a base64 string.
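The base64 step can be sketched with Python's standard base64 module. The function name and the placeholder bytes are hypothetical; in the real service the input would be the image bytes returned by "render-search-result".

```python
import base64


def image_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string for a JSON response."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder bytes standing in for the image returned by the service.
encoded = image_to_base64(b"\x89PNG\r\n\x1a\n")
```

A string like this can be put directly into a JSON reply, and the web client can show it by using it as a "data:" URL for an image element.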

Demo video

Video showing searching in the WebApp:


Video showing re-indexing in the WebApp:




Proof of concept web app source location and relevant commits

The proof of concept web application is located in the Collabora Online source tree inside the indexing sub-folder. Please check the README file on how to start it up.

Collabora Online:

Fixes and changes for LibreOffice core:




Tuesday, August 17, 2021

Document searching and indexing export - Part 2

Milestone 2 - Rendering an image of the search result


In part 1, I talked about the functionality added to LibreOffice to create an indexing XML file from the document, which can be fed into a search indexing engine. After we search, we expect that a search hit will contain the internal node information that was added to the indexing XML file. The next step is to use that information to render that part of the document into an image.

Thanks to NLnet Foundation for sponsoring this work.

Figure 1: Example of a rectangle for a search string

Calculating the result rectangle

To render an image, we first need to get the area of the document where the search hit is located. This is implemented in the SearchResultLocator class, which takes SearchIndexData that contains the internal model index of the hit location (object and paragraph). The algorithm then finds the location of the paragraph in the document model and determines the rectangle of the paragraph.

The search hit can span over multiple paragraphs, so we need to handle multiple hit locations. With that we get multiple rectangles, which need to be combined into the final rectangle (union of all rectangles). See figure 1 for an example.
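The union of rectangles can be illustrated with a short Python sketch. The real code is C++ in LibreOffice core and uses its own rectangle types; the Rect class and the union_rect function below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def union_rect(rects: Iterable[Rect]) -> Optional[Rect]:
    """Return the smallest rectangle that encloses all given rectangles."""
    rects = list(rects)
    if not rects:
        return None  # no hit locations, nothing to render
    return Rect(
        left=min(r.left for r in rects),
        top=min(r.top for r in rects),
        right=max(r.right for r in rects),
        bottom=max(r.bottom for r in rects),
    )


# Two paragraph rectangles of one search hit that spans both paragraphs.
hit_area = union_rect([Rect(10, 100, 200, 120), Rect(10, 125, 150, 145)])
```

The resulting rectangle starts at the top-left corner of the first paragraph and ends at the bottom-right extent of both, which is the area that then gets rendered.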

Rendering the image from the rectangle in LOKit

This part is implemented for the LOKit API, which can already handle rendering part of the document with an existing API, using rendering of the tiles.

The new function added to the API is:
bool renderSearchResult(const char* pSearchResult, unsigned char** pBitmapBuffer, int* pWidth, int* pHeight, size_t* pByteSize);

The method renders an image for the search result. The input is pSearchResult (XML), while pBitmapBuffer, pWidth, pHeight and pByteSize are output parameters.

If the command succeeded, the function returns true, the pBitmapBuffer contains the raw image, pWidth and pHeight contain the width and height of the image in pixels, and pByteSize the byte size of the image. 

Internally, the function parses the content of pSearchResult with an XML parser, so that a SearchIndexData can be created and sent to SearchResultLocator to get the rectangle of the search hit area. A call to doc_paintTile then renders the part of the document enclosed by the rectangle to the input pBitmapBuffer.

See desktop/source/lib/init.cxx - function "doc_renderSearchResult"

Collabora Online service "render-search-result"

To actually be useful, we need to provide the functionality in a form that can be "glued" together with the search provider and indexer to show the rendered image of the search hit from the document. For this we have implemented a service in Collabora Online. The service is just an HTTP POST request/response: in the request we send the document and the search result to the service, and the response is the image.

What the service does is:
  • load the document
  • run the "renderSearchResult" with the search result XML
  • interpret the bitmap and encode into the PNG format
  • return the PNG image
For an example of how the service can be used, see in the Collabora Online repository: test/integration-http-server.cpp - test method HTTPServerTest::testRenderSearchResult

The following commits are implementing this milestone 2 functionality:

Core:
 

Thursday, July 1, 2021

Document searching and indexing export - Part 1

About the idea

Searching for a phrase in multiple documents is not a new thing and many implementations exist, however such searching will usually only tell you whether and roughly where in the documents a searched phrase exists. With Collabora Online and LibreOffice we can do better than this and in addition provide the search result in the form of a thumbnail of the search location. In this way it is easier for the user to see the context in which the searched phrase is located. For example, whether it is located in a table, shape, footer/header, or whether it is figure text or maybe the "alt" text of an image.

Thanks to the sponsor of the work - NLnet Foundation, we are implementing this solution for Writer documents.

The solution consists of 3 parts:

  • preparing the data for indexing, 
  • indexing and searching 
  • rendering of the result
Preparing the data for indexing and rendering of the search result is done in LibreOffice core, while the actual indexing and searching is delegated to one of the existing indexing and searching databases / frameworks (we will provide support for Apache Solr). 

In this post I will describe what has been done for milestone 1.

Milestone 1 - preparing data for indexing

Indexing data usually consists of (enriched) text, however in our case we also need to provide additional internal information about where the text is located, so it is possible to later go to the search result location and create a thumbnail of the document. In Writer we can provide the node index of the paragraph, with which it is possible to quickly identify the text in the document model and generate a thumbnail of the area around the text.

The data for indexing is provided by an "indexing export" filter in LibreOffice, which creates an XML document with a custom structure. The root element is <indexing> and the child elements are paragraphs with index and text, which can be nested in sub-elements (like image, shape, table, section) depending on where the paragraph is located.

For example:

 <?xml version="1.0" encoding="UTF-8"?>
<indexing>
 <paragraph index="6">Drawing : Just a Diamond</paragraph>
 <paragraph index="12"></paragraph>
 <shape name="Circle" alt="" description="">
  <paragraph index="0">This is a circle</paragraph>
  <paragraph index="1">This is a second paragraph</paragraph>
 </shape>
 <shape name="Diamond" alt="" description="">
  <paragraph index="0">This is a diamond</paragraph>
 </shape>
 <shape name="Text Frame 1" alt="" description="">
  <paragraph index="0">This is a TextBox - Para1</paragraph>
  <paragraph index="1">Para2</paragraph>
  <paragraph index="2">Para3</paragraph>
 </shape>
</indexing>
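To show how a consumer could read this structure, here is a small Python sketch that lists every paragraph together with the element it is nested in. It is not part of LibreOffice; the function name list_paragraphs and the "body" label for top-level paragraphs are made up for this example.

```python
import xml.etree.ElementTree as ET


def list_paragraphs(indexing_xml: str):
    """Return (container, index, text) for every paragraph in the indexing XML."""
    root = ET.fromstring(indexing_xml)
    hits = []

    def walk(element, container):
        for child in element:
            if child.tag == "paragraph":
                hits.append((container, int(child.get("index")), child.text or ""))
            else:
                # A nested object such as a shape: remember it as the container.
                walk(child, f'{child.tag}:{child.get("name", "")}')

    walk(root, "body")
    return hits


sample = (
    '<indexing>'
    '<paragraph index="6">Drawing : Just a Diamond</paragraph>'
    '<shape name="Circle" alt="" description="">'
    '<paragraph index="0">This is a circle</paragraph>'
    '</shape>'
    '</indexing>'
)
paragraphs = list_paragraphs(sample)
```

The container together with the paragraph index is the kind of location information that a search hit has to carry back, so that the right part of the document can be found again.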

The indexing export is built upon a ModelTraverser class, which was created for the indexing purpose, but can be reused for other purposes (it is similar to what AccessibilityCheck does, but generalised, so AccessibilityCheck can in the future be refactored to use it).

The purpose of ModelTraverser is to traverse the Writer document model and provide SwNode and SdrObject instances to the consuming objects - in our case the IndexingExport class, which extracts the text from those objects (depending on the object type) and, with the help of an XmlWriter, writes the indexing data to the XML file.

Indexing export filter can be tested with the LibreOffice command line "convert-to" tool in the following way:

soffice --convert-to xml:writer_indexing_export <Writer document file path>


The commits implementing this milestone 1 functionality:

In the next milestone, we will render the thumbnail with the provided search result data.


To be continued...

Thursday, May 13, 2021

Command Popup HUD for LibreOffice

Command Popup is a pop-up window that lets you search for commands that are present in the main menu and run them. This was requested in bug tdf#91874, which over time accumulated over 14 duplicate bug reports, so it was a much-requested feature.

I'm intrigued by similar functionality in other programs, because it enables very quick access to commands (or programs) and at the same time you don't need to move your hands off the keyboard. It also makes it easy to search for commands - especially in an application like LibreOffice with its humongous main menu. So I decided to try to implement it for LibreOffice.

Figure 1: Command Popup window

I was working on it here and there in my free time and managed to make it work as I imagined, however it was very rough around the edges and needed a lot of polish. Luckily in April, we had a hack week at Collabora, where I decided to use some time to work on finishing the command popup. I dusted off the old code, converted it to use the weld framework for widgets and fixed many bugs, but I didn't manage to finish it completely, so it took until recently before I actually pushed the code upstream into master.

The main UX focus is on easy searching and navigation with the keyboard. When the Command Popup is focused, all keyboard events should go to the search edit box, so it is possible to change the search term, however hitting up/down should change the selection in the tree view, where the search results are shown, and enter should execute the command. Getting this to work correctly was quite a challenge, but I eventually found the correct formula after trying some different ideas. Of course using the mouse should still work as well.

To show the Command Popup, there is a menu entry in "Help > Search Commands", which is by default bound to the "Ctrl+F1" shortcut (however this may change).

The Command Popup will be available in LibreOffice 7.2, but if you want to try it out, you can get the current daily build, or wait for the LibreOffice 7.2 Alpha1. Any suggestions and comments are welcome. If you find a bug, please report it in the LibreOffice bugzilla page.


Wednesday, March 24, 2021

Built-in "Xray" like UNO object inspector – Part 3

DevTools implementation has been completed and this is the third and final part of the mini-series. The focus of this part is on the object inspector, but I have also improved or changed other DevTools parts, so first I will briefly mention those.

New menu position


Figure 1: Development Tools location in the menu

The position in the menu has changed from "Help / Development Tools" to "Tools / Development Tools". The new position fits better, as it is near the macro editor, and the two go hand in hand.

Updates to the document model tree view



Figure 2: Document model tree view and "Current Selection" toggle button

The left-hand side document model tree view has been changed to use the same approach as the object inspector tree view, where the object attached to a tree view node (in this case DocumentModelTreeEntry) has all the behavioural logic for that specific node. This makes it easier to manipulate the tree view and add new nodes (if necessary).

In the document object tree view I have added a top toolbar with a "Refresh" (icon) button and I changed the existing “Current Selection” button to a toolbar toggle button, so it looks more consistent, as there is only the toolbar now (see figure 2).

In Writer, each paragraph tree view node now has text portion child nodes (as shown in figure 2), to make it possible to quickly inspect all the individual parts of a paragraph.

The object inspector tree view

Figure 3: Top toolbar and tab bar in object inspector


The object inspector shows various information about an object. Each object has an implementation class, which is always shown for the current object (see figure 3).

The other information that the object inspector shows is divided into four main categories:
  • "Interfaces" - the interfaces that the current object implements
  • "Services" - the services that the current object supports
  • "Properties" - the properties of the current object
  • "Methods" - the combined methods that can be called on the current object
In the user interface, the categories are separated with a tab bar (see figure 3). Each tab represents a different category. The tabs are filled on "entry" - when the user clicks on the tab.

In the code the tabs and tree views are all handled by the ObjectInspectorTreeHandler and the hierarchy of objects attached to the tree view that implement the ObjectInspectorNodeInterface (see include/sfx2/devtools/ObjectInspectorTreeHandler.hxx).

The two major categories are "Properties" and "Methods", which I describe in more detail next.

Properties

Figure 4: Object inspector "Properties" tab


There are three types of properties of an object:
  • Properties accessible via XPropertySet.
  • Properties defined as an attribute (marked "[attribute]" in IDL). 
  • Pseudo properties defined by a get and set method. 
All three types are represented in the properties tab and can be identified by the "Info" column. Attribute properties have an "attribute" flag, pseudo properties have either a "get" flag, a "set" flag or both. If none of these flags is present, the property comes from XPropertySet.

In the tree view there are four columns for each property - "Name", "Value", "Type" and the already mentioned "Info". The "Name" of the property is always available, but it is possible that two properties have the same name (because of multiple property types). The "Value" shows the value of the property, which is converted to a string if this is possible (it should be if the property is of a basic type), otherwise a representation string is created. The representation string for objects shows the implementation name ("<Object@SwXTextRange>"), for sequences it shows the size ("<Sequence [5]>") and for structs it just mentions the type ("<Struct>").

A node in the “Properties” tree view can be expanded (if offered) so it is possible to recursively inspect the objects further. In case the property is a struct, it shows the struct members, and if it is a sequence, it shows the index of each object in the sequence.

There are special properties whose names start with "@". These are added for convenience, so it is possible to inspect objects that implement the XIndexContainer, XNameContainer or XEnumeration interfaces. When such an object is found, the entries gathered using those interfaces are added to the tree view and are prefixed with "@". For example, "@3" (XIndexContainer) or "@PageStyles" (XNameContainer or XEnumeration). The type of the special property is written in the "Info" column ("index container", "name container" and "enumeration" flags). Note that this functionality is not present in Xray or the Macro editor's debugger, but was added for convenience.

Figure 5: "Properties" tab and the text view

At the bottom of the "Properties" tab there is a text view, which shows the full value of the currently selected property. In the tree view, the value is always shown in a short form (shortened to 60 chars with new-line characters removed) to not make the tree view too wide. The full value is therefore written in the text view, where it can be inspected in full and where copy/paste also works (see figure 5).
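The short form can be described with a tiny Python sketch. The actual implementation is C++ and may differ in details; for example, whether an ellipsis is appended is not covered here. The function name short_value is hypothetical.

```python
def short_value(value: str, limit: int = 60) -> str:
    """Return the value with new-line characters removed, cut to `limit` chars."""
    flat = value.replace("\r", "").replace("\n", "")
    return flat[:limit]


long_text = "First line\nSecond line\n" * 10
shortened = short_value(long_text)
```

The full, unmodified string is what would go into the text view below the tree.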

Object Stack

Related to the "Properties" tab is the object stack. It is possible to select an object in the tree view and inspect the object (either using the context menu or the toolbar "Inspect" action). In this case the object in the object inspector will change to the selected one. This is convenient when you are only interested in one object down in the tree view hierarchy and want to inspect only that. In that case the previous object will be added to the stack, and you can return to it by hitting the "Back" button in the toolbar.

Note that going to another object (not using "Inspect" action) will always remove the object stack. 
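The behaviour of the object stack can be modelled with a few lines of Python. This is only a model of what is described above, not the DevTools code; the class and method names are hypothetical.

```python
class ObjectStack:
    """Tracks the inspected object and the objects visited before it."""

    def __init__(self, root):
        self.current = root
        self._stack = []

    def inspect(self, obj):
        """The "Inspect" action: remember the current object, then switch."""
        self._stack.append(self.current)
        self.current = obj

    def back(self):
        """The "Back" button: return to the previously inspected object."""
        if self._stack:
            self.current = self._stack.pop()

    def go_to(self, obj):
        """Any other way of changing the object clears the stack."""
        self._stack.clear()
        self.current = obj

    def can_go_back(self) -> bool:
        return bool(self._stack)


inspector = ObjectStack("document")
inspector.inspect("paragraph")
inspector.inspect("text portion")
inspector.back()  # current is "paragraph" again
```

Calling go_to at this point would show the new object and drop the remaining history, so "Back" would no longer have anything to return to.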

Methods

Figure 6: Object inspector "Methods" tab


The "Methods" tab contains a tree view that shows all the methods that can be called on the current object. Each method is represented by four columns (see figure 6):
  • "Method" - name of the method
  • "Return type" - the return (simplified) type of the method
  • "Parameters" - list of input parameters, where each one lists the direction ("in", "out" or "in/out"), the parameter name and the simplified type 
  • "Implementation Class" - class/interface where the method is implemented
Currently the parameter and return types are simplified: only basic types, "void", "any" and "sequence" are shown as such, while "object" stands for all objects and the exact type of the object isn't written. The reason for this is to make them easier to read.

Future ideas

There are many improvements that can still be made, but aren't included in the current implementation. 

I think it would be quite convenient to have the ability to open an object inspector in a modeless dialog separate from the DevTools, just to quickly look up a property.

Another big upgrade would be the ability to change the values of basic types for properties and structs, so it is possible to quickly see what effect the change would have. Similar to changing property values would be calling methods with user-defined parameters, but only if the parameters are basic types.

My initial vision of DevTools was not that it will be only one tool (object inspector), but more tools just like the development tools in the browser, so I'm sure there will be more useful things integrated over time.

I think there are a lot of ideas you may also have, so please tell me if you have a good one. Of course if you find something that is not working as expected, please let me know.

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible.

Monday, March 1, 2021

Built-in "Xray" like UNO object inspector – Part 2

Since my last blog post I've been continuing the work on DevTools and since then a lot of things have progressed. Point & click has been implemented and the object inspector view has been greatly improved to show current object’s properties and methods. In this part I will mainly talk about the point & click and a bit about the current state, and in the next blog I will extensively talk about the object inspector.

Point & click

Figure 1: Current selection button


The idea of this functionality is to provide a way to inspect selected objects in the document, for example an image or a shape. For this, I have implemented a selection change listener (sfx2/source/devtools/SelectionChangeHandler.hxx), whose purpose is to listen to the selection changes that happen in the document and store the latest selection object. It is started when the DevTools docking window is instantiated and shown. I have added a new toggle button “Current Selection” (see Figure 1) to the UI. When the button is selected, it automatically shows the current selected object (gathered with the selection change listener) in the object inspector. 

Figure 2: Current selected shape's properties shown in the object inspector

In the example shown in Figure 2, we can see that the shape is selected in the document and its properties are shown in the object inspector. If the "Current Selection" button weren't toggled, the object inspector would show either the document top-level object or the object selected in the DOM tree view.

While the "Current Selection" button is toggled, selecting any object in the DOM tree view (left-hand side tree view) has no effect, however if the currently selected object is also present in the current DOM tree view, it will be selected. Note that if the object is not present in the tree, it won't be selected, because for performance reasons the DOM tree view will not force the creation of on-demand objects.

Figure 3: "Inspect Object" command in "Customize" dialog 


In addition to showing the selected object, I have added a UNO command named “Inspect Object” (.uno:InspectSelectedObject), which can be added to context menus for objects (See Figure 3). The purpose of this command is to provide a way to open the DevTools docking window and automatically show the current selected object. If a user regularly uses the object inspector, this may be a more convenient way for them to access DevTools. Note that by default the command isn't added to any context menu, this is up to the user. However, if there will be demand to add this to context menus, it can be easily added. 

Figure 4: "Inspect Object" context menu entry on a shape object

The example in Figure 4 shows the context menu of a shape object, where the selected entry is the added "Inspect Object". 

From the implementation standpoint, it was necessary to move the whole of DevTools from the svx module to the sfx2 module. This was mainly necessary to get .uno:InspectSelectedObject to work, because we need to signal to DevTools that we want to show the current selection and not the document root in the object inspector. Because svx depends on the sfx2 module, it is not possible to access svx from sfx2 (only the other way around).

Improvements to object inspector

The object inspector was previously a single tree view only, which had services, interfaces, properties and methods categories as root tree entries. This has now been changed so that the categories are pages in a tab view, and each category has its own tree view (can be seen in Figure 2). The main problem with one tree view is that the columns for each of the categories are different. For example, the properties category has object, value and type columns, but the same columns make no sense for methods (which have return type and input parameters).

For methods it now shows the method name, return type and parameters. The types are currently simplified types, which are easier to read (instead of exact type name of the object it just writes "object"), but the user will want to know the exact type too, so this is a WIP.

For properties it shows the type and value of the property, and it is possible to expand a property if the type is a complex type (object, struct) so it lists nested properties. If the value is an enum, then we get the name of the enum value automatically and show the name instead. 

Support for sequences was also added, so the sequence can be expanded and a list of indices and values is presented. If the current object supports XNameAccess or XIndexAccess, the names and indices are added into the property list, so the user can navigate to those. 

With these additions, it is already easier to inspect objects than it previously was using the Xray tool, and I'm sure it will get even better when it is finished.

Next steps

The object inspector is already in very good shape, so I encourage everyone to try it and give feedback on what can be improved, changed or added - especially if you use Xray or MRI regularly.

For the next steps the major focus will be to fix a couple of bugs and crashes (mainly due to missing checks if objects are available), work on the UI, object stack (so it is possible to go back to the previous object) and finalizing all the features of the object inspector. 

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible. 

To be continued...