Monday, March 1, 2021

Built-in "Xray" like UNO object inspector – Part 2

Since my last blog post I've been continuing the work on DevTools and since then a lot of things have progressed. Point & click has been implemented and the object inspector view has been greatly improved to show current object’s properties and methods. In this part I will mainly talk about the point & click and a bit about the current state, and in the next blog I will extensively talk about the object inspector.

Point & click

Figure 1: Current selection button


The idea of this functionality is to provide a way to inspect selected objects in the document, for example an image or a shape. For this, I have implemented a selection change listener (sfx2/source/devtools/SelectionChangeHandler.hxx), whose purpose is to listen to the selection changes that happen in the document and store the latest selection object. It is started when the DevTools docking window is instantiated and shown. I have added a new toggle button “Current Selection” (see Figure 1) to the UI. When the button is selected, it automatically shows the current selected object (gathered with the selection change listener) in the object inspector. 

Figure 2: Current selected shape's properties shown in the object inspector

In the example shown in Figure 2, we can see the shape is selected in the document and its properties are shown in the object inspector. If the "Current Selection" button wouldn't be toggled, then the document top-level object would be shown in the object inspector or the selected object in the DOM tree view.

While the "Current Selection" button is toggled, selecting any object in the DOM tree view (left-hand side tree view) has no effect, however if the current selected object is also present in the current DOM tree view, it will be selected. Note that if the object is not present in the tree, it won't be selected, because the DOM tree view will not force creation of on-demand object because of performance considerations.

Figure 3: "Inspect Object" command in "Customize" dialog 


In addition to showing the selected object, I have added a UNO command named “Inspect Object” (.uno:InspectSelectedObject), which can be added to context menus for objects (See Figure 3). The purpose of this command is to provide a way to open the DevTools docking window and automatically show the current selected object. If a user regularly uses the object inspector, this may be a more convenient way for them to access DevTools. Note that by default the command isn't added to any context menu, this is up to the user. However, if there will be demand to add this to context menus, it can be easily added. 

Figure 4: "Inspect Object" context menu entry on a shape object

The example in Figure 4 shows the context menu of a shape object, where the selected entry is the added "Inspect Object". 

From the implementation standpoint, it was necessary to move the whole DevTools from svx to sfx2 module. This was mainly necessary to get .uno:InspectSelectedObject to work, because we need to signal to DevTools that we want to show the current selection and not the document root in the object inspector. Because the svx depends on sfx2 module, it is not possible to access svx from sfx2 (only the other way around). 

Improvements to object inspector

The object inspector was previously a single tree view only, which had services, interfaces, properties and methods categories as root tree entries. This has now been changed so that the categories are now pages in a tab view, and each category has its own tree view (can be seen in Figure 2). The main problem with one tree view is that columns for each of the categories are different. For example, the properties category has object, value and type categories but the same columns make no sense for methods (which has return type and input parameters). 

For methods it now shows the method name, return type and parameters. The types are currently simplified types, which are easier to read (instead of exact type name of the object it just writes "object"), but the user will want to know the exact type too, so this is a WIP.

For properties it shows the type and value of the property, and it is possible to expand a property if the type is a complex type (object, struct) so it lists nested properties. If the value is an enum, then we get the name of the enum value automatically and show the name instead. 

Support for sequences was also added, so the sequence can be expanded and a list of indices and values is presented. If the current object supports XNameAccess or XIndexAccess, the names and indices are added into the property list, so the user can navigate to those. 

With this additions, it is already easier to inspect objects than it previously was using the Xray tool, and I'm sure it will get even better when it is finished. 

Next steps

The object inspector is already in a very good shape so I encourage everyone to try it and give feedback, what can be improved, changed or added - especially if you use Xray or MRI regularly. 

For the next steps the major focus will be to fix a couple of bugs and crashes (mainly due to missing checks if objects are available), work on the UI, object stack (so it is possible to go back to the previous object) and finalizing all the features of the object inspector. 

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible. 

To be continued...

Thursday, January 21, 2021

Built-in "Xray" like UNO object inspector – Part 1

When developing macros and extensions in LibreOffice it is very useful to have an object inspector. With that tool you inspect the object tree and the methods, properties and interfaces of individual objects to understand its structure. There are quite some different object inspectors available for LibreOffice as extension. Probably the best known is called “XrayTool”, but there were also others like MRI, which was build for a similar purpose and various other more simple object inspectors (for example one is provided as an code example in ODK).

As a tool like this is very valuable it makes sense that it would be provided out-of-the-box by LibreOffice without the need for the user to get it separately. Also the user could even be unaware the such a tool exists.  For this reasons The Document Foundation (TDF) put up a tender to create a built-in Xray like UNO object inspector, which was awarded to Collabora and we are now in the process of implementing it.

Thank you TDF and the donating users of LibreOffice to make the work on this tool possible.

The Plan

The major gripe with Xray and other tools is that the user needs to go into the macro editor and run a the script with the inspecting object as the parameter. 

For example to inspect the root UNO object:

Xray ThisComponent

or for example to inspect a specific (first) Calc sheet:

Document = ThisComponent
AllSheets = Document.getSheets()
MySheet = AllSheets.getByIndex(0)
Xray MySheet

Figure 1: Watch window in the macro editor

We can do much better than this. So the idea is to have a object inspector tool as a dockable bottom widget, very similar to “developer tools” that you can find in popular web browsers.

The left hand side of the tool will have a subset of the document model (DOM) presented as a tree and on the right hand side a object inspector view that will show the current UNO object’s properties, methods in a tree view that is taken from the Watch window in the LibreOffice macro editor (see Figure 1). When the user would select an UNO object in the DOM tree view (left-hand side), the object inspector (right-hand side) would show it.

An additional functionality that is present in developer tools in web browsers is the ability to point & click an object directly in the document, which can then be inspected in the developer tool. A similar functionality to this will be possible in LibreOffice, where you will be able to select an object in the document, that is then be shown in the object inspector (right-hand side).

Implementation so-far

The implementation has been divided into 4 parts:

  1. Introduce a new dockable window on the bottom of the UI.
  2. DOM tree view (left-hand side)
  3. Implement point&click functionality.
  4. Object inspector view (right-hand side)

Currently the 1. and 2. part have been implemented, so this blog post serves as a report of the work that has been done until now. For 3. and 4. part, a limited subset of the functionality has been  already implemented, but it is not yet expected to work correctly and may change a lot in the future.

Dockable window

Figure 2: Development Tool in the menu

The dockable window has been added to all components of LibreOffice, which can be enabled in the menu (see Figure 2) under Help / Development tool (this location and the name can still change). The development tool code is all contained under svx/source/devtools/ where the docking window is implemented in DevelopmentToolDockingWindow.cxx.

DOM tree view (Left-hand)

Figure 3: Development tool dockable window

The idea of the DOM tree view is to provide a useful subset of the DOM objects, that the user can quickly access and inspect. Without this, the users would have to traverse to the objects they are interested in inside the object inspector itself, which may not be as straight forward to the users that aren’t familiar with the DOM, and even if they are, the DOM subset still makes it easier.

The DOM tree view as currently implemented (see Figure 3), has a root “Document” element always available, which represents the root document UNO object, from where you can traverse to any other relevant UNO object in the DOM tree. Other available objects depend on the component.

In Writer the available objects are:

  • Paragraphs
  • Shapes
  • Tables
  • Frames
  • Graphic Objects
  • Embedded Objects (OLE)
  • Style Families & Styles

In Calc:

  • Sheets
  • Shapes (per sheet)
  • Charts (per sheet)
  • Pivot Tables (per sheet)
  • Style Families & Styles

In Impress and Draw:

  • Pages / Slides
  • Shapes (per page / slide)
  • Master Slides
  • Style Families & Styles

If there are other object(s) you would like to see in the DOM tree view, please let me know.

The main implementation file for the left-side DOM tree view is DocumentModelTreeHandler.cxx in the svx/source/devtools/ folder. For each node of the tree view, there is an object attached (subclass  DocumentModelTreeEntry) to the node, which is responsible to fill the child nodes and provide the UNO object of the current node. The whole DOM tree view is populated on-demand, when the tree is expanded. This makes sure that we don’t take a lot of time inserting the whole object tree before-hand and have a more up-to-date view of the UNO objects. 

All the code is present in the LibreOffice master repository. Please try out the development tools in a recent daily build. Any comments or suggestion are welcome. 

Next part - point&click functionality

Partial support for the point&click has already been added. There is a selection listener added, which remembers what the selected object in the document is. This object is the available in the DOM tree view under the name “Current Selection”. If the user clicks on the node in the tree view, then the right-hand object inspector shows the UNO object.

To be continued..

Tuesday, October 13, 2020

PDF annotations support

In Draw, a PDF document can be opened using the PDFium library for rendering, where each page in the Draw document contains an rendered image from the PDF. This mode is useful for viewing PDFs and allows for the best fidelity. With viewing, there is also a need to review and comment and this is where PDF annotations come in as adding the support for the PDF annotations and to support a review based workflow has been one of my recent task at Collabora Productivity.

PDF supports a wide variety of annotations, but we don't support all of them in Draw. What we do support are comments, which are similar to pop-up note annotations in PDF, so the easiest is to add those first. To be able to use pop-up notes in Draw, we need to import them. This is done at import by using PDFium after we created the PDF graphic for rendering. In Draw, we then insert this as comments and so we get the basic support for manipulating with annotations, but how to save the changes? PDF export already supports saving comments as annotations, so this mostly already works (I needed to fix some bugs and add support for saving all needed properties).


Figure 1. Pop-up Note annotation in PDF viewer (Evince) and Draw

How the output compares between a PDF viewer and inside LibreOffice can be seen in Figure 1. All this is available in LibreOffice master and should be included in LibreOffice 7.1.

What about other annotations that are supported in PDF? The work to add those is ongoing and the recent success has been adding support to draw vector graphic annotations like polygon, ink (freehand), squares (rectangles) and circles (ellipses). So the idea is that instead of the usual marker for a comment, we draw a vector graphic that we read from the PDF annotation. 

The first thing I needed to do is to extend the PDFium library, which didn't support reading all polygon vertices and ink strokes from the document. What also wasn't supported is reading the border information, which is needed for line widths.

Next thing is extending the import to read the geometry data, so I dded a special PDFAnnotationMarker class for that. Then the geometry data needs to be stored on the sd::Annotation class (implementation of XAnnotation, but I didn't extend XAnnotation at this point as it wasn't needed - yet) . Drawing is performed in AnnotationTag, where we create a new OverlayPolyPolygon object, that is responsible for creating the Primitive2D for the marker. What we get after this is done is shown in figure 2.

 

Figure 2. Multiple annotations in PDF Viewer and Draw

This work will shortly be merged to the LibreOffice master. 

Wednesday, January 22, 2020

Accessibility checker and support for PDF/UA specs

PDF/UA or ISO 14289 is a specifications that defines the requirements for accessibility in a PDF document. The specification defines the required structure and formatting of the document (also refers to WCAG specification from W3C for use on the web) and PDF features, which should be enabled or disabled so the document is better suited for accessibility (for example PDF tags are required).

Thanks to the Dutch Standardisation Forum for financially sponsoring and Collabora Productivity in cooperation with Nou&Off for the work on implementing this specification into LibreOffice.

Figure 1: PDF Export
The implementation in LibreOffice is currently done only for Writer. It consists of two parts.: First is to enable PDF/UA support into PDF export, which writes a PDF/UA flag into the PDF document and enables all the required features. The PDF export dialog was extended with a checkbox (see figure 1).

Figure 2: Accessibility Check in menu 

The second part is an accessibility check functionality, which traverses the document structure and gather all possible accessibility issues. The accessibility check can be run manually through the menu under Tools - Accessibility Checker (see figure 2), or it will be triggered after PDF export dialog, if the PDF/UA support is enabled. The accessibility issues are presented in a dialog (see figure 3).

Figure 3: Accessibility check dialog

The checks that are (currently) implemented are:

  • Check that the document title is set.
  • Check that the document language is set, or that all styles that are in use, have the language set.
  • Check all images, graphics, OLE objects for the alt (or title in some objects) text.
  • Check tables don't include splits or merges, which aren't allowed by the specifications. The table should be 
  • Check for fake/manual numbering (not using integrated numbering). For example writing "1." "2." "3." at the beginning of the paragraphs.  
  • Check that hyperlink text is not a hyperlink itself - hyperlink should be described. 
  • Check for the contrast between text and the background. The algorithm is described in the WCAG specification.  
  • Check for blinking text, discouraged for the obvious reasons.
  • Check for footnotes and endnotes, which should be avoided.
  • Check for heading order. Order of the headings must increase incrementally with no skips (for example Heading 1 to Heading 3, skipping Heading 2).
  • Check, if text conveys additional meaning with (direct) formatting.

The PDF/UA support will be included in LibreOffice 7.

Monday, April 23, 2018

Improving the image handling in LibreOffice - Part 3

GraphicObject refactoring


GraphicObject and the implementation of XGraphicObject (UnoGraphicObject) and XGraphic (UnoGraphic) were located in module svtools, which is hierarchically above vcl. This is problematic when creating new instances like in Graphic.GetXGraphic method, which needs to bend backward to make it even work (ugly hack by sending the pointer value as URL string to GraphicProvider). The solution to this is to move all GraphicObject related things to vcl, which surprisingly didn't cause a lot problem and once done, it looks like a much more "natural" place.

Regarding the UNO API of XGraphicObject - what is left to do here is to properly clean up the uniqueID, as it is not possible to use it anymore for anything else as a uniqueID (used only in filters for the image names, if the name is not yet known).

Managing memory used by images


Figure1: Hierarchy before refactoring


Previously the memory managing was done on the level of GraphicObjects, where a GraphicManager and GraphicCache (see figure 1) were responsible to create new instances from uniqueID and manage the memory usage that GraphicObject take. This is not possible anymore as we don't operate with uniqueIDs anymore, but always use Graphic and XGraphic objects  (in UNO), so we need to manage the creation of Graphic object or more precisely - ImpGraphic (Graphic objects are just ref. counted objects of ImpGraphic). 
Figure 2: Hierarchy after refactoring
So to make this possible GraphicManager and GraphicCache need to be decoupled and removed from GraphicObject and a new manager needs to be introduced between Graphic and ImpGraphic, where the manager controls the creation and accounts for the memory usage (see Figure 2).

Graphic swapping and swapping strategy


In the To release the memory of graphic objects, we swap them out to a temp file and read back (swap-in) when we need them again. In the previous implementation this was partially directed by the SdrGrafObj (common image implementation) and SwGrfNode (Writer image implementation). For each graphic object there was a timer when to trigger an automatic swap-out + the swap-out that can happen when a memory limit is exceeded.

For the new code external swapping directed from SdrGrafObj and SwGrfNode was removed, so they can't influence when swapping will happen (maybe in the future they can provide hints when it is a good time to do swapping). There is now a global timer which triggers checking of all Graphic objects if any of them can be swapped out in case we exceed memory limit. Same code is triggered when a new object is created too. A object will be swapped out if it is not used for a certain amount of time. Each object tracks the timestamp when it was last used.

A swap-in happens if the object is swapped-out (obviously) and certain data is needed (under-laying bitmap, animation or metafile). This is checked at the same code-path where the timestamp updating happens.

The new swapping strategy is relatively simple - if a lot of memory is needed by graphic objects in a certain time, we let it use it and don't try to over-aggressively try to free it. In the past this cased swap-out and swap-in cycle that made the application completely unusable. In the future, external hints when a certain Graphic object can be swapped out may be added, so we can perform swapping more effectively. There are also several other ideas which will increase performance and reduce memory usage that can be implemented now with the new hierarchy where most all of the swapping is contained inside the Graphic itself, but all of this is currently out of the scope of this work.

Other changes to Graphic


Another changes to Graphic done were related to lazy loading. When a document is loaded, we don't want to load Graphic into memory, if it is not needed yet (for example we display the first page but the graphic is on page 10). In document filters (ODF for example) we previously transported the URL of an external or internal graphic to the document model, where it was lazily loaded when it was actually needed. This is not possible now anymore as we need to create a XGraphic object already in the document filter. To overcome this we need to to have an unloaded Graphic, which is created already in a swapped-out state and swapped-in when needed.

The GraphicFilter didn't allow something like this, so I needed to add a new method, which doesn't actually load the image, but just gathers what kind of the image is loaded and its metadata (image size) and creates a GfxLink object that includes the (compressed) image data. The metadata is needed as we don't want to actually force a load when this basic information is requested. Actually we want to load the image as late as this is possible.

Another issue is also that we can have an external image (loaded from a file or even URL on the internet). The issue is similar to the lazy loading scenario, but it is different that a Graphic now must know the URL with which it was created and can be created completely empty (no loading of any kind). The reason for this is that loading is directed by the LinkManager, which is part of the document model. For security reasons the LinkManager can not allow that a Graphic is loaded so loading is directed by the LinkManager on demand (first usage). LinkManager also takes care of all URLs of various external resources. The user can look at those resources and change the URL of them or trigger an update. Changing URL and updating an object was previously done in SdrGrafObj and SwGrfNode, but now this is moved to the common code in Graphic object where SdrGrafObj and SwGrfNode only direct what to do. There are still rooms to improve things here, however not the scope of this work.

Next steps


Finishing up this work by revising the UNO API and fixing known bugs.

Credits


Many thanks to Collabora ProductivityTDF and users that support the foundation by providing donations, to make this work possible.

To be continued...

Tuesday, March 13, 2018

Improving the image handling in LibreOffice - Part 2

It's been some times from my last blog post and in that time I continued with refactoring the code to get rid of use of GraphicObject uniqueID being passed around and stored in the model. The state of the code now is looking fine as we almost don't use the uniqueID anymore, which means that I can start with the next step of Graphic and GraphicObject improvements. 

There is a thing I forgot to clarify in the last blog post and this is that the GraphicObject uniqueID is usually passed around in the form of a URL string. The string has the prefix "vnd.sun.star.GraphicObject" and followed by the GraphicObject uniqueID. Using that URL it is possible to re-create a GraphicObject by passing the unique ID as the construction parameter (see constructor with OUString parameter on GraphicObject or UNO serviceGraphicObject::createWithId).

Usage in filters

The most "heavy" users of the uniqueID were the document format filters (xmloff, oox & writerfilter) which generally use it to read the images from the storage (usually ZIP) and convert the GraphicObject and pass GraphicObject uniqueID around. At writing it does the reverse, get the GraphicObject URL and "resolve" the URL to the package URL. At conversion the GraphicObject is created and the image is stored into the storage. To do this there XGraphicObjectResolver published UNO interface which has only resolveGraphicObjectURL which converts a GraphicObject URL to the Package URL and back. 

Resolving the URL is not the correct approach anymore so I had to do it in a different way. The result of that is XGraphicStorageHandler, which has explicit method to load and save an XGraphic from the package URL, which does everything without the need to use the GraphicObject unique ID.

In addition a graphic can also be external - somewhere on the disk or internet, identified by an external URL. For this case I implemented a GraphicLoader, which is generally just uses XGraphicProvider to load the graphic (in one of the next steps this will be reversed so that XGraphicProvider is just a UNO interface that uses GraphicLoader).

The special case with external URLs is also that we need to remember the URL, which was used to load the graphic, so that we can later just save the URL and not the Graphic into the storage. Previously the URL was always passed along as string so this wasn't a problem, but now we pass XGraphic. So for this I had to extend the Graphic in VCL with an origin URL attribute, to solve this use case. In a next steps the URL loading will be extended even more so the Graphic itself will handle URL completely transparently to the outside.

UNO properties

Usually the filters used the UNO API to set the GraphicObject unique ID into the document model. This was mostly implemented as a properties on various interfaces in UNO. Mostly used name of the properties was GraphicURL (used in different places), but there were also other properties: 
  • BackGraphicURL (for backgrounds)
  • HeaderBackGraphicURL (for backgrounds in header)
  • FooterBackGraphicURL (for backgrounds in footer)
  • ParaBackGraphicURL (for backgrounds in paragraphs)
  • ThumbnailURL (for thumbnail of a graphic - not in IDL)
  • ReplacementGraphicURL (for replacement graphic - not in IDL)
  • FillBitmapURL (BitmapTable)
All these properties are now deprecated and removed and an alternative was added (where needed) that uses the XGraphic or XBitmap types (they use the same implementation so either can be used). This was done as following:
  • GraphicURL -> Graphic (type XGraphic) and GraphicBitmap (type XBitmap) for bullets
  • BackGraphicURL -> BackGraphic (type XGraphic) 
  • HeaderBackGraphicURL -> HeaderBackGraphic (type XGraphic)
  • FooterBackGraphicURL -> FooterBackGraphic (type XGraphic)
  • ParaBackGraphicURL -> ParaBackGraphic (type XGraphic)
  • ThumbnailURL  -> Thumbnail (type XGraphic)
  • ReplacementGraphicURL -> ReplacementGraphic (type XGraphic)
  • FillBitmapURL -> FillBitmap (type XBitmap)
There is also ImageURL which is used in form controls, but this was still left inside as there is already a Graphic property which is an alternative.

As the GraphicObject URL is going away (they won't be created anymore and won't be possible to get the GraphicObject back using the URL), so will the properties, as the content of them won't make much sense anymore.

Next steps

The next step is now to finally work on Graphic itself, which I'm much more excited about. The managing of memory (GraphicManager) will move from the GraphicObject to the Graphic itself.  When a new Graphic is created, the original bit-stream needs to be saved immediately to the temp folder, where the Graphic can always load the image again and is always free to release the memory if needed (this also means it won't need to load the image to the memory until it is actually needed). With this I think that handling of images will finally be a lot more predictable and homogeneous (no different implementations of things through different modules) and we can actually introduce new features in the future much more easily. 

Back to work...

Credits

Many thanks to Collabora Productivity, TDF and users that support the foundation by providing donations, to make this work possible.

Wednesday, January 31, 2018

Improving the image handling in LibreOffice - Part 1

Prologue

It is known for some time that the image life-cycle in LibreOffice is problematic and can potentially lead to image loss, but to make the life-cycle more robust against loss, a lot of refactoring would need to be done. Another issue is also the mechanism of images swapping in and out of the memory. Keeping images in memory takes a lot of space so when a certain amount is hit, the images get swapped to disk and memory is freed. The problem is that it can happen that the cache handler starts constantly to swap images in and out (especially with with multi-megapixel images that are the norm today) and LibreOffice stalls to halt.

Because of this issues, TDF put up a tender to improve the situation with image handling and Collabora Productivity was selected to implement it, and I will do the development work.


Problems with the image life-cycle - detailed

Currently, when an image is read from a document, a GraphicObject is created for the image and handled over to the GraphicManager which manages the life-cycle. When this happens we usually get back the string based unique ID of the GraphicObject with which we can always get access the image by creating a new GraphicObject with the unique ID (GraphicManager will look for the image with that unique ID). Usually the unique ID is the one that is passed on between layers in LibreOffice (for example from ODF filter when loaded, to the model, where it is manipulated and then to the OOXML filter when saving) but the unique ID itself is just a "reference" to the image and by itself it doesn't have any control over when the image can safely be removed and when not. It could happen that in a certain situation we would still have the unique ID referenced somewhere in the model, but the image would already be removed. This is dangerous and needs to be changed. 
Usually for this kind of object we use reference counting technique, where we pass a objects around that holds a reference to the object resource. When the object is created, the reference count is increased, when destroyed, the reference count is decreased, when the reference count reaches zero, the resource object is destroyed.


The solution for the life-cycle

So instead of passing around of unique ID the idea is to use the usual reference counting technique, that is normally used in this situation. The GraphicObject in mainly a wrapper around Graphic (which then holds a pixel-based image, or animated image, or possibly a vector image), and in addition it keeps additional attributes (gamma, crop, transparency, ...). It also has the implementation of swapping-in and out (but I'll explain this another time). On the other hand Graphic is properly reference-counted already (Graphic objects are reference counting the private ImpGraphic) so the solution to the life-cycle problem is that instead of GraphicObject unique ID we would just pass along the Graphic object instead, or XGraphic, XBitmap which are just UNO wrappers around Graphic. Potentially we could also pass along the GraphicObject or XGraphicObject (UNO wrapper for the GraphicObject) when we would need to take into account the graphic attributes too. This should make the life-cycle much more manageable, but the problem is that there are many many places this needs to be changed.
I will do the work as much incrementally as possible, with ensuring that the test cover the code and if needed add new tests or extend the existing ones. 

Currently almost finished is refactoring of the bitmap table (a list of named bitmaps, mostly used for shape fills or backgrounds) to use XBitmap instead of string based unique ID in the table. For this I needed to change OOXML (oox) and especially the ODF (xmloff) filter, and the document model.


Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible. 


To be continued...