Tomaz's dev blog: 2018

Monday, April 23, 2018

Improving the image handling in LibreOffice - Part 3

GraphicObject refactoring

GraphicObject and the implementation of XGraphicObject (UnoGraphicObject) and XGraphic (UnoGraphic) were located in module svtools, which is hierarchically above vcl. This is problematic when creating new instances like in Graphic.GetXGraphic method, which needs to bend backward to make it even work (ugly hack by sending the pointer value as URL string to GraphicProvider). The solution to this is to move all GraphicObject related things to vcl, which surprisingly didn't cause a lot problem and once done, it looks like a much more "natural" place.

Regarding the UNO API of XGraphicObject - what is left to do here is to properly clean up the uniqueID, as it is not possible to use it anymore for anything else as a uniqueID (used only in filters for the image names, if the name is not yet known).

Managing memory used by images

Figure1: Hierarchy before refactoring

Previously the memory managing was done on the level of GraphicObjects, where a GraphicManager and GraphicCache (see figure 1) were responsible to create new instances from uniqueID and manage the memory usage that GraphicObject take. This is not possible anymore as we don't operate with uniqueIDs anymore, but always use Graphic and XGraphic objects (in UNO), so we need to manage the creation of Graphic object or more precisely - ImpGraphic (Graphic objects are just ref. counted objects of ImpGraphic).

Figure 2: Hierarchy after refactoring

So to make this possible GraphicManager and GraphicCache need to be decoupled and removed from GraphicObject and a new manager needs to be introduced between Graphic and ImpGraphic, where the manager controls the creation and accounts for the memory usage (see Figure 2).

Graphic swapping and swapping strategy

In the To release the memory of graphic objects, we swap them out to a temp file and read back (swap-in) when we need them again. In the previous implementation this was partially directed by the SdrGrafObj (common image implementation) and SwGrfNode (Writer image implementation). For each graphic object there was a timer when to trigger an automatic swap-out + the swap-out that can happen when a memory limit is exceeded.

For the new code external swapping directed from SdrGrafObj and SwGrfNode was removed, so they can't influence when swapping will happen (maybe in the future they can provide hints when it is a good time to do swapping). There is now a global timer which triggers checking of all Graphic objects if any of them can be swapped out in case we exceed memory limit. Same code is triggered when a new object is created too. A object will be swapped out if it is not used for a certain amount of time. Each object tracks the timestamp when it was last used.

A swap-in happens if the object is swapped-out (obviously) and certain data is needed (under-laying bitmap, animation or metafile). This is checked at the same code-path where the timestamp updating happens.

The new swapping strategy is relatively simple - if a lot of memory is needed by graphic objects in a certain time, we let it use it and don't try to over-aggressively try to free it. In the past this cased swap-out and swap-in cycle that made the application completely unusable. In the future, external hints when a certain Graphic object can be swapped out may be added, so we can perform swapping more effectively. There are also several other ideas which will increase performance and reduce memory usage that can be implemented now with the new hierarchy where most all of the swapping is contained inside the Graphic itself, but all of this is currently out of the scope of this work.

Other changes to Graphic

Another changes to Graphic done were related to lazy loading. When a document is loaded, we don't want to load Graphic into memory, if it is not needed yet (for example we display the first page but the graphic is on page 10). In document filters (ODF for example) we previously transported the URL of an external or internal graphic to the document model, where it was lazily loaded when it was actually needed. This is not possible now anymore as we need to create a XGraphic object already in the document filter. To overcome this we need to to have an unloaded Graphic, which is created already in a swapped-out state and swapped-in when needed.

The GraphicFilter didn't allow something like this, so I needed to add a new method, which doesn't actually load the image, but just gathers what kind of the image is loaded and its metadata (image size) and creates a GfxLink object that includes the (compressed) image data. The metadata is needed as we don't want to actually force a load when this basic information is requested. Actually we want to load the image as late as this is possible.

Another issue is also that we can have an external image (loaded from a file or even URL on the internet). The issue is similar to the lazy loading scenario, but it is different that a Graphic now must know the URL with which it was created and can be created completely empty (no loading of any kind). The reason for this is that loading is directed by the LinkManager, which is part of the document model. For security reasons the LinkManager can not allow that a Graphic is loaded so loading is directed by the LinkManager on demand (first usage). LinkManager also takes care of all URLs of various external resources. The user can look at those resources and change the URL of them or trigger an update. Changing URL and updating an object was previously done in SdrGrafObj and SwGrfNode, but now this is moved to the common code in Graphic object where SdrGrafObj and SwGrfNode only direct what to do. There are still rooms to improve things here, however not the scope of this work.

Next steps

Finishing up this work by revising the UNO API and fixing known bugs.

Credits

Many thanks to Collabora Productivity, TDF and users that support the foundation by providing donations, to make this work possible.

To be continued...

Tuesday, March 13, 2018

Improving the image handling in LibreOffice - Part 2

It's been some times from my last blog post and in that time I continued with refactoring the code to get rid of use of GraphicObject uniqueID being passed around and stored in the model. The state of the code now is looking fine as we almost don't use the uniqueID anymore, which means that I can start with the next step of Graphic and GraphicObject improvements.

There is a thing I forgot to clarify in the last blog post and this is that the GraphicObject uniqueID is usually passed around in the form of a URL string. The string has the prefix "vnd.sun.star.GraphicObject" and followed by the GraphicObject uniqueID. Using that URL it is possible to re-create a GraphicObject by passing the unique ID as the construction parameter (see constructor with OUString parameter on GraphicObject or UNO serviceGraphicObject::createWithId).

Usage in filters

The most "heavy" users of the uniqueID were the document format filters (xmloff, oox & writerfilter) which generally use it to read the images from the storage (usually ZIP) and convert the GraphicObject and pass GraphicObject uniqueID around. At writing it does the reverse, get the GraphicObject URL and "resolve" the URL to the package URL. At conversion the GraphicObject is created and the image is stored into the storage. To do this there XGraphicObjectResolver published UNO interface which has only resolveGraphicObjectURL which converts a GraphicObject URL to the Package URL and back.

Resolving the URL is not the correct approach anymore so I had to do it in a different way. The result of that is XGraphicStorageHandler, which has explicit method to load and save an XGraphic from the package URL, which does everything without the need to use the GraphicObject unique ID.

In addition a graphic can also be external - somewhere on the disk or internet, identified by an external URL. For this case I implemented a GraphicLoader, which is generally just uses XGraphicProvider to load the graphic (in one of the next steps this will be reversed so that XGraphicProvider is just a UNO interface that uses GraphicLoader).

The special case with external URLs is also that we need to remember the URL, which was used to load the graphic, so that we can later just save the URL and not the Graphic into the storage. Previously the URL was always passed along as string so this wasn't a problem, but now we pass XGraphic. So for this I had to extend the Graphic in VCL with an origin URL attribute, to solve this use case. In a next steps the URL loading will be extended even more so the Graphic itself will handle URL completely transparently to the outside.

UNO properties

Usually the filters used the UNO API to set the GraphicObject unique ID into the document model. This was mostly implemented as a properties on various interfaces in UNO. Mostly used name of the properties was GraphicURL (used in different places), but there were also other properties:

BackGraphicURL (for backgrounds)
HeaderBackGraphicURL (for backgrounds in header)
FooterBackGraphicURL (for backgrounds in footer)
ParaBackGraphicURL (for backgrounds in paragraphs)
ThumbnailURL (for thumbnail of a graphic - not in IDL)
ReplacementGraphicURL (for replacement graphic - not in IDL)
FillBitmapURL (BitmapTable)

All these properties are now deprecated and removed and an alternative was added (where needed) that uses the XGraphic or XBitmap types (they use the same implementation so either can be used). This was done as following:

GraphicURL -> Graphic (type XGraphic) and GraphicBitmap (type XBitmap) for bullets
BackGraphicURL -> BackGraphic (type XGraphic)
HeaderBackGraphicURL -> HeaderBackGraphic (type XGraphic)
FooterBackGraphicURL -> FooterBackGraphic (type XGraphic)
ParaBackGraphicURL -> ParaBackGraphic (type XGraphic)
ThumbnailURL -> Thumbnail (type XGraphic)
ReplacementGraphicURL -> ReplacementGraphic (type XGraphic)
FillBitmapURL -> FillBitmap (type XBitmap)

There is also ImageURL which is used in form controls, but this was still left inside as there is already a Graphic property which is an alternative.

As the GraphicObject URL is going away (they won't be created anymore and won't be possible to get the GraphicObject back using the URL), so will the properties, as the content of them won't make much sense anymore.

Next steps

The next step is now to finally work on Graphic itself, which I'm much more excited about. The managing of memory (GraphicManager) will move from the GraphicObject to the Graphic itself. When a new Graphic is created, the original bit-stream needs to be saved immediately to the temp folder, where the Graphic can always load the image again and is always free to release the memory if needed (this also means it won't need to load the image to the memory until it is actually needed). With this I think that handling of images will finally be a lot more predictable and homogeneous (no different implementations of things through different modules) and we can actually introduce new features in the future much more easily.

Back to work...

Credits

Many thanks to Collabora Productivity, TDF and users that support the foundation by providing donations, to make this work possible.

Wednesday, January 31, 2018

Improving the image handling in LibreOffice - Part 1

Prologue

It is known for some time that the image life-cycle in LibreOffice is problematic and can potentially lead to image loss, but to make the life-cycle more robust against loss, a lot of refactoring would need to be done. Another issue is also the mechanism of images swapping in and out of the memory. Keeping images in memory takes a lot of space so when a certain amount is hit, the images get swapped to disk and memory is freed. The problem is that it can happen that the cache handler starts constantly to swap images in and out (especially with with multi-megapixel images that are the norm today) and LibreOffice stalls to halt.

Because of this issues, TDF put up a tender to improve the situation with image handling and Collabora Productivity was selected to implement it, and I will do the development work.

Problems with the image life-cycle - detailed

Currently, when an image is read from a document, a GraphicObject is created for the image and handled over to the GraphicManager which manages the life-cycle. When this happens we usually get back the string based unique ID of the GraphicObject with which we can always get access the image by creating a new GraphicObject with the unique ID (GraphicManager will look for the image with that unique ID). Usually the unique ID is the one that is passed on between layers in LibreOffice (for example from ODF filter when loaded, to the model, where it is manipulated and then to the OOXML filter when saving) but the unique ID itself is just a "reference" to the image and by itself it doesn't have any control over when the image can safely be removed and when not. It could happen that in a certain situation we would still have the unique ID referenced somewhere in the model, but the image would already be removed. This is dangerous and needs to be changed.

Usually for this kind of object we use reference counting technique, where we pass a objects around that holds a reference to the object resource. When the object is created, the reference count is increased, when destroyed, the reference count is decreased, when the reference count reaches zero, the resource object is destroyed.

The solution for the life-cycle

So instead of passing around of unique ID the idea is to use the usual reference counting technique, that is normally used in this situation. The GraphicObject in mainly a wrapper around Graphic (which then holds a pixel-based image, or animated image, or possibly a vector image), and in addition it keeps additional attributes (gamma, crop, transparency, ...). It also has the implementation of swapping-in and out (but I'll explain this another time). On the other hand Graphic is properly reference-counted already (Graphic objects are reference counting the private ImpGraphic) so the solution to the life-cycle problem is that instead of GraphicObject unique ID we would just pass along the Graphic object instead, or XGraphic, XBitmap which are just UNO wrappers around Graphic. Potentially we could also pass along the GraphicObject or XGraphicObject (UNO wrapper for the GraphicObject) when we would need to take into account the graphic attributes too. This should make the life-cycle much more manageable, but the problem is that there are many many places this needs to be changed.

I will do the work as much incrementally as possible, with ensuring that the test cover the code and if needed add new tests or extend the existing ones.

Currently almost finished is refactoring of the bitmap table (a list of named bitmaps, mostly used for shape fills or backgrounds) to use XBitmap instead of string based unique ID in the table. For this I needed to change OOXML (oox) and especially the ODF (xmloff) filter, and the document model.

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible.

To be continued...