Wednesday, January 31, 2018

Improving the image handling in LibreOffice - Part 1


It has been known for some time that the image life-cycle in LibreOffice is problematic and can potentially lead to image loss, but making the life-cycle more robust against loss requires a lot of refactoring. Another issue is the mechanism for swapping images in and out of memory. Keeping images in memory takes a lot of space, so when a certain memory limit is hit, images get swapped out to disk and the memory is freed. The problem is that the cache handler can start to swap images in and out constantly (especially with the multi-megapixel images that are the norm today), and LibreOffice grinds to a halt.
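To illustrate how constant swapping can arise, here is a minimal simulation. It is only a sketch under stated assumptions: `SwapSimulator` and `countSwapIns` are hypothetical names, and the least-recently-used eviction policy is an assumption made for the example, not a description of the actual cache handler in LibreOffice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical model of a memory-limited image cache (not LibreOffice code).
// Images are identified by their index; each has a size in bytes.
struct SwapSimulator {
    std::size_t budget;               // maximum number of bytes kept in memory
    std::vector<std::size_t> sizes;   // size of each image
    std::list<std::size_t> resident;  // images in memory, most recently used first
    std::size_t used = 0;             // bytes currently held in memory
    std::size_t swapIns = 0;          // how often an image had to be loaded

    SwapSimulator(std::size_t b, std::vector<std::size_t> s)
        : budget(b), sizes(std::move(s)) {}

    void access(std::size_t id) {
        assert(id < sizes.size());
        auto it = std::find(resident.begin(), resident.end(), id);
        if (it != resident.end()) {
            // Already in memory: only mark it as most recently used.
            resident.splice(resident.begin(), resident, it);
            return;
        }
        // Not in memory: swap out least recently used images until it fits.
        while (!resident.empty() && used + sizes[id] > budget) {
            used -= sizes[resident.back()];
            resident.pop_back();
        }
        resident.push_front(id);
        used += sizes[id];
        ++swapIns;
    }
};

// Cycle through all images `rounds` times and return the number of swap-ins.
std::size_t countSwapIns(std::size_t budget, std::vector<std::size_t> sizes, int rounds) {
    SwapSimulator sim(budget, sizes);
    for (int r = 0; r < rounds; ++r)
        for (std::size_t id = 0; id < sizes.size(); ++id)
            sim.access(id);
    return sim.swapIns;
}
```

With three images of 40 units each and a budget of 100, every single access in this model triggers a swap-in, because the image needed next is always the one that was just evicted. Raising the budget to 120 lets all three stay resident, so only the initial three loads happen.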

Because of these issues, TDF put up a tender to improve the situation with image handling. Collabora Productivity was selected to implement it, and I will do the development work.

Problems with the image life-cycle - detailed

Currently, when an image is read from a document, a GraphicObject is created for the image and handed over to the GraphicManager, which manages the life-cycle. When this happens we usually get back the string-based unique ID of the GraphicObject, with which we can always get access to the image by creating a new GraphicObject with that unique ID (GraphicManager will look for the image with that unique ID). Usually the unique ID is what gets passed between layers in LibreOffice (for example from the ODF filter on load, to the model, where it is manipulated, and then to the OOXML filter on save), but the unique ID itself is just a "reference" to the image and by itself has no control over when the image can safely be removed and when not. It could happen that in a certain situation we still have the unique ID referenced somewhere in the model, but the image has already been removed. This is dangerous and needs to be changed.
Usually for this kind of object we use the reference counting technique, where we pass around objects that hold a reference to the resource. When such an object is created, the reference count is increased; when it is destroyed, the reference count is decreased; and when the reference count reaches zero, the resource is destroyed.
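The difference between the two schemes can be shown with a small sketch. Everything here is hypothetical: `Image`, `IdRegistry` and the two helper functions are made up for illustration and only stand in for the ID lookup and the reference-counted handle described above. `std::shared_ptr` is used as a generic reference-counting mechanism, not as what LibreOffice itself uses.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for image data (not a LibreOffice class).
struct Image {
    int width = 0;
    int height = 0;
};

// ID-based scheme: the registry owns the images, callers only hold strings.
class IdRegistry {
    std::map<std::string, std::unique_ptr<Image>> images_;

public:
    std::string add(const std::string& id, int w, int h) {
        images_[id] = std::make_unique<Image>(Image{w, h});
        return id;
    }
    // The registry can drop an image at any time; holders of the ID are not asked.
    void remove(const std::string& id) { images_.erase(id); }
    // Returns nullptr when the ID no longer resolves to an image.
    const Image* find(const std::string& id) const {
        auto it = images_.find(id);
        return it == images_.end() ? nullptr : it->second.get();
    }
};

// Returns true if an ID kept in the "model" still resolves after removal.
bool idSurvivesRemoval() {
    IdRegistry registry;
    std::string idInModel = registry.add("img-1", 640, 480);
    registry.remove("img-1");  // the model is not aware of this
    return registry.find(idInModel) != nullptr;
}

// Returns true if a reference-counted handle still resolves after the
// original owner has let go of the image.
bool handleSurvivesRelease() {
    auto owner = std::make_shared<Image>(Image{640, 480});
    std::shared_ptr<Image> handleInModel = owner;  // reference count is now 2
    assert(owner.use_count() == 2);
    owner.reset();  // count drops to 1, so the image stays alive
    return handleInModel != nullptr && handleInModel->width == 640;
}
```

In this sketch `idSurvivesRemoval()` returns false, because the string is left pointing at nothing, while `handleSurvivesRelease()` returns true, because the image lives as long as any handle to it exists.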

The solution for the life-cycle

So instead of passing around the unique ID, the idea is to use the usual reference counting technique that is normally used in this situation. The GraphicObject is mainly a wrapper around Graphic (which in turn holds a pixel-based image, an animated image, or possibly a vector image), and in addition it keeps extra attributes (gamma, crop, transparency, ...). It also contains the implementation of swapping in and out (but I'll explain this another time). Graphic, on the other hand, is already properly reference-counted (Graphic objects reference-count the private ImpGraphic), so the solution to the life-cycle problem is to pass along the Graphic object itself instead of the GraphicObject unique ID, or XGraphic or XBitmap, which are just UNO wrappers around Graphic. Potentially we could also pass along the GraphicObject or XGraphicObject (the UNO wrapper for GraphicObject) when we need to take the graphic attributes into account too. This should make the life-cycle much more manageable, but the problem is that there are many, many places where this needs to be changed.
I will do the work as incrementally as possible, making sure that the tests cover the code and adding new tests or extending the existing ones where needed.
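The layering described above can be sketched roughly as follows. All class names are hypothetical (prefixed with `Sketch`) and only mirror the roles of ImpGraphic, Graphic and GraphicObject; the real classes have far richer interfaces, and the use of `std::shared_ptr` is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Plays the role of the private, shared image data.
struct SketchImpGraphic {
    std::string pixels;
};

// Plays the role of Graphic: a cheap-to-copy handle whose copies all share
// one SketchImpGraphic through a reference count.
class SketchGraphic {
    std::shared_ptr<SketchImpGraphic> impl_;

public:
    explicit SketchGraphic(std::string pixels)
        : impl_(std::make_shared<SketchImpGraphic>(SketchImpGraphic{std::move(pixels)})) {}
    long useCount() const { return impl_.use_count(); }
    const std::string& pixels() const {
        assert(impl_ != nullptr);
        return impl_->pixels;
    }
};

// A small subset of the attributes mentioned in the text.
struct SketchAttributes {
    double gamma = 1.0;
    int transparency = 0;
};

// Plays the role of GraphicObject: a graphic plus attributes on top of it.
class SketchGraphicObject {
    SketchGraphic graphic_;
    SketchAttributes attrs_;

public:
    SketchGraphicObject(SketchGraphic g, SketchAttributes a)
        : graphic_(std::move(g)), attrs_(a) {}
    const SketchGraphic& graphic() const { return graphic_; }
    const SketchAttributes& attributes() const { return attrs_; }
};

// Passes one graphic to two further holders and returns the shared count.
long sharedCountAfterPassing() {
    SketchGraphic loaded("pixel-data");  // count 1 (e.g. the import filter)
    SketchGraphic inModel = loaded;      // count 2 (e.g. the document model)
    SketchAttributes attrs;
    attrs.gamma = 2.2;
    SketchGraphicObject withAttrs(inModel, attrs);  // count 3
    return loaded.useCount();
}

// Shows that the count drops again once a holder goes out of scope.
long countAfterCopiesDestroyed() {
    SketchGraphic loaded("pixel-data");
    {
        SketchGraphic temporary = loaded;  // count 2
    }                                      // temporary destroyed, count back to 1
    return loaded.useCount();
}
```

Here every holder keeps the image data alive simply by holding a copy of the handle, and no separate manager has to guess when the data can be released.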

Currently almost finished is the refactoring of the bitmap table (a list of named bitmaps, mostly used for shape fills or backgrounds) to use XBitmap instead of the string-based unique ID in the table. For this I needed to change the OOXML (oox) filter and especially the ODF (xmloff) filter, as well as the document model.


Many thanks to TDF and to the users who support the foundation with their donations, which make this work possible.

To be continued...