Monday, April 23, 2018

Improving the image handling in LibreOffice - Part 3

GraphicObject refactoring


GraphicObject and the implementations of XGraphicObject (UnoGraphicObject) and XGraphic (UnoGraphic) were located in module svtools, which is hierarchically above vcl. This is problematic when creating new instances, as in the Graphic.GetXGraphic method, which needs to bend over backwards to make it work at all (an ugly hack that sends the pointer value as a URL string to GraphicProvider). The solution is to move everything related to GraphicObject to vcl. Surprisingly, this didn't cause many problems, and once it was done, vcl looks like a much more "natural" place for it.
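To make the layering problem concrete, here is a minimal C++ sketch of the pointer-as-URL trick. The `private:memorygraphic/` prefix and the function names `encodeGraphicAsUrl()` / `decodeUrlToGraphic()` are assumptions for illustration, and `Graphic` is a bare stand-in, not the real vcl class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the vcl-level Graphic.
struct Graphic
{
    int nId = 0;
};

// Assumed URL prefix used to smuggle an in-memory object through a string.
const std::string aMemoryGraphicPrefix = "private:memorygraphic/";

// Lower layer (vcl): it cannot name the svtools types, so it encodes
// the object's address as a decimal number inside a URL string.
std::string encodeGraphicAsUrl(const Graphic* pGraphic)
{
    assert(pGraphic != nullptr);
    return aMemoryGraphicPrefix +
           std::to_string(reinterpret_cast<std::uintptr_t>(pGraphic));
}

// Upper layer (the graphic provider): parses the number back into a pointer.
// Nothing guarantees that the object is still alive or that the string was
// produced by encodeGraphicAsUrl(), which is what makes this approach fragile.
const Graphic* decodeUrlToGraphic(const std::string& rUrl)
{
    if (rUrl.compare(0, aMemoryGraphicPrefix.size(), aMemoryGraphicPrefix) != 0)
        return nullptr;
    const std::uintptr_t nAddress = static_cast<std::uintptr_t>(
        std::stoull(rUrl.substr(aMemoryGraphicPrefix.size())));
    return reinterpret_cast<const Graphic*>(nAddress);
}
```

Once both sides live in vcl, the provider can simply receive the object itself, and no string round trip is needed.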

Regarding the UNO API of XGraphicObject, what is left to do is to properly clean up the uniqueID, as it can no longer be used for anything other than a unique ID (it is used only in filters for the image names, if the name is not yet known).

Managing memory used by images


Figure 1: Hierarchy before refactoring


Previously, memory management was done at the level of GraphicObject, where a GraphicManager and a GraphicCache (see Figure 1) were responsible for creating new instances from a uniqueID and for managing the memory that GraphicObjects take. This is no longer possible, as we don't operate with uniqueIDs anymore but always use Graphic and XGraphic (in UNO) objects. So we need to manage the creation of Graphic objects, or more precisely of ImpGraphic objects (Graphic objects are just reference-counted handles to ImpGraphic).
Figure 2: Hierarchy after refactoring
To make this possible, GraphicManager and GraphicCache need to be decoupled and removed from GraphicObject, and a new manager needs to be introduced between Graphic and ImpGraphic, which controls the creation of ImpGraphic instances and accounts for their memory usage (see Figure 2).
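A minimal sketch of this idea, assuming a simplified `ImpGraphic` that only knows its size in bytes: `Graphic` is a cheap, copyable handle holding a `std::shared_ptr<ImpGraphic>`, and every `ImpGraphic` is created through a manager, which can therefore add up the memory of all live instances. `GraphicManager`, `newInstance()` and `getUsedBytes()` are hypothetical names here, not the actual vcl API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Holds the actual image data; only ever created through the manager.
class ImpGraphic
{
public:
    explicit ImpGraphic(std::size_t nSizeBytes) : mnSizeBytes(nSizeBytes) {}
    std::size_t getSizeBytes() const { return mnSizeBytes; }

private:
    std::size_t mnSizeBytes;
};

// Hypothetical manager between Graphic and ImpGraphic.
class GraphicManager
{
public:
    std::shared_ptr<ImpGraphic> newInstance(std::size_t nSizeBytes)
    {
        auto pImpGraphic = std::make_shared<ImpGraphic>(nSizeBytes);
        maInstances.push_back(pImpGraphic);
        return pImpGraphic;
    }

    // Total memory of all ImpGraphics that still have at least one Graphic.
    std::size_t getUsedBytes()
    {
        purgeExpired();
        std::size_t nTotal = 0;
        for (const auto& rWeak : maInstances)
            if (auto pImpGraphic = rWeak.lock())
                nTotal += pImpGraphic->getSizeBytes();
        return nTotal;
    }

    std::size_t getInstanceCount()
    {
        purgeExpired();
        return maInstances.size();
    }

private:
    // Forget instances whose last Graphic handle has been destroyed.
    void purgeExpired()
    {
        maInstances.erase(std::remove_if(maInstances.begin(), maInstances.end(),
                                         [](const std::weak_ptr<ImpGraphic>& rWeak)
                                         { return rWeak.expired(); }),
                          maInstances.end());
    }

    // weak_ptr: the manager observes instances without keeping them alive.
    std::vector<std::weak_ptr<ImpGraphic>> maInstances;
};

// Cheap, copyable handle: copies share the same ImpGraphic (reference counting).
class Graphic
{
public:
    Graphic(GraphicManager& rManager, std::size_t nSizeBytes)
        : mpImpGraphic(rManager.newInstance(nSizeBytes)) {}
    std::size_t getSizeBytes() const { return mpImpGraphic->getSizeBytes(); }

private:
    std::shared_ptr<ImpGraphic> mpImpGraphic;
};
```

Copying a Graphic does not create a new ImpGraphic, so the manager counts shared image data only once.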

Graphic swapping and swapping strategy


To release the memory of graphic objects, we swap them out to a temporary file and read them back (swap-in) when we need them again. In the previous implementation, this was partially directed by SdrGrafObj (the common image implementation) and SwGrfNode (the Writer image implementation). Each graphic object had its own timer that triggered an automatic swap-out, in addition to the swap-out that could happen when a memory limit was exceeded.

In the new code, the external swapping directed from SdrGrafObj and SwGrfNode was removed, so they can't influence when swapping happens (in the future they may provide hints about when it is a good time to swap). There is now a single global timer that triggers a check of all Graphic objects, to see whether any of them can be swapped out when the memory limit is exceeded. The same check is also triggered when a new object is created. An object is swapped out if it has not been used for a certain amount of time; for this, each object tracks the timestamp when it was last used.
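The check described above could look roughly like the following sketch. The function name `checkGraphicMemory`, the byte limit and the idle threshold are hypothetical parameters, and the current time is passed in explicitly so the behavior is easy to reason about; the real swap-out to a temporary file is only simulated.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Minimal stand-in for an ImpGraphic as seen by the swapping logic.
struct ImpGraphic
{
    std::size_t nSizeBytes = 0;
    bool bSwappedOut = false;
    Clock::time_point aLastUsed{};

    void swapOut()
    {
        // Real code would write the data to a temporary file and free the memory.
        bSwappedOut = true;
    }
};

// Hypothetical check run by one global timer and whenever a new graphic is
// created. Nothing is swapped out while total usage stays within the limit;
// above it, every graphic idle for at least aMaxIdle is swapped out.
void checkGraphicMemory(const std::vector<ImpGraphic*>& rGraphics,
                        std::size_t nLimitBytes, Clock::duration aMaxIdle,
                        Clock::time_point aNow)
{
    std::size_t nUsedBytes = 0;
    for (const ImpGraphic* pGraphic : rGraphics)
        if (!pGraphic->bSwappedOut)
            nUsedBytes += pGraphic->nSizeBytes;

    if (nUsedBytes <= nLimitBytes)
        return;

    for (ImpGraphic* pGraphic : rGraphics)
        if (!pGraphic->bSwappedOut && aNow - pGraphic->aLastUsed >= aMaxIdle)
            pGraphic->swapOut();
}
```

Recently used graphics stay in memory even when the limit is exceeded, so graphics in active use are never evicted just to get under the limit.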

A swap-in happens if the object is swapped out (obviously) and some of its data is needed (the underlying bitmap, animation or metafile). This is checked in the same code path where the timestamp is updated.
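A sketch of that single access path, assuming a `std::string` stands in for both the bitmap and the temporary swap file: every getter goes through one private helper (hypothetically named `ensureAvailable()`) that swaps the data back in when needed and refreshes the last-used timestamp.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

class ImpGraphic
{
public:
    explicit ImpGraphic(std::string aBitmap) : maBitmap(std::move(aBitmap)) {}

    void swapOut()
    {
        // A std::string stands in for the temporary swap file.
        maSwapFile = std::move(maBitmap);
        maBitmap.clear();
        mbSwappedOut = true;
    }

    // Every accessor for the underlying data goes through ensureAvailable().
    const std::string& getBitmap()
    {
        ensureAvailable();
        return maBitmap;
    }

    bool isSwappedOut() const { return mbSwappedOut; }
    std::chrono::steady_clock::time_point getLastUsed() const { return maLastUsed; }

private:
    // Swap in if necessary, then record the time of this use.
    void ensureAvailable()
    {
        if (mbSwappedOut)
        {
            maBitmap = std::move(maSwapFile);
            maSwapFile.clear();
            mbSwappedOut = false;
        }
        maLastUsed = std::chrono::steady_clock::now();
    }

    std::string maBitmap;
    std::string maSwapFile;
    bool mbSwappedOut = false;
    std::chrono::steady_clock::time_point maLastUsed{};
};
```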

The new swapping strategy is relatively simple: if graphic objects need a lot of memory within a certain time, we let them use it and don't try to free it over-aggressively. In the past, such over-aggressive freeing caused swap-out and swap-in cycles that made the application completely unusable. In the future, external hints about when a certain Graphic object can be swapped out may be added, so that swapping can be performed more effectively. There are also several other ideas that would increase performance and reduce memory usage, which can now be implemented thanks to the new hierarchy, where almost all of the swapping is contained inside Graphic itself, but all of this is currently out of the scope of this work.

Other changes to Graphic


Other changes to Graphic were related to lazy loading. When a document is loaded, we don't want to load a Graphic into memory if it is not needed yet (for example, we display the first page but the graphic is on page 10). In document filters (ODF, for example), we previously transported the URL of an external or internal graphic to the document model, where it was lazily loaded when it was actually needed. This is no longer possible, as we now need to create an XGraphic object already in the document filter. To overcome this, we need an unloaded Graphic, which is created already in a swapped-out state and swapped in when needed.

The GraphicFilter didn't allow anything like this, so I needed to add a new method which doesn't actually load the image, but just determines what kind of image it is, gathers its metadata (image size) and creates a GfxLink object that holds the (compressed) image data. The metadata is needed because we don't want to force a load when only this basic information is requested. In fact, we want to load the image as late as possible.
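To illustrate gathering metadata without decoding, here is a sketch for PNG only. It relies on the PNG file layout (an 8-byte signature followed by the IHDR chunk, whose big-endian width and height sit at byte offsets 16 and 20). `peekPng()` and `UnloadedGraphic` are hypothetical names, the compressed bytes kept in the object play the role of the GfxLink data, and the actual decode is only simulated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "unloaded" graphic: metadata plus the still-compressed bytes.
class UnloadedGraphic
{
public:
    UnloadedGraphic(std::string aFormat, std::uint32_t nWidth, std::uint32_t nHeight,
                    std::vector<std::uint8_t> aCompressed)
        : maFormat(std::move(aFormat)), mnWidth(nWidth), mnHeight(nHeight),
          maCompressed(std::move(aCompressed)) {}

    // Answered from metadata alone; does not force a load.
    const std::string& getFormat() const { return maFormat; }
    std::uint32_t getWidth() const { return mnWidth; }
    std::uint32_t getHeight() const { return mnHeight; }
    bool isLoaded() const { return mbLoaded; }

    // The first real use triggers the decode (only simulated here).
    void ensureLoaded()
    {
        if (!mbLoaded)
            mbLoaded = true; // real code would decode maCompressed here
    }

private:
    std::string maFormat;
    std::uint32_t mnWidth;
    std::uint32_t mnHeight;
    std::vector<std::uint8_t> maCompressed;
    bool mbLoaded = false;
};

// Check the PNG signature and IHDR chunk, read width/height, decompress nothing.
std::optional<UnloadedGraphic> peekPng(std::vector<std::uint8_t> aBytes)
{
    static const std::uint8_t aSignature[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    static const char aIhdr[4] = {'I', 'H', 'D', 'R'};
    if (aBytes.size() < 24 || !std::equal(aSignature, aSignature + 8, aBytes.begin())
        || !std::equal(aIhdr, aIhdr + 4, aBytes.begin() + 12))
        return std::nullopt;

    auto readBigEndian32 = [&aBytes](std::size_t nOffset) {
        return (std::uint32_t(aBytes[nOffset]) << 24) | (std::uint32_t(aBytes[nOffset + 1]) << 16)
             | (std::uint32_t(aBytes[nOffset + 2]) << 8) | std::uint32_t(aBytes[nOffset + 3]);
    };
    const std::uint32_t nWidth = readBigEndian32(16);
    const std::uint32_t nHeight = readBigEndian32(20);
    return UnloadedGraphic("png", nWidth, nHeight, std::move(aBytes));
}
```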

Another issue is that we can have an external image (loaded from a file or even from a URL on the internet). The issue is similar to the lazy-loading scenario, but the difference is that a Graphic must now know the URL it was created from and can be created completely empty (without loading of any kind). The reason is that loading is directed by the LinkManager, which is part of the document model. For security reasons, the LinkManager cannot allow a Graphic to be loaded on its own, so the LinkManager directs the loading on demand (on first use). The LinkManager also keeps track of the URLs of all the various external resources; the user can inspect those resources, change their URLs or trigger an update. Changing the URL and updating an object was previously done in SdrGrafObj and SwGrfNode, but this has now moved to common code in the Graphic object, with SdrGrafObj and SwGrfNode only directing what to do. There is still room for improvement here, but that is out of the scope of this work.
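A sketch of that split of responsibilities, with hypothetical names throughout: the `Graphic` only remembers its origin URL and may stay empty, while a stand-in `LinkManager` decides whether loading is allowed and performs it on first use through an injected loader function (in place of real file or network access).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A Graphic that remembers the URL it was created from and may be
// completely empty until the LinkManager loads it.
class Graphic
{
public:
    explicit Graphic(std::string aOriginURL) : maOriginURL(std::move(aOriginURL)) {}

    const std::string& getOriginURL() const { return maOriginURL; }
    bool isAvailable() const { return mbAvailable; }

    void setData(std::vector<unsigned char> aData)
    {
        maData = std::move(aData);
        mbAvailable = true;
    }

    // Changing the link drops the old data; the next use loads from the new URL.
    void setOriginURL(std::string aURL)
    {
        maOriginURL = std::move(aURL);
        maData.clear();
        mbAvailable = false;
    }

private:
    std::string maOriginURL;
    std::vector<unsigned char> maData;
    bool mbAvailable = false;
};

// Hypothetical stand-in for the document's LinkManager: it alone decides
// whether and when an external resource may be fetched.
class LinkManager
{
public:
    using Loader = std::function<std::vector<unsigned char>(const std::string&)>;

    LinkManager(Loader aLoader, bool bAllowExternalLinks)
        : maLoader(std::move(aLoader)), mbAllowExternalLinks(bAllowExternalLinks) {}

    // Called on first use; returns false if the security policy forbids loading.
    bool ensureLoaded(Graphic& rGraphic) const
    {
        if (rGraphic.isAvailable())
            return true;
        if (!mbAllowExternalLinks)
            return false;
        rGraphic.setData(maLoader(rGraphic.getOriginURL()));
        return true;
    }

private:
    Loader maLoader;
    bool mbAllowExternalLinks;
};
```

Keeping the URL in the Graphic itself is what lets the common code handle "change URL" and "update" for both Draw/Impress and Writer images.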

Next steps


Finishing up this work by revising the UNO API and fixing known bugs.

Credits


Many thanks to Collabora Productivity, TDF and the users who support the foundation with donations, for making this work possible.

To be continued...