Tuesday, October 13, 2020

PDF annotations support

In Draw, a PDF document can be opened using the PDFium library for rendering, where each page in the Draw document contains an rendered image from the PDF. This mode is useful for viewing PDFs and allows for the best fidelity. With viewing, there is also a need to review and comment and this is where PDF annotations come in as adding the support for the PDF annotations and to support a review based workflow has been one of my recent task at Collabora Productivity.

PDF supports a wide variety of annotations, but we don't support all of them in Draw. What we do support are comments, which are similar to pop-up note annotations in PDF, so the easiest is to add those first. To be able to use pop-up notes in Draw, we need to import them. This is done at import by using PDFium after we created the PDF graphic for rendering. In Draw, we then insert this as comments and so we get the basic support for manipulating with annotations, but how to save the changes? PDF export already supports saving comments as annotations, so this mostly already works (I needed to fix some bugs and add support for saving all needed properties).

Figure 1. Pop-up Note annotation in PDF viewer (Evince) and Draw

How the output compares between a PDF viewer and inside LibreOffice can be seen in Figure 1. All this is available in LibreOffice master and should be included in LibreOffice 7.1.

What about other annotations that are supported in PDF? The work to add those is ongoing and the recent success has been adding support to draw vector graphic annotations like polygon, ink (freehand), squares (rectangles) and circles (ellipses). So the idea is that instead of the usual marker for a comment, we draw a vector graphic that we read from the PDF annotation. 

The first thing I needed to do is to extend the PDFium library, which didn't support reading all polygon vertices and ink strokes from the document. What also wasn't supported is reading the border information, which is needed for line widths.

Next thing is extending the import to read the geometry data, so I dded a special PDFAnnotationMarker class for that. Then the geometry data needs to be stored on the sd::Annotation class (implementation of XAnnotation, but I didn't extend XAnnotation at this point as it wasn't needed - yet) . Drawing is performed in AnnotationTag, where we create a new OverlayPolyPolygon object, that is responsible for creating the Primitive2D for the marker. What we get after this is done is shown in figure 2.


Figure 2. Multiple annotations in PDF Viewer and Draw

This work will shortly be merged to the LibreOffice master. 

Wednesday, January 22, 2020

Accessibility checker and support for PDF/UA specs

PDF/UA or ISO 14289 is a specifications that defines the requirements for accessibility in a PDF document. The specification defines the required structure and formatting of the document (also refers to WCAG specification from W3C for use on the web) and PDF features, which should be enabled or disabled so the document is better suited for accessibility (for example PDF tags are required).

Thanks to the Dutch Standardisation Forum for financially sponsoring and Collabora Productivity in cooperation with Nou&Off for the work on implementing this specification into LibreOffice.

Figure 1: PDF Export
The implementation in LibreOffice is currently done only for Writer. It consists of two parts.: First is to enable PDF/UA support into PDF export, which writes a PDF/UA flag into the PDF document and enables all the required features. The PDF export dialog was extended with a checkbox (see figure 1).

Figure 2: Accessibility Check in menu 

The second part is an accessibility check functionality, which traverses the document structure and gather all possible accessibility issues. The accessibility check can be run manually through the menu under Tools - Accessibility Checker (see figure 2), or it will be triggered after PDF export dialog, if the PDF/UA support is enabled. The accessibility issues are presented in a dialog (see figure 3).

Figure 3: Accessibility check dialog

The checks that are (currently) implemented are:

  • Check that the document title is set.
  • Check that the document language is set, or that all styles that are in use, have the language set.
  • Check all images, graphics, OLE objects for the alt (or title in some objects) text.
  • Check tables don't include splits or merges, which aren't allowed by the specifications. The table should be 
  • Check for fake/manual numbering (not using integrated numbering). For example writing "1." "2." "3." at the beginning of the paragraphs.  
  • Check that hyperlink text is not a hyperlink itself - hyperlink should be described. 
  • Check for the contrast between text and the background. The algorithm is described in the WCAG specification.  
  • Check for blinking text, discouraged for the obvious reasons.
  • Check for footnotes and endnotes, which should be avoided.
  • Check for heading order. Order of the headings must increase incrementally with no skips (for example Heading 1 to Heading 3, skipping Heading 2).
  • Check, if text conveys additional meaning with (direct) formatting.

The PDF/UA support will be included in LibreOffice 7.