Class PageFlushingHelper

java.lang.Object
com.itextpdf.kernel.pdf.PageFlushingHelper

public class PageFlushingHelper extends Object
This class makes it possible to free the memory taken by already processed pages when handling big PDF files. It provides three alternative approaches, each with its own advantages and most suitable use cases: unsafeFlushDeep(int), releaseDeep(int) and appendModeFlush(int).

Each approach is designed to be most suitable for specific modes of document processing. There are four document processing modes: reading, writing, stamping and append mode.

Reading mode: The PdfDocument instance is initialized with only a PdfReader, via the PdfDocument(PdfReader) constructor.

Writing mode: The PdfDocument instance is initialized with only a PdfWriter, via the PdfDocument(PdfWriter) constructor.

Stamping mode: The PdfDocument instance is initialized with both a PdfReader and a PdfWriter, via the PdfDocument(PdfReader, PdfWriter) constructor. If the optional third StampingProperties argument is passed, its StampingProperties.useAppendMode() method shall NOT be called.
This mode updates the existing document by completely recreating it. The complete document will have been rewritten by the end of the PdfDocument.close() call.

Append mode: The PdfDocument instance is initialized with both a PdfReader and a PdfWriter, via the PdfDocument(PdfReader, PdfWriter, StampingProperties) constructor. The third StampingProperties argument shall have had its StampingProperties.useAppendMode() method called.
This mode keeps the original document intact with all its data and adds new data at the end of the file, which "overrides" and amends the original document. In this mode the complete document does not have to be rewritten, which can be highly beneficial when handling big PDF documents.
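
The four modes above are determined entirely by how the PdfDocument is constructed. The following is a minimal sketch of that decision, written without any iText dependency so that it runs on its own. ProcessingMode, ProcessingModes and classify are hypothetical names introduced for illustration; they are not part of iText.

```java
// Illustrative stand-in only: none of these types exist in iText.
enum ProcessingMode { READING, WRITING, STAMPING, APPEND }

final class ProcessingModes {
    /**
     * Classifies a document by the way it was opened.
     *
     * @param hasReader           a PdfReader was passed to the PdfDocument constructor
     * @param hasWriter           a PdfWriter was passed to the PdfDocument constructor
     * @param appendModeRequested StampingProperties.useAppendMode() was called
     */
    static ProcessingMode classify(boolean hasReader, boolean hasWriter,
                                   boolean appendModeRequested) {
        if (hasReader && hasWriter) {
            // PdfDocument(PdfReader, PdfWriter[, StampingProperties])
            return appendModeRequested ? ProcessingMode.APPEND : ProcessingMode.STAMPING;
        }
        if (hasReader) {
            return ProcessingMode.READING; // PdfDocument(PdfReader)
        }
        if (hasWriter) {
            return ProcessingMode.WRITING; // PdfDocument(PdfWriter)
        }
        throw new IllegalArgumentException("a document needs a reader, a writer, or both");
    }
}
```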

The PageFlushingHelper class works with two PDF object states: flushed and released.

A flushed object is one that is finalized and has been completely written to the output stream. This frees its memory but makes it impossible to modify it or read data from it. Any attempt to modify a flushed object or to fetch its inner contents throws an exception. Flushing is only possible for objects in writing and stamping modes; it is also possible to flush modified objects in append mode.

A released object is one that has not been modified and has been "detached" from the PdfDocument, so that it can be removed from memory by the garbage collector even if the document is not closed yet. All released object instances become read-only, and any modifications to them will not be reflected in the resultant document. Read-only instances should be considered copies of the original objects. Released objects can be re-read, but re-reading creates new object instances. Releasing is only possible for unmodified objects in reading, stamping and append modes. Remember, though, that during PdfDocument.close() in stamping mode all released objects will be re-read.
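
The rules in the two paragraphs above (which objects may be flushed and which may be released, depending on the mode) can be summarized as two predicates. This is only a sketch of the stated rules, not of iText internals; DocMode, ObjectStateRules, canFlush and canRelease are hypothetical names.

```java
// Illustrative stand-in only: none of these types exist in iText.
enum DocMode { READING, WRITING, STAMPING, APPEND }

final class ObjectStateRules {
    /** Flushing: writing and stamping modes, plus modified objects in append mode. */
    static boolean canFlush(DocMode mode, boolean modified) {
        switch (mode) {
            case WRITING:
            case STAMPING:
                return true;
            case APPEND:
                return modified;
            default:
                return false; // READING
        }
    }

    /** Releasing: only unmodified objects, in reading, stamping and append modes. */
    static boolean canRelease(DocMode mode, boolean modified) {
        if (modified) {
            return false;
        }
        return mode == DocMode.READING || mode == DocMode.STAMPING || mode == DocMode.APPEND;
    }
}
```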

The PageFlushingHelper class doesn't work with PdfADocument instances.

  • Constructor Details

    • PageFlushingHelper

      public PageFlushingHelper (PdfDocument pdfDoc)
  • Method Details

    • unsafeFlushDeep

      public void unsafeFlushDeep (int pageNum)
      Flushes to the output stream all objects belonging to the given page. This frees the memory taken by those objects, but makes it impossible to modify them or read data from them.

      This method is mainly designed for writing and stamping modes. It will throw an exception for documents opened in reading mode (see PageFlushingHelper for more details on modes). This method can also be used for append mode if new pages are added or existing pages are heavily modified and appendModeFlush(int) is not enough.

      This method is highly effective in freeing memory and works properly for the vast majority of documents and use cases; however, it can potentially cause failures. If document handling fails with an exception after this method has been used, re-process the document with a "safe flushing" alternative (see PdfPage.flush(), or consider using append mode and the appendModeFlush(int) method).

      The unsafety comes from the possibility of objects being shared between pages and from the fact that object data cannot be read after flushing. Any attempt to modify a flushed object or to fetch its data throws an exception (a flushed object can still be added to other objects, though).

      In stamping/append mode the issue occurs when an object is shared between two or more pages, the first page is flushed, and processing of the second page later requires this object to be read or modified. Normally only page resources (like images and fonts) are shared, and these are often not required for page processing: for example, page stamping (e.g. adding watermarks, headers, etc.) only adds new resources. Page resources are indeed required, and the risk of this method causing failures is therefore high, for page contents parsing: text extraction, any general PdfCanvasProcessor class usage, usage of the pdfSweep addon.
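
      The shared-object failure described above can be reproduced with a toy model. The classes below (ToyObject, ToyPage, SharedResourceDemo) are hypothetical and only imitate the behavior stated in this documentation; the exception type used here is a stand-in, not the one iText throws.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are not iText classes.
final class ToyObject {
    private final String name;
    private String data;
    private boolean flushed;

    ToyObject(String name, String data) {
        this.name = name;
        this.data = data;
    }

    /** Reading a flushed object fails, mirroring the behavior described above. */
    String read() {
        if (flushed) {
            throw new IllegalStateException(name + " has already been flushed");
        }
        return data;
    }

    /** Flushing frees the payload and makes the object unreadable. */
    void flush() {
        data = null;
        flushed = true;
    }
}

final class ToyPage {
    final List<ToyObject> objects = new ArrayList<>();

    /** Flushes every object reachable from the page, shared or not. */
    void deepFlush() {
        for (ToyObject o : objects) {
            o.flush();
        }
    }
}

final class SharedResourceDemo {
    /** Returns true if reading the shared object from the second page fails. */
    static boolean secondPageReadFails() {
        ToyObject sharedFont = new ToyObject("font", "glyph data");
        ToyPage first = new ToyPage();
        ToyPage second = new ToyPage();
        first.objects.add(sharedFont);
        second.objects.add(sharedFont);

        first.deepFlush(); // also flushes the font that the second page needs
        try {
            second.objects.get(0).read();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```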

      In writing mode this method will normally work without issues: by default iText creates page objects in such a way that they are independent of each other. Again, the resources can be shared, but as mentioned above it's safe to add already flushed resources to other pages because this doesn't require reading data from them.

      In append mode only modified objects are flushed; all others are released and can be re-read later on.

      This method shall be used only when processing of the page and its inner structures is known to be finished. This includes reading data from pages, page modification and page handling via addons/utilities.

      Parameters:
      pageNum - the number of the page whose low-level object structure is to be flushed to the output stream.
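
      A typical calling pattern is to finish all work on a page and only then flush it, never touching the page again. The sketch below shows that ordering with stand-in interfaces so that it runs without iText; DeepFlusher, PageWork and FlushAsYouGo are hypothetical names, and only the method name and int parameter of unsafeFlushDeep are taken from this documentation.

```java
// Stand-in with the same method name and parameter as PageFlushingHelper.unsafeFlushDeep(int).
interface DeepFlusher {
    void unsafeFlushDeep(int pageNum);
}

// Hypothetical callback representing whatever processing a page needs.
interface PageWork {
    void process(int pageNum);
}

final class FlushAsYouGo {
    /**
     * Processes pages in order and flushes each one as soon as its processing
     * is finished. Page numbers are 1-based, as in iText.
     */
    static void run(int numberOfPages, PageWork work, DeepFlusher flusher) {
        for (int pageNum = 1; pageNum <= numberOfPages; pageNum++) {
            work.process(pageNum);            // all reading and modification happens here
            flusher.unsafeFlushDeep(pageNum); // the page must not be touched afterwards
        }
    }
}
```

      With iText on the classpath, the flusher argument could presumably be supplied as a method reference on an instance created with new PageFlushingHelper(pdfDoc).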
    • releaseDeep

      public void releaseDeep (int pageNum)
      Releases the memory taken by all unmodified objects belonging to the given page, including the page dictionary itself. This affects only objects that are read from the existing input PDF.

      This method is mainly designed for reading mode and can also be used in append mode (see PageFlushingHelper for more details on modes). In append mode modified objects will be kept in memory. The page and all its inner structure objects can be re-read later.

      This method has no effect in writing mode. It is also not advised in stamping mode: even though it will indeed release the objects, they will definitely be re-read on document closing, which hurts performance.

      When using this method in append mode (or in stamping mode), do not modify object instances obtained before the release! See PageFlushingHelper for details on the released object state.

      This method shall be used only when processing of the page and its inner structures is known to be finished. This includes reading data from pages, page modification and page handling via addons/utilities.

      Parameters:
      pageNum - the number of the page whose low-level object structure is to be released from memory.
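
      The warning about instances obtained before the release can be illustrated with a toy model: an instance held from before the release becomes a detached copy, edits to it are lost, and re-reading yields a new instance. ToyDocument and its methods are hypothetical and only imitate the released-object behavior described for PageFlushingHelper.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is not an iText class.
final class ToyDocument {
    private final Map<Integer, String> stored = new HashMap<>();        // content of the input file
    private final Map<Integer, StringBuilder> loaded = new HashMap<>(); // instances attached to the document

    ToyDocument put(int id, String content) {
        stored.put(id, content);
        return this;
    }

    /** Returns the attached instance, reading it from the input if necessary. */
    StringBuilder get(int id) {
        return loaded.computeIfAbsent(id, k -> new StringBuilder(stored.get(k)));
    }

    /** Detaches the instance; the next get(id) creates a new one. */
    void release(int id) {
        loaded.remove(id);
    }

    /** What would end up in the result: attached instances win over stored content. */
    String contentAtClose(int id) {
        StringBuilder live = loaded.get(id);
        return live != null ? live.toString() : stored.get(id);
    }
}
```

      In this model, appending to an instance fetched before release(id) leaves contentAtClose(id) unchanged, which is the effect to avoid in real code.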
    • appendModeFlush

      public void appendModeFlush (int pageNum)
      Flushes to the output stream those modified objects that can belong only to the given page, which makes this method "safe" compared to unsafeFlushDeep(int). Flushing an object frees its memory, but makes it impossible to modify the object or read data from it. This method releases all other page structure objects that are not modified.

      This method is mainly designed for append mode. It is similar to PdfPage.flush(), but it additionally releases all page objects that were not flushed. This method is ideal for small amendments of pages, but it makes more sense to use PdfPage.flush() for newly created or heavily modified pages.
      This method will throw an exception for documents opened in reading mode (see PageFlushingHelper for more details on modes). It is also not advised in stamping mode: even though it will indeed release the objects and free the memory, the released objects will definitely be re-read on document closing, which hurts performance.

      When using this method in append mode (or in stamping mode), do not modify object instances obtained before this method call! See PageFlushingHelper for details on the released and flushed object states.

      This method shall be used only when processing of the page and its inner structures is known to be finished. This includes reading data from pages, page modification and page handling via addons/utilities.

      Parameters:
      pageNum - the number of the page whose low-level object structure is to be flushed or released from memory.
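
      The per-mode guidance given for the three methods can be collected into one lookup. The sketch below encodes only what the method descriptions state; FlushMethod, OpenMode, Fit and FlushMethodGuide are hypothetical names, and the behavior of appendModeFlush(int) in writing mode is marked NOT_STATED because it is not described here.

```java
// Illustrative stand-in only: none of these types exist in iText.
enum FlushMethod { UNSAFE_FLUSH_DEEP, RELEASE_DEEP, APPEND_MODE_FLUSH }
enum OpenMode { READING, WRITING, STAMPING, APPEND }
enum Fit { DESIGNED_FOR, USABLE, NOT_ADVISED, NO_EFFECT, THROWS, NOT_STATED }

final class FlushMethodGuide {
    /** Returns how well a method fits a mode, according to the method descriptions. */
    static Fit fit(FlushMethod method, OpenMode mode) {
        switch (method) {
            case UNSAFE_FLUSH_DEEP:
                switch (mode) {
                    case READING: return Fit.THROWS;
                    case APPEND:  return Fit.USABLE;       // new or heavily modified pages
                    default:      return Fit.DESIGNED_FOR; // WRITING, STAMPING
                }
            case RELEASE_DEEP:
                switch (mode) {
                    case READING: return Fit.DESIGNED_FOR;
                    case APPEND:  return Fit.USABLE;       // modified objects stay in memory
                    case WRITING: return Fit.NO_EFFECT;
                    default:      return Fit.NOT_ADVISED;  // STAMPING: re-read on close
                }
            default: // APPEND_MODE_FLUSH
                switch (mode) {
                    case APPEND:   return Fit.DESIGNED_FOR;
                    case READING:  return Fit.THROWS;
                    case STAMPING: return Fit.NOT_ADVISED; // re-read on close
                    default:       return Fit.NOT_STATED;  // WRITING: not described
                }
        }
    }
}
```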