Class CompareTool


public class CompareTool extends Object
This class provides the means to compare two PDF files both by content and visually, and reports their differences.

For visual comparison it uses two external tools, Ghostscript and ImageMagick, which must be installed on your machine. To allow CompareTool to use them, pass either Java system properties or environment variables named "ITEXT_GS_EXEC" and "ITEXT_MAGICK_COMPARE_EXEC", containing the commands that execute the Ghostscript and ImageMagick tools respectively.
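The lookup described above can be sketched with the JDK alone. The hypothetical ExternalToolConfig class below is not iText's implementation. It assumes that a Java system property takes precedence over an environment variable with the same name, which this description does not state.

```java
import java.util.Map;

// Minimal sketch, not iText's implementation. It resolves the commands needed
// for visual comparison, ASSUMING a Java system property takes precedence over
// an environment variable with the same name.
final class ExternalToolConfig {
    static final String GS_EXEC = "ITEXT_GS_EXEC";
    static final String MAGICK_COMPARE_EXEC = "ITEXT_MAGICK_COMPARE_EXEC";

    // Returns the configured command, or null if neither source defines it.
    static String resolve(String name, Map<String, String> env) {
        String value = System.getProperty(name);
        if (value == null || value.isEmpty()) {
            value = env.get(name);
        }
        return (value == null || value.isEmpty()) ? null : value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println(GS_EXEC + " = " + resolve(GS_EXEC, env));
        System.out.println(MAGICK_COMPARE_EXEC + " = " + resolve(MAGICK_COMPARE_EXEC, env));
    }
}
```

You can supply a system property on the JVM command line, for example java -DITEXT_GS_EXEC=gs ... An environment variable must be exported in the shell before the JVM starts.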

CompareTool was designed mainly for testing iText, to ensure that the same code produces the same PDF document. For this reason you will often encounter parameter names such as "outDoc" and "cmpDoc", which stand for output document and document-for-comparison. The first one is the current result, and the second one is the reference (ideal) result; outDoc is compared against cmpDoc. Therefore all comparison reports take the form "Expected ..., but was ...", where the "expected" part stands for the content of cmpDoc and the "but was" part stands for the content of outDoc.

  • Constructor Details

    • CompareTool

      public CompareTool()
  • Method Details

    • compareByCatalog

      public CompareTool.CompareResult compareByCatalog (PdfDocument outDocument, PdfDocument cmpDocument)
      Compares two PDF documents by content, starting from the Catalog dictionary and then recursively comparing the corresponding objects referenced from it. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      The main difference between this method and the compareByContent(String, String, String, String) methods is the return value. This method returns a CompareTool.CompareResult instance, which can be inspected in code. The compareByContent methods instead return a String describing the differences, which is only suitable for printing. Also keep in mind that this method does not perform visual comparison of the documents.
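      The traversal described above can be modelled with a minimal JDK-only sketch. It is not iText's implementation and rests on three assumptions:
      - dictionaries are modelled as Maps;
      - arrays are modelled as Lists;
      - every other value is a leaf compared with equals().
      The nested Result class is a hypothetical stand-in for CompareTool.CompareResult. It shows why a result object is easier to use in code than a printed string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Simplified model, not iText's code: dictionaries are Maps, arrays are Lists,
// and every other value is a leaf compared with equals().
final class TreeCompareSketch {

    // Hypothetical stand-in for CompareTool.CompareResult.
    static final class Result {
        final List<String> differences = new ArrayList<>();

        boolean isOk() {
            return differences.isEmpty();
        }
    }

    static Result compare(Object out, Object cmp) {
        Result result = new Result();
        walk(out, cmp, "", result);
        return result;
    }

    // Depth-first: each child pair is compared completely before its next sibling.
    @SuppressWarnings("unchecked")
    private static void walk(Object out, Object cmp, String path, Result result) {
        if (out instanceof Map && cmp instanceof Map) {
            Map<String, Object> o = (Map<String, Object>) out;
            Map<String, Object> c = (Map<String, Object>) cmp;
            TreeSet<String> keys = new TreeSet<>(c.keySet()); // sorted for a stable report
            keys.addAll(o.keySet());
            for (String key : keys) {
                walk(o.get(key), c.get(key), path + "/" + key, result);
            }
        } else if (out instanceof List && cmp instanceof List) {
            List<Object> o = (List<Object>) out;
            List<Object> c = (List<Object>) cmp;
            if (o.size() != c.size()) {
                result.differences.add(label(path) + ": Expected array size " + c.size()
                        + ", but was " + o.size());
                return;
            }
            for (int i = 0; i < c.size(); i++) {
                walk(o.get(i), c.get(i), path + "[" + i + "]", result);
            }
        } else if (!Objects.equals(out, cmp)) {
            // "Expected" is the cmp (reference) value, "but was" is the out value.
            result.differences.add(label(path) + ": Expected " + cmp + ", but was " + out);
        }
    }

    private static String label(String path) {
        return path.isEmpty() ? "(root)" : path;
    }

    public static void main(String[] args) {
        Map<String, Object> cmp = Map.of("Type", "Catalog", "Lang", "en-US");
        Map<String, Object> out = Map.of("Type", "Catalog", "Lang", "de-DE");
        Result result = compare(out, cmp);
        System.out.println(result.isOk() ? "No differences" : String.join("\n", result.differences));
    }
}
```

      A caller can branch on isOk() or iterate over the individual differences. It does not need to parse a report string.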

      For more explanations about what outDoc and cmpDoc are, see the last paragraph of the CompareTool class description.

      outDocument - a PdfDocument corresponding to the output file, which is to be compared with the cmp-file.
      cmpDocument - a PdfDocument corresponding to the cmp-file, which is to be compared with the output file.
      Returns: the report on the comparison of the two files, as a CompareTool.CompareResult instance.
      See Also:
    • disableCachedPagesComparison

      public CompareTool disableCachedPagesComparison()
      Disables the default logic of page comparison. This option is relevant only for the compareByCatalog(PdfDocument, PdfDocument) method.

      By default, pages are treated as special objects. When a page is met during comparison, it is not compared as an object; only its page number is checked for equality in both documents. This behaviour is intended for the compareByContent(String, String, String) set of methods, because those methods compare documents page by page. There is therefore no need to compare a page's content when the page is met during the traversal: its content either has already been compared or will be compared later.

      However, if you use compareByCatalog(PdfDocument, PdfDocument) with the default page comparison behaviour, pages won't be checked at all. Every time a reference to a page dictionary is met, only the page numbers are compared for both documents. In effect, all of the document's catalog entries are compared except /Pages. Strictly speaking, the page tree structures are compared, but the pages themselves are not.
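      The difference between the two modes can be illustrated with a toy model. This is a sketch built on assumptions, not iText code. It models a page reference met during traversal, for example from an outline entry:
      - by default, the reference is compared only by page number;
      - after calling the hypothetical disableCachedPagesComparison() counterpart, the page content is compared as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy model of the page shortcut, not iText's code. comparePageRefs is called
// when a reference to a page is met during traversal (for example from an
// outline entry), as opposed to the page-by-page pass of compareByContent.
final class PageShortcutSketch {

    static final class Page {
        final int number;
        final String content;

        Page(int number, String content) {
            this.number = number;
            this.content = content;
        }
    }

    private boolean cachedPagesComparison = true; // default behaviour

    // Hypothetical counterpart of CompareTool.disableCachedPagesComparison().
    PageShortcutSketch disableCachedPagesComparison() {
        cachedPagesComparison = false;
        return this;
    }

    List<String> comparePageRefs(Page out, Page cmp) {
        List<String> diffs = new ArrayList<>();
        if (out.number != cmp.number) {
            diffs.add("Expected page " + cmp.number + ", but was page " + out.number);
        }
        // With the default behaviour, page contents are skipped here.
        if (!cachedPagesComparison && !Objects.equals(out.content, cmp.content)) {
            diffs.add("Expected page content \"" + cmp.content + "\", but was \"" + out.content + "\"");
        }
        return diffs;
    }

    public static void main(String[] args) {
        Page out = new Page(1, "Hello");
        Page cmp = new Page(1, "Hallo");
        System.out.println(new PageShortcutSketch().comparePageRefs(out, cmp));
        System.out.println(new PageShortcutSketch().disableCachedPagesComparison().comparePageRefs(out, cmp));
    }
}
```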

      Returns: this CompareTool instance.
    • setCompareByContentErrorsLimit

      public CompareTool setCompareByContentErrorsLimit (int compareByContentMaxErrorCount)
      Sets the maximum number of errors that will be returned as the result of the comparison.
      compareByContentMaxErrorCount - the maximum number of errors.
      Returns: this CompareTool instance.
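      The sketch below shows what such a limit means for the collected report, using a hypothetical LimitedErrorCollector class. This description does not state whether CompareTool stops the comparison once the limit is reached or only truncates the report. The sketch lets the caller stop early by checking isFull().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector illustrating an error limit; not part of iText.
final class LimitedErrorCollector {
    private final int maxErrorCount;
    private final List<String> errors = new ArrayList<>();

    LimitedErrorCollector(int maxErrorCount) {
        this.maxErrorCount = maxErrorCount;
    }

    // Records the error unless the limit has already been reached.
    void add(String error) {
        if (!isFull()) {
            errors.add(error);
        }
    }

    boolean isFull() {
        return errors.size() >= maxErrorCount;
    }

    List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    public static void main(String[] args) {
        LimitedErrorCollector collector = new LimitedErrorCollector(2);
        for (int page = 1; page <= 5 && !collector.isFull(); page++) {
            collector.add("Difference on page " + page);
        }
        System.out.println(collector.getErrors()); // only the first two differences are kept
    }
}
```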
    • setGenerateCompareByContentXmlReport

      public CompareTool setGenerateCompareByContentXmlReport (boolean generateCompareByContentXmlReport)
      Enables or disables the generation of the comparison report in the form of an XML document.

      IMPORTANT NOTE: this flag affects only the comparison performed by compareByContent methods!

      generateCompareByContentXmlReport - true to enable XML report generation, false to disable it.
      Returns: this CompareTool instance.
    • setEventCountingMetaInfo

      public void setEventCountingMetaInfo (IMetaInfo metaInfo)
      Sets the IMetaInfo that will be used when creating both the read and the written documents.
      metaInfo - the meta info to set.
    • enableEncryptionCompare

      public CompareTool enableEncryptionCompare()
      Enables the comparison of the encryption properties of the documents. Encryption properties comparison results are returned along with all other comparison results.

      IMPORTANT NOTE: this flag affects only the comparison performed by compareByContent methods! compareByCatalog(PdfDocument, PdfDocument) doesn't compare encryption properties because encryption properties aren't part of the document's Catalog.

      Returns: this CompareTool instance.
    • getOutReaderProperties

      public ReaderProperties getOutReaderProperties()
      Gets ReaderProperties to be passed later to the PdfReader of the output document.

      Documents for comparison are opened in reader mode. This method lets you alter the ReaderProperties used to open the output document, which is particularly useful when comparing encrypted documents.

      For more explanations about what outDoc and cmpDoc are, see the last paragraph of the CompareTool class description.

      Returns: the ReaderProperties instance to be passed later to the PdfReader of the output document.
    • getCmpReaderProperties

      public ReaderProperties getCmpReaderProperties()
      Gets ReaderProperties to be passed later to the PdfReader of the cmp document.

      Documents for comparison are opened in reader mode. This method lets you alter the ReaderProperties used to open the cmp document, which is particularly useful when comparing encrypted documents.

      For more explanations about what outDoc and cmpDoc are, see the last paragraph of the CompareTool class description.

      Returns: the ReaderProperties instance to be passed later to the PdfReader of the cmp document.
    • compareVisually

      public String compareVisually (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix) throws InterruptedException, IOException
      Compares two documents visually. The comparison uses two external tools: Ghostscript and ImageMagick. For more information about the configuration needed for visual comparison, see the CompareTool class description.

      Note that this method uses the ImageMagickHelper and GhostscriptHelper classes and therefore may create temporary files and directories.

      During comparison, an image file is created for every page of the two documents in the folder specified by the outPath parameter. These page images are then compared. For each page that differs, another image file is created with the differences marked on it.
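      The two external steps can be sketched as command lines built for ProcessBuilder. The sketch is illustrative only:
      - The Ghostscript options (-dSAFER, -dNOPAUSE, -dBATCH, -sDEVICE=png16m, -r150, -sOutputFile) are ordinary options of that tool, chosen for the sketch.
      - The ImageMagick compare option (-metric AE) is likewise an ordinary option chosen for the sketch.
      - The file names are hypothetical.
      This description does not document the exact arguments iText passes. The commands are only built, not run.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds, but does not run, the two external commands of a
// visual comparison. Options are standard Ghostscript/ImageMagick options chosen
// for this sketch, not the arguments iText actually uses.
final class VisualCompareCommands {

    // Renders every page of pdf to outPath/<name>-001.png, -002.png, ...
    static List<String> renderPages(String gsExec, String pdf, String outPath, String name) {
        String pattern = Paths.get(outPath, name + "-%03d.png").toString();
        return List.of(gsExec, "-dSAFER", "-dNOPAUSE", "-dBATCH",
                "-sDEVICE=png16m", "-r150", "-sOutputFile=" + pattern, pdf);
    }

    // Compares two page images; ImageMagick writes an image highlighting the differences.
    static List<String> comparePage(List<String> compareExec, String outPng, String cmpPng, String diffPng) {
        List<String> command = new ArrayList<>(compareExec);
        command.addAll(List.of("-metric", "AE", outPng, cmpPng, diffPng));
        return command;
    }

    public static void main(String[] args) {
        List<String> render = renderPages("gs", "/tmp/out/report.pdf", "/tmp/out", "report.pdf");
        List<String> compare = comparePage(List.of("magick", "compare"),
                "/tmp/out/report.pdf-001.png", "/tmp/out/cmp_report.pdf-001.png",
                "/tmp/out/diff_report.pdf_1.png");
        System.out.println(String.join(" ", render));
        System.out.println(String.join(" ", compare));
        // To execute: new ProcessBuilder(render).inheritIO().start().waitFor();
    }
}
```

      Keeping each command as a list of arguments, rather than a single string, avoids shell quoting problems with paths that contain spaces.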

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked differences, if there are any.
      Returns: a string containing the list of pages that are visually different, or null if there are no visual differences.
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
    • compareVisually

      public String compareVisually (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix, Map<Integer,List<Rectangle>> ignoredAreas) throws InterruptedException, IOException
      Compares two documents visually. The comparison uses two external tools: Ghostscript and ImageMagick. For more information about the configuration needed for visual comparison, see the CompareTool class description.

      Note that this method uses the ImageMagickHelper and GhostscriptHelper classes and therefore may create temporary files and directories.

      During comparison, an image file is created for every page of the two documents in the folder specified by the outPath parameter. These page images are then compared. For each page that differs, another image file is created with the differences marked on it.

      It is possible to ignore certain areas of the document pages during visual comparison. This is useful, for example, when documents should be identical except for a page area that contains a date. In this case, new PDF documents with black rectangles over the ignored areas are created in the folder specified by outPath. The visual comparison is then performed on these new documents.
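      A JDK-only sketch can show why masking removes expected differences. It swaps in a different approach from CompareTool:
      - CompareTool draws the black rectangles into new PDF copies; this sketch masks already-rendered page images.
      - It uses java.awt.Rectangle instead of iText's Rectangle.
      - It works in image pixel coordinates with the origin at the top-left corner, whereas PDF user space starts at the bottom-left.
      As in the ignoredAreas parameter, map keys are one-based page numbers.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: masks ignored areas on rendered page images (not on PDF copies,
// as CompareTool does) and then compares the pixels. Coordinates are image
// pixels with the origin at the top-left corner. Masking mutates the images.
final class MaskedImageCompare {

    // Paints every rectangle black, clipped to the image bounds.
    static void mask(BufferedImage image, List<Rectangle> areas) {
        Rectangle bounds = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        for (Rectangle area : areas) {
            Rectangle clip = area.intersection(bounds); // empty if the area lies outside the image
            for (int y = clip.y; y < clip.y + clip.height; y++) {
                for (int x = clip.x; x < clip.x + clip.width; x++) {
                    image.setRGB(x, y, 0xFF000000);
                }
            }
        }
    }

    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Element i of each list is page i + 1; pages beyond the shorter list are not checked.
    static List<Integer> differingPages(List<BufferedImage> outPages, List<BufferedImage> cmpPages,
                                        Map<Integer, List<Rectangle>> ignoredAreas) {
        List<Integer> differing = new ArrayList<>();
        for (int i = 0; i < Math.min(outPages.size(), cmpPages.size()); i++) {
            List<Rectangle> areas = ignoredAreas.getOrDefault(i + 1, List.of());
            mask(outPages.get(i), areas);
            mask(cmpPages.get(i), areas);
            if (!samePixels(outPages.get(i), cmpPages.get(i))) {
                differing.add(i + 1);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        BufferedImage out = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage cmp = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        cmp.setRGB(10, 10, 0xFFFFFFFF); // stands in for a pixel of a printed date
        Map<Integer, List<Rectangle>> ignored = Map.of(1, List.of(new Rectangle(0, 0, 40, 20)));
        System.out.println(differingPages(List.of(out), List.of(cmp), ignored)); // []
    }
}
```

      The same rectangles must be masked in both renderings. Otherwise the black area itself would show up as a difference.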

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked differences, if there are any.
      ignoredAreas - a map with one-based page numbers as keys and lists of ignored rectangles as values.
      Returns: a string containing the list of pages that are visually different, or null if there are no visual differences.
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
    • compareByContent

      public String compareByContent (String outPdf, String cmpPdf, String outPath) throws InterruptedException, IOException
      Compares two PDF documents by content, starting from the page dictionaries and then recursively comparing the corresponding objects referenced from them. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      When the comparison by content is finished, visual comparison is started automatically if any differences were found. For this overload, the differenceImagePrefix value is generated in the diff_%outPdfFileName%_ format.
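      The prefix rule can be written as a small hypothetical helper. It assumes that %outPdfFileName% is the output file name including its extension, which this description does not state. The effectivePrefix method mirrors the overloads that take a differenceImagePrefix parameter, where a null value falls back to the same format.

```java
import java.nio.file.Paths;

// Hypothetical helper mirroring the documented diff_%outPdfFileName%_ format.
// ASSUMPTION: %outPdfFileName% is the file name including its extension.
final class DiffPrefix {

    static String defaultPrefix(String outPdf) {
        return "diff_" + Paths.get(outPdf).getFileName() + "_";
    }

    // A null prefix falls back to the default format.
    static String effectivePrefix(String outPdf, String differenceImagePrefix) {
        return differenceImagePrefix != null ? differenceImagePrefix : defaultPrefix(outPdf);
    }

    public static void main(String[] args) {
        System.out.println(effectivePrefix("/tmp/out/report.pdf", null)); // diff_report.pdf_
    }
}
```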

      For more explanations about what outPdf and cmpPdf are, see the last paragraph of the CompareTool class description.

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      Returns: a string containing a text report on the encountered content differences and the list of pages that are visually different, or null if there are no content differences (and therefore no visual differences).
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      See Also:
    • compareByContent

      public String compareByContent (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix) throws InterruptedException, IOException
      Compares two PDF documents by content, starting from the page dictionaries and then recursively comparing the corresponding objects referenced from them. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      When the comparison by content is finished, visual comparison is started automatically if any differences were found.

      For more explanations about what outPdf and cmpPdf are, see the last paragraph of the CompareTool class description.

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked visual differences, if there are any. If it is set to null, the prefix defaults to the diff_%outPdfFileName%_ format.
      Returns: a string containing a text report on the encountered content differences and the list of pages that are visually different, or null if there are no content differences (and therefore no visual differences).
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      See Also:
    • compareByContent

      public String compareByContent (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix, byte[] outPass, byte[] cmpPass) throws InterruptedException, IOException
      This method overload is used to compare two encrypted PDF documents. Document passwords are passed with the outPass and cmpPass parameters.

      Compares two PDF documents by content, starting from the page dictionaries and then recursively comparing the corresponding objects referenced from them. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      When the comparison by content is finished, visual comparison is started automatically if any differences were found. For more information, see compareVisually(String, String, String, String).

      For more explanations about what outPdf and cmpPdf are, see the last paragraph of the CompareTool class description.

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked visual differences, if there are any. If it is set to null, the prefix defaults to the diff_%outPdfFileName%_ format.
      outPass - the password for the encrypted document specified by the outPdf absolute path.
      cmpPass - the password for the encrypted document specified by the cmpPdf absolute path.
      Returns: a string containing a text report on the encountered content differences and the list of pages that are visually different, or null if there are no content differences (and therefore no visual differences).
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      See Also:
    • compareByContent

      public String compareByContent (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix, Map<Integer,List<Rectangle>> ignoredAreas) throws InterruptedException, IOException
      Compares two PDF documents by content, starting from the page dictionaries and then recursively comparing the corresponding objects referenced from them. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      When the comparison by content is finished, visual comparison is started automatically if any differences were found.

      For more explanations about what outPdf and cmpPdf are, see the last paragraph of the CompareTool class description.

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked visual differences, if there are any. If it is set to null, the prefix defaults to the diff_%outPdfFileName%_ format.
      ignoredAreas - a map with one-based page numbers as keys and lists of ignored rectangles as values.
      Returns: a string containing a text report on the encountered content differences and the list of pages that are visually different, or null if there are no content differences (and therefore no visual differences).
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      See Also:
    • compareByContent

      public String compareByContent (String outPdf, String cmpPdf, String outPath, String differenceImagePrefix, Map<Integer,List<Rectangle>> ignoredAreas, byte[] outPass, byte[] cmpPass) throws InterruptedException, IOException
      This method overload is used to compare two encrypted PDF documents. Document passwords are passed with the outPass and cmpPass parameters.

      Compares two PDF documents by content, starting from the page dictionaries and then recursively comparing the corresponding objects referenced from them. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      When the comparison by content is finished, visual comparison is started automatically if any differences were found.

      For more explanations about what outPdf and cmpPdf are, see the last paragraph of the CompareTool class description.

      outPdf - the absolute path to the output file, which is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, which is to be compared to the output file.
      outPath - the absolute path to the folder that will be used to store image files for visual comparison.
      differenceImagePrefix - the file name prefix for image files with marked visual differences, if there are any. If it is set to null, the prefix defaults to the diff_%outPdfFileName%_ format.
      ignoredAreas - a map with one-based page numbers as keys and lists of ignored rectangles as values.
      outPass - the password for the encrypted document specified by the outPdf absolute path.
      cmpPass - the password for the encrypted document specified by the cmpPdf absolute path.
      Returns: a string containing a text report on the encountered content differences and the list of pages that are visually different, or null if there are no content differences (and therefore no visual differences).
      InterruptedException - if the current thread is interrupted by another thread while it is waiting for the Ghostscript or ImageMagick processes; the wait is then ended and an InterruptedException is thrown.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      See Also:
    • compareDictionaries

      public boolean compareDictionaries (PdfDictionary outDict, PdfDictionary cmpDict)
      Simple method that compares two given PdfDictionaries by content. This is a "deep" comparison, which means that all nested objects are also compared by content.
      outDict - dictionary to compare.
      cmpDict - dictionary to compare.
      Returns: true if the dictionaries are equal by content, otherwise false.
    • compareDictionariesStructure

      public CompareTool.CompareResult compareDictionariesStructure (PdfDictionary outDict, PdfDictionary cmpDict)
      Recursively compares the structures of two corresponding dictionaries from the out and cmp PDF documents. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      Both the out and cmp PdfDictionary must have indirect references.

      By default, page dictionaries are excluded from the comparison when met. They are instead compared in a special manner: only their page numbers are compared. This behaviour can be disabled by calling disableCachedPagesComparison().

      For more explanations about what the out and cmp documents are, see the last paragraph of the CompareTool class description.

      outDict - an indirect PdfDictionary from the output file, which is to be compared to the cmp-file dictionary.
      cmpDict - an indirect PdfDictionary from the cmp-file, which is to be compared to the output file dictionary.
      Returns: a CompareTool.CompareResult instance containing the differences between the two dictionaries, or null if the dictionaries are equal.
    • compareDictionariesStructure

      public CompareTool.CompareResult compareDictionariesStructure (PdfDictionary outDict, PdfDictionary cmpDict, Set<PdfName> excludedKeys)
      Recursively compares the structures of two corresponding dictionaries from the out and cmp PDF documents. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      Both the out and cmp PdfDictionary must have indirect references.

      By default, page dictionaries are excluded from the comparison when met. They are instead compared in a special manner: only their page numbers are compared. This behaviour can be disabled by calling disableCachedPagesComparison().

      For more explanations about what the out and cmp documents are, see the last paragraph of the CompareTool class description.

      outDict - an indirect PdfDictionary from the output file, which is to be compared to the cmp-file dictionary.
      cmpDict - an indirect PdfDictionary from the cmp-file, which is to be compared to the output file dictionary.
      excludedKeys - a Set of names designating the entries of the outDict and cmpDict dictionaries that are to be skipped during comparison.
      Returns: a CompareTool.CompareResult instance containing the differences between the two dictionaries, or null if the dictionaries are equal.
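      The effect of excludedKeys can be illustrated with a flat, JDK-only sketch:
      - dictionaries are modelled as Map<String, Object>;
      - names are plain strings;
      - nothing is compared recursively.
      The sketch follows the documented convention of returning null when nothing differs, and it reports in the "Expected ..., but was ..." form. Skipping a modification date entry such as /M is just one example of a value that legitimately changes between runs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Flat sketch of the excludedKeys idea, not iText's code: entries named in
// excludedKeys are skipped, and null means "no differences".
final class ExcludedKeysCompare {

    static String compare(Map<String, Object> outDict, Map<String, Object> cmpDict, Set<String> excludedKeys) {
        StringBuilder report = new StringBuilder();
        Set<String> keys = new TreeSet<>(cmpDict.keySet()); // sorted for a stable report
        keys.addAll(outDict.keySet());
        for (String key : keys) {
            if (excludedKeys.contains(key)) {
                continue;
            }
            Object cmp = cmpDict.get(key);
            Object out = outDict.get(key);
            if (!Objects.equals(out, cmp)) {
                report.append('/').append(key).append(": Expected ").append(cmp)
                      .append(", but was ").append(out).append('\n');
            }
        }
        return report.length() == 0 ? null : report.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> cmp = Map.of("Type", "Annot", "M", "D:20200101");
        Map<String, Object> out = Map.of("Type", "Annot", "M", "D:20240101");
        System.out.println(compare(out, cmp, Set.of("M"))); // null
        System.out.print(compare(out, cmp, Set.of()));
    }
}
```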
    • compareStreamsStructure

      public CompareTool.CompareResult compareStreamsStructure (PdfStream outStream, PdfStream cmpStream)
      Compares the structures of two corresponding streams from the out and cmp PDF documents. You can roughly imagine it as a depth-first traversal of the two trees that represent the PDF object structure of the documents.

      For more explanations about what the out and cmp documents are, see the last paragraph of the CompareTool class description.

      outStream - a PdfStream from the output file, which is to be compared to the cmp-file stream.
      cmpStream - a PdfStream from the cmp-file, which is to be compared to the output file stream.
      Returns: a CompareTool.CompareResult instance containing the differences between the two streams, or null if the streams are equal.
    • compareStreams

      public boolean compareStreams (PdfStream outStream, PdfStream cmpStream)
      Simple method that compares two given PdfStreams by content. This is a "deep" comparison, which means that all nested objects are also compared by content.
      outStream - stream to compare.
      cmpStream - stream to compare.
      Returns: true if the streams are equal by content, otherwise false.
    • compareArrays

      public boolean compareArrays (PdfArray outArray, PdfArray cmpArray)
      Simple method that compares two given PdfArrays by content. This is a "deep" comparison, which means that all nested objects are also compared by content.
      outArray - array to compare.
      cmpArray - array to compare.
      Returns: true if the arrays are equal by content, otherwise false.
    • compareNames

      public boolean compareNames (PdfName outName, PdfName cmpName)
      Simple method that compares two given PdfNames.
      outName - name to compare.
      cmpName - name to compare.
      Returns: true if the names are equal, otherwise false.
    • compareNumbers

      public boolean compareNumbers (PdfNumber outNumber, PdfNumber cmpNumber)
      Simple method that compares two given PdfNumbers.
      outNumber - number to compare.
      cmpNumber - number to compare.
      Returns: true if the numbers are equal, otherwise false.
    • compareStrings

      public boolean compareStrings (PdfString outString, PdfString cmpString)
      Simple method that compares two given PdfStrings.
      outString - string to compare.
      cmpString - string to compare.
      Returns: true if the strings are equal, otherwise false.
    • compareBooleans

      public boolean compareBooleans (PdfBoolean outBoolean, PdfBoolean cmpBoolean)
      Simple method that compares two given PdfBooleans.
      outBoolean - boolean to compare.
      cmpBoolean - boolean to compare.
      Returns: true if the booleans are equal, otherwise false.
    • compareXmp

      public String compareXmp (String outPdf, String cmpPdf)
      Compares the XMP metadata of the two given PDF documents.
      outPdf - the absolute path to the output file, whose XMP metadata is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, whose XMP metadata is to be compared to the output file.
      Returns: a text report on the XMP differences, or null if there are no differences.
    • compareXmp

      public String compareXmp (String outPdf, String cmpPdf, boolean ignoreDateAndProducerProperties)
      Compares the XMP metadata of the two given PDF documents.
      outPdf - the absolute path to the output file, whose XMP metadata is to be compared to the cmp-file.
      cmpPdf - the absolute path to the cmp-file, whose XMP metadata is to be compared to the output file.
      ignoreDateAndProducerProperties - true to ignore differences in the date and producer XMP metadata properties.
      Returns: a text report on the XMP differences, or null if there are no differences.
    • compareXmls

      public boolean compareXmls (byte[] xml1, byte[] xml2) throws ParserConfigurationException, SAXException, IOException
      Utility method that provides a simple comparison of two XML files stored in byte arrays.
      xml1 - the first XML file data to compare.
      xml2 - the second XML file data to compare.
      Returns: true if the XML structures are identical, false otherwise.
      ParserConfigurationException - if an XML DocumentBuilder satisfying the requested configuration cannot be created.
      SAXException - if any XML parse errors occur.
      IOException - if any I/O errors occur while reading the XML files.
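      A structural XML comparison of this kind can be sketched with the JDK's DOM API alone. This is one possible approach, not necessarily the one CompareTool uses. The sketch works in three steps:
      - it parses both inputs with comments ignored and CDATA coalesced;
      - it normalizes both documents;
      - it compares them with Node.isEqualNode, which treats attributes as order-independent but child nodes as ordered.
      Whitespace-only text nodes still count in this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// JDK-only sketch of structural XML comparison; not necessarily iText's approach.
final class XmlCompareSketch {

    static boolean compareXmls(byte[] xml1, byte[] xml2)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true);       // merge CDATA sections into text nodes
        factory.setIgnoringComments(true); // comments do not affect the result
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc1 = builder.parse(new ByteArrayInputStream(xml1));
        Document doc2 = builder.parse(new ByteArrayInputStream(xml2));
        doc1.normalizeDocument();
        doc2.normalizeDocument();
        return doc1.isEqualNode(doc2);
    }

    public static void main(String[] args) throws Exception {
        byte[] a = "<a x=\"1\" y=\"2\"><b>t</b></a>".getBytes();
        byte[] b = "<a y=\"2\" x=\"1\"><!--note--><b>t</b></a>".getBytes();
        System.out.println(compareXmls(a, b)); // true: attribute order and comments are ignored
    }
}
```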
    • compareXmls

      public boolean compareXmls (String outXmlFile, String cmpXmlFile) throws ParserConfigurationException, SAXException, IOException
      Utility method that provides a simple comparison of two XML files.
      outXmlFile - the absolute path to the out XML file to compare.
      cmpXmlFile - the absolute path to the cmp XML file to compare.
      Returns: true if the XML structures are identical, false otherwise.
      ParserConfigurationException - if an XML DocumentBuilder satisfying the requested configuration cannot be created.
      SAXException - if any XML parse errors occur.
      IOException - if any I/O errors occur while reading the XML files.
    • compareDocumentInfo

      public String compareDocumentInfo (String outPdf, String cmpPdf, byte[] outPass, byte[] cmpPass) throws IOException
      Compares the document info dictionaries of two PDF documents.

      This method overload is used to compare two encrypted PDF documents. Document passwords are passed with the outPass and cmpPass parameters.

      outPdf - the absolute path to the output file, whose info is to be compared to the cmp-file info.
      cmpPdf - the absolute path to the cmp-file, whose info is to be compared to the output file info.
      outPass - the password for the encrypted document specified by the outPdf absolute path.
      cmpPass - the password for the encrypted document specified by the cmpPdf absolute path.
      Returns: a text report on the differences in the documents' info.
      IOException - if a PDF reader cannot be created due to I/O issues.
    • compareDocumentInfo

      public String compareDocumentInfo (String outPdf, String cmpPdf) throws IOException
      Compares the document info dictionaries of two PDF documents.
      outPdf - the absolute path to the output file, whose info is to be compared to the cmp-file info.
      cmpPdf - the absolute path to the cmp-file, whose info is to be compared to the output file info.
      Returns: a text report on the differences in the documents' info.
      IOException - if a PDF reader cannot be created due to I/O issues.
    • compareLinkAnnotations

      public String compareLinkAnnotations (String outPdf, String cmpPdf) throws IOException
      Checks if two documents have identical link annotations on corresponding pages.
      outPdf - the absolute path to the output file, whose links are to be compared to the cmp-file links.
      cmpPdf - the absolute path to the cmp-file, whose links are to be compared to the output file links.
      Returns: a text report on the differences in the documents' links.
      IOException - if a PDF reader cannot be created due to I/O issues.
    • compareTagStructures

      public String compareTagStructures (String outPdf, String cmpPdf) throws IOException, ParserConfigurationException, SAXException
      Compares tag structures of the two PDF documents.

      This method creates XML files in the same folder as the outPdf file. These XML files contain the documents' tag structures converted into XML, and they are then compared for equality.

      outPdf - the absolute path to the output file, whose tags are to be compared to the cmp-file tags.
      cmpPdf - the absolute path to the cmp-file, whose tags are to be compared to the output file tags.
      Returns: a text report on the differences in the documents' tags.
      IOException - if any of the input files are missing, or if any of the auxiliary files needed during the comparison process could not be created.
      ParserConfigurationException - if an XML DocumentBuilder satisfying the requested configuration cannot be created.
      SAXException - if any XML parse errors occur.
    • convertDocInfoToStrings

      protected String[] convertDocInfoToStrings (PdfDocumentInfo info)
      Converts document info into a string array.

      It can be used later on to compare PdfDocumentInfo instances. The default implementation retrieves the title, author, subject, keywords and producer.

      info - an instance of PdfDocumentInfo to be converted.
      Returns: a String array with all the document info the tester is interested in.
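      Because convertDocInfoToStrings is protected, a subclass can change which info entries take part in the comparison. The JDK-only sketch below models this with hypothetical DocInfoConverter and NoProducerConverter classes, and a Map stands in for PdfDocumentInfo. With iText on the classpath, the equivalent would be a CompareTool subclass that overrides convertDocInfoToStrings(PdfDocumentInfo). Dropping the producer is only an example, useful when the producer string is expected to change between runs. The sketch assumes the producer comes last, following the order listed above.

```java
import java.util.Arrays;
import java.util.Map;

// JDK-only model of the override point; a Map stands in for PdfDocumentInfo.
// DocInfoConverter and NoProducerConverter are hypothetical classes.
class DocInfoConverter {
    // Mirrors the documented default: title, author, subject, keywords and producer.
    protected String[] convertDocInfoToStrings(Map<String, String> info) {
        return new String[] {
                info.get("Title"), info.get("Author"), info.get("Subject"),
                info.get("Keywords"), info.get("Producer")};
    }

    boolean sameInfo(Map<String, String> out, Map<String, String> cmp) {
        return Arrays.equals(convertDocInfoToStrings(out), convertDocInfoToStrings(cmp));
    }
}

// Ignores the producer, which is the last element in this sketch's order.
class NoProducerConverter extends DocInfoConverter {
    @Override
    protected String[] convertDocInfoToStrings(Map<String, String> info) {
        String[] all = super.convertDocInfoToStrings(info);
        return Arrays.copyOf(all, all.length - 1);
    }
}

final class DocInfoDemo {
    public static void main(String[] args) {
        Map<String, String> out = Map.of("Title", "Report", "Producer", "Producer A");
        Map<String, String> cmp = Map.of("Title", "Report", "Producer", "Producer B");
        System.out.println(new DocInfoConverter().sameInfo(out, cmp));    // false
        System.out.println(new NoProducerConverter().sameInfo(out, cmp)); // true
    }
}
```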
    • compareObjects

      protected boolean compareObjects (PdfObject outObj, PdfObject cmpObj, ObjectPath currentPath, CompareTool.CompareResult compareResult)