Class TaggedPdfReaderTool

java.lang.Object
com.itextpdf.kernel.utils.TaggedPdfReaderTool

public class TaggedPdfReaderTool extends Object
Converts a tagged PDF document into an XML file.
  • Field Details

  • Constructor Details

  • Method Details

    • isValidCharacterValue

      public static boolean isValidCharacterValue (int c)
      Checks if a character value should be escaped/unescaped.
      Parameters:
      c - a character value
      Returns:
      true if it's OK to escape or unescape this value
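The description above does not say which values count as valid. A minimal sketch, assuming the check follows the `Char` production of XML 1.0 (tab, line feed, carriage return, and the non-surrogate ranges up to U+10FFFF); the class and method names here are hypothetical and the library's actual implementation may differ:

```java
// Hypothetical helper, not part of iText: a sketch of an XML 1.0 "Char" check.
final class XmlCharCheckSketch {
    private XmlCharCheckSketch() {}

    // Returns true if the code point may legally appear in an XML 1.0 document.
    // Assumption: this mirrors what isValidCharacterValue is meant to test.
    static boolean isValidXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD      // tab, LF, CR
                || (c >= 0x20 && c <= 0xD7FF)        // below the surrogate block
                || (c >= 0xE000 && c <= 0xFFFD)      // above surrogates, excluding U+FFFE/U+FFFF
                || (c >= 0x10000 && c <= 0x10FFFF);  // supplementary planes
    }
}
```

Under this assumption, control characters such as U+0000 and lone surrogates such as U+D800 are rejected, while ordinary text and supplementary-plane characters are accepted.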
    • convertToXml

      public void convertToXml (OutputStream os) throws IOException
      Converts the current tag structure into an XML file with the default encoding (UTF-8).
      Parameters:
      os - the output stream to save the XML file to
      Throws:
      IOException - in case of any I/O error
    • convertToXml

      public void convertToXml (OutputStream os, String charset) throws IOException
      Converts the current tag structure into an XML file with the provided encoding.
      Parameters:
      os - the output stream to save the XML file to
      charset - the charset of the resulting XML file
      Throws:
      IOException - in case of any I/O error
    • setRootTag

      public TaggedPdfReaderTool setRootTag (String rootTagName)
      Sets the name of the root tag of the resulting XML file.
      Parameters:
      rootTagName - the name of the root tag
      Returns:
      this object
    • inspectKids

      protected void inspectKids (List<IStructureNode> kids)
    • inspectKid

      protected void inspectKid (IStructureNode kid)
    • inspectAttributes

      protected void inspectAttributes (PdfStructElem kid)
    • parseTag

      protected void parseTag (PdfMcr kid)
    • fixTagName

      protected static String fixTagName (String tag)
    • escapeXML

      protected static String escapeXML (String s, boolean onlyASCII)
      Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
      Parameters:
      s - the string to be escaped
      onlyASCII - if true, codes above 127 are always escaped as &#nn;
      Returns:
      the escaped string
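The escaping described for `escapeXML` can be illustrated with a small standalone sketch. It is not the library's code. It assumes the five predefined XML entities are used for markup characters, that characters invalid in XML 1.0 are dropped, and that `onlyASCII` switches code points above 127 to decimal `&#nn;` references. The class and method names are hypothetical.

```java
// Hypothetical sketch of XML escaping; names and details are assumptions, not iText API.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    // Assumed validity rule: the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Escapes markup characters; when onlyAscii is true, code points
    // above 127 are written as decimal character references.
    static String escape(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);          // iterate by code point, not by char
            i += Character.charCount(c);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (!isValidXmlChar(c)) {
                        break;                  // assumption: invalid characters are dropped
                    }
                    if (onlyAscii && c > 127) {
                        sb.append("&#").append(c).append(';');
                    } else {
                        sb.appendCodePoint(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `XmlEscapeSketch.escape("a<b", false)` yields `a&lt;b`, and `XmlEscapeSketch.escape("\u00e9", true)` yields `&#233;`, whereas passing `false` leaves the character unchanged.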