Package com.itextpdf.kernel.utils
Class TaggedPdfReaderTool
java.lang.Object
com.itextpdf.kernel.utils.TaggedPdfReaderTool
Converts a tagged PDF document into an XML file.
-
Field Summary
Fields
protected PdfDocument document
protected OutputStreamWriter out
protected Map<PdfDictionary, Map<Integer, String>> parsedTags
protected String rootTag
Constructor Summary
Constructors
TaggedPdfReaderTool(PdfDocument document) - Constructs a TaggedPdfReaderTool via a given PdfDocument.
Method Summary
Methods
void convertToXml(OutputStream os) - Converts the current tag structure into an XML file with default encoding (UTF-8).
void convertToXml(OutputStream os, String charset) - Converts the current tag structure into an XML file with provided encoding.
protected static String escapeXML(String s, boolean onlyASCII) - NOTE: copied from itext5 XMLUtils class. Escapes a string with the appropriate XML codes.
protected static String fixTagName(String tag) - Fixes the specified tag name so that it is a valid XML tag.
protected void inspectAttributes(PdfStructElem kid) - Inspects attributes dictionary of the StructTreeRoot child.
protected void inspectKid(IStructureNode kid) - Inspect the child of the StructTreeRoot.
protected void inspectKids(List<IStructureNode> kids) - Inspect the children of the StructTreeRoot.
static boolean isValidCharacterValue(int c) - Checks if a character value should be escaped/unescaped.
protected void parseTag(PdfMcr kid) - Parses tag of the Marked Content Reference (MCR) kid of the StructTreeRoot.
TaggedPdfReaderTool setRootTag(String rootTagName) - Sets the name of the root tag of the resultant XML file.
-
Field Details
-
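document
protected PdfDocument document
-
out
protected OutputStreamWriter out
-
rootTag
protected String rootTag
-
parsedTags
protected Map<PdfDictionary, Map<Integer, String>> parsedTags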
-
-
Constructor Details
-
TaggedPdfReaderTool
public TaggedPdfReaderTool(PdfDocument document)
Constructs a TaggedPdfReaderTool via a given PdfDocument.
Parameters:
document - the document to read tag structure from
-
-
Method Details
-
isValidCharacterValue
public static boolean isValidCharacterValue(int c)
Checks if a character value should be escaped/unescaped.
Parameters:
c - a character value
Returns:
true if it's OK to escape or unescape this value.
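The validity rule itself is not spelled out on this page. As a minimal sketch, assuming the check follows the Char production of XML 1.0 (tab, line feed, carriage return, plus the ranges 0x20-0xD7FF, 0xE000-0xFFFD and 0x10000-0x10FFFF), a standalone equivalent might look like the following. The names XmlCharSketch and isXmlChar are hypothetical and are not part of iText.

```java
/** Standalone sketch; not the iText implementation. */
final class XmlCharSketch {

    /**
     * Returns true if the code point matches the XML 1.0 Char production.
     * That this is the rule applied by isValidCharacterValue is an assumption.
     */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar('A'));    // true: ordinary letter
        System.out.println(isXmlChar(0x0));    // false: NUL is not allowed in XML 1.0
        System.out.println(isXmlChar(0xD800)); // false: lone surrogate value
    }
}
```

Taking an int rather than a char lets the check cover supplementary code points above 0xFFFF.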
-
convertToXml
public void convertToXml(OutputStream os) throws IOException
Converts the current tag structure into an XML file with default encoding (UTF-8).
Parameters:
os - the output stream to save XML file to
Throws:
IOException - in case of any I/O error
-
convertToXml
public void convertToXml(OutputStream os, String charset) throws IOException
Converts the current tag structure into an XML file with provided encoding.
Parameters:
os - the output stream to save XML file to
charset - the charset of the resultant XML file
Throws:
IOException - in case of any I/O error
-
setRootTag
public TaggedPdfReaderTool setRootTag(String rootTagName)
Sets the name of the root tag of the resultant XML file.
Parameters:
rootTagName - the name of the root tag
Returns:
this object
-
inspectKids
protected void inspectKids(List<IStructureNode> kids)
Inspect the children of the StructTreeRoot.
Parameters:
kids - list of the direct kids of the StructTreeRoot
-
inspectKid
protected void inspectKid(IStructureNode kid)
Inspect the child of the StructTreeRoot.
Parameters:
kid - the direct kid of the StructTreeRoot
-
inspectAttributes
protected void inspectAttributes(PdfStructElem kid)
Inspects attributes dictionary of the StructTreeRoot child.
Parameters:
kid - the direct kid of the StructTreeRoot
-
parseTag
protected void parseTag(PdfMcr kid)
Parses tag of the Marked Content Reference (MCR) kid of the StructTreeRoot.
Parameters:
kid - the direct PdfMcr kid of the StructTreeRoot
-
fixTagName
protected static String fixTagName(String tag)
Fixes the specified tag name so that it is a valid XML tag.
Parameters:
tag - tag name to fix
Returns:
fixed tag name.
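How a tag name is fixed is not described above. The sketch below illustrates one plausible approach under stated assumptions: only ASCII name characters are considered, an invalid first character becomes '_', and any other invalid character becomes '-'. The class TagNameSketch, its method names and both replacement characters are hypothetical choices for illustration, not the documented behavior of fixTagName.

```java
/** Standalone sketch; not the iText implementation. */
final class TagNameSketch {

    /** ASCII subset of the XML NameStartChar production (simplifying assumption). */
    static boolean isNameStart(char c) {
        return c == ':' || c == '_'
                || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z');
    }

    /** ASCII subset of the XML NameChar production (simplifying assumption). */
    static boolean isNameChar(char c) {
        return isNameStart(c) || c == '-' || c == '.' || (c >= '0' && c <= '9');
    }

    /** Replaces characters that cannot appear at their position in an XML name. */
    static String sanitizeTagName(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (i == 0) {
                sb.append(isNameStart(c) ? c : '_'); // assumed replacement for a bad first char
            } else {
                sb.append(isNameChar(c) ? c : '-');  // assumed replacement elsewhere
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitizeTagName("Table Header")); // Table-Header
        System.out.println(sanitizeTagName("1stLevel"));     // _stLevel
    }
}
```

Replacing characters one for one, rather than dropping them, keeps the output the same length as the input, which makes it easier to match a fixed name back to the original structure type.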
-
escapeXML
protected static String escapeXML(String s, boolean onlyASCII)
NOTE: copied from itext5 XMLUtils class. Escapes a string with the appropriate XML codes.
Parameters:
s - the string to be escaped
onlyASCII - codes above 127 will always be escaped with &#nn; if true
Returns:
the escaped string
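To illustrate the kind of escaping described, here is a minimal standalone sketch. It assumes the five predefined XML entities are used for <, >, &, " and ', and that with onlyASCII set, code points above 127 are written as decimal character references (&#nn;). The class XmlEscapeSketch and its escape method are hypothetical, and the sketch does not filter characters that are invalid in XML.

```java
/** Standalone sketch; not the iText implementation. */
final class XmlEscapeSketch {

    /** Escapes markup characters and, if onlyASCII is true, every code point above 127. */
    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        sb.append("&#").append(cp).append(';'); // decimal character reference
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & \"c\"", false)); // a &lt; b &amp; &quot;c&quot;
        System.out.println(escape("caf\u00e9", true));      // caf&#233;
    }
}
```

Setting onlyASCII to true produces output that survives any ASCII-compatible encoding, at the cost of less readable text for non-Latin content.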
-