Package com.itextpdf.kernel.utils
Class TaggedPdfReaderTool
java.lang.Object
com.itextpdf.kernel.utils.TaggedPdfReaderTool
Converts a tagged PDF document into an XML file.
-
Field Summary
Modifier and Type, Field
protected PdfDocument document
protected OutputStreamWriter out
protected Map<PdfDictionary, Map<Integer, String>> parsedTags
protected String rootTag
-
Constructor Summary
Constructor
    Description
TaggedPdfReaderTool(PdfDocument document)
    Constructs a TaggedPdfReaderTool via a given PdfDocument.
-
Method Summary
Modifier and Type, Method
    Description
void convertToXml(OutputStream os)
    Converts the current tag structure into an XML file with default encoding (UTF-8).
void convertToXml(OutputStream os, String charset)
    Converts the current tag structure into an XML file with the provided encoding.
protected static String escapeXML(String s, boolean onlyASCII)
    Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
protected static String fixTagName(String tag)
    Fixes the specified tag name to be a valid XML tag.
protected void inspectAttributes(PdfStructElem kid)
    Inspects the attributes dictionary of the StructTreeRoot child.
protected void inspectKid(IStructureNode kid)
    Inspects the child of the StructTreeRoot.
protected void inspectKids(List<IStructureNode> kids)
    Inspects the children of the StructTreeRoot.
static boolean isValidCharacterValue(int c)
    Checks if a character value should be escaped/unescaped.
protected void parseTag(PdfMcr kid)
    Parses the tag of the Marked Content Reference (MCR) kid of the StructTreeRoot.
TaggedPdfReaderTool setRootTag(String rootTagName)
    Sets the name of the root tag of the resultant XML file.
-
Field Details
-
document
-
out
-
rootTag
-
parsedTags
-
-
Constructor Details
-
TaggedPdfReaderTool
Constructs a TaggedPdfReaderTool via a given PdfDocument.
Parameters:
document - the document to read tag structure from
-
-
Method Details
-
isValidCharacterValue
public static boolean isValidCharacterValue(int c)
Checks if a character value should be escaped/unescaped.
Parameters:
c - a character value
Returns:
true if it's OK to escape or unescape this value.
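The description does not say which values count as valid. One plausible reading is the XML 1.0 `Char` production, which lists the code points allowed in an XML document. The sketch below assumes that reading; it is a standalone illustration, not the iText source, and the class and method names are hypothetical.

```java
// Hypothetical standalone sketch; assumes the check follows the XML 1.0 Char production.
final class XmlCharSketch {

    /** Returns true if c is a code point that XML 1.0 allows in a document. */
    static boolean isValidXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD      // tab, line feed, carriage return
                || (c >= 0x20 && c <= 0xD7FF)        // most of the Basic Multilingual Plane
                || (c >= 0xE000 && c <= 0xFFFD)      // skips the surrogate block and U+FFFE/U+FFFF
                || (c >= 0x10000 && c <= 0x10FFFF);  // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isValidXmlChar('A'));     // ordinary letter
        System.out.println(isValidXmlChar(0x0));     // NUL control character
        System.out.println(isValidXmlChar(0xD800));  // lone surrogate
    }
}
```

Under this assumption, control characters other than tab, line feed, and carriage return are rejected, as are unpaired surrogates.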
-
convertToXml
public void convertToXml(OutputStream os) throws IOException
Converts the current tag structure into an XML file with default encoding (UTF-8).
Parameters:
os - the output stream to save XML file to
Throws:
IOException - in case of any I/O error
-
convertToXml
public void convertToXml(OutputStream os, String charset) throws IOException
Converts the current tag structure into an XML file with the provided encoding.
Parameters:
os - the output stream to save XML file to
charset - the charset of the resultant XML file
Throws:
IOException - in case of any I/O error
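The two overloads differ only in how the charset is chosen. The sketch below shows that overload pattern with the standard library alone: an OutputStream is wrapped in an OutputStreamWriter for a named charset, and the shorter overload falls back to UTF-8. It is an assumption about the general shape, not the iText implementation; `XmlWriterSketch`, `writeXml`, and `toBytes` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Hypothetical sketch of a default-charset overload; not the iText source.
final class XmlWriterSketch {

    /** Writes a tiny XML document as UTF-8, mirroring the one-argument overload. */
    static void writeXml(OutputStream os, String text) throws IOException {
        writeXml(os, "UTF-8", text);
    }

    /** Writes the same document using the named charset. */
    static void writeXml(OutputStream os, String charset, String text) throws IOException {
        OutputStreamWriter out = new OutputStreamWriter(os, charset);
        out.write("<root>" + text + "</root>");
        out.flush(); // push buffered characters through; the caller still owns os
    }

    /** Convenience helper for experiments: returns the encoded bytes. */
    static byte[] toBytes(String charset, String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeXml(buffer, charset, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The same text encodes to a different number of bytes per charset.
        System.out.println(toBytes("UTF-8", "\u00e9").length);
        System.out.println(toBytes("ISO-8859-1", "\u00e9").length);
    }
}
```

The charset matters as soon as the tag structure contains non-ASCII text, because the same characters encode to different byte sequences.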
-
setRootTag
public TaggedPdfReaderTool setRootTag(String rootTagName)
Sets the name of the root tag of the resultant XML file.
Parameters:
rootTagName - the name of the root tag
Returns:
this object
-
inspectKids
protected void inspectKids(List<IStructureNode> kids)
Inspects the children of the StructTreeRoot.
Parameters:
kids - list of the direct kids of the StructTreeRoot
-
inspectKid
protected void inspectKid(IStructureNode kid)
Inspects the child of the StructTreeRoot.
Parameters:
kid - the direct kid of the StructTreeRoot
-
inspectAttributes
Inspects the attributes dictionary of the StructTreeRoot child.
Parameters:
kid - the direct kid of the StructTreeRoot
-
parseTag
Parses the tag of the Marked Content Reference (MCR) kid of the StructTreeRoot.
Parameters:
kid - the direct PdfMcr kid of the StructTreeRoot
-
fixTagName
protected static String fixTagName(String tag)
Fixes the specified tag name to be a valid XML tag.
Parameters:
tag - tag name to fix
Returns:
fixed tag name.
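The description does not state how an invalid name is repaired. As an illustration only, the sketch below applies a simplified, ASCII-only version of the XML name rules: the first character must be a letter, underscore, or colon, and later characters may also be digits, hyphens, or periods. The replacement characters (`_` for a bad first character, `-` elsewhere) are assumptions made for this sketch, and `TagNameSketch` is a hypothetical class, not the iText source.

```java
// Hypothetical ASCII-only sketch of tag-name repair; not the iText source.
final class TagNameSketch {

    /** True if c may start an XML name (ASCII subset of the XML NameStartChar rule). */
    static boolean isNameStart(char c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** True if c may appear after the first character of an XML name (ASCII subset). */
    static boolean isNameChar(char c) {
        return isNameStart(c) || c == '-' || c == '.' || (c >= '0' && c <= '9');
    }

    /** Replaces characters that are not allowed at their position in an XML name. */
    static String fixName(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (i == 0) {
                sb.append(isNameStart(c) ? c : '_'); // assumed replacement for a bad first character
            } else {
                sb.append(isNameChar(c) ? c : '-');  // assumed replacement for other bad characters
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixName("H 1"));
        System.out.println(fixName("1st"));
    }
}
```

A repair step of this kind is needed because PDF structure type names are not restricted to the characters XML permits in element names.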
-
escapeXML
protected static String escapeXML(String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
Parameters:
s - the string to be escaped
onlyASCII - codes above 127 will always be escaped with &#nn; if true
Returns:
the escaped string
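The escaping described above can be sketched with the standard library alone. The version below replaces the five XML special characters with named entities and, when `onlyAscii` is true, writes code points above 127 as decimal `&#nn;` references. Dropping characters that XML 1.0 does not allow is an assumption of this sketch, and `XmlEscapeSketch` is a hypothetical class rather than the iText code.

```java
// Hypothetical standalone sketch of XML escaping; not the iText source.
final class XmlEscapeSketch {

    /** XML 1.0 Char production: the code points allowed in a document. */
    static boolean isValidXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Escapes s for use in XML text or attribute values. */
    static String escape(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (!isValidXmlChar(cp)) {
                        break; // assumption: silently drop characters XML cannot represent
                    }
                    if (onlyAscii && cp > 127) {
                        sb.append("&#").append(cp).append(';'); // decimal character reference
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\"", false));
        System.out.println(escape("caf\u00e9", true));
    }
}
```

With `onlyAscii` set to true the output is pure ASCII, so it reads back identically regardless of the charset the XML file is written in.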
-