Class PdfTextExtractor
java.lang.Object
com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static String | getTextFromPage(PdfPage page) | Extract text from a specified page using the default strategy. |
| static String | getTextFromPage(PdfPage page, ITextExtractionStrategy strategy) | Extract text from a specified page using an extraction strategy. |
| static String | getTextFromPage(PdfPage page, ITextExtractionStrategy strategy, Map<String, IContentOperator> additionalContentOperators) | Extract text from a specified page using an extraction strategy. |
-
Method Details
-
getTextFromPage
public static String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy, Map<String, IContentOperator> additionalContentOperators)

Extract text from a specified page using an extraction strategy. Also allows registration of custom IContentOperators that can influence how (and whether) the PDF instructions will be parsed. A new extraction strategy object must be passed for every page.

Parameters:
page - the page for the text to be extracted from
strategy - the strategy to use for extracting text
additionalContentOperators - an optional map of custom IContentOperators for rendering instructions

Returns:
the extracted text
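
As an illustration of how additionalContentOperators can influence parsing, the sketch below registers a do-nothing handler for the PDF operator "Do" (paint XObject), so text drawn inside form XObjects is left out of the result. It assumes the iText kernel module is on the classpath and that an operator registered under an existing name replaces the default handler. The class and method names (SkipXObjectText, textWithoutXObjects) are hypothetical and not part of the library.

```java
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.canvas.parser.IContentOperator;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;

import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class SkipXObjectText {

    // Extracts the text of one page while ignoring every "Do" instruction.
    static String textWithoutXObjects(PdfPage page) {
        Map<String, IContentOperator> operators = new HashMap<>();

        // Assumption: this no-op handler takes the place of the default "Do" handler,
        // so the content of XObjects is never parsed.
        operators.put("Do", (processor, operator, operands) -> { });

        // A fresh strategy object is created for this single page.
        return PdfTextExtractor.getTextFromPage(
                page, new SimpleTextExtractionStrategy(), operators);
    }
}
```

Text written directly into the page content stream is still extracted; only instructions handled by the overridden operator are skipped.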
-
getTextFromPage
public static String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy)

Extract text from a specified page using an extraction strategy. A new extraction strategy object must be passed for every page.

Parameters:
page - the page for the text to be extracted from
strategy - the strategy to use for extracting text

Returns:
the extracted text
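
The one-strategy-per-page requirement can be sketched as a loop that creates a new strategy object on each iteration. The example assumes the iText kernel module is on the classpath and uses LocationTextExtractionStrategy as one possible strategy. The class and method names (PerPageExtraction, extractAllPages) are hypothetical.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PerPageExtraction {

    // Returns the extracted text of every page, in page order.
    static List<String> extractAllPages(PdfDocument pdfDoc) {
        List<String> texts = new ArrayList<>();
        // Page numbers are 1-based.
        for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
            // A new strategy per page, so no state carries over between pages.
            texts.add(PdfTextExtractor.getTextFromPage(
                    pdfDoc.getPage(i), new LocationTextExtractionStrategy()));
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an existing PDF file.
        try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(args[0]))) {
            extractAllPages(pdfDoc).forEach(System.out::println);
        }
    }
}
```

Reusing a single strategy object across pages would likely mix text collected from earlier pages into later results, which is presumably why a fresh object is required.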
-
getTextFromPage
public static String getTextFromPage(PdfPage page)

Extract text from a specified page using the default strategy. Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfPage, ITextExtractionStrategy).

Parameters:
page - the page for the text to be extracted from

Returns:
the extracted text
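
The difference between relying on the default strategy and pinning a specific one might look like the sketch below. It assumes the iText kernel module is on the classpath. The class and method names (FirstPageText, withDefaultStrategy, withPinnedStrategy) are hypothetical, and LocationTextExtractionStrategy is only one possible choice of strategy.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

// Hypothetical helper class, for illustration only.
final class FirstPageText {

    // Uses whatever strategy the library currently treats as the default.
    static String withDefaultStrategy(PdfDocument pdfDoc) {
        return PdfTextExtractor.getTextFromPage(pdfDoc.getFirstPage());
    }

    // Names the strategy explicitly, so the choice no longer depends on the library default.
    static String withPinnedStrategy(PdfDocument pdfDoc) {
        return PdfTextExtractor.getTextFromPage(
                pdfDoc.getFirstPage(), new LocationTextExtractionStrategy());
    }
}
```

The single-argument form is convenient for quick inspection; the pinned form is the safer choice when output must stay stable across library upgrades.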
-