Class PdfTextExtractor

java.lang.Object
com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor

public final class PdfTextExtractor extends Object
  • Method Details

    • getTextFromPage

      public static String getTextFromPage (PdfPage page, ITextExtractionStrategy strategy, Map<String,IContentOperator> additionalContentOperators)
      Extracts text from the specified page using the given extraction strategy. Custom IContentOperators can also be registered to influence how (and whether) the PDF instructions are parsed. A new strategy object must be passed for every page: strategies typically accumulate the text they receive, so a reused instance would carry over text from earlier pages.
      Parameters:
      page - the page to extract text from
      strategy - the strategy to use for extracting text
      additionalContentOperators - an optional map of custom IContentOperators for rendering instructions
      Returns:
      the extracted text
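
      As an illustration of the additionalContentOperators parameter, the sketch below registers a do-nothing handler for the "Do" operator (which paints XObjects), so that text inside form XObjects is left out of the result. The class and method names are hypothetical. It assumes that an entry in the map replaces the processor's built-in handler for the same operator string, and that IContentOperator can be written as a lambda because it declares a single invoke method.

```java
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.canvas.parser.IContentOperator;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;

import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of iText.
public class SkipXObjectsExtraction {

    /** Extracts the page text while ignoring every "Do" (paint XObject) instruction. */
    public static String textWithoutXObjects(PdfPage page) {
        Map<String, IContentOperator> operators = new HashMap<>();
        // Assumption: this entry overrides the default "Do" handler,
        // so referenced XObjects are never parsed.
        operators.put("Do", (processor, operator, operands) -> { });

        // A fresh strategy is created for this single call.
        return PdfTextExtractor.getTextFromPage(
                page, new SimpleTextExtractionStrategy(), operators);
    }
}
```

      The map key is the operator string as it appears in the content stream. Operators that are not in the map keep their normal behavior.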
    • getTextFromPage

      public static String getTextFromPage (PdfPage page, ITextExtractionStrategy strategy)
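      Extracts text from the specified page using the given extraction strategy. A new strategy object must be passed for every page: strategies typically accumulate the text they receive, so a reused instance would carry over text from earlier pages.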
      Parameters:
      page - the page to extract text from
      strategy - the strategy to use for extracting text
      Returns:
      the extracted text
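
      The one-strategy-per-page rule can be sketched as a loop over a document. The helper name extractAllPages and the choice of LocationTextExtractionStrategy are assumptions made for illustration; any ITextExtractionStrategy could be substituted, as long as a new instance is created inside the loop.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of iText.
public class PerPageExtraction {

    /** Returns the extracted text of every page, in page order. */
    public static List<String> extractAllPages(byte[] pdfBytes) {
        List<String> pages = new ArrayList<>();
        try (PdfDocument pdf = new PdfDocument(
                new PdfReader(new ByteArrayInputStream(pdfBytes)))) {
            // Page numbers are 1-based.
            for (int i = 1; i <= pdf.getNumberOfPages(); i++) {
                // New strategy on each iteration, so no text leaks between pages.
                String text = PdfTextExtractor.getTextFromPage(
                        pdf.getPage(i), new LocationTextExtractionStrategy());
                pages.add(text);
            }
        } catch (IOException e) {
            // Reading the PDF failed; rethrow unchecked to keep the signature simple.
            throw new UncheckedIOException(e);
        }
        return pages;
    }
}
```

      Creating the strategy outside the loop would compile, but with an accumulating strategy each returned string would likely also contain the text of the preceding pages.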
    • getTextFromPage

      public static String getTextFromPage (PdfPage page)
      Extracts text from the specified page using the default strategy. Note: the default strategy is subject to change. If a specific strategy is required, use getTextFromPage(PdfPage, ITextExtractionStrategy) instead.
      Parameters:
      page - the page to extract text from
      Returns:
      the extracted text