Class PdfOcrTextBuilder

java.lang.Object
com.itextpdf.pdfocr.util.PdfOcrTextBuilder

public final class PdfOcrTextBuilder extends Object
Builds text output from a provided image OCR result and writes it to a TXT file.
  • Method Details

    • buildText

      public static String buildText (Map<Integer,List<TextInfo>> textInfos)
      Constructs string output from the provided IOcrEngine.doImageOcr(java.io.File) result.
      Parameters:
      textInfos - Map whose key is the page number (Integer) and whose value is the List of TextInfo elements for that page; each TextInfo contains a word or a line together with the four coordinates of its bounding box (bbox)
      Returns:
      string output of the OCR result
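As a rough illustration of the kind of conversion described here, the sketch below joins per-page text into a single string. It does not call iText: the names `BuildTextSketch` and `joinPages`, the use of plain `String` values in place of `TextInfo`, and the choice of one newline per element are all assumptions made so the example runs on its own. The real method's ordering and separators may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a stand-in for the idea behind buildText, not iText code.
final class BuildTextSketch {

    // Joins the text of every page in ascending page order,
    // writing one element per output line (an assumption of this sketch).
    static String joinPages(Map<Integer, List<String>> pages) {
        StringBuilder out = new StringBuilder();
        // TreeMap iterates its keys in ascending order, whatever the input map type.
        for (Map.Entry<Integer, List<String>> page : new TreeMap<>(pages).entrySet()) {
            for (String element : page.getValue()) {
                out.append(element).append('\n');
            }
        }
        return out.toString();
    }
}
```

Copying into a `TreeMap` keeps the output order stable even when the caller passes a `HashMap`.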
    • generifyWordBBoxesByLine

      public static void generifyWordBBoxesByLine (Map<Integer,List<TextInfo>> textInfos)
      Sorts the provided IOcrEngine.doImageOcr(java.io.File) result by lines and updates line bboxes to match the largest words.
      Parameters:
      textInfos - Map whose key is the page number (Integer) and whose value is the List of TextInfo elements for that page; each TextInfo contains a word or a line together with the four coordinates of its bounding box (bbox)
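One plausible reading of "match the largest words" is that every word on a line receives the vertical extent of that line's tallest word. The sketch below shows only that reading. `LineHeightSketch`, `Word`, and `alignToTallest` are hypothetical names, `Word` is a simplified stand-in for `TextInfo`, and the input is assumed to be the words of a single, already identified line.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one interpretation of bbox normalization, not iText code.
final class LineHeightSketch {

    // Hypothetical stand-in for TextInfo: text plus an axis-aligned bbox.
    static final class Word {
        final String text;
        final float left, bottom, right, top;

        Word(String text, float left, float bottom, float right, float top) {
            this.text = text;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    // Returns copies of the words whose bottom and top match the tallest word.
    static List<Word> alignToTallest(List<Word> line) {
        Word tallest = null;
        for (Word w : line) {
            if (tallest == null || (w.top - w.bottom) > (tallest.top - tallest.bottom)) {
                tallest = w;
            }
        }
        List<Word> aligned = new ArrayList<>();
        for (Word w : line) {
            // Horizontal extent is kept; only the vertical extent changes.
            aligned.add(new Word(w.text, w.left, tallest.bottom, w.right, tallest.top));
        }
        return aligned;
    }
}
```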
    • collectWordsIntoLines

      public static void collectWordsIntoLines (Map<Integer,List<TextInfo>> textInfos)
      Merges the provided IOcrEngine.doImageOcr(java.io.File) result into lines and updates line bounding boxes to match the largest words.
      Parameters:
      textInfos - Map whose key is the page number (Integer) and whose value is the List of TextInfo elements for that page; each TextInfo contains a word or a line together with the four coordinates of its bounding box (bbox)
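To make the idea of merging words into lines concrete, here is a minimal sketch that groups words whose vertical ranges overlap and gives each resulting line the union of its words' boxes. It is not the library's algorithm. The grouping rule, the space used to join words, and the names `LineMergeSketch`, `Box`, `mergeIntoLines`, and `union` are assumptions, and `Box` is a simplified stand-in for `TextInfo` with y growing upward.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a simple overlap-based line merge, not iText code.
final class LineMergeSketch {

    // Hypothetical stand-in for TextInfo: text plus an axis-aligned bbox.
    static final class Box {
        final String text;
        final float left, bottom, right, top;

        Box(String text, float left, float bottom, float right, float top) {
            this.text = text;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }
    }

    // Groups words into lines, visiting them from the top of the page down.
    static List<Box> mergeIntoLines(List<Box> words) {
        List<Box> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingDouble((Box x) -> x.top).reversed());
        List<Box> lines = new ArrayList<>();
        List<Box> current = new ArrayList<>();
        float lineBottom = 0f;
        for (Box w : sorted) {
            // A word that starts at or below the current line's bottom opens a new line.
            if (!current.isEmpty() && w.top <= lineBottom) {
                lines.add(union(current));
                current.clear();
            }
            lineBottom = current.isEmpty() ? w.bottom : Math.min(lineBottom, w.bottom);
            current.add(w);
        }
        if (!current.isEmpty()) {
            lines.add(union(current));
        }
        return lines;
    }

    // Joins words left to right and computes the box that encloses all of them.
    static Box union(List<Box> words) {
        List<Box> byLeft = new ArrayList<>(words);
        byLeft.sort(Comparator.comparingDouble((Box x) -> x.left));
        StringBuilder text = new StringBuilder();
        float left = Float.MAX_VALUE, bottom = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE, top = -Float.MAX_VALUE;
        for (Box w : byLeft) {
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(w.text);
            left = Math.min(left, w.left);
            bottom = Math.min(bottom, w.bottom);
            right = Math.max(right, w.right);
            top = Math.max(top, w.top);
        }
        return new Box(text.toString(), left, bottom, right, top);
    }
}
```

Because the line box is the union of its word boxes, its height is set by the tallest word on the line.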
    • sortTextInfosByLines

      public static void sortTextInfosByLines (Map<Integer,List<TextInfo>> textInfos)
      Sorts the provided IOcrEngine.doImageOcr(java.io.File) result by lines.
      Parameters:
      textInfos - Map whose key is the page number (Integer) and whose value is the List of TextInfo elements for that page; each TextInfo contains a word or a line together with the four coordinates of its bounding box (bbox)
    • correctRotationAngle

      public static Map<Integer,List<TextInfo>> correctRotationAngle (Map<Integer,List<TextInfo>> result)
      Processes all text infos to round the rotation angle to either 0, 90, 180 or 270 degrees. Text bounding rectangle will be used for updated text bounding points.
      Parameters:
      result - OCR result to process
      Returns:
      the same result, with rotation angles corrected
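The two steps described for this method, rounding an angle to a right angle and falling back to the bounding rectangle of the text points, can be sketched in isolation as follows. This is an assumption-laden illustration rather than the library's code: `RotationSketch`, `roundToRightAngle`, and `boundingRect` are hypothetical names, angles are taken in degrees, ties are rounded up to the next right angle, and points are plain `{x, y}` arrays instead of `TextInfo` data.

```java
// Illustrative only: angle snapping and bounding rectangle, not iText code.
final class RotationSketch {

    // Snaps an angle in degrees to the nearest of 0, 90, 180 or 270.
    static int roundToRightAngle(double degrees) {
        // Normalize into [0, 360) so negative angles are handled too.
        double normalized = ((degrees % 360) + 360) % 360;
        // Rounding can yield 360, which wraps back to 0.
        return (int) (Math.round(normalized / 90.0) * 90 % 360);
    }

    // Returns {minX, minY, maxX, maxY} for a non-empty set of {x, y} points.
    static double[] boundingRect(double[][] points) {
        double minX = points[0][0], maxX = points[0][0];
        double minY = points[0][1], maxY = points[0][1];
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }
}
```

For a slightly skewed quadrilateral, `boundingRect` returns the smallest axis-aligned rectangle that contains all four corners, which is the kind of box that remains meaningful once the angle has been snapped.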