Interface IOcrEngine

All Known Implementing Classes:
AbstractTesseract4OcrEngine, Tesseract4ExecutableOcrEngine, Tesseract4LibOcrEngine

public interface IOcrEngine
The IOcrEngine interface is used for instantiating new OcrReader objects. It provides the ability to perform OCR, to read data from input files, and to return the contained text in the required format.
  • Method Details

    • doImageOcr

      Map<Integer,List<TextInfo>> doImageOcr (File input)
      Reads data from the provided input image file and returns the recognized data in the format described under Returns.
      Parameters:
      input - input image File
      Returns:
      Map whose key is an Integer page number and whose value is a List of TextInfo elements, where each TextInfo element contains a word or a line and the 4 coordinates of its bounding box (bbox)
    • doImageOcr

      Map<Integer,List<TextInfo>> doImageOcr (File input, OcrProcessContext ocrProcessContext)
      Reads data from the provided input image file and returns the recognized data in the format described under Returns.
      Parameters:
      input - input image File
      ocrProcessContext - OCR processing context
      Returns:
      Map whose key is an Integer page number and whose value is a List of TextInfo elements, where each TextInfo element contains a word or a line and the 4 coordinates of its bounding box (bbox)
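      The returned map can be walked page by page. The following is a minimal, self-contained sketch of that traversal, not library code: TextItem is a hypothetical stand-in for TextInfo (its fields and the coordinate order in the comment are assumptions), and pagesToText is an illustrative helper name. With the real library, the map would come from a doImageOcr call instead of being built by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OcrResultSketch {
    /** Hypothetical stand-in for TextInfo: a word or line plus its bounding box. */
    static final class TextItem {
        final String text;
        final float[] bbox; // assumed order: left, bottom, right, top

        TextItem(String text, float[] bbox) {
            this.text = text;
            this.bbox = bbox;
        }
    }

    /** Joins the text of each page in ascending page order, one output line per page. */
    static String pagesToText(Map<Integer, List<TextItem>> result) {
        StringBuilder sb = new StringBuilder();
        // Copy into a TreeMap so pages are visited in ascending order
        // whatever Map implementation the caller passed in.
        for (Map.Entry<Integer, List<TextItem>> page : new TreeMap<>(result).entrySet()) {
            List<String> pieces = new ArrayList<>();
            for (TextItem item : page.getValue()) {
                pieces.add(item.text);
            }
            sb.append("Page ").append(page.getKey()).append(": ")
              .append(String.join(" ", pieces)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built sample data standing in for an OCR result.
        Map<Integer, List<TextItem>> result = new TreeMap<>();
        result.put(1, List.of(
                new TextItem("Hello", new float[] {10f, 700f, 60f, 715f}),
                new TextItem("world", new float[] {65f, 700f, 115f, 715f})));
        System.out.print(pagesToText(result));
    }
}
```

      The items of a page are joined in list order here, which is not necessarily human reading order.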
    • createTxtFile

      void createTxtFile (List<File> inputImages, File txtFile)
      Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layout, tables, etc.).
      Parameters:
      inputImages - List of images to be OCRed
      txtFile - file to be created
    • createTxtFile

      void createTxtFile (List<File> inputImages, File txtFile, OcrProcessContext ocrProcessContext)
      Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layout, tables, etc.).
      Parameters:
      inputImages - List of images to be OCRed
      txtFile - file to be created
      ocrProcessContext - OCR processing context
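      The call shape of createTxtFile can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: TxtCapableEngine is a hypothetical stand-in that mirrors only the two-argument signature above, PlaceholderEngine writes one placeholder line per image instead of performing OCR, and ocrToLines is an illustrative helper. A real implementing class would be used in place of the placeholder.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class CreateTxtFileSketch {
    /** Hypothetical stand-in mirroring only createTxtFile(List<File>, File). */
    interface TxtCapableEngine {
        void createTxtFile(List<File> inputImages, File txtFile);
    }

    /** Toy engine: writes a placeholder line per image; no OCR is performed. */
    static final class PlaceholderEngine implements TxtCapableEngine {
        @Override
        public void createTxtFile(List<File> inputImages, File txtFile) {
            List<String> lines = new ArrayList<>();
            for (File image : inputImages) {
                lines.add("[text recognized in " + image.getName() + "]");
            }
            try {
                Files.write(txtFile.toPath(), lines, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Runs the engine over the images and returns the lines of the text file it produced. */
    static List<String> ocrToLines(TxtCapableEngine engine, List<File> images) {
        try {
            File txtFile = File.createTempFile("ocr-output", ".txt");
            try {
                engine.createTxtFile(images, txtFile);
                return Files.readAllLines(txtFile.toPath(), StandardCharsets.UTF_8);
            } finally {
                txtFile.delete(); // remove the temporary output file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<File> images = List.of(new File("scan-1.png"), new File("scan-2.png"));
        for (String line : ocrToLines(new PlaceholderEngine(), images)) {
            System.out.println(line);
        }
    }
}
```

      The images appear in the output in the order of the input list; the order of text within each image is what the reading-order note above refers to.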