Class OnnxTrOcrEngine

java.lang.Object
com.itextpdf.pdfocr.onnxtr.OnnxTrOcrEngine
All Implemented Interfaces:
IOcrEngine, IProductAware, AutoCloseable

public class OnnxTrOcrEngine extends Object implements IOcrEngine, AutoCloseable, IProductAware
IOcrEngine implementation based on the OnnxTR/DocTR machine-learning OCR projects.

NOTE: An OnnxTrOcrEngine instance must be closed after its last use to avoid leaking native allocations.
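
Because the class implements AutoCloseable, the closing requirement in the note above maps naturally onto try-with-resources. The following is a minimal, self-contained sketch of that pattern only. StubEngine, recognize and recognizeAll are hypothetical stand-ins invented for illustration and are not part of the pdfOCR API; in real code the resource would be an OnnxTrOcrEngine built from actual predictors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EngineLifecycleSketch {

    /** Hypothetical stand-in for an engine that holds native resources. */
    public static final class StubEngine implements AutoCloseable {
        private boolean closed;

        /** Pretends to run OCR; fails if the engine was already closed. */
        public String recognize(String imageName) {
            if (closed) {
                throw new IllegalStateException("engine already closed");
            }
            return "text from " + imageName;
        }

        /** Mirrors the documented signature: close() declares a checked Exception. */
        @Override
        public void close() throws Exception {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Uses the engine for all images, then closes it even if recognition fails. */
    static List<String> recognizeAll(StubEngine engine, List<String> imageNames) {
        List<String> results = new ArrayList<>();
        try (StubEngine e = engine) {
            for (String name : imageNames) {
                results.add(e.recognize(name));
            }
        } catch (RuntimeException ex) {
            throw ex; // recognition failures propagate unchanged
        } catch (Exception ex) {
            // Only close() can raise a checked exception in this sketch.
            throw new IllegalStateException("failed to close engine", ex);
        }
        return results;
    }

    public static void main(String[] args) {
        StubEngine engine = new StubEngine();
        System.out.println(recognizeAll(engine, Arrays.asList("a.png", "b.png")));
        System.out.println("closed: " + engine.isClosed());
    }
}
```

The engine is closed once, after the last image, which matches the "after its last use" wording: one instance can serve many OCR calls before being released.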

  • Constructor Details

    • OnnxTrOcrEngine

      public OnnxTrOcrEngine (IDetectionPredictor detectionPredictor, IOrientationPredictor orientationPredictor, IRecognitionPredictor recognitionPredictor)
      Create a new OCR engine with the provided predictors.
      Parameters:
      detectionPredictor - text detector. For an input image, it outputs a list of text boxes
      orientationPredictor - text orientation predictor. For an input image that is a tight crop of text, it outputs the text orientation in 90-degree steps. Can be null, in which case all text is assumed to be upright
      recognitionPredictor - text recognizer. For an input image that is a tight crop of text, it outputs the displayed string
    • OnnxTrOcrEngine

      public OnnxTrOcrEngine (IDetectionPredictor detectionPredictor, IOrientationPredictor orientationPredictor, IRecognitionPredictor recognitionPredictor, OnnxTrEngineProperties properties)
      Create a new OCR engine with the provided predictors.
      Parameters:
      detectionPredictor - text detector. For an input image, it outputs a list of text boxes
      orientationPredictor - text orientation predictor. For an input image that is a tight crop of text, it outputs the text orientation in 90-degree steps. Can be null, in which case all text is assumed to be upright
      recognitionPredictor - text recognizer. For an input image that is a tight crop of text, it outputs the displayed string
      properties - set of properties
    • OnnxTrOcrEngine

      public OnnxTrOcrEngine (IDetectionPredictor detectionPredictor, IRecognitionPredictor recognitionPredictor)
      Create a new OCR engine with the provided predictors, without text orientation prediction.
      Parameters:
      detectionPredictor - text detector. For an input image, it outputs a list of text boxes
      recognitionPredictor - text recognizer. For an input image that is a tight crop of text, it outputs the displayed string
  • Method Details

    • close

      public void close() throws Exception
      Closes the engine and releases the native allocations it holds.
      Specified by:
      close in interface AutoCloseable
      Throws:
      Exception
    • doImageOcr

      public Map<Integer,List<TextInfo>> doImageOcr (File input)
      Reads data from the provided input image file and returns retrieved data in the format described below.
      Specified by:
      doImageOcr in interface IOcrEngine
      Parameters:
      input - input image File
      Returns:
      Map whose key is an Integer page number and whose value is a List of TextInfo elements, where each TextInfo element contains a word or a line and its 4 coordinates (bbox)
    • doImageOcr

      public Map<Integer,List<TextInfo>> doImageOcr (File input, OcrProcessContext ocrProcessContext)
      Reads data from the provided input image file and returns retrieved data in the format described below.
      Specified by:
      doImageOcr in interface IOcrEngine
      Parameters:
      input - input image File
      ocrProcessContext - OCR processing context
      Returns:
      Map whose key is an Integer page number and whose value is a List of TextInfo elements, where each TextInfo element contains a word or a line and its 4 coordinates (bbox)
    • createTxtFile

      public void createTxtFile (List<File> inputImages, File txtFile)
      Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layouts, tables, etc.).
      Specified by:
      createTxtFile in interface IOcrEngine
      Parameters:
      inputImages - List of images to be OCRed
      txtFile - file to be created
    • createTxtFile

      public void createTxtFile (List<File> inputImages, File txtFile, OcrProcessContext ocrProcessContext)
      Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layouts, tables, etc.).
      Specified by:
      createTxtFile in interface IOcrEngine
      Parameters:
      inputImages - List of images to be OCRed
      txtFile - file to be created
      ocrProcessContext - OCR processing context
    • isTaggingSupported

      public boolean isTaggingSupported()
      Checks whether tagging is supported by the OCR engine.
      Specified by:
      isTaggingSupported in interface IOcrEngine
      Returns:
      true if tagging is supported by the engine, false otherwise
    • getMetaInfoContainer

      public PdfOcrMetaInfoContainer getMetaInfoContainer()
      Gets the container with meta info.
      Specified by:
      getMetaInfoContainer in interface IProductAware
      Returns:
      the held meta info container
    • getProductData

      public com.itextpdf.commons.actions.data.ProductData getProductData()
      Gets object containing information about the product.
      Specified by:
      getProductData in interface IProductAware
      Returns:
      product data
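
The map returned by doImageOcr is keyed by page number, with each value holding the recognized words or lines for that page. As an illustration of how such a result might be consumed, the sketch below joins the text of each page into a single string. It is self-contained and does not use the pdfOCR classes: Word is a hypothetical stand-in for TextInfo, and its getText() accessor and the textByPage helper are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OcrResultSketch {

    /** Hypothetical stand-in for TextInfo; only the text is modelled, not the bbox. */
    public static final class Word {
        private final String text;

        public Word(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }
    }

    /**
     * Joins the elements of every page with single spaces.
     * Pages are sorted by page number; elements keep the order of the input list,
     * which is not necessarily human reading order.
     */
    static Map<Integer, String> textByPage(Map<Integer, List<Word>> ocrResult) {
        Map<Integer, String> pages = new TreeMap<>();
        for (Map.Entry<Integer, List<Word>> entry : ocrResult.entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (Word word : entry.getValue()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(word.getText());
            }
            pages.put(entry.getKey(), sb.toString());
        }
        return pages;
    }

    public static void main(String[] args) {
        Map<Integer, List<Word>> result = new HashMap<>();
        result.put(2, Arrays.asList(new Word("second"), new Word("page")));
        result.put(1, Arrays.asList(new Word("Hello"), new Word("world")));
        System.out.println(textByPage(result));
    }
}
```

A TreeMap is used here so that pages come out in ascending order regardless of how the source map iterates.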