Class BasicDetectionPostProcessor

java.lang.Object
com.itextpdf.pdfocr.onnx.detection.BasicDetectionPostProcessor
All Implemented Interfaces:
IDetectionPostProcessor
Direct Known Subclasses:
EasyOcrDetectionPostProcessor, OnnxDetectionPostProcessor, PaddleOcrDetectionPostProcessor

public abstract class BasicDetectionPostProcessor extends Object implements IDetectionPostProcessor
Implementation of a text detection predictor post-processor, which is used as a basis for creating post-processors for handling OnnxTR, EasyOCR and PaddleOCR model outputs.

The base implementation works roughly as follows:

  1. Model output is binarized to create a predictions mask.
  2. Sufficiently large contours are found in the mask from the previous step.
  3. Contours whose certainty score is below the score threshold are discarded.
  4. Remaining contours are wrapped into boxes with relative [0, 1] coordinates.
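The four steps above can be sketched in plain Java without OpenCV. This is a minimal illustration under stated assumptions, not the library's implementation: the class `DetectionPipelineSketch`, the method `detect`, and the `minPixels` parameter are hypothetical, a 4-connected flood fill stands in for contour detection, and the score is assumed to be the mean prediction over the region.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not part of pdfOCR. */
class DetectionPipelineSketch {

    /**
     * Returns boxes as {x, y, width, height, score}, with coordinates relative to the
     * prediction map size, i.e. in the [0, 1] range.
     */
    static List<float[]> detect(float[][] preds, float binarizationThreshold,
                                float scoreThreshold, int minPixels) {
        int h = preds.length;
        int w = preds[0].length;

        // Step 1: binarize the predictions into a mask.
        boolean[][] mask = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                mask[y][x] = preds[y][x] >= binarizationThreshold;
            }
        }

        List<float[]> boxes = new ArrayList<>();
        boolean[][] seen = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!mask[y][x] || seen[y][x]) {
                    continue;
                }
                // Step 2: collect one connected region (assumption: flood fill
                // is used here in place of real contour detection).
                int minX = x, maxX = x, minY = y, maxY = y, count = 0;
                double sum = 0;
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {x, y});
                seen[y][x] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    int px = p[0], py = p[1];
                    count++;
                    sum += preds[py][px];
                    minX = Math.min(minX, px);
                    maxX = Math.max(maxX, px);
                    minY = Math.min(minY, py);
                    maxY = Math.max(maxY, py);
                    int[][] neighbours = {{px + 1, py}, {px - 1, py}, {px, py + 1}, {px, py - 1}};
                    for (int[] n : neighbours) {
                        int nx = n[0], ny = n[1];
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h
                                && mask[ny][nx] && !seen[ny][nx]) {
                            seen[ny][nx] = true;
                            queue.add(new int[] {nx, ny});
                        }
                    }
                }
                if (count < minPixels) {
                    continue; // region is not large enough
                }
                // Step 3: discard regions with a low mean score (assumed scoring rule).
                float score = (float) (sum / count);
                if (score < scoreThreshold) {
                    continue;
                }
                // Step 4: wrap the region into a box with relative coordinates.
                boxes.add(new float[] {
                        (float) minX / w, (float) minY / h,
                        (float) (maxX - minX + 1) / w, (float) (maxY - minY + 1) / h,
                        score});
            }
        }
        return boxes;
    }
}
```

Calling `DetectionPipelineSketch.detect(preds, 0.3f, 0.5f, 1)` on a small prediction map returns one `float[]` per surviving region.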
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    BasicDetectionPostProcessor(float binarizationThreshold, float scoreThreshold, int maxCandidates)
    Creates a new post-processor.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected org.bytedeco.opencv.opencv_core.Mat
    buildTextContourPredictionMask(org.bytedeco.opencv.opencv_core.Mat contour, org.bytedeco.opencv.opencv_core.Rect contourBox)
    Builds and returns a mask for calculating prediction score for the provided contour.
    protected double
    calcTextBoxEnlargement(double width, double height)
    Calculates by how much the dimensions of a text box should be enlarged compared to those obtained from the model output.
    protected IScoreCalculator
    createScoreCalculator()
    Creates a new score calculator for calculating the score over a text contour.
    protected org.bytedeco.opencv.opencv_core.MatVector
    findTextContours(org.bytedeco.opencv.opencv_core.Mat mask)
    Extracts text contours from the provided 0 - 255 mask.
    protected FloatBufferMdArray
    getMaskSourceArray(FloatBufferMdArray output)
    Returns the array to be used when building a mask for contour detection.
    protected FloatBufferMdArray
    getPredsArray(FloatBufferMdArray output)
    Returns the preds array from the output buffer.
    protected boolean
    isValidContour(org.bytedeco.opencv.opencv_core.Mat contour, org.bytedeco.opencv.opencv_core.Rect contourBox)
    Returns whether the contour is good enough to be a text box.
    protected float
    mapPredToSample(float pred)
    Calculates the score sample value, based on a prediction value from the buffer.
    List
    process(BufferedImage input, FloatBufferMdArray output)
    Processes ML model output for a specified image and returns a list of detected objects.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • BasicDetectionPostProcessor

      protected BasicDetectionPostProcessor (float binarizationThreshold, float scoreThreshold, int maxCandidates)
      Creates a new post-processor.
      Parameters:
      binarizationThreshold - threshold value used when binarizing a monochromatic image. If a pixel value is greater than or equal to the threshold, it is mapped to 1; otherwise it is mapped to 0
      scoreThreshold - score threshold for a detected box. If the score is lower than this value, the box is discarded
      maxCandidates - maximum number of text box contours that will be handled by the post-processor
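The boundary behaviour of the two thresholds can be illustrated with a small sketch. The class `ThresholdSketch` and both method names are hypothetical and exist only for this example; the only behaviour taken from the parameter descriptions is that a pixel equal to the binarization threshold maps to 1 and that a box is discarded only when its score is strictly lower than the score threshold.

```java
/** Hypothetical stand-alone sketch; not part of pdfOCR. */
class ThresholdSketch {

    /** Maps each pixel to 1 if it is greater than or equal to the threshold, otherwise to 0. */
    static byte[] binarize(float[] pixels, float binarizationThreshold) {
        byte[] out = new byte[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = (byte) (pixels[i] >= binarizationThreshold ? 1 : 0);
        }
        return out;
    }

    /** A box is kept unless its score is strictly lower than the threshold. */
    static boolean keepBox(float score, float scoreThreshold) {
        return !(score < scoreThreshold);
    }
}
```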
  • Method Details

    • process

      public List process (BufferedImage input, FloatBufferMdArray output)
      Processes ML model output for a specified image and returns a list of detected objects.
      Specified by:
      process in interface IDetectionPostProcessor
      Parameters:
      input - input image, which was used to produce the inputs to the ML model
      output - output of the ML model
      Returns:
      a list of detected objects. See interface documentation for more information
    • getPredsArray

      protected FloatBufferMdArray getPredsArray (FloatBufferMdArray output)
      Returns the preds array from the output buffer.
      Parameters:
      output - output buffer from the model
      Returns:
      the preds array
    • getMaskSourceArray

      protected FloatBufferMdArray getMaskSourceArray (FloatBufferMdArray output)
      Returns the array to be used when building a mask for contour detection.
      Parameters:
      output - output buffer from the model
      Returns:
      the array to build the mask from
    • findTextContours

      protected org.bytedeco.opencv.opencv_core.MatVector findTextContours (org.bytedeco.opencv.opencv_core.Mat mask)
      Extracts text contours from the provided 0 - 255 mask.
      Parameters:
      mask - mask to find contours in; it may be modified, but should not be closed
      Returns:
      found text contours
    • isValidContour

      protected boolean isValidContour (org.bytedeco.opencv.opencv_core.Mat contour, org.bytedeco.opencv.opencv_core.Rect contourBox)
      Returns whether the contour is good enough to be a text box. Called before score calculations.
      Parameters:
      contour - contour to check
      contourBox - bounding box of the contour to check
      Returns:
      whether the contour is good enough to be a text box
    • buildTextContourPredictionMask

      protected org.bytedeco.opencv.opencv_core.Mat buildTextContourPredictionMask (org.bytedeco.opencv.opencv_core.Mat contour, org.bytedeco.opencv.opencv_core.Rect contourBox)
      Builds and returns a mask for calculating prediction score for the provided contour.

      Mask should adhere to the following requirements:

      • Mask should have the same dimensions as the contour box.
      • Data type should be CV_8U.
      • Pixels that should be counted towards the score should have a non-zero value in the mask.
      Parameters:
      contour - contour to build mask for
      contourBox - bounding box of the contour to build mask for
      Returns:
      the built mask
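To show how a mask meeting these requirements might be consumed, here is a minimal sketch in plain Java. It makes several assumptions: a `byte[][]` stands in for a CV_8U `Mat`, the score is taken to be the mean prediction over the non-zero mask pixels (the actual rule is defined by the score calculator), and the class `MaskedScoreSketch` with its method `meanScore` is hypothetical.

```java
/** Hypothetical stand-alone sketch; not part of pdfOCR. */
class MaskedScoreSketch {

    /**
     * Averages predictions under the non-zero pixels of the mask.
     * The mask has the dimensions of the contour box, whose top-left corner
     * is at (boxX, boxY) in the prediction map.
     */
    static float meanScore(float[][] preds, byte[][] mask, int boxX, int boxY) {
        double sum = 0;
        int count = 0;
        for (int y = 0; y < mask.length; y++) {
            for (int x = 0; x < mask[y].length; x++) {
                // Any non-zero value marks a pixel that counts towards the score.
                if (mask[y][x] != 0) {
                    sum += preds[boxY + y][boxX + x];
                    count++;
                }
            }
        }
        return count == 0 ? 0f : (float) (sum / count);
    }
}
```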
    • createScoreCalculator

      protected IScoreCalculator createScoreCalculator()
      Creates a new score calculator for calculating score over a text contour.
      Returns:
      a new score calculator
    • mapPredToSample

      protected float mapPredToSample (float pred)
      Calculates the score sample value, based on a prediction value from the buffer.
      Parameters:
      pred - prediction value to map
      Returns:
      mapped score
    • calcTextBoxEnlargement

      protected double calcTextBoxEnlargement (double width, double height)
      Calculates by how much the dimensions of a text box should be enlarged compared to those obtained from the model output.
      Parameters:
      width - original width of the text box
      height - original height of the text box
      Returns:
      value to enlarge the dimensions by
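The documentation does not state how the enlargement value is computed. Purely as an illustration, the sketch below uses a formula common in DB-style text detectors, where the offset is the box area times a ratio, divided by the perimeter. Whether this class uses the same rule is an assumption; the class `EnlargementSketch`, the method `enlargement`, and the `unclipRatio` parameter are hypothetical.

```java
/** Hypothetical stand-alone sketch; not part of pdfOCR. */
class EnlargementSketch {

    /**
     * Assumed rule: area * unclipRatio / perimeter.
     * Returns 0 for a degenerate box with no perimeter.
     */
    static double enlargement(double width, double height, double unclipRatio) {
        double perimeter = 2 * (width + height);
        if (perimeter <= 0) {
            return 0;
        }
        return width * height * unclipRatio / perimeter;
    }
}
```

Under this assumed rule, the value grows with the size of the box, so larger text boxes are padded by more than smaller ones.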