Class CtcLabelPostProcessor

java.lang.Object
com.itextpdf.pdfocr.onnx.recognition.BasicLabelPostProcessor
com.itextpdf.pdfocr.onnx.recognition.CtcLabelPostProcessor
All Implemented Interfaces:
IRecognitionPostProcessor

public class CtcLabelPostProcessor extends BasicLabelPostProcessor
Implementation of a text recognition predictor post-processor, used for EasyOCR and PaddleOCR model outputs.

It uses a single blank token, which is the first label and sits just before the vocabulary. Consecutive repeats of the same label are collapsed into one.
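
The collapsing rule described above can be illustrated with a minimal, standalone sketch of greedy CTC decoding. It does not use or reproduce the iText implementation: the class name CtcCollapseSketch, the decode helper, and the four-character vocabulary are hypothetical. It also assumes that the blank token is at index 0, that index i > 0 maps to vocabulary entry i - 1, and that repeats are collapsed before blanks are dropped.

```java
import java.util.List;

public class CtcCollapseSketch {
    // Assumption: the blank token occupies index 0, just before the vocabulary.
    static final int BLANK_INDEX = 0;

    /**
     * Collapses a per-time-step sequence of label indices into text:
     * consecutive repeats are merged, and blank labels are skipped.
     */
    public static String decode(int[] labelIndices, List<String> vocabulary) {
        StringBuilder output = new StringBuilder();
        int previous = -1; // no label seen yet
        for (int labelIndex : labelIndices) {
            boolean repeated = labelIndex == previous;
            boolean blank = labelIndex == BLANK_INDEX;
            if (!repeated && !blank) {
                // Assumption: shift by one because index 0 is reserved for the blank.
                output.append(vocabulary.get(labelIndex - 1));
            }
            previous = labelIndex;
        }
        return output.toString();
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("h", "e", "l", "o");
        // Time steps: h h _ e l l _ l o o (where _ is the blank)
        int[] perStep = {1, 1, 0, 2, 3, 3, 0, 3, 4, 4};
        System.out.println(decode(perStep, vocabulary)); // prints "hello"
    }
}
```

In this sketch the blank between the two runs of "l" is what keeps the double letter: without it, the repeated indices would be merged into a single character.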

  • Constructor Details

    • CtcLabelPostProcessor

      public CtcLabelPostProcessor (IOutputLabelMapper<String> labelMapper)
      Creates a new post-processor.
      Parameters:
      labelMapper - label mapper used for the model output (without special tokens)
  • Method Details

    • appendLabel

      protected void appendLabel (StringBuilder output, int labelIndex)
      Appends the label to the string output, based on the label's index. May be a no-op if the label index should be ignored.
      Specified by:
      appendLabel in class BasicLabelPostProcessor
      Parameters:
      output - string builder to append the label to
      labelIndex - index of the label to append, guaranteed to be in the [0; labelDimension()) range.
    • labelDimension

      public int labelDimension()
      Returns the size of the output character label vector, i.e. how many distinct tokens/characters the model recognizes.
      Returns:
      the size of the output character label vector