Class EndOfStringPostProcessor

java.lang.Object
com.itextpdf.pdfocr.onnxtr.recognition.EndOfStringPostProcessor
All Implemented Interfaces:
IRecognitionPostProcessor

public class EndOfStringPostProcessor extends Object implements IRecognitionPostProcessor
Implementation of a text recognition predictor post-processor, used for OnnxTR non-CRNN model outputs.

This assumes there is an end-of-string token just after the vocabulary. You can specify additional tokens afterward, but they are not used in the processing. Consecutive identical characters are not merged. Output is read until an end-of-string token is encountered.
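The behavior described above can be illustrated with a minimal, standalone sketch. It does not use the library's types: the class name EndOfStringDecodingSketch, the float[][] score layout (one row per output position, one column per label) and the plain String vocabulary are all assumptions made for illustration, as is the greedy "pick the highest score" step. Labels after the end-of-string index are simply skipped here, which is one possible reading of "not used in the processing".

```java
// Hypothetical illustration only; not the library's actual implementation.
final class EndOfStringDecodingSketch {

    /**
     * Greedy decoding that stops at the first end-of-string label.
     *
     * @param scores     scores[t][k] is the score of label k at output position t (assumed layout)
     * @param vocabulary characters of the vocabulary, without special tokens
     * @return decoded text up to, but not including, the end-of-string token
     */
    static String decode(float[][] scores, String vocabulary) {
        // The end-of-string token is assumed to sit right after the vocabulary.
        final int eosIndex = vocabulary.length();
        final StringBuilder text = new StringBuilder();
        for (float[] step : scores) {
            // Pick the label with the highest score at this position.
            int best = 0;
            for (int k = 1; k < step.length; k++) {
                if (step[k] > step[best]) {
                    best = k;
                }
            }
            if (best == eosIndex) {
                break; // stop reading at the end-of-string token
            }
            if (best < eosIndex) {
                // Repeated characters are kept as-is: no aggregation.
                text.append(vocabulary.charAt(best));
            }
            // Labels above eosIndex are additional tokens and are skipped.
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Vocabulary "abc", end-of-string at index 3, one additional token at index 4.
        float[][] scores = {
            {0.90f, 0.05f, 0.02f, 0.02f, 0.01f}, // 'a'
            {0.80f, 0.10f, 0.05f, 0.03f, 0.02f}, // 'a' again, kept
            {0.10f, 0.70f, 0.10f, 0.05f, 0.05f}, // 'b'
            {0.05f, 0.05f, 0.10f, 0.70f, 0.10f}, // end-of-string
            {0.10f, 0.10f, 0.70f, 0.05f, 0.05f}, // never read
        };
        System.out.println(decode(scores, "abc")); // prints "aab"
    }
}
```

The repeated 'a' survives in the result, which is the visible difference from CTC-style decoding used for CRNN outputs, where consecutive duplicates are collapsed.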

  • Constructor Details

    • EndOfStringPostProcessor

      public EndOfStringPostProcessor (Vocabulary vocabulary, int additionalTokens)
      Creates a new post-processor.
      Parameters:
      vocabulary - vocabulary used for the model output (without special tokens)
      additionalTokens - number of additional tokens in the total vocabulary after the end-of-string token
    • EndOfStringPostProcessor

      public EndOfStringPostProcessor (Vocabulary vocabulary)
      Creates a new post-processor without any additional tokens.
      Parameters:
      vocabulary - vocabulary used for the model output (without special tokens)
    • EndOfStringPostProcessor

      public EndOfStringPostProcessor()
      Creates a new post-processor with the default vocabulary.
  • Method Details

    • process

      public String process (FloatBufferMdArray output)
      Processes the ML model output and returns the recognized characters as a string.
      Specified by:
      process in interface IRecognitionPostProcessor
      Parameters:
      output - raw output of the ML model
      Returns:
      recognized characters as string
    • labelDimension

      public int labelDimension()
      Returns the size of the output character label vector, i.e. how many distinct tokens/characters the model recognizes.
      Specified by:
      labelDimension in interface IRecognitionPostProcessor
      Returns:
      the size of the output character label vector
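
Given the token layout described for this class (vocabulary, then one end-of-string token, then any additional tokens), the label vector size can plausibly be computed as below. This is a sketch under that assumption, not the confirmed implementation; the class name LabelDimensionSketch and the int-based signature are hypothetical.

```java
// Hypothetical illustration of the assumed token layout:
// [ vocabulary characters | end-of-string | additional tokens ]
final class LabelDimensionSketch {

    /** Assumed size of the label vector: vocabulary + 1 end-of-string token + additional tokens. */
    static int labelDimension(int vocabularySize, int additionalTokens) {
        return vocabularySize + 1 + additionalTokens;
    }

    public static void main(String[] args) {
        // A 3-character vocabulary with 1 additional token gives 5 labels.
        System.out.println(labelDimension(3, 1)); // prints 5
    }
}
```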