pdfOCR 3.0.2 API
iText.Pdfocr.Tesseract4 Namespace Reference

Classes

class   AbstractTesseract4OcrEngine
  The implementation of iText.Pdfocr.IOcrEngine.
 
class   ImagePreprocessingOptions
  Additional options applied during the image preprocessing step.
 
class   ImagePreprocessingUtil
  Utility class for working with images.
 
class   LeptonicaImageRotationHandler
  Leptonica-based implementation of iText.Pdfocr.IImageRotationHandler.
 
class   Tesseract4EventHelper
  Helper class for working with events.
 
class   Tesseract4ExecutableOcrEngine
  The implementation of AbstractTesseract4OcrEngine that performs OCR by invoking the tesseract command-line executable.
 
class   Tesseract4FileResultEventHelper
  Helper class for working with events.
 
class   Tesseract4LibOcrEngine
  The implementation of AbstractTesseract4OcrEngine that performs OCR through the tesseract library.
 
class   Tesseract4MetaInfo
 
class   Tesseract4OcrEngineProperties
  Properties that will be used by the iText.Pdfocr.IOcrEngine.
 
class   TesseractHelper
  Helper class.
 
class   TesseractOcrUtil
  Utility class for working with the tesseract command-line tool and for image preprocessing using Net.Sourceforge.Lept4j.ILeptonica.
 

Enumerations

enum   OutputFormat { OutputFormat.HOCR, OutputFormat.TXT }
  Enumeration of the available output formats.
 
enum   TextPositioning { TextPositioning.BY_LINES, TextPositioning.BY_WORDS, TextPositioning.BY_WORDS_AND_LINES }
  Enumeration of the possible types of text positioning.
 

Enumeration Type Documentation

◆ OutputFormat

Enumeration of the available output formats. It applies when the selected reader can process the input file and return the result in the requested output format.

Enumerator
HOCR 

The reader produces XHTML output compliant with the hOCR specification. The output is parsed and represented as an IList of TextInfo objects.

TXT 

The reader produces a plain text (.txt) file.

◆ TextPositioning

Enumeration of the possible types of text positioning. It applies when the selected reader can process text by lines or by words and return coordinates for the selected item type. For tesseract, this value is meaningful only when the selected OutputFormat is OutputFormat.HOCR.

Enumerator
BY_LINES 

Text is positioned by lines retrieved from the hOCR file. This is the default value.

BY_WORDS 

Text is positioned by words retrieved from the hOCR file.

BY_WORDS_AND_LINES 

Similar to BY_WORDS mode, but the top and bottom of each word's bounding box (BBox) are inherited from the line that contains the word.
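
The bounding-box rule for BY_WORDS_AND_LINES can be illustrated with a minimal sketch. The BBox type and the InheritVerticalExtent method below are hypothetical names introduced only for this illustration. They are not part of the pdfOCR API, and the sketch does not claim to reproduce how the library computes boxes internally.

```csharp
// Hypothetical stand-in for a bounding box; this is NOT a pdfOCR type.
public readonly record struct BBox(float Left, float Bottom, float Right, float Top);

public static class TextPositioningSketch
{
    // Illustrates the BY_WORDS_AND_LINES description: the word keeps its own
    // horizontal extent (Left, Right) and takes its vertical extent
    // (Bottom, Top) from the line that contains it.
    public static BBox InheritVerticalExtent(BBox word, BBox line)
    {
        return new BBox(word.Left, line.Bottom, word.Right, line.Top);
    }
}
```

For example, a word box of (10, 22, 40, 30) on a line box of (5, 20, 100, 34) becomes (10, 20, 40, 34). Under this rule every word on a line shares the same top and bottom edges.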