pdfOCR 3.0.2 API
iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine Class Reference

The implementation of AbstractTesseract4OcrEngine for tesseract OCR.

Inheritance diagram for iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine:
iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine, iText.Pdfocr.IOcrEngine, iText.Pdfocr.IProductAware

Public Member Functions

  Tesseract4LibOcrEngine (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
  Creates a new Tesseract4LibOcrEngine instance.
 
virtual TesseractEngine  GetTesseractInstance ()
  Gets the tesseract instance.
 
virtual void  InitializeTesseract (OutputFormat outputFormat)
  Initializes the tesseract instance if it has not been initialized yet or has been disposed, and sets all the required properties.
 
Public Member Functions inherited from iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine
  AbstractTesseract4OcrEngine (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
  Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
 
virtual void  DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat)
  Performs tesseract OCR for the first (or only) page of the image.
 
virtual void  DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat, OcrProcessContext ocrProcessContext)
  Performs tesseract OCR for the first (or only) page of the image.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.
 
Tesseract4OcrEngineProperties  GetTesseract4OcrEngineProperties ()
  Gets properties for AbstractTesseract4OcrEngine.
 
void  SetTesseract4OcrEngineProperties (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
  Sets properties for AbstractTesseract4OcrEngine.
 
String  GetLanguagesAsString ()
  Gets the list of languages joined with the "+" symbol into a single string, in the format required by tesseract.
 
IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input)
  Reads data from the provided input image file and returns the retrieved data in the format described below.
 
IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input, OcrProcessContext ocrProcessContext)
  Reads data from the provided input image file and returns the retrieved data in the format described below.
 
String  DoImageOcr (FileInfo input, OutputFormat outputFormat, OcrProcessContext ocrProcessContext)
  Reads data from the provided input image file and returns the retrieved data as a string.
 
String  DoImageOcr (FileInfo input, OutputFormat outputFormat)
  Reads data from the provided input image file and returns the retrieved data as a string.
 
virtual bool  IsWindows ()
  Checks the current OS type.
 
virtual String  IdentifyOsType ()
  Identifies the type of the current OS and returns it (win, linux).
 
virtual void  ValidateLanguages (IList< String > languagesList)
  Validates the list of provided languages and checks that they all exist in the given tess data directory.
 
virtual PdfOcrMetaInfoContainer  GetMetaInfoContainer ()
  Gets the container with meta info.
 
virtual ProductData  GetProductData ()
  Gets the object containing information about the product.
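
The "+"-joined language format mentioned for GetLanguagesAsString can be illustrated with a minimal sketch. JoinLanguages and LanguageStringSketch are hypothetical names, not part of pdfOCR, and the empty-list behavior shown here is an assumption; the real method may handle that case differently.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageStringSketch
{
    // Hypothetical helper (not a pdfOCR API): joins language codes
    // with "+", the separator tesseract expects between languages.
    public static string JoinLanguages(IList<string> languages)
    {
        if (languages == null || languages.Count == 0)
        {
            // Assumption: return an empty string when nothing is configured.
            return string.Empty;
        }
        return string.Join("+", languages);
    }
}
```

For example, a list containing "eng" and "deu" would be joined into "eng+deu".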
 

Detailed Description

The implementation of AbstractTesseract4OcrEngine for tesseract OCR. This class provides access to the features of "tesseract" using tess4j. Please note that this class is not thread-safe; in other words, this Tesseract engine cannot be used for multithreaded processing. You should create one instance per thread.
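
One possible way to follow the "one instance per thread" advice is sketched below with System.Threading.ThreadLocal<T>. StubEngine and PerThreadEngines are hypothetical stand-ins used so the sketch compiles without pdfOCR; in real code the factory would construct a Tesseract4LibOcrEngine instead.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a non-thread-safe engine
// such as Tesseract4LibOcrEngine.
public sealed class StubEngine
{
    // Records which thread created this instance.
    public int OwnerThreadId { get; } = Environment.CurrentManagedThreadId;
}

public static class PerThreadEngines
{
    // ThreadLocal<T> runs the factory once for each thread that reads
    // Value, so no two threads ever share an engine instance.
    private static readonly ThreadLocal<StubEngine> Engine =
        new ThreadLocal<StubEngine>(() => new StubEngine());

    public static StubEngine Current => Engine.Value;
}
```

Each thread that reads PerThreadEngines.Current gets its own instance, and repeated reads on the same thread return the same one.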

Constructor & Destructor Documentation

◆ Tesseract4LibOcrEngine()

iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine.Tesseract4LibOcrEngine ( Tesseract4OcrEngineProperties  tesseract4OcrEngineProperties )
inline

Creates a new Tesseract4LibOcrEngine instance.

Parameters
tesseract4OcrEngineProperties set of properties

Member Function Documentation

◆ GetTesseractInstance()

virtual TesseractEngine iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine.GetTesseractInstance ( )
inline virtual

Gets tesseract instance.

Returns
initialized Tesseract.TesseractEngine instance

◆ InitializeTesseract()

virtual void iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine.InitializeTesseract ( OutputFormat  outputFormat )
inline virtual

Initializes the tesseract instance if it has not been initialized yet or has been disposed, and sets all the required properties.

Parameters
outputFormat selected OutputFormat for tesseract
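
The "initialize if missing or disposed" behavior described above can be sketched as follows. This is only an illustration of the pattern, not the library's implementation: StubNativeEngine, LazyEngineHolder, and EnsureInitialized are hypothetical names, and the step that applies the engine properties is omitted.

```csharp
using System;

// Hypothetical stand-in for a disposable native engine handle.
public sealed class StubNativeEngine : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose() => IsDisposed = true;
}

public sealed class LazyEngineHolder
{
    private StubNativeEngine engine;

    // Counts how many engines were created, for illustration only.
    public int CreationCount { get; private set; }

    // Creates a new engine only when none exists or the previous one
    // has been disposed; otherwise returns the existing instance.
    public StubNativeEngine EnsureInitialized()
    {
        if (engine == null || engine.IsDisposed)
        {
            engine = new StubNativeEngine();
            CreationCount++;
        }
        return engine;
    }
}
```

Calling EnsureInitialized twice in a row returns the same instance; after that instance is disposed, the next call creates a fresh one.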