pdfOCR 3.0.2 API
iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine Class Reference abstract

The implementation of iText.Pdfocr.IOcrEngine.

Inherits iText.Pdfocr.IOcrEngine and iText.Pdfocr.IProductAware.
Inherited by iText.Pdfocr.Tesseract4.Tesseract4ExecutableOcrEngine and iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine.

Classes

interface   ITesseractOcrResult
 
class   StringTesseractOcrResult
 
class   TextInfoTesseractOcrResult
 

Public Member Functions

  AbstractTesseract4OcrEngine (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
  Creates a new AbstractTesseract4OcrEngine instance from the given Tesseract4OcrEngineProperties instance.
 
virtual void  DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat)
  Performs Tesseract OCR for the first (or only) image page.
 
virtual void  DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat, OcrProcessContext ocrProcessContext)
  Performs Tesseract OCR for the first (or only) image page.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.
 
Tesseract4OcrEngineProperties  GetTesseract4OcrEngineProperties ()
  Gets properties for AbstractTesseract4OcrEngine.
 
void  SetTesseract4OcrEngineProperties (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
  Sets properties for AbstractTesseract4OcrEngine.
 
String  GetLanguagesAsString ()
  Gets the list of languages concatenated with the "+" symbol into a string in the format required by Tesseract.
 
IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input)
  Reads data from the provided input image file and returns the retrieved data in the format described below.
 
IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input, OcrProcessContext ocrProcessContext)
  Reads data from the provided input image file and returns the retrieved data in the format described below.
 
String  DoImageOcr (FileInfo input, OutputFormat outputFormat, OcrProcessContext ocrProcessContext)
  Reads data from the provided input image file and returns the retrieved data as a string.
 
String  DoImageOcr (FileInfo input, OutputFormat outputFormat)
  Reads data from the provided input image file and returns the retrieved data as a string.
 
virtual bool  IsWindows ()
  Checks whether the current OS is Windows.
 
virtual String  IdentifyOsType ()
  Identifies the type of the current OS and returns it (win, linux).
 
virtual void  ValidateLanguages (IList< String > languagesList)
  Validates the list of provided languages and checks that they all exist in the given tessdata directory.
 
virtual PdfOcrMetaInfoContainer  GetMetaInfoContainer ()
  Gets the container with meta info.
 
virtual ProductData  GetProductData ()
  Gets object containing information about the product.
 

Detailed Description

The implementation of iText.Pdfocr.IOcrEngine. This class can perform OCR, read data from input files, and return the contained text in the required format. It also exposes features of Tesseract, an optical character recognition engine available for various operating systems.

Constructor & Destructor Documentation

◆ AbstractTesseract4OcrEngine()

iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.AbstractTesseract4OcrEngine ( Tesseract4OcrEngineProperties  tesseract4OcrEngineProperties )
inline

Creates a new AbstractTesseract4OcrEngine instance from the given Tesseract4OcrEngineProperties instance.

Parameters
tesseract4OcrEngineProperties the Tesseract4OcrEngineProperties instance that supplies the engine's settings

Member Function Documentation

◆ CreateTxtFile() [1/2]

virtual void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.CreateTxtFile ( IList< FileInfo >  inputImages,
FileInfo  txtFile 
)
inline virtual

Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.

Parameters
inputImages System.Collections.IList of images to be OCRed
txtFile file to be created

Implements iText.Pdfocr.IOcrEngine.

◆ CreateTxtFile() [2/2]

virtual void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.CreateTxtFile ( IList< FileInfo >  inputImages,
FileInfo  txtFile,
OcrProcessContext  ocrProcessContext 
)
inline virtual

Performs OCR using the provided iText.Pdfocr.IOcrEngine for the given list of input images and saves the output to a text file at the provided path.

Parameters
inputImages System.Collections.IList of images to be OCRed
txtFile file to be created
ocrProcessContext OCR process context

Implements iText.Pdfocr.IOcrEngine.

◆ DoImageOcr() [1/4]

IDictionary< int, IList< TextInfo > > iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoImageOcr ( FileInfo  input )
inline

Reads data from the provided input image file and returns retrieved data in the format described below.

Parameters
input input image System.IO.FileInfo
Returns
System.Collections.IDictionary where the key is an int representing the page number and the value is a System.Collections.IList of iText.Pdfocr.TextInfo elements, each of which contains a word or a line and its 4 coordinates (bbox)

Implements iText.Pdfocr.IOcrEngine.
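
The shape of the returned dictionary (page number mapped to a list of recognized words or lines) can be illustrated with a minimal sketch. It does not call pdfOCR: TextInfoStandIn is a hypothetical stand-in for iText.Pdfocr.TextInfo, its field names are assumptions made only for this example, and TextPerPage is an illustrative helper that is not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for iText.Pdfocr.TextInfo: recognized text plus a
// 4-coordinate bounding box. Field names are assumptions for illustration.
public sealed class TextInfoStandIn
{
    public string Text;
    public float Left, Bottom, Right, Top;
}

public static class OcrResultSketch
{
    // Collapses a "page number -> recognized items" dictionary into one
    // space-separated string per page, ordered by page number.
    public static IDictionary<int, string> TextPerPage(
        IDictionary<int, IList<TextInfoStandIn>> result)
    {
        var pages = new SortedDictionary<int, string>();
        foreach (var entry in result)
        {
            pages[entry.Key] = string.Join(" ", entry.Value.Select(t => t.Text));
        }
        return pages;
    }
}
```

With the real API, the same loop would iterate over the value returned by DoImageOcr and read the text from each TextInfo element.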

◆ DoImageOcr() [2/4]

IDictionary< int, IList< TextInfo > > iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoImageOcr ( FileInfo  input,
OcrProcessContext  ocrProcessContext 
)
inline

Reads data from the provided input image file and returns retrieved data in the format described below.

Parameters
input input image System.IO.FileInfo
ocrProcessContext OCR process context
Returns
System.Collections.IDictionary where the key is an int representing the page number and the value is a System.Collections.IList of iText.Pdfocr.TextInfo elements, each of which contains a word or a line and its 4 coordinates (bbox)

Implements iText.Pdfocr.IOcrEngine.

◆ DoImageOcr() [3/4]

String iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoImageOcr ( FileInfo  input,
OutputFormat  outputFormat 
)
inline

Reads data from the provided input image file and returns the retrieved data as a string.

Parameters
input input image System.IO.FileInfo
outputFormat OutputFormat of the returned result
Returns
OCR result as a System.String that is returned after processing the given image

◆ DoImageOcr() [4/4]

String iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoImageOcr ( FileInfo  input,
OutputFormat  outputFormat,
OcrProcessContext  ocrProcessContext 
)
inline

Reads data from the provided input image file and returns the retrieved data as a string.

Parameters
input input image System.IO.FileInfo
outputFormat OutputFormat of the returned result
ocrProcessContext OCR process context
Returns
OCR result as a System.String that is returned after processing the given image

◆ DoTesseractOcr() [1/2]

virtual void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoTesseractOcr ( FileInfo  inputImage,
FileInfo  outputFile,
OutputFormat  outputFormat 
)
inline virtual

Performs Tesseract OCR for the first (or only) image page.

Parameters
inputImage input image System.IO.FileInfo
outputFile output file for the result for the first page
outputFormat selected OutputFormat for Tesseract

◆ DoTesseractOcr() [2/2]

virtual void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.DoTesseractOcr ( FileInfo  inputImage,
FileInfo  outputFile,
OutputFormat  outputFormat,
OcrProcessContext  ocrProcessContext 
)
inline virtual

Performs Tesseract OCR for the first (or only) image page.

Parameters
inputImage input image System.IO.FileInfo
outputFile output file for the result for the first page
outputFormat selected OutputFormat for Tesseract
ocrProcessContext OCR process context

◆ GetLanguagesAsString()

String iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.GetLanguagesAsString ( )
inline

Gets the list of languages concatenated with the "+" symbol into a string in the format required by Tesseract.

Returns
System.String of concatenated languages
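
The "+"-separated format is the one Tesseract accepts for combining several languages, for example eng+deu. A minimal sketch of the concatenation follows. JoinLanguages is a hypothetical helper written for illustration, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageStringSketch
{
    // Hypothetical helper (not part of pdfOCR): joins language codes with "+",
    // producing the form Tesseract expects for multiple languages.
    public static string JoinLanguages(IList<string> languages)
    {
        return string.Join("+", languages);
    }
}
```

For a list containing "eng" and "deu", JoinLanguages returns "eng+deu". A single-element list is returned unchanged.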

◆ GetMetaInfoContainer()

virtual PdfOcrMetaInfoContainer iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.GetMetaInfoContainer ( )
inline virtual

Gets the container with meta info.

Implements iText.Pdfocr.IProductAware.

◆ GetProductData()

virtual ProductData iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.GetProductData ( )
inline virtual

Gets object containing information about the product.

Returns
product data

Implements iText.Pdfocr.IProductAware.

◆ GetTesseract4OcrEngineProperties()

Tesseract4OcrEngineProperties iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.GetTesseract4OcrEngineProperties ( )
inline

Gets properties for AbstractTesseract4OcrEngine.

Returns
the Tesseract4OcrEngineProperties currently set

◆ IdentifyOsType()

virtual String iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.IdentifyOsType ( )
inline virtual

Identifies the type of the current OS and returns it (win, linux).

Returns
type of the current OS as System.String

◆ IsWindows()

virtual bool iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.IsWindows ( )
inline virtual

Checks whether the current OS is Windows.

Returns
true if the current OS is Windows, otherwise false
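
One way such OS checks can be written in .NET is sketched below, using System.Runtime.InteropServices.RuntimeInformation. This is an illustration only and is not taken from the pdfOCR source. The OsTypeSketch class and the "unknown" fallback are assumptions. Only the "win" and "linux" values come from the description of IdentifyOsType.

```csharp
using System;
using System.Runtime.InteropServices;

public static class OsTypeSketch
{
    // True when the process runs on Windows.
    public static bool IsWindows()
    {
        return RuntimeInformation.IsOSPlatform(OSPlatform.Windows);
    }

    // Returns "win" or "linux"; the "unknown" fallback is an assumption made
    // for this sketch and is not documented behavior of the library.
    public static string IdentifyOsType()
    {
        if (IsWindows())
        {
            return "win";
        }
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
        {
            return "linux";
        }
        return "unknown";
    }
}
```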

◆ SetTesseract4OcrEngineProperties()

void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.SetTesseract4OcrEngineProperties ( Tesseract4OcrEngineProperties  tesseract4OcrEngineProperties )
inline

Sets properties for AbstractTesseract4OcrEngine.

Parameters
tesseract4OcrEngineProperties set of properties Tesseract4OcrEngineProperties for AbstractTesseract4OcrEngine

◆ ValidateLanguages()

virtual void iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.ValidateLanguages ( IList< String >  languagesList )
inline virtual

Validates the list of provided languages and checks that they all exist in the given tessdata directory.

Parameters
languagesList System.Collections.IList of provided languages
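
Tesseract stores each language model as a file named after the language code with the .traineddata extension, so an existence check of this kind can be sketched as follows. FindMissingLanguages is a hypothetical helper and not the library's implementation. The real method returns void, and how it reports a missing language is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LanguageValidationSketch
{
    // Hypothetical helper (not part of pdfOCR): returns the languages that
    // have no "<language>.traineddata" file in the given tessdata directory.
    public static IList<string> FindMissingLanguages(
        IList<string> languages, DirectoryInfo tessDataDir)
    {
        return languages
            .Where(lang => !File.Exists(
                Path.Combine(tessDataDir.FullName, lang + ".traineddata")))
            .ToList();
    }
}
```

An empty result means every requested language is available. A caller could otherwise fail fast with a message that names the missing files.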