pdfOCR 1.0.1 API
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties Class Reference

Properties that will be used by the iText.Pdfocr.IOcrEngine.

Inheritance diagram for iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties:
iText.Pdfocr.OcrEngineProperties

Public Member Functions

 Tesseract4OcrEngineProperties ()
 Creates a new Tesseract4OcrEngineProperties instance.

 Tesseract4OcrEngineProperties (iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties other)
 Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).

String GetDefaultLanguage ()
 Gets the default language for OCR.

String GetDefaultUserWordsSuffix ()
 Gets the default user words suffix.

FileInfo GetPathToTessData ()
 Gets the path to the directory with tess data.

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPathToTessData (FileInfo tessData)
 Sets the path to the directory with tess data.

int? GetPageSegMode ()
 Gets the Page Segmentation Mode.

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPageSegMode (int? mode)
 Sets the Page Segmentation Mode.

bool IsPreprocessingImages ()
 Checks whether image preprocessing is needed.

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPreprocessingImages (bool preprocess)
 Sets whether image preprocessing is needed.

TextPositioning GetTextPositioning ()
 Gets the way text is retrieved from tesseract output, as a TextPositioning value.

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetTextPositioning (TextPositioning positioning)
 Sets the way text is retrieved from tesseract output, using TextPositioning.

Public Member Functions inherited from iText.Pdfocr.OcrEngineProperties

 OcrEngineProperties ()
 Creates a new OcrEngineProperties instance.

 OcrEngineProperties (iText.Pdfocr.OcrEngineProperties other)
 Creates a new OcrEngineProperties instance based on another OcrEngineProperties instance (copy constructor).

IList< String > GetLanguages ()
 Gets list of languages required for provided images.

iText.Pdfocr.OcrEngineProperties SetLanguages (IList< String > requiredLanguages)
 Sets list of languages to be recognized in provided images.
 

Detailed Description

Properties that will be used by the iText.Pdfocr.IOcrEngine.
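
As a rough illustration of how these properties might be assembled before being handed to an OCR engine, here is a minimal sketch. It only uses members listed on this page. The class name OcrPropertiesSketch, the method name Build and the tessDataDir parameter are hypothetical, and the setter may reject a path that is not an existing directory (an assumption, not stated on this page).

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Pdfocr.Tesseract4;

// Hypothetical helper, not part of pdfOCR.
public static class OcrPropertiesSketch
{
    // tessDataDir is a hypothetical path; it is assumed to point at an
    // existing directory that holds the Tesseract trained data files.
    public static Tesseract4OcrEngineProperties Build(string tessDataDir)
    {
        Tesseract4OcrEngineProperties properties = new Tesseract4OcrEngineProperties();

        // Setters declared on Tesseract4OcrEngineProperties return the
        // Tesseract4OcrEngineProperties instance, so they can be chained.
        properties
            .SetPathToTessData(new FileInfo(tessDataDir))
            .SetPreprocessingImages(true);

        // SetLanguages is inherited and returns OcrEngineProperties,
        // so it is called as a separate statement here.
        properties.SetLanguages(new List<string> { "eng" });

        return properties;
    }
}
```

Calling SetLanguages on its own line avoids losing the derived type in the middle of a chain, since the inherited setter is declared to return the base class.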

Constructor & Destructor Documentation

◆ Tesseract4OcrEngineProperties() [1/2]

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.Tesseract4OcrEngineProperties ( )
inline

Creates a new Tesseract4OcrEngineProperties instance.

◆ Tesseract4OcrEngineProperties() [2/2]

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.Tesseract4OcrEngineProperties ( iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties  other)
inline

Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).

Parameters
other: the other Tesseract4OcrEngineProperties instance
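
The copy constructor can be useful for deriving a variant of an existing configuration without touching the original. The sketch below shows one way this might look; CopySketch and CopyWithoutPreprocessing are hypothetical names introduced only for illustration.

```csharp
using iText.Pdfocr.Tesseract4;

// Hypothetical helper, not part of pdfOCR.
public static class CopySketch
{
    public static Tesseract4OcrEngineProperties CopyWithoutPreprocessing(
        Tesseract4OcrEngineProperties source)
    {
        // The copy constructor creates a separate instance based on source.
        Tesseract4OcrEngineProperties copy = new Tesseract4OcrEngineProperties(source);

        // Only the copy is changed; source keeps its own setting.
        copy.SetPreprocessingImages(false);
        return copy;
    }
}
```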

Member Function Documentation

◆ GetDefaultLanguage()

String iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.GetDefaultLanguage ( )
inline

Gets the default language for OCR.

Returns
the default language, "eng"

◆ GetDefaultUserWordsSuffix()

String iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.GetDefaultUserWordsSuffix ( )
inline

Gets default user words suffix.

Returns
default suffix for user words files

◆ GetPageSegMode()

int? iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.GetPageSegMode ( )
inline

Gets Page Segmentation Mode.

Returns
the page segmentation mode (PSM) as int?

◆ GetPathToTessData()

FileInfo iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.GetPathToTessData ( )
inline

Gets path to directory with tess data.

Returns
path to directory with tess data

◆ GetTextPositioning()

TextPositioning iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.GetTextPositioning ( )
inline

Gets the way text is retrieved from tesseract output, as a TextPositioning value.

Returns
the way text is retrieved

◆ IsPreprocessingImages()

bool iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.IsPreprocessingImages ( )
inline

Checks whether image preprocessing is needed.

Returns
true if images need to be preprocessed, false otherwise

◆ SetPageSegMode()

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.SetPageSegMode ( int?  mode)
inline

Sets Page Segmentation Mode.

A more detailed explanation of the PSM values is available here:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#options

Note that the Tesseract documentation states that the default PSM value is 3. This is true for the tesseract executable, but for the tesseract library the default is -1, which has a negative impact on some documents. For that reason, the code sets it explicitly to 3.

Parameters
mode: the page segmentation mode (PSM) as int?
Returns
the Tesseract4OcrEngineProperties instance
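
For inputs that do not suit the default mode of 3, a different PSM value can be passed explicitly. The following sketch assumes the value 6, which the Tesseract option list describes as a single uniform block of text. PageSegModeSketch, SingleUniformBlock and ForSingleBlock are hypothetical names.

```csharp
using iText.Pdfocr.Tesseract4;

// Hypothetical helper, not part of pdfOCR.
public static class PageSegModeSketch
{
    // Value taken from the Tesseract --psm option list:
    // 6 = assume a single uniform block of text.
    public const int SingleUniformBlock = 6;

    public static Tesseract4OcrEngineProperties ForSingleBlock()
    {
        // The int constant converts implicitly to the int? parameter.
        return new Tesseract4OcrEngineProperties()
            .SetPageSegMode(SingleUniformBlock);
    }
}
```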

◆ SetPathToTessData()

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.SetPathToTessData ( FileInfo  tessData)
inline

Sets path to directory with tess data.

Parameters
tessData: path to the directory with tess data, as System.IO.FileInfo
Returns
the Tesseract4OcrEngineProperties instance

◆ SetPreprocessingImages()

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.SetPreprocessingImages ( bool  preprocess)
inline

Sets whether image preprocessing is needed.

Parameters
preprocess: true if images need to be preprocessed, false otherwise
Returns
the Tesseract4OcrEngineProperties instance

◆ SetTextPositioning()

iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties.SetTextPositioning ( TextPositioning  positioning)
inline

Sets the way text is retrieved from tesseract output, using TextPositioning.

Parameters
positioning: the way text is retrieved
Returns
the Tesseract4OcrEngineProperties instance
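
A short sketch of selecting a positioning mode follows. The enum member TextPositioning.BY_WORDS is an assumption, since this page does not list the members of TextPositioning; check the TextPositioning reference for the values actually available. TextPositioningSketch and WordLevel are hypothetical names.

```csharp
using iText.Pdfocr.Tesseract4;

// Hypothetical helper, not part of pdfOCR.
public static class TextPositioningSketch
{
    public static Tesseract4OcrEngineProperties WordLevel()
    {
        // TextPositioning.BY_WORDS is an assumed enum member name;
        // it is not confirmed by this page.
        return new Tesseract4OcrEngineProperties()
            .SetTextPositioning(TextPositioning.BY_WORDS);
    }
}
```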