pdfOCR 4.0.0 API
|
Properties that will be used by the iText.Pdfocr.IOcrEngine.
Public Member Functions
|
Tesseract4OcrEngineProperties ()
Creates a new Tesseract4OcrEngineProperties instance.
|
Tesseract4OcrEngineProperties (iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties other)
Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
|
String | GetDefaultLanguage ()
Gets the default language for OCR.
|
String | GetDefaultUserWordsSuffix ()
Gets the default user words suffix.
|
FileInfo | GetPathToTessData ()
Gets the path to the directory with tess data.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetPathToTessData (FileInfo tessData)
Sets the path to the directory with tess data.
|
int? | GetPageSegMode ()
Gets the Page Segmentation Mode.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetPageSegMode (int? mode)
Sets the Page Segmentation Mode.
|
bool | IsPreprocessingImages ()
Checks whether image preprocessing is needed.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetPreprocessingImages (bool preprocess)
Sets whether image preprocessing is needed.
|
TextPositioning | GetTextPositioning ()
Gets the way text is retrieved from tesseract output, as a TextPositioning value.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetTextPositioning (TextPositioning positioning)
Sets the way text is retrieved from tesseract output, as a TextPositioning value.
|
bool | IsUseTxtToImproveHocrParsing ()
Gets useTxtToImproveHocrParsing.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetUseTxtToImproveHocrParsing (bool useTxtToImproveHocrParsing)
Sets useTxtToImproveHocrParsing.
|
ImagePreprocessingOptions | GetImagePreprocessingOptions ()
Gets imagePreprocessingOptions.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetImagePreprocessingOptions (ImagePreprocessingOptions imagePreprocessingOptions)
Sets imagePreprocessingOptions.
|
int | GetMinimalConfidenceLevel ()
Gets the minimal confidence level for an HOCR line to be considered properly recognized.
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties | SetMinimalConfidenceLevel (int minimalConfidenceLevel)
Sets the minimal confidence level for an HOCR line to be considered properly recognized.
|
Public Member Functions inherited from iText.Pdfocr.OcrEngineProperties
|
OcrEngineProperties ()
Creates a new OcrEngineProperties instance.
|
OcrEngineProperties (iText.Pdfocr.OcrEngineProperties other)
Creates a new OcrEngineProperties instance based on another OcrEngineProperties instance (copy constructor).
|
IList< String > | GetLanguages ()
Gets the list of languages required for the provided images.
|
iText.Pdfocr.OcrEngineProperties | SetLanguages (IList< String > requiredLanguages)
Sets the list of languages to be recognized in the provided images.
|
Properties that will be used by the iText.Pdfocr.IOcrEngine.
|
Tesseract4OcrEngineProperties () | inline
Creates a new Tesseract4OcrEngineProperties instance.
|
Tesseract4OcrEngineProperties (iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties other) | inline
Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
other | the other Tesseract4OcrEngineProperties instance |
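As an illustration of the copy constructor, the sketch below derives a per-job configuration from a shared one. The class and method names (`CopyPropertiesExample`, `Derive`) and the value 80 are hypothetical; only the constructor and `SetMinimalConfidenceLevel` come from this page. Whether nested objects such as `ImagePreprocessingOptions` are copied deeply is not stated here, so treat that as unknown.

```csharp
using iText.Pdfocr.Tesseract4;

// Hypothetical helper class, used only for illustration.
public static class CopyPropertiesExample
{
    public static Tesseract4OcrEngineProperties Derive(Tesseract4OcrEngineProperties shared)
    {
        // Start from the shared settings via the copy constructor.
        Tesseract4OcrEngineProperties copy = new Tesseract4OcrEngineProperties(shared);

        // Adjust the copy for one particular job (80 is an arbitrary example value).
        copy.SetMinimalConfidenceLevel(80);
        return copy;
    }
}
```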
|
String GetDefaultLanguage () | inline
Gets the default language for OCR.
|
String GetDefaultUserWordsSuffix () | inline
Gets the default user words suffix.
|
ImagePreprocessingOptions GetImagePreprocessingOptions () | inline
Gets imagePreprocessingOptions.
|
int GetMinimalConfidenceLevel () | inline
Gets the minimal confidence level for an HOCR line to be considered properly recognized. If the actual confidence level is lower, the line is ignored. The default value is 0, which means that everything is considered properly recognized. The value may range from 0 to 100.
|
int? GetPageSegMode () | inline
Gets Page Segmentation Mode.
|
FileInfo GetPathToTessData () | inline
Gets the path to the directory with tess data.
|
TextPositioning GetTextPositioning () | inline
Defines the way text is retrieved from tesseract output using TextPositioning.
|
bool IsPreprocessingImages () | inline
Checks whether image preprocessing is needed.
|
bool IsUseTxtToImproveHocrParsing () | inline
Gets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for languages such as Thai or some Chinese dialects, where every character is interpreted as a single word. For more information, see https://github.com/tesseract-ocr/tesseract/issues/2702
useTxtToImproveHocrParsing
|
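iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetImagePreprocessingOptions (ImagePreprocessingOptions imagePreprocessingOptions) | inline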
Sets imagePreprocessingOptions.
imagePreprocessingOptions |
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetMinimalConfidenceLevel (int minimalConfidenceLevel) | inline
Sets the minimal confidence level for an HOCR line to be considered properly recognized. If the actual confidence level is lower, the line is ignored. The default value is 0, which means that everything is considered properly recognized. The value may range from 0 to 100.
minimalConfidenceLevel | minimal confidence level value |
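A minimal sketch of raising the threshold, using only the getter and setter documented on this page. The class name `ConfidenceExample` and the value 60 are hypothetical, chosen for illustration.

```csharp
using iText.Pdfocr.Tesseract4;

// Hypothetical helper class, used only for illustration.
public static class ConfidenceExample
{
    public static Tesseract4OcrEngineProperties WithStricterConfidence()
    {
        Tesseract4OcrEngineProperties properties = new Tesseract4OcrEngineProperties();

        // According to the description above, the default is 0 (nothing is filtered out).
        int initialLevel = properties.GetMinimalConfidenceLevel();

        // Ignore HOCR lines recognized with confidence below 60 (arbitrary example value, valid range 0-100).
        if (initialLevel < 60)
        {
            properties.SetMinimalConfidenceLevel(60);
        }
        return properties;
    }
}
```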
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPageSegMode (int? mode) | inline
Sets the Page Segmentation Mode (PSM). See https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#options for a more detailed explanation of the PSM modes. Note that the Tesseract documentation states that the default PSM value is 3. This is true for the tesseract executable, but for the tesseract library it is -1, which has a negative impact on some documents. For that reason, the code sets it explicitly to 3.
mode | page segmentation mode as int?
|
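iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPathToTessData (FileInfo tessData) | inline
Sets the path to the directory with tess data.
tessData | path to the directory with trained data as System.IO.FileInfo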
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetPreprocessingImages (bool preprocess) | inline
Sets whether image preprocessing is needed.
preprocess | true if images need to be preprocessed, otherwise false
|
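iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetTextPositioning (TextPositioning positioning) | inline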
Defines the way text is retrieved from tesseract output using TextPositioning.
positioning | the way text is retrieved |
|
iText.Pdfocr.Tesseract4.Tesseract4OcrEngineProperties SetUseTxtToImproveHocrParsing (bool useTxtToImproveHocrParsing) | inline
Sets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for languages such as Thai or some Chinese dialects, where every character is interpreted as a single word. For more information, see https://github.com/tesseract-ocr/tesseract/issues/2702
useTxtToImproveHocrParsing |
useTxtToImproveHocrParsing
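The setters documented above can be combined into one configuration. The following is a sketch under stated assumptions, not a definitive setup: the class name `OcrPropertiesExample`, the `tessDataDir` parameter, the language code "tha" (Tesseract's code for Thai), and the chosen values are all illustrative. Only members listed on this page are called.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Pdfocr.Tesseract4;

// Hypothetical helper class, used only for illustration.
public static class OcrPropertiesExample
{
    public static Tesseract4OcrEngineProperties Build(string tessDataDir)
    {
        // tessDataDir is a placeholder; it is assumed to point at a real tess data directory.
        Tesseract4OcrEngineProperties properties = new Tesseract4OcrEngineProperties()
            .SetPathToTessData(new FileInfo(tessDataDir))
            .SetPageSegMode(3)                     // the value the description above says is set explicitly
            .SetPreprocessingImages(true)
            .SetUseTxtToImproveHocrParsing(true)   // described above as useful for Thai text
            .SetMinimalConfidenceLevel(50);        // arbitrary example threshold in the 0-100 range

        // SetLanguages is declared on the base class and returns iText.Pdfocr.OcrEngineProperties,
        // so it is called as a separate statement to keep the derived type in the fluent chain.
        properties.SetLanguages(new List<string> { "tha" });

        return properties;
    }
}
```

Because `SetLanguages` returns the base type, placing it in the middle of the fluent chain would stop the chain from compiling against the Tesseract-specific setters. Calling it separately avoids a cast.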