public class Tesseract4OcrEngineProperties extends OcrEngineProperties
Properties that will be used by the IOcrEngine.
| Constructor | Description |
|---|---|
| Tesseract4OcrEngineProperties() | Creates a new Tesseract4OcrEngineProperties instance. |
| Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other) | Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor). |
| Modifier and Type | Method | Description |
|---|---|---|
| String | getDefaultLanguage() | Gets the default language for OCR. |
| String | getDefaultUserWordsSuffix() | Gets the default user words suffix. |
| ImagePreprocessingOptions | getImagePreprocessingOptions() | Gets the image preprocessing options. |
| int | getMinimalConfidenceLevel() | Gets the minimal confidence level for an HOCR line to be considered properly recognized. |
| Integer | getPageSegMode() | Gets the page segmentation mode. |
| File | getPathToTessData() | Gets the path to the directory with tess data. |
| TextPositioning | getTextPositioning() | Gets the way text is retrieved from Tesseract output, expressed as a TextPositioning value. |
| boolean | isPreprocessingImages() | Checks whether image preprocessing is needed. |
| boolean | isUseTxtToImproveHocrParsing() | Checks whether useTxtToImproveHocrParsing is enabled. |
| Tesseract4OcrEngineProperties | setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions) | Sets the image preprocessing options. |
| Tesseract4OcrEngineProperties | setMinimalConfidenceLevel(int minimalConfidenceLevel) | Sets the minimal confidence level for an HOCR line to be considered properly recognized. |
| Tesseract4OcrEngineProperties | setPageSegMode(Integer mode) | Sets the page segmentation mode. |
| Tesseract4OcrEngineProperties | setPathToTessData(File tessData) | Sets the path to the directory with tess data. |
| Tesseract4OcrEngineProperties | setPreprocessingImages(boolean preprocess) | Sets whether image preprocessing is needed. |
| Tesseract4OcrEngineProperties | setTextPositioning(TextPositioning positioning) | Defines the way text is retrieved from Tesseract output using TextPositioning. |
| Tesseract4OcrEngineProperties | setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing) | Sets whether useTxtToImproveHocrParsing is enabled. |
Methods inherited from class OcrEngineProperties: getLanguages, setLanguages

public Tesseract4OcrEngineProperties()
Creates a new Tesseract4OcrEngineProperties instance.

public Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other)
Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
Parameters: other - the other Tesseract4OcrEngineProperties instance
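The copy constructor's contract can be illustrated with a small stand-in class. OcrPropsSketch below is hypothetical and models only three settings; whether iText copies nested objects such as the language list deeply is not stated here, so the defensive list copy is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a properties class with a copy constructor.
// It is NOT the iText implementation and models only three settings.
class OcrPropsSketch {
    private List<String> languages = new ArrayList<>();
    private Integer pageSegMode;
    private boolean preprocessingImages;

    OcrPropsSketch() {
    }

    // Copy constructor: copies every field. The language list is copied
    // (an assumption of this sketch) so the two instances do not share it.
    OcrPropsSketch(OcrPropsSketch other) {
        this.languages = new ArrayList<>(other.languages);
        this.pageSegMode = other.pageSegMode;
        this.preprocessingImages = other.preprocessingImages;
    }

    List<String> getLanguages() {
        return languages;
    }

    // Fluent setters return this, mirroring the documented return type.
    OcrPropsSketch setLanguages(List<String> languages) {
        this.languages = new ArrayList<>(languages);
        return this;
    }

    Integer getPageSegMode() {
        return pageSegMode;
    }

    OcrPropsSketch setPageSegMode(Integer mode) {
        this.pageSegMode = mode;
        return this;
    }

    boolean isPreprocessingImages() {
        return preprocessingImages;
    }

    OcrPropsSketch setPreprocessingImages(boolean preprocess) {
        this.preprocessingImages = preprocess;
        return this;
    }
}
```

Changing the copy afterwards leaves the original untouched, which is what makes a copy constructor useful when one base configuration is reused with small variations.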
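
public final String getDefaultLanguage()
Gets the default language for OCR.

public final String getDefaultUserWordsSuffix()
Gets the default user words suffix.

public final File getPathToTessData()
Gets the path to the directory with tess data.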

public final Tesseract4OcrEngineProperties setPathToTessData(File tessData)
Sets the path to the directory with tess data.
Parameters: tessData - path to train directory as File
Returns: the Tesseract4OcrEngineProperties instance
Throws: PdfOcrTesseract4Exception - if the path to the tess data directory is null or empty, if the provided directory does not exist, or if it is not a directory
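The documented failure conditions of setPathToTessData map onto plain java.io.File checks. The sketch below is a hypothetical validator, not iText's code; InvalidTessDataException stands in for PdfOcrTesseract4Exception so the example compiles without iText on the classpath, and the order of the checks is assumed.

```java
import java.io.File;

// Hypothetical validator mirroring the documented contract of setPathToTessData.
class TessDataPathCheck {
    // Stand-in for PdfOcrTesseract4Exception so the sketch compiles without iText.
    static class InvalidTessDataException extends RuntimeException {
        InvalidTessDataException(String message) {
            super(message);
        }
    }

    // Returns the directory unchanged if it is usable, otherwise throws.
    static File validate(File tessData) {
        if (tessData == null || tessData.getPath().isEmpty()) {
            throw new InvalidTessDataException("path to tess data directory is null or empty");
        }
        if (!tessData.exists()) {
            throw new InvalidTessDataException("tess data directory does not exist: " + tessData);
        }
        if (!tessData.isDirectory()) {
            throw new InvalidTessDataException("tess data path is not a directory: " + tessData);
        }
        return tessData;
    }
}
```

Failing fast at configuration time keeps a bad path from surfacing later as a less specific error during recognition.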
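
public final Integer getPageSegMode()
Gets the page segmentation mode.
Returns: the page segmentation mode as Integer

public final Tesseract4OcrEngineProperties setPageSegMode(Integer mode)
Sets the page segmentation mode.
Parameters: mode - psm mode as Integer
Returns: the Tesseract4OcrEngineProperties instance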
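The page segmentation mode corresponds to Tesseract's psm setting, which Tesseract 4's command-line interface accepts as --psm N with N from 0 to 13. The hypothetical helper below shows one way an Integer mode could become command-line arguments. Treating null as "keep Tesseract's default" is an assumption, since the documentation above does not say how a null mode is handled, and the range check is not claimed to be iText behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText.
class PsmArgs {
    // Tesseract 4 documents page segmentation modes 0 through 13.
    static final int MIN_PSM = 0;
    static final int MAX_PSM = 13;

    // Turns an optional psm value into Tesseract command-line arguments.
    static List<String> psmArguments(Integer mode) {
        List<String> args = new ArrayList<>();
        if (mode == null) {
            return args; // no flag: Tesseract's own default applies
        }
        if (mode < MIN_PSM || mode > MAX_PSM) {
            throw new IllegalArgumentException("unsupported page segmentation mode: " + mode);
        }
        args.add("--psm");
        args.add(String.valueOf(mode));
        return args;
    }
}
```

Using Integer rather than int lets "not set" be represented as null, which is presumably why the property is boxed.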

public final boolean isPreprocessingImages()
Checks whether image preprocessing is needed.

public final Tesseract4OcrEngineProperties setPreprocessingImages(boolean preprocess)
Sets whether image preprocessing is needed.
Parameters: preprocess - true if images need to be preprocessed, otherwise false
Returns: the Tesseract4OcrEngineProperties instance
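
public final TextPositioning getTextPositioning()
Gets the way text is retrieved from Tesseract output, expressed as a TextPositioning value.

public final Tesseract4OcrEngineProperties setTextPositioning(TextPositioning positioning)
Defines the way text is retrieved from Tesseract output using TextPositioning.
Parameters: positioning - the way text is retrieved
Returns: the Tesseract4OcrEngineProperties instance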
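TextPositioning decides at what granularity recognized text is reported. The sketch uses a hypothetical two-constant enum (BY_LINES, BY_WORDS) and ignores bounding boxes, which a real HOCR-based pipeline would carry alongside each chunk; the actual constants are defined by the TextPositioning documentation, not by this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of line-level versus word-level text retrieval.
class PositioningSketch {
    enum Positioning { BY_LINES, BY_WORDS }

    // Each inner list is one recognized line, already split into words.
    static List<String> chunks(List<List<String>> lines, Positioning positioning) {
        List<String> out = new ArrayList<>();
        for (List<String> line : lines) {
            if (positioning == Positioning.BY_LINES) {
                out.add(String.join(" ", line)); // one chunk per line
            } else {
                out.addAll(line); // one chunk per word
            }
        }
        return out;
    }
}
```

Word-level chunks allow finer placement of invisible text over an image, at the cost of more text objects.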

public final boolean isUseTxtToImproveHocrParsing()
Checks whether useTxtToImproveHocrParsing is enabled. This flag is used to make the HOCR recognition result more precise. It is needed for Thai and some Chinese dialects, where every character is interpreted as a separate word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Returns: the value of useTxtToImproveHocrParsing

public final Tesseract4OcrEngineProperties setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing)
Sets useTxtToImproveHocrParsing. This flag is used to make the HOCR recognition result more precise. It is needed for Thai and some Chinese dialects, where every character is interpreted as a separate word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Parameters: useTxtToImproveHocrParsing - true to enable the flag, false to disable it
Returns: the Tesseract4OcrEngineProperties instance
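The Thai and Chinese case can be pictured as HOCR reporting every character as its own word. Assuming the plain-text output for the same line keeps the intended word grouping, the hypothetical mergeTokens method below regroups the HOCR tokens so each group spells one whitespace-separated word of the text line, and falls back to the original tokens when they cannot be aligned. This illustrates the idea behind the flag; it is not iText's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of using a TXT line to regroup HOCR tokens.
class HocrTxtMerge {
    // Groups HOCR tokens so that each group spells one whitespace-separated
    // word of txtLine. Returns hocrTokens unchanged if they cannot be aligned.
    static List<String> mergeTokens(List<String> hocrTokens, String txtLine) {
        String[] words = txtLine.trim().split("\\s+");
        List<String> merged = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int wordIndex = 0;
        for (String token : hocrTokens) {
            if (wordIndex >= words.length) {
                return hocrTokens; // more HOCR text than TXT text: give up
            }
            current.append(token);
            String target = words[wordIndex];
            if (current.toString().equals(target)) {
                merged.add(target); // a full TXT word has been spelled out
                current.setLength(0);
                wordIndex++;
            } else if (!target.startsWith(current.toString())) {
                return hocrTokens; // tokens do not line up with the TXT word
            }
        }
        // Succeed only if every TXT word was consumed with nothing left over.
        return (current.length() == 0 && wordIndex == words.length) ? merged : hocrTokens;
    }
}
```

Falling back to the unmodified tokens keeps the result no worse than plain HOCR when the two outputs disagree.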

public final ImagePreprocessingOptions getImagePreprocessingOptions()
Gets the image preprocessing options.
Returns: the ImagePreprocessingOptions

public final Tesseract4OcrEngineProperties setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions)
Sets the image preprocessing options.
Parameters: imagePreprocessingOptions - the ImagePreprocessingOptions to use
Returns: the Tesseract4OcrEngineProperties instance
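
public final int getMinimalConfidenceLevel()
Gets the minimal confidence level for an HOCR line to be considered properly recognized.

public final Tesseract4OcrEngineProperties setMinimalConfidenceLevel(int minimalConfidenceLevel)
Sets the minimal confidence level for an HOCR line to be considered properly recognized.
Parameters: minimalConfidenceLevel - the minimal confidence level value
Returns: the Tesseract4OcrEngineProperties instance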
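The minimal confidence level acts as a threshold on HOCR lines. In the sketch below each line already carries a confidence score on a 0 to 100 scale, like the x_wconf word confidences in Tesseract's hOCR output. How iText derives a line's confidence, and whether its comparison is inclusive, are assumptions of this hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of filtering lines by a confidence threshold.
class ConfidenceFilter {
    static final class Line {
        final String text;
        final int confidence; // assumed 0..100, like hOCR x_wconf

        Line(String text, int confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Keeps lines whose confidence reaches the threshold (inclusive here);
    // lines below it are treated as not properly recognized.
    static List<String> recognized(List<Line> lines, int minimalConfidenceLevel) {
        List<String> kept = new ArrayList<>();
        for (Line line : lines) {
            if (line.confidence >= minimalConfidenceLevel) {
                kept.add(line.text);
            }
        }
        return kept;
    }
}
```

A higher threshold trades recall for precision: fewer lines survive, but those that do are more likely to be correct.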
Copyright © 1998–2022 iText Group NV. All rights reserved.