public class Tesseract4OcrEngineProperties extends OcrEngineProperties
Properties used by the IOcrEngine.
Constructor and Description |
---|
Tesseract4OcrEngineProperties() Creates a new Tesseract4OcrEngineProperties instance. |
Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other) Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor). |
Modifier and Type | Method and Description |
---|---|
String | getDefaultLanguage() Gets the default language for OCR. |
String | getDefaultUserWordsSuffix() Gets the default user words suffix. |
ImagePreprocessingOptions | getImagePreprocessingOptions() Gets the ImagePreprocessingOptions. |
int | getMinimalConfidenceLevel() Gets the minimal confidence level for an HOCR line to be considered properly recognized. |
Integer | getPageSegMode() Gets the Page Segmentation Mode. |
File | getPathToTessData() Gets the path to the directory with tess data. |
TextPositioning | getTextPositioning() Defines the way text is retrieved from tesseract output using TextPositioning. |
boolean | isPreprocessingImages() Checks whether image preprocessing is needed. |
boolean | isUseTxtToImproveHocrParsing() Gets useTxtToImproveHocrParsing. |
Tesseract4OcrEngineProperties | setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions) Sets the ImagePreprocessingOptions. |
Tesseract4OcrEngineProperties | setMinimalConfidenceLevel(int minimalConfidenceLevel) Sets the minimal confidence level for an HOCR line to be considered properly recognized. |
Tesseract4OcrEngineProperties | setPageSegMode(Integer mode) Sets the Page Segmentation Mode. |
Tesseract4OcrEngineProperties | setPathToTessData(File tessData) Sets the path to the directory with tess data. |
Tesseract4OcrEngineProperties | setPreprocessingImages(boolean preprocess) Sets whether image preprocessing is needed. |
Tesseract4OcrEngineProperties | setTextPositioning(TextPositioning positioning) Defines the way text is retrieved from tesseract output using TextPositioning. |
Tesseract4OcrEngineProperties | setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing) Sets useTxtToImproveHocrParsing. |
Methods inherited from class OcrEngineProperties: getLanguages, setLanguages
public Tesseract4OcrEngineProperties()
Creates a new Tesseract4OcrEngineProperties instance.
public Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other)
Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
Parameters:
other - the other Tesseract4OcrEngineProperties instance
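As a minimal sketch of the copy-constructor pattern described above, the following uses a hypothetical stand-in class, `EnginePropsSketch`, rather than the iText class itself, so it runs without the library on the classpath. The two fields and the field-by-field copy are assumptions for illustration; the source does not say how the real constructor copies its state.

```java
import java.io.File;

class CopyConstructorDemo {
    // Hypothetical stand-in for illustration only; this is NOT the iText class.
    static final class EnginePropsSketch {
        private File pathToTessData;
        private int minimalConfidenceLevel;

        EnginePropsSketch() {
        }

        // Copy constructor: starts from the settings of another instance.
        EnginePropsSketch(EnginePropsSketch other) {
            this.pathToTessData = other.pathToTessData;
            this.minimalConfidenceLevel = other.minimalConfidenceLevel;
        }

        // Setters return the instance so that calls can be chained.
        EnginePropsSketch setPathToTessData(File tessData) {
            this.pathToTessData = tessData;
            return this;
        }

        EnginePropsSketch setMinimalConfidenceLevel(int level) {
            this.minimalConfidenceLevel = level;
            return this;
        }

        File getPathToTessData() {
            return pathToTessData;
        }

        int getMinimalConfidenceLevel() {
            return minimalConfidenceLevel;
        }
    }

    public static void main(String[] args) {
        EnginePropsSketch base = new EnginePropsSketch()
                .setPathToTessData(new File("tessdata"))
                .setMinimalConfidenceLevel(40);

        // Derive a stricter variant without touching the original settings.
        EnginePropsSketch strict = new EnginePropsSketch(base)
                .setMinimalConfidenceLevel(75);

        System.out.println(base.getMinimalConfidenceLevel());   // prints 40
        System.out.println(strict.getMinimalConfidenceLevel()); // prints 75
        System.out.println(strict.getPathToTessData());         // prints tessdata
    }
}
```

Copying first and then overriding a single setting is a convenient way to keep one shared baseline configuration and several per-document variants.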
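public final String getDefaultLanguage()
Gets the default language for OCR.
public final String getDefaultUserWordsSuffix()
Gets the default user words suffix.
public final File getPathToTessData()
Gets the path to the directory with tess data.
public final Tesseract4OcrEngineProperties setPathToTessData(File tessData)
Sets the path to the directory with tess data.
Parameters:
tessData - path to the train directory as a File
Returns:
the Tesseract4OcrEngineProperties instance
Throws:
PdfOcrTesseract4Exception - if the path to the tess data directory is null or empty, or if the provided directory does not exist or is not a directory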
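The failure conditions listed for the tess data path can be checked up front before handing a directory to the setter. The sketch below only mirrors the documented conditions with `java.io.File`; the helper name `requireTessDataDirectory` is hypothetical, and `IllegalArgumentException` stands in for PdfOcrTesseract4Exception so the example does not depend on the library.

```java
import java.io.File;

class TessDataPathCheck {
    // Hypothetical helper: rejects the same inputs the documentation lists.
    // IllegalArgumentException is a stand-in for PdfOcrTesseract4Exception.
    static File requireTessDataDirectory(File tessData) {
        if (tessData == null || tessData.getPath().isEmpty()) {
            throw new IllegalArgumentException("Path to tess data directory is null or empty");
        }
        if (!tessData.exists()) {
            throw new IllegalArgumentException("Tess data directory does not exist: " + tessData);
        }
        if (!tessData.isDirectory()) {
            throw new IllegalArgumentException("Tess data path is not a directory: " + tessData);
        }
        return tessData;
    }

    public static void main(String[] args) {
        // The system temp directory is used only because it is known to exist.
        File existing = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(requireTessDataDirectory(existing).isDirectory());

        try {
            requireTessDataDirectory(new File(""));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```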
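public final Integer getPageSegMode()
Gets the Page Segmentation Mode.
Returns:
psm mode as Integer
public final Tesseract4OcrEngineProperties setPageSegMode(Integer mode)
Sets the Page Segmentation Mode.
Parameters:
mode - psm mode as Integer
Returns:
the Tesseract4OcrEngineProperties instance
public final boolean isPreprocessingImages()
Checks whether image preprocessing is needed.
public final Tesseract4OcrEngineProperties setPreprocessingImages(boolean preprocess)
Sets whether image preprocessing is needed.
Parameters:
preprocess - true if images need to be preprocessed, otherwise false
Returns:
the Tesseract4OcrEngineProperties instance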
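public final TextPositioning getTextPositioning()
Defines the way text is retrieved from tesseract output using TextPositioning.
public final Tesseract4OcrEngineProperties setTextPositioning(TextPositioning positioning)
Defines the way text is retrieved from tesseract output using TextPositioning.
Parameters:
positioning - the way text is retrieved
Returns:
the Tesseract4OcrEngineProperties instance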
public final boolean isUseTxtToImproveHocrParsing()
Gets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for languages such as Thai or some Chinese dialects, where every character is interpreted as a single word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Returns:
useTxtToImproveHocrParsing
public final Tesseract4OcrEngineProperties setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing)
Sets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for languages such as Thai or some Chinese dialects, where every character is interpreted as a single word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Parameters:
useTxtToImproveHocrParsing - the useTxtToImproveHocrParsing value to set
Returns:
the Tesseract4OcrEngineProperties instance
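To illustrate why plain-text output can help when HOCR reports every character as its own word, here is a rough sketch that regroups per-character tokens using the word boundaries found in a txt line. This is an assumption about the general idea, not a description of how iText implements it; the class and method names are hypothetical, and Latin letters stand in for Thai or Chinese characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TxtGuidedRegrouping {
    // hocrTokens: tokens from HOCR, where each character may be a separate "word".
    // txtLine: the same line from the plain-text output, words separated by whitespace.
    static List<String> regroup(List<String> hocrTokens, String txtLine) {
        List<String> words = new ArrayList<>();
        int next = 0;
        for (String expected : txtLine.trim().split("\\s+")) {
            StringBuilder current = new StringBuilder();
            // Consume tokens until the rebuilt word is as long as the txt word.
            while (next < hocrTokens.size() && current.length() < expected.length()) {
                current.append(hocrTokens.get(next++));
            }
            if (current.length() > 0) {
                words.add(current.toString());
            }
        }
        // Tokens not covered by the txt line are kept as they were.
        while (next < hocrTokens.size()) {
            words.add(hocrTokens.get(next++));
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(regroup(tokens, "abc de")); // prints [abc, de]
    }
}
```

A real implementation would also have to merge the bounding boxes of the grouped tokens, which this sketch leaves out.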
public final ImagePreprocessingOptions getImagePreprocessingOptions()
Gets the ImagePreprocessingOptions.
Returns:
ImagePreprocessingOptions
public final Tesseract4OcrEngineProperties setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions)
Sets the ImagePreprocessingOptions.
Parameters:
imagePreprocessingOptions - ImagePreprocessingOptions
Returns:
the Tesseract4OcrEngineProperties instance
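public final int getMinimalConfidenceLevel()
Gets the minimal confidence level for an HOCR line to be considered properly recognized.
public final Tesseract4OcrEngineProperties setMinimalConfidenceLevel(int minimalConfidenceLevel)
Sets the minimal confidence level for an HOCR line to be considered properly recognized.
Parameters:
minimalConfidenceLevel - minimal confidence level value
Returns:
the Tesseract4OcrEngineProperties instance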
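The effect of a minimal confidence level can be pictured as a threshold filter over recognized lines. The sketch below is a hypothetical illustration: the class name, the map of line text to confidence, the 0 to 100 scale, and the inclusive `>=` comparison are all assumptions, since the source does not state how the engine applies the value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConfidenceFilterSketch {
    // Keeps only the lines whose confidence reaches the threshold.
    // Assumption: confidence is an integer and the comparison is inclusive.
    static List<String> keepConfidentLines(Map<String, Integer> lineConfidences,
                                           int minimalConfidenceLevel) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : lineConfidences.entrySet()) {
            if (entry.getValue() >= minimalConfidenceLevel) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves the order in which lines were recognized.
        Map<String, Integer> lines = new LinkedHashMap<>();
        lines.put("Invoice No. 1042", 92);
        lines.put("smudged footer", 31);
        lines.put("Total: 18.50", 60);

        System.out.println(keepConfidentLines(lines, 60)); // prints [Invoice No. 1042, Total: 18.50]
    }
}
```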
Copyright © 1998–2022 iText Group NV. All rights reserved.