public abstract class AbstractTesseract4OcrEngine extends Object implements IOcrEngine, IThreadLocalMetaInfoAware
An abstract implementation of IOcrEngine. This class performs OCR on input image files and returns the recognized text in the required format. It also gives access to features of Tesseract, an optical character recognition engine available for various operating systems.
| Constructor and Description |
|---|
| AbstractTesseract4OcrEngine(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) |
| Modifier and Type | Method | Description |
|---|---|---|
| void | createTxtFile(List<File> inputImages, File txtFile) | Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path. |
| Map<Integer,List<TextInfo>> | doImageOcr(File input) | Reads data from the provided input image file and returns the retrieved data in the format described below. |
| String | doImageOcr(File input, OutputFormat outputFormat) | Reads data from the provided input image file and returns the retrieved data as a string. |
| void | doTesseractOcr(File inputImage, File outputFile, OutputFormat outputFormat) | Performs tesseract OCR for the first (or only) image page. |
| String | getLanguagesAsString() | Gets the list of languages concatenated with the "+" symbol into a string in the format required by tesseract. |
| Tesseract4OcrEngineProperties | getTesseract4OcrEngineProperties() | Gets properties for AbstractTesseract4OcrEngine. |
| com.itextpdf.kernel.counter.event.IMetaInfo | getThreadLocalMetaInfo() | Gets the meta info which is held by the interface. |
| String | identifyOsType() | Identifies the type of the current OS and returns it (win, linux). |
| boolean | isWindows() | Checks whether the current OS is Windows. |
| void | setTesseract4OcrEngineProperties(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) | Sets properties for AbstractTesseract4OcrEngine. |
| IThreadLocalMetaInfoAware | setThreadLocalMetaInfo(com.itextpdf.kernel.counter.event.IMetaInfo metaInfo) | Sets a thread local meta info. |
| void | validateLanguages(List<String> languagesList) | Validates the list of provided languages and checks that they all exist in the given tess data directory. |
public AbstractTesseract4OcrEngine(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
public void doTesseractOcr(File inputImage, File outputFile, OutputFormat outputFormat)
Performs tesseract OCR for the first (or only) image page.
Parameters:
inputImage - input image File
outputFile - output file for the result for the first page
outputFormat - selected OutputFormat for tesseract
public void createTxtFile(List<File> inputImages, File txtFile)
Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path.
Specified by:
createTxtFile in interface IOcrEngine
Parameters:
inputImages - List of images to be OCRed
txtFile - file to be created
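The flow described for createTxtFile (recognize each image, then write everything to one text file) can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: TxtFileSketch, writeTxt and the recognizer function are hypothetical names standing in for the engine's per-image OCR call, and separating the per-image results with a line break is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only; none of these names belong to the pdfOCR API.
final class TxtFileSketch {

    // recognizer is a hypothetical stand-in for the engine's per-image OCR call.
    static void writeTxt(List<File> inputImages, File txtFile,
                         Function<File, String> recognizer) throws IOException {
        StringBuilder all = new StringBuilder();
        for (File image : inputImages) {
            // Assumption: one line break after the text of each image.
            all.append(recognizer.apply(image)).append(System.lineSeparator());
        }
        // Write the collected text as UTF-8, replacing any existing file content.
        Files.write(txtFile.toPath(), all.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

Passing the recognizer as a function keeps the sketch independent of any concrete engine, so it can be exercised without tesseract installed.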
public final Tesseract4OcrEngineProperties getTesseract4OcrEngineProperties()
Gets properties for AbstractTesseract4OcrEngine.
Returns:
Tesseract4OcrEngineProperties
public final void setTesseract4OcrEngineProperties(Tesseract4OcrEngineProperties tesseract4OcrEngineProperties)
Sets properties for AbstractTesseract4OcrEngine.
Parameters:
tesseract4OcrEngineProperties - set of properties Tesseract4OcrEngineProperties for AbstractTesseract4OcrEngine
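public final String getLanguagesAsString()
Gets the list of languages concatenated with the "+" symbol into a string in the format required by tesseract.
Returns:
String of concatenated languages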
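The "+"-joined form matches what the tesseract command line accepts for its -l option (for example eng+deu). A minimal sketch of that concatenation, assuming the languages are held as a List of String; LanguageJoinSketch and joinLanguages are illustrative names, not part of the iText API:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper only; not part of pdfOCR.
final class LanguageJoinSketch {

    // Joins language codes with "+", e.g. [eng, deu] becomes "eng+deu".
    static String joinLanguages(List<String> languages) {
        return String.join("+", languages);
    }

    public static void main(String[] args) {
        // prints "eng+deu+fra"
        System.out.println(joinLanguages(Arrays.asList("eng", "deu", "fra")));
    }
}
```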
public final Map<Integer,List<TextInfo>> doImageOcr(File input)
Reads data from the provided input image file and returns the retrieved data in the format described below.
Specified by:
doImageOcr in interface IOcrEngine
Parameters:
input - input image File
Returns:
Map where the key is an Integer representing the page number and the value is a List of TextInfo elements, each of which contains a word or a line and its 4 coordinates (bbox)
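To show how a caller might walk a result of this shape, here is a small sketch. WordBox is a hypothetical stand-in for TextInfo (its fields and the order of the four coordinates are assumptions), and textPerPage is an illustrative helper, not a pdfOCR method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not part of pdfOCR.
final class OcrResultSketch {

    // Hypothetical stand-in for TextInfo: recognized text plus a bounding box.
    static final class WordBox {
        final String text;
        final float[] bbox; // four coordinates; their order is an assumption

        WordBox(String text, float... bbox) {
            this.text = text;
            this.bbox = bbox;
        }
    }

    // Builds one summary line per page, visiting pages in ascending order.
    static List<String> textPerPage(Map<Integer, List<WordBox>> result) {
        List<String> pages = new ArrayList<>();
        // TreeMap sorts by page number regardless of the input map's ordering.
        for (Map.Entry<Integer, List<WordBox>> entry : new TreeMap<>(result).entrySet()) {
            StringBuilder line = new StringBuilder();
            for (WordBox word : entry.getValue()) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word.text);
            }
            pages.add("page " + entry.getKey() + ": " + line);
        }
        return pages;
    }
}
```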
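public final String doImageOcr(File input, OutputFormat outputFormat)
Reads data from the provided input image file and returns the retrieved data as a string.
Parameters:
input - input image File
outputFormat - OutputFormat of the returned result
Returns:
String that is returned after processing the given image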
public boolean isWindows()
Checks whether the current OS is Windows.
public String identifyOsType()
Identifies the type of the current OS and returns it (win, linux).
Returns:
String
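OS detection of this kind is commonly based on the os.name system property. The sketch below shows one way to do it under that assumption; it is not the library's implementation, and OsTypeSketch, classify and the "unknown" fallback value are illustrative.

```java
import java.util.Locale;

// Illustrative sketch only; not part of pdfOCR.
final class OsTypeSketch {

    // Maps an os.name value to "win", "linux" or "unknown" (assumed labels).
    static String classify(String osName) {
        String name = osName.toLowerCase(Locale.ROOT);
        // Use startsWith rather than contains("win"), which would also match "darwin".
        if (name.startsWith("windows")) {
            return "win";
        }
        if (name.contains("linux")) {
            return "linux";
        }
        return "unknown";
    }

    static boolean isWindows() {
        return "win".equals(classify(System.getProperty("os.name")));
    }

    public static void main(String[] args) {
        System.out.println(classify(System.getProperty("os.name")));
    }
}
```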
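public void validateLanguages(List<String> languagesList) throws Tesseract4OcrException
Validates the list of provided languages and checks that they all exist in the given tess data directory.
Parameters:
languagesList - List of provided languages
Throws:
Tesseract4OcrException - if tess data wasn't found for one of the languages from the provided list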
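Tesseract language data is conventionally stored as one <lang>.traineddata file per language, so a check of this kind can be sketched as a file lookup. The code below is a minimal illustration under that assumption, not the library's implementation: LanguageValidationSketch and its methods are hypothetical names, and IllegalArgumentException stands in for Tesseract4OcrException.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not part of pdfOCR.
final class LanguageValidationSketch {

    // Returns the languages that have no "<lang>.traineddata" file in tessDataDir.
    static List<String> missingLanguages(File tessDataDir, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            if (!new File(tessDataDir, lang + ".traineddata").isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }

    // Throws if any language is missing; the exception type is a stand-in.
    static void validate(File tessDataDir, List<String> languages) {
        List<String> missing = missingLanguages(tessDataDir, languages);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                    "No tess data found in " + tessDataDir + " for: " + missing);
        }
    }
}
```

Collecting every missing language before throwing lets the error message report all problems at once rather than only the first.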
public com.itextpdf.kernel.counter.event.IMetaInfo getThreadLocalMetaInfo()
Gets the meta info which is held by the interface.
Specified by:
getThreadLocalMetaInfo in interface IThreadLocalMetaInfoAware
public IThreadLocalMetaInfoAware setThreadLocalMetaInfo(com.itextpdf.kernel.counter.event.IMetaInfo metaInfo)
Sets a thread local meta info.
Specified by:
setThreadLocalMetaInfo in interface IThreadLocalMetaInfoAware
Parameters:
metaInfo - a thread local meta info to be held
Returns:
IThreadLocalMetaInfoAware
Copyright © 1998–2020 iText Group NV. All rights reserved.