Package com.itextpdf.pdf2data.ocr.engine
Class Tesseract4BasedEngine
java.lang.Object
com.itextpdf.pdf2data.OcrWithPostProcessingEngine
com.itextpdf.pdf2data.ocr.engine.Tesseract4BasedEngine
All Implemented Interfaces:
com.itextpdf.pdfocr.IOcrEngine
Engine that uses Tesseract4LibOcrEngine as its base OCR engine and, if needed, applies Pdf2DataTATRPostProcessor afterward.
-
Nested Class Summary
-
Method Summary
Modifier and Type: static Tesseract4BasedEngine.Builder
Method: createBuilder(List<String> languages, File tessDataPath)
Description: Creates a new Tesseract4BasedEngine.Builder.

Methods inherited from class com.itextpdf.pdf2data.OcrWithPostProcessingEngine
createTxtFile, createTxtFile, doImageOcr, doImageOcr, isTaggingSupported
-
Method Details
-
createBuilder
public static Tesseract4BasedEngine.Builder createBuilder(List<String> languages, File tessDataPath)
Creates a new Tesseract4BasedEngine.Builder. Note that you must provide the path to your trained Tesseract data directory.
Parameters:
languages - List of languages to recognize in the image. If missing or empty, English is used as the default language.
tessDataPath - path to your trained Tesseract data directory, as a File
Returns:
a new instance of Tesseract4BasedEngine.Builder
-
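Example (illustrative sketch, not part of the library): the code below only prepares the two arguments that createBuilder expects. The class CreateBuilderArgs and its helpers languagesOrDefault and requireTessData are hypothetical names introduced here, and the "eng" and "deu" language codes and the "tessdata" directory name are assumptions based on common Tesseract conventions. The actual createBuilder call is left in a comment because it needs pdf2Data on the classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; it does not exist in pdf2Data.
class CreateBuilderArgs {

    // Mirrors the documented default: a missing or empty list means English.
    // Assumption: "eng" is the Tesseract language code for English.
    static List<String> languagesOrDefault(List<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return List.of("eng");
        }
        return new ArrayList<>(requested);
    }

    // Fails early if the trained-data directory is missing, since the
    // documentation states that this path is required.
    static File requireTessData(File dir) {
        if (dir == null || !dir.isDirectory()) {
            throw new IllegalArgumentException("Not a tessdata directory: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) {
        List<String> languages = languagesOrDefault(List.of("eng", "deu"));
        // Assumption: the trained data lives in ./tessdata unless a path is given.
        File tessDataPath = new File(args.length > 0 ? args[0] : "tessdata");

        System.out.println("languages = " + languages);
        System.out.println("tessdata directory present = " + tessDataPath.isDirectory());

        // With pdf2Data on the classpath, the documented call would be:
        // Tesseract4BasedEngine.Builder builder =
        //         Tesseract4BasedEngine.createBuilder(languages, requireTessData(tessDataPath));
    }
}
```

Checking the directory before calling createBuilder is a design choice of this sketch; the documentation above does not say how the library itself reacts to a missing directory.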