Package com.itextpdf.pdf2data.ocr.engine
Class Tesseract4BasedEngine
java.lang.Object
    com.itextpdf.pdf2data.OcrWithPostProcessingEngine
        com.itextpdf.pdf2data.ocr.engine.Tesseract4BasedEngine
All Implemented Interfaces:
com.itextpdf.pdfocr.IOcrEngine
Engine that uses Tesseract4LibOcrEngine as the base OCR engine and, if needed, applies Pdf2DataTATRPostProcessor to its output afterward.
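The two-stage design described above (always run a base OCR engine, then optionally post-process its result) can be illustrated with a minimal, self-contained sketch. All names here (BaseOcrEngine, PostProcessingOcrEngineSketch) are hypothetical stand-ins, not pdf2data or pdfOCR types; images are modeled as plain String ids and the post-processor as a UnaryOperator<String>, which is a simplifying assumption.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a base OCR engine such as Tesseract4LibOcrEngine:
// it turns an image (modeled here as a String id) into recognized text.
interface BaseOcrEngine {
    String recognize(String imageId);
}

// Hypothetical sketch of an engine that wraps a base engine and optionally
// post-processes its output, in the spirit of OcrWithPostProcessingEngine.
final class PostProcessingOcrEngineSketch {
    private final BaseOcrEngine base;
    private final UnaryOperator<String> postProcessor; // null means "no post-processing"

    PostProcessingOcrEngineSketch(BaseOcrEngine base, UnaryOperator<String> postProcessor) {
        this.base = base;
        this.postProcessor = postProcessor;
    }

    String recognize(String imageId) {
        // Step 1: always run the base OCR engine.
        String raw = base.recognize(imageId);
        // Step 2: apply the post-processor only if one was configured.
        return postProcessor == null ? raw : postProcessor.apply(raw);
    }
}
```

Keeping the post-processor optional mirrors the "if needed" in the class description: the base engine always runs, and the extra step is skipped when no post-processor is configured.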
Nested Class Summary
Nested Classes
Tesseract4BasedEngine.Builder
Method Summary
Modifier and Type                      Method                                                     Description
static Tesseract4BasedEngine.Builder   createBuilder(List<String> languages, File tessDataPath)   Creates a new Tesseract4BasedEngine.Builder.

Methods inherited from class com.itextpdf.pdf2data.OcrWithPostProcessingEngine
createTxtFile, createTxtFile, doImageOcr, doImageOcr, isTaggingSupported
Method Details
createBuilder
public static Tesseract4BasedEngine.Builder createBuilder(List<String> languages, File tessDataPath)
Creates a new Tesseract4BasedEngine.Builder. Note that you must provide the path to your trained Tesseract data directory.
Parameters:
languages - list of languages of the text to recognize in the image. If the list is missing or empty, English is used as the default language.
tessDataPath - path to your trained Tesseract data directory, as a File
Returns:
a new instance of Tesseract4BasedEngine.Builder
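The argument rules stated for createBuilder (a missing or empty language list falls back to English; the trained-data directory is required) can be sketched as follows. This is a hypothetical illustration, not the real Tesseract4BasedEngine.Builder: the class OcrEngineConfigSketch is invented, and using "eng" (Tesseract's code for its English traineddata file) as the default value and rejecting a null tessDataPath with NullPointerException are assumptions of this sketch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the createBuilder argument handling; not pdf2data's API.
final class OcrEngineConfigSketch {
    // Assumption: English is identified by Tesseract's "eng" language code.
    static final String DEFAULT_LANGUAGE = "eng";

    final List<String> languages;
    final File tessDataPath;

    private OcrEngineConfigSketch(List<String> languages, File tessDataPath) {
        this.languages = languages;
        this.tessDataPath = tessDataPath;
    }

    static OcrEngineConfigSketch create(List<String> languages, File tessDataPath) {
        // The trained-data directory is required, so this sketch rejects null up front.
        Objects.requireNonNull(tessDataPath, "tessDataPath is required");
        // Fall back to English when the language list is missing (null) or empty;
        // otherwise keep a defensive, read-only copy of the caller's list.
        List<String> effective = (languages == null || languages.isEmpty())
                ? Collections.singletonList(DEFAULT_LANGUAGE)
                : Collections.unmodifiableList(new ArrayList<>(languages));
        return new OcrEngineConfigSketch(effective, tessDataPath);
    }
}
```

Resolving the default once, at creation time, means every later step sees a non-empty language list and never has to repeat the "missing or empty" check.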