Package com.itextpdf.pdf2data
Class OcrWithPostProcessingEngine
java.lang.Object
com.itextpdf.pdf2data.OcrWithPostProcessingEngine
- All Implemented Interfaces: com.itextpdf.pdfocr.IOcrEngine
- Direct Known Subclasses: Tesseract4BasedEngine

Engine that applies post-processors (if present) to the results of a base OCR engine.

Constructor Summary

OcrWithPostProcessingEngine(com.itextpdf.pdfocr.IOcrEngine baseOcrEngine, List<IOcrEnginePostProcessor> postProcessors, boolean isTaggingSupported)
Creates a new OcrWithPostProcessingEngine instance.

Method Summary

void createTxtFile(List<File> inputImages, File txtFile)
Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path.

void createTxtFile(List<File> inputImages, File txtFile, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path.

Map<Integer, List<TextInfo>> doImageOcr(File input)
Performs OCR with post-processing on the input file.

Map<Integer, List<TextInfo>> doImageOcr(File input, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR with post-processing on the input file.

boolean isTaggingSupported()
Gets whether results will be tagged or not.

Constructor Details

OcrWithPostProcessingEngine
public OcrWithPostProcessingEngine(com.itextpdf.pdfocr.IOcrEngine baseOcrEngine, List<IOcrEnginePostProcessor> postProcessors, boolean isTaggingSupported)
Creates a new OcrWithPostProcessingEngine instance.
- Parameters:
baseOcrEngine - base OCR engine which implements IOcrEngine
postProcessors - List of IOcrEnginePostProcessor
isTaggingSupported - if true, results will be tagged; otherwise the tag structure will be missing

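The wrapping behaviour described above (a base engine whose results pass through each post-processor, if any are present) can be illustrated with a minimal sketch. It does not use the iText classes: `BaseEngine`, `PostProcessor`, and `PostProcessingEngine` are hypothetical stand-ins, results are simplified to lists of strings, and the real `IOcrEnginePostProcessor` contract may look different. Treating a missing list as "no post-processors" is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PostProcessingSketch {

    /** Hypothetical stand-in for a base OCR engine; not the real IOcrEngine. */
    interface BaseEngine {
        List<String> recognize(String imageName);
    }

    /** Hypothetical stand-in for a post-processor; the real interface may differ. */
    interface PostProcessor {
        List<String> process(List<String> words);
    }

    /** Wraps a base engine and applies each post-processor, in order, to its result. */
    static final class PostProcessingEngine implements BaseEngine {
        private final BaseEngine base;
        private final List<PostProcessor> postProcessors;
        private final boolean taggingSupported;

        PostProcessingEngine(BaseEngine base, List<PostProcessor> postProcessors,
                             boolean taggingSupported) {
            this.base = base;
            // Assumption of this sketch: a null list means "no post-processors".
            if (postProcessors == null) {
                this.postProcessors = new ArrayList<PostProcessor>();
            } else {
                this.postProcessors = postProcessors;
            }
            this.taggingSupported = taggingSupported;
        }

        boolean isTaggingSupported() {
            return taggingSupported;
        }

        @Override
        public List<String> recognize(String imageName) {
            List<String> result = base.recognize(imageName);
            for (PostProcessor postProcessor : postProcessors) {
                result = postProcessor.process(result);
            }
            return result;
        }
    }

    /** Builds a fake base engine plus two post-processors and runs them once. */
    static List<String> demo() {
        BaseEngine fakeBase = imageName -> Arrays.asList(" Hello ", "", "w0rld");
        // Trims every word and drops the ones that end up empty.
        PostProcessor dropBlank = words -> words.stream()
                .map(String::trim)
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
        // Upper-cases every remaining word.
        PostProcessor upperCase = words -> words.stream()
                .map(w -> w.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
        PostProcessingEngine engine = new PostProcessingEngine(
                fakeBase, Arrays.asList(dropBlank, upperCase), true);
        return engine.recognize("scan.png");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the wrapper exposes the same interface as the engine it wraps, callers can swap it in wherever the base engine was used, which matches the class implementing `IOcrEngine` itself.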
Method Details

isTaggingSupported
public boolean isTaggingSupported()
Gets whether results will be tagged or not.
- Returns: true if results will be tagged, false otherwise

doImageOcr
public Map<Integer, List<TextInfo>> doImageOcr(File input)
Performs OCR with post-processing on the input file.

doImageOcr
public Map<Integer, List<TextInfo>> doImageOcr(File input, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR with post-processing on the input file.
- Specified by: doImageOcr in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
input - input image File
ocrProcessContext - OCR processing context
- Returns: Map where the key is an Integer representing the page number and the value is a List of TextInfo elements, where each TextInfo element contains a word or a line and its 4 coordinates (bbox)

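To make the shape of the returned map concrete, here is a small sketch of walking a page-number-to-items map and joining the text of each page. `Item` is a hypothetical stand-in for `TextInfo` (the real class and its accessors are not reproduced here), and the bounding-box order of left, bottom, right, top is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OcrResultSketch {

    /** Hypothetical stand-in for TextInfo: a word or line plus its 4 coordinates. */
    static final class Item {
        final String text;
        final float[] bbox; // assumed order: left, bottom, right, top

        Item(String text, float left, float bottom, float right, float top) {
            this.text = text;
            this.bbox = new float[] {left, bottom, right, top};
        }
    }

    /** Joins the text of each page, visiting pages in ascending page-number order. */
    static List<String> textPerPage(Map<Integer, List<Item>> result) {
        // Copying into a TreeMap sorts by page number whatever the input map's order is.
        Map<Integer, List<Item>> sorted = new TreeMap<Integer, List<Item>>(result);
        List<String> pages = new ArrayList<String>();
        for (Map.Entry<Integer, List<Item>> entry : sorted.entrySet()) {
            StringBuilder line = new StringBuilder();
            for (Item item : entry.getValue()) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(item.text);
            }
            pages.add("page " + entry.getKey() + ": " + line);
        }
        return pages;
    }

    /** Builds a made-up two-page result, inserted out of order on purpose. */
    static Map<Integer, List<Item>> sample() {
        Map<Integer, List<Item>> result = new HashMap<Integer, List<Item>>();
        result.put(2, Arrays.asList(new Item("Total", 10f, 700f, 60f, 715f)));
        result.put(1, Arrays.asList(
                new Item("Invoice", 10f, 700f, 80f, 715f),
                new Item("42", 85f, 700f, 105f, 715f)));
        return result;
    }

    public static void main(String[] args) {
        for (String page : textPerPage(sample())) {
            System.out.println(page);
        }
    }
}
```

Items within a page are joined in list order here; as with createTxtFile, that order is not guaranteed to match human reading order for multicolumn layouts or tables.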
createTxtFile
public void createTxtFile(List<File> inputImages, File txtFile)
Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multicolumn layout, tables, etc.).
- Specified by: createTxtFile in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
inputImages - List of images to be OCRed
txtFile - file to be created

createTxtFile
public void createTxtFile(List<File> inputImages, File txtFile, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR using the provided IOcrEngine for the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multicolumn layout, tables, etc.).
- Specified by: createTxtFile in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
inputImages - List of images to be OCRed
txtFile - file to be created
ocrProcessContext - OCR processing context