Package com.itextpdf.pdf2data
Class OcrWithPostProcessingEngine
java.lang.Object
com.itextpdf.pdf2data.OcrWithPostProcessingEngine
- All Implemented Interfaces:
  com.itextpdf.pdfocr.IOcrEngine
- Direct Known Subclasses:
  Tesseract4BasedEngine
An engine that applies post-processors (if any are provided) to the results of a base OCR engine.
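The wrapping described above follows the decorator pattern: the base engine produces page-indexed results, and each post-processor then transforms them. The sketch below models that idea with hypothetical stand-in types (SimpleOcrEngine, SimplePostProcessor, PostProcessingEngine, and plain String words instead of TextInfo). It is not the pdf2data implementation or its API, and the assumption that post-processors run in list order, each seeing the previous one's output, belongs to this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a base OCR engine (NOT com.itextpdf.pdfocr.IOcrEngine).
// Returns page number -> recognized words on that page.
interface SimpleOcrEngine {
    Map<Integer, List<String>> doImageOcr(String input);
}

// Hypothetical stand-in for a post-processor: transforms the whole page map.
interface SimplePostProcessor extends UnaryOperator<Map<Integer, List<String>>> {
}

// Decorator: delegates OCR to the base engine, then runs each post-processor in order.
class PostProcessingEngine implements SimpleOcrEngine {
    private final SimpleOcrEngine baseOcrEngine;
    private final List<SimplePostProcessor> postProcessors = new ArrayList<>();

    PostProcessingEngine(SimpleOcrEngine baseOcrEngine, List<SimplePostProcessor> postProcessors) {
        this.baseOcrEngine = baseOcrEngine;
        // A null list is treated as "no post-processors" (assumption of this sketch).
        if (postProcessors != null) {
            this.postProcessors.addAll(postProcessors);
        }
    }

    @Override
    public Map<Integer, List<String>> doImageOcr(String input) {
        Map<Integer, List<String>> result = baseOcrEngine.doImageOcr(input);
        // Each stage receives the previous stage's output; with no stages,
        // the base engine's result is returned unchanged.
        for (SimplePostProcessor postProcessor : postProcessors) {
            result = postProcessor.apply(result);
        }
        return result;
    }

    // Helper: builds a post-processor that rewrites every word on every page.
    static SimplePostProcessor perWord(UnaryOperator<String> fix) {
        return pages -> {
            Map<Integer, List<String>> out = new TreeMap<>();
            for (Map.Entry<Integer, List<String>> page : pages.entrySet()) {
                List<String> fixed = new ArrayList<>();
                for (String word : page.getValue()) {
                    fixed.add(fix.apply(word));
                }
                out.put(page.getKey(), fixed);
            }
            return out;
        };
    }
}
```

For example, chaining a digit-to-letter fix (replace '1' with 'l') with upper-casing turns a base result of "he11o" into "HELLO"; reversing the two stages yields "HEllO", which is why the order of the post-processor list matters in this model.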
Constructor Summary
Constructors
- OcrWithPostProcessingEngine(com.itextpdf.pdfocr.IOcrEngine baseOcrEngine, List<IOcrEnginePostProcessor> postProcessors, boolean isTaggingSupported)
  Creates a new OcrWithPostProcessingEngine instance.
Method Summary
- void createTxtFile(List<File> inputImages, File txtFile)
  Performs OCR using the provided IOcrEngine on the given list of input images and saves the output to a text file at the provided path.
- void createTxtFile(List<File> inputImages, File txtFile, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
  Performs OCR using the provided IOcrEngine on the given list of input images and saves the output to a text file at the provided path.
- Map<Integer,List<TextInfo>> doImageOcr(File input)
  Performs OCR with post-processing on the given input file.
- Map<Integer,List<TextInfo>> doImageOcr(File input, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
  Performs OCR with post-processing on the given input file.
- boolean isTaggingSupported()
  Returns whether results will be tagged.
Constructor Details
OcrWithPostProcessingEngine
public OcrWithPostProcessingEngine(com.itextpdf.pdfocr.IOcrEngine baseOcrEngine, List<IOcrEnginePostProcessor> postProcessors, boolean isTaggingSupported)
Creates a new OcrWithPostProcessingEngine instance.
- Parameters:
  baseOcrEngine - base OCR engine which implements IOcrEngine
  postProcessors - List of IOcrEnginePostProcessor
  isTaggingSupported - if true, results will be tagged; otherwise the tag structure will be missing
Method Details
isTaggingSupported
public boolean isTaggingSupported()
Returns whether results will be tagged.
- Returns:
  true if results will be tagged, false otherwise
doImageOcr
public Map<Integer,List<TextInfo>> doImageOcr(File input)
Performs OCR with post-processing on the given input file.
doImageOcr
public Map<Integer,List<TextInfo>> doImageOcr(File input, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR with post-processing on the given input file.
- Specified by:
  doImageOcr in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
  input - input image File
  ocrProcessContext - OCR processing context
- Returns:
  Map where the key is an Integer representing the page number and the value is a List of TextInfo elements, each containing a word or a line and its 4 coordinates (bbox)
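Because the result is keyed by page number, consumers typically walk it page by page. The sketch below shows one way to do that; RecognizedText is a hypothetical stand-in for TextInfo (whose accessors are not listed on this page), and the left/bottom/right/top order of the four bbox coordinates is an assumption made only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for pdfocr's TextInfo: one recognized word or line plus its bbox.
// The left/bottom/right/top order and the coordinate units are assumptions.
final class RecognizedText {
    final String text;
    final float left;
    final float bottom;
    final float right;
    final float top;

    RecognizedText(String text, float left, float bottom, float right, float top) {
        this.text = text;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }
}

final class OcrResultDump {
    // Renders "page N: text [left, bottom, right, top]" lines, pages in ascending order.
    static String dump(Map<Integer, List<RecognizedText>> result) {
        StringBuilder sb = new StringBuilder();
        // Copy into a TreeMap so page order does not depend on the input map's iteration order.
        for (Map.Entry<Integer, List<RecognizedText>> page : new TreeMap<>(result).entrySet()) {
            for (RecognizedText t : page.getValue()) {
                sb.append("page ").append(page.getKey()).append(": ").append(t.text)
                  .append(" [").append(t.left).append(", ").append(t.bottom)
                  .append(", ").append(t.right).append(", ").append(t.top).append("]\n");
            }
        }
        return sb.toString();
    }
}
```

Sorting the page keys explicitly is a defensive choice: the documented return type is only a Map, so callers should not assume any particular iteration order.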
createTxtFile
public void createTxtFile(List<File> inputImages, File txtFile)
Performs OCR using the provided IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed, because of possible specifics of the input images (multi-column layouts, tables, etc.).
- Specified by:
  createTxtFile in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
  inputImages - List of images to be OCRed
  txtFile - file to be created
createTxtFile
public void createTxtFile(List<File> inputImages, File txtFile, com.itextpdf.pdfocr.OcrProcessContext ocrProcessContext)
Performs OCR using the provided IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed, because of possible specifics of the input images (multi-column layouts, tables, etc.).
- Specified by:
  createTxtFile in interface com.itextpdf.pdfocr.IOcrEngine
- Parameters:
  inputImages - List of images to be OCRed
  txtFile - file to be created
  ocrProcessContext - OCR processing context
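A text export of this kind can be sketched as flattening the page-indexed OCR results into lines and writing them to the target file. The TxtExport helper and its layout (pages in ascending order, one recognized element per line, an empty line between pages) are hypothetical and chosen only for illustration; they do not describe the exact format createTxtFile produces. As the note above says, elements within a page keep the order the OCR step produced, which need not be human reading order.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the text layout it produces is an assumption for illustration.
final class TxtExport {
    // Pages in ascending order, one recognized element per line, an empty line between pages.
    static List<String> toLines(Map<Integer, List<String>> pages) {
        List<String> lines = new ArrayList<>();
        boolean firstPage = true;
        for (List<String> elements : new TreeMap<>(pages).values()) {
            if (!firstPage) {
                lines.add("");
            }
            // Elements keep the order the OCR step produced, which may not be reading order.
            lines.addAll(elements);
            firstPage = false;
        }
        return lines;
    }

    // Writes the lines as UTF-8, creating txtFile or overwriting it if it exists.
    static void writeTxt(Map<Integer, List<String>> pages, File txtFile) throws IOException {
        Files.write(txtFile.toPath(), toLines(pages), StandardCharsets.UTF_8);
    }
}
```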