Class OcrPdfCreator
OcrPdfCreator is the class that creates PDF documents containing the input images and the text recognized in them by the provided IOcrEngine.
OcrPdfCreator lets you set the list of input images to be OCRed, the scaling mode for the images, the color of the text in the output PDF document and a fixed page size for the PDF document; it then performs OCR on the given images and returns a PdfDocument as the result. OCR is performed by the provided IOcrEngine (e.g. a Tesseract reader). The engine is mandatory and must be provided either in the constructor or via the setter.
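The following is a minimal end-to-end sketch, not an official sample. It assumes the pdfOCR Tesseract 4 module (package com.itextpdf.pdfocr.tesseract4) is on the classpath and that a tessdata directory with English ("eng") language data exists; the class name OcrQuickStart and all file paths are hypothetical.

```java
import com.itextpdf.pdfocr.OcrPdfCreator;
import com.itextpdf.pdfocr.tesseract4.Tesseract4LibOcrEngine;
import com.itextpdf.pdfocr.tesseract4.Tesseract4OcrEngineProperties;

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OcrQuickStart {

    // Engine settings: where the Tesseract language data lives and which language to recognize.
    static Tesseract4OcrEngineProperties engineProperties(File tessDataDir) {
        Tesseract4OcrEngineProperties props = new Tesseract4OcrEngineProperties();
        props.setPathToTessData(tessDataDir);
        props.setLanguages(Collections.singletonList("eng"));
        return props;
    }

    // The IOcrEngine is mandatory, so it is passed straight to the constructor.
    static OcrPdfCreator buildCreator(File tessDataDir) {
        return new OcrPdfCreator(new Tesseract4LibOcrEngine(engineProperties(tessDataDir)));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input and output locations; adjust to your environment.
        List<File> images = Arrays.asList(new File("scan-1.png"), new File("scan-2.png"));
        buildCreator(new File("tessdata")).createPdfFile(images, new File("scans-ocr.pdf"));
    }
}
```

Keeping the engine properties in a separate method makes it easy to reuse them when building other engines or creators.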
Constructor Summary
- OcrPdfCreator(IOcrEngine ocrEngine): Creates a new OcrPdfCreator instance.
- OcrPdfCreator(IOcrEngine ocrEngine, OcrPdfCreatorProperties ocrPdfCreatorProperties): Creates a new OcrPdfCreator instance.
Method Summary
- final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
- final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
- final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, IOcrProcessProperties ocrProcessProperties): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
- final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent.
- final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent, IOcrProcessProperties ocrProcessProperties): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent.
- final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter and PdfOutputIntent.
- void createPdfAFile(List<File> inputImages, File outPdfFile, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File and PdfOutputIntent.
- void createPdfFile(List<File> inputImages, File outPdfFile): Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File.
- final IOcrEngine getOcrEngine(): Gets the IOcrEngine reader object used to perform OCR.
- final OcrPdfCreatorProperties getOcrPdfCreatorProperties(): Gets the properties for OcrPdfCreator.
- void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc): Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
- void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc, IOcrProcessProperties ocrProcessProperties): Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
- void makePdfSearchable(File inputPdf, File outputPdf): Performs OCR of all images in an input PDF file and generates a searchable PDF.
- void makePdfSearchable(File inputPdf, File outputPdf, IOcrProcessProperties ocrProcessProperties): Performs OCR of all images in an input PDF file and generates a searchable PDF.
- final void setOcrEngine(IOcrEngine reader): Sets the IOcrEngine reader object used to perform OCR.
- final void setOcrPdfCreatorProperties(OcrPdfCreatorProperties ocrPdfCreatorProperties): Sets the properties for OcrPdfCreator.
- protected void validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument pdfDoc): Validates the input PDF document.
Constructor Details
OcrPdfCreator
OcrPdfCreator(IOcrEngine ocrEngine)
Creates a new OcrPdfCreator instance.
Parameters:
- ocrEngine - the selected OCR reader (IOcrEngine)
OcrPdfCreator
OcrPdfCreator(IOcrEngine ocrEngine, OcrPdfCreatorProperties ocrPdfCreatorProperties)
Creates a new OcrPdfCreator instance.
Parameters:
- ocrEngine - the selected OCR reader (IOcrEngine)
- ocrPdfCreatorProperties - the set of properties for OcrPdfCreator
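A sketch of the two-argument constructor, assuming the pdfOCR and iText kernel modules are on the classpath. The property values are arbitrary illustrations and CreatorWithProperties is a hypothetical name; the same properties object could also be supplied later through setOcrPdfCreatorProperties.

```java
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.pdfocr.IOcrEngine;
import com.itextpdf.pdfocr.OcrPdfCreator;
import com.itextpdf.pdfocr.OcrPdfCreatorProperties;
import com.itextpdf.pdfocr.ScaleMode;

class CreatorWithProperties {

    // Output settings mentioned in the class description: page size, image scaling and text color.
    static OcrPdfCreatorProperties a4Properties() {
        OcrPdfCreatorProperties props = new OcrPdfCreatorProperties();
        props.setPageSize(PageSize.A4);             // fixed size for every output page
        props.setScaleMode(ScaleMode.SCALE_TO_FIT); // scale each image to fit that page
        props.setTextColor(ColorConstants.RED);     // color of the recognized text
        return props;
    }

    // Two-argument constructor: any IOcrEngine implementation plus the properties above.
    static OcrPdfCreator create(IOcrEngine engine) {
        return new OcrPdfCreator(engine, a4Properties());
    }
}
```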
Method Details
getOcrPdfCreatorProperties
Gets the properties for OcrPdfCreator.
Returns:
the OcrPdfCreatorProperties that have been set
setOcrPdfCreatorProperties
Sets the properties for OcrPdfCreator.
Parameters:
- ocrPdfCreatorProperties - the set of OcrPdfCreatorProperties for OcrPdfCreator
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent, IOcrProcessProperties ocrProcessProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent. A PDF/A-3u document will be created if the provided PdfOutputIntent is not null.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
- documentProperties - document properties
- pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
- ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
- PdfOcrException - if it was not possible to read the provided or default font
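A sketch of createPdfA for the case where the caller needs the open PdfDocument, assuming an sRGB ICC profile file is available locally. The output-intent strings follow a commonly used sRGB setup, and PdfAExample and its method names are hypothetical. The sketch assumes the returned document is still open, so try-with-resources closes it, which finishes writing the file.

```java
import com.itextpdf.kernel.pdf.DocumentProperties;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfOutputIntent;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class PdfAExample {

    // Builds an sRGB output intent; a non-null intent makes createPdfA produce PDF/A-3u.
    static PdfOutputIntent srgbIntent(InputStream iccProfile) {
        return new PdfOutputIntent("Custom", "", "http://www.color.org",
                "sRGB IEC61966-2.1", iccProfile);
    }

    // Returns the page count as an example of working with the generated
    // PdfDocument before it is closed (pdf is closed before icc).
    static int writePdfA(OcrPdfCreator creator, List<File> images, String iccPath, String outPath)
            throws IOException {
        try (InputStream icc = Files.newInputStream(Paths.get(iccPath));
             PdfDocument pdf = creator.createPdfA(images, new PdfWriter(outPath),
                     new DocumentProperties(), srgbIntent(icc))) {
            return pdf.getNumberOfPages();
        }
    }
}
```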
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter and PdfOutputIntent. A PDF/A-3u document will be created if the provided PdfOutputIntent is not null.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
- pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
- PdfOcrException - if it was not possible to read the provided or default font
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent. A PDF/A-3u document will be created if the provided PdfOutputIntent is not null.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
- documentProperties - document properties
- pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
- PdfOcrException - if it was not possible to read the provided or default font
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, IOcrProcessProperties ocrProcessProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
- documentProperties - document properties
- ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Returns:
the resulting PdfDocument object
Throws:
- PdfOcrException - if the provided font is incorrect
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
- documentProperties - document properties
Returns:
the resulting PdfDocument object
Throws:
- PdfOcrException - if the provided font is incorrect
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: executing this method produces a product event from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method instead; in that case only the pdfOcr event will be dispatched.
Parameters:
- inputImages - List of images to be OCRed
- pdfWriter - the PdfWriter object to write the final PDF document to
Returns:
the resulting PdfDocument object
Throws:
- PdfOcrException - if the provided font is incorrect
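A sketch of the two-argument createPdf that keeps the result in memory, assuming (as the note above implies) that the returned PdfDocument is still open, so it can be modified and must be closed by the caller. InMemoryOcr, ocrToBytes and the title value are hypothetical.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.List;

class InMemoryOcr {

    // Writes the OCRed PDF into a byte array instead of a file.
    static byte[] ocrToBytes(OcrPdfCreator creator, List<File> images, String title) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PdfDocument pdf = creator.createPdf(images, new PdfWriter(out))) {
            // The document is still open here, so extra changes are possible.
            pdf.getDocumentInfo().setTitle(title);
        } // closing flushes the whole document into 'out'
        return out.toByteArray();
    }
}
```

Writing to a ByteArrayOutputStream is convenient for web services; for large batches, prefer createPdfFile, which also dispatches only the pdfOcr event.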
createPdfFile
public void createPdfFile(List<File> inputImages, File outPdfFile) throws PdfOcrException, IOException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File.
Parameters:
- inputImages - List of images to be OCRed
- outPdfFile - the File object to write the final PDF document to
Throws:
- IOException - signals that an I/O exception of some sort has occurred
- PdfOcrException - if it was not possible to read the provided or default font
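A sketch that turns a folder of scans into one searchable PDF with createPdfFile. The set of accepted extensions is an assumption, not the list of formats the OCR engine supports, and sorting by file name assumes that pages follow the order of the input list. FolderToPdf, imagesIn and convert are hypothetical names.

```java
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class FolderToPdf {

    // Extensions to pick up (assumption; adjust to what your engine accepts).
    private static final List<String> EXTENSIONS =
            Arrays.asList(".png", ".jpg", ".jpeg", ".tif", ".tiff");

    // Image files directly inside dir, sorted by file name.
    static List<File> imagesIn(File dir) {
        List<File> result = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return result; // not a directory, or it cannot be read
        }
        for (File f : files) {
            String name = f.getName().toLowerCase(Locale.ROOT);
            if (f.isFile() && EXTENSIONS.stream().anyMatch(name::endsWith)) {
                result.add(f);
            }
        }
        result.sort(Comparator.comparing(File::getName));
        return result;
    }

    // createPdfFile writes the output file itself; only the pdfOcr event is dispatched.
    static void convert(OcrPdfCreator creator, File imageDir, File outPdf) throws IOException {
        List<File> images = imagesIn(imageDir);
        if (images.isEmpty()) {
            throw new IllegalArgumentException("No images found in " + imageDir);
        }
        creator.createPdfFile(images, outPdf);
    }
}
```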
createPdfAFile
public void createPdfAFile(List<File> inputImages, File outPdfFile, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException, IOException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File and PdfOutputIntent. A PDF/A-3u document will be created if the provided PdfOutputIntent is not null.
Parameters:
- inputImages - List of images to be OCRed
- outPdfFile - the File object to write the final PDF document to
- pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Throws:
- IOException - signals that an I/O exception of some sort has occurred
- PdfOcrException - if it was not possible to read the provided or default font
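A sketch of createPdfAFile that loads the ICC profile from the application's classpath. The resource path stored in ICC_RESOURCE is a hypothetical placeholder, as is the class name PdfAFileExample; the output-intent strings follow a commonly used sRGB setup.

```java
import com.itextpdf.kernel.pdf.PdfOutputIntent;
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

class PdfAFileExample {

    // Hypothetical classpath location of an sRGB ICC profile bundled with the application.
    static final String ICC_RESOURCE = "/color/sRGB_profile.icc";

    static void writePdfAFile(OcrPdfCreator creator, List<File> images, File outPdf) throws IOException {
        try (InputStream icc = PdfAFileExample.class.getResourceAsStream(ICC_RESOURCE)) {
            if (icc == null) {
                throw new IOException("ICC profile not found on classpath: " + ICC_RESOURCE);
            }
            PdfOutputIntent intent = new PdfOutputIntent("Custom", "", "http://www.color.org",
                    "sRGB IEC61966-2.1", icc);
            // A non-null output intent makes the result a PDF/A-3u document.
            creator.createPdfAFile(images, outPdf, intent);
        }
    }
}
```

Failing early with a clear message when the profile is missing avoids producing a file that silently lacks the output intent.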
getOcrEngine
Gets the IOcrEngine reader object used to perform OCR.
Returns:
the selected IOcrEngine instance
setOcrEngine
Sets the IOcrEngine reader object used to perform OCR.
Parameters:
- reader - the selected IOcrEngine instance
makePdfSearchable
public void makePdfSearchable(File inputPdf, File outputPdf) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF file and generates a searchable PDF.
By default, this method does not allow OCR of PDF/A documents or tagged documents, because the resulting document might not comply with the PDF/A specification and, depending on the IOcrEngine implementation, the added content might not be tagged. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
- inputPdf - PDF file to OCR
- outputPdf - searchable PDF with the recognized text on top of the images
Throws:
- com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from the PDF file
- PdfOcrException - in case of any other OCR error
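A sketch of the file-based makePdfSearchable overload with a simple guard against writing over the input being read. SearchablePdfExample and makeSearchable are hypothetical names; the guard compares normalized absolute paths only and does not resolve symbolic links.

```java
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

class SearchablePdfExample {

    // OCRs every image in scannedPdf and writes a searchable copy to searchablePdf.
    // Page size, scale mode and image layer name settings are ignored by this method.
    // IOException is declared defensively in case a pdfOCR version declares a checked one.
    static void makeSearchable(OcrPdfCreator creator, File scannedPdf, File searchablePdf)
            throws IOException {
        Path in = scannedPdf.toPath().toAbsolutePath().normalize();
        Path out = searchablePdf.toPath().toAbsolutePath().normalize();
        // Never overwrite the file that is being read.
        if (in.equals(out)) {
            throw new IllegalArgumentException("Output must differ from input: " + in);
        }
        creator.makePdfSearchable(scannedPdf, searchablePdf);
    }
}
```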
makePdfSearchable
public void makePdfSearchable(File inputPdf, File outputPdf, IOcrProcessProperties ocrProcessProperties) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF file and generates a searchable PDF.
By default, this method does not allow OCR of PDF/A documents or tagged documents, because the resulting document might not comply with the PDF/A specification and, depending on the IOcrEngine implementation, the added content might not be tagged. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
- inputPdf - PDF file to OCR
- outputPdf - searchable PDF with the recognized text on top of the images
- ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Throws:
- com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from the PDF
- PdfOcrException - in case of any other OCR error
makePdfSearchable
public void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
By default, this method does not allow OCR of PDF/A documents or tagged documents, because the resulting document might not comply with the PDF/A specification and, depending on the IOcrEngine implementation, the added content might not be tagged. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
- pdfDoc - PDF document with images to OCR
Throws:
- com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from the PDF
- PdfOcrException - in case of any other OCR error
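A sketch of the PdfDocument overload using iText's stamping mode (a PdfDocument opened with both a PdfReader and a PdfWriter). StampingModeExample and ocrCopy are hypothetical names. The page does not say whether makePdfSearchable closes the document, so the sketch closes it itself via try-with-resources; closing is what writes the result to outPath.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.IOException;

class StampingModeExample {

    // Opens the input in stamping mode, lets OcrPdfCreator add the recognized text
    // on top of the images, then closes the document to write outPath.
    static void ocrCopy(OcrPdfCreator creator, String inPath, String outPath) throws IOException {
        try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(inPath), new PdfWriter(outPath))) {
            creator.makePdfSearchable(pdfDoc);
        }
    }
}
```

Working with a PdfDocument rather than Files lets you make other changes in the same pass, before or after OCR.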
makePdfSearchable
public void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc, IOcrProcessProperties ocrProcessProperties) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
By default, this method does not allow OCR of PDF/A documents or tagged documents, because the resulting document might not comply with the PDF/A specification and, depending on the IOcrEngine implementation, the added content might not be tagged. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
- pdfDoc - PDF document with images to OCR
- ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Throws:
- com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from the PDF
- PdfOcrException - in case of any other OCR error
validateInputPdfDocument
protected void validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument pdfDoc)
Validates the input PDF document.
It checks that the input document is neither tagged nor PDF/A. If you need to OCR tagged and/or PDF/A documents, override this method with an empty implementation. In that case it is best to use the makePdfSearchable(PdfDocument, IOcrProcessProperties) overload, because it lets you pass a PdfADocument or PdfUADocument instance, which will validate the output document.
Parameters:
- pdfDoc - the PDF document to check
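A sketch of the override described above, assuming the iText pdfa module (com.itextpdf.pdfa.PdfADocument) is on the classpath; LenientOcrPdfCreator and ocrPdfA are hypothetical names. The input is opened as a PdfADocument in stamping mode, which takes its conformance level from the input document, so the OCR output is expected to be validated against it. The description above recommends the overload that also takes IOcrProcessProperties; this sketch uses the single-argument PdfDocument overload only because constructing an IOcrProcessProperties instance is not covered on this page.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfa.PdfADocument;
import com.itextpdf.pdfocr.IOcrEngine;
import com.itextpdf.pdfocr.OcrPdfCreator;

import java.io.IOException;

// Accepts PDF/A and tagged input by disabling the default input check.
class LenientOcrPdfCreator extends OcrPdfCreator {

    LenientOcrPdfCreator(IOcrEngine ocrEngine) {
        super(ocrEngine);
    }

    @Override
    protected void validateInputPdfDocument(PdfDocument pdfDoc) {
        // Intentionally empty: tagged and PDF/A input documents are allowed.
    }

    // Opens a PDF/A input as PdfADocument so the output is checked for PDF/A conformance.
    static void ocrPdfA(LenientOcrPdfCreator creator, String inPath, String outPath) throws IOException {
        try (PdfADocument pdfA = new PdfADocument(new PdfReader(inPath), new PdfWriter(outPath))) {
            creator.makePdfSearchable(pdfA);
        }
    }
}
```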