Class OcrPdfCreator
OcrPdfCreator creates PDF documents that contain the input images together with the text recognized by the provided IOcrEngine.
OcrPdfCreator lets you set the list of input images to be used for OCR, the scaling mode for images, the color of the text in the output PDF document, and a fixed size for the PDF document's pages. It then performs OCR on the given images and returns a PdfDocument as the result. OCR is performed by the provided IOcrEngine (e.g. a Tesseract reader). This parameter is required and must be provided either in the constructor or through the setter.
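As a minimal sketch of the workflow described above, the following helper builds an OcrPdfCreator from an engine and writes the OCR result to a file. The class and method names `OcrToFileSketch` and `ocrImagesToPdf` are hypothetical, and the `com.itextpdf.pdfocr` package for OcrPdfCreator and IOcrEngine is an assumption not stated on this page. The engine is taken as a parameter because this page does not describe how to construct one. Compiling the sketch requires the iText pdfOcr library on the classpath.

```java
import com.itextpdf.pdfocr.IOcrEngine;      // assumed package
import com.itextpdf.pdfocr.OcrPdfCreator;   // assumed package

import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical helper class, for illustration only. */
public final class OcrToFileSketch {

    private OcrToFileSketch() {
    }

    /**
     * Runs OCR on the given images and writes the result to outPdfFile.
     * The engine is mandatory, so it is passed to the constructor here;
     * setOcrEngine(IOcrEngine) is the documented alternative.
     */
    public static void ocrImagesToPdf(IOcrEngine ocrEngine,
                                      List<File> inputImages,
                                      File outPdfFile) throws IOException {
        OcrPdfCreator creator = new OcrPdfCreator(ocrEngine);
        // Only the file is produced; no PdfDocument is returned to the caller.
        creator.createPdfFile(inputImages, outPdfFile);
    }
}
```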
-
Constructor Summary
Constructor, Description
OcrPdfCreator(IOcrEngine ocrEngine)
Creates a new OcrPdfCreator instance.
OcrPdfCreator(IOcrEngine ocrEngine, OcrPdfCreatorProperties ocrPdfCreatorProperties)
Creates a new OcrPdfCreator instance.
-
Method Summary
Modifier and Type, Method, Description
final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, IOcrProcessProperties ocrProcessProperties)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent.
final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent, IOcrProcessProperties ocrProcessProperties)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent.
final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter and PdfOutputIntent.
void createPdfAFile(List<File> inputImages, File outPdfFile, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File and PdfOutputIntent.
void createPdfFile(List<File> inputImages, File outPdfFile)
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File.
final IOcrEngine getOcrEngine()
Gets the IOcrEngine reader object used to perform OCR.
final OcrPdfCreatorProperties getOcrPdfCreatorProperties()
Gets the properties for OcrPdfCreator.
void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc)
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc, IOcrProcessProperties ocrProcessProperties)
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
void makePdfSearchable(File inputPdf, File outputPdf)
Performs OCR of all images in an input PDF file and generates a searchable PDF.
void makePdfSearchable(File inputPdf, File outputPdf, IOcrProcessProperties ocrProcessProperties)
Performs OCR of all images in an input PDF file and generates a searchable PDF.
final void setOcrEngine(IOcrEngine reader)
Sets the IOcrEngine reader object used to perform OCR.
final void setOcrPdfCreatorProperties(OcrPdfCreatorProperties ocrPdfCreatorProperties)
Sets the properties for OcrPdfCreator.
protected void validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument pdfDoc)
Validates the input PDF document.
-
Constructor Details
-
OcrPdfCreator
public OcrPdfCreator(IOcrEngine ocrEngine)
Creates a new OcrPdfCreator instance.
Parameters:
ocrEngine - the selected OCR reader (IOcrEngine)
-
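OcrPdfCreator
public OcrPdfCreator(IOcrEngine ocrEngine, OcrPdfCreatorProperties ocrPdfCreatorProperties)
Creates a new OcrPdfCreator instance.
Parameters:
ocrEngine - the selected OCR reader (IOcrEngine)
ocrPdfCreatorProperties - the set of properties for OcrPdfCreator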
-
-
Method Details
-
getOcrPdfCreatorProperties
public final OcrPdfCreatorProperties getOcrPdfCreatorProperties()
Gets the properties for OcrPdfCreator.
Returns:
the OcrPdfCreatorProperties that were set
-
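setOcrPdfCreatorProperties
public final void setOcrPdfCreatorProperties(OcrPdfCreatorProperties ocrPdfCreatorProperties)
Sets the properties for OcrPdfCreator.
Parameters:
ocrPdfCreatorProperties - the set of properties (OcrPdfCreatorProperties) for OcrPdfCreator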
-
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent, IOcrProcessProperties ocrProcessProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent. A PDF/A-3u document is created if the provided PdfOutputIntent is not null.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
documentProperties - document properties
pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
PdfOcrException - if it was not possible to read the provided or default font
-
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter and PdfOutputIntent. A PDF/A-3u document is created if the provided PdfOutputIntent is not null.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
PdfOcrException - if it was not possible to read the provided or default font
-
createPdfA
public final com.itextpdf.kernel.pdf.PdfDocument createPdfA(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter, DocumentProperties and PdfOutputIntent. A PDF/A-3u document is created if the provided PdfOutputIntent is not null.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfAFile(java.util.List, java.io.File, com.itextpdf.kernel.pdf.PdfOutputIntent) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
documentProperties - document properties
pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Returns:
the resulting PDF/A-3u PdfDocument object
Throws:
PdfOcrException - if it was not possible to read the provided or default font
-
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties, IOcrProcessProperties ocrProcessProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
documentProperties - document properties
ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Returns:
the resulting PdfDocument object
Throws:
PdfOcrException - if the provided font is incorrect
-
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter, com.itextpdf.kernel.pdf.DocumentProperties documentProperties) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
documentProperties - document properties
Returns:
the resulting PdfDocument object
Throws:
PdfOcrException - if the provided font is incorrect
-
createPdf
public final com.itextpdf.kernel.pdf.PdfDocument createPdf(List<File> inputImages, com.itextpdf.kernel.pdf.PdfWriter pdfWriter) throws PdfOcrException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided PdfWriter.
NOTE: after this method is executed, a product event is dispatched from both itextcore and pdfOcr. Therefore, use this method only if you need to work with the generated PdfDocument. If you don't, use the createPdfFile(java.util.List, java.io.File) method. In that case, only the pdfOcr event is dispatched.
Parameters:
inputImages - List of images to be OCRed
pdfWriter - the PdfWriter object to write the final PDF document to
Returns:
the resulting PdfDocument object
Throws:
PdfOcrException - if the provided font is incorrect
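The note above recommends createPdf only when the returned PdfDocument is actually needed. A minimal sketch of such a case follows: it reads the page count from the returned document before closing it. The names `OcrWithDocumentSketch` and `ocrAndCountPages` are hypothetical, the `com.itextpdf.pdfocr` package is an assumption, and the sketch needs the iText core and pdfOcr libraries on the classpath.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfocr.IOcrEngine;      // assumed package
import com.itextpdf.pdfocr.OcrPdfCreator;   // assumed package

import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical helper class, for illustration only. */
public final class OcrWithDocumentSketch {

    private OcrWithDocumentSketch() {
    }

    /** Runs OCR, inspects the returned document, then closes it. */
    public static int ocrAndCountPages(IOcrEngine ocrEngine,
                                       List<File> inputImages,
                                       File outPdfFile) throws IOException {
        OcrPdfCreator creator = new OcrPdfCreator(ocrEngine);
        PdfWriter writer = new PdfWriter(outPdfFile);
        PdfDocument pdfDocument = creator.createPdf(inputImages, writer);
        try {
            // Any further work on the generated document would go here.
            return pdfDocument.getNumberOfPages();
        } finally {
            // Closing the document flushes the content to the writer.
            pdfDocument.close();
        }
    }
}
```

If the page count or other post-processing is not needed, createPdfFile is the simpler choice, since the caller then has no document to close.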
-
createPdfFile
public void createPdfFile(List<File> inputImages, File outPdfFile) throws PdfOcrException, IOException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File.
Parameters:
inputImages - List of images to be OCRed
outPdfFile - the File object to write the final PDF document to
Throws:
IOException - signals that an I/O exception of some sort has occurred
PdfOcrException - if it was not possible to read the provided or default font
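The inputImages parameter is a plain List<File>, so it can be assembled with standard Java alone. The sketch below collects image files from a directory and sorts them by name. The class `ImageListing`, the method `listImages`, and the extension list are all hypothetical choices for illustration; which image formats a given IOcrEngine accepts is not stated here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that builds the inputImages list from a directory. */
final class ImageListing {

    // Assumed set of extensions; adjust to what the chosen OCR engine supports.
    private static final List<String> EXTENSIONS =
            Arrays.asList(".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp");

    private ImageListing() {
    }

    /** Returns the image files in the directory, sorted by file name. */
    static List<File> listImages(File directory) {
        List<File> images = new ArrayList<>();
        File[] entries = directory.listFiles();
        if (entries == null) {
            // Not a directory, or it could not be read.
            return images;
        }
        for (File entry : entries) {
            if (entry.isFile() && hasImageExtension(entry.getName())) {
                images.add(entry);
            }
        }
        images.sort(Comparator.comparing(File::getName));
        return images;
    }

    private static boolean hasImageExtension(String fileName) {
        String lowerCaseName = fileName.toLowerCase(Locale.ROOT);
        for (String extension : EXTENSIONS) {
            if (lowerCaseName.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }
}
```

The returned list can then be passed as inputImages, for example to createPdfFile(inputImages, outPdfFile).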
-
createPdfAFile
public void createPdfAFile(List<File> inputImages, File outPdfFile, com.itextpdf.kernel.pdf.PdfOutputIntent pdfOutputIntent) throws PdfOcrException, IOException
Performs OCR with the set parameters using the provided IOcrEngine and creates a PDF using the provided File and PdfOutputIntent. A PDF/A-3u document is created if the provided PdfOutputIntent is not null.
Parameters:
inputImages - List of images to be OCRed
outPdfFile - the File object to write the final PDF document to
pdfOutputIntent - PdfOutputIntent for the PDF/A-3u document
Throws:
IOException - signals that an I/O exception of some sort has occurred
PdfOcrException - if it was not possible to read the provided or default font
-
getOcrEngine
public final IOcrEngine getOcrEngine()
Gets the IOcrEngine reader object used to perform OCR.
Returns:
the selected IOcrEngine instance
-
setOcrEngine
public final void setOcrEngine(IOcrEngine reader)
Sets the IOcrEngine reader object used to perform OCR.
Parameters:
reader - the selected IOcrEngine instance
-
makePdfSearchable
public void makePdfSearchable(File inputPdf, File outputPdf) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF file and generates a searchable PDF.
By default, OCR of PDF/A documents and tagged documents is not allowed. The reason is that the resulting document might not comply with the PDF/A specification, and the added content might not be tagged, depending on the IOcrEngine implementation. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
inputPdf - PDF file to OCR
outputPdf - searchable PDF with the recognized text on top of the images
Throws:
com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from a PDF file
PdfOcrException - in case of any other OCR error
-
makePdfSearchable
public void makePdfSearchable(File inputPdf, File outputPdf, IOcrProcessProperties ocrProcessProperties) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF file and generates a searchable PDF.
By default, OCR of PDF/A documents and tagged documents is not allowed. The reason is that the resulting document might not comply with the PDF/A specification, and the added content might not be tagged, depending on the IOcrEngine implementation. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
inputPdf - PDF file to OCR
outputPdf - searchable PDF with the recognized text on top of the images
ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Throws:
com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from a PDF file
PdfOcrException - in case of any other OCR error
-
makePdfSearchable
public void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
By default, OCR of PDF/A documents and tagged documents is not allowed. The reason is that the resulting document might not comply with the PDF/A specification, and the added content might not be tagged, depending on the IOcrEngine implementation. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
pdfDoc - PDF document with images to OCR
Throws:
com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from a PDF file
PdfOcrException - in case of any other OCR error
-
makePdfSearchable
public void makePdfSearchable(com.itextpdf.kernel.pdf.PdfDocument pdfDoc, IOcrProcessProperties ocrProcessProperties) throws com.itextpdf.io.exceptions.IOException, PdfOcrException
Performs OCR of all images in an input PDF document and adds the recognized text on top of the images.
By default, OCR of PDF/A documents and tagged documents is not allowed. The reason is that the resulting document might not comply with the PDF/A specification, and the added content might not be tagged, depending on the IOcrEngine implementation. To change this behavior, override validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument) with an empty implementation.
Note that OcrPdfCreatorProperties.setPageSize(com.itextpdf.kernel.geom.Rectangle), OcrPdfCreatorProperties.setScaleMode(ScaleMode) and OcrPdfCreatorProperties.setImageLayerName(String) have no effect for this method.
Parameters:
pdfDoc - PDF document with images to OCR
ocrProcessProperties - extra OCR process properties passed to OcrProcessContext
Throws:
com.itextpdf.io.exceptions.IOException - if an image cannot be extracted from a PDF file
PdfOcrException - in case of any other OCR error
-
validateInputPdfDocument
protected void validateInputPdfDocument(com.itextpdf.kernel.pdf.PdfDocument pdfDoc)
Validates the input PDF document.
It checks that the input document is neither tagged nor PDF/A. If you need to OCR tagged and/or PDF/A documents, override this method with an empty implementation. In that case, it is best to use the makePdfSearchable(PdfDocument, IOcrProcessProperties) overload, because you can pass it a PdfADocument or PdfUADocument instance, which validates the output document.
Parameters:
pdfDoc - a PDF document to check
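The override described above could look like the following minimal sketch. The subclass name `PermissiveOcrPdfCreator` is hypothetical, and the `com.itextpdf.pdfocr` package for OcrPdfCreator and IOcrEngine is an assumption not stated on this page. Compiling it requires the iText core and pdfOcr libraries.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.pdfocr.IOcrEngine;      // assumed package
import com.itextpdf.pdfocr.OcrPdfCreator;   // assumed package

/** Hypothetical subclass that accepts tagged and PDF/A input documents. */
public class PermissiveOcrPdfCreator extends OcrPdfCreator {

    public PermissiveOcrPdfCreator(IOcrEngine ocrEngine) {
        super(ocrEngine);
    }

    @Override
    protected void validateInputPdfDocument(PdfDocument pdfDoc) {
        // Intentionally empty: the default check for tagged and PDF/A
        // input is skipped, so the caller is responsible for the
        // conformance of the output document.
    }
}
```

With the check disabled, nothing in this class verifies the result, which is why the text above suggests passing a PdfADocument or PdfUADocument to makePdfSearchable(PdfDocument, IOcrProcessProperties) so that the output is validated.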
-