Pdf2Data 5.0.1 API
iText.Pdf2Data.OcrWithPostProcessingEngine Class Reference

Engine that applies post-processors (if present) to the results of a base OCR engine.

Inheritance diagram for iText.Pdf2Data.OcrWithPostProcessingEngine:
iText.Pdf2Data.Ocr.Engine.Tesseract4BasedEngine

Public Member Functions

  OcrWithPostProcessingEngine (IOcrEngine baseOcrEngine, IList< IOcrEnginePostProcessor > postProcessors, bool isTaggingSupported)
  Creates a new OcrWithPostProcessingEngine instance.
 
virtual bool  IsTaggingSupported ()
  Gets whether results will be tagged or not.
 
virtual IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input)
  Performs OCR with post-processing on the input file.
 
virtual IDictionary< int, IList< TextInfo > >  DoImageOcr (FileInfo input, OcrProcessContext ocrProcessContext)
  Performs OCR with post-processing on the input file, using the given OCR processing context.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine on the given list of input images and saves the output to a text file at the provided path.
 
virtual void  CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext)
  Performs OCR using the provided iText.Pdfocr.IOcrEngine on the given list of input images and saves the output to a text file at the provided path, using the given OCR processing context.
 

Detailed Description

Engine that applies post-processors (if present) to the results of a base OCR engine.

Constructor & Destructor Documentation

◆ OcrWithPostProcessingEngine()

iText.Pdf2Data.OcrWithPostProcessingEngine.OcrWithPostProcessingEngine ( IOcrEngine baseOcrEngine,
IList< IOcrEnginePostProcessor > postProcessors,
bool isTaggingSupported )
inline

Creates a new OcrWithPostProcessingEngine instance.

Parameters
baseOcrEngine base OCR engine, which implements iText.Pdfocr.IOcrEngine
postProcessors System.Collections.Generic.IList of IOcrEnginePostProcessor instances to apply to the OCR results
isTaggingSupported if true, results will be tagged; otherwise the tag structure will be missing
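
A minimal construction sketch, based only on the constructor signature documented above. The class and method names `PostProcessingEngineFactory` and `Wrap` are hypothetical, and the `using` directives are assumptions: `IOcrEngine` is taken from `iText.Pdfocr` as the parameter description states, while `IOcrEnginePostProcessor` may live in a namespace not listed on this page, in which case an extra `using` is needed.

```csharp
using System.Collections.Generic;
using iText.Pdf2Data;
using iText.Pdfocr;
// Assumption: if IOcrEnginePostProcessor is declared in another namespace
// in your Pdf2Data version, add the matching using directive here.

public static class PostProcessingEngineFactory
{
    // Wraps an already configured base engine. The post-processor list is
    // left empty, which the class description allows ("if present").
    public static OcrWithPostProcessingEngine Wrap(IOcrEngine baseOcrEngine)
    {
        IList<IOcrEnginePostProcessor> postProcessors =
            new List<IOcrEnginePostProcessor>();

        // true requests tagged results, per the isTaggingSupported parameter.
        return new OcrWithPostProcessingEngine(baseOcrEngine, postProcessors, true);
    }
}
```

How the base engine is created and configured is outside the scope of this page, so the sketch takes it as a parameter instead of constructing one.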

Member Function Documentation

◆ CreateTxtFile() [1/2]

virtual void iText.Pdf2Data.OcrWithPostProcessingEngine.CreateTxtFile ( IList< FileInfo > inputImages,
FileInfo txtFile )
inline virtual

Performs OCR using the provided iText.Pdfocr.IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that a human reading order is not guaranteed, because of possible specifics of the input images (multicolumn layouts, tables, etc.).

Parameters
inputImages System.Collections.Generic.IList of images to be OCRed
txtFile file to be created
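
A short usage sketch for this overload, relying only on the signature documented above. The helper names `TextExport` and `ExportToText` are hypothetical, and the engine is assumed to have been constructed elsewhere.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Pdf2Data;

public static class TextExport
{
    // Runs OCR over several images and writes the recognized text to one file.
    public static void ExportToText(
        OcrWithPostProcessingEngine engine,
        IEnumerable<string> imagePaths,
        string outputPath)
    {
        // The method expects FileInfo objects rather than plain path strings.
        IList<FileInfo> images = new List<FileInfo>();
        foreach (string path in imagePaths)
        {
            images.Add(new FileInfo(path));
        }

        engine.CreateTxtFile(images, new FileInfo(outputPath));
    }
}
```

Because reading order is not guaranteed, the text file is probably best treated as raw output rather than as a faithful transcript of multicolumn or tabular pages.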

◆ CreateTxtFile() [2/2]

virtual void iText.Pdf2Data.OcrWithPostProcessingEngine.CreateTxtFile ( IList< FileInfo > inputImages,
FileInfo txtFile,
OcrProcessContext ocrProcessContext )
inline virtual

Performs OCR using the provided iText.Pdfocr.IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that a human reading order is not guaranteed, because of possible specifics of the input images (multicolumn layouts, tables, etc.).

Parameters
inputImages System.Collections.Generic.IList of images to be OCRed
txtFile file to be created
ocrProcessContext OCR processing context

◆ DoImageOcr() [1/2]

virtual IDictionary< int, IList< TextInfo > > iText.Pdf2Data.OcrWithPostProcessingEngine.DoImageOcr ( FileInfo input )
inline virtual

Performs OCR with post-processing on the input file.

Parameters
input input image as a System.IO.FileInfo
Returns
System.Collections.Generic.IDictionary where the key is an int representing the page number and the value is a System.Collections.Generic.IList of iText.Pdfocr.TextInfo elements; each iText.Pdfocr.TextInfo element contains a word or a line and the 4 coordinates of its bounding box (bbox)
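
The returned structure can be walked page by page, as in the sketch below. `OcrResultPrinter` and `PrintRecognizedText` are hypothetical names, and `TextInfo.GetText()` is assumed from the pdfOCR `TextInfo` API, since this page does not document `TextInfo` members.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iText.Pdf2Data;
using iText.Pdfocr;

public static class OcrResultPrinter
{
    // Prints every recognized word or line together with its page number.
    public static void PrintRecognizedText(
        OcrWithPostProcessingEngine engine,
        string imagePath)
    {
        IDictionary<int, IList<TextInfo>> pages =
            engine.DoImageOcr(new FileInfo(imagePath));

        foreach (KeyValuePair<int, IList<TextInfo>> page in pages)
        {
            foreach (TextInfo item in page.Value)
            {
                // Assumption: GetText() returns the recognized word or line.
                Console.WriteLine($"page {page.Key}: {item.GetText()}");
            }
        }
    }
}
```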

◆ DoImageOcr() [2/2]

virtual IDictionary< int, IList< TextInfo > > iText.Pdf2Data.OcrWithPostProcessingEngine.DoImageOcr ( FileInfo input,
OcrProcessContext ocrProcessContext )
inline virtual

Performs OCR with post-processing on the input file, using the given OCR processing context.

Parameters
input input image as a System.IO.FileInfo
ocrProcessContext OCR processing context
Returns
System.Collections.Generic.IDictionary where the key is an int representing the page number and the value is a System.Collections.Generic.IList of iText.Pdfocr.TextInfo elements; each iText.Pdfocr.TextInfo element contains a word or a line and the 4 coordinates of its bounding box (bbox)

◆ IsTaggingSupported()

virtual bool iText.Pdf2Data.OcrWithPostProcessingEngine.IsTaggingSupported ( )
inline virtual

Gets whether results will be tagged or not.

Returns

true if results will be tagged, false otherwise