pdfOCR 3.0.2 API
iText.Pdfocr.IOcrEngine Interface Reference

The IOcrEngine interface is used for instantiating new OcrReader objects.

Inheritance diagram for iText.Pdfocr.IOcrEngine:
iText.Pdfocr.IOcrEngine
  iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine
    iText.Pdfocr.Tesseract4.Tesseract4ExecutableOcrEngine
    iText.Pdfocr.Tesseract4.Tesseract4LibOcrEngine

Public Member Functions

IDictionary<int, IList<TextInfo>> DoImageOcr(FileInfo input)
  Reads data from the provided input image file and returns the retrieved data in the format described below.

IDictionary<int, IList<TextInfo>> DoImageOcr(FileInfo input, OcrProcessContext ocrProcessContext)
  Reads data from the provided input image file and returns the retrieved data in the format described below.

void CreateTxtFile(IList<FileInfo> inputImages, FileInfo txtFile)
  Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path.

void CreateTxtFile(IList<FileInfo> inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext)
  Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path.

Detailed Description

The IOcrEngine interface provides the ability to perform OCR: it reads data from input files and returns the contained text in the required format.

Member Function Documentation

◆ CreateTxtFile() [1/2]

void iText.Pdfocr.IOcrEngine.CreateTxtFile(IList<FileInfo> inputImages, FileInfo txtFile)

Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layouts, tables, etc.).

Parameters
inputImages System.Collections.IList of images to be OCRed
txtFile file to be created

Implemented in iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.
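A minimal C# sketch of preparing the inputImages argument. If the engine writes the text of the images in list order (an assumption; this page does not say so), building the list in a stable order makes the resulting text file predictable. The class OcrInputCollector, the method CollectImages, and the set of accepted extensions are hypothetical helpers for illustration, not part of pdfOCR.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of pdfOCR: gathers image files for CreateTxtFile.
public static class OcrInputCollector
{
    // Extensions treated as images in this sketch (an assumption; match them
    // to the formats your engine actually accepts). Comparison ignores case.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp" };

    // Returns the image files directly inside dir, sorted by file name using
    // ordinal comparison, so the list passed to CreateTxtFile has a fixed order.
    public static IList<FileInfo> CollectImages(DirectoryInfo dir)
    {
        return dir.EnumerateFiles()
            .Where(f => ImageExtensions.Contains(f.Extension))
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .ToList();
    }
}
```

With an already configured engine (for example a Tesseract4LibOcrEngine, whose setup is not shown here), the list could then be passed as engine.CreateTxtFile(OcrInputCollector.CollectImages(new DirectoryInfo("scans")), new FileInfo("scans.txt")). Sorting by name is only one reasonable choice; sort by creation time or an explicit page index if that matches your scans better.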

◆ CreateTxtFile() [2/2]

void iText.Pdfocr.IOcrEngine.CreateTxtFile(IList<FileInfo> inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext)

Performs OCR with this IOcrEngine on the given list of input images and saves the output to a text file at the provided path. Note that human reading order is not guaranteed because of possible specifics of the input images (multi-column layouts, tables, etc.).

Parameters
inputImages System.Collections.IList of images to be OCRed
txtFile file to be created
ocrProcessContext OCR processing context

Implemented in iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.

◆ DoImageOcr() [1/2]

IDictionary<int, IList<TextInfo>> iText.Pdfocr.IOcrEngine.DoImageOcr(FileInfo input)

Reads data from the provided input image file and returns the retrieved data in the format described below.

Parameters
input input image as a System.IO.FileInfo
Returns
System.Collections.IDictionary in which each key is an int representing the page number and each value is a System.Collections.IList of TextInfo elements; each TextInfo element contains a word or a line together with its four bounding-box coordinates (bbox).

Implemented in iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.
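The shape of the returned map is easiest to see by consuming it. The following C# sketch turns a result of this shape into one string per page, visiting pages in ascending order and applying a naive top-to-bottom, left-to-right ordering within each page. Because the TextInfo accessors and the bbox coordinate convention are not described on this page, the sketch uses a hypothetical stand-in record, OcrItem, with Left/Top/Right/Bottom fields in image pixels (y growing downward); the helper class OcrResultFormatter is likewise hypothetical. Map your TextInfo values onto OcrItem accordingly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for iText.Pdfocr.TextInfo. The real class holds a word
// or line plus its bbox, but these member names and the coordinate convention
// (image pixels, y growing downward) are assumptions made for this sketch.
public sealed record OcrItem(string Text, float Left, float Top, float Right, float Bottom);

public static class OcrResultFormatter
{
    // Converts a DoImageOcr-shaped result (page number -> recognized items)
    // into one string per page, visiting pages in ascending key order.
    public static IList<string> ToPageTexts(IDictionary<int, IList<OcrItem>> result)
    {
        var pages = new List<string>();
        foreach (var page in result.Keys.OrderBy(k => k))
        {
            // Naive reading order: sort by top edge, then by left edge.
            var words = result[page]
                .OrderBy(item => item.Top)
                .ThenBy(item => item.Left)
                .Select(item => item.Text);
            pages.Add(string.Join(" ", words));
        }
        return pages;
    }
}
```

This ordering is exactly the kind of heuristic that fails on the multi-column layouts and tables mentioned under CreateTxtFile, so treat its output as approximate.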

◆ DoImageOcr() [2/2]

IDictionary<int, IList<TextInfo>> iText.Pdfocr.IOcrEngine.DoImageOcr(FileInfo input, OcrProcessContext ocrProcessContext)

Reads data from the provided input image file and returns the retrieved data in the format described below.

Parameters
input input image as a System.IO.FileInfo
ocrProcessContext OCR processing context
Returns
System.Collections.IDictionary in which each key is an int representing the page number and each value is a System.Collections.IList of TextInfo elements; each TextInfo element contains a word or a line together with its four bounding-box coordinates (bbox).

Implemented in iText.Pdfocr.Tesseract4.AbstractTesseract4OcrEngine.