pdfOCR 4.0.1 API
|
The implementation of iText.Pdfocr.IOcrEngine.
Public Member Functions |
|
AbstractTesseract4OcrEngine (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) | |
Creates a new AbstractTesseract4OcrEngine instance based on the given Tesseract4OcrEngineProperties instance. |
|
virtual void | DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat) |
Performs tesseract OCR for the first (or for the only) image page. |
|
virtual void | DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat, OcrProcessContext ocrProcessContext) |
Performs tesseract OCR for the first (or for the only) image page. |
|
virtual void | CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile) |
Performs OCR using provided iText.Pdfocr.IOcrEngine for the given list of input images and saves output to a text file using provided path. |
|
virtual void | CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext) |
Performs OCR using provided iText.Pdfocr.IOcrEngine for the given list of input images and saves output to a text file using provided path. |
|
Tesseract4OcrEngineProperties | GetTesseract4OcrEngineProperties () |
Gets properties for AbstractTesseract4OcrEngine. |
|
void | SetTesseract4OcrEngineProperties (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) |
Sets properties for AbstractTesseract4OcrEngine. |
|
String | GetLanguagesAsString () |
Gets list of languages concatenated with "+" symbol to a string in format required by tesseract. |
|
IDictionary< int, IList< TextInfo > > | DoImageOcr (FileInfo input) |
Reads data from the provided input image file and returns retrieved data in the format described below. |
|
IDictionary< int, IList< TextInfo > > | DoImageOcr (FileInfo input, OcrProcessContext ocrProcessContext) |
Reads data from the provided input image file and returns retrieved data in the format described below. |
|
String | DoImageOcr (FileInfo input, OutputFormat outputFormat, OcrProcessContext ocrProcessContext) |
Reads data from the provided input image file and returns retrieved data as string. |
|
String | DoImageOcr (FileInfo input, OutputFormat outputFormat) |
Reads data from the provided input image file and returns retrieved data as string. |
|
virtual bool | IsWindows () |
Checks whether the current OS is Windows. |
|
virtual String | IdentifyOsType () |
Identifies the type of the current OS and returns it (win, linux). |
|
virtual void | ValidateLanguages (IList< String > languagesList) |
Validates list of provided languages and checks if they all exist in given tess data directory. |
|
virtual PdfOcrMetaInfoContainer | GetMetaInfoContainer () |
Gets the container with meta info. |
|
virtual ProductData | GetProductData () |
Gets object containing information about the product. |
|
virtual bool | IsTaggingSupported () |
Checks whether tagging is supported by the OCR engine. |
|
The implementation of iText.Pdfocr.IOcrEngine.
This class can perform OCR, read data from input files, and return the contained text in the required format. It also gives access to features of "tesseract", an optical character recognition engine for various operating systems.
|
AbstractTesseract4OcrEngine (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) | inline |
Creates a new AbstractTesseract4OcrEngine instance based on the given Tesseract4OcrEngineProperties instance.
tesseract4OcrEngineProperties | the Tesseract4OcrEngineProperties instance to use |
|
virtual void CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile) | inline, virtual |
Performs OCR using provided iText.Pdfocr.IOcrEngine for the given list of input images and saves output to a text file using provided path.
inputImages | System.Collections.IList of input images |
txtFile | file to be created |
Implements iText.Pdfocr.IOcrEngine.
|
virtual void CreateTxtFile (IList< FileInfo > inputImages, FileInfo txtFile, OcrProcessContext ocrProcessContext) | inline, virtual |
Performs OCR using provided iText.Pdfocr.IOcrEngine for the given list of input images and saves output to a text file using provided path.
inputImages | System.Collections.IList of input images |
txtFile | file to be created |
ocrProcessContext | ocr process context |
Implements iText.Pdfocr.IOcrEngine.
|
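IDictionary< int, IList< TextInfo > > DoImageOcr (FileInfo input) | inline |
Reads data from the provided input image file and returns retrieved data in the format described below.
input | input image System.IO.FileInfo |
Returns | System.Collections.IDictionary in which each int key maps to an IList of TextInfo elements |
Implements iText.Pdfocr.IOcrEngine.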
|
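IDictionary< int, IList< TextInfo > > DoImageOcr (FileInfo input, OcrProcessContext ocrProcessContext) | inline |
Reads data from the provided input image file and returns retrieved data in the format described below.
input | input image System.IO.FileInfo |
ocrProcessContext | ocr process context |
Returns | System.Collections.IDictionary in which each int key maps to an IList of TextInfo elements |
Implements iText.Pdfocr.IOcrEngine.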
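The dictionary-shaped result above can be consumed as in the following minimal sketch. It does not use the pdfOCR library: RecognizedText is a hypothetical stand-in for TextInfo, OcrResultSketch and ToPlainText are illustrative names, and treating the int key as a page number is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TextInfo; the real class is not reproduced here.
public sealed class RecognizedText
{
    public RecognizedText(string text) { Text = text; }
    public string Text { get; }
}

public static class OcrResultSketch
{
    // Flattens a result keyed by int (assumed to be the page number) into
    // plain text: one line per key, in ascending key order, items joined by spaces.
    public static string ToPlainText(IDictionary<int, IList<RecognizedText>> pages)
    {
        var lines = pages
            .OrderBy(page => page.Key)
            .Select(page => string.Join(" ", page.Value.Select(item => item.Text)));
        return string.Join("\n", lines);
    }
}
```

Sorting by key keeps the output stable regardless of the order in which entries were added to the dictionary.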
|
String DoImageOcr (FileInfo input, OutputFormat outputFormat) | inline |
Reads data from the provided input image file and returns retrieved data as string.
input | input image System.IO.FileInfo |
outputFormat | OutputFormat of the returned result |
|
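String DoImageOcr (FileInfo input, OutputFormat outputFormat, OcrProcessContext ocrProcessContext) | inline |
Reads data from the provided input image file and returns retrieved data as string.
input | input image System.IO.FileInfo |
outputFormat | OutputFormat of the returned result |
ocrProcessContext | ocr process context |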
|
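virtual void DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat) | inline, virtual |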
Performs tesseract OCR for the first (or for the only) image page.
inputImage | input image System.IO.FileInfo |
outputFile | output file for the result for the first page |
outputFormat | selected OutputFormat for tesseract |
|
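virtual void DoTesseractOcr (FileInfo inputImage, FileInfo outputFile, OutputFormat outputFormat, OcrProcessContext ocrProcessContext) | inline, virtual |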
Performs tesseract OCR for the first (or for the only) image page.
inputImage | input image System.IO.FileInfo |
outputFile | output file for the result for the first page |
outputFormat | selected OutputFormat for tesseract |
ocrProcessContext | ocr process context |
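For orientation, here is a minimal sketch of how the same three inputs (image, output file, output format) map onto the arguments of the tesseract command-line tool. It is not the library's implementation, which this page does not show; TesseractCommandSketch and BuildArguments are hypothetical names, and the bool flag stands in for OutputFormat.

```csharp
using System.Collections.Generic;

public static class TesseractCommandSketch
{
    // Builds the argument list for: tesseract <image> <outputbase> [options] [configfile]
    // tesseract appends the extension (.txt or .hocr) to outputBase on its own.
    public static IList<string> BuildArguments(
        string imagePath, string outputBase, string languages, string tessDataDir, bool hocr)
    {
        var args = new List<string>
        {
            imagePath,
            outputBase,
            "--tessdata-dir", tessDataDir, // directory holding the *.traineddata files
            "-l", languages                // e.g. "eng" or "eng+deu"
        };
        args.Add(hocr ? "hocr" : "txt");   // config file name selects the output format
        return args;
    }
}
```

The returned list can be passed to System.Diagnostics.ProcessStartInfo.ArgumentList when starting the tesseract executable.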
|
String GetLanguagesAsString () | inline |
Gets list of languages concatenated with "+" symbol to a string in format required by tesseract.
Returns | System.String of concatenated languages |
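The "+"-separated format can be illustrated with a short sketch. LanguageStringSketch and JoinLanguages are hypothetical names used only for illustration; the library's own code may differ.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageStringSketch
{
    // Joins language codes with "+", the form tesseract accepts for its -l option.
    public static string JoinLanguages(IEnumerable<string> languages)
    {
        if (languages == null) throw new ArgumentNullException(nameof(languages));
        return string.Join("+", languages);
    }
}
```

For example, JoinLanguages(new[] { "eng", "deu" }) yields "eng+deu".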
|
virtual PdfOcrMetaInfoContainer GetMetaInfoContainer () | inline, virtual |
Gets the container with meta info.
Implements iText.Pdfocr.IProductAware.
|
virtual ProductData GetProductData () | inline, virtual |
Gets object containing information about the product.
Implements iText.Pdfocr.IProductAware.
|
Tesseract4OcrEngineProperties GetTesseract4OcrEngineProperties () | inline |
Gets properties for AbstractTesseract4OcrEngine.
|
virtual String IdentifyOsType () | inline, virtual |
Identifies the type of the current OS and returns it (win, linux).
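A check of this kind can be sketched with the standard .NET runtime API. This is an illustration, not the library's code: OsTypeSketch is a hypothetical class, and mapping every non-Windows OS to "linux" is an assumption.

```csharp
using System.Runtime.InteropServices;

public static class OsTypeSketch
{
    // True when the process runs on Windows.
    public static bool IsWindows() =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows);

    // Returns "win" on Windows and "linux" otherwise (assumption: any
    // non-Windows OS is reported as "linux").
    public static string IdentifyOsType() => IsWindows() ? "win" : "linux";
}
```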
|
virtual bool IsTaggingSupported () | inline, virtual |
Checks whether tagging is supported by the OCR engine.
Returns | true if tagging is supported by the engine, false otherwise |
Implements iText.Pdfocr.IOcrEngine.
|
virtual bool IsWindows () | inline, virtual |
Checks whether the current OS is Windows.
|
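void SetTesseract4OcrEngineProperties (Tesseract4OcrEngineProperties tesseract4OcrEngineProperties) | inline |
Sets properties for AbstractTesseract4OcrEngine.
tesseract4OcrEngineProperties | the Tesseract4OcrEngineProperties to use for this AbstractTesseract4OcrEngine |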
|
virtual void ValidateLanguages (IList< String > languagesList) | inline, virtual |
Validates list of provided languages and checks if they all exist in given tess data directory.
languagesList | System.Collections.IList of languages to validate |
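The existence check can be sketched as follows, assuming each language is stored as a <language>.traineddata file in the tess data directory, which is the usual tesseract layout. LanguageValidationSketch and FindMissingLanguages are hypothetical names; how the library reports missing languages is not shown on this page.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LanguageValidationSketch
{
    // Returns the languages that have no "<language>.traineddata" file
    // in tessDataDir; an empty result means every language was found.
    public static IList<string> FindMissingLanguages(
        string tessDataDir, IEnumerable<string> languages)
    {
        return languages
            .Where(lang => !File.Exists(Path.Combine(tessDataDir, lang + ".traineddata")))
            .ToList();
    }
}
```

A caller could throw or log when the returned list is not empty.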