Pdf2Data 5.0.1 API
iText.Pdf2Data.Pdf2DataExtractor Class Reference

Pdf2DataExtractor is a class for extracting data from files.

Public Member Functions

virtual iText.Pdf2Data.Template.Template  GetTemplate ()
  Gets current template instance.
 
virtual IOcrEngine  GetOcrEngine ()
  Gets current OCR engine instance.
 
virtual RecognitionResultHolder  Extract (FileInfo targetPDF)
  Recognizes the PDF file.

virtual RecognitionResultHolder  Extract (FileInfo targetFile, RecognitionProperties properties)
  Recognizes the file.

virtual RecognitionResultHolder  Extract (Stream targetInputStream)
  Recognizes the PDF file.

virtual RecognitionResultHolder  Extract (Stream targetInputStream, RecognitionProperties properties)
  Recognizes the file.

virtual IDictionary< String, int?>  Check (FileInfo targetPDF)
  Recognizes the PDF file and returns the recognition result counts.

virtual IDictionary< String, int?>  Check (FileInfo targetFile, RecognitionProperties properties)
  Recognizes the document and returns the recognition result counts.

virtual IDictionary< String, int?>  Check (Stream targetInputStream)
  Recognizes the PDF file and returns the recognition result counts.

virtual IDictionary< String, int?>  Check (Stream targetInputStream, RecognitionProperties properties)
  Recognizes the document and returns the recognition result counts.
 

Static Public Member Functions

static iText.Pdf2Data.Pdf2DataExtractor  Create (FileInfo p2dFile)
  Creates an instance of Pdf2DataExtractor from a pdf2data template file.

static iText.Pdf2Data.Pdf2DataExtractor  Create (FileInfo p2dFile, OcrWithPostProcessingEngine ocrEngine)
  Creates an instance of Pdf2DataExtractor from a pdf2data template file with the provided OCR engine.

static iText.Pdf2Data.Pdf2DataExtractor  CreateFromTemplateContentJson (Stream templateContentJsonStream)
  Creates an instance of Pdf2DataExtractor from a stream that contains pdf2data template content in JSON format.

static iText.Pdf2Data.Pdf2DataExtractor  CreateFromTemplateContentJson (Stream templateContentJsonStream, OcrWithPostProcessingEngine ocrEngine)
  Creates an instance of Pdf2DataExtractor from a stream that contains pdf2data template content in JSON format, with the provided OCR engine.
 

Detailed Description

Pdf2DataExtractor is a class for extracting data from files.

To create an instance of Pdf2DataExtractor for extracting data from a PDF file, use Create(System.IO.FileInfo).

To create an instance of Pdf2DataExtractor for extracting data from an image, use Create(System.IO.FileInfo, OcrWithPostProcessingEngine), which also takes an OCR engine.

To extract data from a PDF file, use the Extract(System.IO.FileInfo) method.

To extract data from an image, use the Extract(System.IO.FileInfo, RecognitionProperties) method and specify the file type via the RecognitionProperties instance.
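The PDF workflow described above can be sketched as follows. This is a minimal sketch that assumes the pdf2Data .NET package is referenced and that the template archive has already been processed. The class PdfExtractionExample, its methods, and the file-existence guard are hypothetical illustrations; only Pdf2DataExtractor.Create(FileInfo) and Extract(FileInfo) come from this page. Because the members of RecognitionResultHolder are not described here, the sketch only reports whether a result object was returned.

```csharp
using System.IO;
using iText.Pdf2Data;

// Hypothetical helper class: only Pdf2DataExtractor.Create(FileInfo) and
// Extract(FileInfo) are taken from the documented API.
public static class PdfExtractionExample
{
    // Illustrative guard: fail early with a clear message if an input file is missing.
    public static FileInfo RequireFile(string path)
    {
        var file = new FileInfo(path);
        if (!file.Exists)
        {
            throw new FileNotFoundException("Input file not found", path);
        }
        return file;
    }

    // Loads a processed template archive and extracts data from one PDF file.
    // Returns true if Extract produced a result object; the holder itself is not
    // inspected because its members are not documented on this page.
    public static bool Run(string templatePath, string pdfPath)
    {
        // Create(FileInfo) expects an already processed pdf2data template archive.
        Pdf2DataExtractor extractor = Pdf2DataExtractor.Create(RequireFile(templatePath));

        // Extract(FileInfo) is the overload intended for PDF input.
        var result = extractor.Extract(RequireFile(pdfPath));
        return result != null;
    }
}
```

A call such as PdfExtractionExample.Run("invoice.pdf2data", "invoice.pdf") would use the hypothetical paths shown; image input would instead need the OCR-enabled Create overload and Extract with RecognitionProperties.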

Member Function Documentation

◆ Check() [1/4]

virtual IDictionary<String, int?> iText.Pdf2Data.Pdf2DataExtractor.Check(FileInfo targetFile, RecognitionProperties properties)

Recognizes the document and returns the recognition result counts.

Parameters
targetFile file for recognition
properties a RecognitionProperties instance
Returns
An IDictionary<String, int?> containing the recognition results as pairs of string keys and nullable integer counts.

◆ Check() [2/4]

virtual IDictionary<String, int?> iText.Pdf2Data.Pdf2DataExtractor.Check(FileInfo targetPDF)

Recognizes the PDF file and returns the recognition result counts.

Parameters
targetPDF PDF file for recognition
Returns
An IDictionary<String, int?> containing the recognition results as pairs of string keys and nullable integer counts.

◆ Check() [3/4]

virtual IDictionary<String, int?> iText.Pdf2Data.Pdf2DataExtractor.Check(Stream targetInputStream)

Recognizes the PDF file and returns the recognition result counts.

Parameters
targetInputStream input stream of the PDF file for recognition
Returns
An IDictionary<String, int?> containing the recognition results as pairs of string keys and nullable integer counts.

◆ Check() [4/4]

virtual IDictionary<String, int?> iText.Pdf2Data.Pdf2DataExtractor.Check(Stream targetInputStream, RecognitionProperties properties)

Recognizes the document and returns the recognition result counts.

Parameters
targetInputStream input stream of the file for recognition
properties a RecognitionProperties instance
Returns
An IDictionary<String, int?> containing the recognition results as pairs of string keys and nullable integer counts.
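All four Check overloads return counts keyed by strings. Because the value type is int?, a caller aggregating these counts has to decide what a null entry means; the sketch below treats null as zero. It is a minimal, library-independent sketch: the class CheckResultSummary and its methods are hypothetical, and since this page does not say what the keys represent, the code treats them as opaque strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers for summarizing the IDictionary<String, int?> returned by Check().
public static class CheckResultSummary
{
    // Adds up all counts, treating a null value as zero (an assumption, not documented behavior).
    public static int Total(IDictionary<string, int?> counts)
    {
        return counts.Values.Sum(v => v ?? 0);
    }

    // Returns the keys with a positive count, sorted ordinally for stable output.
    public static List<string> KeysWithResults(IDictionary<string, int?> counts)
    {
        return counts
            .Where(entry => (entry.Value ?? 0) > 0)
            .Select(entry => entry.Key)
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }
}
```

With an extractor in hand, CheckResultSummary.Total(extractor.Check(new FileInfo("input.pdf"))) would reduce the result to a single number that can be compared against zero; "input.pdf" is a hypothetical path.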

◆ Create() [1/2]

static iText.Pdf2Data.Pdf2DataExtractor iText.Pdf2Data.Pdf2DataExtractor.Create(FileInfo p2dFile)

Creates an instance of Pdf2DataExtractor from a pdf2data template file. Note that the template should already be processed.

Parameters
p2dFile pdf2data template archive
Returns
a Pdf2DataExtractor instance

◆ Create() [2/2]

static iText.Pdf2Data.Pdf2DataExtractor iText.Pdf2Data.Pdf2DataExtractor.Create(FileInfo p2dFile, OcrWithPostProcessingEngine ocrEngine)

Creates an instance of Pdf2DataExtractor from a pdf2data template file with the provided OCR engine. Note that the template should already be processed.

Parameters
p2dFile pdf2data template archive
ocrEngine OCR engine to use for OCR-based recognitions. May be null if no OCR-based recognitions are used.
Returns
a Pdf2DataExtractor instance

◆ CreateFromTemplateContentJson() [1/2]

static iText.Pdf2Data.Pdf2DataExtractor iText.Pdf2Data.Pdf2DataExtractor.CreateFromTemplateContentJson(Stream templateContentJsonStream)

Creates an instance of Pdf2DataExtractor from a stream that contains pdf2data template content in JSON format. Note that the template should already be processed.

Parameters
templateContentJsonStream processed template content stream
Returns
a Pdf2DataExtractor instance
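When the processed template content is held as a string (for example, loaded from a database) rather than as a file, it can be wrapped in a MemoryStream and passed to CreateFromTemplateContentJson. This is a minimal sketch under two stated assumptions: that the JSON should be encoded as UTF-8 (the expected encoding is not given on this page) and that the stream is fully read during creation, so it can be disposed afterwards. The class JsonTemplateExample and its methods are hypothetical.

```csharp
using System.IO;
using System.Text;
using iText.Pdf2Data;

// Hypothetical helper around the documented CreateFromTemplateContentJson(Stream) overload.
public static class JsonTemplateExample
{
    // Wraps JSON text in a read-only stream. UTF-8 is an assumption, not a documented requirement.
    public static MemoryStream ToJsonStream(string templateJson)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(templateJson), writable: false);
    }

    // Builds an extractor from processed template content held in memory.
    public static Pdf2DataExtractor FromJsonText(string templateJson)
    {
        // The stream is disposed once the extractor has been created; this assumes
        // the template content is fully read during creation.
        using (Stream json = ToJsonStream(templateJson))
        {
            return Pdf2DataExtractor.CreateFromTemplateContentJson(json);
        }
    }
}
```

If the template content already lives in a file, File.OpenRead on that file can supply the stream directly instead of going through a string.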

◆ CreateFromTemplateContentJson() [2/2]

static iText.Pdf2Data.Pdf2DataExtractor iText.Pdf2Data.Pdf2DataExtractor.CreateFromTemplateContentJson(Stream templateContentJsonStream, OcrWithPostProcessingEngine ocrEngine)

Creates an instance of Pdf2DataExtractor from a stream that contains pdf2data template content in JSON format, with the provided OCR engine. Note that the template should already be processed.

Parameters
templateContentJsonStream processed template content stream
ocrEngine OCR engine to use for OCR-based recognitions. May be null if no OCR-based recognitions are used.
Returns
a Pdf2DataExtractor instance

◆ Extract() [1/4]

virtual RecognitionResultHolder iText.Pdf2Data.Pdf2DataExtractor.Extract(FileInfo targetFile, RecognitionProperties properties)

Recognizes the file.

Parameters
targetFile file for recognition
properties a RecognitionProperties instance
Returns
a RecognitionResultHolder instance

◆ Extract() [2/4]

virtual RecognitionResultHolder iText.Pdf2Data.Pdf2DataExtractor.Extract(FileInfo targetPDF)

Recognizes the PDF file.

Parameters
targetPDF PDF file for recognition
Returns
a RecognitionResultHolder instance

◆ Extract() [3/4]

virtual RecognitionResultHolder iText.Pdf2Data.Pdf2DataExtractor.Extract(Stream targetInputStream)

Recognizes the PDF file.

Parameters
targetInputStream input stream of the PDF file for recognition
Returns
a RecognitionResultHolder instance
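Because a Stream is consumed as it is read, calling Check and then Extract on the same stream object would require rewinding it, and this page does not say whether either method seeks back to the start. The sketch below sidesteps the question by holding the PDF in memory and giving each call its own MemoryStream. It assumes the extractor was created for PDF input, and it treats "no positive count from Check" as a reason to skip Extract, which is an illustrative policy rather than documented behavior. The class CheckThenExtractExample and its methods are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using iText.Pdf2Data;

// Hypothetical helper: runs Check first and calls Extract only if something was recognized.
public static class CheckThenExtractExample
{
    // True if any count is positive; null counts are treated as zero (an assumption).
    public static bool HasAnyResult(IDictionary<string, int?> counts)
    {
        return counts.Values.Any(v => (v ?? 0) > 0);
    }

    // Returns true if Extract was run and produced a result object.
    public static bool CheckThenExtract(Pdf2DataExtractor extractor, byte[] pdfBytes)
    {
        IDictionary<string, int?> counts;
        // A fresh read-only stream for Check(Stream).
        using (var checkStream = new MemoryStream(pdfBytes, writable: false))
        {
            counts = extractor.Check(checkStream);
        }

        if (!HasAnyResult(counts))
        {
            return false; // nothing recognized, so Extract is skipped
        }

        // A second, independent stream for Extract(Stream).
        using (var extractStream = new MemoryStream(pdfBytes, writable: false))
        {
            var result = extractor.Extract(extractStream);
            return result != null;
        }
    }
}
```

For a PDF on disk, File.ReadAllBytes(path) can supply the byte array; for very large files, reopening a FileStream for each call avoids holding the whole document in memory.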

◆ Extract() [4/4]

virtual RecognitionResultHolder iText.Pdf2Data.Pdf2DataExtractor.Extract(Stream targetInputStream, RecognitionProperties properties)

Recognizes the file.

Parameters
targetInputStream input stream of the file for recognition
properties a RecognitionProperties instance
Returns
a RecognitionResultHolder instance

◆ GetOcrEngine()

virtual IOcrEngine iText.Pdf2Data.Pdf2DataExtractor.GetOcrEngine()

Gets current OCR engine instance.

Returns
current OCR engine instance.

◆ GetTemplate()

virtual iText.Pdf2Data.Template.Template iText.Pdf2Data.Pdf2DataExtractor.GetTemplate()

Gets current template instance.

Returns
current template instance