Pdf2DataExtractor (pdf2data 5.0.0 API)

java.lang.Object
- com.itextpdf.pdf2data.Pdf2DataExtractor

```
public class Pdf2DataExtractor
extends Object
```
Pdf2DataExtractor extracts data from files.
To create a Pdf2DataExtractor instance for extracting data from PDF files, use create(File).

To create a Pdf2DataExtractor instance for extracting data from images, use create(File, OcrWithPostProcessingEngine).

To extract data from a PDF file, use the extract(File) method.

To extract data from an image, use the extract(File, RecognitionProperties) method, with the file type specified via the RecognitionProperties instance.
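
Because PDF input and image input go through different overloads, a caller that receives arbitrary files may want to check the file type first. The following is a minimal sketch, not part of the pdf2data API: the class `PdfSniffer` and its methods are hypothetical helpers that look for the `%PDF-` marker with which PDF files conventionally begin. It assumes the marker sits at the very start of the file, which holds for typical PDFs but is not guaranteed for every producer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of pdf2data.
final class PdfSniffer {
    // Marker that conventionally opens a PDF file.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the PDF marker. */
    static boolean hasPdfHeader(byte[] head) {
        return head != null
                && head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads the first bytes of a file and checks them for the PDF marker. */
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(PDF_MAGIC.length));
        }
    }
}
```

With such a check, a caller could route files for which `looksLikePdf` returns true to extract(File) and all others to extract(File, RecognitionProperties).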

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `Map<String,Integer>` | `check(File targetPDF)` Recognizes the PDF file and returns the recognition result counts. |
| `Map<String,Integer>` | `check(File targetFile, RecognitionProperties properties)` Recognizes the document and returns the recognition result counts. |
| `Map<String,Integer>` | `check(InputStream targetInputStream)` Recognizes the PDF file and returns the recognition result counts. |
| `Map<String,Integer>` | `check(InputStream targetInputStream, RecognitionProperties properties)` Recognizes the document and returns the recognition result counts. |
| `static Pdf2DataExtractor` | `create(File p2dFile)` Creates a `Pdf2DataExtractor` instance from a pdf2data template file. |
| `static Pdf2DataExtractor` | `create(File p2dFile, OcrWithPostProcessingEngine ocrEngine)` Creates a `Pdf2DataExtractor` instance from a pdf2data template file with the provided OCR engine. |
| `static Pdf2DataExtractor` | `createFromTemplateContentJson(InputStream templateContentJsonStream)` Creates a `Pdf2DataExtractor` instance from a stream that contains pdf2data template content in JSON format. |
| `static Pdf2DataExtractor` | `createFromTemplateContentJson(InputStream templateContentJsonStream, OcrWithPostProcessingEngine ocrEngine)` Creates a `Pdf2DataExtractor` instance from a stream that contains pdf2data template content in JSON format. |
| `RecognitionResultHolder` | `extract(File targetPDF)` Recognizes the PDF file. |
| `RecognitionResultHolder` | `extract(File targetFile, RecognitionProperties properties)` Recognizes the file. |
| `RecognitionResultHolder` | `extract(InputStream targetInputStream)` Recognizes the PDF file. |
| `RecognitionResultHolder` | `extract(InputStream targetInputStream, RecognitionProperties properties)` Recognizes the file. |
| `com.itextpdf.pdfocr.IOcrEngine` | `getOcrEngine()` Gets the current OCR engine instance. |
| `com.itextpdf.pdf2data.template.Template` | `getTemplate()` Gets the current template instance. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - create
```
public static Pdf2DataExtractor create(File p2dFile)
                                throws IOException
```
    Creates a Pdf2DataExtractor instance from a pdf2data template file. Note that the template should be processed.
    
    Parameters:
    
    p2dFile - pdf2data template archive
    
    Returns:
    
    a Pdf2DataExtractor instance
    
    Throws:
    
    IOException - if any I/O exception occurs
    
    com.itextpdf.pdf2data.exceptions.TemplateConversionException - if the template cannot be extracted from the passed archive
  - create
```
public static Pdf2DataExtractor create(File p2dFile,
                                       OcrWithPostProcessingEngine ocrEngine)
                                throws IOException
```
    Creates a Pdf2DataExtractor instance from a pdf2data template file with the provided OCR engine. Note that the template should be processed.
    
    Parameters:
    
    p2dFile - pdf2data template archive
    
    ocrEngine - OCR engine to be used for recognitions that involve OCR. May be null if no such recognitions will be used.
    
    Returns:
    
    a Pdf2DataExtractor instance
    
    Throws:
    
    IOException - if any I/O exception occurs
    
    com.itextpdf.pdf2data.exceptions.TemplateConversionException - if the template cannot be extracted from the passed archive
  - createFromTemplateContentJson
```
public static Pdf2DataExtractor createFromTemplateContentJson(InputStream templateContentJsonStream)
```
    Creates a Pdf2DataExtractor instance from a stream that contains pdf2data template content in JSON format. Note that the template should be processed.
    
    Parameters:
    
    templateContentJsonStream - processed template content stream
    
    Returns:
    
    a Pdf2DataExtractor instance
    
    Throws:
    
    com.itextpdf.pdf2data.exceptions.TemplateConversionException - if the template cannot be extracted from the passed archive
  - createFromTemplateContentJson
```
public static Pdf2DataExtractor createFromTemplateContentJson(InputStream templateContentJsonStream,
                                                              OcrWithPostProcessingEngine ocrEngine)
```
    Creates a Pdf2DataExtractor instance from a stream that contains pdf2data template content in JSON format. Note that the template should be processed.
    
    Parameters:
    
    templateContentJsonStream - processed template content stream
    
    ocrEngine - OCR engine to be used for recognitions that involve OCR. May be null if no such recognitions will be used.
    
    Returns:
    
    a Pdf2DataExtractor instance
    
    Throws:
    
    com.itextpdf.pdf2data.exceptions.TemplateConversionException - if the template cannot be extracted from the passed archive
  - getTemplate
```
public com.itextpdf.pdf2data.template.Template getTemplate()
```
    Gets current template instance.
    
    Returns:
    
    current template instance
  - getOcrEngine
```
public com.itextpdf.pdfocr.IOcrEngine getOcrEngine()
```
    Gets current OCR engine instance.
    
    Returns:
    
    current OCR engine instance.
  - extract
```
public RecognitionResultHolder extract(File targetPDF)
                                throws IOException
```
    Recognizes the PDF file.
    
    Parameters:
    
    targetPDF - PDF file for recognition
    
    Returns:
    
    RecognitionResultHolder instance
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - extract
```
public RecognitionResultHolder extract(File targetFile,
                                       RecognitionProperties properties)
                                throws IOException
```
    Recognizes the file.
    
    Parameters:
    
    targetFile - file for recognition
    
    properties - a RecognitionProperties instance
    
    Returns:
    
    RecognitionResultHolder instance
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - extract
```
public RecognitionResultHolder extract(InputStream targetInputStream)
                                throws IOException
```
    Recognizes the PDF file.
    
    Parameters:
    
    targetInputStream - input stream of the PDF file for recognition
    
    Returns:
    
    RecognitionResultHolder instance
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - extract
```
public RecognitionResultHolder extract(InputStream targetInputStream,
                                       RecognitionProperties properties)
                                throws IOException
```
    Recognizes the file.
    
    Parameters:
    
    targetInputStream - input stream of the file for recognition
    
    properties - a RecognitionProperties instance
    
    Returns:
    
    RecognitionResultHolder instance
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
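
The extract overloads declare IOException in their signatures, while DocumentExtractionDeniedException appears only under "Throws", which suggests (as an assumption, not confirmed here) that it is an unchecked exception. The sketch below shows one way to run several files through a single extraction call and collect failures of both kinds instead of aborting on the first one. To keep it compilable without the pdf2data jar, the hypothetical interface `FileExtractor` stands in for the shape of extract(File); `BatchExtraction` and `Outcome` are likewise illustrative names, and reusing one extractor sequentially across files is assumed to be acceptable.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of pdf2data.
final class BatchExtraction {
    /** Stand-in with the same shape as extract(File): one file in, one result out. */
    @FunctionalInterface
    interface FileExtractor<R> {
        R extract(File target) throws IOException;
    }

    /** Results and failure messages, keyed by file in processing order. */
    static final class Outcome<R> {
        final Map<File, R> results = new LinkedHashMap<>();
        final Map<File, String> failures = new LinkedHashMap<>();
    }

    /** Processes every file, recording failures instead of stopping at the first. */
    static <R> Outcome<R> run(List<File> files, FileExtractor<R> extractor) {
        Outcome<R> outcome = new Outcome<>();
        for (File file : files) {
            try {
                outcome.results.put(file, extractor.extract(file));
            } catch (IOException e) {
                outcome.failures.put(file, "I/O error: " + e.getMessage());
            } catch (RuntimeException e) {
                // Assumed to cover unchecked failures such as a denied extraction.
                outcome.failures.put(file, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return outcome;
    }
}
```

In real code the second argument could be a lambda that delegates to an existing instance, for example `file -> extractor.extract(file)`.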
  - check
```
public Map<String,Integer> check(File targetPDF)
                          throws IOException
```
    Recognizes the PDF file and returns the recognition result counts.
    
    Parameters:
    
    targetPDF - PDF file for recognition
    
    Returns:
    
    A Map containing the recognition results as key-value pairs of strings and integers.
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - check
```
public Map<String,Integer> check(File targetFile,
                                 RecognitionProperties properties)
                          throws IOException
```
    Recognizes the document and returns the recognition result counts.
    
    Parameters:
    
    targetFile - file for recognition
    
    properties - a RecognitionProperties instance
    
    Returns:
    
    A Map containing the recognition results as key-value pairs of strings and integers.
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - check
```
public Map<String,Integer> check(InputStream targetInputStream)
                          throws IOException
```
    Recognizes the PDF file and returns the recognition result counts.
    
    Parameters:
    
    targetInputStream - input stream of the PDF file for recognition
    
    Returns:
    
    A Map containing the recognition results as key-value pairs of strings and integers.
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
  - check
```
public Map<String,Integer> check(InputStream targetInputStream,
                                 RecognitionProperties properties)
                          throws IOException
```
    Recognizes the document and returns the recognition result counts.
    
    Parameters:
    
    targetInputStream - input stream of the file for recognition
    
    properties - a RecognitionProperties instance
    
    Returns:
    
    A Map containing the recognition results as key-value pairs of strings and integers.
    
    Throws:
    
    IOException - if any I/O issue occurs
    
    com.itextpdf.pdf2data.exceptions.DocumentExtractionDeniedException - if the PDF document is encrypted and text extraction is not permitted
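
Since every check overload returns a plain `Map<String,Integer>`, its output can be post-processed with standard collections code. The sketch below is one possible way to summarize such a map. It assumes, which this page does not state explicitly, that each key names a template field or selector and each value is the number of results found for it. `CheckSummary` and its methods are hypothetical names, not part of pdf2data.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of pdf2data.
final class CheckSummary {
    /** Returns the keys whose count is zero or missing, in sorted order. */
    static List<String> emptyKeys(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == null || e.getValue() == 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Sums all non-null counts in the map. */
    static int total(Map<String, Integer> counts) {
        return counts.values().stream()
                .filter(v -> v != null)
                .mapToInt(Integer::intValue)
                .sum();
    }
}
```

Under the stated assumption, a non-empty `emptyKeys` result would indicate entries for which nothing was recognized in the checked document.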
