public class Pdf2DataExtractor extends Object
To extract templates, use parseTemplateFromPDF(String templatePDF) or parseTemplateFromXML(String templateXML).
To perform a single data extraction, use recognize(Template template, String targetPDF).
To perform a batch extraction, it is recommended to create an instance of Pdf2DataExtractor and then call the recognize(String targetPDF) method for each pdf file.
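The batch recommendation above can be sketched as a small helper that applies one recognizer to many files and records a per-file outcome instead of stopping at the first failure. This is only an illustration: `BatchRecognition`, `Recognizer`, and `Outcome` are hypothetical names that are not part of this API, and `Recognizer` is a local stand-in that mirrors the shape of the instance method recognize(String targetPDF) so the sketch compiles without the library. Catching `RuntimeException` rests on the assumption that DocumentExtractionDeniedException is unchecked, which its absence from the `throws` clauses suggests but does not confirm.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchRecognition {

    /** Hypothetical stand-in with the same shape as recognize(String targetPDF). */
    @FunctionalInterface
    public interface Recognizer<R> {
        R recognize(String targetPDF) throws IOException;
    }

    /** Outcome for one file: either a result or the exception that stopped it. */
    public static final class Outcome<R> {
        public final R result;
        public final Exception error;

        Outcome(R result, Exception error) {
            this.result = result;
            this.error = error;
        }

        public boolean succeeded() {
            return error == null;
        }
    }

    /** Runs the recognizer on every path in order and keeps going after a failure. */
    public static <R> Map<String, Outcome<R>> recognizeAll(Recognizer<R> recognizer,
                                                           List<String> pdfPaths) {
        Map<String, Outcome<R>> outcomes = new LinkedHashMap<>();
        for (String path : pdfPaths) {
            try {
                outcomes.put(path, new Outcome<>(recognizer.recognize(path), null));
            } catch (IOException | RuntimeException e) {
                // Assumption: the "denied" exceptions are unchecked and land here.
                outcomes.put(path, new Outcome<>(null, e));
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Demonstration recognizer only: fails on one path, succeeds on the others.
        Recognizer<String> fake = path -> {
            if (path.contains("locked")) {
                throw new IOException("cannot read " + path);
            }
            return "parsed:" + path;
        };
        Map<String, Outcome<String>> outcomes =
                recognizeAll(fake, Arrays.asList("a.pdf", "locked.pdf", "b.pdf"));
        outcomes.forEach((path, o) -> System.out.println(
                path + " -> " + (o.succeeded() ? o.result : "failed: " + o.error.getMessage())));
    }
}
```

With the library on the classpath, a Pdf2DataExtractor built once from a parsed template could presumably be passed as `extractor::recognize`, so the template is parsed a single time for the whole batch.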
| Constructor and Description |
|---|
| Pdf2DataExtractor(com.duallab.pdf2data.template.Template template) |
| Modifier and Type | Method and Description |
|---|---|
| static com.duallab.pdf2data.template.Template | parseTemplateFromPDF(InputStream templateInputStream)<br>Parses the template from a pdf input stream. |
| static void | parseTemplateFromPDF(InputStream templateInputStream, OutputStream xmlStream)<br>Parses the template from a pdf input stream and saves it to an output stream in xml form. |
| static com.duallab.pdf2data.template.Template | parseTemplateFromPDF(String templatePDF)<br>Parses the template from a pdf file. |
| static void | parseTemplateFromPDF(String templatePDF, String outXML)<br>Parses the template from a pdf file and saves it to an xml file. |
| static com.duallab.pdf2data.template.Template | parseTemplateFromXML(InputStream xmlInputStream)<br>Gets the template from an xml input stream. |
| static com.duallab.pdf2data.template.Template | parseTemplateFromXML(String templateXML)<br>Gets the template from an xml file. |
| ParsingResult | recognize(InputStream targetInputStream)<br>Recognizes a pdf input stream using the template instance. |
| ParsingResult | recognize(InputStream targetInputStream, OutputStream pdfOutputStream)<br>Recognizes a pdf input stream using the template instance and writes the results to a pdf output stream. |
| ParsingResult | recognize(String targetPDF)<br>Recognizes a pdf file using the template instance. |
| ParsingResult | recognize(String targetPDF, String outputPDF)<br>Recognizes a pdf file using the template instance and writes the results to a pdf file. |
| static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream)<br>Recognizes a pdf input stream using the template instance. |
| static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream)<br>Recognizes a pdf input stream using the template instance and writes the results to a pdf output stream. |
| static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, String targetPDF)<br>Recognizes a pdf file using the template instance. |
| static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF)<br>Recognizes a pdf file using the template instance and writes the results to a pdf file. |
| void | recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream)<br>Recognizes a pdf input stream using the template instance and writes the results to xml and pdf output streams. |
| void | recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties)<br>Recognizes a pdf input stream using the template instance and writes the results to xml and pdf output streams. |
| void | recognizeToXML(String targetPDF, String outputPDF, String outputXML)<br>Recognizes a pdf file using the template instance and saves the results to xml and pdf files. |
| void | recognizeToXML(String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties)<br>Recognizes a pdf file using the template instance and saves the results to xml and pdf files. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream)<br>Recognizes a pdf input stream using the template instance and writes the results to an xml output stream. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream)<br>Recognizes a pdf input stream using the template instance and writes the results to xml and pdf output streams. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties)<br>Recognizes a pdf input stream using the template instance and writes the results to xml and pdf output streams. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream, RecognitionProperties properties)<br>Recognizes a pdf input stream using the template instance and writes the results to an xml output stream. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML)<br>Recognizes a pdf file using the template instance and saves the results to an xml file. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML, RecognitionProperties properties)<br>Recognizes a pdf file using the template instance and saves the results to an xml file. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML)<br>Recognizes a pdf file using the template instance and saves the results to xml and pdf files. |
| static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties)<br>Recognizes a pdf file using the template instance and saves the results to xml and pdf files. |
public Pdf2DataExtractor(com.duallab.pdf2data.template.Template template)
public static com.duallab.pdf2data.template.Template parseTemplateFromPDF(String templatePDF) throws IOException
templatePDF - a path to pdf file
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void parseTemplateFromPDF(String templatePDF, String outXML) throws IOException
templatePDF - a path to pdf file
outXML - a path to xml file
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static com.duallab.pdf2data.template.Template parseTemplateFromPDF(InputStream templateInputStream) throws IOException
templateInputStream - pdf input stream with template
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void parseTemplateFromPDF(InputStream templateInputStream, OutputStream xmlStream) throws IOException
templateInputStream - pdf input stream with template
xmlStream - output stream for saving template
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static com.duallab.pdf2data.template.Template parseTemplateFromXML(String templateXML) throws IOException
templateXML - a path to xml file
IOException
public static com.duallab.pdf2data.template.Template parseTemplateFromXML(InputStream xmlInputStream) throws IOException
xmlInputStream - input stream that contains template in xml form
IOException
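Because parseTemplateFromPDF(String templatePDF, String outXML) writes the template out as xml and parseTemplateFromXML(String templateXML) reads it back, a caller might keep an xml copy next to the pdf template and reuse it while it is still current. The sketch below covers only that decision, using file timestamps from the standard library. `TemplateCache`, `Source`, and `choose` are hypothetical names, and whether loading from xml is actually cheaper than parsing the pdf is an assumption this page does not confirm.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TemplateCache {

    /** Which of the documented loaders a caller would invoke next. */
    enum Source { PARSE_FROM_PDF, LOAD_FROM_XML }

    /**
     * Returns LOAD_FROM_XML when the xml copy exists and is at least as new as
     * the pdf template, otherwise PARSE_FROM_PDF. Throws an IOException if the
     * xml copy exists but the pdf template cannot be read.
     */
    static Source choose(Path templatePdf, Path cachedXml) throws IOException {
        if (!Files.isRegularFile(cachedXml)) {
            return Source.PARSE_FROM_PDF;
        }
        boolean stale = Files.getLastModifiedTime(cachedXml)
                .compareTo(Files.getLastModifiedTime(templatePdf)) < 0;
        return stale ? Source.PARSE_FROM_PDF : Source.LOAD_FROM_XML;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("template-cache");
        Path pdf = Files.createFile(dir.resolve("template.pdf"));
        Path xml = dir.resolve("template.xml");
        // No xml copy exists yet, so the pdf would have to be parsed.
        System.out.println(choose(pdf, xml));
    }
}
```

A caller would then map PARSE_FROM_PDF to the two-argument parseTemplateFromPDF, which refreshes the xml copy, and LOAD_FROM_XML to parseTemplateFromXML.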
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, String targetPDF) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public ParsingResult recognize(String targetPDF) throws IOException
targetPDF - path to pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public ParsingResult recognize(String targetPDF, String outputPDF) throws IOException
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public ParsingResult recognize(InputStream targetInputStream) throws IOException
targetInputStream - input stream from pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public ParsingResult recognize(InputStream targetInputStream, OutputStream pdfOutputStream) throws IOException
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
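The stream-based recognize overloads leave stream ownership to the caller, and this page does not say whether the library closes the streams it is given. A cautious pattern is to open both streams in a try-with-resources block so they are closed whether recognition succeeds or throws. In the sketch below, `StreamRecognition`, `StreamRecognizer`, and `recognizeFile` are hypothetical names; `StreamRecognizer` is a local stand-in that mirrors recognize(InputStream targetInputStream, OutputStream pdfOutputStream) so the example compiles without the library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamRecognition {

    /** Hypothetical stand-in shaped like recognize(InputStream, OutputStream). */
    @FunctionalInterface
    interface StreamRecognizer<R> {
        R recognize(InputStream targetInputStream, OutputStream pdfOutputStream)
                throws IOException;
    }

    /** Opens both files as streams, runs the recognizer, and always closes the streams. */
    static <R> R recognizeFile(StreamRecognizer<R> recognizer, Path targetPdf, Path outputPdf)
            throws IOException {
        try (InputStream in = Files.newInputStream(targetPdf);
             OutputStream out = Files.newOutputStream(outputPdf)) {
            return recognizer.recognize(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("target", ".pdf");
        Path annotated = Files.createTempFile("annotated", ".pdf");
        Files.write(source, new byte[] {1, 2, 3, 4});

        // Demonstration recognizer only: copies the bytes and reports how many it saw.
        StreamRecognizer<Integer> copier = (in, out) -> {
            int count = 0;
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                count++;
            }
            return count;
        };
        System.out.println("bytes seen: " + recognizeFile(copier, source, annotated));
    }
}
```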
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
outputXML - path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML, RecognitionProperties properties) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
outputXML - path to xml file with recognition results
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
outputXML - path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties) throws IOException
template - template instance
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
outputXML - path to xml file with recognition results
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
xmlOutputStream - output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
xmlOutputStream - output stream for writing recognition results (xml type)
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
xmlOutputStream - output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
template - template instance
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
xmlOutputStream - output stream for writing recognition results (xml type)
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(String targetPDF, String outputPDF, String outputXML) throws IOException
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
outputXML - path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties) throws IOException
targetPDF - path to pdf file for recognition
outputPDF - path to pdf file with recognition results (annotation type)
outputXML - path to xml file with recognition results
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream) throws IOException
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
xmlOutputStream - output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
targetInputStream - input stream from pdf file for recognition
pdfOutputStream - output stream for writing recognition results (pdf annotation type)
xmlOutputStream - output stream for writing recognition results (xml type)
properties - a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException - if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException - if pdf document is encrypted and creating/modifying annotations is not permitted
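The path-based recognizeToXML overloads take three file paths per document (targetPDF, outputPDF, outputXML), so a batch job needs some rule for deriving the two output paths from each target. The sketch below shows one possible rule using only the standard library. `OutputPaths`, `forTarget`, and the `_recognized` suffix are hypothetical choices made for illustration and are not conventions of this API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputPaths {
    final String outputPDF;
    final String outputXML;

    private OutputPaths(String outputPDF, String outputXML) {
        this.outputPDF = outputPDF;
        this.outputXML = outputXML;
    }

    /**
     * Derives output paths inside outputDir from the file name of targetPDF,
     * for example "in/invoice.pdf" gives "invoice_recognized.pdf" and "invoice.xml".
     * targetPDF must name a file, not a bare root directory.
     */
    static OutputPaths forTarget(String targetPDF, String outputDir) {
        String fileName = Paths.get(targetPDF).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Keep the whole name when there is no extension or the name starts with a dot.
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        Path dir = Paths.get(outputDir);
        return new OutputPaths(
                dir.resolve(base + "_recognized.pdf").toString(),
                dir.resolve(base + ".xml").toString());
    }

    public static void main(String[] args) {
        OutputPaths paths = forTarget("in/invoice.pdf", "out");
        System.out.println(paths.outputPDF);
        System.out.println(paths.outputXML);
    }
}
```

The two strings could then be passed as the outputPDF and outputXML arguments of recognizeToXML(String targetPDF, String outputPDF, String outputXML).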
Copyright © 2021. All rights reserved.