public class Pdf2DataExtractor extends Object
To obtain a template, use parseTemplateFromPDF(String templatePDF) or parseTemplateFromXML(String templateXML).
To perform a single data extraction, use the static method recognize(Template template, String targetPDF).
To perform a batch extraction, create one instance of Pdf2DataExtractor and call its recognize(String targetPDF) method for each PDF file.
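The batch recommendation above amounts to building the extractor once and reusing it for every file. The sketch below shows only that control flow. Because the library jar may not be on the classpath, `FileRecognizer` is a hypothetical stand-in for the instance method recognize(String targetPDF), and `BatchSketch` and `recognizeAll` are illustrative names that are not part of the library. Recording a null result for a failed file is also just one possible policy.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the instance method recognize(String targetPDF). */
@FunctionalInterface
interface FileRecognizer<R> {
    R recognize(String targetPdf) throws IOException;
}

final class BatchSketch {
    private BatchSketch() {}

    /**
     * Runs one shared recognizer over every path, in the given order.
     * A file that fails with IOException is recorded with a null result,
     * so one unreadable PDF does not abort the rest of the batch.
     */
    static <R> Map<String, R> recognizeAll(FileRecognizer<R> recognizer, List<String> pdfPaths) {
        Map<String, R> results = new LinkedHashMap<>();
        for (String path : pdfPaths) {
            try {
                results.put(path, recognizer.recognize(path));
            } catch (IOException e) {
                results.put(path, null);
            }
        }
        return results;
    }
}
```

With the real library, the wiring would presumably be `new Pdf2DataExtractor(Pdf2DataExtractor.parseTemplateFromPDF("template.pdf"))` followed by `BatchSketch.recognizeAll(path -> extractor.recognize(path), paths)`, where `template.pdf` and `paths` are placeholders.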
Constructor and Description |
---|
Pdf2DataExtractor(com.duallab.pdf2data.template.Template template) |
Modifier and Type | Method and Description |
---|---|
static com.duallab.pdf2data.template.Template | parseTemplateFromPDF(InputStream templateInputStream): Parses the template from a PDF input stream. |
static void | parseTemplateFromPDF(InputStream templateInputStream, OutputStream xmlStream): Parses the template from a PDF input stream and saves it to an output stream in XML form. |
static com.duallab.pdf2data.template.Template | parseTemplateFromPDF(String templatePDF): Parses the template from a PDF file. |
static void | parseTemplateFromPDF(String templatePDF, String outXML): Parses the template from a PDF file and saves it to an XML file. |
static com.duallab.pdf2data.template.Template | parseTemplateFromXML(InputStream xmlInputStream): Gets the template from an XML input stream. |
static com.duallab.pdf2data.template.Template | parseTemplateFromXML(String templateXML): Gets the template from an XML file. |
ParsingResult | recognize(InputStream targetInputStream): Recognizes a PDF file using the extractor's template. |
ParsingResult | recognize(InputStream targetInputStream, OutputStream pdfOutputStream): Recognizes a PDF file using the extractor's template and writes the results to a PDF output stream. |
ParsingResult | recognize(String targetPDF): Recognizes a PDF file using the extractor's template. |
ParsingResult | recognize(String targetPDF, String outputPDF): Recognizes a PDF file using the extractor's template and writes the results to a PDF file. |
static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream): Recognizes a PDF file using the given template. |
static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream): Recognizes a PDF file using the given template and writes the results to a PDF output stream. |
static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, String targetPDF): Recognizes a PDF file using the given template. |
static ParsingResult | recognize(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF): Recognizes a PDF file using the given template and writes the results to a PDF file. |
void | recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream): Recognizes a PDF file using the extractor's template and saves the results to XML and PDF output streams. |
void | recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties): Recognizes a PDF file using the extractor's template and saves the results to XML and PDF output streams. |
void | recognizeToXML(String targetPDF, String outputPDF, String outputXML): Recognizes a PDF file using the extractor's template and saves the results to XML and PDF files. |
void | recognizeToXML(String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties): Recognizes a PDF file using the extractor's template and saves the results to XML and PDF files. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream): Recognizes a PDF file using the given template and saves the results to an XML output stream. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream): Recognizes a PDF file using the given template and saves the results to XML and PDF output streams. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties): Recognizes a PDF file using the given template and saves the results to XML and PDF output streams. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream, RecognitionProperties properties): Recognizes a PDF file using the given template and saves the results to an XML output stream. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML): Recognizes a PDF file using the given template and saves the results to an XML file. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML, RecognitionProperties properties): Recognizes a PDF file using the given template and saves the results to an XML file. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML): Recognizes a PDF file using the given template and saves the results to XML and PDF files. |
static void | recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties): Recognizes a PDF file using the given template and saves the results to XML and PDF files. |
public Pdf2DataExtractor(com.duallab.pdf2data.template.Template template)
public static com.duallab.pdf2data.template.Template parseTemplateFromPDF(String templatePDF) throws IOException
templatePDF
- a path to pdf file
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void parseTemplateFromPDF(String templatePDF, String outXML) throws IOException
templatePDF
- a path to pdf file
outXML
- a path to xml file
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static com.duallab.pdf2data.template.Template parseTemplateFromPDF(InputStream templateInputStream) throws IOException
templateInputStream
- pdf input stream with template
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void parseTemplateFromPDF(InputStream templateInputStream, OutputStream xmlStream) throws IOException
templateInputStream
- input stream with template
xmlStream
- output stream for saving template
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static com.duallab.pdf2data.template.Template parseTemplateFromXML(String templateXML) throws IOException
templateXML
- a path to xml file
IOException
public static com.duallab.pdf2data.template.Template parseTemplateFromXML(InputStream xmlInputStream) throws IOException
xmlInputStream
- input stream that contains template in xml form
IOException
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, String targetPDF) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static ParsingResult recognize(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public ParsingResult recognize(String targetPDF) throws IOException
targetPDF
- path to pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public ParsingResult recognize(String targetPDF, String outputPDF) throws IOException
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public ParsingResult recognize(InputStream targetInputStream) throws IOException
targetInputStream
- input stream from pdf file for recognition
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public ParsingResult recognize(InputStream targetInputStream, OutputStream pdfOutputStream) throws IOException
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
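For the stream-based overloads, the caller has to open the streams and decide when to close them. The following is a minimal sketch of one way to do that with try-with-resources. `StreamRecognizer` is a hypothetical stand-in for recognize(InputStream targetInputStream, OutputStream pdfOutputStream), and `StreamSketch` and `recognizeWithStreams` are illustrative names. Closing the streams in the caller is an assumption, since this page does not say whether the library closes them itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for recognize(InputStream targetInputStream, OutputStream pdfOutputStream). */
@FunctionalInterface
interface StreamRecognizer<R> {
    R recognize(InputStream targetInputStream, OutputStream pdfOutputStream) throws IOException;
}

final class StreamSketch {
    private StreamSketch() {}

    /**
     * Opens both streams, runs the recognizer, and closes the streams
     * even if recognition throws. Closing here is an assumption: the
     * documentation does not state who owns the streams.
     */
    static <R> R recognizeWithStreams(StreamRecognizer<R> recognizer, Path targetPdf, Path outputPdf)
            throws IOException {
        try (InputStream in = Files.newInputStream(targetPdf);
             OutputStream out = Files.newOutputStream(outputPdf)) {
            return recognizer.recognize(in, out);
        }
    }
}
```

With the real class, the recognizer argument would presumably be a lambda such as `(in, out) -> extractor.recognize(in, out)`.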
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
outputXML
- path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputXML, RecognitionProperties properties) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
outputXML
- path to xml file with recognition results
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
outputXML
- path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties) throws IOException
template
- template instance
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
outputXML
- path to xml file with recognition results
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
xmlOutputStream
- output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
xmlOutputStream
- output stream for writing recognition results (xml type)
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
xmlOutputStream
- output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public static void recognizeToXML(com.duallab.pdf2data.template.Template template, InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
template
- template instance
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
xmlOutputStream
- output stream for writing recognition results (xml type)
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(String targetPDF, String outputPDF, String outputXML) throws IOException
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
outputXML
- path to xml file with recognition results
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
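Because this overload takes three file paths, a caller processing many PDFs needs a consistent way to derive the two output paths from each input. The helper below is one possible sketch using only java.nio.file. The class `OutputPaths`, its method names, and the `_annotated.pdf` / `.xml` naming scheme are all assumptions made for illustration, not conventions of the library.

```java
import java.nio.file.Path;

final class OutputPaths {
    private OutputPaths() {}

    /** Returns the file name of targetPdf without its extension. */
    static String baseName(Path targetPdf) {
        String name = targetPdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep names such as ".hidden" intact: only strip when the dot is not the first character.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Path for the annotated PDF: outputDir/<base>_annotated.pdf (naming is an assumption). */
    static Path annotatedPdf(Path targetPdf, Path outputDir) {
        return outputDir.resolve(baseName(targetPdf) + "_annotated.pdf");
    }

    /** Path for the XML results: outputDir/<base>.xml (naming is an assumption). */
    static Path resultXml(Path targetPdf, Path outputDir) {
        return outputDir.resolve(baseName(targetPdf) + ".xml");
    }
}
```

The derived paths would then be passed as strings, for example `extractor.recognizeToXML(target.toString(), OutputPaths.annotatedPdf(target, outDir).toString(), OutputPaths.resultXml(target, outDir).toString())`, where `extractor`, `target`, and `outDir` are placeholders.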
public void recognizeToXML(String targetPDF, String outputPDF, String outputXML, RecognitionProperties properties) throws IOException
targetPDF
- path to pdf file for recognition
outputPDF
- path to pdf file with recognition results (annotation type)
outputXML
- path to xml file with recognition results
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream) throws IOException
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
xmlOutputStream
- output stream for writing recognition results (xml type)
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
public void recognizeToXML(InputStream targetInputStream, OutputStream pdfOutputStream, OutputStream xmlOutputStream, RecognitionProperties properties) throws IOException
targetInputStream
- input stream from pdf file for recognition
pdfOutputStream
- output stream for writing recognition results (pdf annotation type)
xmlOutputStream
- output stream for writing recognition results (xml type)
properties
- a RecognitionProperties instance
IOException
com.duallab.pdf2data.exception.DocumentExtractionDeniedException
- if pdf document is encrypted and extracting text is not permitted
com.duallab.pdf2data.exception.DocumentAnnotationsDeniedException
- if pdf document is encrypted and creating/modifying annotations is not permitted
Copyright © 2020. All rights reserved.