Class PdfReader

java.lang.Object
com.itextpdf.kernel.pdf.PdfReader
All Implemented Interfaces:
Closeable, AutoCloseable

public class PdfReader extends Object implements Closeable
Reads a PDF document.
  • Field Details

    • DEFAULT_STRICTNESS_LEVEL

      public static final PdfReader.StrictnessLevel DEFAULT_STRICTNESS_LEVEL
      The default PdfReader.StrictnessLevel to be used.
    • correctStreamLength

      protected static boolean correctStreamLength
    • tokens

      protected PdfTokenizer tokens
    • decrypt

      protected PdfEncryption decrypt
    • headerPdfVersion

      protected PdfVersion headerPdfVersion
    • lastXref

      protected long lastXref
    • eofPos

      protected long eofPos
    • trailer

      protected PdfDictionary trailer
    • pdfDocument

      protected PdfDocument pdfDocument
    • pdfAConformanceLevel

      protected PdfAConformanceLevel pdfAConformanceLevel
    • properties

      protected ReaderProperties properties
    • encrypted

      protected boolean encrypted
    • rebuiltXref

      protected boolean rebuiltXref
    • hybridXref

      protected boolean hybridXref
    • fixedXref

      protected boolean fixedXref
    • xrefStm

      protected boolean xrefStm
  • Constructor Details

    • PdfReader

      public PdfReader (IRandomAccessSource byteSource, ReaderProperties properties) throws IOException
      Constructs a new PdfReader.
      Parameters:
      byteSource - source of bytes for the reader
      properties - properties of the created reader
      Throws:
      IOException - if an I/O error occurs
    • PdfReader

      public PdfReader (InputStream is, ReaderProperties properties) throws IOException
      Reads and parses a PDF document.
      Parameters:
      is - the InputStream containing the document. If the stream is an instance of RASInputStream, its underlying IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
      properties - properties of the created reader
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader (File file) throws FileNotFoundException, IOException
      Reads and parses a PDF document.
      Parameters:
      file - the File containing the document.
      Throws:
      IOException - on error
      FileNotFoundException - when the specified File is not found
    • PdfReader

      public PdfReader (InputStream is) throws IOException
      Reads and parses a PDF document.
      Parameters:
      is - the InputStream containing the document. If the stream is an instance of RASInputStream, its underlying IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
      Throws:
      IOException - on error
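The InputStream constructors above read the stream to the end without closing it, so the caller still owns the stream. The following is a minimal stdlib-only sketch of that contract, not iText's implementation; `ReadToEndSketch` and `readToEnd` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadToEndSketch {
    // Hypothetical helper: drains the stream into memory and leaves it open,
    // mirroring the "read to the end but not closed" behavior described above.
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // 'in' is deliberately not closed here; the caller remains responsible for it.
        return out.toByteArray();
    }
}
```

Under this contract a caller would typically open the stream in a try-with-resources statement, so that it is closed once the reader no longer needs it.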
    • PdfReader

      public PdfReader (String filename, ReaderProperties properties) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      properties - properties of the created reader
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader (String filename) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader (File file, ReaderProperties properties) throws IOException
      Reads and parses a PDF document.
      Parameters:
      file - the file of the document
      properties - properties of the created reader
      Throws:
      IOException - on error
  • Method Details

    • close

      public void close() throws IOException
      Closes the underlying PdfTokenizer.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - on error.
    • setUnethicalReading

      public PdfReader setUnethicalReading (boolean unethicalReading)
      iText is not responsible if you decide to change the value of this parameter.
      Parameters:
      unethicalReading - true to enable unethicalReading, false to disable it. By default unethicalReading is disabled.
      Returns:
      this PdfReader instance.
    • setMemorySavingMode

      public PdfReader setMemorySavingMode (boolean memorySavingMode)
      Defines if memory saving mode is enabled.

      By default memory saving mode is disabled, which trades higher memory use for faster processing.

      If memory saving mode is enabled, document processing might slow down, but reading will be less memory demanding.

      Parameters:
      memorySavingMode - true to enable memory saving mode, false to disable it.
      Returns:
      this PdfReader instance.
    • getStrictnessLevel

      public PdfReader.StrictnessLevel getStrictnessLevel()
      Get the current PdfReader.StrictnessLevel of the reader.
      Returns:
      the current PdfReader.StrictnessLevel
    • setStrictnessLevel

      public PdfReader setStrictnessLevel (PdfReader.StrictnessLevel strictnessLevel)
      Set the PdfReader.StrictnessLevel for the reader. If the argument is null, then the DEFAULT_STRICTNESS_LEVEL will be used.
      Parameters:
      strictnessLevel - the PdfReader.StrictnessLevel to set
      Returns:
      this PdfReader instance
    • isCloseStream

      public boolean isCloseStream()
      Gets whether the close() method closes the input stream.
      Returns:
      true if the close() method will close the input stream, otherwise false.
    • setCloseStream

      public void setCloseStream (boolean closeStream)
      Sets whether the close() method closes the input stream.
      Parameters:
      closeStream - true if the close() method shall close the input stream, otherwise false.
    • hasRebuiltXref

      public boolean hasRebuiltXref()
      If an exception occurs while reading the XRef section, PdfReader tries to rebuild it.
      Returns:
      true if PdfReader rebuilt the Cross-Reference section.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasHybridXref

      public boolean hasHybridXref()
      Some documents contain a hybrid XRef. For more information see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
      Returns:
      true if the document has a hybrid Cross-Reference section.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasXrefStm

      public boolean hasXrefStm()
      Indicates whether the document has Cross-Reference Streams.
      Returns:
      true, if the document has Cross-Reference Streams.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasFixedXref

      public boolean hasFixedXref()
      If an exception occurs while reading a PdfObject, PdfReader tries to fix the offsets of all objects.

      This method's returned value might change over time, because PdfObjects reading can be postponed even up to document closing.

      Returns:
      true, if PdfReader fixed offsets of PdfObjects.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getLastXref

      public long getLastXref()
      Gets the position of the last Cross-Reference table.
      Returns:
      -1 if the Cross-Reference table has been rebuilt, otherwise the position of the last Cross-Reference table.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
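For context, a PDF file records the byte offset of its last cross-reference section after the `startxref` keyword near the end of the file. The sketch below shows how such an offset could be located in a raw byte array using only the standard library. It is an illustration of the file format under simplifying assumptions (the keyword is searched for in the whole buffer, and no validation is performed), not the logic PdfReader uses; `StartXrefSketch` and `findLastXrefOffset` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefSketch {
    private static final String KEYWORD = "startxref";

    // Returns the offset written after the last "startxref" keyword,
    // or -1 if the keyword or the number following it is missing.
    static long findLastXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf(KEYWORD);
        if (idx < 0) {
            return -1;
        }
        int i = idx + KEYWORD.length();
        // Skip the whitespace (usually an end-of-line marker) after the keyword.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1;
        }
        return Long.parseLong(text.substring(start, i));
    }
}
```

The -1 return value in this sketch only signals "not found". It is chosen to resemble the convention described above and does not imply that a rebuild has taken place.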
    • readStreamBytes

      public byte[] readStreamBytes (PdfStream stream, boolean decode) throws IOException
      Reads, decrypts and optionally decodes stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
      Parameters:
      stream - a PdfStream instance to be read and optionally decoded.
      decode - true to get decoded stream bytes, false to leave them as originally encoded.
      Returns:
      byte[] array.
      Throws:
      IOException - on error.
    • readStreamBytesRaw

      public byte[] readStreamBytesRaw (PdfStream stream) throws IOException
      Reads and decrypts stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
      Parameters:
      stream - a PdfStream stream instance to be read
      Returns:
      byte[] array.
      Throws:
      IOException - on error.
    • readStream

      public InputStream readStream (PdfStream stream, boolean decode) throws IOException
      Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream. The user is responsible for closing the returned stream.
      Parameters:
      stream - a PdfStream instance to be read
      decode - true to get the decoded stream, false to leave it as originally encoded.
      Returns:
      InputStream, or null if reading failed.
      Throws:
      IOException - on error.
    • decodeBytes

      public static byte[] decodeBytes (byte[] b, PdfDictionary streamDictionary)
      Decode bytes applying the filters specified in the provided dictionary using default filter handlers.
      Parameters:
      b - the bytes to decode
      streamDictionary - the dictionary that contains filter information
      Returns:
      the decoded bytes
      Throws:
      PdfException - if there are any problems decoding the bytes
    • decodeBytes

      public static byte[] decodeBytes (byte[] b, PdfDictionary streamDictionary, Map<PdfName,IFilterHandler> filterHandlers)
      Decode a byte[] applying the filters specified in the provided dictionary using the provided filter handlers.
      Parameters:
      b - the bytes to decode
      streamDictionary - the dictionary that contains filter information
      filterHandlers - the map used to look up a handler for each type of filter
      Returns:
      the decoded bytes
      Throws:
      PdfException - if there are any problems decoding the bytes
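The two decodeBytes overloads differ only in where the filter handlers come from. The idea of looking up one handler per filter name and applying the handlers in order can be sketched with the standard library alone. This is a simplified stand-in, not iText's code: `FilterChainSketch` is a hypothetical class, filter names are plain strings instead of PdfName, and handlers are `UnaryOperator<byte[]>` instead of IFilterHandler. The only handler provided inflates Flate-compressed data via `java.util.zip.Inflater`.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class FilterChainSketch {
    // Applies each named filter in order, using the handler registered for that name.
    static byte[] decode(byte[] data, List<String> filterNames,
                         Map<String, UnaryOperator<byte[]>> handlers) {
        byte[] current = data;
        for (String name : filterNames) {
            UnaryOperator<byte[]> handler = handlers.get(name);
            if (handler == null) {
                throw new IllegalArgumentException("No handler for filter " + name);
            }
            current = handler.apply(current);
        }
        return current;
    }

    // Example handler: inflates zlib/deflate data, as a Flate-style filter would.
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Avoid looping forever on truncated input or preset dictionaries.
                    throw new IllegalStateException("Truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Invalid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Passing the handlers as a map keeps the loop independent of any particular filter, so a caller can add or replace a handler without touching the loop.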
    • getSafeFile

      public RandomAccessFileOrArray getSafeFile()
      Gets a new file instance of the original PDF document.
      Returns:
      a new file instance of the original PDF document
    • getFileLength

      public long getFileLength()
      Provides the size of the opened file.
      Returns:
      The size of the opened file.
    • isOpenedWithFullPermission

      public boolean isOpenedWithFullPermission()
      Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
      Returns:
      true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getPermissions

      public long getPermissions()
      Gets the encryption permissions. It can be used directly in WriterProperties.setStandardEncryption(byte[], byte[], int, int). See ISO 32000-1, Table 22 for more details.
      Returns:
      the encryption permissions, an unsigned 32-bit quantity.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
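The value returned by getPermissions() is a bit field. The sketch below shows one way individual permissions could be tested, using the bit positions from ISO 32000-1, Table 22 (bit 1 is the lowest-order bit). `PermissionBitsSketch` and its constants are defined locally for illustration and are assumptions of this example; they are not identifiers taken from iText.

```java
final class PermissionBitsSketch {
    // Bit values per ISO 32000-1, Table 22; names are local to this sketch.
    static final int PRINT = 1 << 2;                      // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;            // bit 4
    static final int COPY = 1 << 4;                       // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;         // bit 6
    static final int FILL_IN_FORMS = 1 << 8;              // bit 9
    static final int EXTRACT_FOR_ACCESSIBILITY = 1 << 9;  // bit 10
    static final int ASSEMBLE = 1 << 10;                  // bit 11
    static final int PRINT_HIGH_QUALITY = 1 << 11;        // bit 12

    // True if every bit of 'flag' is set in the permissions value.
    static boolean isAllowed(long permissions, int flag) {
        return (permissions & flag) == flag;
    }
}
```

For example, `PermissionBitsSketch.isAllowed(permissions, PermissionBitsSketch.COPY)` would report whether the copy bit is set in a previously obtained `permissions` value. An application may also want to consult isOpenedWithFullPermission() before enforcing these bits.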
    • getCryptoMode

      public int getCryptoMode()
      Gets encryption algorithm and access permissions.
      Returns:
      int value corresponding to a certain type of encryption.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getPdfAConformanceLevel

      public PdfAConformanceLevel getPdfAConformanceLevel()
      Gets the declared PDF/A conformance level of the source document that is being read. Note that this information is provided via XMP metadata and is not verified by iText. pdfAConformanceLevel is lazily initialized on the first call of this method.
      Returns:
      conformance level of the source document, or null if no PDF/A conformance level information is specified.
    • computeUserPassword

      public byte[] computeUserPassword()
      Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
      Returns:
      user password, or null if a non-standard encryption handler was used or if the owner password wasn't used to open the document.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getOriginalFileId

      public byte[] getOriginalFileId()
      Gets the original file ID, the first element of the PdfName.ID array in the trailer. If the size of the ID array does not equal 2, an empty array will be returned.

      The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document ID can be retrieved from PdfDocument.getOriginalDocumentId().

      Returns:
      byte array representing the original file ID.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getModifiedFileId

      public byte[] getModifiedFileId()
      Gets the modified file ID, the second element of the PdfName.ID array in the trailer. If the size of the ID array does not equal 2, an empty array will be returned.

      The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document ID can be retrieved from PdfDocument.getModifiedDocumentId().

      Returns:
      byte array representing the modified file ID.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • isEncrypted

      public boolean isEncrypted()
      Checks if the PdfDocument read with this PdfReader is encrypted.
      Returns:
      true if the document is encrypted, otherwise false.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • readPdf

      protected void readPdf() throws IOException
      Parses the entire PDF.
      Throws:
      IOException - if an I/O error occurs.
    • readObjectStream

      protected void readObjectStream (PdfStream objectStream) throws IOException
      Throws:
      IOException
    • readObject

      protected PdfObject readObject (PdfIndirectReference reference)
    • readObject

      protected PdfObject readObject (boolean readAsDirect) throws IOException
      Throws:
      IOException
    • readReference

      protected PdfObject readReference (boolean readAsDirect)
    • readObject

      protected PdfObject readObject (boolean readAsDirect, boolean objStm) throws IOException
      Throws:
      IOException
    • readPdfName

      protected PdfName readPdfName (boolean readAsDirect)
    • readDictionary

      protected PdfDictionary readDictionary (boolean objStm) throws IOException
      Throws:
      IOException
    • readArray

      protected PdfArray readArray (boolean objStm) throws IOException
      Throws:
      IOException
    • readXref

      protected void readXref() throws IOException
      Throws:
      IOException
    • readXrefSection

      protected PdfDictionary readXrefSection() throws IOException
      Throws:
      IOException
    • readXrefStream

      protected boolean readXrefStream (long ptr) throws IOException
      Throws:
      IOException
    • fixXref

      protected void fixXref() throws IOException
      Throws:
      IOException
    • rebuildXref

      protected void rebuildXref() throws IOException
      Throws:
      IOException
    • getXrefPrev

      protected PdfNumber getXrefPrev (PdfObject prevObjectToCheck)