Class PdfTokenizer

java.lang.Object
com.itextpdf.io.source.PdfTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable

public class PdfTokenizer extends Object implements Closeable
  • Field Details

    • Obj

      public static final byte[] Obj
    • R

      public static final byte[] R
    • Xref

      public static final byte[] Xref
    • Startxref

      public static final byte[] Startxref
    • Stream

      public static final byte[] Stream
    • Trailer

      public static final byte[] Trailer
    • N

      public static final byte[] N
    • F

      public static final byte[] F
    • Null

      public static final byte[] Null
    • True

      public static final byte[] True
    • False

      public static final byte[] False
    • type

      protected PdfTokenizer.TokenType type
    • reference

      protected int reference
    • generation

      protected int generation
    • hexString

      protected boolean hexString
    • outBuf

      protected ByteBuffer outBuf
  • Constructor Details

    • PdfTokenizer

      public PdfTokenizer (RandomAccessFileOrArray file)
      Creates a PdfTokenizer for the specified RandomAccessFileOrArray. The beginning of the file is read to determine the location of the header, and the data source is adjusted as necessary to account for any junk that occurs in the byte source before the header.
      Parameters:
      file - the source
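      The header search described above can be illustrated with a small standalone sketch. It does not call iText: the class HeaderOffsetSketch, its method findHeaderOffset, and the 1024-byte search window are hypothetical choices made for this example, and it assumes the header is recognized by the ASCII marker %PDF-.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not iText code): find where the "%PDF-" header starts. */
final class HeaderOffsetSketch {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed limit on how much leading junk is tolerated before the header.
    private static final int SEARCH_WINDOW = 1024;

    /** Returns the offset of the first "%PDF-" starting within the window, or -1 if absent. */
    static int findHeaderOffset(byte[] data) {
        int lastStart = Math.min(data.length - HEADER.length, SEARCH_WINDOW);
        for (int i = 0; i <= lastStart; i++) {
            int j = 0;
            // Compare the marker byte by byte at candidate offset i.
            while (j < HEADER.length && data[i + j] == HEADER[j]) {
                j++;
            }
            if (j == HEADER.length) {
                return i;
            }
        }
        return -1;
    }
}
```

      For the input "GARBAGE%PDF-1.4" the sketch returns 7; a tokenizer that adjusts for leading junk would then treat that offset as the logical start of the file.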
  • Method Details

    • seek

      public void seek (long pos)
    • readFully

      public void readFully (byte[] bytes) throws IOException
      Throws:
      IOException
    • getPosition

      public long getPosition()
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • length

      public long length()
    • read

      public int read() throws IOException
      Throws:
      IOException
    • peek

      public int peek() throws IOException
      Gets the next byte of the PDF source without moving the source position.
      Returns:
      the byte, or -1 if EOF is reached
      Throws:
      IOException - in case of any reading error.
    • peek

      public int peek (byte[] buffer) throws IOException
      Gets the next buffer.length bytes of the PDF source without moving the source position.
      Parameters:
      buffer - buffer to store read bytes
      Returns:
      the number of bytes read. If it is less than buffer.length, EOF has been reached.
      Throws:
      IOException - in case of any reading error.
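      The difference between peek and read can be sketched over an in-memory array. PeekSketch is a hypothetical class, not iText's implementation; in particular, returning 0 (rather than -1) from peek(byte[]) when no bytes remain is an assumption of this sketch.

```java
/** Hypothetical sketch of peek semantics over an in-memory array; not iText's implementation. */
final class PeekSketch {
    private final byte[] data;
    private int position;

    PeekSketch(byte[] data) {
        this.data = data;
    }

    /** Reads one byte (0-255) and advances, or returns -1 at EOF. */
    int read() {
        return position < data.length ? (data[position++] & 0xFF) : -1;
    }

    /** Returns the next byte (0-255) without advancing, or -1 at EOF. */
    int peek() {
        return position < data.length ? (data[position] & 0xFF) : -1;
    }

    /** Copies up to buffer.length bytes without advancing; a short count means EOF was reached. */
    int peek(byte[] buffer) {
        int count = Math.min(buffer.length, data.length - position);
        System.arraycopy(data, position, buffer, 0, count);
        return count;
    }

    long getPosition() {
        return position;
    }
}
```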
    • readString

      public String readString (int size) throws IOException
      Throws:
      IOException
    • getTokenType

      public PdfTokenizer.TokenType getTokenType()
    • getByteContent

      public byte[] getByteContent()
    • getStringValue

      public String getStringValue()
    • getDecodedStringContent

      public byte[] getDecodedStringContent()
    • tokenValueEqualsTo

      public boolean tokenValueEqualsTo (byte[] cmp)
    • getObjNr

      public int getObjNr()
    • getGenNr

      public int getGenNr()
    • backOnePosition

      public void backOnePosition (int ch)
    • getHeaderOffset

      public int getHeaderOffset() throws IOException
      Throws:
      IOException
    • checkPdfHeader

      public String checkPdfHeader() throws IOException
      Throws:
      IOException
    • checkFdfHeader

      public void checkFdfHeader() throws IOException
      Throws:
      IOException
    • getStartxref

      public long getStartxref() throws IOException
      Throws:
      IOException
    • getNextEof

      public long getNextEof() throws IOException
      Gets the next %%EOF marker in the current PDF file.
      Returns:
      the position of the next %%EOF marker
      Throws:
      IOException - if an input/output error occurs while reading the PDF document
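      A minimal sketch of scanning forward for the next %%EOF marker, over a plain byte array rather than iText's RandomAccessFileOrArray. NextEofSketch and findNextEof are hypothetical names. The sketch returns the offset where the marker starts; whether getNextEof reports the start or the end of the marker is not stated above, so treat that choice as an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: locate the next "%%EOF" marker; not iText's implementation. */
final class NextEofSketch {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the next "%%EOF" at or after from, or -1 if none remains. */
    static long findNextEof(byte[] data, int from) {
        for (int i = Math.max(from, 0); i <= data.length - EOF_MARKER.length; i++) {
            int j = 0;
            while (j < EOF_MARKER.length && data[i + j] == EOF_MARKER[j]) {
                j++;
            }
            if (j == EOF_MARKER.length) {
                return i; // start offset of the marker
            }
        }
        return -1;
    }
}
```

      Files with incremental updates contain several %%EOF markers, so calling findNextEof again from one past the previous hit walks through them in order.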
    • nextValidToken

      public void nextValidToken() throws IOException
      Throws:
      IOException
    • nextToken

      public boolean nextToken() throws IOException
      Throws:
      IOException
    • getLongValue

      public long getLongValue()
    • getIntValue

      public int getIntValue()
    • isHexString

      public boolean isHexString()
    • isCloseStream

      public boolean isCloseStream()
    • setCloseStream

      public void setCloseStream (boolean closeStream)
    • getSafeFile

      public RandomAccessFileOrArray getSafeFile()
    • decodeStringContent

      protected static byte[] decodeStringContent (byte[] content, int from, int to, boolean hexWriting)
      Resolve escape symbols or hexadecimal symbols.

      NOTE: Per PDF Reference 1.7, section 3.2.3, string values contain ASCII characters, so they can be converted directly to a byte array.

      Parameters:
      content - string bytes to be decoded
      from - given start index
      to - given end index
      hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'; false otherwise, e.g. '((iText( some version)…)'
      Returns:
      the decoded bytes, for decrypting or for creating a String.
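      For the hexWriting == true case, decoding follows the PDF hex-string rules: whitespace between digits is ignored, each pair of hex digits yields one byte, and an odd final digit is treated as if it were followed by 0. The sketch below illustrates only that branch. HexStringSketch and decodeHex are hypothetical names; the sketch skips any non-hex byte (the PDF rules only permit whitespace there) and assumes to is an inclusive index.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the hex branch of string decoding; not iText's code. */
final class HexStringSketch {
    /** Decodes the hex digits in content[from..to], with to assumed inclusive. */
    static byte[] decodeHex(byte[] content, int from, int to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 when none is pending
        for (int i = from; i <= to; i++) {
            int digit = Character.digit(content[i] & 0xFF, 16);
            if (digit < 0) {
                continue; // whitespace (and, in this sketch, any other non-hex byte) is skipped
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd digit count: the missing final digit is taken as 0
        }
        return out.toByteArray();
    }
}
```

      Applied to the bytes of 69546578, decodeHex yields the ASCII bytes of iTex, matching the example above.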
    • decodeStringContent

      public static byte[] decodeStringContent (byte[] content, boolean hexWriting)
      Resolve escape symbols or hexadecimal symbols.
      NOTE: Per PDF Reference 1.7, section 3.2.3, string values contain ASCII characters, so they can be converted directly to a byte array.
      Parameters:
      content - string bytes to be decoded
      hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'; false otherwise, e.g. '((iText( some version)…)'
      Returns:
      the decoded bytes, for decrypting or for creating a String.
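      For hexWriting == false, the escape sequences of a literal string have to be resolved. The sketch below handles the escapes defined for PDF literal strings (\n, \r, \t, \b, \f, \(, \), \\, octal \ddd, and a backslash before an end-of-line as a line continuation). LiteralStringSketch and decodeLiteral are hypothetical names; the input is assumed to be the bytes between the outer parentheses, and normalization of unescaped end-of-line sequences is deliberately left out.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of literal-string escape decoding; not iText's code. */
final class LiteralStringSketch {
    /** Decodes the bytes between the outer parentheses of a PDF literal string. */
    static byte[] decodeLiteral(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < content.length) {
            int ch = content[i++] & 0xFF;
            if (ch != '\\') {
                out.write(ch); // ordinary byte, including balanced unescaped parentheses
                continue;
            }
            if (i >= content.length) {
                break; // dangling backslash at the end: dropped in this sketch
            }
            int next = content[i++] & 0xFF;
            switch (next) {
                case 'n': out.write('\n'); break;
                case 'r': out.write('\r'); break;
                case 't': out.write('\t'); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                case '(': case ')': case '\\': out.write(next); break;
                case '\r': // backslash + CR or CRLF is a line continuation: emit nothing
                    if (i < content.length && content[i] == '\n') {
                        i++;
                    }
                    break;
                case '\n': break; // backslash + LF is also a line continuation
                default:
                    if (next >= '0' && next <= '7') {
                        int value = next - '0';
                        // Up to two more octal digits; high-order overflow is discarded.
                        for (int k = 0; k < 2 && i < content.length
                                && content[i] >= '0' && content[i] <= '7'; k++) {
                            value = value * 8 + (content[i++] - '0');
                        }
                        out.write(value & 0xFF);
                    } else {
                        out.write(next); // unknown escape: the backslash is ignored
                    }
            }
        }
        return out.toByteArray();
    }
}
```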
    • isWhitespace

      public static boolean isWhitespace (int ch)
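      Checks whether a character is a whitespace character. Currently the following character codes are treated as whitespace: 0, 9, 10, 12, 13, 32.
      The same as calling isWhitespace(ch, true).
      Parameters:
      ch - the character code to check
      Returns:
      true if ch is a whitespace character, false otherwise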
    • isWhitespace

      protected static boolean isWhitespace (int ch, boolean isWhitespace)
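      Checks whether a character is a whitespace character. Currently the following character codes are checked: 0, 9, 10, 12, 13, 32, where 0 counts as whitespace only if isWhitespace is true.
      Parameters:
      ch - the character code to check
      isWhitespace - whether 0 (NUL) is treated as whitespace
      Returns:
      true if ch is a whitespace character, false otherwise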
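      The whitespace rule can be mirrored in a few lines. WhitespaceSketch is a hypothetical class, and it assumes, as the isNullWhitespace parameter of readLineSegment suggests, that the boolean flag only decides whether NUL (0) counts as whitespace.

```java
/** Hypothetical mirror of the whitespace rules described above; not iText's source. */
final class WhitespaceSketch {
    /** Whitespace codes: NUL 0 (only if nulIsWhitespace), TAB 9, LF 10, FF 12, CR 13, SPACE 32. */
    static boolean isWhitespace(int ch, boolean nulIsWhitespace) {
        return (nulIsWhitespace && ch == 0)
                || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Same as isWhitespace(ch, true). */
    static boolean isWhitespace(int ch) {
        return isWhitespace(ch, true);
    }
}
```

      Vertical tab (11) is not in the list, so isWhitespace(11) is false under these rules.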
    • isDelimiter

      protected static boolean isDelimiter (int ch)
    • isDelimiterWhitespace

      protected static boolean isDelimiterWhitespace (int ch)
    • throwError

      public void throwError (String error, Object... messageParams)
      Helper method to handle content errors. Adds the current file position to the PdfRuntimeException.
      Parameters:
      error - the error message.
      messageParams - parameters for the error message.
      Throws:
      IOException - wraps the error message into a PdfRuntimeException and adds the position in the file.
    • checkTrailer

      public static boolean checkTrailer (ByteBuffer line)
      Checks whether the line equals 'trailer'.
      Parameters:
      line - the line to check
      Returns:
      true if the line equals 'trailer', false otherwise
    • readLineSegment

      public boolean readLineSegment (ByteBuffer buffer) throws IOException
      Reads data into the provided ByteBuffer, skipping leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
      The same as calling readLineSegment(buffer, true).
      Parameters:
      buffer - a ByteBuffer to which the result of reading will be saved
      Returns:
      true if something was read or the end of the input stream has not been reached
      Throws:
      IOException - in case of any reading error
    • readLineSegment

      public boolean readLineSegment (ByteBuffer buffer, boolean isNullWhitespace) throws IOException
      Reads data into the provided ByteBuffer, skipping leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
      Parameters:
      buffer - a ByteBuffer to which the result of reading will be saved
      isNullWhitespace - whether 0 (NUL) is treated as whitespace. If in doubt, use true or the overloaded method readLineSegment(buffer)
      Returns:
      true if something was read or the end of the input stream has not been reached
      Throws:
      IOException - in case of any reading error
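      The behavior of both readLineSegment overloads can be approximated with the sketch below, which works on an in-memory array and writes into a java.io.ByteArrayOutputStream instead of iText's ByteBuffer. LineSegmentSketch is a hypothetical class; treating LF, CR, and CRLF as line terminators is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of reading one line segment from an in-memory source; not iText's code. */
final class LineSegmentSketch {
    private final byte[] data;
    private int pos;

    LineSegmentSketch(byte[] data) {
        this.data = data;
    }

    private static boolean isWhitespace(int ch, boolean nulIsWhitespace) {
        return (nulIsWhitespace && ch == 0) || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /**
     * Skips leading whitespace, then copies bytes into out until LF, CR (or CRLF), or EOF.
     * Returns false only when EOF is hit before any non-whitespace byte.
     */
    boolean readLineSegment(ByteArrayOutputStream out, boolean nulIsWhitespace) {
        while (pos < data.length && isWhitespace(data[pos] & 0xFF, nulIsWhitespace)) {
            pos++;
        }
        if (pos >= data.length) {
            return false;
        }
        while (pos < data.length) {
            int ch = data[pos++] & 0xFF;
            if (ch == '\n') {
                break;
            }
            if (ch == '\r') {
                if (pos < data.length && data[pos] == '\n') {
                    pos++; // treat CRLF as a single end-of-line
                }
                break;
            }
            out.write(ch);
        }
        return true;
    }
}
```

      Only leading whitespace is skipped, so a line such as "0 6" keeps its inner space.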
    • checkObjectStart

      public static int[] checkObjectStart (PdfTokenizer lineTokenizer)
      Checks whether the line starts with an object declaration.
      Parameters:
      lineTokenizer - a tokenizer built over a single line.
      Returns:
      the object number and generation number if the check succeeds, otherwise null.
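      A simplified, string-based sketch of the check: it looks for a positive object number and a non-negative generation number followed by the obj keyword at the start of a line. ObjectStartSketch is a hypothetical class; it splits on PDF whitespace instead of running a real tokenizer, and accepting obj immediately followed by a delimiter such as << is a convenience assumption.

```java
/** Hypothetical sketch: detect an "N G obj" declaration at the start of a line; not iText's code. */
final class ObjectStartSketch {
    /** Returns {objectNumber, generation} if the line starts with "N G obj", otherwise null. */
    static int[] checkObjectStart(String line) {
        // Split on PDF whitespace (NUL, TAB, LF, FF, CR, SPACE); keep at most 4 pieces.
        String[] tokens = line.trim().split("[\\x00\\t\\n\\f\\r ]+", 4);
        if (tokens.length < 3) {
            return null;
        }
        String keyword = tokens[2];
        // Accept "obj" alone, or "obj" directly followed by a delimiter such as "<<".
        boolean isObj = keyword.equals("obj")
                || (keyword.startsWith("obj") && "()<>[]{}/%".indexOf(keyword.charAt(3)) >= 0);
        if (!isObj || !tokens[0].matches("\\d+") || !tokens[1].matches("\\d+")) {
            return null;
        }
        try {
            int number = Integer.parseInt(tokens[0]);
            int generation = Integer.parseInt(tokens[1]);
            // Object numbers are positive; generation numbers are non-negative.
            return number > 0 ? new int[] {number, generation} : null;
        } catch (NumberFormatException e) {
            return null; // digit string too large for an int
        }
    }
}
```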