Package com.itextpdf.io.source
Class PdfTokenizer
java.lang.Object
com.itextpdf.io.source.PdfTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
-
Nested Class Summary
-
Field Summary
Modifier and Type, Field
static final byte[] F
static final byte[] False
protected int generation
protected boolean hexString
static final byte[] N
static final byte[] Null
static final byte[] Obj
protected ByteBuffer outBuf
static final byte[] R
protected int reference
static final byte[] Startxref
static final byte[] Stream
static final byte[] Trailer
static final byte[] True
protected PdfTokenizer.TokenType type
static final byte[] Xref
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void backOnePosition(int ch)
void checkFdfHeader()
static int[] checkObjectStart(PdfTokenizer lineTokenizer)
    Checks whether the line starts with an object declaration.
String checkPdfHeader()
static boolean checkTrailer(ByteBuffer line)
    Checks whether line equals 'trailer'.
void close()
static byte[] decodeStringContent(byte[] content, boolean hexWriting)
    Resolves escape symbols or hexadecimal symbols.
protected static byte[] decodeStringContent(byte[] content, int from, int to, boolean hexWriting)
    Resolves escape symbols or hexadecimal symbols.
byte[] getByteContent()
byte[] getDecodedStringContent()
int getGenNr()
int getHeaderOffset()
int getIntValue()
long getLongValue()
long getNextEof()
    Gets next %%EOF marker in current PDF file.
int getObjNr()
long getPosition()
PdfTokenizer getSafeFile()
long getStartxref()
String getStringValue()
PdfTokenizer.TokenType getTokenType()
boolean isCloseStream()
protected static boolean isDelimiter(int ch)
protected static boolean isDelimiterWhitespace(int ch)
boolean isHexString()
static boolean isWhitespace(int ch)
    Checks whether a character is whitespace. Currently the character codes 0, 9, 10, 12, 13 and 32 count as whitespace.
protected static boolean isWhitespace(int ch, boolean isWhitespace)
    Checks whether a character is whitespace.
long length()
boolean nextToken()
void nextValidToken()
int peek()
    Gets the next byte of PDF source without moving source position.
int peek(byte[] buffer)
    Gets the next buffer.length bytes of PDF source without moving source position.
int read()
void readFully(byte[] bytes)
boolean readLineSegment(ByteBuffer buffer)
    Reads data into the provided ByteBuffer.
boolean readLineSegment(ByteBuffer buffer, boolean isNullWhitespace)
    Reads data into the provided ByteBuffer.
String readString(int size)
void seek(long pos)
void setCloseStream(boolean closeStream)
void throwError(String error, Object... messageParams)
    Helper method to handle content errors.
boolean tokenValueEqualsTo(byte[] cmp)
-
Field Details
-
Obj
public static final byte[] Obj
-
R
public static final byte[] R
-
Xref
public static final byte[] Xref
-
Startxref
public static final byte[] Startxref
-
Stream
public static final byte[] Stream
-
Trailer
public static final byte[] Trailer
-
N
public static final byte[] N
-
F
public static final byte[] F
-
Null
public static final byte[] Null
-
True
public static final byte[] True
-
False
public static final byte[] False
-
type
protected PdfTokenizer.TokenType type
-
reference
protected int reference
-
generation
protected int generation
-
hexString
protected boolean hexString
-
outBuf
protected ByteBuffer outBuf
-
-
Constructor Details
-
PdfTokenizer
public PdfTokenizer(RandomAccessFileOrArray file)
Creates a PdfTokenizer for the specified RandomAccessFileOrArray. The beginning of the file is read to determine the location of the header, and the data source is adjusted as necessary to account for any junk that occurs in the byte source before the header.
Parameters:
file - the source
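The header lookup described above can be illustrated with a minimal, library-independent sketch. It is not the iText implementation: the class name HeaderOffsetSketch, the helper findHeaderOffset and the 1024-byte search window are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class HeaderOffsetSketch {
    // Hypothetical helper (not part of iText): returns the index of "%PDF-"
    // within the first `limit` bytes of `data`, or -1 if it is not found.
    // Bytes before that index are the "junk" that a reader would skip.
    static int findHeaderOffset(byte[] data, int limit) {
        int n = Math.min(data.length, limit);
        // ISO-8859-1 maps every byte to exactly one char, so indexes line up.
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        return head.indexOf("%PDF-");
    }

    public static void main(String[] args) {
        byte[] data = "junk\n%PDF-1.7\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findHeaderOffset(data, 1024)); // prints 5
    }
}
```

With an offset like this, later seeks can be shifted so that position 0 corresponds to the start of the header rather than the start of the raw byte source.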
-
-
Method Details
-
seek
public void seek(long pos)
-
readFully
public void readFully(byte[] bytes) throws IOException
Throws:
IOException
-
getPosition
public long getPosition()
-
close
public void close() throws IOException
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Throws:
IOException
-
length
public long length()
-
read
public int read() throws IOException
Throws:
IOException
-
peek
public int peek() throws IOException
Gets the next byte of PDF source without moving source position.
Returns:
the byte, or -1 if EOF is reached
Throws:
IOException - in case of any reading error.
-
peek
public int peek(byte[] buffer) throws IOException
Gets the next buffer.length bytes of PDF source without moving source position.
Parameters:
buffer - buffer to store read bytes
Returns:
the number of read bytes. If it is less than buffer.length, EOF has been reached.
Throws:
IOException - in case of any reading error.
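The difference between reading and peeking can be sketched with a small in-memory source. This is an illustration of the contract described above, not the iText code; the class PeekSketch and its position() accessor are hypothetical.

```java
final class PeekSketch {
    private final byte[] data;
    private int pos; // current source position

    PeekSketch(byte[] data) {
        this.data = data;
    }

    // Returns the next byte and advances the position, or -1 at EOF.
    int read() {
        return pos < data.length ? data[pos++] & 0xff : -1;
    }

    // Returns the next byte without advancing the position, or -1 at EOF.
    int peek() {
        return pos < data.length ? data[pos] & 0xff : -1;
    }

    // Copies up to buffer.length upcoming bytes without advancing the position.
    // A return value smaller than buffer.length means EOF was reached.
    int peek(byte[] buffer) {
        int n = Math.min(buffer.length, data.length - pos);
        System.arraycopy(data, pos, buffer, 0, n);
        return n;
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        PeekSketch source = new PeekSketch(new byte[] {'a', 'b', 'c'});
        System.out.println((char) source.peek()); // prints a
        System.out.println((char) source.read()); // prints a
        System.out.println(source.peek(new byte[5])); // prints 2
    }
}
```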
-
readString
public String readString(int size) throws IOException
Throws:
IOException
-
getTokenType
public PdfTokenizer.TokenType getTokenType()
-
getByteContent
public byte[] getByteContent()
-
getStringValue
public String getStringValue()
-
getDecodedStringContent
public byte[] getDecodedStringContent()
-
tokenValueEqualsTo
public boolean tokenValueEqualsTo(byte[] cmp)
-
getObjNr
public int getObjNr()
-
getGenNr
public int getGenNr()
-
backOnePosition
public void backOnePosition(int ch)
-
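getHeaderOffset
public int getHeaderOffset() throws IOException
Throws:
IOException
-
checkPdfHeader
public String checkPdfHeader() throws IOException
Throws:
IOException
-
checkFdfHeader
public void checkFdfHeader() throws IOException
Throws:
IOException
-
getStartxref
public long getStartxref() throws IOException
Throws:
IOException
-
getNextEof
public long getNextEof() throws IOException
Gets next %%EOF marker in current PDF file.
Returns:
next %%EOF marker position
Throws:
IOException - in case of input-output related exceptions during PDF document reading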
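A forward search for the %%EOF marker might look like the sketch below. It is an assumption-laden illustration, not the library's algorithm: the class EofSketch and method nextEof are hypothetical, the search runs over an in-memory byte array, and the returned value is the index where the marker starts (the library may define the reported position differently).

```java
import java.nio.charset.StandardCharsets;

final class EofSketch {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: returns the index of the first "%%EOF" marker that
    // starts at or after `from`, or -1 if there is none.
    static long nextEof(byte[] data, int from) {
        outer:
        for (int i = Math.max(from, 0); i + MARKER.length <= data.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    continue outer; // mismatch, try the next start index
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "abc%%EOF\nxyz%%EOF".getBytes(StandardCharsets.US_ASCII);
        System.out.println(nextEof(data, 0)); // prints 3
        System.out.println(nextEof(data, 4)); // prints 12
    }
}
```

A file with incremental updates can contain several %%EOF markers, which is why the search takes a starting position rather than always beginning at 0.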
-
nextValidToken
public void nextValidToken() throws IOException
Throws:
IOException
-
nextToken
public boolean nextToken() throws IOException
Throws:
IOException
-
getLongValue
public long getLongValue()
-
getIntValue
public int getIntValue()
-
isHexString
public boolean isHexString()
-
isCloseStream
public boolean isCloseStream()
-
setCloseStream
public void setCloseStream(boolean closeStream)
-
getSafeFile
public PdfTokenizer getSafeFile()
-
decodeStringContent
protected static byte[] decodeStringContent(byte[] content, int from, int to, boolean hexWriting)
Resolves escape symbols or hexadecimal symbols.
NOTE: According to PDF Reference 1.7, section 3.2.3, string values contain ASCII characters, so they can be converted directly to a byte array.
Parameters:
content - string bytes to be decoded
from - given start index
to - given end index
hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
Returns:
byte[] for decrypting or for creating a String.
-
decodeStringContent
public static byte[] decodeStringContent(byte[] content, boolean hexWriting)
Resolves escape symbols or hexadecimal symbols.
NOTE: According to PDF Reference 1.7, section 3.2.3, string values contain ASCII characters, so they can be converted directly to a byte array.
Parameters:
content - string bytes to be decoded
hexWriting - true if the given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
Returns:
byte[] for decrypting or for creating a String.
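To make the hex-encoded case concrete, here is a minimal sketch of decoding the digits of a PDF hexadecimal string. It is not the iText implementation: the class HexStringSketch and method decodeHex are hypothetical, the input is assumed to be the bytes between the angle brackets, and non-hex bytes are simply skipped. Padding an odd trailing digit with 0 follows the PDF specification's rule for hexadecimal strings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class HexStringSketch {
    // Hypothetical helper: decodes hex digits such as "69546578" into bytes.
    static byte[] decodeHex(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 if none
        for (byte b : content) {
            int value = Character.digit((char) (b & 0xff), 16);
            if (value < 0) {
                continue; // skip whitespace and any other non-hex byte
            }
            if (high < 0) {
                high = value;
            } else {
                out.write((high << 4) | value);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd digit count: missing digit is 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] decoded = decodeHex("69546578".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints iTex
    }
}
```

Literal strings in parentheses need a different pass that resolves backslash escapes and balanced parentheses; that branch is not covered by this sketch.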
-
isWhitespace
public static boolean isWhitespace(int ch)
Checks whether a character is whitespace. Currently the character codes 0, 9, 10, 12, 13 and 32 count as whitespace.
Equivalent to calling isWhitespace(ch, true).
Parameters:
ch - int
Returns:
boolean
-
isWhitespace
protected static boolean isWhitespace(int ch, boolean isWhitespace)
Checks whether a character is whitespace. Currently the character codes 0, 9, 10, 12, 13 and 32 count as whitespace.
Parameters:
ch - int
isWhitespace - boolean
Returns:
boolean
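The whitespace check above can be sketched as follows. This is an illustration rather than the library source: the class WhitespaceSketch is hypothetical, and the sketch assumes that the boolean flag decides whether code 0 (NUL) counts as whitespace, which is how the readLineSegment documentation describes its isNullWhitespace flag.

```java
final class WhitespaceSketch {
    // Assumption: the flag controls only whether NUL (code 0) is whitespace.
    static boolean isWhitespace(int ch, boolean isNullWhitespace) {
        return (isNullWhitespace && ch == 0)
                || ch == 9    // horizontal tab
                || ch == 10   // line feed
                || ch == 12   // form feed
                || ch == 13   // carriage return
                || ch == 32;  // space
    }

    // Single-argument form: NUL is treated as whitespace.
    static boolean isWhitespace(int ch) {
        return isWhitespace(ch, true);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(0));        // prints true
        System.out.println(isWhitespace(0, false)); // prints false
        System.out.println(isWhitespace('0'));      // prints false
    }
}
```

Note that the digit character '0' (code 48) is not whitespace; the list refers to numeric character codes.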
-
isDelimiter
protected static boolean isDelimiter(int ch)
-
isDelimiterWhitespace
protected static boolean isDelimiterWhitespace(int ch)
-
throwError
public void throwError(String error, Object... messageParams)
Helper method to handle content errors. Adds the file position to PdfRuntimeException.
Parameters:
error - message.
messageParams - error params.
Throws:
IOException - wraps the error message into PdfRuntimeException and adds the position in the file.
-
checkTrailer
public static boolean checkTrailer(ByteBuffer line)
Checks whether line equals 'trailer'.
Parameters:
line - for check
Returns:
true if line equals 'trailer', otherwise false
-
readLineSegment
public boolean readLineSegment(ByteBuffer buffer) throws IOException
Reads data into the provided ByteBuffer. Checks on leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
Equivalent to calling readLineSegment(buffer, true).
Parameters:
buffer - a ByteBuffer to which the result of reading will be saved
Returns:
true, if something was read or if the end of the input stream is not reached
Throws:
IOException - in case of any reading error
-
readLineSegment
public boolean readLineSegment(ByteBuffer buffer, boolean isNullWhitespace) throws IOException
Reads data into the provided ByteBuffer. Checks on leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
Parameters:
buffer - a ByteBuffer to which the result of reading will be saved
isNullWhitespace - boolean to indicate whether the character code 0 is whitespace or not. If in doubt, use true or the overloaded method readLineSegment(buffer)
Returns:
true, if something was read or if the end of the input stream is not reached
Throws:
IOException - in case of any reading error
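The line-reading behaviour described above (skip leading whitespace, then copy bytes up to the end of the line) can be sketched over an in-memory byte array. This is a simplified illustration under stated assumptions, not the iText code: the class LineSegmentSketch is hypothetical, java.io.ByteArrayOutputStream stands in for the library's ByteBuffer, and the return value is true when bytes were copied or a line terminator was consumed.

```java
import java.io.ByteArrayOutputStream;

final class LineSegmentSketch {
    private final byte[] data;
    private int pos;

    LineSegmentSketch(byte[] data) {
        this.data = data;
    }

    // Same code list as the documented whitespace check; the flag decides NUL.
    static boolean isWhitespace(int ch, boolean isNullWhitespace) {
        return (isNullWhitespace && ch == 0)
                || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    boolean readLineSegment(ByteArrayOutputStream buffer, boolean isNullWhitespace) {
        buffer.reset();
        // Skip leading whitespace, including blank lines.
        while (pos < data.length && isWhitespace(data[pos] & 0xff, isNullWhitespace)) {
            pos++;
        }
        boolean sawEol = false;
        while (pos < data.length) {
            int c = data[pos++] & 0xff;
            if (c == '\n') {
                sawEol = true;
                break;
            }
            if (c == '\r') {
                sawEol = true;
                if (pos < data.length && data[pos] == '\n') {
                    pos++; // treat CR LF as one terminator
                }
                break;
            }
            buffer.write(c);
        }
        return sawEol || buffer.size() > 0;
    }

    public static void main(String[] args) {
        LineSegmentSketch reader = new LineSegmentSketch("  \n 12 0 obj\r\ntrailer".getBytes());
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (reader.readLineSegment(line, true)) {
            System.out.println(line); // prints "12 0 obj", then "trailer"
        }
    }
}
```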
-
checkObjectStart
public static int[] checkObjectStart(PdfTokenizer lineTokenizer)
Checks whether the line starts with an object declaration.
Parameters:
lineTokenizer - tokenizer built from a single line.
Returns:
the object number and generation if the check is successful, otherwise null.
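An object declaration in a PDF file has the form "<object number> <generation> obj". The check can be sketched with a regular expression over a plain String. This is not how the library does it (the real method consumes tokens from a PdfTokenizer); the class ObjectStartSketch and its String-based signature are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ObjectStartSketch {
    // Two unsigned integers followed by the keyword "obj" at the line start.
    private static final Pattern OBJ_START =
            Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Hypothetical helper: returns {objectNumber, generation}, or null when
    // the line does not start with an object declaration.
    static int[] checkObjectStart(String line) {
        Matcher m = OBJ_START.matcher(line);
        if (!m.find()) {
            return null;
        }
        try {
            return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
        } catch (NumberFormatException e) {
            return null; // digits do not fit into an int
        }
    }

    public static void main(String[] args) {
        int[] ids = checkObjectStart("12 0 obj");
        System.out.println(ids[0] + " " + ids[1]); // prints 12 0
        System.out.println(checkObjectStart("trailer") == null); // prints true
    }
}
```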
-