public class LocationTextExtractionStrategy extends Object implements TextExtractionStrategy
Modifier and Type | Class and Description |
---|---|
static class | LocationTextExtractionStrategy.TextChunk: Represents a chunk of text, its orientation, and its location relative to the orientation vector. |
static interface | LocationTextExtractionStrategy.TextChunkFilter: Specifies a filter for LocationTextExtractionStrategy.TextChunk objects during text extraction. |
static interface | LocationTextExtractionStrategy.TextChunkLocation |
static interface | LocationTextExtractionStrategy.TextChunkLocationStrategy |
Constructor and Description |
---|
LocationTextExtractionStrategy(): Creates a new text extraction renderer. |
LocationTextExtractionStrategy(LocationTextExtractionStrategy.TextChunkLocationStrategy strat): Creates a new text extraction renderer with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo. |
Modifier and Type | Method and Description |
---|---|
void | beginTextBlock(): Called when a new text block is beginning. |
void | endTextBlock(): Called when a text block has ended. |
String | getResultantText(): Returns the result so far. |
String | getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter): Gets text that meets the specified filter. If multiple text extractions will be performed for the same page (e.g. for different physical regions of the page), filtering at this level is more efficient than filtering using FilteredRenderListener, but not nearly as powerful, because most of the RenderInfo state is not captured in LocationTextExtractionStrategy.TextChunk. |
protected boolean | isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk, LocationTextExtractionStrategy.TextChunk previousChunk): Determines if a space character should be inserted between a previous chunk and the current chunk. |
void | renderImage(ImageRenderInfo renderInfo): No-op method; this renderer isn't interested in image events. |
void | renderText(TextRenderInfo renderInfo): Called when text should be rendered. |
public LocationTextExtractionStrategy()
public LocationTextExtractionStrategy(LocationTextExtractionStrategy.TextChunkLocationStrategy strat)
Parameters:
strat - the custom strategy
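The idea of plugging in a custom location strategy can be illustrated with a small, stdlib-only model. This is a sketch under stated assumptions, not the iText implementation: `RenderInfo`, `Location`, `LocationStrategy`, `Extractor`, and `snapToGrid` are hypothetical stand-ins for the real types, and the grid-snapping rule is invented purely for illustration.

```java
// Simplified model of a pluggable location strategy; all names are hypothetical.
final class LocationStrategySketch {

    /** Hypothetical stand-in for the render information of one text run. */
    static final class RenderInfo {
        final String text;
        final float baselineX;
        final float baselineY;

        RenderInfo(String text, float baselineX, float baselineY) {
            this.text = text;
            this.baselineX = baselineX;
            this.baselineY = baselineY;
        }
    }

    /** Hypothetical stand-in for a chunk location: a line key plus a start offset. */
    static final class Location {
        final int line;
        final float startX;

        Location(int line, float startX) {
            this.line = line;
            this.startX = startX;
        }
    }

    /** Creates a Location from the render information of a text run. */
    interface LocationStrategy {
        Location createLocation(RenderInfo info);
    }

    /** Extractor that delegates location creation to the strategy it was given. */
    static final class Extractor {
        private final LocationStrategy strategy;

        /** Default behaviour (an assumption): round the baseline to the nearest unit. */
        Extractor() {
            this(info -> new Location(Math.round(info.baselineY), info.baselineX));
        }

        Extractor(LocationStrategy strategy) {
            this.strategy = strategy;
        }

        Location locate(RenderInfo info) {
            return strategy.createLocation(info);
        }
    }

    /** Custom strategy: snaps baselines to a grid so slightly offset runs share a line. */
    static LocationStrategy snapToGrid(final int grid) {
        return info -> new Location(
                Math.round(info.baselineY / (float) grid) * grid, info.baselineX);
    }

    public static void main(String[] args) {
        RenderInfo label = new RenderInfo("Total:", 50f, 700.4f);
        RenderInfo value = new RenderInfo("42", 120f, 702.1f);
        Extractor snapped = new Extractor(snapToGrid(5));
        System.out.println(snapped.locate(label).line == snapped.locate(value).line);
    }
}
```

In this model, a custom strategy is useful when the default grouping of text into lines is too strict or too loose for a particular document layout.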
public void beginTextBlock()
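Description copied from interface RenderListener: Called when a new text block is beginning.
Specified by: beginTextBlock in interface RenderListener
See also: RenderListener.beginTextBlock()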
public void endTextBlock()
Description copied from interface RenderListener: Called when a text block has ended.
Specified by: endTextBlock in interface RenderListener
See also: RenderListener.endTextBlock()
protected boolean isChunkAtWordBoundary(LocationTextExtractionStrategy.TextChunk chunk, LocationTextExtractionStrategy.TextChunk previousChunk)
Parameters:
chunk - the new chunk being evaluated
previousChunk - the chunk that appeared immediately before the current chunk
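One plausible way to decide whether a space belongs between two chunks is to compare the gap between them with the width of a space character. The sketch below is a simplified, stdlib-only model for horizontal text, not the iText source: the `Chunk` class is a hypothetical stand-in for TextChunk, and the thresholds (half a space width forward, a full space width backward) are assumptions chosen for illustration.

```java
// Simplified model of a word-boundary test; names and thresholds are assumptions.
final class WordBoundarySketch {

    /** Hypothetical stand-in for a text chunk laid out along a horizontal baseline. */
    static final class Chunk {
        final String text;
        final float startX;     // where the chunk begins along the baseline
        final float endX;       // where the chunk ends along the baseline
        final float spaceWidth; // width of a single space in the chunk's font

        Chunk(String text, float startX, float endX, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.spaceWidth = spaceWidth;
        }
    }

    /** Returns true when a space should be inserted between previousChunk and chunk. */
    static boolean isChunkAtWordBoundary(Chunk chunk, Chunk previousChunk) {
        if (chunk.spaceWidth < 0.1f) {
            return false; // degenerate font metrics: never guess a space
        }
        float gap = chunk.startX - previousChunk.endX;
        // A large forward gap or a large backward jump both count as a boundary.
        return gap < -chunk.spaceWidth || gap > chunk.spaceWidth / 2.0f;
    }

    public static void main(String[] args) {
        Chunk hello = new Chunk("Hello", 0f, 25f, 3f);
        Chunk world = new Chunk("world", 28f, 53f, 3f); // gap of 3 units
        Chunk kerned = new Chunk("lo", 25.5f, 35f, 3f); // gap of 0.5 units
        System.out.println(isChunkAtWordBoundary(world, hello));
        System.out.println(isChunkAtWordBoundary(kerned, hello));
    }
}
```

Because the method is protected, a subclass can override it to tune this decision for documents where the default spacing heuristic inserts too many or too few spaces.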
public String getResultantText(LocationTextExtractionStrategy.TextChunkFilter chunkFilter)
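When multiple text extractions are performed for the same page, filtering at this level is more efficient than filtering using FilteredRenderListener, but not nearly as powerful, because most of the RenderInfo state is not captured in LocationTextExtractionStrategy.TextChunk.
Parameters:
chunkFilter - the filter to apply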
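The efficiency argument is that chunks are captured once and can then be filtered several times without re-parsing the page. A minimal stdlib-only sketch of that idea follows; `Chunk`, `ChunkFilter`, `resultantText`, and `inRegion` are hypothetical names for this model, and joining with single spaces is a simplification.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of filtering captured chunks; not the iText API.
final class ChunkFilterSketch {

    /** Hypothetical stand-in for a captured chunk: only text and position survive. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Decides whether a chunk takes part in the extracted text. */
    interface ChunkFilter {
        boolean accept(Chunk chunk);
    }

    /** Joins the text of accepted chunks in stored order; a null filter keeps all. */
    static String resultantText(List<Chunk> chunks, ChunkFilter filter) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (filter == null || filter.accept(c)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(c.text);
            }
        }
        return sb.toString();
    }

    /** Builds a filter that accepts chunks positioned inside a rectangle. */
    static ChunkFilter inRegion(final float llx, final float lly,
                                final float urx, final float ury) {
        return c -> c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury;
    }

    public static void main(String[] args) {
        // Captured once, then queried for two different regions of the page.
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Header", 50f, 800f),
                new Chunk("Body", 50f, 400f),
                new Chunk("Footer", 50f, 30f));
        System.out.println(resultantText(chunks, inRegion(0f, 700f, 600f, 850f)));
        System.out.println(resultantText(chunks, inRegion(0f, 0f, 600f, 100f)));
    }
}
```

The trade-off noted above shows up in the model too: the filter can only inspect what the chunk retained (here, text and position), not the full rendering state.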
public String getResultantText()
Specified by: getResultantText in interface TextExtractionStrategy
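As the class name suggests, the resulting text is ordered by where chunks sit on the page rather than by the order in which they arrive. The sketch below models that for horizontal text only: sort by line from top to bottom, then left to right, and join. It is a simplified illustration under stated assumptions, not the iText implementation; `Chunk` and its fields are hypothetical, and a space is always inserted between chunks on the same line instead of applying a word-boundary test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified model of location-based text assembly for horizontal text.
final class LocationOrderSketch {

    /** Hypothetical stand-in for a chunk: text, line position, and start offset. */
    static final class Chunk {
        final String text;
        final int baselineY; // larger values are higher on the page
        final float startX;

        Chunk(String text, int baselineY, float startX) {
            this.text = text;
            this.baselineY = baselineY;
            this.startX = startX;
        }
    }

    /** Sorts chunks top to bottom, then left to right, and joins them into text. */
    static String resultantText(List<Chunk> received) {
        List<Chunk> sorted = new ArrayList<Chunk>(received);
        Collections.sort(sorted, new Comparator<Chunk>() {
            @Override
            public int compare(Chunk a, Chunk b) {
                if (a.baselineY != b.baselineY) {
                    return Integer.compare(b.baselineY, a.baselineY); // higher line first
                }
                return Float.compare(a.startX, b.startX);
            }
        });
        StringBuilder sb = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : sorted) {
            if (previous != null) {
                // Same line: separate with a space; new line: start a new row.
                sb.append(previous.baselineY == c.baselineY ? ' ' : '\n');
            }
            sb.append(c.text);
            previous = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> received = new ArrayList<Chunk>();
        received.add(new Chunk("world", 700, 80f)); // arrives first, but sits to the right
        received.add(new Chunk("Second line", 680, 50f));
        received.add(new Chunk("Hello", 700, 50f));
        System.out.println(resultantText(received));
    }
}
```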
public void renderText(TextRenderInfo renderInfo)
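Description copied from interface RenderListener: Called when text should be rendered.
Specified by: renderText in interface RenderListener
Parameters:
renderInfo - information specifying what to render
See also: RenderListener.renderText(com.itextpdf.text.pdf.parser.TextRenderInfo)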
public void renderImage(ImageRenderInfo renderInfo)
Specified by: renderImage in interface RenderListener
Parameters:
renderInfo - information specifying what to render
See also: RenderListener.renderImage(com.itextpdf.text.pdf.parser.ImageRenderInfo)
Copyright © 2016. All rights reserved.