Class LocationTextExtractionStrategy

All Implemented Interfaces:
IEventListener, ITextExtractionStrategy

public class LocationTextExtractionStrategy extends Object implements ITextExtractionStrategy
  • Constructor Details

    • LocationTextExtractionStrategy

      public LocationTextExtractionStrategy()
      Creates a new text extraction renderer.
    • LocationTextExtractionStrategy

      public LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat)
      Creates a new text extraction renderer with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
      Parameters:
      strat - the custom strategy
  • Method Details

    • setUseActualText

      public LocationTextExtractionStrategy setUseActualText(boolean useActualText)
      Changes the behavior of text extraction so that, if the parameter is set to true, the /ActualText marked content property is used instead of the raw decoded bytes. Beware: the logic is not stable yet.
      Parameters:
      useActualText - true to use /ActualText, false otherwise
      Returns:
      this object
    • setRightToLeftRunDirection

      public LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection)
      Sets whether text flows from left to right or from right to left. Call this method with true to extract Arabic, Hebrew, or other text with a right-to-left writing direction.
      Parameters:
      rightToLeftRunDirection - true if the run direction should be right to left, false otherwise
      Returns:
      this object
    • isUseActualText

      public boolean isUseActualText()
      Gets the value of the property that determines whether /ActualText will be used when extracting the text.
      Returns:
      true if the /ActualText value is used, false otherwise
    • eventOccurred

      public void eventOccurred(IEventData data, EventType type)
      Description copied from interface: IEventListener
      Called when an event occurs while parsing a content stream.
      Specified by:
      eventOccurred in interface IEventListener
      Parameters:
      data - combines the data required for processing the corresponding event type
      type - the event type
    • getSupportedEvents

      public Set<EventType> getSupportedEvents()
      Description copied from interface: IEventListener
      Provides the set of event types this listener supports. Returns null if all possible event types are supported.
      Specified by:
      getSupportedEvents in interface IEventListener
      Returns:
      the set of event types supported by this listener, or null if all possible event types are supported
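The null-means-everything contract of getSupportedEvents can be illustrated with a small stand-alone sketch. This is not iText code: the `EventType` enum, the `Listener` interface, and the `shouldDispatch` and `listenerFor` helpers below are hypothetical stand-ins introduced only to show how a caller might interpret the return value.

```java
import java.util.EnumSet;
import java.util.Set;

class SupportedEventsSketch {
    // Hypothetical stand-in for the library's event type enumeration.
    enum EventType { BEGIN_TEXT, RENDER_TEXT, END_TEXT, RENDER_IMAGE }

    // Hypothetical, reduced version of a listener: only the method that matters here.
    interface Listener {
        Set<EventType> getSupportedEvents();
    }

    // Convenience factory for a listener that reports the given set (may be null).
    static Listener listenerFor(Set<EventType> supported) {
        return () -> supported;
    }

    // A null set means "all event types are supported"; otherwise the set is a filter.
    static boolean shouldDispatch(Listener listener, EventType type) {
        Set<EventType> supported = listener.getSupportedEvents();
        return supported == null || supported.contains(type);
    }

    public static void main(String[] args) {
        Listener everything = listenerFor(null);
        Listener textOnly = listenerFor(EnumSet.of(EventType.RENDER_TEXT));

        System.out.println(shouldDispatch(everything, EventType.RENDER_IMAGE));
        System.out.println(shouldDispatch(textOnly, EventType.RENDER_TEXT));
        System.out.println(shouldDispatch(textOnly, EventType.RENDER_IMAGE));
    }
}
```

A caller that forgets the null check would throw a NullPointerException for listeners that accept every event, which is why the check comes first in this sketch.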
    • getResultantText

      public String getResultantText()
      Description copied from interface: ITextExtractionStrategy
      Returns the text that has been processed so far.
      Specified by:
      getResultantText in interface ITextExtractionStrategy
      Returns:
      String instance with the current resultant text
    • isChunkAtWordBoundary

      protected boolean isChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk)
      Determines if a space character should be inserted between a previous chunk and the current chunk. This method is exposed as a callback so subclasses can fine-tune the algorithm for determining whether a space should be inserted or not. By default, this method inserts a space if there is a gap of more than half the font's space character width between the end of the previous chunk and the beginning of the current chunk. It also indicates that a space is needed if the starting point of the new chunk appears *before* the end of the previous chunk (i.e., overlapping text).
      Parameters:
      chunk - the new chunk being evaluated
      previousChunk - the chunk that appeared immediately before the current chunk
      Returns:
      true if the two chunks represent different words (i.e., should have a space between them), false otherwise
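The default rule described for isChunkAtWordBoundary can be sketched in isolation. The following is a minimal illustration that follows only the wording of the description, not the library's actual implementation: the `Chunk` class and the `join` helper are hypothetical, and positions are reduced to a single coordinate along the baseline.

```java
import java.util.Arrays;
import java.util.List;

class WordBoundarySketch {
    // Hypothetical stand-in for a text chunk, reduced to one dimension.
    static final class Chunk {
        final String text;
        final float start;      // where the chunk begins along the baseline
        final float end;        // where the chunk ends along the baseline
        final float spaceWidth; // width of the font's space character

        Chunk(String text, float start, float end, float spaceWidth) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.spaceWidth = spaceWidth;
        }
    }

    // Returns true if a space should be inserted between previousChunk and chunk.
    static boolean isChunkAtWordBoundary(Chunk chunk, Chunk previousChunk) {
        float gap = chunk.start - previousChunk.end;
        if (gap < 0) {
            return true; // new chunk starts before the previous one ends (overlap)
        }
        return gap > chunk.spaceWidth / 2.0f; // gap wider than half a space
    }

    // Concatenates chunk texts, adding a space wherever a word boundary is detected.
    static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        Chunk previous = null;
        for (Chunk chunk : chunks) {
            if (previous != null && isChunkAtWordBoundary(chunk, previous)) {
                sb.append(' ');
            }
            sb.append(chunk.text);
            previous = chunk;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Hel", 0f, 15f, 5f),
                new Chunk("lo", 15.5f, 25f, 5f),   // gap 0.5, below half a space
                new Chunk("world", 30f, 55f, 5f)); // gap 5, above half a space
        System.out.println(join(chunks)); // prints "Hello world"
    }
}
```

A subclass that overrides the real method could apply a different threshold in the same spot, for example a larger fraction of the space width for documents with loose kerning.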