iText 9.4.0 API
iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy Class Reference

This class is designed to search for the occurrences of a regular expression and return the resultant rectangles.

Inheritance: iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy implements iText.Kernel.Pdf.Canvas.Parser.Listener.ILocationExtractionStrategy, which extends iText.Kernel.Pdf.Canvas.Parser.Listener.IEventListener.

Public Member Functions

  RegexBasedLocationExtractionStrategy (String regex)
 
  RegexBasedLocationExtractionStrategy (Regex pattern)
 
virtual ICollection< IPdfTextLocation >  GetResultantLocations ()
  Returns the locations (with their iText.Kernel.Geom.Rectangle bounds) of the matches found so far.
 
virtual void  EventOccurred (IEventData data, EventType type)
  Called when some event occurs during parsing a content stream.
 
virtual ICollection< EventType >  GetSupportedEvents ()
  Provides the set of event types this listener supports.
 

Package Functions

virtual IList< CharacterRenderInfo >  ToCRI (TextRenderInfo tri)
  Converts an iText.Kernel.Pdf.Canvas.Parser.Data.TextRenderInfo object to a list of CharacterRenderInfo objects. This method is virtual so that custom implementations can override it.
 
virtual IList< Rectangle >  ToRectangles (IList< CharacterRenderInfo > cris)
  Converts CharacterRenderInfo objects to iText.Kernel.Geom.Rectangle objects. This method is virtual so that custom implementations can override it.
 

Detailed Description

This class is designed to search for the occurrences of a regular expression and return the resultant rectangles. Note that an instance accumulates every text location it encounters, so a single instance cannot be reused across multiple pages. To extract text from several pages of a PDF document, create a new instance of RegexBasedLocationExtractionStrategy for each page.

Here is an example of usage with a new instance for each page:

    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    PdfDocument document = new PdfDocument(new PdfReader("..."));
    for (int i = 1; i <= document.GetNumberOfPages(); ++i) {
        // Pass your regular expression to the constructor.
        RegexBasedLocationExtractionStrategy extractionStrategy = new RegexBasedLocationExtractionStrategy("");
        PdfCanvasProcessor processor = new PdfCanvasProcessor(extractionStrategy);
        processor.ProcessPageContent(document.GetPage(i));
        foreach (IPdfTextLocation location in extractionStrategy.GetResultantLocations()) {
            // process locations ...
        }
    }
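The example above leaves the per-location processing open. As a minimal sketch of one way to fill it in, the helper below collects the matched text and bounding rectangle of every match, page by page. The class name RegexLocationSketch and its Find method are hypothetical and not part of iText; the sketch assumes that IPdfTextLocation exposes GetText() and GetRectangle().

```csharp
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper for illustration; not part of iText.
public static class RegexLocationSketch
{
    // Returns (page number, matched text, bounding rectangle) for every match in the document.
    public static List<(int Page, string Text, Rectangle Box)> Find(PdfDocument document, string regex)
    {
        var hits = new List<(int Page, string Text, Rectangle Box)>();
        for (int page = 1; page <= document.GetNumberOfPages(); ++page)
        {
            // A fresh strategy per page, because an instance keeps every location it has seen.
            var strategy = new RegexBasedLocationExtractionStrategy(regex);
            new PdfCanvasProcessor(strategy).ProcessPageContent(document.GetPage(page));

            foreach (IPdfTextLocation location in strategy.GetResultantLocations())
            {
                hits.Add((page, location.GetText(), location.GetRectangle()));
            }
        }
        return hits;
    }
}
```

Tracking the page number in the caller, rather than asking the location for it, keeps the sketch independent of how a particular IPdfTextLocation implementation reports pages.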

Member Function Documentation

◆ EventOccurred()

virtual void iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy.EventOccurred ( IEventData data, EventType type )
inline, virtual

Called when some event occurs during parsing a content stream.

Implements iText.Kernel.Pdf.Canvas.Parser.Listener.IEventListener.

◆ GetResultantLocations()

virtual ICollection<IPdfTextLocation> iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy.GetResultantLocations ( )
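inline, virtual

Returns the locations (with their iText.Kernel.Geom.Rectangle bounds) of the matches found so far.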

◆ GetSupportedEvents()

virtual ICollection<EventType> iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy.GetSupportedEvents ( )
inline, virtual

Provides the set of event types this listener supports.

Implements iText.Kernel.Pdf.Canvas.Parser.Listener.IEventListener.

◆ ToCRI()

virtual IList<CharacterRenderInfo> iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy.ToCRI ( TextRenderInfo  tri )
inline, package, virtual

Converts an iText.Kernel.Pdf.Canvas.Parser.Data.TextRenderInfo object to a list of CharacterRenderInfo objects. This method is virtual so that custom implementations can override it. Other implementations of CharacterRenderInfo may store more properties than just the iText.Kernel.Geom.Rectangle describing the bounding box. For example, a custom implementation might also store iText.Kernel.Colors.Color information, to better match the content surrounding the redaction iText.Kernel.Geom.Rectangle.

Parameters
tri iText.Kernel.Pdf.Canvas.Parser.Data.TextRenderInfo object

Returns
a list of CharacterRenderInfo objects that represents the passed iText.Kernel.Pdf.Canvas.Parser.Data.TextRenderInfo

◆ ToRectangles()

virtual IList<Rectangle> iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy.ToRectangles ( IList< CharacterRenderInfo > cris )
inline, package, virtual

Converts CharacterRenderInfo objects to iText.Kernel.Geom.Rectangle objects. This method is virtual so that custom implementations can override it; for example, other implementations may add padding or a margin to the rectangles. This method also offers a convenient access point to the mapping of CharacterRenderInfo to iText.Kernel.Geom.Rectangle. Because it gives access to the CharacterRenderInfo objects that generated each iText.Kernel.Geom.Rectangle, this mapping lets custom implementations match the color of the text in redacted rectangles, or the color of the background.

Parameters
cris list of CharacterRenderInfo objects
Returns
a list of iText.Kernel.Geom.Rectangle objects computed from the given CharacterRenderInfo objects
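As a sketch of the padding use case mentioned above, a subclass might override ToRectangles and grow each rectangle by a fixed amount. The class PaddedRegexStrategy and its Pad helper are hypothetical names, not part of iText. The sketch also assumes that the base method is declared protected internal virtual in the iText assembly, so that a subclass in another assembly overrides it as protected override; adjust the modifier if your version declares it differently.

```csharp
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical subclass for illustration; not part of iText.
public class PaddedRegexStrategy : RegexBasedLocationExtractionStrategy
{
    private readonly float padding;

    public PaddedRegexStrategy(string regex, float padding) : base(regex)
    {
        this.padding = padding;
    }

    // Grows a rectangle by `padding` on every side, keeping it centered.
    public static Rectangle Pad(Rectangle r, float padding)
    {
        return new Rectangle(
            r.GetX() - padding,
            r.GetY() - padding,
            r.GetWidth() + 2 * padding,
            r.GetHeight() + 2 * padding);
    }

    // Assumption: the base method is `protected internal virtual`,
    // hence `protected override` when subclassing from another assembly.
    protected override IList<Rectangle> ToRectangles(IList<CharacterRenderInfo> cris)
    {
        var padded = new List<Rectangle>();
        foreach (Rectangle r in base.ToRectangles(cris))
        {
            padded.Add(Pad(r, padding));
        }
        return padded;
    }
}
```

Delegating to base.ToRectangles first keeps the default character-to-rectangle mapping intact and applies the padding only as a final step.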