Class TextInfo

java.lang.Object
com.itextpdf.pdfocr.TextInfo

public class TextInfo extends Object
Describes how recognized text is positioned on an image by providing a bounding box (bbox) for each text item, which may be a line or a word.
  • Constructor Details

    • TextInfo

      public TextInfo()
      Creates a new TextInfo instance.
    • TextInfo

      public TextInfo (TextInfo textInfo)
      Creates a new TextInfo instance from an existing one.
      Parameters:
      textInfo - the existing TextInfo instance to create the new one from
    • TextInfo

      public TextInfo (String text, com.itextpdf.kernel.geom.Point[] bbox)
      Creates a new TextInfo instance.
      Parameters:
      text - text string
      bbox - array of 4 Points describing the text bbox (lower-left based relative to text), expressed in PDF points (0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right point)
    • TextInfo

      public TextInfo (String text, com.itextpdf.kernel.geom.Rectangle bbox)
      Creates a new TextInfo instance. Can be used for text chunks that are not rotated.
      Parameters:
      text - text string
      bbox - Rectangle describing text bounding box expressed in PDF points
  • Method Details

    • getText

      public String getText()
      Gets text element.
      Returns:
      text string
    • setText

      public TextInfo setText (String newText)
      Sets text element.
      Parameters:
      newText - retrieved text
      Returns:
      this instance
    • getTextPoints

      public com.itextpdf.kernel.geom.Point[] getTextPoints()
      Gets the array of 4 Points describing the text bbox (lower-left based relative to text), expressed in PDF points.

      The Point array stores the text polygon in the following order relative to the text: 0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right point.

      Point coordinates use the following coordinate system: the origin is at the bottom-left corner of the page, vertical (y) coordinates increase from the bottom of the page to the top, horizontal (x) coordinates increase from the left side of the page to the right, and the axis unit is the user space unit, referred to here as a PDF point (1 PDF point = 1/72 inch = 4/3 pixel).

      Returns:
      array of 4 Points describing the text bbox (lower-left based relative to text), expressed in PDF points
    • setTextPoints

      public TextInfo setTextPoints (com.itextpdf.kernel.geom.Point[] textPoints)
      Sets the array of 4 Points describing the text bbox (lower-left based relative to text), expressed in PDF points.

      The Point array should store the text polygon in the following order relative to the text: 0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right point.

      Point coordinates use the following coordinate system: the origin is at the bottom-left corner of the page, vertical (y) coordinates increase from the bottom of the page to the top, horizontal (x) coordinates increase from the left side of the page to the right, and the axis unit is the user space unit, referred to here as a PDF point (1 PDF point = 1/72 inch = 4/3 pixel).

      Parameters:
      textPoints - array of 4 Points describing the text bbox (lower-left based relative to text), expressed in PDF points
      Returns:
      this instance
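
      To make the corner order concrete, the sketch below builds the four-corner polygon of an unrotated box. It is a stand-alone illustration, not part of this API: the class CornerOrderSketch and the method cornersOf are hypothetical, and plain double[] {x, y} pairs stand in for Point objects.

```java
// Hypothetical stand-alone sketch; double[] {x, y} pairs stand in for Point.
final class CornerOrderSketch {

    // Builds the 4-corner polygon of an unrotated box from its lower-left
    // corner, width and height, all in PDF points (origin bottom-left, y up).
    static double[][] cornersOf(double x, double y, double width, double height) {
        return new double[][] {
            {x, y},                  // 0 - lower-left
            {x, y + height},         // 1 - upper-left
            {x + width, y + height}, // 2 - upper-right
            {x + width, y}           // 3 - lower-right
        };
    }

    public static void main(String[] args) {
        double[][] corners = cornersOf(72, 144, 200, 20);
        for (int i = 0; i < corners.length; i++) {
            System.out.println(i + ": (" + corners[i][0] + ", " + corners[i][1] + ")");
        }
    }
}
```

      For rotated text the same index order applies relative to the text itself, so index 0 is the corner where the text baseline starts, wherever that falls on the page.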
    • getPixelTextPoints

      public com.itextpdf.kernel.geom.Point[] getPixelTextPoints (int imageHeight)
      Gets the array of 4 Points describing the text bbox (lower-left based relative to text), expressed in pixels.

      The Point array stores the text polygon in the following order relative to the text: 0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right point.

      Text point coordinates use the following coordinate system: the origin is at the top-left corner of the page (image), vertical (y) coordinates increase from the top of the page to the bottom, horizontal (x) coordinates increase from the left side of the page to the right, and the axis unit is the pixel (1 pixel = 1/96 inch = 0.75 PDF point).

      Parameters:
      imageHeight - height of the image, used to convert the text coordinates from PDF points to image pixels by changing the y origin
      Returns:
      array of 4 Points describing the text bbox (lower-left based relative to text), expressed in pixels
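
      The arithmetic implied by the two coordinate systems can be sketched as follows. This is an illustration under stated assumptions, not the library's confirmed implementation: it assumes imageHeight is given in pixels and that the y flip is applied after scaling. The class PointsToPixelsSketch and its methods are hypothetical, and double[] {x, y} pairs stand in for Point objects.

```java
// Hypothetical stand-alone sketch; double[] {x, y} pairs stand in for Point.
final class PointsToPixelsSketch {

    // 1 PDF point = 4/3 pixel (72 points per inch, 96 pixels per inch).
    static double pointsToPixels(double points) {
        return points * 4.0 / 3.0;
    }

    // Converts a polygon from PDF points (origin bottom-left, y up) to
    // pixels (origin top-left, y down). Assumes imageHeightPx is in pixels.
    static double[][] toPixelPolygon(double[][] polygonInPoints, int imageHeightPx) {
        double[][] result = new double[polygonInPoints.length][];
        for (int i = 0; i < polygonInPoints.length; i++) {
            double x = pointsToPixels(polygonInPoints[i][0]);
            // Scale first, then measure from the top edge instead of the bottom.
            double y = imageHeightPx - pointsToPixels(polygonInPoints[i][1]);
            result[i] = new double[] {x, y};
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] inPoints = {{72, 144}, {72, 162}, {270, 162}, {270, 144}};
        for (double[] p : toPixelPolygon(inPoints, 960)) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

      The corner indices keep their meaning after conversion: index 0 is still the lower-left corner relative to the text, even though its y value is now the larger one.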
    • setPixelTextPoints

      public TextInfo setPixelTextPoints (com.itextpdf.kernel.geom.Point[] textPoints, int imageHeight)
      Sets the array of 4 Points describing the text bbox (lower-left based relative to text), expressed in pixels.

      The Point array should store the text polygon in the following order relative to the text: 0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right point.

      Text point coordinates use the following coordinate system: the origin is at the top-left corner of the page, vertical (y) coordinates increase from the top of the page to the bottom, horizontal (x) coordinates increase from the left side of the page to the right, and the axis unit is the pixel (1 pixel = 1/96 inch = 0.75 PDF point).

      Parameters:
      textPoints - array of 4 Points describing the text bbox (0 - lower-left, 1 - upper-left, 2 - upper-right, 3 - lower-right relative to text), expressed in pixels
      imageHeight - height of the image, used to convert the text coordinates from image pixels to PDF points by changing the y origin
      Returns:
      this instance
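
      The reverse conversion, from image pixels to PDF points, can be sketched in the same way. As before, this is an illustration under assumptions rather than the library's confirmed behavior: imageHeight is assumed to be in pixels, the class PixelsToPointsSketch and its methods are hypothetical, and double[] {x, y} pairs stand in for Point objects.

```java
// Hypothetical stand-alone sketch; double[] {x, y} pairs stand in for Point.
final class PixelsToPointsSketch {

    // 1 pixel = 0.75 PDF point (96 pixels per inch, 72 points per inch).
    static double pixelsToPoints(double pixels) {
        return pixels * 0.75;
    }

    // Converts a polygon from pixels (origin top-left, y down) to
    // PDF points (origin bottom-left, y up). Assumes imageHeightPx is in pixels.
    static double[][] toPointPolygon(double[][] polygonInPixels, int imageHeightPx) {
        double[][] result = new double[polygonInPixels.length][];
        for (int i = 0; i < polygonInPixels.length; i++) {
            double x = pixelsToPoints(polygonInPixels[i][0]);
            // Measure from the bottom edge first, then scale to points.
            double y = pixelsToPoints(imageHeightPx - polygonInPixels[i][1]);
            result[i] = new double[] {x, y};
        }
        return result;
    }

    public static void main(String[] args) {
        // Corner order relative to the text: lower-left, upper-left, upper-right, lower-right.
        double[][] inPixels = {{96, 768}, {96, 744}, {360, 744}, {360, 768}};
        for (double[] p : toPointPolygon(inPixels, 960)) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```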
    • getBBoxRect

      public com.itextpdf.kernel.geom.Rectangle getBBoxRect()
      Converts a text polygon to a bounding box.
      Returns:
      Rectangle representing text bounding box
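
      One common way to turn a four-corner polygon into a bounding box is to take the smallest axis-aligned rectangle that contains every corner. The sketch below shows that approach; whether this method computes its result the same way is an assumption. The class BoundingBoxSketch and the method boundingBox are hypothetical, and double[] {x, y} pairs stand in for Point objects.

```java
// Hypothetical stand-alone sketch; double[] {x, y} pairs stand in for Point.
final class BoundingBoxSketch {

    // Returns {x, y, width, height} of the smallest axis-aligned rectangle
    // that contains every corner of the polygon.
    static double[] boundingBox(double[][] polygon) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : polygon) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // A square text box rotated by 45 degrees, in the usual corner order.
        double[][] rotated = {{10, 0}, {0, 10}, {10, 20}, {20, 10}};
        double[] box = boundingBox(rotated);
        System.out.println(box[0] + ", " + box[1] + ", " + box[2] + ", " + box[3]);
    }
}
```

      For rotated text the resulting rectangle is larger than the text polygon itself, so the polygon remains the more precise description of where the text sits.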
    • getRotationAngle

      public float getRotationAngle()
      Returns the text rotation angle in radians for this TextInfo, in the range of -pi to pi.
      Returns:
      the text rotation angle in radians for the current TextInfo
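
      An angle in the range of -pi to pi can be derived from the text polygon with Math.atan2 applied to the bottom edge, that is, the vector from the lower-left corner (index 0) to the lower-right corner (index 3). The sketch below shows that derivation. Which corners this method actually uses is an assumption, as the description above does not say. The class RotationAngleSketch and the method rotationAngle are hypothetical, and double[] {x, y} pairs stand in for Point objects.

```java
// Hypothetical stand-alone sketch; double[] {x, y} pairs stand in for Point.
final class RotationAngleSketch {

    // Angle of the bottom edge (corner 0 to corner 3) against the x axis,
    // in radians. Math.atan2 returns values in the range -pi to pi.
    static double rotationAngle(double[][] polygon) {
        double dx = polygon[3][0] - polygon[0][0];
        double dy = polygon[3][1] - polygon[0][1];
        return Math.atan2(dy, dx);
    }

    public static void main(String[] args) {
        // Corner order: lower-left, upper-left, upper-right, lower-right.
        double[][] unrotated = {{72, 144}, {72, 164}, {272, 164}, {272, 144}};
        double[][] rotated = {{10, 0}, {0, 10}, {10, 20}, {20, 10}};
        System.out.println(rotationAngle(unrotated));
        System.out.println(rotationAngle(rotated));
    }
}
```

      In PDF point coordinates, where y increases upward, a positive result from this sketch corresponds to counterclockwise rotation.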
    • getLogicalStructureTreeItem

      public LogicalStructureTreeItem getLogicalStructureTreeItem()
      Retrieves the structure tree item for the text item.
      Returns:
      the structure tree item
    • setLogicalStructureTreeItem

      public void setLogicalStructureTreeItem (LogicalStructureTreeItem logicalStructureTreeItem)
      Sets the logical structure tree parent item for the text info. This allows organizing text chunks into a logical hierarchy, e.g. to specify document paragraphs, tables, etc.

      If a LogicalStructureTreeItem is set, the list of TextInfos returned by IOcrEngine.doImageOcr(java.io.File) is expected to be in logical order.

      Parameters:
      logicalStructureTreeItem - structure tree item