Class HyphenationTree

java.lang.Object
com.itextpdf.layout.hyphenation.TernaryTree
com.itextpdf.layout.hyphenation.HyphenationTree
All Implemented Interfaces:
IPatternConsumer

public class HyphenationTree extends TernaryTree implements IPatternConsumer
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.

This work was authored by Carlos Villegas (cav@uniscope.co.jp).

  • Field Details

    • vspace

      protected ByteVector vspace
      value space: stores the interletter values
    • stoplist

      protected Map<String,List> stoplist
      This map stores hyphenation exceptions
    • classmap

      protected TernaryTree classmap
      This map stores the character classes
  • Constructor Details

    • HyphenationTree

      public HyphenationTree()
      Default constructor.
  • Method Details

    • packValues

      protected int packValues (String values)
      Packs the values by storing each one in 4 bits, two values to a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored.
      Parameters:
      values - a string of digits from '0' to '9' representing the interletter values.
      Returns:
      the index into the vspace array where the packed values are stored.
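The packing scheme described above can be sketched in plain Java. This is a minimal illustration, not the library's code: the class name `NibblePackingSketch`, the choice to return a fresh `byte[]` instead of an index into `vspace`, and the high-nibble-first order are all assumptions.

```java
final class NibblePackingSketch {
    /**
     * Packs a string of digits '0'..'9' two per byte.
     * Each nibble holds digit + 1, so a zero nibble can act as the terminator.
     */
    static byte[] pack(String values) {
        int n = values.length();
        // n nibbles plus at least one terminating zero nibble, rounded up to whole bytes
        byte[] packed = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            int v = values.charAt(i) - '0' + 1;   // 1..10, never 0
            if ((i & 1) == 0) {
                packed[i / 2] = (byte) (v << 4);  // even position: high nibble (assumed order)
            } else {
                packed[i / 2] |= (byte) v;        // odd position: low nibble
            }
        }
        return packed;                            // unused nibbles stay 0 and terminate the run
    }
}
```

With this layout, `pack("0102")` yields the bytes `0x12 0x13 0x00`: four stored nibbles followed by a zero terminator.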
    • unpackValues

      protected String unpackValues (int k)
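      Unpacks the values stored at the given position.
      Parameters:
      k - the index into the vspace array where the packed values start
      Returns:
      a string of digits representing the interletter values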
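Unpacking can be sketched as the inverse of the 4-bit packing described for packValues. The class `NibbleUnpackingSketch`, the explicit `byte[]` parameter (standing in for `vspace`), and the high-nibble-first order are assumptions made for this illustration.

```java
final class NibbleUnpackingSketch {
    /**
     * Reads nibbles starting at byte index k until a zero nibble is found
     * and returns the stored digits as a string.
     */
    static String unpack(byte[] packed, int k) {
        StringBuilder digits = new StringBuilder();
        for (int i = k; i < packed.length; i++) {
            int high = (packed[i] >>> 4) & 0x0f;
            if (high == 0) {
                break;                                 // terminator in the high nibble
            }
            digits.append((char) (high - 1 + '0'));    // undo the +1 applied when packing
            int low = packed[i] & 0x0f;
            if (low == 0) {
                break;                                 // terminator in the low nibble
            }
            digits.append((char) (low - 1 + '0'));
        }
        return digits.toString();
    }
}
```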
    • loadPatterns

      public void loadPatterns (String filename) throws HyphenationException, FileNotFoundException
      Read hyphenation patterns from an XML file.
      Parameters:
      filename - the filename
      Throws:
      HyphenationException - In case the parsing fails
      FileNotFoundException - When the specified file is not found
    • loadPatterns

      public void loadPatterns (InputStream stream, String name) throws HyphenationException
      Read hyphenation patterns from an XML input stream.
      Parameters:
      stream - the InputStream to read the patterns from
      name - unique key representing country-language combination
      Throws:
      HyphenationException - In case the parsing fails
    • findPattern

      public String findPattern (String pat)
      Find pattern.
      Parameters:
      pat - a pattern
      Returns:
      a string
    • hstrcmp

      protected int hstrcmp (char[] s, int si, char[] t, int ti)
      String compare, returns 0 if equal or t is a substring of s.
      Parameters:
      s - first character array
      si - starting index into first array
      t - second character array
      ti - starting index into second array
      Returns:
      an integer
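One way to read the comparison described above is sketched here. It assumes both arrays are null-terminated (a `'\0'` character marks the end) and interprets "t is a substring of s" as "t matches the beginning of s from the given offsets". The class name and these interpretations are assumptions, not confirmed behavior of the library.

```java
final class PrefixCompareSketch {
    /**
     * Compares s (from si) with t (from ti); both must be null-terminated.
     * Returns 0 if they are equal or if t ends while still matching s,
     * otherwise the difference of the first mismatching characters.
     */
    static int compare(char[] s, int si, char[] t, int ti) {
        while (s[si] == t[ti]) {
            if (s[si] == 0) {
                return 0;          // both reached their terminator: equal
            }
            si++;
            ti++;
        }
        if (t[ti] == 0) {
            return 0;              // t ended first: t matches the start of s
        }
        return s[si] - t[ti];
    }
}
```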
    • getValues

      protected byte[] getValues (int k)
      Get values.
      Parameters:
      k - an integer
      Returns:
      a byte array
    • searchPatterns

      protected void searchPatterns (char[] word, int index, byte[] il)
      Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:

      for(i=0; i

      But it is done efficiently because the patterns are stored in a ternary tree. In fact, this is the whole purpose of the tree: performing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.

      Parameters:
      word - null-terminated word to match
      index - start index from word
      il - interletter values array to update
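For comparison, the brute-force search that the description contrasts with the tree lookup might look like the sketch below. The class name, the parallel `patterns` and `values` arrays, and the rule of keeping the larger value at each position are assumptions made for illustration; the real method walks the ternary tree instead of looping over every pattern.

```java
final class NaivePatternSearchSketch {
    /**
     * Tests every pattern against the word at the given index and merges the
     * interletter values of each match into il, keeping the larger value.
     */
    static void searchNaively(char[] word, int index, String[] patterns,
                              String[] values, byte[] il) {
        for (int p = 0; p < patterns.length; p++) {
            if (!startsWith(word, index, patterns[p])) {
                continue;                              // this pattern does not match here
            }
            String v = values[p];
            for (int j = 0; j < v.length() && index + j < il.length; j++) {
                byte d = (byte) (v.charAt(j) - '0');
                if (d > il[index + j]) {
                    il[index + j] = d;                 // assumed merge rule: maximum wins
                }
            }
        }
    }

    /** Returns true if pat occurs in word starting exactly at index. */
    static boolean startsWith(char[] word, int index, String pat) {
        if (index + pat.length() > word.length) {
            return false;
        }
        for (int j = 0; j < pat.length(); j++) {
            if (word[index + j] != pat.charAt(j)) {
                return false;
            }
        }
        return true;
    }
}
```

Each call costs one string comparison per pattern, which is the cost the ternary tree avoids.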
    • hyphenate

      public Hyphenation hyphenate (String word, int remainCharCount, int pushCharCount)
      Hyphenate word and return a Hyphenation object.
      Parameters:
      word - the word to be hyphenated
      remainCharCount - Minimum number of characters allowed before the hyphenation point.
      pushCharCount - Minimum number of characters allowed after the hyphenation point.
      Returns:
      a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
    • hyphenate

      public Hyphenation hyphenate (char[] w, int offset, int len, int remainCharCount, int pushCharCount)
      Hyphenate the word held in a character array and return a Hyphenation object.
      Parameters:
      w - char array that contains the word
      offset - Offset to first character in word
      len - Length of word
      remainCharCount - Minimum number of characters allowed before the hyphenation point.
      pushCharCount - Minimum number of characters allowed after the hyphenation point.
      Returns:
      a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
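The effect of remainCharCount and pushCharCount can be illustrated with a small self-contained sketch. It assumes a Liang-style convention in which an odd interletter value permits a break, and it assumes `il[i]` describes the gap just before character `i`. The class name, the array layout, and the choice to return a hyphen-marked string instead of a Hyphenation object are all hypothetical.

```java
final class HyphenationPointsSketch {
    /**
     * Inserts '-' at every position whose interletter value is odd, provided at
     * least remainCharCount characters precede it and pushCharCount follow it.
     * il must have at least word.length() entries.
     */
    static String insertHyphens(String word, byte[] il,
                                int remainCharCount, int pushCharCount) {
        int len = word.length();
        StringBuilder out = new StringBuilder(len + 4);
        for (int i = 0; i < len; i++) {
            boolean breakAllowed = (il[i] & 1) == 1;   // odd value: break permitted (assumed)
            if (i > 0 && breakAllowed
                    && i >= remainCharCount            // enough characters before the break
                    && len - i >= pushCharCount) {     // enough characters after the break
                out.append('-');
            }
            out.append(word.charAt(i));
        }
        return out.toString();
    }
}
```

Under these assumptions, a value of 3 before the "p" in "hyphen" gives "hy-phen" when both limits are 2, and no break at all when remainCharCount is 3.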
    • addClass

      public void addClass (String chargroup)
      Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lowercase characters; in that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
      Specified by:
      addClass in interface IPatternConsumer
      Parameters:
      chargroup - a character class (group)
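The normalization role of character classes can be sketched with an ordinary map. The class `CharClassSketch` and its `normalize` helper are hypothetical; the actual implementation keeps the classes in a TernaryTree (the classmap field), not a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

final class CharClassSketch {
    // Maps every member of a class to the first character of that class.
    private final Map<Character, Character> classes = new HashMap<>();

    /** Registers a class such as "aA": 'a' and 'A' both normalize to 'a'. */
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char normalized = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classes.put(chargroup.charAt(i), normalized);
        }
    }

    /** Returns the normalized word, or null if a character belongs to no class. */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character c = classes.get(word.charAt(i));
            if (c == null) {
                return null;       // unknown character: the word would not be hyphenated
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }
}
```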
    • addException

      public void addException (String word, List hyphenatedword)
      Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
      Specified by:
      addException in interface IPatternConsumer
      Parameters:
      word - normalized word
      hyphenatedword - a vector of alternating strings and hyphen objects.
    • addPattern

      public void addPattern (String pattern, String ivalue)
      Add a pattern to the tree. It is mainly used by the PatternParser class as a callback.
      Specified by:
      addPattern in interface IPatternConsumer
      Parameters:
      pattern - the hyphenation pattern
      ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
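To make the relationship between the two arguments concrete, the sketch below splits a TeX-style pattern such as "hy3ph" into its letters and a digit string with one value per gap. It assumes the caller derives pattern and ivalue this way, with a missing digit meaning 0; the class `PatternSplitSketch` is hypothetical and is not part of the library.

```java
final class PatternSplitSketch {
    /**
     * Splits a raw pattern into { letters, interletter values }.
     * The values string has one digit per gap, so it is one longer than letters.
     */
    static String[] split(String raw) {
        StringBuilder letters = new StringBuilder();
        StringBuilder values = new StringBuilder();
        boolean digitSeen = false;                 // was a digit given for the current gap?
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c >= '0' && c <= '9') {
                values.append(c);
                digitSeen = true;
            } else {
                if (!digitSeen) {
                    values.append('0');            // no digit before this letter: weight 0
                }
                letters.append(c);
                digitSeen = false;
            }
        }
        if (!digitSeen) {
            values.append('0');                    // gap after the last letter
        }
        return new String[] { letters.toString(), values.toString() };
    }
}
```

Here `split("hy3ph")` produces the pair "hyph" and "00300", which is the shape of input that addPattern(pattern, ivalue) expects under these assumptions.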