java.lang.Object

com.itextpdf.pdfocr.onnxtr.recognition.Vocabulary

public class Vocabulary extends Object

A string-based look-up table (LUT) for mapping text recognition model results to characters.

This class assumes that each character is represented by a single UTF-16 code unit, so the look-up string itself can serve as the LUT. If this assumption does not hold, the results are unpredictable.
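The single-code-unit assumption can be checked before a look-up string is used. The helper below is hypothetical and not part of pdfOCR. It relies only on JDK methods (`Character.isSurrogate`, `String.codePointCount`) to detect characters outside the Basic Multilingual Plane, which Java stores as surrogate pairs.

```java
// Hypothetical validation helper, not part of pdfOCR: checks whether every
// character in a candidate look-up string occupies exactly one UTF-16 code unit.
final class LookUpStringCheck {
    private LookUpStringCheck() {}

    /** Returns true if the string contains no surrogate code units. */
    static boolean isSingleCodeUnitPerChar(String lookUp) {
        for (int i = 0; i < lookUp.length(); i++) {
            // A high or low surrogate means some code point needs two chars,
            // so charAt(i) would return only half of that character.
            if (Character.isSurrogate(lookUp.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Number of extra code units caused by surrogate pairs (0 if the LUT is safe). */
    static int extraCodeUnits(String lookUp) {
        return lookUp.length() - lookUp.codePointCount(0, lookUp.length());
    }
}
```

For example, `"a\uD835\uDD38b"` (with U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A in the middle) has a length of 4 but only 3 code points. Indices 1 and 2 would each map to half of that character.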

It effectively implements IOutputLabelMapper for Character, but because doing so would involve unnecessary boxing, it is a standalone class instead.
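The boxing cost comes from generics. A generic mapper must return `Character`, so every `char` is wrapped in an object, whereas a dedicated method can return the primitive directly. The interface below is a hypothetical stand-in, because the exact signature of IOutputLabelMapper is not shown on this page.

```java
// Hypothetical generic mapper, a stand-in for an interface such as
// IOutputLabelMapper<T>; its real signature is not documented here.
interface GenericLabelMapper<T> {
    T map(int index);
}

final class BoxingComparison {
    private BoxingComparison() {}

    /** Generic version: every returned char is autoboxed into a Character. */
    static GenericLabelMapper<Character> boxedMapper(String lookUp) {
        return index -> lookUp.charAt(index);
    }

    /** Primitive version, mirroring the char return type of Vocabulary.map. */
    static char primitiveMap(String lookUp, int index) {
        return lookUp.charAt(index);
    }
}
```

The Java Language Specification only guarantees that boxing reuses cached `Character` instances for values up to U+007F. For most of the non-Latin scripts in the field list, each boxed result may therefore be a fresh allocation.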

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| static final Vocabulary | AFRIKAANS | |
| static final Vocabulary | ALBANIAN | |
| static final Vocabulary | ANCIENT_GREEK | |
| static final Vocabulary | ARABIC | |
| static final Vocabulary | ARABIC_DIACRITICS | |
| static final Vocabulary | ARABIC_DIGITS | |
| static final Vocabulary | ARABIC_LETTERS | |
| static final Vocabulary | ARABIC_PUNCTUATION | |
| static final Vocabulary | ARMENIAN | |
| static final Vocabulary | ASCII_LETTERS | |
| static final Vocabulary | ASCII_LOWERCASE | |
| static final Vocabulary | ASCII_UPPERCASE | |
| static final Vocabulary | AZERBAIJANI | |
| static final Vocabulary | BASQUE | |
| static final Vocabulary | BELARUSIAN | |
| static final Vocabulary | BENGALI | |
| static final Vocabulary | BENGALI_CONSONANTS | |
| static final Vocabulary | BENGALI_DIGITS | |
| static final Vocabulary | BENGALI_MATRAS | |
| static final Vocabulary | BENGALI_PUNCTUATION | |
| static final Vocabulary | BENGALI_SIGNS | |
| static final Vocabulary | BENGALI_VIRAMA | |
| static final Vocabulary | BENGALI_VOWELS | |
| static final Vocabulary | BOSNIAN | |
| static final Vocabulary | BULGARIAN | |
| static final Vocabulary | BURMESE | |
| static final Vocabulary | BURMESE_CONSONANTS | |
| static final Vocabulary | BURMESE_DIACRITICS | |
| static final Vocabulary | BURMESE_DIGITS | |
| static final Vocabulary | BURMESE_PUNCTUATION | |
| static final Vocabulary | BURMESE_VIRAMA | |
| static final Vocabulary | BURMESE_VOWELS | |
| static final Vocabulary | CATALAN | |
| static final Vocabulary | CROATIAN | |
| static final Vocabulary | CURRENCY | |
| static final Vocabulary | CZECH | |
| static final Vocabulary | DANISH | |
| static final Vocabulary | DEVANAGARI | |
| static final Vocabulary | DEVANAGARI_CONSONANTS | |
| static final Vocabulary | DEVANAGARI_DIGITS | |
| static final Vocabulary | DEVANAGARI_MATRAS | |
| static final Vocabulary | DEVANAGARI_PUNCTUATION | |
| static final Vocabulary | DEVANAGARI_SIGNS | |
| static final Vocabulary | DEVANAGARI_VIRAMA | |
| static final Vocabulary | DEVANAGARI_VOWELS | |
| static final Vocabulary | DIGITS | |
| static final Vocabulary | DUTCH | |
| static final Vocabulary | ENGLISH | |
| static final Vocabulary | ESPERANTO | |
| static final Vocabulary | ESTONIAN | |
| static final Vocabulary | ETHIOPIC | |
| static final Vocabulary | FINNISH | |
| static final Vocabulary | FRENCH | |
| static final Vocabulary | FRISIAN | |
| static final Vocabulary | GALICIAN | |
| static final Vocabulary | GENERIC_CYRILLIC_LETTERS | |
| static final Vocabulary | GEORGIAN | |
| static final Vocabulary | GERMAN | |
| static final Vocabulary | GREEK | |
| static final Vocabulary | GREEK_EXTENDED | |
| static final Vocabulary | GUJARATI | |
| static final Vocabulary | GUJARATI_CONSONANTS | |
| static final Vocabulary | GUJARATI_DIGITS | |
| static final Vocabulary | GUJARATI_MATRAS | |
| static final Vocabulary | GUJARATI_PUNCTUATION | |
| static final Vocabulary | GUJARATI_SIGNS | |
| static final Vocabulary | GUJARATI_VIRAMA | |
| static final Vocabulary | GUJARATI_VOWELS | |
| static final Vocabulary | HAUSA | |
| static final Vocabulary | HEBREW | |
| static final Vocabulary | HEBREW_CANTILLATIONS | |
| static final Vocabulary | HEBREW_CONSONANTS | |
| static final Vocabulary | HEBREW_PUNCTUATION | |
| static final Vocabulary | HEBREW_SPECIALS | |
| static final Vocabulary | HEBREW_VOWELS | |
| static final Vocabulary | HINDI | |
| static final Vocabulary | HINDI_DIGITS | |
| static final Vocabulary | HUNGARIAN | |
| static final Vocabulary | ICELANDIC | |
| static final Vocabulary | INDONESIAN | |
| static final Vocabulary | IRISH | |
| static final Vocabulary | ITALIAN | |
| static final Vocabulary | JAPANESE | |
| static final Vocabulary | JAVANESE | |
| static final Vocabulary | JAVANESE_CONSONANTS | |
| static final Vocabulary | JAVANESE_DIACRITICS | |
| static final Vocabulary | JAVANESE_DIGITS | |
| static final Vocabulary | JAVANESE_PUNCTUATION | |
| static final Vocabulary | JAVANESE_VIRAMA | |
| static final Vocabulary | JAVANESE_VOWELS | |
| static final Vocabulary | KANNADA | |
| static final Vocabulary | KANNADA_CONSONANTS | |
| static final Vocabulary | KANNADA_DIGITS | |
| static final Vocabulary | KANNADA_MATRAS | |
| static final Vocabulary | KANNADA_PUNCTUATION | |
| static final Vocabulary | KANNADA_SIGNS | |
| static final Vocabulary | KANNADA_VIRAMA | |
| static final Vocabulary | KANNADA_VOWELS | |
| static final Vocabulary | KAZAKH | |
| static final Vocabulary | KHMER | |
| static final Vocabulary | KHMER_CONSONANTS | |
| static final Vocabulary | KHMER_DIACRITICS | |
| static final Vocabulary | KHMER_DIGITS | |
| static final Vocabulary | KHMER_MATRAS | |
| static final Vocabulary | KHMER_PUNCTUATION | |
| static final Vocabulary | KHMER_VIRAMA | |
| static final Vocabulary | KHMER_VOWELS | |
| static final Vocabulary | KOREAN | |
| static final Vocabulary | KURDISH | |
| static final Vocabulary | KYRGYZ | |
| static final Vocabulary | LAO | |
| static final Vocabulary | LATIN | |
| static final Vocabulary | LATIN_EXTENDED | |
| static final Vocabulary | LATVIAN | |
| static final Vocabulary | LEGACY_FRENCH | |
| static final Vocabulary | LITHUANIAN | |
| static final Vocabulary | LUXEMBOURGISH | |
| static final Vocabulary | MACEDONIAN | |
| static final Vocabulary | MALAGASY | |
| static final Vocabulary | MALAY | |
| static final Vocabulary | MALAYALAM | |
| static final Vocabulary | MALAYALAM_CONSONANTS | |
| static final Vocabulary | MALAYALAM_DIGITS | |
| static final Vocabulary | MALAYALAM_MATRAS | |
| static final Vocabulary | MALAYALAM_SIGNS | |
| static final Vocabulary | MALAYALAM_VIRAMA | |
| static final Vocabulary | MALAYALAM_VOWELS | |
| static final Vocabulary | MALTESE | |
| static final Vocabulary | MAORI | |
| static final Vocabulary | MARATHI | |
| static final Vocabulary | MONGOLIAN | |
| static final Vocabulary | MONTENEGRIN | |
| static final Vocabulary | MULTI_LANG | |
| static final Vocabulary | MULTI_LANG_FULL | |
| static final Vocabulary | NEPALI | |
| static final Vocabulary | NORWEGIAN | |
| static final Vocabulary | ODIA | |
| static final Vocabulary | ODIA_CONSONANTS | |
| static final Vocabulary | ODIA_DIGITS | |
| static final Vocabulary | ODIA_MATRAS | |
| static final Vocabulary | ODIA_PUNCTUATION | |
| static final Vocabulary | ODIA_SIGNS | |
| static final Vocabulary | ODIA_VIRAMA | |
| static final Vocabulary | ODIA_VOWELS | |
| static final Vocabulary | PASHTO | |
| static final Vocabulary | PERSIAN | |
| static final Vocabulary | PERSIAN_LETTERS | |
| static final Vocabulary | POLISH | |
| static final Vocabulary | PORTUGUESE | |
| static final Vocabulary | PUNCTUATION | |
| static final Vocabulary | PUNJABI | |
| static final Vocabulary | PUNJABI_CONSONANTS | |
| static final Vocabulary | PUNJABI_DIGITS | |
| static final Vocabulary | PUNJABI_MATRAS | |
| static final Vocabulary | PUNJABI_PUNCTUATION | |
| static final Vocabulary | PUNJABI_SIGNS | |
| static final Vocabulary | PUNJABI_VIRAMA | |
| static final Vocabulary | PUNJABI_VOWELS | |
| static final Vocabulary | QUECHUA | |
| static final Vocabulary | ROMANIAN | |
| static final Vocabulary | RUSSIAN | |
| static final Vocabulary | RUSSIAN_CYRILLIC_LETTERS | |
| static final Vocabulary | RUSSIAN_SIGNS | |
| static final Vocabulary | SANSKRIT | |
| static final Vocabulary | SCOTTISH_GAELIC | |
| static final Vocabulary | SERBIAN_CYRILLIC | |
| static final Vocabulary | SERBIAN_LATIN | |
| static final Vocabulary | SIMPLIFIED_CHINESE | |
| static final Vocabulary | SINDHI | |
| static final Vocabulary | SINHALA | |
| static final Vocabulary | SINHALA_CONSONANTS | |
| static final Vocabulary | SINHALA_DIGITS | |
| static final Vocabulary | SINHALA_MATRAS | |
| static final Vocabulary | SINHALA_PUNCTUATION | |
| static final Vocabulary | SINHALA_SIGNS | |
| static final Vocabulary | SINHALA_VIRAMA | |
| static final Vocabulary | SINHALA_VOWELS | |
| static final Vocabulary | SLOVAK | |
| static final Vocabulary | SLOVENE | |
| static final Vocabulary | SOMALI | |
| static final Vocabulary | SPANISH | |
| static final Vocabulary | SUDANESE | |
| static final Vocabulary | SUDANESE_CONSONANTS | |
| static final Vocabulary | SUDANESE_DIACRITICS | |
| static final Vocabulary | SUDANESE_DIGITS | |
| static final Vocabulary | SUDANESE_VOWELS | |
| static final Vocabulary | SWAHILI | |
| static final Vocabulary | SWEDISH | |
| static final Vocabulary | TAGALOG | |
| static final Vocabulary | TAJIK | |
| static final Vocabulary | TAMIL | |
| static final Vocabulary | TAMIL_CONSONANTS | |
| static final Vocabulary | TAMIL_DIGITS | |
| static final Vocabulary | TAMIL_FRACTIONS | |
| static final Vocabulary | TAMIL_MATRAS | |
| static final Vocabulary | TAMIL_PUNCTUATION | |
| static final Vocabulary | TAMIL_SIGNS | |
| static final Vocabulary | TAMIL_VIRAMA | |
| static final Vocabulary | TAMIL_VOWELS | |
| static final Vocabulary | TATAR | |
| static final Vocabulary | TELUGU | |
| static final Vocabulary | TELUGU_CONSONANTS | |
| static final Vocabulary | TELUGU_DIGITS | |
| static final Vocabulary | TELUGU_MATRAS | |
| static final Vocabulary | TELUGU_PUNCTUATION | |
| static final Vocabulary | TELUGU_SIGNS | |
| static final Vocabulary | TELUGU_VIRAMA | |
| static final Vocabulary | TELUGU_VOWELS | |
| static final Vocabulary | THAI | |
| static final Vocabulary | TURKISH | |
| static final Vocabulary | UKRAINIAN | |
| static final Vocabulary | URDU | |
| static final Vocabulary | UYGHUR | |
| static final Vocabulary | UZBEK_CYRILLIC | |
| static final Vocabulary | UZBEK_LATIN | |
| static final Vocabulary | VIETNAMESE | |
| static final Vocabulary | WELSH | |
| static final Vocabulary | YAKUT | |
| static final Vocabulary | YORUBA | |
| static final Vocabulary | ZULU | |

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| Vocabulary(String lookUpString) | Creates a new vocabulary based on a look-up string. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static Vocabulary | concat(Vocabulary... vocabularies) | Creates a new vocabulary by concatenating multiple ones. |
| boolean | equals(Object o) | |
| String | getLookUpString() | Returns the look-up string. |
| int | hashCode() | |
| char | map(int index) | Returns the character mapped to the specified index in the look-up string. |
| int | size() | Returns the size of the vocabulary. |
| String | toString() | |

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
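`concat` is a static varargs factory. Assuming it simply joins the look-up strings in argument order, the order of the arguments determines the index at which each part starts. The documentation does not say whether duplicate characters are removed, and this sketch does not remove them. The sketch works on plain strings, and `concatLookUps` and `offsetOf` are hypothetical names.

```java
// Hypothetical helpers illustrating assumed concat semantics on plain strings;
// they are not part of pdfOCR.
final class ConcatSketch {
    private ConcatSketch() {}

    /** Joins look-up strings in argument order, without removing duplicates. */
    static String concatLookUps(String... lookUps) {
        StringBuilder joined = new StringBuilder();
        for (String lookUp : lookUps) {
            joined.append(lookUp);
        }
        return joined.toString();
    }

    /** Index in the joined string at which part number {@code part} begins. */
    static int offsetOf(int part, String... lookUps) {
        int offset = 0;
        for (int i = 0; i < part; i++) {
            offset += lookUps[i].length();
        }
        return offset;
    }
}
```

With the real class, the corresponding call would be `Vocabulary.concat(Vocabulary.ASCII_LOWERCASE, Vocabulary.DIGITS)`. Swapping the arguments changes which index maps to which character, so the order presumably has to match the vocabulary the recognition model was trained with.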

Field Details
- ASCII_LOWERCASE
  
  public static final Vocabulary ASCII_LOWERCASE
- ASCII_UPPERCASE
  
  public static final Vocabulary ASCII_UPPERCASE
- ASCII_LETTERS
  
  public static final Vocabulary ASCII_LETTERS
- DIGITS
  
  public static final Vocabulary DIGITS
- PUNCTUATION
  
  public static final Vocabulary PUNCTUATION
- CURRENCY
  
  public static final Vocabulary CURRENCY
- LATIN
  
  public static final Vocabulary LATIN
- ENGLISH
  
  public static final Vocabulary ENGLISH
- LEGACY_FRENCH
  
  public static final Vocabulary LEGACY_FRENCH
- FRENCH
  
  public static final Vocabulary FRENCH
- HINDI_DIGITS
  
  public static final Vocabulary HINDI_DIGITS
- GENERIC_CYRILLIC_LETTERS
  
  public static final Vocabulary GENERIC_CYRILLIC_LETTERS
- RUSSIAN_CYRILLIC_LETTERS
  
  public static final Vocabulary RUSSIAN_CYRILLIC_LETTERS
- RUSSIAN_SIGNS
  
  public static final Vocabulary RUSSIAN_SIGNS
- ANCIENT_GREEK
  
  public static final Vocabulary ANCIENT_GREEK
- ARABIC_DIACRITICS
  
  public static final Vocabulary ARABIC_DIACRITICS
- ARABIC_DIGITS
  
  public static final Vocabulary ARABIC_DIGITS
- ARABIC_LETTERS
  
  public static final Vocabulary ARABIC_LETTERS
- ARABIC_PUNCTUATION
  
  public static final Vocabulary ARABIC_PUNCTUATION
- PERSIAN_LETTERS
  
  public static final Vocabulary PERSIAN_LETTERS
- BENGALI_CONSONANTS
  
  public static final Vocabulary BENGALI_CONSONANTS
- BENGALI_VOWELS
  
  public static final Vocabulary BENGALI_VOWELS
- BENGALI_DIGITS
  
  public static final Vocabulary BENGALI_DIGITS
- BENGALI_MATRAS
  
  public static final Vocabulary BENGALI_MATRAS
- BENGALI_VIRAMA
  
  public static final Vocabulary BENGALI_VIRAMA
- BENGALI_PUNCTUATION
  
  public static final Vocabulary BENGALI_PUNCTUATION
- BENGALI_SIGNS
  
  public static final Vocabulary BENGALI_SIGNS
- GUJARATI_CONSONANTS
  
  public static final Vocabulary GUJARATI_CONSONANTS
- GUJARATI_VOWELS
  
  public static final Vocabulary GUJARATI_VOWELS
- GUJARATI_DIGITS
  
  public static final Vocabulary GUJARATI_DIGITS
- GUJARATI_MATRAS
  
  public static final Vocabulary GUJARATI_MATRAS
- GUJARATI_VIRAMA
  
  public static final Vocabulary GUJARATI_VIRAMA
- GUJARATI_PUNCTUATION
  
  public static final Vocabulary GUJARATI_PUNCTUATION
- GUJARATI_SIGNS
  
  public static final Vocabulary GUJARATI_SIGNS
- DEVANAGARI_CONSONANTS
  
  public static final Vocabulary DEVANAGARI_CONSONANTS
- DEVANAGARI_VOWELS
  
  public static final Vocabulary DEVANAGARI_VOWELS
- DEVANAGARI_DIGITS
  
  public static final Vocabulary DEVANAGARI_DIGITS
- DEVANAGARI_MATRAS
  
  public static final Vocabulary DEVANAGARI_MATRAS
- DEVANAGARI_VIRAMA
  
  public static final Vocabulary DEVANAGARI_VIRAMA
- DEVANAGARI_PUNCTUATION
  
  public static final Vocabulary DEVANAGARI_PUNCTUATION
- DEVANAGARI_SIGNS
  
  public static final Vocabulary DEVANAGARI_SIGNS
- PUNJABI_CONSONANTS
  
  public static final Vocabulary PUNJABI_CONSONANTS
- PUNJABI_VOWELS
  
  public static final Vocabulary PUNJABI_VOWELS
- PUNJABI_DIGITS
  
  public static final Vocabulary PUNJABI_DIGITS
- PUNJABI_MATRAS
  
  public static final Vocabulary PUNJABI_MATRAS
- PUNJABI_VIRAMA
  
  public static final Vocabulary PUNJABI_VIRAMA
- PUNJABI_PUNCTUATION
  
  public static final Vocabulary PUNJABI_PUNCTUATION
- PUNJABI_SIGNS
  
  public static final Vocabulary PUNJABI_SIGNS
- TAMIL_CONSONANTS
  
  public static final Vocabulary TAMIL_CONSONANTS
- TAMIL_VOWELS
  
  public static final Vocabulary TAMIL_VOWELS
- TAMIL_DIGITS
  
  public static final Vocabulary TAMIL_DIGITS
- TAMIL_MATRAS
  
  public static final Vocabulary TAMIL_MATRAS
- TAMIL_VIRAMA
  
  public static final Vocabulary TAMIL_VIRAMA
- TAMIL_PUNCTUATION
  
  public static final Vocabulary TAMIL_PUNCTUATION
- TAMIL_SIGNS
  
  public static final Vocabulary TAMIL_SIGNS
- TAMIL_FRACTIONS
  
  public static final Vocabulary TAMIL_FRACTIONS
- TELUGU_CONSONANTS
  
  public static final Vocabulary TELUGU_CONSONANTS
- TELUGU_DIGITS
  
  public static final Vocabulary TELUGU_DIGITS
- TELUGU_VOWELS
  
  public static final Vocabulary TELUGU_VOWELS
- TELUGU_MATRAS
  
  public static final Vocabulary TELUGU_MATRAS
- TELUGU_VIRAMA
  
  public static final Vocabulary TELUGU_VIRAMA
- TELUGU_PUNCTUATION
  
  public static final Vocabulary TELUGU_PUNCTUATION
- TELUGU_SIGNS
  
  public static final Vocabulary TELUGU_SIGNS
- KANNADA_CONSONANTS
  
  public static final Vocabulary KANNADA_CONSONANTS
- KANNADA_VOWELS
  
  public static final Vocabulary KANNADA_VOWELS
- KANNADA_DIGITS
  
  public static final Vocabulary KANNADA_DIGITS
- KANNADA_MATRAS
  
  public static final Vocabulary KANNADA_MATRAS
- KANNADA_VIRAMA
  
  public static final Vocabulary KANNADA_VIRAMA
- KANNADA_PUNCTUATION
  
  public static final Vocabulary KANNADA_PUNCTUATION
- KANNADA_SIGNS
  
  public static final Vocabulary KANNADA_SIGNS
- SINHALA_CONSONANTS
  
  public static final Vocabulary SINHALA_CONSONANTS
- SINHALA_VOWELS
  
  public static final Vocabulary SINHALA_VOWELS
- SINHALA_DIGITS
  
  public static final Vocabulary SINHALA_DIGITS
- SINHALA_MATRAS
  
  public static final Vocabulary SINHALA_MATRAS
- SINHALA_VIRAMA
  
  public static final Vocabulary SINHALA_VIRAMA
- SINHALA_PUNCTUATION
  
  public static final Vocabulary SINHALA_PUNCTUATION
- SINHALA_SIGNS
  
  public static final Vocabulary SINHALA_SIGNS
- MALAYALAM_CONSONANTS
  
  public static final Vocabulary MALAYALAM_CONSONANTS
- MALAYALAM_VOWELS
  
  public static final Vocabulary MALAYALAM_VOWELS
- MALAYALAM_DIGITS
  
  public static final Vocabulary MALAYALAM_DIGITS
- MALAYALAM_MATRAS
  
  public static final Vocabulary MALAYALAM_MATRAS
- MALAYALAM_VIRAMA
  
  public static final Vocabulary MALAYALAM_VIRAMA
- MALAYALAM_SIGNS
  
  public static final Vocabulary MALAYALAM_SIGNS
- ODIA_CONSONANTS
  
  public static final Vocabulary ODIA_CONSONANTS
- ODIA_VOWELS
  
  public static final Vocabulary ODIA_VOWELS
- ODIA_DIGITS
  
  public static final Vocabulary ODIA_DIGITS
- ODIA_MATRAS
  
  public static final Vocabulary ODIA_MATRAS
- ODIA_VIRAMA
  
  public static final Vocabulary ODIA_VIRAMA
- ODIA_PUNCTUATION
  
  public static final Vocabulary ODIA_PUNCTUATION
- ODIA_SIGNS
  
  public static final Vocabulary ODIA_SIGNS
- KHMER_CONSONANTS
  
  public static final Vocabulary KHMER_CONSONANTS
- KHMER_VOWELS
  
  public static final Vocabulary KHMER_VOWELS
- KHMER_DIGITS
  
  public static final Vocabulary KHMER_DIGITS
- KHMER_MATRAS
  
  public static final Vocabulary KHMER_MATRAS
- KHMER_DIACRITICS
  
  public static final Vocabulary KHMER_DIACRITICS
- KHMER_VIRAMA
  
  public static final Vocabulary KHMER_VIRAMA
- KHMER_PUNCTUATION
  
  public static final Vocabulary KHMER_PUNCTUATION
- BURMESE_CONSONANTS
  
  public static final Vocabulary BURMESE_CONSONANTS
- BURMESE_VOWELS
  
  public static final Vocabulary BURMESE_VOWELS
- BURMESE_DIGITS
  
  public static final Vocabulary BURMESE_DIGITS
- BURMESE_DIACRITICS
  
  public static final Vocabulary BURMESE_DIACRITICS
- BURMESE_VIRAMA
  
  public static final Vocabulary BURMESE_VIRAMA
- BURMESE_PUNCTUATION
  
  public static final Vocabulary BURMESE_PUNCTUATION
- JAVANESE_CONSONANTS
  
  public static final Vocabulary JAVANESE_CONSONANTS
- JAVANESE_VOWELS
  
  public static final Vocabulary JAVANESE_VOWELS
- JAVANESE_DIGITS
  
  public static final Vocabulary JAVANESE_DIGITS
- JAVANESE_DIACRITICS
  
  public static final Vocabulary JAVANESE_DIACRITICS
- JAVANESE_VIRAMA
  
  public static final Vocabulary JAVANESE_VIRAMA
- JAVANESE_PUNCTUATION
  
  public static final Vocabulary JAVANESE_PUNCTUATION
- SUDANESE_CONSONANTS
  
  public static final Vocabulary SUDANESE_CONSONANTS
- SUDANESE_VOWELS
  
  public static final Vocabulary SUDANESE_VOWELS
- SUDANESE_DIGITS
  
  public static final Vocabulary SUDANESE_DIGITS
- SUDANESE_DIACRITICS
  
  public static final Vocabulary SUDANESE_DIACRITICS
- HEBREW_CANTILLATIONS
  
  public static final Vocabulary HEBREW_CANTILLATIONS
- HEBREW_CONSONANTS
  
  public static final Vocabulary HEBREW_CONSONANTS
- HEBREW_SPECIALS
  
  public static final Vocabulary HEBREW_SPECIALS
- HEBREW_PUNCTUATION
  
  public static final Vocabulary HEBREW_PUNCTUATION
- HEBREW_VOWELS
  
  public static final Vocabulary HEBREW_VOWELS
- ALBANIAN
  
  public static final Vocabulary ALBANIAN
- AFRIKAANS
  
  public static final Vocabulary AFRIKAANS
- BASQUE
  
  public static final Vocabulary BASQUE
- CATALAN
  
  public static final Vocabulary CATALAN
- CROATIAN
  
  public static final Vocabulary CROATIAN
- CZECH
  
  public static final Vocabulary CZECH
- DANISH
  
  public static final Vocabulary DANISH
- DUTCH
  
  public static final Vocabulary DUTCH
- ESTONIAN
  
  public static final Vocabulary ESTONIAN
- FINNISH
  
  public static final Vocabulary FINNISH
- GERMAN
  
  public static final Vocabulary GERMAN
- HUNGARIAN
  
  public static final Vocabulary HUNGARIAN
- INDONESIAN
  
  public static final Vocabulary INDONESIAN
- IRISH
  
  public static final Vocabulary IRISH
- ITALIAN
  
  public static final Vocabulary ITALIAN
- LUXEMBOURGISH
  
  public static final Vocabulary LUXEMBOURGISH
- MALAY
  
  public static final Vocabulary MALAY
- NORWEGIAN
  
  public static final Vocabulary NORWEGIAN
- POLISH
  
  public static final Vocabulary POLISH
- PORTUGUESE
  
  public static final Vocabulary PORTUGUESE
- ROMANIAN
  
  public static final Vocabulary ROMANIAN
- SERBIAN_LATIN
  
  public static final Vocabulary SERBIAN_LATIN
- SLOVAK
  
  public static final Vocabulary SLOVAK
- SPANISH
  
  public static final Vocabulary SPANISH
- SWEDISH
  
  public static final Vocabulary SWEDISH
- VIETNAMESE
  
  public static final Vocabulary VIETNAMESE
- ZULU
  
  public static final Vocabulary ZULU
- AZERBAIJANI
  
  public static final Vocabulary AZERBAIJANI
- BOSNIAN
  
  public static final Vocabulary BOSNIAN
- ESPERANTO
  
  public static final Vocabulary ESPERANTO
- FRISIAN
  
  public static final Vocabulary FRISIAN
- GALICIAN
  
  public static final Vocabulary GALICIAN
- HAUSA
  
  public static final Vocabulary HAUSA
- ICELANDIC
  
  public static final Vocabulary ICELANDIC
- LATVIAN
  
  public static final Vocabulary LATVIAN
- LITHUANIAN
  
  public static final Vocabulary LITHUANIAN
- MALAGASY
  
  public static final Vocabulary MALAGASY
- MALTESE
  
  public static final Vocabulary MALTESE
- MAORI
  
  public static final Vocabulary MAORI
- MONTENEGRIN
  
  public static final Vocabulary MONTENEGRIN
- QUECHUA
  
  public static final Vocabulary QUECHUA
- SCOTTISH_GAELIC
  
  public static final Vocabulary SCOTTISH_GAELIC
- SLOVENE
  
  public static final Vocabulary SLOVENE
- SOMALI
  
  public static final Vocabulary SOMALI
- SWAHILI
  
  public static final Vocabulary SWAHILI
- TAGALOG
  
  public static final Vocabulary TAGALOG
- TURKISH
  
  public static final Vocabulary TURKISH
- UZBEK_LATIN
  
  public static final Vocabulary UZBEK_LATIN
- WELSH
  
  public static final Vocabulary WELSH
- YORUBA
  
  public static final Vocabulary YORUBA
- RUSSIAN
  
  public static final Vocabulary RUSSIAN
- BELARUSIAN
  
  public static final Vocabulary BELARUSIAN
- UKRAINIAN
  
  public static final Vocabulary UKRAINIAN
- TATAR
  
  public static final Vocabulary TATAR
- TAJIK
  
  public static final Vocabulary TAJIK
- KAZAKH
  
  public static final Vocabulary KAZAKH
- KYRGYZ
  
  public static final Vocabulary KYRGYZ
- BULGARIAN
  
  public static final Vocabulary BULGARIAN
- MACEDONIAN
  
  public static final Vocabulary MACEDONIAN
- MONGOLIAN
  
  public static final Vocabulary MONGOLIAN
- YAKUT
  
  public static final Vocabulary YAKUT
- SERBIAN_CYRILLIC
  
  public static final Vocabulary SERBIAN_CYRILLIC
- UZBEK_CYRILLIC
  
  public static final Vocabulary UZBEK_CYRILLIC
- GREEK
  
  public static final Vocabulary GREEK
- GREEK_EXTENDED
  
  public static final Vocabulary GREEK_EXTENDED
- HEBREW
  
  public static final Vocabulary HEBREW
- ARABIC
  
  public static final Vocabulary ARABIC
- PERSIAN
  
  public static final Vocabulary PERSIAN
- URDU
  
  public static final Vocabulary URDU
- PASHTO
  
  public static final Vocabulary PASHTO
- KURDISH
  
  public static final Vocabulary KURDISH
- UYGHUR
  
  public static final Vocabulary UYGHUR
- SINDHI
  
  public static final Vocabulary SINDHI
- DEVANAGARI
  
  public static final Vocabulary DEVANAGARI
- HINDI
  
  public static final Vocabulary HINDI
- SANSKRIT
  
  public static final Vocabulary SANSKRIT
- MARATHI
  
  public static final Vocabulary MARATHI
- NEPALI
  
  public static final Vocabulary NEPALI
- GUJARATI
  
  public static final Vocabulary GUJARATI
- BENGALI
  
  public static final Vocabulary BENGALI
- TAMIL
  
  public static final Vocabulary TAMIL
- TELUGU
  
  public static final Vocabulary TELUGU
- KANNADA
  
  public static final Vocabulary KANNADA
- SINHALA
  
  public static final Vocabulary SINHALA
- MALAYALAM
  
  public static final Vocabulary MALAYALAM
- PUNJABI
  
  public static final Vocabulary PUNJABI
- ODIA
  
  public static final Vocabulary ODIA
- KHMER
  
  public static final Vocabulary KHMER
- ARMENIAN
  
  public static final Vocabulary ARMENIAN
- SUDANESE
  
  public static final Vocabulary SUDANESE
- THAI
  
  public static final Vocabulary THAI
- LAO
  
  public static final Vocabulary LAO
- BURMESE
  
  public static final Vocabulary BURMESE
- JAVANESE
  
  public static final Vocabulary JAVANESE
- GEORGIAN
  
  public static final Vocabulary GEORGIAN
- ETHIOPIC
  
  public static final Vocabulary ETHIOPIC
- JAPANESE
  
  public static final Vocabulary JAPANESE
- KOREAN
  
  public static final Vocabulary KOREAN
- SIMPLIFIED_CHINESE
  
  public static final Vocabulary SIMPLIFIED_CHINESE
- LATIN_EXTENDED
  
  public static final Vocabulary LATIN_EXTENDED
- MULTI_LANG
  
  public static final Vocabulary MULTI_LANG
- MULTI_LANG_FULL
  
  public static final Vocabulary MULTI_LANG_FULL
Constructor Details
- Vocabulary
  
  public Vocabulary(String lookUpString)
  
  Creates a new vocabulary based on a look-up string.
  
  Parameters:
  
  lookUpString - look-up string to be used as LUT for the vocabulary
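The pdfOCR class itself is not reproduced here. The following is a minimal stand-in that assumes `map(index)` behaves like `getLookUpString().charAt(index)` and `size()` like its `length()`, as the class description implies. The `decode` method and its rule of skipping out-of-range indices are hypothetical. They are included only to show how a sequence of predicted indices could be turned into text.

```java
import java.util.Objects;

// Minimal stand-in for illustration only; it is NOT the pdfOCR Vocabulary class.
final class SketchVocabulary {
    private final String lookUpString;

    SketchVocabulary(String lookUpString) {
        // Assumption: a null look-up string is rejected with a NullPointerException.
        this.lookUpString = Objects.requireNonNull(lookUpString, "lookUpString");
    }

    String getLookUpString() {
        return lookUpString;
    }

    /** One entry per UTF-16 code unit of the look-up string. */
    int size() {
        return lookUpString.length();
    }

    /** Index i maps to the i-th code unit; out-of-range indices throw. */
    char map(int index) {
        return lookUpString.charAt(index);
    }

    /**
     * Hypothetical helper (not in pdfOCR): turns predicted class indices into text.
     * Assumption: indices outside [0, size()), e.g. an extra "blank" class,
     * are skipped instead of being mapped.
     */
    String decode(int[] indices) {
        StringBuilder text = new StringBuilder(indices.length);
        for (int index : indices) {
            if (index >= 0 && index < size()) {
                text.append(map(index));
            }
        }
        return text.toString();
    }
}
```

For example, a stand-in built from `"0123456789"` has size 10, maps index 3 to `'3'`, and decodes the indices `{4, 2, 10, 7}` to `"427"`. Index 10 falls outside the vocabulary and is dropped.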
Method Details
- concat
  
  public static Vocabulary concat(Vocabulary... vocabularies)
  
  Creates a new vocabulary by concatenating multiple ones.
  
  Parameters:
  
  vocabularies - vocabularies to concatenate
  
  Returns:
  
  the new aggregated vocabulary
- getLookUpString
  
  public String getLookUpString()
  
  Returns the look-up string.
  
  Returns:
  
  the look-up string
- size
  
  public int size()
  
  Returns the size of the vocabulary.
  
  Returns:
  
  the size of the vocabulary
- map
  
  public char map(int index)
  
  Returns the character mapped to the specified index in the look-up string.
  
  Parameters:
  
  index - index to map
  
  Returns:
  
  mapped character
- hashCode
  
  public int hashCode()
  
  Overrides:
  
  hashCode in class Object
- equals
  
  public boolean equals(Object o)
  
  Overrides:
  
  equals in class Object
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
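`equals` and `hashCode` are listed without descriptions. One natural reading, which is only an assumption here, is that two vocabularies are equal when their look-up strings are equal. The sketch below shows that value-based contract on a hypothetical `ValueVocabulary` class. It keeps `hashCode` consistent with `equals`, so that instances behave correctly as `HashMap` keys or `HashSet` elements.

```java
import java.util.Objects;

// Sketch of value-based equality on the look-up string; NOT necessarily how
// pdfOCR implements Vocabulary.equals/hashCode.
final class ValueVocabulary {
    private final String lookUpString;

    ValueVocabulary(String lookUpString) {
        this.lookUpString = Objects.requireNonNull(lookUpString, "lookUpString");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        // Equal if and only if the look-up strings are equal.
        return lookUpString.equals(((ValueVocabulary) o).lookUpString);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal look-up strings give equal hash codes.
        return lookUpString.hashCode();
    }
}
```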
