Class Vocabulary
java.lang.Object
com.itextpdf.pdfocr.onnxtr.recognition.Vocabulary
A string-based LUT for mapping text recognition model results to characters.
This class assumes that each character is represented by a single UTF-16 code unit, so the string itself can be used as a LUT. If this is not the case, results will be unpredictable.
It effectively implements IOutputLabelMapper for Character, but since that would involve unnecessary boxing, it is a standalone class instead.
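The following is a minimal sketch of the string-as-LUT idea and of the single-code-unit assumption. CharLut is a hypothetical class used only for illustration, not the iText implementation. It maps an index to a character with String.charAt and shows how a character outside the Basic Multilingual Plane (two UTF-16 code units) breaks plain indexing.

```java
// Hypothetical sketch: CharLut is NOT the iText Vocabulary class; it only
// illustrates using a String as a look-up table (LUT).
final class CharLut {
    private final String lookUp;

    CharLut(String lookUp) {
        this.lookUp = lookUp;
    }

    // Model output index i maps to the i-th UTF-16 code unit of the string.
    char map(int index) {
        return lookUp.charAt(index);
    }

    // Returns true if no UTF-16 code unit in s is a surrogate half, i.e. every
    // character occupies exactly one code unit and plain indexing is safe.
    static boolean isSingleCodeUnitPerChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharLut digits = new CharLut("0123456789");
        System.out.println(digits.map(7));                             // 7
        System.out.println(isSingleCodeUnitPerChar("abc\u00E9"));      // true
        // U+1F600 lies outside the BMP and takes two UTF-16 code units,
        // so index 2 would yield only the high surrogate of that character.
        System.out.println(isSingleCodeUnitPerChar("ab\uD83D\uDE00")); // false
    }
}
```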
Field Summary
Fields
Every field is a static final Vocabulary constant; the field names are listed under Field Details.
Constructor Summary
Constructors
Vocabulary(String lookUpString): Creates a new vocabulary based on a look-up string.
Method Summary
Methods
static Vocabulary concat(Vocabulary... vocabularies): Creates a new vocabulary by concatenating multiple ones.
boolean equals(Object)
String getLookUpString(): Returns the look-up string.
int hashCode()
char map(int index): Returns the character mapped to the specified index in the look-up string.
int size(): Returns the size of the vocabulary.
String toString()
Field Details
ASCII_LOWERCASE, ASCII_UPPERCASE, ASCII_LETTERS, DIGITS, PUNCTUATION, CURRENCY
LATIN, ENGLISH, LEGACY_FRENCH, FRENCH
HINDI_DIGITS
GENERIC_CYRILLIC_LETTERS, RUSSIAN_CYRILLIC_LETTERS, RUSSIAN_SIGNS
ANCIENT_GREEK
ARABIC_DIACRITICS, ARABIC_DIGITS, ARABIC_LETTERS, ARABIC_PUNCTUATION
PERSIAN_LETTERS
BENGALI_CONSONANTS, BENGALI_VOWELS, BENGALI_DIGITS, BENGALI_MATRAS, BENGALI_VIRAMA, BENGALI_PUNCTUATION, BENGALI_SIGNS
GUJARATI_CONSONANTS, GUJARATI_VOWELS, GUJARATI_DIGITS, GUJARATI_MATRAS, GUJARATI_VIRAMA, GUJARATI_PUNCTUATION, GUJARATI_SIGNS
DEVANAGARI_CONSONANTS, DEVANAGARI_VOWELS, DEVANAGARI_DIGITS, DEVANAGARI_MATRAS, DEVANAGARI_VIRAMA, DEVANAGARI_PUNCTUATION, DEVANAGARI_SIGNS
PUNJABI_CONSONANTS, PUNJABI_VOWELS, PUNJABI_DIGITS, PUNJABI_MATRAS, PUNJABI_VIRAMA, PUNJABI_PUNCTUATION, PUNJABI_SIGNS
TAMIL_CONSONANTS, TAMIL_VOWELS, TAMIL_DIGITS, TAMIL_MATRAS, TAMIL_VIRAMA, TAMIL_PUNCTUATION, TAMIL_SIGNS, TAMIL_FRACTIONS
TELUGU_CONSONANTS, TELUGU_DIGITS, TELUGU_VOWELS, TELUGU_MATRAS, TELUGU_VIRAMA, TELUGU_PUNCTUATION, TELUGU_SIGNS
KANNADA_CONSONANTS, KANNADA_VOWELS, KANNADA_DIGITS, KANNADA_MATRAS, KANNADA_VIRAMA, KANNADA_PUNCTUATION, KANNADA_SIGNS
SINHALA_CONSONANTS, SINHALA_VOWELS, SINHALA_DIGITS, SINHALA_MATRAS, SINHALA_VIRAMA, SINHALA_PUNCTUATION, SINHALA_SIGNS
MALAYALAM_CONSONANTS, MALAYALAM_VOWELS, MALAYALAM_DIGITS, MALAYALAM_MATRAS, MALAYALAM_VIRAMA, MALAYALAM_SIGNS
ODIA_CONSONANTS, ODIA_VOWELS, ODIA_DIGITS, ODIA_MATRAS, ODIA_VIRAMA, ODIA_PUNCTUATION, ODIA_SIGNS
KHMER_CONSONANTS, KHMER_VOWELS, KHMER_DIGITS, KHMER_MATRAS, KHMER_DIACRITICS, KHMER_VIRAMA, KHMER_PUNCTUATION
BURMESE_CONSONANTS, BURMESE_VOWELS, BURMESE_DIGITS, BURMESE_DIACRITICS, BURMESE_VIRAMA, BURMESE_PUNCTUATION
JAVANESE_CONSONANTS, JAVANESE_VOWELS, JAVANESE_DIGITS, JAVANESE_DIACRITICS, JAVANESE_VIRAMA, JAVANESE_PUNCTUATION
SUDANESE_CONSONANTS, SUDANESE_VOWELS, SUDANESE_DIGITS, SUDANESE_DIACRITICS
HEBREW_CANTILLATIONS, HEBREW_CONSONANTS, HEBREW_SPECIALS, HEBREW_PUNCTUATION, HEBREW_VOWELS
ALBANIAN, AFRIKAANS, BASQUE, CATALAN, CROATIAN, CZECH, DANISH, DUTCH, ESTONIAN, FINNISH, GERMAN, HUNGARIAN, INDONESIAN, IRISH, ITALIAN, LUXEMBOURGISH, MALAY, NORWEGIAN, POLISH, PORTUGUESE, ROMANIAN, SERBIAN_LATIN, SLOVAK, SPANISH, SWEDISH, VIETNAMESE, ZULU
AZERBAIJANI, BOSNIAN, ESPERANTO, FRISIAN, GALICIAN, HAUSA, ICELANDIC, LATVIAN, LITHUANIAN, MALAGASY, MALTESE, MAORI, MONTENEGRIN, QUECHUA, SCOTTISH_GAELIC, SLOVENE, SOMALI, SWAHILI, TAGALOG, TURKISH, UZBEK_LATIN, WELSH, YORUBA
RUSSIAN, BELARUSIAN, UKRAINIAN, TATAR, TAJIK, KAZAKH, KYRGYZ, BULGARIAN, MACEDONIAN, MONGOLIAN, YAKUT, SERBIAN_CYRILLIC, UZBEK_CYRILLIC
GREEK, GREEK_EXTENDED, HEBREW, ARABIC, PERSIAN, URDU, PASHTO, KURDISH, UYGHUR, SINDHI
DEVANAGARI, HINDI, SANSKRIT, MARATHI, NEPALI, GUJARATI, BENGALI, TAMIL, TELUGU, KANNADA, SINHALA, MALAYALAM, PUNJABI, ODIA
KHMER, ARMENIAN, SUDANESE, THAI, LAO, BURMESE, JAVANESE, GEORGIAN, ETHIOPIC, JAPANESE, KOREAN, SIMPLIFIED_CHINESE
LATIN_EXTENDED, MULTI_LANG, MULTI_LANG_FULL
Constructor Details
Vocabulary
Vocabulary(String lookUpString)
Creates a new vocabulary based on a look-up string.
Parameters:
lookUpString - look-up string to be used as LUT for the vocabulary
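Because the look-up string is the whole LUT, it can also be assembled programmatically. The sketch below uses a hypothetical helper, LookUpStrings.range (not part of iText), that builds a string from a contiguous range of BMP characters and rejects surrogate code units, so each index corresponds to exactly one character.

```java
// Hypothetical helper (not part of iText): builds a look-up string from a
// contiguous range of BMP characters, one UTF-16 code unit per entry.
final class LookUpStrings {
    static String range(char first, char last) {
        if (first > last) {
            throw new IllegalArgumentException("first must not be greater than last");
        }
        StringBuilder sb = new StringBuilder(last - first + 1);
        // An int loop variable avoids char overflow when last is Character.MAX_VALUE.
        for (int c = first; c <= last; c++) {
            if (Character.isSurrogate((char) c)) {
                throw new IllegalArgumentException("range contains surrogate code units");
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lower = range('a', 'z');
        System.out.println(lower);           // abcdefghijklmnopqrstuvwxyz
        System.out.println(lower.charAt(3)); // d (index = code unit - first)
    }
}
```

The resulting string could then be passed to the documented constructor, for example new Vocabulary(LookUpStrings.range('a', 'z')).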
Method Details
concat
static Vocabulary concat(Vocabulary... vocabularies)
Creates a new vocabulary by concatenating multiple ones.
Parameters:
vocabularies - vocabularies to concatenate
Returns:
the new aggregated vocabulary
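A minimal sketch of what concatenation means for indices, assuming (not confirmed above) that concat joins the look-up strings in argument order. Under that assumption, the indices of each later vocabulary are shifted by the combined size of the ones before it. LutConcat and its methods are hypothetical and operate on plain strings.

```java
// Hypothetical sketch, assuming concatenation joins look-up strings in
// argument order; LutConcat is not part of iText.
final class LutConcat {
    static String concat(String... lookUps) {
        StringBuilder sb = new StringBuilder();
        for (String s : lookUps) {
            sb.append(s);
        }
        return sb.toString();
    }

    // Index at which the k-th input starts inside the concatenated string.
    static int offsetOf(int k, String... lookUps) {
        int offset = 0;
        for (int i = 0; i < k; i++) {
            offset += lookUps[i].length();
        }
        return offset;
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        String lower = "abcdefghijklmnopqrstuvwxyz";
        String all = concat(digits, lower);
        int start = offsetOf(1, digits, lower);  // 10
        System.out.println(all.length());        // 36
        System.out.println(all.charAt(start + 2)); // c
    }
}
```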
getLookUpString
String getLookUpString()
Returns the look-up string.
Returns:
the look-up string
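One use of the look-up string is to check whether a vocabulary can represent a given text at all. The helper below (Coverage.missing, hypothetical) takes a look-up string, such as a value obtained from getLookUpString(), and returns the characters of a text that it does not contain.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of iText): reports characters of `text`
// that are absent from a look-up string.
final class Coverage {
    // Returns missing characters in order of first appearance, without duplicates.
    static Set<Character> missing(String lookUp, String text) {
        Set<Character> result = new LinkedHashSet<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (lookUp.indexOf(c) < 0) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String lookUp = "0123456789abcdefghijklmnopqrstuvwxyz";
        System.out.println(missing(lookUp, "abc123")); // []
        System.out.println(missing(lookUp, "Hello!")); // [H, !]
    }
}
```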
size
public int size()
Returns the size of the vocabulary.
Returns:
the size of the vocabulary
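size() together with map(int) is enough to walk every entry of a vocabulary. A sketch of building a reverse table (character to index) this way, using a plain String in place of a Vocabulary. The assumption that size() equals the look-up string's length, and the ReverseLut class itself, are hypothetical; if a character repeats, the first index is kept.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a String stands in for a Vocabulary, length() for
// size() and charAt(i) for map(i).
final class ReverseLut {
    // Builds a character -> index table by visiting indices 0 .. size - 1.
    static Map<Character, Integer> build(String lookUp) {
        int size = lookUp.length();
        Map<Character, Integer> index = new HashMap<>(size * 2);
        for (int i = 0; i < size; i++) {
            index.putIfAbsent(lookUp.charAt(i), i); // first occurrence wins
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Character, Integer> idx = build("abcabc");
        System.out.println(idx.get('c')); // 2
        System.out.println(idx.size());   // 3
    }
}
```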
map
public char map(int index)
Returns the character mapped to the specified index in the look-up string.
Parameters:
index - index to map
Returns:
the mapped character
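To show where an index-to-character mapping fits in text recognition, here is a sketch of greedy (best-path) CTC decoding: consecutive repeated indices are collapsed, blanks are dropped, and the remaining indices are mapped to characters. This is an assumed decoding scheme, not one confirmed for this class; in particular, the blank index being equal to the vocabulary size is an assumption, and GreedyCtcSketch is hypothetical.

```java
// Hypothetical sketch of greedy CTC decoding over a look-up string.
// ASSUMPTION: the blank label has index lookUp.length() (i.e. size()).
final class GreedyCtcSketch {
    static String decode(int[] bestPath, String lookUp) {
        int blank = lookUp.length();
        StringBuilder out = new StringBuilder();
        int previous = -1;
        for (int index : bestPath) {
            // Emit only when the label changes and is not the blank.
            if (index != previous && index != blank) {
                out.append(lookUp.charAt(index)); // plays the role of map(index)
            }
            previous = index;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String lookUp = "abcdefghijklmnopqrstuvwxyz";
        int blank = lookUp.length();
        // h h <blank> e l l <blank> l o
        int[] path = {7, 7, blank, 4, 11, 11, blank, 11, 14};
        System.out.println(decode(path, lookUp)); // hello
    }
}
```

The blank between the two runs of l is what allows a doubled letter to survive the collapsing step.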
hashCode
public int hashCode()
equals
toString