Class Vocabulary
java.lang.Object
com.itextpdf.pdfocr.onnxtr.recognition.Vocabulary
A string-based look-up table (LUT) for mapping text recognition model results to characters.
This class assumes that each character is represented by a single UTF-16 code unit, so the string itself can be used as a LUT. If this is not the case, results will be unpredictable.
It effectively implements IOutputLabelMapper for Character, but because that would involve unnecessary boxing, it is a standalone class instead.
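The single-code-unit assumption can be illustrated without the library. The sketch below uses a hypothetical helper class, LutCheck (not part of the documented API), to show how a character outside the Basic Multilingual Plane occupies two char positions and shifts every later index in a look-up string.

```java
// Hypothetical helper for illustration only; not part of the documented API.
final class LutCheck {
    /** Returns true if no char in the string is a surrogate, i.e. one char per character. */
    static boolean isSingleUnitLut(String lookUp) {
        for (int i = 0; i < lookUp.length(); i++) {
            if (Character.isSurrogate(lookUp.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String safe = "abc";
        // U+1D11E (musical G clef) needs two UTF-16 code units.
        String unsafe = "a\uD834\uDD1Eb";

        System.out.println(isSingleUnitLut(safe));   // true
        System.out.println(isSingleUnitLut(unsafe)); // false
        // Three characters, but four char positions: index 2 holds a lone
        // low surrogate, and 'b' has moved to index 3.
        System.out.println(unsafe.length());         // 4
        System.out.println(unsafe.charAt(3));        // b
    }
}
```

A check like this could be run on a custom look-up string before passing it to the constructor.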
-
Field Summary
Fields
All fields are declared as static final Vocabulary; their names are listed under Field Details.
-
Constructor Summary
Constructors
Constructor    Description
Vocabulary(String lookUpString)    Creates a new vocabulary based on a look-up string.
-
Method Summary
Modifier and Type    Method    Description
static Vocabulary    concat(Vocabulary... vocabularies)    Creates a new vocabulary by concatenating multiple ones.
boolean    equals(Object)
String    getLookUpString()    Returns the look-up string.
int    hashCode()
char    map(int index)    Returns the character that is mapped to the specified index in the look-up string.
int    size()    Returns the size of the vocabulary.
String    toString()
-
Field Details
-
ASCII_LOWERCASE
-
ASCII_UPPERCASE
-
ASCII_LETTERS
-
DIGITS
-
PUNCTUATION
-
CURRENCY
-
LATIN
-
ENGLISH
-
LEGACY_FRENCH
-
FRENCH
-
HINDI_DIGITS
-
GENERIC_CYRILLIC_LETTERS
-
RUSSIAN_CYRILLIC_LETTERS
-
RUSSIAN_SIGNS
-
ANCIENT_GREEK
-
ARABIC_DIACRITICS
-
ARABIC_DIGITS
-
ARABIC_LETTERS
-
ARABIC_PUNCTUATION
-
PERSIAN_LETTERS
-
BENGALI_CONSONANTS
-
BENGALI_VOWELS
-
BENGALI_DIGITS
-
BENGALI_MATRAS
-
BENGALI_VIRAMA
-
BENGALI_PUNCTUATION
-
BENGALI_SIGNS
-
GUJARATI_CONSONANTS
-
GUJARATI_VOWELS
-
GUJARATI_DIGITS
-
GUJARATI_MATRAS
-
GUJARATI_VIRAMA
-
GUJARATI_PUNCTUATION
-
GUJARATI_SIGNS
-
DEVANAGARI_CONSONANTS
-
DEVANAGARI_VOWELS
-
DEVANAGARI_DIGITS
-
DEVANAGARI_MATRAS
-
DEVANAGARI_VIRAMA
-
DEVANAGARI_PUNCTUATION
-
DEVANAGARI_SIGNS
-
PUNJABI_CONSONANTS
-
PUNJABI_VOWELS
-
PUNJABI_DIGITS
-
PUNJABI_MATRAS
-
PUNJABI_VIRAMA
-
PUNJABI_PUNCTUATION
-
PUNJABI_SIGNS
-
TAMIL_CONSONANTS
-
TAMIL_VOWELS
-
TAMIL_DIGITS
-
TAMIL_MATRAS
-
TAMIL_VIRAMA
-
TAMIL_PUNCTUATION
-
TAMIL_SIGNS
-
TAMIL_FRACTIONS
-
TELUGU_CONSONANTS
-
TELUGU_DIGITS
-
TELUGU_VOWELS
-
TELUGU_MATRAS
-
TELUGU_VIRAMA
-
TELUGU_PUNCTUATION
-
TELUGU_SIGNS
-
KANNADA_CONSONANTS
-
KANNADA_VOWELS
-
KANNADA_DIGITS
-
KANNADA_MATRAS
-
KANNADA_VIRAMA
-
KANNADA_PUNCTUATION
-
KANNADA_SIGNS
-
SINHALA_CONSONANTS
-
SINHALA_VOWELS
-
SINHALA_DIGITS
-
SINHALA_MATRAS
-
SINHALA_VIRAMA
-
SINHALA_PUNCTUATION
-
SINHALA_SIGNS
-
MALAYALAM_CONSONANTS
-
MALAYALAM_VOWELS
-
MALAYALAM_DIGITS
-
MALAYALAM_MATRAS
-
MALAYALAM_VIRAMA
-
MALAYALAM_SIGNS
-
ODIA_CONSONANTS
-
ODIA_VOWELS
-
ODIA_DIGITS
-
ODIA_MATRAS
-
ODIA_VIRAMA
-
ODIA_PUNCTUATION
-
ODIA_SIGNS
-
KHMER_CONSONANTS
-
KHMER_VOWELS
-
KHMER_DIGITS
-
KHMER_MATRAS
-
KHMER_DIACRITICS
-
KHMER_VIRAMA
-
KHMER_PUNCTUATION
-
BURMESE_CONSONANTS
-
BURMESE_VOWELS
-
BURMESE_DIGITS
-
BURMESE_DIACRITICS
-
BURMESE_VIRAMA
-
BURMESE_PUNCTUATION
-
JAVANESE_CONSONANTS
-
JAVANESE_VOWELS
-
JAVANESE_DIGITS
-
JAVANESE_DIACRITICS
-
JAVANESE_VIRAMA
-
JAVANESE_PUNCTUATION
-
SUDANESE_CONSONANTS
-
SUDANESE_VOWELS
-
SUDANESE_DIGITS
-
SUDANESE_DIACRITICS
-
HEBREW_CANTILLATIONS
-
HEBREW_CONSONANTS
-
HEBREW_SPECIALS
-
HEBREW_PUNCTUATION
-
HEBREW_VOWELS
-
ALBANIAN
-
AFRIKAANS
-
BASQUE
-
CATALAN
-
CROATIAN
-
CZECH
-
DANISH
-
DUTCH
-
ESTONIAN
-
FINNISH
-
GERMAN
-
HUNGARIAN
-
INDONESIAN
-
IRISH
-
ITALIAN
-
LUXEMBOURGISH
-
MALAY
-
NORWEGIAN
-
POLISH
-
PORTUGUESE
-
ROMANIAN
-
SERBIAN_LATIN
-
SLOVAK
-
SPANISH
-
SWEDISH
-
VIETNAMESE
-
ZULU
-
AZERBAIJANI
-
BOSNIAN
-
ESPERANTO
-
FRISIAN
-
GALICIAN
-
HAUSA
-
ICELANDIC
-
LATVIAN
-
LITHUANIAN
-
MALAGASY
-
MALTESE
-
MAORI
-
MONTENEGRIN
-
QUECHUA
-
SCOTTISH_GAELIC
-
SLOVENE
-
SOMALI
-
SWAHILI
-
TAGALOG
-
TURKISH
-
UZBEK_LATIN
-
WELSH
-
YORUBA
-
RUSSIAN
-
BELARUSIAN
-
UKRAINIAN
-
TATAR
-
TAJIK
-
KAZAKH
-
KYRGYZ
-
BULGARIAN
-
MACEDONIAN
-
MONGOLIAN
-
YAKUT
-
SERBIAN_CYRILLIC
-
UZBEK_CYRILLIC
-
GREEK
-
GREEK_EXTENDED
-
HEBREW
-
ARABIC
-
PERSIAN
-
URDU
-
PASHTO
-
KURDISH
-
UYGHUR
-
SINDHI
-
DEVANAGARI
-
HINDI
-
SANSKRIT
-
MARATHI
-
NEPALI
-
GUJARATI
-
BENGALI
-
TAMIL
-
TELUGU
-
KANNADA
-
SINHALA
-
MALAYALAM
-
PUNJABI
-
ODIA
-
KHMER
-
ARMENIAN
-
SUDANESE
-
THAI
-
LAO
-
BURMESE
-
JAVANESE
-
GEORGIAN
-
ETHIOPIC
-
JAPANESE
-
KOREAN
-
SIMPLIFIED_CHINESE
-
LATIN_EXTENDED
-
MULTI_LANG
-
MULTI_LANG_FULL
-
-
Constructor Details
-
Vocabulary
Creates a new vocabulary based on a look-up string.
Parameters:
lookUpString - look-up string to be used as LUT for the vocabulary
-
-
Method Details
-
concat
Creates a new vocabulary by concatenating multiple ones.
Parameters:
vocabularies - vocabularies to concatenate
Returns:
the new aggregated vocabulary
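As a rough illustration of what concatenation means for index mapping, the sketch below joins plain look-up strings in argument order. It assumes that concatenation simply appends the look-up strings; whether the real method also removes duplicate characters is not stated on this page. ConcatSketch and joinLookUps are hypothetical names, not library API.

```java
// Hypothetical stand-in; assumes concatenation appends look-up strings in order.
final class ConcatSketch {
    static String joinLookUps(String... lookUps) {
        StringBuilder sb = new StringBuilder();
        for (String lookUp : lookUps) {
            sb.append(lookUp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        String lower = "abc";
        String joined = joinLookUps(digits, lower);
        // Indices of the second string are offset by the length of the first.
        System.out.println(joined.length());                // 13
        System.out.println(joined.charAt(digits.length())); // a
    }
}
```

Under this assumption, the argument order determines which index each character receives, so it would need to match the order the recognition model was trained with.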
-
getLookUpString
Returns the look-up string.
Returns:
the look-up string
-
size
public int size()
Returns the size of the vocabulary.
Returns:
the size of the vocabulary
-
map
public char map(int index)
Returns the character that is mapped to the specified index in the look-up string.
Parameters:
index - index to map
Returns:
mapped character
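To show how map and size might be used together when turning model output into text, the sketch below re-creates the two methods on a hypothetical stand-in class, MiniVocabulary, backed by String.charAt and String.length. The decode helper, and its choice to skip indices outside the vocabulary (for example an extra blank class that a model may emit), are assumptions made for this example and not documented behavior.

```java
// Hypothetical stand-in for illustration; not the library class.
final class MiniVocabulary {
    private final String lookUpString;

    MiniVocabulary(String lookUpString) {
        this.lookUpString = lookUpString;
    }

    /** Number of char positions in the look-up string. */
    int size() {
        return lookUpString.length();
    }

    /** Character stored at the given index of the look-up string. */
    char map(int index) {
        return lookUpString.charAt(index);
    }

    /** Maps each index to a char, skipping indices outside [0, size()). */
    static String decode(MiniVocabulary vocabulary, int[] indices) {
        StringBuilder sb = new StringBuilder();
        for (int index : indices) {
            if (index >= 0 && index < vocabulary.size()) {
                sb.append(vocabulary.map(index));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MiniVocabulary vocabulary = new MiniVocabulary("helo");
        // Index 4 equals size() and is skipped by decode.
        System.out.println(decode(vocabulary, new int[] {0, 1, 2, 4, 2, 3})); // hello
    }
}
```

Returning a primitive char from map, rather than a boxed Character, is in line with the class description's remark about avoiding boxing.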
-
hashCode
public int hashCode()
-
equals
-
toString
-