# biome.text.featurizer Module

# InputFeaturizer Class

class InputFeaturizer (tokenizer: Tokenizer, indexer: Dict[str, allennlp.data.token_indexers.token_indexer.TokenIndexer])

Transforms input text (words and/or characters) into indexes and embedding vectors.

This class defines two input features, `words` and `chars`, for embeddings at the word and character level, respectively.

You can provide additional features by manually specifying indexer and embedder configurations for each input feature.

# Parameters

tokenizer : Tokenizer
Tokenizes the input depending on its type (str, List[str], Dict[str, Any])
indexer : Dict[str, TokenIndexer]
Features dictionary for token indexing. Built from FeaturesConfiguration
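The idea behind the two default features can be illustrated with a minimal, self-contained sketch. This is a hypothetical stand-in, not the actual implementation: the real `InputFeaturizer` delegates tokenization to the configured `Tokenizer` and indexing to AllenNLP `TokenIndexer`s, whereas here a plain `str.split` and dictionary lookups play those roles.

```python
def featurize(text, word_vocab, char_vocab):
    """Map each token to a word index ("words" feature) and a list of
    character indexes ("chars" feature). Unknown entries map to 0."""
    tokens = text.split()  # stand-in for the configured Tokenizer
    return {
        "words": [word_vocab.get(tok, 0) for tok in tokens],
        "chars": [[char_vocab.get(c, 0) for c in tok] for tok in tokens],
    }

# Toy vocabularies for illustration only
word_vocab = {"hello": 1, "world": 2}
char_vocab = {c: i + 1 for i, c in enumerate("helowrd")}

features = featurize("hello world", word_vocab, char_vocab)
# features["words"] -> [1, 2]
# features["chars"] -> one list of character indexes per token
```

In the real class, the resulting indexes are what the word- and character-level embedders consume downstream.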

# Instance variables

var has_word_features : bool

Whether word features are configured as part of the featurization
