# NLP Tasks

In biome.text, NLP tasks are defined via TaskHead classes.

This section gives a summary of the library's main heads and tasks.
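As an illustrative sketch of how a task is selected (the configuration keys shown here are assumptions for illustration; see the Pipeline API for the authoritative schema), a pipeline configuration names the head for its task:

```python
# Hypothetical pipeline configuration sketch: the task is chosen by the
# "type" of the head section. Key names are illustrative assumptions,
# not the library's verified schema.
pipeline_config = {
    "name": "business-name-classifier",
    "features": {"word": {"embedding_dim": 64}},
    "head": {
        "type": "TextClassification",
        "labels": ["retail", "services"],
    },
}

# The head type determines which task the trained pipeline performs.
task = pipeline_config["head"]["type"]
```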

## TextClassification

Tutorials: Training a short text classifier of German business names

NLP tasks: text classification, sentiment analysis, entity typing, relation classification.

Input: text: a single field or a concatenation of input fields.

Output: label: by default, a probability distribution over labels; if multilabel is enabled, independent per-label probabilities for multi-label classification problems.

Main parameters:

- pooler: a Seq2VecEncoderConfiguration that pools a sequence of encoded word/char vectors into a single vector representing the input text.
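A minimal head configuration might look as follows. This is a sketch under assumptions: the key names and the "boe" (bag-of-embeddings) pooler type are illustrative, not a verified schema.

```python
# Hypothetical TextClassification head configuration sketch.
# The pooler picks a Seq2VecEncoder that collapses the token-level
# encodings into one vector for the whole input text.
head = {
    "type": "TextClassification",
    "labels": ["positive", "negative"],
    "multilabel": False,        # set True for multi-label problems
    "pooler": {"type": "boe"},  # sequence of vectors -> single vector
}
```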

See TextClassification API for more details.

## RecordClassification

NLP tasks: text classification, sentiment analysis, entity typing, relation classification, and classification of semi-structured data (e.g., product or customer records).

Input: document: a list of fields.

Output: label: by default, a probability distribution over labels; if multilabel is enabled, independent per-label probabilities for multi-label classification problems.

Main parameters:

- record_keys: field keys to be used as input features to the model, e.g., name, first_name, body, subject, etc.

- tokens_pooler: a Seq2VecEncoderConfiguration that pools the sequence of encoded word/char vectors of each field into a single vector representing the field.

- fields_encoder: a Seq2SeqEncoderConfiguration that encodes the sequence of field vectors.

- fields_pooler: a Seq2VecEncoderConfiguration that pools the sequence of encoded field vectors into a single vector representing the whole document/record.
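The three-stage encoding above (pool tokens per field, encode the field sequence, pool it into one record vector) can be sketched as a configuration. Key names and encoder types here are illustrative assumptions:

```python
# Hypothetical RecordClassification head configuration sketch.
# Each record field is pooled into a field vector (tokens_pooler),
# the sequence of field vectors is contextualized (fields_encoder),
# and the result is pooled into one record vector (fields_pooler).
head = {
    "type": "RecordClassification",
    "labels": ["product", "customer"],
    "record_keys": ["name", "subject", "body"],
    "tokens_pooler": {"type": "boe"},
    "fields_encoder": {"type": "lstm", "hidden_size": 32},
    "fields_pooler": {"type": "boe"},
}
```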

See RecordClassification API for more details.

## TokenClassification

Tutorials: Training a sequence tagger for Slot Filling

NLP tasks: NER, slot filling, part-of-speech tagging.

Input: text: pretokenized text as a list of tokens.

Output: labels: one label for each token according to the label_encoding scheme defined in the head (e.g., BIO).

Main parameters:

- feedforward: a feed-forward layer applied after token encoding.

See TokenClassification API for more details.

## LanguageModelling

NLP tasks: pre-training, word-level next-token language modelling.

Input: text: a single field or a concatenation of input fields.

Output: contextualized word vectors.

Main parameters:

- dropout: dropout to be applied after token encoding.
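A minimal head configuration could look like the sketch below; the key names are illustrative assumptions rather than the library's verified schema:

```python
# Hypothetical LanguageModelling head configuration sketch.
# dropout is applied to the token encodings before the next-token
# prediction layer, as a regularizer during pre-training.
head = {
    "type": "LanguageModelling",
    "dropout": 0.1,
}
```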

See LanguageModelling API for more details.
