# biome.text.modules.heads.classification.record_pair_classification Module

# RecordPairClassification Class

class RecordPairClassification (
    backbone: ModelBackbone,
    labels: List[str],
    field_encoder: Seq2VecEncoderConfiguration,
    record_encoder: Seq2SeqEncoderConfiguration,
    matcher_forward: BiMpmMatchingConfiguration,
    aggregator: Seq2VecEncoderConfiguration,
    classifier_feedforward: FeedForwardConfiguration,
    matcher_backward: BiMpmMatchingConfiguration = None,
    dropout: float = 0.1,
    initializer: InitializerApplicator = InitializerApplicator(),
)

Classifies the relation between a pair of records using a matching layer.

The input for models using this TaskHead are two records with one or more data fields each, and a label describing their relationship. If you would like a meaningful explanation of the model's prediction, both records must consist of the same number of data fields and hold them in the same order.
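For instance, a record pair could look like the following sketch (the field names `name` and `address` are purely illustrative, not fixed by the API):

```python
# Illustrative only: the field names are arbitrary; what matters is that
# both records share the same fields in the same order if you want
# meaningful prediction explanations.
record1 = {"name": "John Doe", "address": "1 Main Street"}
record2 = {"name": "J. Doe", "address": "1 Main St."}
label = "duplicate"  # the relation between the two records

# Same number of fields, same order -> explanations are meaningful
assert list(record1.keys()) == list(record2.keys())
```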

The architecture is loosely based on the AllenNLP implementation of the BiMPM model described in [Bilateral Multi-Perspective Matching for Natural Language Sentences](https://arxiv.org/abs/1702.03814) by Zhiguo Wang et al., 2017, and was adapted to deal with record pairs.


backbone : ModelBackbone
    Takes care of the embedding and optionally of the language encoding
labels : List[str]
    List of labels
field_encoder : Seq2VecEncoder
    Encodes a data field, contextualized within the field
record_encoder : Seq2SeqEncoder
    Encodes data fields, contextualized within the record
matcher_forward : BiMPMMatching
    BiMPM matching for the forward output of the record encoder layer
matcher_backward : BiMPMMatching, optional
    BiMPM matching for the backward output of the record encoder layer
aggregator : Seq2VecEncoder
    Aggregator of all BiMPM matching vectors
classifier_feedforward : FeedForward
    Fully connected layers for classification. A linear output layer with the number of labels is added automatically.
dropout : float, optional (default=0.1)
    Dropout percentage to use.
initializer : InitializerApplicator, optional (default=`InitializerApplicator()`)
    If provided, will be used to initialize the model parameters.
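As a rough orientation, the constructor parameters above can be mirrored in a plain configuration dictionary. This is a hypothetical sketch only: the key names follow the signature, but the exact nesting and encoder options expected by biome.text's pipeline configuration are not reproduced here, and all dimensions are made up for illustration.

```python
# Hypothetical sketch mirroring the constructor parameters above;
# the exact configuration schema of biome.text is not reproduced here,
# and all dimensions are illustrative.
head_config = {
    "type": "RecordPairClassification",
    "labels": ["duplicate", "not_duplicate"],
    "field_encoder": {"type": "boe", "embedding_dim": 64},         # Seq2Vec
    "record_encoder": {"type": "lstm", "input_size": 64,
                       "hidden_size": 32, "bidirectional": True},  # Seq2Seq
    "matcher_forward": {"hidden_dim": 32, "num_perspectives": 10},
    "matcher_backward": {"hidden_dim": 32, "num_perspectives": 10},
    "aggregator": {"type": "lstm", "input_size": 44, "hidden_size": 16},
    "classifier_feedforward": {"input_dim": 32, "num_layers": 1,
                               "hidden_dims": [16], "activations": ["relu"]},
    "dropout": 0.1,
}
```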


# Ancestors

  • ClassificationHead
  • TaskHead
  • torch.nn.modules.module.Module
  • allennlp.common.registrable.Registrable
  • allennlp.common.from_params.FromParams

# featurize Method

def featurize (
  record1: Dict[str, Any],
  record2: Dict[str, Any],
  label: Optional[str] = None,
)  -> Optional[allennlp.data.instance.Instance]

Tokenizes, indexes and embeds the two records and optionally adds the label


record1 : Dict[str, Any]
First record
record2 : Dict[str, Any]
Second record
label : Optional[str]
Classification label


AllenNLP instance containing the two records plus optionally a label

# forward Method

def forward (
  record1: Dict[str, Dict[str, torch.Tensor]],
  record2: Dict[str, Dict[str, torch.Tensor]],
  label: torch.LongTensor = None,
)  -> TaskOutput


record1 : Dict[str, Dict[str, torch.Tensor]]
    Tokens of the first record. The dictionary is the output of a ListField.as_array(). It gives names to the tensors created by the TokenIndexers. In its most basic form, using a SingleIdTokenIndexer, the dictionary is composed of: {"tokens": {"tokens": Tensor(batch_size, num_fields, num_tokens)}}. The dictionary is designed to be passed on directly to a TextFieldEmbedder that has a TokenEmbedder for each key in the dictionary (unless you set allow_unmatched_keys in the TextFieldEmbedder to True) and knows how to combine different word/character representations into a single vector per token in your input.
record2 : Dict[str, Dict[str, torch.Tensor]]
    Tokens of the second record.
label : torch.LongTensor, optional (default = None)
    A torch tensor representing the sequence of integer gold class labels of shape (batch_size, num_classes).


An output dictionary consisting of:

logits : torch.FloatTensor
class_probabilities : torch.FloatTensor
loss : torch.FloatTensor, optional
    A scalar loss to be optimised.
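To make the expected input format concrete, here is a minimal sketch of the nested dictionary for a SingleIdTokenIndexer; plain nested lists stand in for the actual torch tensors, and the dimensions are arbitrary:

```python
batch_size, num_fields, num_tokens = 2, 3, 5

# Stand-in for Tensor(batch_size, num_fields, num_tokens): token ids, 0 = padding
token_ids = [[[0] * num_tokens for _ in range(num_fields)]
             for _ in range(batch_size)]

# The nested dict shape produced by ListField.as_array() with a SingleIdTokenIndexer
record1 = {"tokens": {"tokens": token_ids}}

assert len(record1["tokens"]["tokens"]) == batch_size          # batch dimension
assert len(record1["tokens"]["tokens"][0]) == num_fields       # fields per record
assert len(record1["tokens"]["tokens"][0][0]) == num_tokens    # tokens per field
```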

# explain_prediction Method

def explain_prediction (
  prediction: Dict[str, Any],
  instance: allennlp.data.instance.Instance,
  n_steps: int,
)  -> Dict[str, Any]

Calculates attributions for each data field in the record by integrating the gradients.

IMPORTANT: The calculated attributions only make sense for a duplicate/not_duplicate binary classification task of the two records.

prediction : Dict[str, Any]
    The prediction to be explained
instance : allennlp.data.instance.Instance
    The instance underlying the prediction
n_steps : int
    The number of steps used to approximate the integral of the gradients


The prediction dictionary with a newly added "explain" key
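As background on the technique, integrated gradients attribute a model's output to its inputs by integrating the gradient along a straight path from a baseline to the actual input, approximated with `n_steps` Riemann-sum steps. The following is a self-contained sketch on a toy function F(x) = Σ xᵢ² with its analytic gradient 2x and a zero baseline; it is not biome.text's implementation, only an illustration of the method:

```python
def integrated_gradients(x, baseline, grad_fn, n_steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    attributions = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        for i in range(len(x)):
            attributions[i] += (x[i] - baseline[i]) * grads[i] / n_steps
    return attributions

# Toy model F(x) = sum(x_i^2), with analytic gradient dF/dx_i = 2 * x_i
f = lambda x: sum(xi ** 2 for xi in x)
grad_f = lambda x: [2.0 * xi for xi in x]

x, baseline = [1.0, 2.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline, grad_f)

# Completeness axiom: attributions sum to F(x) - F(baseline)
assert abs(sum(attr) - (f(x) - f(baseline))) < 1e-6
```

The completeness check at the end is what makes such attributions interpretable: each input's attribution is its share of the change in the model output relative to the baseline.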


# RecordPairClassificationConfiguration Class

class RecordPairClassificationConfiguration (*args, **kwds)

Config for record pair classification head component

