# biome.text.modules.heads.classification.record_pair_classification Module

# RecordPairClassification Class


class RecordPairClassification (
    backbone: ModelBackbone,
    labels: List[str],
    field_encoder: Seq2VecEncoderConfiguration,
    record_encoder: Seq2SeqEncoderConfiguration,
    matcher_forward: BiMpmMatchingConfiguration,
    aggregator: Seq2VecEncoderConfiguration,
    classifier_feedforward: FeedForwardConfiguration,
    matcher_backward: BiMpmMatchingConfiguration = None,
    dropout: float = 0.1,
    initializer: allennlp.nn.initializers.InitializerApplicator = <allennlp.nn.initializers.InitializerApplicator object>,
)

Classifies the relation between a pair of records using a matching layer.

Models using this TaskHead take two records as input, each with one or more data fields, plus a label describing their relationship. For a meaningful explanation of the model's prediction, both records must have the same number of data fields and hold them in the same order.
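As a rough illustration of the input described above, the sketch below builds two records as plain dictionaries and checks whether they hold the same fields in the same order. The field names, values, label, and the helper `have_aligned_fields` are hypothetical and not part of biome.text. Comparing field names is an assumption used here as a stand-in for "same number of fields, same order".

```python
from typing import Any, Dict


def have_aligned_fields(record1: Dict[str, Any], record2: Dict[str, Any]) -> bool:
    """Return True if both records hold the same data fields in the same order.

    Hypothetical helper: it is not provided by biome.text.
    """
    # Python dicts preserve insertion order, so comparing the key lists
    # checks both the number of fields and their order.
    return list(record1.keys()) == list(record2.keys())


# Made-up example records and label.
record1 = {"name": "Jane Doe", "city": "Madrid"}
record2 = {"name": "J. Doe", "city": "Madrid"}
label = "duplicate"

aligned = have_aligned_fields(record1, record2)
# Same fields, but in a different order: explanations would not line up.
misaligned = have_aligned_fields(record1, {"city": "Madrid", "name": "J. Doe"})
```

Running such a check before asking for an explanation is one way to catch record pairs whose field attributions could not be compared position by position.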

The architecture is loosely based on the AllenNLP implementation of the BiMPM model described in "Bilateral Multi-Perspective Matching for Natural Language Sentences" (https://arxiv.org/abs/1702.03814) by Zhiguo Wang et al., 2017, and was adapted to deal with record pairs.

Parameters

backbone : ModelBackbone
Takes care of the embedding and, optionally, the language encoding
labels : List[str]
List of labels
field_encoder : Seq2VecEncoder
Encodes a data field, contextualized within the field
record_encoder : Seq2SeqEncoder
Encodes data fields, contextualized within the record
matcher_forward : BiMPMMatching
BiMPM matching for the forward output of the record encoder layer
matcher_backward : BiMPMMatching, optional
BiMPM matching for the backward output of the record encoder layer
aggregator : Seq2VecEncoder
Aggregator of all BiMPM matching vectors
classifier_feedforward : FeedForward
Fully connected layers for classification. A linear output layer with one output per label is added automatically at the end.
dropout : float, optional (default=0.1)
Dropout probability to use.
initializer : InitializerApplicator, optional (default=``InitializerApplicator()``)
If provided, will be used to initialize the model parameters.
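The parameter list above can be mirrored in a plain dictionary to check which arguments are required and which are optional. This is a minimal sketch: the key names come from the constructor signature, while the helper `check_head_config`, the empty placeholder values, and the choice to leave out `backbone` (on the assumption that it is supplied elsewhere) are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Constructor arguments without a default value, taken from the signature above.
# "backbone" is left out on the assumption that it is supplied elsewhere.
REQUIRED_KEYS = {
    "labels",
    "field_encoder",
    "record_encoder",
    "matcher_forward",
    "aggregator",
    "classifier_feedforward",
}
# Constructor arguments that have a default value.
OPTIONAL_KEYS = {"matcher_backward", "dropout", "initializer"}


def check_head_config(config: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (missing required keys, unknown keys) for a head configuration."""
    keys = set(config)
    missing = sorted(REQUIRED_KEYS - keys)
    unknown = sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
    return missing, unknown


# The nested settings are placeholders; this document does not specify them.
head_config = {
    "labels": ["duplicate", "not_duplicate"],
    "field_encoder": {},
    "record_encoder": {},
    "matcher_forward": {},
    "aggregator": {},
    "classifier_feedforward": {},
    "dropout": 0.1,
}

missing, unknown = check_head_config(head_config)
incomplete_missing, _ = check_head_config({"labels": ["duplicate"]})
```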


# Ancestors

  • ClassificationHead
  • TaskHead
  • torch.nn.modules.module.Module
  • allennlp.common.registrable.Registrable
  • allennlp.common.from_params.FromParams

# featurize Method


def featurize (
  self,
  record1: Dict[str, Any],
  record2: Dict[str, Any],
  label: Union[str, NoneType] = None,
)  -> Union[allennlp.data.instance.Instance, NoneType]

Tokenizes, indexes and embeds the two records, and optionally adds the label.

Parameters

record1 : Dict[str, Any]
First record
record2 : Dict[str, Any]
Second record
label : Optional[str]
Classification label

Returns

instance
AllenNLP instance containing the two records plus optionally a label

# forward Method


def forward (
  self,
  record1: Dict[str, Dict[str, torch.Tensor]],
  record2: Dict[str, Dict[str, torch.Tensor]],
  label: torch.LongTensor = None,
)  -> TaskOutput

Parameters

record1
Tokens of the first record. The dictionary is the output of a ListField.as_array() and gives names to the tensors created by the TokenIndexers. In its most basic form, using a SingleIdTokenIndexer, the dictionary is: {"tokens": {"tokens": Tensor(batch_size, num_fields, num_tokens)}}. The dictionary is designed to be passed directly to a TextFieldEmbedder, which has a TokenEmbedder for each key in the dictionary (unless you set allow_unmatched_keys in the TextFieldEmbedder to False) and knows how to combine different word/character representations into a single vector per token in your input.
record2
Tokens of the second record, in the same format as record1.
label : torch.LongTensor, optional (default = None)
A torch tensor holding the integer gold class labels, of shape (batch_size, num_classes).
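To make the (batch_size, num_fields, num_tokens) layout of `record1` and `record2` concrete, the sketch below pads a small batch of tokenized records into that shape. It uses nested Python lists in place of torch tensors, and the vocabulary, the padding id `0`, and the helper `index_record_batch` are assumptions made for illustration only.

```python
from typing import Dict, List


def index_record_batch(
    batch: List[List[List[str]]], vocab: Dict[str, int], pad_id: int = 0
) -> List[List[List[int]]]:
    """Map tokens to ids and pad to shape (batch_size, num_fields, num_tokens)."""
    num_fields = max(len(record) for record in batch)
    num_tokens = max(len(field) for record in batch for field in record)
    padded = []
    for record in batch:
        fields = [[vocab[token] for token in field] for field in record]
        # Pad every field to the longest field in the batch.
        fields = [ids + [pad_id] * (num_tokens - len(ids)) for ids in fields]
        # Pad the record with all-padding fields up to num_fields.
        fields += [[pad_id] * num_tokens for _ in range(num_fields - len(fields))]
        padded.append(fields)
    return padded


# Hypothetical vocabulary and batch: two records with two fields each.
vocab = {"jane": 1, "doe": 2, "madrid": 3, "j.": 4}
batch = [
    [["jane", "doe"], ["madrid"]],
    [["j.", "doe"], ["madrid"]],
]

ids = index_record_batch(batch, vocab)
# Same nesting as the basic SingleIdTokenIndexer form described above.
record1 = {"tokens": {"tokens": ids}}
shape = (len(ids), len(ids[0]), len(ids[0][0]))
```

Here `shape` corresponds to (batch_size, num_fields, num_tokens), and the shorter "madrid" field is filled with the padding id.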

Returns

An output dictionary consisting of:

logits : torch.FloatTensor
class_probabilities : torch.FloatTensor
loss : torch.FloatTensor, optional
A scalar loss to be optimised.
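The relationship between the three outputs can be sketched in plain Python. This assumes that `class_probabilities` is a softmax over `logits` and that `loss` is a cross-entropy against the gold label, which is the usual setup for classification heads but is not stated in this document. The logit values are made up.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities that sum to 1."""
    # Subtract the maximum for numerical stability.
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(probabilities: List[float], label_index: int) -> float:
    """Negative log-probability assigned to the gold label."""
    return -math.log(probabilities[label_index])


# Made-up logits for a single record pair and two labels.
logits = [2.0, 0.5]
class_probabilities = softmax(logits)
# The loss is only available when a gold label is given (index 0 here).
loss = cross_entropy(class_probabilities, label_index=0)
```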

# explain_prediction Method


def explain_prediction (
  self,
  prediction: Dict[str, ],
  instance: allennlp.data.instance.Instance,
  n_steps: int,
)  -> Dict[str, Any]

Calculates attributions for each data field in the record by integrating the gradients.

IMPORTANT: The calculated attributions only make sense for a duplicate/not_duplicate binary classification task of the two records.

Parameters

prediction
instance
n_steps

Returns

prediction_dict
The prediction dictionary with a newly added "explain" key
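The idea of integrating gradients over `n_steps` can be sketched on a toy function. This is not the head's implementation: the toy model, its analytic gradient, the zero baseline, and the midpoint Riemann sum are assumptions chosen so that the example runs without any dependencies.

```python
from typing import Callable, List


def integrated_gradients(
    grad_fn: Callable[[List[float]], List[float]],
    inputs: List[float],
    baseline: List[float],
    n_steps: int,
) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(inputs)
    for step in range(n_steps):
        # Interpolate between the baseline and the input.
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (x - b) for x, b in zip(inputs, baseline)]
        totals = [t + g for t, g in zip(totals, grad_fn(point))]
    # Average gradient along the path, scaled by the input difference.
    return [(x - b) * t / n_steps for x, b, t in zip(inputs, baseline, totals)]


# Toy stand-in for a model: f(x) = sum(w_i * x_i ** 2).
weights = [0.5, -1.0, 2.0]


def toy_model(x: List[float]) -> float:
    return sum(w * v * v for w, v in zip(weights, x))


def toy_gradient(x: List[float]) -> List[float]:
    return [2.0 * w * v for w, v in zip(weights, x)]


inputs = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_gradient, inputs, baseline, n_steps=50)
```

For this toy function the attributions sum to the difference between the model output at the input and at the baseline, which is a useful sanity check when choosing `n_steps`.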

# RecordPairClassificationConfiguration Class


class RecordPairClassificationConfiguration (*args, **kwds)

Configuration for the record pair classification head component.