paddlespeech.s2t.utils.error_rate module
This module provides functions to calculate error rates at different levels, e.g. WER at the word level and CER at the character level.
- class paddlespeech.s2t.utils.error_rate.ErrorCalculator(char_list, sym_space, sym_blank, report_cer=False, report_wer=False)[source]
Bases: object
Calculate CER and WER for E2E_ASR and CTC models during training.
- Parameters
y_hats -- numpy array with predicted text
y_pads -- numpy array with true (target) text
char_list -- List[str]
sym_space -- <space>
sym_blank -- <blank>
- Returns
Methods
__call__(ys_hat, ys_pad[, is_ctc]) -- Calculate sentence-level WER/CER score.
calculate_cer(seqs_hat, seqs_true) -- Calculate sentence-level CER score.
calculate_cer_ctc(ys_hat, ys_pad) -- Calculate sentence-level CER score for CTC.
calculate_wer(seqs_hat, seqs_true) -- Calculate sentence-level WER score.
convert_to_char(ys_hat, ys_pad) -- Convert index to character.
- calculate_cer(seqs_hat, seqs_true)[source]
Calculate sentence-level CER score.
- Parameters
seqs_hat (list) -- prediction
seqs_true (list) -- reference
- Returns
average sentence-level CER score
- Return type
float
- calculate_cer_ctc(ys_hat, ys_pad)[source]
Calculate sentence-level CER score for CTC.
- Parameters
ys_hat (paddle.Tensor) -- prediction (batch, seqlen)
ys_pad (paddle.Tensor) -- reference (batch, seqlen)
- Returns
average sentence-level CER score
- Return type
float
- paddlespeech.s2t.utils.error_rate.cer(reference, hypothesis, ignore_case=False, remove_space=False)[source]
Calculate character error rate (CER). CER compares the reference text and the hypothesis text at the character level. CER is defined as:
\[CER = (Sc + Dc + Ic) / Nc\]
where
Sc is the number of characters substituted, Dc is the number of characters deleted, Ic is the number of characters inserted, and Nc is the number of characters in the reference.
The Levenshtein distance can be used to calculate CER. Chinese input should be encoded as Unicode. Note that leading and trailing space characters are stripped, and multiple consecutive space characters within a sentence are replaced by a single space character.
- Parameters
reference (str) -- The reference sentence.
hypothesis (str) -- The hypothesis sentence.
ignore_case (bool) -- Whether to ignore case when comparing.
remove_space (bool) -- Whether to remove internal space characters.
- Returns
Character error rate.
- Return type
float
- Raises
ValueError -- If the reference length is zero.
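The CER definition above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `levenshtein` and `cer_sketch` are hypothetical names introduced here, and the space handling follows the description above (strip leading and trailing spaces, collapse runs of spaces, optionally drop spaces altogether).

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of tokens)."""
    # prev[j] is the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer_sketch(reference, hypothesis, ignore_case=False, remove_space=False):
    """Hypothetical re-implementation of the documented CER formula."""
    if ignore_case:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    # Splitting on ' ' and dropping empty items strips outer spaces and
    # collapses repeated spaces; the joiner decides whether spaces survive.
    joiner = "" if remove_space else " "
    reference = joiner.join(t for t in reference.split(" ") if t)
    hypothesis = joiner.join(t for t in hypothesis.split(" ") if t)
    if len(reference) == 0:
        raise ValueError("Length of reference should be greater than 0.")
    # (Sc + Dc + Ic) / Nc
    return levenshtein(reference, hypothesis) / len(reference)


print(cer_sketch("werewolf", "weae wolf"))                     # 0.25
print(cer_sketch("werewolf", "weae wolf", remove_space=True))  # 0.125
```

In the first call the hypothesis differs by one substitution and one inserted space (2 edits over 8 reference characters); with `remove_space=True` only the substitution remains.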
- paddlespeech.s2t.utils.error_rate.char_errors(reference, hypothesis, ignore_case=False, remove_space=False)[source]
Compute the Levenshtein distance between the reference sequence and the hypothesis sequence at the character level.
- Parameters
reference (str) -- The reference sentence.
hypothesis (str) -- The hypothesis sentence.
ignore_case (bool) -- Whether to ignore case when comparing.
remove_space (bool) -- Whether to remove internal space characters.
- Returns
Levenshtein distance and length of reference sentence.
- Return type
list
- paddlespeech.s2t.utils.error_rate.wer(reference, hypothesis, ignore_case=False, delimiter=' ')[source]
Calculate word error rate (WER). WER compares the reference text and the hypothesis text at the word level. WER is defined as:
\[WER = (Sw + Dw + Iw) / Nw\]
where
Sw is the number of words substituted, Dw is the number of words deleted, Iw is the number of words inserted, and Nw is the number of words in the reference.
The Levenshtein distance can be used to calculate WER. Note that empty items are removed when sentences are split by the delimiter.
- Parameters
reference (str) -- The reference sentence.
hypothesis (str) -- The hypothesis sentence.
ignore_case (bool) -- Whether to ignore case when comparing.
delimiter (char) -- Delimiter of input sentences.
- Returns
Word error rate.
- Return type
float
- Raises
ValueError -- If word number of reference is zero.
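As an illustration of the WER definition above, the following sketch splits both sentences on the delimiter, drops empty items, and divides the word-level edit distance by the reference word count. The names `levenshtein` and `wer_sketch` are hypothetical and the code is an assumption-based sketch, not the module's source.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (here: lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer_sketch(reference, hypothesis, ignore_case=False, delimiter=" "):
    """Hypothetical re-implementation of the documented WER formula."""
    if ignore_case:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    # Empty items (e.g. from doubled delimiters) are removed after splitting.
    ref_words = [w for w in reference.split(delimiter) if w]
    hyp_words = [w for w in hypothesis.split(delimiter) if w]
    if not ref_words:
        raise ValueError("Reference's word number should be greater than 0.")
    # (Sw + Dw + Iw) / Nw
    return levenshtein(ref_words, hyp_words) / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors, 6 words.
print(wer_sketch("the cat sat on the mat", "the cat sit on mat"))
```

Because the distance is divided by the reference length only, a hypothesis with many inserted words can yield a WER above 1.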
- paddlespeech.s2t.utils.error_rate.word_errors(reference, hypothesis, ignore_case=False, delimiter=' ')[source]
Compute the Levenshtein distance between the reference sequence and the hypothesis sequence at the word level.
- Parameters
reference (str) -- The reference sentence.
hypothesis (str) -- The hypothesis sentence.
ignore_case (bool) -- Whether to ignore case when comparing.
delimiter (char) -- Delimiter of input sentences.
- Returns
Levenshtein distance and word number of reference sentence.
- Return type
list
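One plausible reason to return the distance and the reference length separately, rather than their ratio, is that the pairs can be summed over a test set to obtain a corpus-level error rate. The sketch below illustrates that idea under stated assumptions: `word_errors_sketch` is a hypothetical stand-in that returns a `(distance, reference_word_count)` pair, and the sample sentences are made up for the example.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (here: lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_errors_sketch(reference, hypothesis, delimiter=" "):
    """Hypothetical stand-in: returns (edit distance, reference word count)."""
    ref_words = [w for w in reference.split(delimiter) if w]
    hyp_words = [w for w in hypothesis.split(delimiter) if w]
    return float(levenshtein(ref_words, hyp_words)), len(ref_words)


# Illustrative (reference, hypothesis) pairs, not taken from any dataset.
pairs = [
    ("hello world", "hello word"),
    ("good morning everyone", "good morning"),
]

total_errors, total_words = 0.0, 0
for ref, hyp in pairs:
    errors, n_words = word_errors_sketch(ref, hyp)
    total_errors += errors
    total_words += n_words

# Corpus-level WER weights each sentence by its reference length.
corpus_wer = total_errors / total_words
print(corpus_wer)  # 0.4
```

Here the corpus-level value (2 errors over 5 words) differs from the plain mean of the two sentence-level rates (0.5 and about 0.333), which is why the choice of aggregation should be stated when reporting results.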