
paddlespeech.s2t.decoders.beam_search.beam_search module

Beam search module.

class paddlespeech.s2t.decoders.beam_search.beam_search.BeamSearch(scorers: Dict[str, ScorerInterface], weights: Dict[str, float], beam_size: int, vocab_size: int, sos: int, eos: int, token_list: Optional[List[str]] = None, pre_beam_ratio: float = 1.5, pre_beam_score_key: Optional[str] = None)

Bases: Layer

Beam search implementation.

Methods

__call__(*inputs, **kwargs)

Call self as a function.

add_parameter(name, parameter)

Adds a Parameter instance.

add_sublayer(name, sublayer)

Adds a sub Layer instance.

append_token(xs, x)

Append new token to prefix tokens.

apply(fn)

Applies fn recursively to every sublayer (as returned by .sublayers()) as well as self.

beam(weighted_scores, ids)

Compute topk full token ids and partial token ids.

buffers([include_sublayers])

Returns a list of all buffers from the current layer and its sub-layers.

children()

Returns an iterator over immediate children layers.

clear_gradients()

Clear the gradients of all parameters for this layer.

create_parameter(shape[, attr, dtype, ...])

Create parameters for this layer.

create_tensor([name, persistable, dtype])

Create a Tensor for this layer.

create_variable([name, persistable, dtype])

Create a Tensor for this layer.

eval()

Sets this Layer and all its sublayers to evaluation mode.

extra_repr()

Extra representation of this layer; override it to provide a custom representation for your own layer.

forward(x[, maxlenratio, minlenratio])

Perform beam search.

full_name()

Full name for this layer, composed of name_scope + "/" + MyLayer.__class__.__name__.

init_hyp(x)

Get an initial hypothesis data.

load_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

merge_scores(prev_scores, next_full_scores, ...)

Merge scores for new hypothesis.

merge_states(states, part_states, part_idx)

Merge states for new hypothesis.

named_buffers([prefix, include_sublayers])

Returns an iterator over all buffers in the Layer, yielding (name, Tensor) tuples.

named_children()

Returns an iterator over immediate children layers, yielding both the name of the layer as well as the layer itself.

named_parameters([prefix, include_sublayers])

Returns an iterator over all parameters in the Layer, yielding (name, parameter) tuples.

named_sublayers([prefix, include_self, ...])

Returns an iterator over all sublayers in the Layer, yielding (name, sublayer) tuples.

parameters([include_sublayers])

Returns a list of all Parameters from the current layer and its sub-layers.

post_process(i, maxlen, maxlenratio, ...)

Perform post-processing of beam search iterations.

register_buffer(name, tensor[, persistable])

Registers a tensor as buffer into the layer.

register_forward_post_hook(hook)

Register a forward post-hook for Layer.

register_forward_pre_hook(hook)

Register a forward pre-hook for Layer.

score_full(hyp, x)

Score new hypothesis by self.full_scorers.

score_partial(hyp, ids, x)

Score new hypothesis by self.part_scorers.

search(running_hyps, x)

Search new tokens for running hypotheses and encoded speech x.

set_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

set_state_dict(state_dict[, use_structured_name])

Set parameters and persistable buffers from state_dict.

state_dict([destination, include_sublayers, ...])

Get all parameters and persistable buffers of the current layer and its sub-layers.

sublayers([include_self])

Returns a list of sub layers.

to([device, dtype, blocking])

Cast the parameters and buffers of the Layer according to the given device, dtype and blocking arguments.

to_static_state_dict([destination, ...])

Get all parameters and buffers of the current layer and its sub-layers.

train()

Sets this Layer and all its sublayers to training mode.

backward

register_state_dict_hook

static append_token(xs: Tensor, x: Union[int, Tensor]) → Tensor

Append a new token to the prefix tokens.

Args:

  • xs (paddle.Tensor): The prefix tokens, shape (T,).
  • x (int or paddle.Tensor): The new token to append.

Returns:

  • paddle.Tensor: A new tensor of shape (T+1,) containing xs + [x], with the same dtype and device as xs.
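For illustration, here is a minimal pure-Python sketch of the documented contract, assuming a plain list of int token ids in place of the (T,) paddle.Tensor so it runs without Paddle. It only shows that the result is a new sequence xs + [x] and that xs itself is left untouched; it is not the library's implementation.

```python
from typing import List


def append_token(xs: List[int], x: int) -> List[int]:
    """Return a new prefix xs + [x]; xs itself is not modified.

    Sketch only: a list of ints stands in for the (T,) paddle.Tensor.
    """
    return xs + [int(x)]


prefix = [1, 42, 7]
extended = append_token(prefix, 9)
print(extended)  # [1, 42, 7, 9]
print(prefix)    # [1, 42, 7]
```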

beam(weighted_scores: Tensor, ids: Tensor) → Tuple[Tensor, Tensor]

Compute top-k full token ids and partial token ids.

Args:

  • weighted_scores (paddle.Tensor): The weighted sum of scores for each token, shape (self.n_vocab,).
  • ids (paddle.Tensor): The partial token ids (global) over which to compute top-k.

Returns:

  • Tuple[paddle.Tensor, paddle.Tensor]: The top-k full token ids and partial token ids, i.e. (global ids, local ids relative to ids). Both have shape (self.beam_size,).
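The sketch below illustrates one way to read the (global ids, local ids) pair, assuming that local ids are positions inside ids and that beam_size <= len(ids). Plain lists stand in for paddle.Tensor, and the helper topk is hypothetical; the real method operates on Paddle tensors.

```python
from typing import List, Sequence, Tuple


def topk(values: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, best first (ties keep the lower index first)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def beam(weighted_scores: Sequence[float], ids: Sequence[int],
         beam_size: int) -> Tuple[List[int], List[int]]:
    """Return (global ids, local ids) of the best beam_size tokens."""
    if len(ids) == len(weighted_scores):
        # Every vocabulary entry is a candidate: global and local ids coincide.
        top = topk(weighted_scores, beam_size)
        return top, top
    # Mask tokens outside ids so only the partial candidates can be selected.
    masked = [float("-inf")] * len(weighted_scores)
    for token in ids:
        masked[token] = weighted_scores[token]
    top_ids = topk(masked, beam_size)
    # Local ids are positions inside ids, not vocabulary ids.
    local_ids = topk([weighted_scores[token] for token in ids], beam_size)
    return top_ids, local_ids


scores = [0.1, 0.5, 0.2, 0.9, 0.3]
print(beam(scores, [1, 2, 4], 2))       # ([1, 4], [0, 2])
print(beam(scores, list(range(5)), 2))  # ([3, 1], [3, 1])
```

Mapping each local id back through ids (ids[0] == 1, ids[2] == 4) recovers the global ids, which is why both views are useful.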

forward(x: Tensor, maxlenratio: float = 0.0, minlenratio: float = 0.0) → List[Hypothesis]

Perform beam search.

Args:

  • x (paddle.Tensor): Encoded speech feature, shape (T, D).
  • maxlenratio (float): Input length ratio used to obtain the maximum output length. If maxlenratio=0.0 (default), an end-detect function is used to find the maximum hypothesis length automatically. If maxlenratio<0.0, its absolute value is interpreted as a constant maximum output length.
  • minlenratio (float): Input length ratio used to obtain the minimum output length.

Returns:

  • list[Hypothesis]: N-best decoding results.
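How maxlenratio and minlenratio could translate into length bounds is sketched below. The negative-ratio branch follows the Args above; the other two rules (capping at the input length T when maxlenratio=0.0 so that end detection decides when to stop, and max(1, int(maxlenratio * T)) otherwise) are assumptions, and length_bounds is a hypothetical helper.

```python
from typing import Tuple


def length_bounds(input_len: int, maxlenratio: float = 0.0,
                  minlenratio: float = 0.0) -> Tuple[int, int]:
    """Return (maxlen, minlen) for an input of input_len frames (sketch)."""
    if maxlenratio == 0.0:
        # Assumed: cap at the input length and let end detection stop earlier.
        maxlen = input_len
    elif maxlenratio < 0.0:
        # Documented: the absolute value is a constant maximum output length.
        maxlen = -1 * int(maxlenratio)
    else:
        # Assumed: scale by the input length, but allow at least one step.
        maxlen = max(1, int(maxlenratio * input_len))
    minlen = int(minlenratio * input_len)
    return maxlen, minlen


print(length_bounds(100))             # (100, 0)
print(length_bounds(100, -20.0))      # (20, 0)
print(length_bounds(100, 0.5, 0.25))  # (50, 25)
```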

init_hyp(x: Tensor) → List[Hypothesis]

Get the initial hypothesis data.

Args:

  • x (paddle.Tensor): The encoder output feature, shape (T, D).

Returns:

  • List[Hypothesis]: A list holding the initial hypothesis.
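A minimal sketch of building the initial hypothesis, assuming the search starts from the prefix [sos] with every scorer's accumulated score at 0.0. Hypothesis is re-declared here as a plain NamedTuple with the documented field order, and the per-scorer initial states are passed in through a hypothetical init_states dict rather than obtained from real ScorerInterface objects.

```python
from typing import Any, Dict, List, NamedTuple


class Hypothesis(NamedTuple):
    """Stand-in for the documented Hypothesis (field order: yseq, score, scores, states)."""
    yseq: List[int]
    score: float
    scores: Dict[str, float]
    states: Dict[str, Any]


def init_hyp(sos: int, scorer_names: List[str],
             init_states: Dict[str, Any]) -> List[Hypothesis]:
    """Build the single starting hypothesis: prefix [sos], all scores zero."""
    return [
        Hypothesis(
            yseq=[sos],
            score=0.0,
            scores={name: 0.0 for name in scorer_names},
            # Scorers without a supplied initial state start from None.
            states={name: init_states.get(name) for name in scorer_names},
        )
    ]


hyps = init_hyp(sos=5, scorer_names=["decoder", "ctc"], init_states={"ctc": "ctc-state"})
print(hyps[0].yseq, hyps[0].scores)  # [5] {'decoder': 0.0, 'ctc': 0.0}
```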

static merge_scores(prev_scores: Dict[str, float], next_full_scores: Dict[str, Tensor], full_idx: int, next_part_scores: Dict[str, Tensor], part_idx: int) → Dict[str, Tensor]

Merge scores for new hypothesis.

Args:

  • prev_scores (Dict[str, float]): The previous hypothesis scores by self.scorers.
  • next_full_scores (Dict[str, paddle.Tensor]): Scores by self.full_scorers.
  • full_idx (int): The next token id for next_full_scores.
  • next_part_scores (Dict[str, paddle.Tensor]): Scores of partial tokens by self.part_scorers.
  • part_idx (int): The new token id for next_part_scores.

Returns:

  • Dict[str, paddle.Tensor]: The new score dict. Its keys are the names of self.full_scorers and self.part_scorers; its values are scalar tensors produced by the scorers.
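The sketch below shows the score bookkeeping, assuming each scorer's running score is accumulated additively (as with log-probabilities) and using plain lists instead of tensors. Note the two different indices: full scores are indexed by the token id full_idx, partial scores by part_idx.

```python
from typing import Dict, Sequence


def merge_scores(prev_scores: Dict[str, float],
                 next_full_scores: Dict[str, Sequence[float]], full_idx: int,
                 next_part_scores: Dict[str, Sequence[float]], part_idx: int) -> Dict[str, float]:
    new_scores = {}
    for name, scores in next_full_scores.items():
        # Full scorers scored the whole vocabulary: index by the chosen token id.
        new_scores[name] = prev_scores[name] + scores[full_idx]
    for name, scores in next_part_scores.items():
        # Partial scorers scored only the candidate subset: use part_idx.
        new_scores[name] = prev_scores[name] + scores[part_idx]
    return new_scores


prev = {"decoder": -1.0, "ctc": -2.0}
full = {"decoder": [-0.5, -0.25, -3.0]}
part = {"ctc": [-0.125, -0.75]}
print(merge_scores(prev, full, 1, part, 0))  # {'decoder': -1.25, 'ctc': -2.125}
```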

merge_states(states: Any, part_states: Any, part_idx: int) → Any

Merge states for new hypothesis.

Args:

  • states: States of self.full_scorers.
  • part_states: States of self.part_scorers.
  • part_idx (int): The new token id for part_scores.

Returns:

  • Dict[str, Any]: The new state dict. Its keys are the names of self.full_scorers and self.part_scorers; its values are the states of the scorers.
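Similarly, here is a sketch of state merging under the assumption that full-scorer states are copied unchanged while each partial-scorer state holds one entry per candidate, from which the entry for part_idx is kept. The select_state callable and the pick helper are hypothetical stand-ins for whatever selection the real partial scorers perform.

```python
from typing import Any, Callable, Dict


def merge_states(states: Dict[str, Any], part_states: Dict[str, Any], part_idx: int,
                 select_state: Callable[[Any, int], Any]) -> Dict[str, Any]:
    # Full-scorer states are carried over unchanged (shallow copy of the dict).
    new_states = dict(states)
    for name, state in part_states.items():
        # Partial-scorer states cover every candidate; keep the one for part_idx.
        new_states[name] = select_state(state, part_idx)
    return new_states


def pick(state: Any, i: int) -> Any:
    # Hypothetical selector: the partial state is a list with one entry per candidate.
    return state[i]


merged = merge_states({"decoder": "dec-state"}, {"ctc": ["c0", "c1", "c2"]}, 2, pick)
print(merged)  # {'decoder': 'dec-state', 'ctc': 'c2'}
```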

post_process(i: int, maxlen: int, maxlenratio: float, running_hyps: List[Hypothesis], ended_hyps: List[Hypothesis]) → List[Hypothesis]

Perform post-processing of beam search iterations.

Args:

  • i (int): The length of hypothesis tokens.
  • maxlen (int): The maximum length of tokens in beam search.
  • maxlenratio (float): The maximum length ratio in beam search.
  • running_hyps (List[Hypothesis]): The running hypotheses in beam search.
  • ended_hyps (List[Hypothesis]): The ended hypotheses in beam search.

Returns:

  • List[Hypothesis]: The new running hypotheses.
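A simplified sketch of the post-processing step, assuming that on the last allowed iteration (i == maxlen - 1) eos is appended to every running hypothesis, and that any hypothesis whose last token is eos moves to ended_hyps. Hyp is a hypothetical two-field stand-in for Hypothesis; maxlenratio and any final-score adjustments by the scorers are omitted.

```python
from typing import List, NamedTuple, Tuple


class Hyp(NamedTuple):
    """Minimal stand-in for Hypothesis: token prefix and accumulated score."""
    yseq: Tuple[int, ...]
    score: float


def post_process(i: int, maxlen: int, eos: int,
                 running_hyps: List[Hyp], ended_hyps: List[Hyp]) -> List[Hyp]:
    if i == maxlen - 1:
        # Last allowed step: force-append eos so every hypothesis can end.
        running_hyps = [h._replace(yseq=h.yseq + (eos,)) for h in running_hyps]
    remained = []
    for hyp in running_hyps:
        if hyp.yseq[-1] == eos:
            ended_hyps.append(hyp)  # finished; ended_hyps is updated in place
        else:
            remained.append(hyp)
    return remained


ended: List[Hyp] = []
running = [Hyp((0, 5, 2), -1.0), Hyp((0, 7), -1.5)]
running = post_process(1, 10, 2, running, ended)
print(running)  # [Hyp(yseq=(0, 7), score=-1.5)]
print(ended)    # [Hyp(yseq=(0, 5, 2), score=-1.0)]
```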

score_full(hyp: Hypothesis, x: Tensor) → Tuple[Dict[str, Tensor], Dict[str, Any]]

Score new hypothesis by self.full_scorers.

Args:

  • hyp (Hypothesis): Hypothesis with prefix tokens to score.
  • x (paddle.Tensor): Corresponding input feature, shape (T, D).

Returns:

  • Tuple[Dict[str, paddle.Tensor], Dict[str, Any]]: A tuple of the score dict of hyp, which maps the keys of self.full_scorers to tensor scores of shape (self.n_vocab,), and the state dict, which maps the same keys to the states of self.full_scorers.

score_partial(hyp: Hypothesis, ids: Tensor, x: Tensor) → Tuple[Dict[str, Tensor], Dict[str, Any]]

Score new hypothesis by self.part_scorers.

Args:

  • hyp (Hypothesis): Hypothesis with prefix tokens to score.
  • ids (paddle.Tensor): 1D tensor of new partial tokens to score, with len(ids) < n_vocab.
  • x (paddle.Tensor): Corresponding input feature, shape (T, D).

Returns:

  • Tuple[Dict[str, paddle.Tensor], Dict[str, Any]]: A tuple of the score dict of hyp, which maps the keys of self.part_scorers to tensor scores of shape (len(ids),), and the state dict, which maps the same keys to the states of self.part_scorers.
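Because full scores have shape (self.n_vocab,) while partial scores have shape (len(ids),), combining them requires scattering the partial scores back to their token ids. The sketch below assumes a simple weighted sum with per-scorer weights; weighted_sum is a hypothetical helper, not part of the documented API.

```python
from typing import Dict, List, Sequence


def weighted_sum(n_vocab: int, weights: Dict[str, float],
                 full_scores: Dict[str, Sequence[float]],
                 part_scores: Dict[str, Sequence[float]],
                 ids: Sequence[int]) -> List[float]:
    total = [0.0] * n_vocab
    for name, scores in full_scores.items():
        if len(scores) != n_vocab:
            raise ValueError(f"{name}: full scores must have shape (n_vocab,)")
        for token in range(n_vocab):
            total[token] += weights[name] * scores[token]
    for name, scores in part_scores.items():
        if len(scores) != len(ids):
            raise ValueError(f"{name}: partial scores must have shape (len(ids),)")
        # Partial scores are aligned with ids, so scatter them back by token id.
        for j, token in enumerate(ids):
            total[token] += weights[name] * scores[j]
    return total


total = weighted_sum(
    n_vocab=4,
    weights={"decoder": 1.0, "ctc": 0.5},
    full_scores={"decoder": [-1.0, -2.0, -0.5, -4.0]},
    part_scores={"ctc": [-2.0, -1.0]},
    ids=[0, 2],
)
print(total)  # [-2.0, -2.0, -1.0, -4.0]
```

Tokens outside ids receive no partial-scorer contribution in this sketch, which is consistent with beam() computing top-k over ids only.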

search(running_hyps: List[Hypothesis], x: Tensor) → List[Hypothesis]

Search new tokens for running hypotheses and encoded speech x.

Args:

  • running_hyps (List[Hypothesis]): Running hypotheses on the beam.
  • x (paddle.Tensor): Encoded speech feature, shape (T, D).

Returns:

  • List[Hypothesis]: Best sorted hypotheses.
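One search step can be pictured as: score every running hypothesis over the vocabulary, extend it with its best tokens, then keep the best beam_size candidates overall, sorted by score. The sketch below assumes a single hypothetical scorer score_fn returning already-weighted log-probabilities; pre-beam pruning, partial scorers and state merging are left out, and search_step and Hyp are illustrative names only.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Hyp(NamedTuple):
    """Minimal stand-in for Hypothesis: token prefix and accumulated score."""
    yseq: Tuple[int, ...]
    score: float


def search_step(running_hyps: List[Hyp], beam_size: int,
                score_fn: Callable[[Tuple[int, ...]], Sequence[float]]) -> List[Hyp]:
    candidates = []
    for hyp in running_hyps:
        scores = score_fn(hyp.yseq)  # weighted next-token scores over the vocabulary
        best = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:beam_size]
        for token in best:
            candidates.append(Hyp(hyp.yseq + (token,), hyp.score + scores[token]))
    # Keep the best beam_size hypotheses overall, best first.
    return sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]


def toy_scores(yseq: Tuple[int, ...]) -> List[float]:
    # Hypothetical scorer that ignores the prefix.
    return [-1.0, -0.5, -2.0]


print(search_step([Hyp((0,), 0.0), Hyp((1,), -0.25)], 2, toy_scores))
# [Hyp(yseq=(0, 1), score=-0.5), Hyp(yseq=(1, 1), score=-0.75)]
```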

class paddlespeech.s2t.decoders.beam_search.beam_search.Hypothesis(yseq: Tensor, score: Union[float, Tensor] = 0, scores: Dict[str, Union[float, Tensor]] = {}, states: Dict[str, Any] = {})

Bases: tuple

Hypothesis data type.

Attributes
score

Alias for field number 1

scores

Alias for field number 2

states

Alias for field number 3

yseq

Alias for field number 0

Methods

asdict()

Convert data to JSON-friendly dict.

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

asdict() → dict

Convert data to JSON-friendly dict.
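Here is a sketch of what "JSON-friendly" can mean: numeric fields are coerced to plain Python types before namedtuple's _asdict() is called. It assumes list-of-int token ids instead of a paddle.Tensor. The field order mirrors the aliases above (yseq is field 0, states is field 3). States are passed through unchanged, so the result is only JSON-serializable when the states are.

```python
import json
from typing import Any, Dict, List, NamedTuple


class Hypothesis(NamedTuple):
    """Stand-in: field order matches the documented aliases (yseq=0 ... states=3)."""
    yseq: List[int]
    score: float
    scores: Dict[str, float]
    states: Dict[str, Any]

    def asdict(self) -> dict:
        # Coerce numeric fields to plain Python types, then use namedtuple's _asdict().
        return self._replace(
            yseq=[int(t) for t in self.yseq],
            score=float(self.score),
            scores={k: float(v) for k, v in self.scores.items()},
        )._asdict()


h = Hypothesis(yseq=[1, 5, 2], score=-1.5, scores={"decoder": -1.5}, states={})
print(json.dumps(h.asdict()))
# {"yseq": [1, 5, 2], "score": -1.5, "scores": {"decoder": -1.5}, "states": {}}
```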

property score

Alias for field number 1

property scores

Alias for field number 2

property states

Alias for field number 3

property yseq

Alias for field number 0

paddlespeech.s2t.decoders.beam_search.beam_search.beam_search(x: Tensor, sos: int, eos: int, beam_size: int, vocab_size: int, scorers: Dict[str, ScorerInterface], weights: Dict[str, float], token_list: Optional[List[str]] = None, maxlenratio: float = 0.0, minlenratio: float = 0.0, pre_beam_ratio: float = 1.5, pre_beam_score_key: str = 'full') → list

Perform beam search with scorers.

Args:

  • x (paddle.Tensor): Encoded speech feature, shape (T, D).
  • sos (int): Start-of-sequence id.
  • eos (int): End-of-sequence id.
  • beam_size (int): The number of hypotheses kept during search.
  • vocab_size (int): The vocabulary size.
  • scorers (dict[str, ScorerInterface]): Dict of decoder modules, e.g., Decoder, CTCPrefixScorer, LM. A scorer is ignored if it is None.
  • weights (dict[str, float]): Dict of weights for each scorer. A scorer is ignored if its weight is 0.
  • token_list (list[str]): List of tokens for debug logging.
  • maxlenratio (float): Input length ratio used to obtain the maximum output length. If maxlenratio=0.0 (default), an end-detect function is used to find the maximum hypothesis length automatically.
  • minlenratio (float): Input length ratio used to obtain the minimum output length.
  • pre_beam_score_key (str): Key of the scores used to perform the pre-beam search.
  • pre_beam_ratio (float): Ratio that sets the beam size of the pre-beam search, which will be int(pre_beam_ratio * beam_size).

Returns:

  • List[Dict]: N-best decoding results.
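The two configuration rules spelled out above, dropping scorers that are None or weighted 0 and sizing the pre-beam as int(pre_beam_ratio * beam_size), can be sketched as follows. active_scorers and pre_beam_size are hypothetical helpers, and treating a scorer with no entry in weights as weight 0 is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def active_scorers(scorers: Dict[str, Optional[Any]],
                   weights: Dict[str, float]) -> Dict[str, Any]:
    """Drop scorers that are None or whose weight is 0 (missing weight treated as 0)."""
    return {
        name: scorer
        for name, scorer in scorers.items()
        if scorer is not None and weights.get(name, 0.0) != 0.0
    }


def pre_beam_size(beam_size: int, pre_beam_ratio: float = 1.5) -> int:
    """Candidate count kept by the pre-beam search: int(pre_beam_ratio * beam_size)."""
    return int(pre_beam_ratio * beam_size)


print(active_scorers({"decoder": "dec", "ctc": "ctc", "lm": None},
                     {"decoder": 1.0, "ctc": 0.0, "lm": 0.3}))  # {'decoder': 'dec'}
print(pre_beam_size(10), pre_beam_size(5))  # 15 7
```

Since int() truncates, beam_size=5 with the default ratio of 1.5 yields 7 pre-beam candidates, not 8.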


© Copyright 2021, paddlespeech-developers. Revision d8bf8c6f.
