paddlespeech.t2s.datasets.am_batch_fn module

class paddlespeech.t2s.datasets.am_batch_fn.ErnieSATCollateFn(mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]

Bases: object

Functor class of common_collate_fn()

Methods

__call__(exmaples)

Call self as a function.

paddlespeech.t2s.datasets.am_batch_fn.build_erniesat_collate_fn(mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]
paddlespeech.t2s.datasets.am_batch_fn.erniesat_batch_fn(examples, mlm_prob: float = 0.8, mean_phn_span: int = 8, seg_emb: bool = False, text_masking: bool = False)[source]
paddlespeech.t2s.datasets.am_batch_fn.fastspeech2_multi_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.fastspeech2_single_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.speedyspeech_multi_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.speedyspeech_single_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.tacotron2_multi_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.tacotron2_single_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.transformer_single_spk_batch_fn(examples)[source]
paddlespeech.t2s.datasets.am_batch_fn.vits_multi_spk_batch_fn(examples)[source]
Returns:
Dict[str, Any]:
  • text (Tensor): Text index tensor (B, T_text).

  • text_lengths (Tensor): Text length tensor (B,).

  • feats (Tensor): Feature tensor (B, T_feats, aux_channels).

  • feats_lengths (Tensor): Feature length tensor (B,).

  • speech (Tensor): Speech waveform tensor (B, T_wav).

  • spk_id (Optional[Tensor]): Speaker index tensor (B,) or (B, 1).

  • spk_emb (Optional[Tensor]): Speaker embedding tensor (B, spk_embed_dim).

paddlespeech.t2s.datasets.am_batch_fn.vits_single_spk_batch_fn(examples)[source]
Returns:
Dict[str, Any]:
  • text (Tensor): Text index tensor (B, T_text).

  • text_lengths (Tensor): Text length tensor (B,).

  • feats (Tensor): Feature tensor (B, T_feats, aux_channels).

  • feats_lengths (Tensor): Feature length tensor (B,).

  • speech (Tensor): Speech waveform tensor (B, T_wav).