Too Long; Didn't Read
wav2vec2 is a leading machine-learning model for the design of automatic speech recognition (ASR) systems. It is composed of three general components: a Feature Encoder, a Quantization Module, and a Transformer. The model is pretrained on audio-only data to learn basic speech units. The model is then finetuned on labeled data where speech units are mapped to text.
@pictureinthenoise
Picture in the Noise
Speech and language processing. At the end of the beginning.
Receive Stories from @pictureinthenoise
RELATED STORIES
L O A D I N G
. . . comments & more!