arXiv 2410.15608
Moonshine: Speech Recognition for Live Transcription and Voice Commands
By Nat Jeffries, Evan King, et al.
Published 2024-10-21
This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and uses Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. The models are trained on speech segments of varying lengths without zero-padding, leading to greater efficiency…
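To make the RoPE idea concrete, here is a minimal sketch of the commonly used formulation, not Moonshine's actual implementation. The function names `rope` and `dot`, the base of 10000, and the pairing of adjacent vector components are assumptions chosen for illustration. It shows the property that makes RoPE a relative scheme: the dot product between a rotated query and a rotated key depends only on the distance between their positions.

```python
import math

def rope(x, position, base=10000.0):
    """Apply a rotary position embedding to one vector (illustrative sketch).

    Each adjacent pair (x[i], x[i+1]) is rotated by an angle that grows
    with the token position and shrinks with the pair index.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Pair index is i/2, so the frequency is base^(-2*(i/2)/d) = base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))

# Hypothetical query and key vectors.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
print(abs(score_a - score_b) < 1e-9)
```

Because the score depends only on the offset between positions rather than on their absolute values, a scheme like this does not tie the model to one fixed input length, which fits training on segments of varying length.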