arXiv 2410.15608
Moonshine: Speech Recognition for Live Transcription and Voice Commands
By Nat Jeffries, Evan King, et al.
Published 2024-10-21
This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and employs Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. The models are trained on speech segments of varying lengths without zero-padding, leading to greater efficiency…
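To illustrate what replacing absolute position embeddings with RoPE means in practice, here is a minimal pure-Python sketch of the general RoPE idea. It is not taken from the Moonshine code: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for illustration. The sketch rotates each pair of vector components by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Hypothetical helper: interleaved pairing and base=10000 are assumptions.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        # Pair index j = i / 2 uses frequency base^(-2j/d) = base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (made-up numbers, for illustration only).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions have the same offset (2), so the scores agree
# up to floating-point error.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
```

Because the attention score depends on the offset between positions rather than on their absolute values, an input of any length uses the same rotation rule, and no table of per-position embeddings has to be sized in advance.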