arXiv 2410.15608

Moonshine: Speech Recognition for Live Transcription and Voice Commands

By Nat Jeffries, Evan King, et al.

Published 2024-10-21

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and uses Rotary Position Embedding (RoPE) in place of traditional absolute position embeddings. The model is trained on speech segments of varying lengths without zero-padding, leading to greater efficiency…
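
To illustrate why RoPE suits variable-length input, here is a minimal sketch of the generic rotary formulation in plain Python. It is not Moonshine's implementation. The function names (`rope_rotate`, `dot`), the base of 10000, and the toy vectors are all assumptions made for illustration. The sketch shows that the dot product of two rotated vectors depends only on their relative offset, not on their absolute positions.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one even-length vector.

    Hypothetical helper for illustration; `base` is an assumed default.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of features is rotated by an angle that
        # grows linearly with the position; later pairs rotate more slowly.
        angle = position * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (made-up values).
query = [0.5, -1.0, 0.25, 2.0]
key = [1.5, 0.5, -0.75, 1.0]

# The same relative offset (3) at two different absolute positions.
score_near = dot(rope_rotate(query, 5), rope_rotate(key, 2))
score_far = dot(rope_rotate(query, 105), rope_rotate(key, 102))
```

Here `score_near` and `score_far` agree up to floating-point error, because rotating both vectors by the same extra angle leaves their dot product unchanged. An attention score built this way carries no dependence on where a segment starts, only on the offset between its positions.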
