Transcribe and align audio to text
Convert spoken words into text
Transcribe audio to structured JSON