Is your feature request related to a problem? Please describe.
The new audio generation features are awesome, but audio could be generated much closer to real time. As it stands, text does not get sent to the audio generation API until the full response has been received, which prevents anything like a real-time chat experience.
Describe the solution you'd like
Since the TTS functionality already sends one sentence at a time to be generated, we should send the first sentence to the audio API as soon as it has streamed in. That way, we can get audio back within seconds of the message being sent instead of waiting for the whole response to come back.
Describe alternatives you've considered
None
Additional context
Technical Notes
There are two places where audio is generated that will need to be touched: the regular chat interface and the call interface. We will need to figure out how to tap into the stream and, I suppose, reuse the existing sentence chunker (the one that already splits text before sending it to the TTS API) to chunk the stream. I've only glanced at the code, but this doesn't sound too hard.
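To make the idea concrete, here is a rough sketch of chunking a token stream into sentences as it arrives. It is not taken from the codebase: `stream_sentences` and the regex boundary rule are hypothetical stand-ins for whatever the existing sentence chunker does, and the naive rule will split on abbreviations such as "Dr.".

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule (not the project's actual chunker): a sentence ends
# at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a text stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence; the last
        # part may still be growing, so keep it buffered.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    streamed = ["Hello the", "re. How are", " you? I am", " fine"]
    for sentence in stream_sentences(streamed):
        # In the real feature, each sentence would go to the TTS API here.
        print(sentence)
```

Each yielded sentence could be handed to the TTS request immediately, so the first audio clip starts generating while the rest of the response is still streaming in.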