Azure text to speech and play it in virtual microphone using python

Question:

My use case is to convert text to speech using Azure and then play it into a virtual microphone.

Option 1 – with an intermediate .wav file

I tried both steps manually in a Jupyter notebook.
The problem is that the .wav file written by Azure cannot be played from the same Python session; it fails with
"error: No file ‘file.wav’ found in working directory". After I restart the Python kernel, the audio plays.

text-to-speech

audio_config = speechsdk.audio.AudioOutputConfig(filename="file.wav")
...
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

audio play

mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()

Option 2 – direct stream to audio device

I tried to configure the audio output device in the Azure SDK.
This worked for regular output devices, but when I pass the ID of the virtual microphone, no sound is played.

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=False,device_name="{0.0.0.00000000}.{9D30BDBF-1418-4AFC-A709-CD4C431833E2}")

It would also be much better if there were another method that sends the audio directly to a virtual microphone instead of the speaker.

Asked By: George Raveen

||

Answers:

Create a speech service and get the key and location of the service.

Then store that key in an environment variable. Open a command prompt and run the command below.

setx SPEECH_KEY yourkey

Use import azure.cognitiveservices.speech as speechsdk
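As a rough sketch of how the key set with setx could be picked up in Python: the helper below reads it from the environment and fails early if it is missing. The SPEECH_REGION variable name and both helper functions are assumptions made for illustration, not something this answer specifies. Note that a variable set with setx is only visible in consoles opened afterwards.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) from environment variables, or raise if one is missing.

    SPEECH_KEY matches the setx command above; SPEECH_REGION is an assumed name.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION, then open a new console")
    return key, region


def make_speech_config():
    """Build a SpeechConfig; needs the azure-cognitiveservices-speech package."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily on purpose

    key, region = load_speech_settings()
    return speechsdk.SpeechConfig(subscription=key, region=region)
```

With both variables set, speech_config = make_speech_config() gives the object that the synthesizer calls in this thread expect.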

After that, use the line below to route the output to the virtual device.

audio_config = speechsdk.audio.AudioOutputConfig(device_name="<device id>")

Look up the ID of the target output device and put it in place of <device id>.

Answered By: TadepalliSairam

I found a solution: synthesize to an in-memory result instead of a file, save that result to a .wav file, and then play it through pygame, as follows:

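import azure.cognitiveservices.speech as speechsdk
from pygame import mixer

# speech_config and text are defined as in the question
# audio_config=None keeps the synthesized audio in memory instead of playing it
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
# Write the in-memory result to a .wav file
stream = speechsdk.AudioDataStream(speech_synthesis_result)
stream.save_to_wav_file("file.wav")

# Play the file on the virtual audio cable
mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()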
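The devicename passed to mixer.init has to match the name SDL reports, so it may help to list the names first. The sketch below assumes pygame 2 and uses its private pygame._sdl2.audio module, which could change between releases; find_device and list_output_devices are hypothetical helper names.

```python
def find_device(names, keyword):
    """Return the first device name containing keyword (case-insensitive), else None."""
    keyword = keyword.lower()
    for name in names:
        if keyword in name.lower():
            return name
    return None


def list_output_devices():
    """List playback device names as SDL2 sees them; needs pygame 2."""
    from pygame import mixer
    from pygame._sdl2 import audio as sdl2_audio  # private pygame module, may change

    mixer.init()  # the audio subsystem must be initialised before querying
    names = sdl2_audio.get_audio_device_names(False)  # False = output devices
    mixer.quit()
    return names
```

For example, find_device(list_output_devices(), "virtual audio cable") should return the full name to pass as devicename, or None if no such device is present.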

I would also appreciate any other method that doesn’t need an intermediate audio file.
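One possible way to skip the intermediate file, offered only as a sketch: pass the synthesized bytes to pygame through an in-memory file object. This assumes that speech_synthesis_result.audio_data holds a complete WAV (RIFF header included), which should be the case for the SDK's default RIFF output format but is worth checking, and that mixer.music.load accepts a file object. wav_bytes_to_file and play_result are hypothetical helper names.

```python
import io
import wave


def wav_bytes_to_file(audio_bytes):
    """Wrap WAV bytes in a seekable file object after checking the RIFF header."""
    buffer = io.BytesIO(audio_bytes)
    with wave.open(buffer, "rb") as wav:  # raises wave.Error if this is not a WAV
        wav.getnframes()
    buffer.seek(0)  # rewind so the player reads from the start
    return buffer


def play_result(speech_synthesis_result, devicename):
    """Play a synthesis result on the given device without touching the disk."""
    from pygame import mixer  # imported lazily; needs pygame 2

    mixer.init(devicename=devicename)
    mixer.music.load(wav_bytes_to_file(speech_synthesis_result.audio_data))
    mixer.music.play()
```

With the synthesizer created using audio_config=None as above, play_result(speech_synthesis_result, 'Line 1 (Virtual Audio Cable)') would replace the save_to_wav_file and load("file.wav") steps.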

Answered By: George Raveen