Azure text to speech and play it in virtual microphone using python

Question

My use case is to convert text to speech using Azure and then play it into a virtual microphone.

option 1 – with an intermediate .wav file

I tried both steps manually in a Jupyter notebook.
The problem is that the .wav file produced by Azure cannot be played directly from the same Python session. It fails with
"error: No file 'file.wav' found in working directory". After I restart the Python kernel, the audio can be played.

text-to-speech

audio_config = speechsdk.audio.AudioOutputConfig(filename="file.wav")
...
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

audio play

mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()
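
One possible explanation, which is an assumption and not something confirmed in this thread, is that file.wav is not yet complete or released when mixer.music.load runs, which would fit the fact that it plays after a kernel restart. A minimal sketch for checking this before playback, using only the standard library (the helper name wav_is_readable is made up for illustration):

```python
import wave


def wav_is_readable(path: str) -> bool:
    """Return True if `path` opens as a WAV file and contains at least one frame."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() > 0
    except (OSError, EOFError, wave.Error):
        # Missing file, locked file, empty file, or a header that does not parse.
        return False
```

If wav_is_readable("file.wav") returns False right after synthesis, dropping the reference to the synthesizer (for example with del speech_synthesizer) before loading the file might be worth trying. That step is a guess as well.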

option 2 – direct stream to audio device

I tried to configure the audio output device of the Azure SDK.
This method worked for output devices, but when I pass the ID of the virtual microphone, it doesn't play any sound.

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=False,device_name="{0.0.0.00000000}.{9D30BDBF-1418-4AFC-A709-CD4C431833E2}")

Also, it would be much better if there were another method that can direct the audio to a virtual microphone instead of the speaker.

Asked By: George Raveen

Answer 1

Create a Speech service resource and note its key and location (region).

Then set an environment variable with that key. Open a command prompt and run the command below.

setx SPEECH_KEY yourkey

In Python, import the SDK:

import azure.cognitiveservices.speech as speechsdk
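
The key stored with setx can then be read back in Python. This is only a sketch: the SPEECH_REGION variable and both function names are assumptions for illustration (the answer only mentions SPEECH_KEY), and the Azure SDK is imported lazily so the lookup can be tried without it.

```python
import os


def read_speech_settings(env=os.environ):
    """Return (key, region) from the environment, or raise if either is missing."""
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")  # assumed variable name, not from the answer
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION first")
    return key, region


def make_speech_config():
    """Build a SpeechConfig from the environment (requires the Azure Speech SDK)."""
    import azure.cognitiveservices.speech as speechsdk

    key, region = read_speech_settings()
    return speechsdk.SpeechConfig(subscription=key, region=region)
```

Note that setx only affects processes started afterwards, so the command prompt or notebook server has to be restarted before the variable is visible.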

After the conversion, use the line below to select the virtual device.

audio_config = speechsdk.audio.AudioOutputConfig(device_name="<device id>")

Get the device ID of the target speaker and put it in place of <device id>.

Answered By: TadepalliSairam

Answer 2

I found a solution: synthesize to a stream, save the stream to a file, and then play it through pygame, as follows:

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
stream = speechsdk.AudioDataStream(speech_synthesis_result)
stream.save_to_wav_file("file.wav")

mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()

It would also be much appreciated if there is another method that doesn't need an intermediate audio file.
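
One way this might work without the intermediate file is to hand the synthesized bytes to pygame as an in-memory file object. The sketch below is untested against Azure and assumes that speech_synthesis_result.audio_data holds a complete RIFF/WAV payload (which the default output format is expected to produce) and that mixer.music.load accepts a file-like object. The two function names are made up for illustration.

```python
import io
import time
import wave


def wav_bytes_to_file_object(audio_data: bytes) -> io.BytesIO:
    """Check that `audio_data` parses as WAV and return it as an in-memory file."""
    with wave.open(io.BytesIO(audio_data), "rb") as wav:
        wav.getnframes()  # raises wave.Error or EOFError if the header is invalid
    return io.BytesIO(audio_data)


def play_on_device(wav_file: io.BytesIO, device_name: str) -> None:
    """Play an in-memory WAV on the named output device (requires pygame)."""
    from pygame import mixer  # imported lazily so the helper above works without pygame

    mixer.init(devicename=device_name)
    mixer.music.load(wav_file)
    mixer.music.play()
    while mixer.music.get_busy():  # keep the buffer alive until playback ends
        time.sleep(0.1)


# Intended usage with the synthesis result from the code above:
# buffer = wav_bytes_to_file_object(speech_synthesis_result.audio_data)
# play_on_device(buffer, 'Line 1 (Virtual Audio Cable)')
```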

Answered By: George Raveen