speech-recognition

Cannot Train Wav2vec XLSR Model With Common Voice Data

Question: I am trying to train a transformer ASR model with wav2vec XLSR for the Danish language, but whenever I try to pull the Danish dataset with the datasets library, it gives me an error. Notebook link. Error log: ValueError: BuilderConfig da not found. Available: ['ab', 'ar', …

Total answers: 1

Import "speech_recognition" could not be resolved

Question: I installed the SpeechRecognition and pyttsx3 libraries (pip install SpeechRecognition, pip install pyttsx3), but when I try to import them I get two errors: Import "speech_recognition" could not be resolved and Import "pyttsx3" could not be resolved. Here's my code: import speech_recognition as sr import pyttsx3 audio …

Total answers: 1
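
The "could not be resolved" wording is typically reported by an editor's language server rather than by Python itself. One common cause (an assumption here, since the excerpt is truncated) is that the editor is using a different interpreter from the one pip installed into. A minimal sketch for checking this, run with the same interpreter the editor is configured to use; the helper name `module_available` is hypothetical:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the interpreter running this code."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or broken module specs.
        return False


if __name__ == "__main__":
    # The path printed here should match the interpreter selected in the editor.
    print("Interpreter:", sys.executable)
    for name in ("speech_recognition", "pyttsx3"):
        status = "found" if module_available(name) else "NOT found"
        print(f"{name}: {status}")
```

If a module is reported as not found, installing with `python -m pip install ...` using that same interpreter keeps the package and the editor in sync.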

FLAC conversion utility not available – consider installing the FLAC command line application on Spyder/Windows 10

Question: I’m working on speech recognition and following the example shown in this PythonCode page on Windows 10 with Spyder 5.1.5/Anaconda (Python 3.8.10). I installed SpeechRecognition and pydub with conda install -c conda-forge, and when I run the …

Total answers: 2
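
The error text suggests that no `flac` executable could be located. A minimal sketch, assuming the problem is simply that `flac` is not on the PATH seen by the Python process (the excerpt does not confirm this); the helper name `find_flac` is hypothetical:

```python
import shutil
from typing import Optional


def find_flac() -> Optional[str]:
    """Return the full path of the `flac` executable on PATH, or None if absent."""
    return shutil.which("flac")


if __name__ == "__main__":
    path = find_flac()
    if path is None:
        print("flac was not found on PATH for this Python process")
    else:
        print("flac found at:", path)
```

Running this inside Spyder shows what that environment sees, which may differ from a separate terminal.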

Python library to identify spoken numbers and characters?

Question: Is there a library that can translate spoken characters/numbers to text? Most of what I have found after googling (for example, SpeechRecognition) aims to identify words in a certain language, but I need something "dumber". It should only identify single characters/numbers and not try to …

Total answers: 1

Python Speech Recognition Change Between Microphones

Question: By running the following code I get all my available microphones: import speech_recognition as sr for index, name in enumerate(sr.Microphone.list_microphone_names()): print(f'{index}, {name}') These are all the microphones (and other devices) that I have: 0, Microsoft Sound Mapper – Input 1, Microphone (Realtek(R) Audio) 2, Stereo Mix (Realtek(R) Audio) …

Total answers: 2
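
Switching between devices comes down to choosing one of the listed indices. A minimal sketch, assuming the goal is to select a device by part of its name; the helper `find_microphone_index` is hypothetical and works on a plain list of names, so it runs without any audio hardware:

```python
from typing import Optional, Sequence


def find_microphone_index(names: Sequence[str], keyword: str) -> Optional[int]:
    """Return the index of the first device whose name contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    for index, name in enumerate(names):
        if needle in name.lower():
            return index
    return None


if __name__ == "__main__":
    # Device names copied from the question's listing; on a real machine this
    # list would come from sr.Microphone.list_microphone_names().
    names = [
        "Microsoft Sound Mapper – Input",
        "Microphone (Realtek(R) Audio)",
        "Stereo Mix (Realtek(R) Audio)",
    ]
    index = find_microphone_index(names, "stereo mix")
    print(index)
    # With SpeechRecognition installed, the chosen index would then be used as:
    #   with sr.Microphone(device_index=index) as source: ...
```

Matching by name rather than hard-coding a number helps when device indices change between machines or reboots.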

Invalid argument: Dimension -972891 must be >= 0

Question: I have created a data pipeline using tf.data for speech recognition using the following code snippets: def get_waveform_and_label(file_path): label = tf.strings.split(file_path, os.path.sep)[-2] audio_binary = tf.io.read_file(file_path) audio, _ = tf.audio.decode_wav(audio_binary) waveform = tf.squeeze(audio, axis=-1) return waveform, label def get_spectrogram(waveform): # Padding for files with less than 16000 …

Total answers: 4
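
The excerpt is cut off before the padding code, so the actual cause is not shown. One plausible cause (an assumption) is a padding length computed as 16000 minus the clip length, which goes negative for clips longer than 16000 samples. The sketch below illustrates the trim-then-pad idea with plain Python lists rather than TensorFlow tensors; `pad_or_trim` and `TARGET_LEN` are hypothetical names:

```python
from typing import List

TARGET_LEN = 16000  # sample count taken from the comment in the question's code


def pad_or_trim(waveform: List[float], target_len: int = TARGET_LEN) -> List[float]:
    """Return a copy of `waveform` with exactly `target_len` samples."""
    # Trim first, so the padding length below can never be negative.
    trimmed = waveform[:target_len]
    padding = target_len - len(trimmed)
    return trimmed + [0.0] * padding


if __name__ == "__main__":
    short_clip = [0.1] * 10
    long_clip = [0.1] * 20000
    print(len(pad_or_trim(short_clip)), len(pad_or_trim(long_clip)))
```

The same ordering (slice to the target length, then pad the remainder) carries over to tensor code.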

How to add Hotword Detection in python AI

Question: I am trying to make a Python AI using the speech_recognition module, and I want to add a hotword detection feature, so I tried to build it with the speech_recognition module, but it didn’t work. It listened once after 4 to 5 seconds and …

Total answers: 1

Transcribing mp3 to text (python) --> "RIFF id" error

Question: I am trying to convert an mp3 file to text, but my code returns the error outlined below. Any help is appreciated! This is a sample mp3 file. And below is what I have tried: import speech_recognition as sr print(sr.__version__) r = sr.Recognizer() file_audio = sr.AudioFile(r"C:\Users\Andrew\Podcast.mp3") …

Total answers: 3
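
A "RIFF id" error usually means a WAV reader was handed a file that does not begin with a RIFF header, which is the case for mp3 data. A minimal sketch for checking a file before passing it to a WAV-based reader; the helper name `looks_like_wav` is hypothetical, and it inspects only the first 12 bytes rather than validating the whole file:

```python
def looks_like_wav(path: str) -> bool:
    """Return True if the file starts with a RIFF/WAVE header."""
    with open(path, "rb") as f:
        header = f.read(12)
    # A WAV file starts with b"RIFF", a 4-byte size field, then b"WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py <audio file>
    for path in sys.argv[1:]:
        kind = "WAV" if looks_like_wav(path) else "not WAV (convert it first)"
        print(f"{path}: {kind}")
```

If the check fails for an mp3, converting the file to WAV with a separate tool before transcription is the usual route.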

Custom audio input bytes to azure cognitive speech translation service in Python

Question: I need to be able to take custom audio bytes, which I can get from any source, and translate the speech into the language I need (currently Hindi). I have been trying to pass custom audio bytes using the following code in …

Total answers: 3

How can I do real-time voice activity detection in Python?

Question: I am performing voice activity detection on a recorded audio file to detect speech vs. non-speech portions in the waveform. The output of the classifier looks like this (highlighted green regions indicate speech): The only issue I face here is making it work for …

Total answers: 5
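
For illustration only, here is a frame-by-frame detector based on a simple short-time energy threshold. This is a stand-in technique, not the classifier described in the question. It processes one frame at a time, as a real-time setup would need; the names `frame_rms` and `detect_speech` and the threshold value are assumptions:

```python
import math
from typing import Iterable, Iterator, List


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames: Iterable[List[float]], threshold: float = 0.1) -> Iterator[bool]:
    """Yield True for each frame whose RMS energy reaches `threshold`."""
    # A generator, so each decision is available as soon as its frame arrives.
    for frame in frames:
        yield frame_rms(frame) >= threshold


if __name__ == "__main__":
    silence = [0.0] * 160
    loud = [0.5] * 160
    for is_speech in detect_speech([silence, loud, silence]):
        print("speech" if is_speech else "non-speech")
```

An energy threshold is sensitive to background noise, so in practice the threshold would need tuning, or the decision function would be swapped for a trained classifier.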