ailia AI Voice is a library that performs speech synthesis using GPT-SoVITS, while ailia AI Speech is a library that performs speech recognition using Whisper.
Previously, these libraries provided bindings for C++, C#, and Flutter, and we just added Python bindings.
ailia AI Voice and ailia AI Speech have very few dependencies and run on ONNX without using PyTorch, enabling stable operation without relying on framework versions. Additionally, after prototyping in Python, you can seamlessly deploy to mobile devices like iOS or Android using bindings for Unity or Flutter.
Both modules can be installed via pip:
pip3 install ailia_voice
pip3 install ailia_speech
Using the Python bindings for ailia AI Voice and ailia AI Speech, speech synthesis and speech recognition can be achieved in just a few lines of code. The models are also downloaded automatically.
Speech synthesis with ailia AI Voice
As shown in the sample below, we download the reference_audio_girl.wav file, perform speech synthesis based on the voice in this file, and save the result.
import ailia_voice
import librosa
import soundfile
import os
import urllib.request
# Load reference audio
ref_text = "水をマレーシアから買わなくてはならない。"
ref_file_path = "reference_audio_girl.wav"
if not os.path.exists(ref_file_path):
    urllib.request.urlretrieve(
        "https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/gpt-sovits/reference_audio_captured_by_ax.wav",
        ref_file_path
    )
audio_waveform, sampling_rate = librosa.load(ref_file_path, mono=True)
# Infer
voice = ailia_voice.GPTSoVITS()
voice.initialize_model(model_path = "./models/")
voice.set_reference_audio(ref_text, ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA, audio_waveform, sampling_rate)
buf, sampling_rate = voice.synthesize_voice("こんにちは。今日はいい天気ですね。", ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA)
# Save result
soundfile.write("output.wav", buf, sampling_rate)
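The same pipeline can also synthesize English text. Below is a minimal sketch, assuming the Python binding exposes an English G2P constant AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN mirroring the Japanese one used above.
# Reuse the initialized model and reference audio from the sample above.
# AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN is assumed to be available.
buf, sampling_rate = voice.synthesize_voice(
    "Hello. It is a beautiful day today.",
    ailia_voice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_EN
)
soundfile.write("output_en.wav", buf, sampling_rate)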
Speech recognition with ailia AI Speech
As shown below, we download the demo.wav file and perform speech recognition on it. Since the return value is a generator, you can sequentially obtain recognition results even for long audio files.
import ailia_speech
import librosa
import os
import urllib.request
# Load target audio
input_file_path = "demo.wav"
if not os.path.exists(input_file_path):
    urllib.request.urlretrieve(
        "https://github.com/axinc-ai/ailia-models/raw/refs/heads/master/audio_processing/whisper/demo.wav",
        input_file_path
    )
audio_waveform, sampling_rate = librosa.load(input_file_path, mono=True)
# Infer
speech = ailia_speech.Whisper()
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL)
recognized_text = speech.transcribe(audio_waveform, sampling_rate)
for text in recognized_text:
    print(text)
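Because transcribe returns a generator, each recognized segment can be handled the moment it is produced. Here is a minimal sketch that appends segments to a transcript file as they arrive (the file name is arbitrary).
# Each segment is written out as soon as it is yielded, so partial
# transcripts are available before a long audio file finishes.
with open("transcript.txt", "w", encoding="utf-8") as f:
    for text in speech.transcribe(audio_waveform, sampling_rate):
        f.write(text + "\n")
        f.flush()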
Various parameters can be passed to the constructors. For example, if you want to use the GPU, you can configure it as shown below.
import ailia
import ailia_voice
import ailia_speech

env_id = ailia.get_gpu_environment_id()
voice = ailia_voice.GPTSoVITS(env_id = env_id)
speech = ailia_speech.Whisper(env_id = env_id)
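To see which compute environments are available on a given machine, the ailia SDK can enumerate them. A minimal sketch, assuming the get_environment_count / get_environment helpers of the ailia Python API:
import ailia

# List every available inference environment (CPU, GPU backends, ...)
for idx in range(ailia.get_environment_count()):
    env = ailia.get_environment(idx)
    print(env.id, env.name)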
If the AI model files already exist in the model_path, both speech synthesis and speech recognition operate completely offline.
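In other words, the first call to initialize_model downloads the files into model_path, and later runs reuse them without any network access. A minimal sketch inspecting the cached layout (the actual contents depend on the selected model):
import os

model_dir = "./models/"
# After the first run, the downloaded model files are cached here;
# subsequent runs load them directly and work fully offline.
if os.path.exists(model_dir):
    print(sorted(os.listdir(model_dir)))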
By passing a function as Whisper's callback, you can obtain intermediate results during speech recognition.
import ailia_speech

def f_callback(text):
    print(text)

speech = ailia_speech.Whisper(callback = f_callback)
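With the callback registered, intermediate hypotheses are delivered while transcription is running; the final segments are still returned by the generator. A minimal usage sketch, reusing the audio loaded in the recognition example above:
# f_callback prints intermediate results as they are produced;
# the loop below still receives the finalized segments.
speech.initialize_model(model_path = "./models/", model_type = ailia_speech.AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL)
for text in speech.transcribe(audio_waveform, sampling_rate):
    print(text)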