Giving Discord bots a voice with speech recognition and text to speech.
I'd like to create a "no code needed" version of this project because, as you may be able to tell, the manual setup described in this README is not exactly a cakewalk. Specifically, I'd like to create a service that handles hosting, Google Speech-to-Text, and Porcupine (wake word detection) for you. The service would have to be paid (something like $10/mo) because Google STT, hosting, and Porcupine all cost money.
I don't want to get started on this until I know there is sufficient demand for it. I've created a discussion thread for this.
If you find this project useful, a donation helps out a lot!
- VocalCord is a library, not a standalone bot. It is built on the excellent JDA and provides a dead-simple wrapper for receiving voice transcripts and generating speech audio. VocalCord is a tool for building whatever your imagination comes up with.
- Porcupine is used for wake word detection; it's incredibly accurate and works consistently well.
- Google Speech-to-Text is used for speech recognition; it's decently fast and provides accurate transcripts.
- Google Text-to-Speech is used for speech generation; it works great and is fast.
- VocalCord officially supports Windows and Linux.
- Thanks to Olical for some great examples that really helped in developing the bot.
Porcupine requires you to train a wake phrase model for every wake phrase you'd like to use. Training can take about 3 hours, so if you're eager to get started, do this right away.
- Create a Porcupine account at Picovoice Console.
- Under the "Wake Words" utility, enter your wake phrase into the "Phrase" box. I haven't had much feedback yet about how elaborate wake phrases can get, but since each training run takes about three hours, I recommend choosing crisp, unambiguous words that Porcupine is unlikely to confuse with similar-sounding words.
- For Linux, select "Linux (x86_64)". For Windows, select "Windows (x86_64)".
- Click "Train" to begin training the model. Check back in about 3 hours.
- VocalCord supports multiple wake phrases at once, or even different wake phrases for different users. Train a model for each wake phrase you'd like to use.
- Go to the Discord Developer Console and click "New application".
- On the left sidebar of the application view, select "Bot".
- Click "Add Bot".
- Click "Copy" under the token header. This is your Discord bot token; put it in a safe place (keep it secret!).
- Select the "OAuth2" tab on the left sidebar.
- Under "Scopes", make sure "bot" is checked.
- Enable any permissions your bot will use under the "Bot permissions" header. To use the speech recognition and generation features, you will need to check "Connect", "Speak", and "Use Voice Activity".
- Discord will generate a link for you. Copy this link and paste it into your browser; from there, you can select which server to add the bot to.
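The three voice permissions above correspond to bits in Discord's permission integer, which is what the generated link carries in its `permissions` query parameter. As a sketch (the bit positions follow Discord's documented permission flags; the class name and the client ID you pass in are placeholders, not part of VocalCord), the same link can be built by hand:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a bot invite URL that requests only the voice permissions. */
final class VoiceInvite {
    // Bit flags from Discord's permission documentation.
    static final long CONNECT = 1L << 20;            // "Connect"
    static final long SPEAK = 1L << 21;              // "Speak"
    static final long USE_VOICE_ACTIVITY = 1L << 25; // "Use Voice Activity"

    /** Bitwise OR of the permissions needed for speech recognition and generation. */
    static long voicePermissions() {
        return CONNECT | SPEAK | USE_VOICE_ACTIVITY;
    }

    /** Builds an OAuth2 invite URL with the "bot" scope for the given application (client) ID. */
    static String inviteUrl(String clientId) {
        return "https://discord.com/api/oauth2/authorize"
                + "?client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&permissions=" + voicePermissions()
                + "&scope=bot";
    }
}
```

For example, `VoiceInvite.inviteUrl("123")` yields `https://discord.com/api/oauth2/authorize?client_id=123&permissions=36700160&scope=bot`. Any extra permissions your bot needs would be OR-ed into the same value.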
- Navigate to Google Cloud Console.
- In the top left, select the projects drop-down and create a new project.
- Once your project is created, click the "APIs & Services" card.
- From here, select the "Dashboard" tab on the left sidebar and click "Enable APIs and Services".
- Search for and enable "Cloud Speech-to-Text API" and "Cloud Text-to-Speech API".
- On the left sidebar, select "Credentials", then under "Service Accounts", select "Manage service accounts". Give your service account a name and leave everything else at its default. Click the "Create Key" button, make sure JSON is selected, and hit "Create". This downloads a JSON file containing your credentials for the Google APIs; keep it secret! Save it somewhere you will remember.
- Add an environment variable named `GOOGLE_APPLICATION_CREDENTIALS` whose value is the path to the Google credentials JSON file you downloaded in the last step.
- On Windows, open the Start menu and search for "Edit the system environment variables". Click "Environment Variables", and under "System variables", click "New".
  - For "Variable name", enter `GOOGLE_APPLICATION_CREDENTIALS`.
  - For "Variable value", enter the path to your Google credentials JSON, for example `C:\Users\wdavi\IdeaProjects\VocalCord\vocalcord-gcs.json`. It does not matter where you put this .json file on your system, as long as the variable points to it correctly.
- On Linux, edit your `.bashrc` file by entering `nano ~/.bashrc`.
  - Add the line `export GOOGLE_APPLICATION_CREDENTIALS="path-to-google-creds.json"` to the end of the file and save. Example: `export GOOGLE_APPLICATION_CREDENTIALS="/mnt/c/Users/wdavi/IdeaProjects/VocalCord/vocalcord-gcs.json"`
  - Restart your terminal (or run `source ~/.bashrc`) for this change to take effect.
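Because a process only sees environment variables that existed when it was launched, an IDE that was already open may need a restart before your bot picks up `GOOGLE_APPLICATION_CREDENTIALS`. A minimal sketch of a startup check using only the JDK (the class and method names are hypothetical, not part of VocalCord or the Google libraries):

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical startup check for the GOOGLE_APPLICATION_CREDENTIALS variable. */
final class CredentialsCheck {
    static final String VAR = "GOOGLE_APPLICATION_CREDENTIALS";

    /** Returns null if the value points at a readable regular file, otherwise a description of the problem. */
    static String problemWith(String value) {
        if (value == null || value.isBlank()) {
            return VAR + " is not set";
        }
        Path path;
        try {
            path = Paths.get(value);
        } catch (InvalidPathException e) {
            return VAR + " is not a valid path: " + value;
        }
        if (!Files.isRegularFile(path)) {
            return VAR + " does not point to an existing file: " + path;
        }
        if (!Files.isReadable(path)) {
            return VAR + " points to a file that cannot be read: " + path;
        }
        return null;
    }

    /** Checks the variable as seen by the currently running JVM. */
    static String problemWithEnvironment() {
        return problemWith(System.getenv(VAR));
    }
}
```

Calling `CredentialsCheck.problemWithEnvironment()` before creating any Google client, and printing the result if it is non-null, tends to give a clearer message than a failed API call later on.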
The recommended IDE is IntelliJ IDEA.
- Download Java SDK 12.0.2 and extract it to `C:\Program Files\Java`. Your installation folder should be something like `C:\Program Files\Java\jdk-12.0.2`. If you're on Linux, run `sudo apt-get install openjdk-12-jdk`.
- Click `New > New Project`.
- On the left side panel, select `Gradle` and check `Java`.
- Give the project a name and hit `Finish`.
- Ensure you are using JDK 12:
  - `File > Settings > Build, Execution, Deployment > Gradle > Gradle JVM` should be set to your JDK 12.
  - `Right click project > Open Module Settings > Project > Project SDK` should be set to your JDK 12.
  - `Right click project > Open Module Settings > Project > Project language level` should be `12 - No new language features`.
  - `Right click project > Open Module Settings > Modules > Module SDK` should be set to your JDK 12.
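If you want the application itself to confirm which JDK it runs on (handy when one of the IDE settings above was missed), a small guard can read `Runtime.version()`, which is available since Java 10. This is a hypothetical helper, not part of VocalCord; accepting releases newer than 12 is an assumption, since this README only specifies JDK 12.

```java
/** Hypothetical guard: reports whether the running JVM meets the JDK version the project targets. */
final class JdkCheck {
    // Assumption: any feature release >= 12 is acceptable.
    static final int REQUIRED_FEATURE_VERSION = 12;

    /** True if the given feature release (e.g. 12 for JDK 12.0.2) meets the requirement. */
    static boolean isSupported(int featureVersion) {
        return featureVersion >= REQUIRED_FEATURE_VERSION;
    }

    /** Feature release number of the JVM executing this code. */
    static int currentFeatureVersion() {
        return Runtime.version().feature();
    }
}
```

A startup call such as `JdkCheck.isSupported(JdkCheck.currentFeatureVersion())` can then fail fast with a clear message instead of an obscure class-version error.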
- Edit your `build.gradle` file to install VocalCord:

```gradle
repositories {
    mavenCentral()
    maven { url 'https://jitpack.io' }
    jcenter()
}

dependencies {
    implementation 'net.dv8tion:JDA:4.1.1_136'
    implementation 'com.google.cloud:google-cloud-speech:1.22.6'
    implementation 'com.google.cloud:google-cloud-texttospeech:1.0.2'
    implementation 'com.github.wdavies973:VocalCord:2.3'
}
```
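After a Gradle refresh, a quick way to confirm the dependencies actually reached the runtime classpath is to try loading one class from each. This is a hypothetical helper, not part of VocalCord; the class names are well-known entry points of JDA 4 and the Google Cloud v1 clients, but treat them as assumptions to double-check against the versions you pin.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sanity check that the Gradle dependencies are on the runtime classpath. */
final class ClasspathCheck {
    // One representative class per dependency (assumed names; verify against your pinned versions).
    static final List<String> EXPECTED = List.of(
            "net.dv8tion.jda.api.JDABuilder",                      // JDA
            "com.google.cloud.speech.v1.SpeechClient",             // google-cloud-speech
            "com.google.cloud.texttospeech.v1.TextToSpeechClient"  // google-cloud-texttospeech
    );

    /** Returns the class names that cannot be loaded. */
    static List<String> missing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check presence, don't run static initializers.
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Printing `ClasspathCheck.missing(ClasspathCheck.EXPECTED)` at startup should show an empty list when Gradle resolved everything.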
VocalCord uses Porcupine for wake detection; however, Porcupine does not support Java. Instead, VocalCord uses the Java Native Interface (JNI) to wrap the Porcupine C library in Java bindings. You will need to obtain the Porcupine dynamic library as well as the VocalCord wrapper dynamic library. VocalCord loads the wrapper library, which in turn loads the Porcupine dynamic library.
On Linux:
- In your root project directory, create a folder called "native", and within it create a subdirectory called "linux".
- Download libjni_porcupine.so
- Download libpv_porcupine.so
- Download porcupine_params.pv
- Move `libjni_porcupine.so` and `libpv_porcupine.so` into `native/linux`.
- Move `porcupine_params.pv` into `native`.
- Your `native` directory should now contain `porcupine_params.pv` and a `linux` subdirectory holding both libraries.
On Windows:
- In your root project directory, create a folder called "native", and within it create a subdirectory called "windows".
- Download libjni_porcupine.dll
- Download libpv_porcupine.dll
- Download porcupine_params.pv
- Move `libjni_porcupine.dll` and `libpv_porcupine.dll` into `native/windows`.
- Move `porcupine_params.pv` into `native`.
- Your `native` directory should now contain `porcupine_params.pv` and a `windows` subdirectory holding both libraries.
Once Porcupine's wake phrase training is done, also move your `wake_phrase.ppn` file into `native/`.
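Before starting the bot, it can help to verify the layout described above, since a misplaced file typically surfaces as a native loading error. This is a hypothetical pre-flight check, not VocalCord's own loader; it assumes the Windows libraries live in `native/windows` and that any `.ppn` model sits directly in `native/`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check of the native/ directory layout. */
final class NativeLayoutCheck {

    /** Returns the expected entries that are missing under projectRoot/native. */
    static List<String> missingFiles(Path projectRoot, boolean windows) throws IOException {
        Path nativeDir = projectRoot.resolve("native");
        // Assumption: Windows libraries live in native/windows, Linux ones in native/linux.
        String platformDir = windows ? "windows" : "linux";
        String ext = windows ? ".dll" : ".so";

        List<String> expected = List.of(
                "porcupine_params.pv",
                platformDir + "/libjni_porcupine" + ext,
                platformDir + "/libpv_porcupine" + ext);

        List<String> missing = new ArrayList<>();
        for (String relative : expected) {
            if (!Files.isRegularFile(nativeDir.resolve(relative))) {
                missing.add(relative);
            }
        }
        // At least one trained wake phrase model (*.ppn) should sit directly in native/.
        if (!Files.isDirectory(nativeDir) || !hasWakePhraseModel(nativeDir)) {
            missing.add("*.ppn");
        }
        return missing;
    }

    private static boolean hasWakePhraseModel(Path dir) throws IOException {
        try (DirectoryStream<Path> models = Files.newDirectoryStream(dir, "*.ppn")) {
            return models.iterator().hasNext();
        }
    }
}
```

For example, `NativeLayoutCheck.missingFiles(Paths.get("."), System.getProperty("os.name").startsWith("Windows"))` returns an empty list when everything is in place.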
You are now ready to configure your application and begin hacking.
You can find a basic example here.
You can read up on most configuration options in the VocalCord docs.
JDA allows only one `AudioSendHandler` at a time. This is a problem if you want to use both TTS and a music bot. To address it, VocalCord implements an audio send multiplexer, which mixes the audio between your music send handler and VocalCord's internal TTS send handler. There are currently two send multiplex modes: `Switch`, which pauses your music while TTS is playing, and `Blend`, which lowers the volume of your music while TTS is playing. `Blend` is not implemented yet.
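The `Switch` mode can be pictured as a send handler that always prefers TTS frames. The sketch below is not VocalCord's implementation: `FrameSource` is a hypothetical stand-in for JDA's `AudioSendHandler` (which exposes `canProvide()` and `provide20MsAudio()`) so the example compiles without JDA, and it assumes the music source only advances when it is polled, so skipping it pauses playback.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for JDA's AudioSendHandler (canProvide / provide20MsAudio). */
interface FrameSource {
    boolean canProvide();

    ByteBuffer provide20MsAudio();
}

/** Hypothetical "Switch" multiplexer: TTS frames win, music resumes once TTS is silent. */
final class SwitchMultiplexer implements FrameSource {
    private final FrameSource tts;
    private final FrameSource music;

    SwitchMultiplexer(FrameSource tts, FrameSource music) {
        this.tts = tts;
        this.music = music;
    }

    @Override
    public boolean canProvide() {
        return tts.canProvide() || music.canProvide();
    }

    @Override
    public ByteBuffer provide20MsAudio() {
        // While TTS has audio, music is not polled at all (assumed to pause it).
        if (tts.canProvide()) {
            return tts.provide20MsAudio();
        }
        return music.canProvide() ? music.provide20MsAudio() : null;
    }
}
```

Switching at frame granularity keeps both sources untouched; a `Blend` mode would instead have to decode and mix the two PCM streams, which is presumably why it is the harder of the two to implement.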
Upcoming features:
- `Blend` multiplexing mode
- Option to use offline Picovoice Cheetah speech recognition for faster voice recognition
- Continuation phrases so the bot can carry on an ongoing conversation
- Improvements to command chain
If you need help or have any suggestions, feel free to contact me at [email protected]