VAX: the genuine DECtalk speech engine on Emy

Now you can add texts, lyrics, poetry, or Kraftwerk-style “Sprechgesang” to your Eurorack setup!

Plug the Text To Speech click from MikroElektronika into your Emy, insert the SD card with the VAX firmware, and enjoy the DECtalk speech engine.

Text To Speech click is a mikroBUS™ add-on board that carries an Epson S1V30120 speech synthesis IC. The IC runs the Fonix DECtalk® v5 speech synthesis engine, which speaks US English, Castilian Spanish, or Latin American Spanish in any of nine predefined voices.

What is DECtalk

DECtalk is a speech synthesizer and text-to-speech technology developed by Digital Equipment Corporation in 1984, based largely on the work of Dennis Klatt at MIT. The DECtalk Express connected to a computer's serial port and simply spoke whatever text was “printed” to it.

The synthesizer processes text and produces speech in nine different voices. Its parser gives users fine control over the quality, pitch, and intonation of the synthesized speech. DECtalk can also be programmed at the phoneme level and even made to sing with quite realistic expression, as in this “Happy Birthday” example:

[hxae<300,10>piy<300,10> brr<600,12>th<100>dey<600,10> tuw<600,15> yu<1200,14>_<120>]
[hxae<300,10>piy<300,10> brr<600,12>th<100>dey<600,10> tuw<600,17> yu<1200,15>_<120>]
[hxae<300,10>piy<300,10> brr<600,22>th<100>dey<600,19>dih<600,15>rdeh<600,14>ktao<600,12>k_<120>_<120>]
[hxae<300,20>piy<300,20> brr<600,19>th<100>dey<600,15> tuw<600,17> yu<1200,15>]

The command syntax for coding musical sequences is:

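[phoneme<duration,pitch number>]

where the duration is given in milliseconds and the pitch number selects a note in semitone steps (in the first line of the song, going from 10 to 12 rises by a whole tone). A phoneme with a single value, such as th<100> or the silence _<120>, takes a duration only.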
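The durations in the song above appear to fit a tempo of 100 BPM (300 ms eighth notes, 600 ms quarter notes, 1200 ms half notes). The minimal Python sketch below, assuming durations in milliseconds and pitch numbers in semitone steps, derives note lengths from a tempo and assembles the bracketed string. `beats_to_ms` and `phrase` are hypothetical helpers for illustration, not part of the VAX firmware or of any DECtalk API. As a check, it rebuilds the first line of the song.

```python
# Minimal sketch that assembles a DECtalk-style sung phrase.
# Assumptions: durations are milliseconds, pitch numbers are semitone steps,
# and beats_to_ms / phrase are hypothetical helpers, not a VAX or DECtalk API.


def beats_to_ms(beats: float, bpm: float) -> int:
    """Length of `beats` quarter-note beats at `bpm`, rounded to whole milliseconds."""
    return round(beats * 60_000 / bpm)


def phrase(words) -> str:
    """Each word is a list of (phoneme, ms, pitch) tuples; pitch=None emits <ms> only."""
    rendered = []
    for word in words:
        rendered.append("".join(
            f"{ph}<{ms}>" if pitch is None else f"{ph}<{ms},{pitch}>"
            for ph, ms, pitch in word
        ))
    # Phonemes of one word are written together; words are separated by spaces.
    return "[" + " ".join(rendered) + "]"


bpm = 100
e, q, h = (beats_to_ms(b, bpm) for b in (0.5, 1, 2))  # eighth, quarter, half note
line1 = phrase([
    [("hxae", e, 10), ("piy", e, 10)],
    [("brr", q, 12), ("th", 100, None), ("dey", q, 10)],
    [("tuw", q, 15)],
    [("yu", h, 14), ("_", 120, None)],
])
print(line1)
```

Because the pitch numbers are semitone steps, adding the same constant to every pitch transposes the whole melody, and changing `bpm` rescales every note at once.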

Timing

Latency
There is a latency of 200 ms between the trigger and the start of the speech. Because this latency is very consistent, the speech stays in tempo even if it is not exactly on the beat. The firmware uses the falling edge of the gate to stop the speech and prepare the chip for the next utterance, so stutter-like phrases sequenced in a loop still fire in sync with the tempo.
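Since the 200 ms delay is fixed, a sequencer can fire the trigger early when the speech should start exactly on the beat. The Python sketch below converts that delay into beats and clock ticks. The 24 PPQN clock resolution and the function names are assumptions for illustration, not VAX settings. At 120 BPM a beat lasts 500 ms, so the trigger has to lead by 0.4 beat, which rounds to 10 ticks.

```python
# Minimal sketch: how early to fire a VAX trigger so that speech starts on the beat.
# Assumptions: a fixed 200 ms latency (as stated above) and a 24 PPQN clock.

LATENCY_MS = 200.0  # trigger-to-speech delay stated for VAX


def beat_ms(bpm: float) -> float:
    """Duration of one quarter-note beat in milliseconds."""
    return 60_000.0 / bpm


def lead_beats(bpm: float, latency_ms: float = LATENCY_MS) -> float:
    """How far ahead of the beat, in beats, the trigger must fire."""
    return latency_ms / beat_ms(bpm)


def lead_ticks(bpm: float, ppqn: int = 24, latency_ms: float = LATENCY_MS) -> int:
    """The same lead, rounded to the nearest tick of a `ppqn` clock."""
    return round(lead_beats(bpm, latency_ms) * ppqn)


for bpm in (90, 120, 150):
    print(f"{bpm} BPM: lead {lead_beats(bpm):.2f} beat, {lead_ticks(bpm)} ticks")
```

Rounding to whole ticks leaves a small residual: at 120 BPM the exact lead is 9.6 ticks, so a 10-tick lead starts the speech about 8 ms early.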

Real-time
The voice parameters are read just before the speech is triggered and have no effect while the chip is speaking; any change applies to the next utterance. Experiment with the knobs to find the effect you want.
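The gate behavior described in this section can be summarized as a tiny state model: knob positions are latched on the rising gate and the falling gate stops the speech. The Python sketch below is a hypothetical illustration of that behavior, not the VAX firmware; the class name and the parameter name `"pitch"` are made up for the example.

```python
# Hypothetical model of the gate/parameter behavior described above.
# This is NOT the VAX firmware; the names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class VaxVoiceModel:
    knobs: dict = field(default_factory=dict)    # live knob positions
    latched: dict = field(default_factory=dict)  # parameters of the current utterance
    speaking: bool = False

    def turn_knob(self, name: str, value: float) -> None:
        # Knob moves are only stored; they reach the chip at the next trigger.
        self.knobs[name] = value

    def gate_rising(self) -> None:
        # Parameters are applied just before the speech is triggered.
        self.latched = dict(self.knobs)
        self.speaking = True

    def gate_falling(self) -> None:
        # The falling gate stops the speech, readying the chip for the next utterance.
        self.speaking = False


v = VaxVoiceModel()
v.turn_knob("pitch", 100)
v.gate_rising()
v.turn_knob("pitch", 140)   # ignored by the utterance in progress
print(v.latched["pitch"])   # 100
v.gate_falling()
v.gate_rising()             # the new knob value is picked up here
print(v.latched["pitch"])   # 140
```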