Between 2008 and 2013, I began to suffer from carpal tunnel syndrome. During that same time, I realized how much I love programming. When I saw Tavis Rudd’s Pycon talk, I thought to myself, “That’s awesome,” then, “I wonder if I can make that work.” I have, and I’ve learned a lot in the process. In this series of articles, I’m going to lay out what I’ve learned about voice programming, the software that enables it, and the growing community which uses and maintains that software.
Speech Recognition Engines
The first part of programming by voice is getting a computer to recognize what you’re saying in human readable words. For this, you need a speech recognition engine. There are a lot of voice-recognition projects for Linux. Windows Speech Recognition is decent. Google’s speech recognition is also coming along. But if you intend to program by voice, there’s really only one show in town.
Dragon NaturallySpeaking
Dragon NaturallySpeaking, by Nuance, is a Windows program which turns spoken audio into text. It uses the same technology which powers Apple’s Siri, and it’s available in quite a few languages, such as German, Spanish, Japanese, and of course, English. Dragon is king not because of its fancy features (posting to Twitter/Facebook, controlling Microsoft software, etc.), but simply because it’s a lot more accurate than the competition.
At this point in the article, I’d love to tell you all about how to use Dragon, and what it can do, but Nuance has already done an excellent job of providing instructional materials and videos, so I’ll simply link you to their stuff: Dragon PDF files, Dragon instructional YouTube channel.
Improving Performance
Although Dragon’s recognition is amazing out-of-the-box, there are a few things you can do to improve the speed and accuracy of recognition even more.
- Upgrade your hardware.
Nuance recommends 8 GB of RAM, and users on the Knowbrainer forums have reported significant performance gains with better microprocessors and soundcards. Also, although Dragon supports many kinds of microphones (including your iPhone/Android!), recognition is generally better with a powered USB microphone. - Use optimal profile settings.
When creating a profile, click on “Advanced” and choose a Medium size vocabulary and BestMatch IV. BestMatch V has performance issues with Natlink. - Train your Dragon.
Select “Audio” -> “Read text to improve accuracy” and do some readings say Dragon can adjust to your voice. Although Dragon 13 eliminated this step, as mentioned in the previous article, Dragon 12 is best for programming. In my experience, one or two readings are all Dragon needs and it’s diminishing returns after that. - Speak in short phrases.
Whole sentence dictation is better for natural language sentences (because Dragon is able to deduce from context which words you said), but spoken command chains (for which Dragon’s context is useless) require perfection and can’t be corrected with Dragon’s Select-and-Say.
There is no Programmer Edition
As you can see, Dragon can probably transcribe faster than you can type, and it’s useful for common office tasks and Internet browsing. That said, it isn’t intended for dictating code. In order to dictate the following JavaScript snippet, you would have to say,
“spell F O R space open paren I N T space I equals zero semicolon”
just to get to the first semicolon:
for (int i=0; i<5; i++){
console.log(i);
}
Obviously unworkable. You’d be better off trying to type it out with your feet. Dragon’s technology is geared toward recognizing spoken language syntax, not programming language syntax. (It is possible to create custom commands, including text commands, with Dragon’s Advanced Scripting if you buy the professional version, but the Advanced Scripting language is cumbersome, limited, and really not all that advanced.)
If it’s so terrible at dictating code, why am I recommending* it wholeheartedly?
Natlink
In 1999, Joel Gould created Natlink, an extension for Dragon which allows a user to create custom voice commands using Python rather than Advanced Scripting/ Visual Basic. This was huge. It meant that Dragon’s super-accurate word recognition could be chained to the expressive power of the entire Python language!
Gould went on to document and open-source his work. Natlink has been hosted at SourceForge since 2003 and has been maintained for some time by Quintijn Hoogenboom and a few others as new versions of Dragon have been released. At present, it gets about 300 downloads per month, and for the last few years, the number of downloads it’s gotten annually has increased by about 1000 per year. It’s used by a number of other popular voice software packages, including VoiceCoder, Vocola, Unimacro, and Dragonfly.
With Natlink, instead of spelling out almost every single character manually, you might make a macro which handles the creation of for-loops, and another which handles printing to the console. Then, in order to speak the above JavaScript snippet into existence, you would only say
for loop five, log I
and be finished with the whole thing. Quite an improvement, isn’t it?
Getting Started – Not Yet
In order to install Natlink, follow the instructions here. Then you can read the documentation here and here, and look at the sample macros here. But don’t bother with the Natlink docs and examples, at least not yet. Most likely, whatever you’re hoping to achieve can be done more cleanly and easily with one of the other software packages mentioned above, VoiceCoder, Vocola, Unimacro or Dragonfly. In the next article, I’ll talk about each of these, and some interesting projects in the Dragonfly family.
* Due to app compatibility issues with Dragon 13, I decided to downgrade back to version 12.5, and so it is version 12.5 that I recommend, not version 13.