Last time, I talked about why Dragon NaturallySpeaking (version 12.5) is currently the best choice of speech engine for voice programming, and how Natlink ties Dragon to Python. Natlink is a great tool by itself, but it also enables lots of other great voice software packages, some of which are explicitly geared toward voice programming, and all of which also offer accessibility and productivity features. In this article, I’m going to go over the purposes and capabilities of a number of these, their dependencies, and how to get started with them.

I’m also going to rate each piece of software with a star rating out of five. The rating indicates the software’s voice programming potential (“VPP”), not the overall quality of the software.

The Natlink Family

Unimacro, Vocola*, and VoiceCoder all require Natlink and are developed in close proximity with it, so I’ll start with them.

Unimacro (documentation / download) [VPP: ]

Unimacro is an extension for Natlink which mainly focuses on global accessibility and productivity tasks, like navigating the desktop and web browser. It’s very configurable, requires minimal Python familiarity, and as of release 4.1mike, it has built-in support for triggering AutoHotkey scripts by voice.

Unimacro is geared toward accessibility, not voice programming. Still, it’s useful in a general sense and has potential if you’re an AutoHotkey master.

Vocola (documentation | download) [VPP: ]

Vocola is something of a sister project to Unimacro; their developers work together to keep them compatible. Like Unimacro, Vocola extends Dragon’s command set. Unlike Unimacro, which is modified through the use of .ini files, Vocola allows users to write their own commands. Vocola commands have their own syntax and look like the following.
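For instance, a few commands in a .vcl file might look like this (these particular commands are my own illustration, not from the official samples):

```
# Press Ctrl+S when "save file" is spoken
Save File = {Ctrl+s};

# "Line Down 3" presses the Down arrow three times ($1 holds the spoken number)
Line Down 1..20 = {Down_$1};

# Insert literal text
Insert Greeting = "Hello, world!";
```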

When you start Dragon, Vocola .vcl files are converted to Python and loaded into Natlink. Vocola can also use “extensions”, Python modules full of functions which can be called from Vocola. These extensions are where Vocola’s real power lies. They allow you to map any piece of Python code to any spoken command.
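As a sketch of what an extension might look like (the module and function names here are hypothetical, and you should check the Vocola extensions documentation for the exact file-naming and header conventions):

```python
# Hypothetical Vocola extension module. Vocola 2 discovers extension
# functions via special "Vocola function:" comment headers like the one
# below, which exposes a plain Python function to voice commands.

# Vocola function: Date.Today
def today():
    """Return today's date as text, for insertion by a voice command."""
    import datetime
    return datetime.date.today().strftime("%Y-%m-%d")
```

A command in a .vcl file could then call it with something like `Insert Date = Date.Today();`.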

Vocola’s syntax is easy to learn, it’s well documented, and it gives you full access to Python, so it’s a powerful step up from Natlink. However, calling into Python from Vocola isn’t quite as seamless as it is from Dragonfly, and Vocola’s community became somewhat displaced when the SpeechComputing.com forum site went down. (Though it has regrouped some at the Knowbrainer.com forums.)

VoiceCoder (download) [VPP: ]

Also sometimes called VoiceCode (and in no way related to voicecode.io), VoiceCoder’s complex history is best documented by Christine Masuoka’s academic paper. VoiceCoder aims to make speaking code as fluid as possible, and its design is very impressive. (Demo video – audio starts late.)

For a project that has been around so long, it still has a fairly active community. Mark Lillibridge recently created this FAQ on programming by voice with the help of that community.

VoiceCoder does what it does extremely well, but that’s pretty much limited to coding in C/C++ or Python in Emacs. Furthermore, since its website and wiki are down, documentation is sparse.

Dragonfly (documentation | download | setup) [VPP: ]

In 2008, Christo Butcher created Dragonfly, a Python framework for speech recognition. In 2014, he moved it to Github, which has accelerated its development. The real strength of Dragonfly is its modernization of the voice command programming process. Dragonfly makes it easy for anyone with nominal Python experience to get started on adding voice commands to DNS or WSR. Install it, copy some of the examples into the /MacroSystem/ folder, and you’re ready to go. Here’s an example.
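A minimal grammar module in that vein might look like the following (the specific mapping is my own illustration; a file like this only actually runs inside Dragon/WSR once Natlink or the WSR engine back-end loads it):

```python
from dragonfly import Grammar, MappingRule, Key, Text

class ExampleRule(MappingRule):
    # Spoken phrase -> action. Key presses use Dragonfly's key syntax
    # ("c-s" is Ctrl+S); Text types out literal characters.
    mapping = {
        "save file": Key("c-s"),
        "new tab": Key("c-t"),
        # Type print('') and move the caret between the quotes.
        "print statement": Text("print('')") + Key("left:2"),
    }

grammar = Grammar("example commands")
grammar.add_rule(ExampleRule())
grammar.load()

def unload():
    # Natlink calls unload() when the module is reloaded or Dragon exits.
    global grammar
    if grammar:
        grammar.unload()
    grammar = None
```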

It’s pure Python, and unlike vanilla Natlink, its API is clean and simple. It’s well documented, and it has a growing community on Github. Furthermore, the _multiedit.py module enables fine-grained control of CCR. (CCR is “continuous command recognition”: the ability to string spoken commands together and have them executed one after another, rather than pausing between commands. Vocola and VoiceCoder can also use CCR, but not with as much flexibility.)

The Dragonfly Family

Due to Dragonfly’s ease of use, a large number of Dragonfly grammar repositories have appeared on Github recently, as well as some larger projects.

Aenea (Github) [VPP: ]

Aenea is software which allows you to use Dragonfly grammars on Linux or Mac via a virtual machine (or on any remote computer, for that matter). Getting it set up is fairly complicated, but it’s worth it if your workspace of choice (or necessity) is not a Windows machine. Aenea is essentially a fully realized version of what Tavis Rudd demonstrated at Pycon.

Damselfly (Github) [VPP: ]

Damselfly lets you use Dragonfly with Dragon NaturallySpeaking running under Wine on Linux, cutting Windows out of the picture entirely.

Caster (download | Github / documentation) [VPP: ]

Like VoiceCoder, Caster aims to be a general purpose programming tool. Unlike VoiceCoder, Caster aims to work in any programming environment, not just Emacs. It improves upon _multiedit.py by letting you selectively enable/disable CCR subsections. (“Enable Java”, “Disable XML”, etc.) It allows you to run Sikuli scripts by voice. And, like VoiceCoder, it uses Veldicott-style fuzzy string matching to allow dictation of strange/unspeakable symbol names. It is still in active development and contributions or suggestions are welcome.

Some Final Thoughts on Open Source Voice Programming

The open source voice programming community has come a long way since its days of obscure academic papers and mailing lists. Community spaces have developed and borne fruit. The projects listed above are downright awesome, all of them, and I’m sure more are coming. There is still much to be achieved, even much low-hanging fruit to be plucked. All this has happened in fifteen years. I’m looking forward to the next fifteen.

 

* To be more precise, it is Vocola 2 that requires Natlink. There are several versions of Vocola; the most recent, Vocola 3, works with Windows Speech Recognition and does not require Natlink.

Between 2008 and 2013, I began to suffer from carpal tunnel syndrome. During that same time, I realized how much I love programming. When I saw Tavis Rudd’s Pycon talk, I thought to myself, “That’s awesome,” then, “I wonder if I can make that work.” I have, and I’ve learned a lot in the process. In this series of articles, I’m going to lay out what I’ve learned about voice programming, the software that enables it, and the growing community which uses and maintains that software.

Speech Recognition Engines

The first part of programming by voice is getting a computer to recognize what you’re saying and turn it into human-readable words. For this, you need a speech recognition engine. There are a number of voice recognition projects for Linux, Windows Speech Recognition is decent, and Google’s speech recognition is also coming along. But if you intend to program by voice, there’s really only one game in town.

Dragon NaturallySpeaking

Dragon NaturallySpeaking, by Nuance, is a Windows program which turns spoken audio into text. It uses the same technology which powers Apple’s Siri, and it’s available in quite a few languages, such as German, Spanish, Japanese, and of course, English. Dragon is king not because of its fancy features (posting to Twitter/Facebook, controlling Microsoft software, etc.), but simply because it’s a lot more accurate than the competition.

At this point in the article, I’d love to tell you all about how to use Dragon, and what it can do, but Nuance has already done an excellent job of providing instructional materials and videos, so I’ll simply link you to their stuff: Dragon PDF files, Dragon instructional YouTube channel.

Improving Performance

Although Dragon’s recognition is amazing out-of-the-box, there are a few things you can do to improve the speed and accuracy of recognition even more.

  1. Upgrade your hardware.
    Nuance recommends 8 GB of RAM, and users on the Knowbrainer forums have reported significant performance gains with better microprocessors and soundcards. Also, although Dragon supports many kinds of microphones (including your iPhone/Android!), recognition is generally better with a powered USB microphone.
  2. Use optimal profile settings.
    When creating a profile, click on “Advanced” and choose a Medium size vocabulary and BestMatch IV. BestMatch V has performance issues with Natlink.
  3. Train your Dragon.
    Select “Audio” -> “Read text to improve accuracy” and do some readings so Dragon can adjust to your voice. Although Dragon 13 eliminated this step, as mentioned in the previous article, Dragon 12 is best for programming. In my experience, one or two readings are all Dragon needs; after that, it’s diminishing returns.
  4. Speak in short phrases.
    Whole sentence dictation is better for natural language sentences (because Dragon is able to deduce from context which words you said), but spoken command chains (for which Dragon’s context is useless) require perfection and can’t be corrected with Dragon’s Select-and-Say.

There is no Programmer Edition

As you can see, Dragon can probably transcribe faster than you can type, and it’s useful for common office tasks and Internet browsing. That said, it isn’t intended for dictating code. In order to dictate the following JavaScript snippet, you would have to say,

“spell F O R space open paren I N T space I equals zero semicolon”

just to get to the first semicolon:
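The snippet itself appeared as an image in the original post and is lost here; a plausible reconstruction, going by the spelled-out dictation (which spells a C-style `int`, where valid JavaScript would use `var`), is something like:

```javascript
// Reconstructed example loop: count to five and log each number.
for (var i = 0; i < 5; i++) {
    console.log(i);
}
```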

Obviously unworkable. You’d be better off trying to type it out with your feet. Dragon’s technology is geared toward recognizing spoken language syntax, not programming language syntax. (It is possible to create custom commands, including text commands, with Dragon’s Advanced Scripting if you buy the professional version, but the Advanced Scripting language is cumbersome, limited, and really not all that advanced.)

If it’s so terrible at dictating code, why am I recommending* it wholeheartedly?

Natlink

In 1999, Joel Gould created Natlink, an extension for Dragon which allows a user to create custom voice commands using Python rather than Advanced Scripting / Visual Basic. This was huge. It meant that Dragon’s super-accurate word recognition could be chained to the expressive power of the entire Python language!

Gould went on to document and open-source his work. Natlink has been hosted at SourceForge since 2003 and has been maintained for some time by Quintijn Hoogenboom and a few others as new versions of Dragon have been released. At present, it gets about 300 downloads per month, and for the last few years, the number of downloads it’s gotten annually has increased by about 1000 per year. It’s used by a number of other popular voice software packages, including VoiceCoder, Vocola, Unimacro, and Dragonfly.

With Natlink, instead of spelling out almost every single character manually, you might make a macro which handles the creation of for-loops, and another which handles printing to the console. Then, in order to speak the above JavaScript snippet into existence, you would only say

“for loop five, log I”

and be finished with the whole thing. Quite an improvement, isn’t it?
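To make that concrete, here is a sketch of what such a macro could look like as a Natlink grammar module (the grammar spec, names, and emitted text are illustrative; see the natlinkutils module that ships with Natlink for the real conventions):

```python
import natlink
from natlinkutils import GrammarBase

class ForLoopGrammar(GrammarBase):
    # Natlink grammar spec: "for loop" followed by a number word.
    gramSpec = """
        <forLoop> exported = for loop (one | two | three | four | five);
    """

    def initialize(self):
        self.load(self.gramSpec)
        self.activateAll()

    def gotResults_forLoop(self, words, fullResults):
        # gotResults_<rule> is called with the recognized words,
        # e.g. ["for", "loop", "five"].
        numbers = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
        n = numbers[words[-1]]
        # Type out the loop skeleton into the active window.
        natlink.playString("for (var i = 0; i < %d; i++) {}" % n)

forLoopGrammar = ForLoopGrammar()
forLoopGrammar.initialize()

def unload():
    global forLoopGrammar
    if forLoopGrammar:
        forLoopGrammar.unload()
    forLoopGrammar = None
```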

Getting Started – Not Yet

In order to install Natlink, follow the instructions here. Then you can read the documentation here and here, and look at the sample macros here. But don’t bother with the Natlink docs and examples, at least not yet. Most likely, whatever you’re hoping to achieve can be done more cleanly and easily with one of the other software packages mentioned above, VoiceCoder, Vocola, Unimacro or Dragonfly. In the next article, I’ll talk about each of these, and some interesting projects in the Dragonfly family.

 

Due to app compatibility issues with Dragon 13, I decided to downgrade back to version 12.5, and so it is version 12.5 that I recommend, not version 13.