In the previous articles in this series, I covered the essentials of getting started and getting involved with voice programming, and some best practices. This time, I’m going to talk about an issue that comes up as you begin to create more complex command sets and grammars: grammar complexity and the dreaded BadGrammar error.

First, a little background. BadGrammar is an error that occurs in voice software that plugs into Dragon NaturallySpeaking, including Natlink, Vocola, and Dragonfly. For a long time, I thought it was caused by adding a sufficiently high number of commands to a grammar. Then the subject came up in the Caster chat on GitHub, and I decided to run some tests.

A Bit of Vocabulary

Before I describe the tests, I need to make sure we’re speaking the same language. For the purposes of this article, a command is a pairing of a spec and an action. A spec is the set of spoken words used to trigger an action. The action, then, is some programmed behavior, such as automatic keypresses or the execution of Python code. Finally, a grammar is an object composed of one or more commands which is passed to the speech engine.

The Tests

The first test I did was to create a series of grammars with increasingly large sets of commands. I used a random word generator and a text formatter to create simple one-word commands which did nothing but print their specs to the screen. All tests were done with Dragonfly, so the commands looked like the following.
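A sketch of what those generated command files looked like (the words here are stand-ins for the randomly generated ones; `Function` is Dragonfly’s action for calling Python code, and this only runs inside a Natlink/Dragon session):

```python
from dragonfly import Grammar, MappingRule, Function

class GeneratedRule(MappingRule):
    # Each one-word spec does nothing but print itself when spoken.
    mapping = {
        "barnacle": Function(lambda: print("barnacle")),
        "twilight": Function(lambda: print("twilight")),
        "obelisk":  Function(lambda: print("obelisk")),
        # ... hundreds or thousands more generated entries ...
    }

grammar = Grammar("generated commands")
grammar.add_rule(GeneratedRule())
grammar.load()
```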

After creating a (slow but usable) grammar with 3000 commands in it, my sufficiently-high-number-of-commands theory was shot. (Previously, I had gotten BadGrammar with about 500 commands.) Instead, it had to be, as Mark Lillibridge had speculated, complexity. So then, how much complexity was allowed before BadGrammar?

Fortunately, Dragonfly has a tool which measures the complexity of grammars. It returns its results in elements, which for our purposes here can be summarized as units of grammar complexity. There are many ways to increase the number of elements of a grammar, but the basic idea is, the more combinatorial possibility you introduce into your commands, the more elements there are (which should surprise no one). For example, the following rule with one Dragonfly Choice extra creates more elements than the above example, and adding either more choices (key/value pairs) to the Choice object or more Choice objects to the spec would create more still.
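For instance, a rule along these lines (the spec and the choice values are invented for illustration) multiplies the number of possible utterances, and therefore elements:

```python
from dragonfly import MappingRule, Choice, Function

def announce(color):
    # Dragonfly passes the matched extra to same-named parameters.
    print(color)

class ColorRule(MappingRule):
    # Matches "paint red", "paint green", or "paint blue".
    mapping = {"paint <color>": Function(announce)}
    extras = [
        Choice("color", {"red": "red", "green": "green", "blue": "blue"}),
    ]
```

Adding more key/value pairs to the `Choice`, or more `Choice` objects to the spec, grows the element count accordingly.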

CCR grammars create exceptionally large numbers of elements because every command in a CCR grammar can be followed by itself and any other command in the grammar, up to the user-defined maximum number of times. The default maximum number of CCR repetitions (meaning the default number of commands you can speak in succession) in Dragonfly is 16.
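In Dragonfly terms, a CCR grammar is typically built by wrapping a rule in a Repetition element, roughly like this (a simplified sketch of the pattern from Dragonfly’s multiedit example; command names invented):

```python
from dragonfly import CompoundRule, MappingRule, RuleRef, Repetition, Key

class CommandRule(MappingRule):
    exported = False  # only spoken as part of the sequence below
    mapping = {
        "copy that": Key("c-c"),
        "paste that": Key("c-v"),
    }

class CCRRule(CompoundRule):
    # Any of the commands above, spoken up to 16 times in a row.
    spec = "<sequence>"
    extras = [Repetition(RuleRef(CommandRule()), min=1, max=16, name="sequence")]

    def _process_recognition(self, node, extras):
        # Execute each recognized command's action in spoken order.
        for action in extras["sequence"]:
            action.execute()
```

That `max=16` is the knob referred to above: every added repetition multiplies the combinatorial possibilities the engine has to account for.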

With this in mind, I wrote a procedure which creates a series of increasingly complex grammars, scaling up first on number of Choices in a spec, then number of commands in a grammar, then max repetitions. (All tests were done with CCR grammars since they are the easiest to get BadGrammar with.)

The Results

The data turned up some interesting results. The most salient points which emerged were as follows:

  • The relationship between the number of repetitions and the number of elements which causes BadGrammar can be described by a formula. Roughly, if the maximum number of repetitions in a CCR grammar, times the square root of the number of elements, divided by 1000, exceeds 23, you get BadGrammar.
  • These results are consistent across at least Dragon NaturallySpeaking versions 12 through 14 and BestMatch recognition algorithms II-V.
  • Multiple grammars can work together to produce BadGrammar. That is, if you have two active grammars which each use 55% of your maximum complexity, you still get the error.
  • If you use Windows Speech Recognition rather than Dragon NaturallySpeaking as your speech engine, you won’t get a BadGrammar error; your overly complex grammars simply won’t work, and they will slow down any other active grammars.
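The rule of thumb above can be sketched as a small function. (The exact grouping of the arithmetic is my reading of the formula as stated in prose, so treat the details as an assumption rather than a measured result.)

```python
import math

def predicts_bad_grammar(max_repetitions, elements, threshold=23):
    """Predict whether a CCR grammar will trigger BadGrammar.

    One reading of the rule of thumb: max repetitions, times the
    square root of the element count, divided by 1000, compared
    against a threshold of 23.
    """
    return max_repetitions * math.sqrt(elements) / 1000 > threshold

# With the default of 16 repetitions, this predicts a ceiling of about
# (23 * 1000 / 16) ** 2, i.e. roughly two million elements.
```

This also makes the trade-off discussed below concrete: halving the maximum repetitions roughly quadruples the element budget.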


So what does all of this mean for you, the voice programmer? At the very least, it means that if you get BadGrammar, you can sacrifice some max repetitions in order to maintain your current complexity. (Let’s be honest, who speaks 16 commands in a row?) It also exposes the parameters for solutions and workarounds such as Caster’s NodeRule. It gives us another benchmark by which to judge the next Dragon NaturallySpeaking. Finally, it enables features like complexity warnings both at the development and user levels.

Grammars do have a complexity limit, but it’s a known quantity and can be dealt with as such.

Last time, I talked about why Dragon NaturallySpeaking (version 12.5) is currently the best choice of speech engine for voice programming, and how Natlink ties Dragon to Python. Natlink is a great tool by itself, but it also enables lots of other great voice software packages, some of which are explicitly geared toward voice programming, and all of which also offer accessibility and productivity features. In this article, I’m going to go over the purposes and capabilities of a number of these, their dependencies, and how to get started with them.

I’m also going to rate each piece of software with a star rating out of five. The rating indicates the software’s voice programming potential (“VPP”), not the overall quality of the software.

The Natlink Family

Unimacro, Vocola*, and VoiceCoder all require Natlink and are developed in close proximity with it, so I’ll start with them.

Unimacro (documentation/ download) [VPP: ]

Unimacro is an extension for Natlink which mainly focuses on global accessibility and productivity tasks, like navigating the desktop and web browser. It’s very configurable, requires minimal Python familiarity, and as of release 4.1mike, it has built-in support for triggering AutoHotkey scripts by voice.

Unimacro is geared toward accessibility, not voice programming. Still, it’s useful in a general sense and has potential if you’re an AutoHotkey master.

Vocola (documentation | download) [VPP: ]

Vocola is something of a sister project to Unimacro; their developers work together to keep them compatible. Like Unimacro, Vocola extends Dragon’s command set. Unlike Unimacro, which is modified through the use of .ini files, Vocola allows users to write their own commands. Vocola commands have their own syntax and look like the following.
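A minimal illustration of the syntax (hypothetical commands, assuming Vocola 2):

```
# spoken form = keystrokes to send
Copy That   = {Ctrl+c};
Next Window = {Alt+Tab};

# ranges can appear inline; $1 refers to the number spoken
Go Down 1..10 = {Down_$1};
```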

When you start Dragon, Vocola .vcl files are converted to Python and loaded into Natlink. Vocola can also use “extensions”, Python modules full of functions which can be called from Vocola. These extensions are where Vocola’s real power lies. They allow you to map any piece of Python code to any spoken command.

Vocola’s syntax is easy to learn, it’s well-documented, and it gives you full access to Python, so it’s a powerful step up from Natlink. However, it’s not quite as easy to call Python with as Dragonfly is, and its community became somewhat displaced when the forum site went down. (Though it has regrouped some at the forums.)

VoiceCoder (download) [VPP: ]

Also sometimes called VoiceCode (and in no way related to the similarly named software), VoiceCoder has a complex history, which is best documented by Christine Masuoka’s academic paper. VoiceCoder aims to make speaking code as fluid as possible, and its design is very impressive. (Demo video – audio starts late.)

For having been around for so long, it still has a fairly active community. Mark Lillibridge recently created this FAQ on programming by voice with the help of said community.

VoiceCoder does what it does extremely well, but that’s pretty much limited to coding in C/C++ or Python in Emacs. Furthermore, since its website and wiki are down, documentation is sparse.

Dragonfly (documentation | download | setup) [VPP: ]

In 2008, Christo Butcher created Dragonfly, a Python framework for speech recognition. In 2014, he moved it to GitHub, which has accelerated its development. The real strength of Dragonfly is its modernization of the voice command programming process. Dragonfly makes it easy for anyone with nominal Python experience to get started on adding voice commands to Dragon NaturallySpeaking (DNS) or Windows Speech Recognition (WSR). Install it, copy some of the examples into the /MacroSystem/ folder, and you’re ready to go. Here’s an example.
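A minimal grammar file might look like this (command names invented for illustration; the `unload` hook is the convention Natlink expects in grammar modules):

```python
from dragonfly import Grammar, MappingRule, Key, Text

class ExampleRule(MappingRule):
    mapping = {
        # "save file" presses Ctrl+S
        "save file": Key("c-s"),
        # "say hello" types the text out
        "say hello": Text("Hello, world!"),
    }

grammar = Grammar("example")
grammar.add_rule(ExampleRule())
grammar.load()

def unload():
    # Called when the grammar module is unloaded.
    global grammar
    if grammar:
        grammar.unload()
    grammar = None
```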

It’s pure Python, and unlike vanilla Natlink, its API is very clean and simple. It’s well documented, and it has a growing community on GitHub. Furthermore, it enables fine-grained control of CCR. (CCR is “continuous command recognition”: the ability to string spoken commands together and have them executed one after another, rather than pausing between commands. Vocola and VoiceCoder can also use CCR, but not with as much flexibility.)

The Dragonfly Family

Due to Dragonfly’s ease of use, a large number of Dragonfly grammar repositories have appeared on Github recently, as well as some larger projects.

Aenea (Github) [VPP: ]

Aenea is software which allows you to use Dragonfly grammars on Linux or Mac via a virtual machine (or on any remote computer, for that matter). Getting it set up is fairly complicated, but worth it if your workspace of choice (or necessity) is not a Windows machine. Aenea is basically a fully realized version of what Tavis Rudd demonstrated at PyCon.

Damselfly (Github) [VPP: ]

Damselfly lets you use Dragonfly on an installation of Dragon NaturallySpeaking on Wine on Linux, cutting Windows out of the picture entirely.

Caster (download | Github/ documentation) [VPP: ]

Like VoiceCoder, Caster aims to be a general-purpose programming tool. Unlike VoiceCoder, Caster aims to work in any programming environment, not just Emacs. It improves on Dragonfly’s CCR by letting you selectively enable and disable CCR subsections. (“Enable Java”, “Disable XML”, etc.) It allows you to run Sikuli scripts by voice. And, like VoiceCoder, it uses Veldicott-style fuzzy string matching to allow dictation of strange or unspeakable symbol names. It is still in active development, and contributions and suggestions are welcome.

Some Final Thoughts on Open Source Voice Programming

The open source voice programming community has come a long way since its days of obscure academic papers and mailing lists. Community spaces have developed and borne fruit. The projects listed above are downright awesome, all of them, and I’m sure more are coming. There is still much to be achieved, much low-hanging fruit to be plucked, even. All this has happened in fifteen years. I’m looking forward to the next fifteen.


* To be more precise, it is Vocola 2 that requires Natlink. There are several versions of Vocola, and the most recent does not require it.

Between 2008 and 2013, I began to suffer from carpal tunnel syndrome. During that same time, I realized how much I love programming. When I saw Tavis Rudd’s PyCon talk, I thought to myself, “That’s awesome,” and then, “I wonder if I can make that work.” I have, and I’ve learned a lot in the process. In this series of articles, I’m going to lay out what I’ve learned about voice programming, the software that enables it, and the growing community which uses and maintains that software.

Speech Recognition Engines

The first part of programming by voice is getting a computer to recognize what you’re saying and turn it into human-readable words. For this, you need a speech recognition engine. There are a lot of voice recognition projects for Linux. Windows Speech Recognition is decent. Google’s speech recognition is also coming along. But if you intend to program by voice, there’s really only one game in town.

Dragon NaturallySpeaking

Dragon NaturallySpeaking, by Nuance, is a Windows program which turns spoken audio into text. It uses the same technology which powers Apple’s Siri, and it’s available in quite a few languages, such as German, Spanish, Japanese, and of course, English. Dragon is king not because of its fancy features (posting to Twitter/Facebook, controlling Microsoft software, etc.), but simply because it’s a lot more accurate than the competition.

At this point in the article, I’d love to tell you all about how to use Dragon, and what it can do, but Nuance has already done an excellent job of providing instructional materials and videos, so I’ll simply link you to their stuff: Dragon PDF files, Dragon instructional YouTube channel.

Improving Performance

Although Dragon’s recognition is amazing out-of-the-box, there are a few things you can do to improve the speed and accuracy of recognition even more.

  1. Upgrade your hardware.
    Nuance recommends 8 GB of RAM, and users on the Knowbrainer forums have reported significant performance gains with better microprocessors and soundcards. Also, although Dragon supports many kinds of microphones (including your iPhone/Android!), recognition is generally better with a powered USB microphone.
  2. Use optimal profile settings.
    When creating a profile, click on “Advanced” and choose a Medium size vocabulary and BestMatch IV. BestMatch V has performance issues with Natlink.
  3. Train your Dragon.
Select “Audio” -> “Read text to improve accuracy” and do some readings so Dragon can adjust to your voice. Although Dragon 13 eliminated this step, Dragon 12 is, as mentioned in the previous article, best for programming. In my experience, one or two readings are all Dragon needs, and it’s diminishing returns after that.
  4. Speak in short phrases.
    Whole sentence dictation is better for natural language sentences (because Dragon is able to deduce from context which words you said), but spoken command chains (for which Dragon’s context is useless) require perfection and can’t be corrected with Dragon’s Select-and-Say.

There is no Programmer Edition

As you can see, Dragon can probably transcribe faster than you can type, and it’s useful for common office tasks and Internet browsing. That said, it isn’t intended for dictating code. In order to dictate the following JavaScript snippet, you would have to say,

“spell F O R space open paren I N T space I equals zero semicolon”

just to get to the first semicolon:
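The snippet itself didn’t survive the transfer to this page; judging from the spelled-out dictation above and the “for loop five, log I” command later in the article, it was something like:

```javascript
for (int i = 0; i < 5; i++) {
    console.log(i);
}
```

(The `int` follows the spelled-out dictation, though in JavaScript the keyword would be `var`.)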

Obviously unworkable. You’d be better off trying to type it out with your feet. Dragon’s technology is geared toward recognizing spoken language syntax, not programming language syntax. (It is possible to create custom commands, including text commands, with Dragon’s Advanced Scripting if you buy the professional version, but the Advanced Scripting language is cumbersome, limited, and really not all that advanced.)

If it’s so terrible at dictating code, why am I recommending* it wholeheartedly?


In 1999, Joel Gould created Natlink, an extension for Dragon which allows a user to create custom voice commands using Python rather than Advanced Scripting/ Visual Basic. This was huge. It meant that Dragon’s super-accurate word recognition could be chained to the expressive power of the entire Python language!

Gould went on to document and open-source his work. Natlink has been hosted at SourceForge since 2003 and has been maintained for some time by Quintijn Hoogenboom and a few others as new versions of Dragon have been released. At present, it gets about 300 downloads per month, and for the last few years, the number of downloads it’s gotten annually has increased by about 1000 per year. It’s used by a number of other popular voice software packages, including VoiceCoder, Vocola, Unimacro, and Dragonfly.

With Natlink, instead of spelling out almost every single character manually, you might make a macro which handles the creation of for-loops, and another which handles printing to the console. Then, in order to speak the above JavaScript snippet into existence, you would only say

for loop five, log I

and be finished with the whole thing. Quite an improvement, isn’t it?
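As a sketch of what those two macros might look like as a Dragonfly rule (the specs and typed snippets here are invented for illustration):

```python
from dragonfly import Grammar, MappingRule, Text, IntegerRef

class LoopRule(MappingRule):
    mapping = {
        # "for loop five" types the loop skeleton with the spoken number
        "for loop <n>": Text("for (int i = 0; i < %(n)d; i++) {}"),
        # "log I" types a console.log for the loop variable
        "log I": Text("console.log(i);"),
    }
    extras = [IntegerRef("n", 1, 100)]

grammar = Grammar("loops")
grammar.add_rule(LoopRule())
grammar.load()
```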

Getting Started – Not Yet

In order to install Natlink, follow the instructions here. Then you can read the documentation here and here, and look at the sample macros here. But don’t bother with the Natlink docs and examples, at least not yet. Most likely, whatever you’re hoping to achieve can be done more cleanly and easily with one of the other software packages mentioned above, VoiceCoder, Vocola, Unimacro or Dragonfly. In the next article, I’ll talk about each of these, and some interesting projects in the Dragonfly family.


Due to app compatibility issues with Dragon 13, I decided to downgrade back to version 12.5, and so it is version 12.5 that I recommend, not version 13.