In the previous articles in this series, I covered the essentials of getting started and getting involved with voice programming, and some best practices. This time, I’m going to talk about an issue that comes up as you begin to create more complex command sets and grammars: grammar complexity and the dreaded BadGrammar error.

First, a little background. BadGrammar is an error that occurs in voice software that plugs into Dragon NaturallySpeaking, including Natlink, Vocola, and Dragonfly. For a long time, I thought it was caused by adding a sufficiently high number of commands to a grammar. Then the subject came up in the Github Caster chat, and I decided to create some tests.

A Bit of Vocabulary

Before I describe the tests, I need to make sure we’re speaking the same language. For the purposes of this article, a command is a pairing of a spec and an action. A spec is the set of spoken words used to trigger an action. The action, then, is some programmed behavior, such as automatic keypresses or executing Python code. Finally, a grammar is an object composed of one or more commands that is passed to the speech engine.

The Tests

The first test I did was to create a series of grammars with increasingly large sets of commands in them. I used a random word generator and a text formatter to create simple one-word commands which did nothing else but print their specs on the screen. All tests were done with Dragonfly, so the commands looked like the following.
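Each of those generated commands paired a one-word spec with an action that did nothing but print the spec. A minimal sketch of that shape (the words “alpha”, “bravo”, and “charlie” stand in for the randomly generated ones) might look like this:

    from dragonfly import Grammar, MappingRule, Function

    def report(spec):
        # Each generated command does nothing but print its own spec.
        print(spec)

    class GeneratedRule(MappingRule):
        mapping = {
            "alpha":   Function(report, spec="alpha"),
            "bravo":   Function(report, spec="bravo"),
            "charlie": Function(report, spec="charlie"),
        }

    grammar = Grammar("generated test grammar")
    grammar.add_rule(GeneratedRule())
    grammar.load()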

After creating a (slow but usable) grammar with 3000 commands in it, my sufficiently-high-number-of-commands theory was shot. (Previously, I had gotten BadGrammar with about 500 commands.) Instead, it had to be, as Mark Lillibridge had speculated, complexity. So then, how much complexity was allowed before BadGrammar?

Fortunately, Dragonfly has a tool which measures the complexity of grammars. It returns its results in elements, which for our purposes here can be summarized as units of grammar complexity. There are many ways to increase the number of elements in a grammar, but the basic idea is that the more combinatorial possibility you introduce into your commands, the more elements there are (which should surprise no one). For example, the following rule with one Dragonfly Choice extra creates more elements than the above example, and adding either more choices (key/value pairs) to the Choice object or more Choice objects to the spec would create more still.
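As an illustration, a rule along these lines (the spec and the Choice key/value pairs here are placeholders, not the ones from the tests) introduces more combinatorial possibility than the one-word commands above:

    from dragonfly import MappingRule, Choice, Text

    class ChoiceRule(MappingRule):
        mapping = {
            # One Choice extra multiplies the possibilities for this spec.
            "insert <color>": Text("%(color)s"),
        }
        extras = [
            Choice("color", {
                "red":   "red",
                "green": "green",
                "blue":  "blue",
            }),
        ]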

CCR grammars create exceptionally large numbers of elements because every command in a CCR grammar can be followed by itself and any other command in the grammar, up to the user-defined maximum number of times. The default maximum number of CCR repetitions (meaning the default number of commands you can speak in succession) in Dragonfly is 16.

With this in mind, I wrote a procedure which creates a series of increasingly complex grammars, scaling up first on number of Choices in a spec, then number of commands in a grammar, then max repetitions. (All tests were done with CCR grammars since they are the easiest to get BadGrammar with.)

The Results

The data turned up some interesting results. The most salient points which emerged were as follows:

  • The relationship between the number of max repetitions and the number of elements that causes BadGrammar can be described by a formula. Roughly, if the number of max repetitions in a CCR grammar multiplied by the square root of the number of elements, divided by 1000, is greater than 23, you get BadGrammar (see the sketch after this list).
  • These results are consistent across at least Dragon NaturallySpeaking versions 12 through 14 and BestMatch recognition algorithms II-V.
  • Multiple grammars can work together to produce BadGrammar. That is, if you have two active grammars which each use 55% of your max complexity, you still get the error.
  • If you use Windows Speech Recognition rather than Dragon NaturallySpeaking as your speech engine, you won’t get a BadGrammar error; instead, your complex grammars simply won’t work, and they will slow down any other active grammars.
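To make that rule of thumb concrete, here is a small sketch of it as a function (this assumes the multiplicative reading of the formula above; the constants 23 and 1000 are the rough values from the tests):

    import math

    def likely_bad_grammar(max_repetitions, num_elements):
        # Rough heuristic from the tests above: BadGrammar tends to appear
        # once max_repetitions * sqrt(num_elements) / 1000 exceeds about 23.
        return max_repetitions * math.sqrt(num_elements) / 1000 > 23

    # At the default of 16 repetitions, the budget works out to roughly
    # (23 * 1000 / 16) ** 2, or about two million elements.
    print(likely_bad_grammar(16, 2000000))  # right at the edge of the limit
    print(likely_bad_grammar(8, 2000000))   # halving repetitions buys headroom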

Implications

So what does all of this mean for you, the voice programmer? At the very least, it means that if you get BadGrammar, you can sacrifice some max repetitions in order to maintain your current complexity. (Let’s be honest, who speaks 16 commands in a row?) It also exposes the parameters for solutions and workarounds such as Caster’s NodeRule. It gives us another benchmark by which to judge the next Dragon NaturallySpeaking. Finally, it enables features like complexity warnings both at the development and user levels.

Grammars do have a complexity limit, but it’s a known quantity and can be dealt with as such.

In the prior two articles in this series, I went over the basics of getting started with voice programming and talked a little about its history and community. In this article, I’m going to go over best practices.

Let me preface with this. Your personal command set and phonetic design are going to depend on a variety of factors: accent, programming environment and languages, disability (if any), usage style (assistance versus total replacement), etc. The following is a list of guidelines based mostly on my experiences. Your mileage may vary.

Use Command Chains

If I could only impart one of these to you, it would be to use continuous command recognition (command sequences). Get Dragonfly or Vocola and learn how to set it up. Speaking chains of commands is much faster and smoother than speaking individual commands with pauses in between. If you’re not convinced yet, watch Tavis Rudd do it.
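In Dragonfly, a command chain is typically built by wrapping a rule in a Repetition element, in the style of the bundled _multiedit.py example. A stripped-down sketch (the three mapped commands are just examples) looks roughly like this:

    from dragonfly import (Grammar, MappingRule, CompoundRule, RuleRef,
                           Repetition, Key)

    class ChainableCommands(MappingRule):
        exported = False          # only spoken as part of a chain
        mapping = {
            "copy that":  Key("c-c"),
            "paste here": Key("c-v"),
            "new line":   Key("enter"),
        }

    class CommandChain(CompoundRule):
        spec = "<chain>"
        extras = [Repetition(RuleRef(rule=ChainableCommands()),
                             min=1, max=16, name="chain")]

        def _process_recognition(self, node, extras):
            # Execute each command in the order it was spoken.
            for action in extras["chain"]:
                action.execute()

    grammar = Grammar("command chain example")
    grammar.add_rule(CommandChain())
    grammar.load()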

Phonetic Distinctness Trumps All

When selecting words as spoken triggers (specs) for actions, keep in mind that Dragon must understand you, and unless you’re a professional news anchor, your pronunciation is probably less than perfect.

  • James Stout points out the use of prefix and suffix words on his blog, Hands-Free Coding. Though they do add syllables to the spec, they make the spec more phonetically distinct. An example of a prefix word might be adding “fun” to the beginning of the name of a function you commonly use. Doing so also gets you in the habit of saying “fun” when a function is coming up, which, believe it or not, is often enough time to think of the rest of the function’s name, allowing for an easy mental slide.

  • Use what you can pronounce. Don’t be afraid to steal words or phonemes from books or even other spoken languages. I personally think Korean is very easy on the tongue with its total lack of adjacent unvoiced consonants. Maybe you like German, or French.
  • Single-syllable specs are okay, but if they’re not distinct enough, Dragon may mistakenly hear them as parts of other commands (especially in command chains). As a rule of thumb, a low number of syllables is all right; a low number of phonemes isn’t.

The Frequency Bump

When you speak sentences into Dragon, it uses a frequency/proximity algorithm to determine whether you said “ice cream” or “I scream”, etc. However, it works differently for words registered as command specs. Spec words get a major frequency bump and are recognized much more easily than words in normal dictation. Take advantage of this and let Dragon do the heavy lifting. Let me give you an example of what I mean.

Dragonfly’s Dictation element and Vocola’s <_anything> allow you to create commands which take a chunk of spoken text as a parameter. The following Dragonfly command prints “hello N” where N is whatever comes after the word “for”.
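Here is a sketch of that command (the extra name “stuff” is arbitrary):

    from dragonfly import MappingRule, Dictation, Text

    class FreeFormRule(MappingRule):
        mapping = {
            # Whatever is spoken after "for" is captured as free-form dictation.
            "for <stuff>": Text("hello %(stuff)s"),
        }
        extras = [Dictation("stuff")]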

I’m going to refer to these sorts of commands as free-form commands. Given a choice between setting up the following Function action with free-form dictation via the Dictation element, or a set of choices via the Choice element, the Choice element is the far superior um… choice.
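Here is the comparison as a sketch (the spec “do some action”, the options “foo” and “bar”, and the helper function are illustrative names):

    from dragonfly import MappingRule, Function, Choice, Dictation

    def do_some_action(parameter):
        print("parameter was: " + str(parameter))

    class ActionRule(MappingRule):
        mapping = {
            "do some action <parameter>": Function(do_some_action),
        }
        # Free-form version: anything Dragon hears after the spec is passed along.
        # extras = [Dictation("parameter")]
        # Choice version: only "foo" and "bar" are accepted, and both get
        # registered as command words.
        extras = [Choice("parameter", {"foo": "foo", "bar": "bar"})]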

In this example, if you set up <parameter> as a Dictation element, Dragon can potentially mishear either “foo” or “bar”. If you set up <parameter> as a Choice element instead, all of the options in the Choice element (in this case, “foo” and “bar”) get registered as command words just like the phrase “do some action” does, and are therefore far more likely to be heard correctly by Dragon.

Anchor Words and the Free-Form Problem

Let’s say we have a free-form command, like the one mentioned above, and another command with the spec “splitter”. In this hypothetical situation, let’s also say they are both part of the same command chain grammar.

Usually, I would use the “splitter” command to print out the split function, but this time I want to create a variable called “splitter”. If I say “variable splitter”, nothing will happen. This is because, when Dragon parses the command chain, it first recognizes “variable”; then, before it can get any text to feed to the “variable” command, the next command (“splitter”) closes off the dictation. This has the effect of crashing the entire command chain.

There are a few ways around this. The first is to simply give up on using free-form commands or specs with common words in command chains. Not a great solution. The second way is to use anchor words.
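Here is a sketch of the anchor-word version of the “variable” command (the extra name is arbitrary):

    from dragonfly import MappingRule, Dictation, Text

    class VariableRule(MappingRule):
        mapping = {
            # "elephant" marks the end of the free-form dictation.
            "variable <name> elephant": Text("var %(name)s"),
        }
        extras = [Dictation("name")]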

In this modified version of the command, “elephant” is being used as an anchor word, a word that tells Dragon “free-form dictation is finished at this point”. So here, I can say, “variable splitter elephant” to produce the text “var splitter”.

Despite the effectiveness of the second workaround, I find myself getting annoyed at having to say some phonetically distinct anchor word all the time, and often use another method: pluralizing the free-form Dictation element, then speaking a command for the backspace character immediately after. For example, to produce the text “var splitter”, I could also say, “variable splitters clear”. (“Clear” is backspace in Caster.)

I am working on a better solution to this problem and will update this article when I finish it.

Reusable Parts

On the Yahoo VoiceCoder group site, Mark Lillibridge proposes two categories of voice programmers, which he calls Type I and Type II. Type I programmers optimize strongly for a very specific programming environment; Type II programmers create more generic commands intended to work in a wide variety of environments. Along with Ben Meyer of VoiceCode.io, I fall into the latter category. My job has me switching between editors and languages constantly, so I try to use lots of commands like the following.
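For example, something along these lines (the particular bindings are just a sketch of the kind of editor-agnostic command I mean):

    from dragonfly import MappingRule, Key, Text, Dictation

    class GenericEditingRule(MappingRule):
        # These rely on near-universal keyboard shortcuts rather than any
        # editor-specific feature, so they work in most environments.
        mapping = {
            "save it":     Key("c-s"),
            "find <text>": Key("c-f") + Text("%(text)s"),
            "select all":  Key("c-a"),
            "undo that":   Key("c-z"),
        }
        extras = [Dictation("text")]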

I also try to standardize spoken syntax between programming languages. It does take extra mental effort to program by voice, so the less specialized commands you have to learn across the different environments you work in, the better.

What About You?

That’s all I’ve got. Have any best practices or techniques of your own? Leave them in the comments; I’d love to hear them!

Last time, I talked about why Dragon NaturallySpeaking (version 12.5) is currently the best choice of speech engine for voice programming, and how Natlink ties Dragon to Python. Natlink is a great tool by itself, but it also enables lots of other great voice software packages, some of which are explicitly geared toward voice programming, and all of which also offer accessibility and productivity features. In this article, I’m going to go over the purposes and capabilities of a number of these, their dependencies, and how to get started with them.

I’m also going to rate each piece of software with a star rating out of five. The rating indicates the software’s voice programming potential (“VPP”), not the overall quality of the software.

The Natlink Family

Unimacro, Vocola*, and VoiceCoder all require Natlink and are developed in close proximity with it, so I’ll start with them.

Unimacro (documentation | download) [VPP: ]

Unimacro is an extension for Natlink which mainly focuses on global accessibility and productivity tasks, like navigating the desktop and web browser. It’s very configurable, requires minimal Python familiarity, and as of release 4.1mike, it has built-in support for triggering AutoHotkey scripts by voice.

Unimacro is geared toward accessibility, not voice programming. Still, it’s useful in a general sense and has potential if you’re an AutoHotkey master.

Vocola (documentation | download) [VPP: ]

Vocola is something of a sister project to Unimacro; their developers work together to keep them compatible. Like Unimacro, Vocola extends Dragon’s command set. Unlike Unimacro, which is modified through the use of .ini files, Vocola allows users to write their own commands. Vocola commands have their own syntax and look like the following.
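For instance, a couple of minimal commands (these are illustrative, not taken from a real .vcl file) might look like this:

    Save That = {Ctrl+s};
    Find <_anything> = {Ctrl+f} $1 {Enter};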

When you start Dragon, Vocola .vcl files are converted to Python and loaded into Natlink. Vocola can also use “extensions”, Python modules full of functions which can be called from Vocola. These extensions are where Vocola’s real power lies. They allow you to map any piece of Python code to any spoken command.

Vocola’s syntax is easy to learn, it’s well documented, and it gives you full access to Python, so it’s a powerful step up from Natlink. However, calling Python from Vocola is not quite as easy as it is from Dragonfly, and its community became somewhat displaced when the SpeechComputing.com forum site went down. (Though it has regrouped somewhat at the Knowbrainer.com forums.)

VoiceCoder (download) [VPP: ]

Also sometimes called VoiceCode (and in no way related to voicecode.io), VoiceCoder’s complex history is best documented by Christine Masuoka’s academic paper. VoiceCoder aims to make speaking code as fluid as possible, and its design is very impressive. (Demo video – audio starts late.)

Considering how long it has been around, it still has a fairly active community. Mark Lillibridge recently created this FAQ on programming by voice with the help of said community.

VoiceCoder does what it does extremely well, but that’s pretty much limited to coding in C/C++ or Python in Emacs. Furthermore, since its website and wiki are down, documentation is sparse.

Dragonfly (documentation | download | setup) [VPP: ]

In 2008, Christo Butcher created Dragonfly, a Python framework for speech recognition. In 2014, he moved it to Github, which has accelerated its development. The real strength of Dragonfly is its modernization of the voice command programming process. Dragonfly makes it easy for anyone with nominal Python experience to get started on adding voice commands to Dragon NaturallySpeaking (DNS) or Windows Speech Recognition (WSR). Install it, copy some of the examples into the /MacroSystem/ folder, and you’re ready to go. Here’s an example.
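The example below is a minimal sketch of a Dragonfly grammar file (the two commands are placeholders), similar in shape to the bundled examples:

    from dragonfly import Grammar, MappingRule, Key, Text

    class ExampleRule(MappingRule):
        mapping = {
            "save file":   Key("c-s"),
            "print hello": Text("print('hello world')"),
        }

    grammar = Grammar("example")
    grammar.add_rule(ExampleRule())
    grammar.load()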

It’s pure Python and, unlike vanilla Natlink, its API is very clean and simple. It’s well documented and it has a growing community on Github. Furthermore, the _multiedit.py module enables fine-grained control of CCR. (CCR is “continuous command recognition”, the ability to string spoken commands together and have them executed one after the other, rather than pausing in between commands. Vocola and VoiceCoder can also use CCR, but not with as much flexibility.)

The Dragonfly Family

Due to Dragonfly’s ease of use, a large number of Dragonfly grammar repositories have appeared on Github recently, as well as some larger projects.

Aenea (Github) [VPP: ]

Aenea is software which allows you to use Dragonfly grammars on Linux or Mac via a virtual machine (or on any remote computer, for that matter). Getting it set up is pretty complicated, but it’s worth it if your workspace of choice (or necessity) is not a Windows machine. Aenea is basically a fully realized version of what Tavis Rudd demonstrated at PyCon.

Damselfly (Github) [VPP: ]

Damselfly lets you use Dragonfly on an installation of Dragon NaturallySpeaking on Wine on Linux, cutting Windows out of the picture entirely.

Caster (download | Github | documentation) [VPP: ]

Like VoiceCoder, Caster aims to be a general purpose programming tool. Unlike VoiceCoder, Caster aims to work in any programming environment, not just Emacs. It improves upon _multiedit.py by letting you selectively enable/disable CCR subsections. (“Enable Java”, “Disable XML”, etc.) It allows you to run Sikuli scripts by voice. And, like VoiceCoder, it uses Veldicott-style fuzzy string matching to allow dictation of strange/unspeakable symbol names. It is still in active development and contributions or suggestions are welcome.

Some Final Thoughts on Open Source Voice Programming

The open source voice programming community has come a long way since its days of obscure academic papers and mailing lists. Community spaces have developed and produced fruit. The projects listed above are downright awesome, all of them, and I’m sure more are coming. There is still much to be achieved, much low-hanging fruit to be plucked even. All this has happened in fifteen years. I’m looking forward to the next fifteen.


* To be more precise, Vocola 2 requires Natlink. There are several versions of Vocola, and the most recent one does not.