Between 2008 and 2013, I began to suffer from carpal tunnel syndrome. During that same time, I realized how much I love programming. When I saw Tavis Rudd’s PyCon talk, I thought to myself, “That’s awesome,” then, “I wonder if I can make that work.” I have, and I’ve learned a lot in the process. In this series of articles, I’m going to lay out what I’ve learned about voice programming, the software that enables it, and the growing community which uses and maintains that software.

Speech Recognition Engines

The first part of programming by voice is getting a computer to recognize what you’re saying in human-readable words. For this, you need a speech recognition engine. There are a lot of voice-recognition projects for Linux. Windows Speech Recognition is decent. Google’s speech recognition is also coming along. But if you intend to program by voice, there’s really only one show in town.

Dragon NaturallySpeaking

Dragon NaturallySpeaking, by Nuance, is a Windows program which turns spoken audio into text. It uses the same technology which powers Apple’s Siri, and it’s available in quite a few languages, such as German, Spanish, Japanese, and of course, English. Dragon is king not because of its fancy features (posting to Twitter/Facebook, controlling Microsoft software, etc.), but simply because it’s a lot more accurate than the competition.

At this point in the article, I’d love to tell you all about how to use Dragon, and what it can do, but Nuance has already done an excellent job of providing instructional materials and videos, so I’ll simply link you to their stuff: Dragon PDF files, Dragon instructional YouTube channel.

Improving Performance

Although Dragon’s recognition is amazing out-of-the-box, there are a few things you can do to improve the speed and accuracy of recognition even more.

  1. Upgrade your hardware.
    Nuance recommends 8 GB of RAM, and users on the Knowbrainer forums have reported significant performance gains with better microprocessors and soundcards. Also, although Dragon supports many kinds of microphones (including your iPhone/Android!), recognition is generally better with a powered USB microphone.
  2. Use optimal profile settings.
    When creating a profile, click on “Advanced” and choose a Medium size vocabulary and BestMatch IV. BestMatch V has performance issues with Natlink.
  3. Train your Dragon.
    Select “Audio” -> “Read text to improve accuracy” and do some readings so Dragon can adjust to your voice. Although Dragon 13 eliminated this step, as mentioned in the previous article, Dragon 12 is best for programming. In my experience, one or two readings are all Dragon needs and it’s diminishing returns after that.
  4. Speak in short phrases.
    Whole-sentence dictation is better for natural language sentences (because Dragon is able to deduce from context which words you said), but spoken command chains get no help from that context. They have to be recognized perfectly and can’t be corrected with Dragon’s Select-and-Say, so the shorter the chain, the less there is to go wrong.

There is no Programmer Edition

As you can see, Dragon can probably transcribe faster than you can type, and it’s useful for common office tasks and Internet browsing. That said, it isn’t intended for dictating code. In order to dictate the following JavaScript snippet, you would have to say,

“spell F O R space open paren I N T space I equals zero semicolon”

just to get to the first semicolon:

Obviously unworkable. You’d be better off trying to type it out with your feet. Dragon’s technology is geared toward recognizing spoken language syntax, not programming language syntax. (It is possible to create custom commands, including text commands, with Dragon’s Advanced Scripting if you buy the professional version, but the Advanced Scripting language is cumbersome, limited, and really not all that advanced.)
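To make the cost concrete, here is a minimal Python sketch that turns a code fragment into the kind of character-by-character phrase shown above. The `SPOKEN` table and the `spell_out` function are hypothetical names made up for this illustration; they are not part of Dragon, and the spoken forms are assumptions modeled on the example phrase rather than Dragon’s actual vocabulary.

```python
# Hypothetical spoken names for non-letter characters (an assumption for
# illustration only; this is not Dragon's real command vocabulary).
SPOKEN = {
    " ": "space",
    "(": "open paren",
    ")": "close paren",
    "=": "equals",
    ";": "semicolon",
    "0": "zero",
    "1": "one",
    "2": "two",
    "3": "three",
    "4": "four",
    "5": "five",
}


def spell_out(code):
    """Return the phrase you would have to speak to spell `code` aloud."""
    words = []
    for ch in code:
        if ch.isalpha():
            # Letters are spoken one at a time.
            words.append(ch.upper())
        else:
            # Everything else needs its own spoken name; unknown
            # characters are passed through unchanged.
            words.append(SPOKEN.get(ch, ch))
    return "spell " + " ".join(words)


print(spell_out("for (int i=0;"))
```

Feeding it the fragment implied by the phrase above, `for (int i=0;`, reproduces that phrase word for word: every one of the thirteen characters has to be spoken individually.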

If it’s so terrible at dictating code, why am I recommending* it wholeheartedly?

Natlink

In 1999, Joel Gould created Natlink, an extension for Dragon which allows a user to create custom voice commands using Python rather than Advanced Scripting/Visual Basic. This was huge. It meant that Dragon’s super-accurate word recognition could be chained to the expressive power of the entire Python language!

Gould went on to document and open-source his work. Natlink has been hosted at SourceForge since 2003 and has been maintained for some time by Quintijn Hoogenboom and a few others as new versions of Dragon have been released. At present, it gets about 300 downloads per month, and for the last few years, the number of downloads it’s gotten annually has increased by about 1000 per year. It’s used by a number of other popular voice software packages, including VoiceCoder, Vocola, Unimacro, and Dragonfly.

With Natlink, instead of spelling out almost every single character manually, you might make a macro which handles the creation of for-loops, and another which handles printing to the console. Then, in order to speak the above JavaScript snippet into existence, you would only say

for loop five, log I

and be finished with the whole thing. Quite an improvement, isn’t it?
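The idea behind such macros can be sketched in plain Python. This is not Natlink’s API: in a real setup, Dragon and Natlink handle the recognition and call your Python code, whereas this sketch just parses a string. The names `for_loop`, `log`, and `expand` are hypothetical, and the JavaScript it emits is only my assumption about what a for-loop macro and a logging macro might produce.

```python
# Spoken number words the sketch understands (kept deliberately small).
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def for_loop(count_word):
    """Expand 'for loop <number>' into a JavaScript for-loop header."""
    return "for (var i = 0; i < {}; i++) {{".format(NUMBER_WORDS[count_word])


def log(name):
    """Expand 'log <name>' into a console.log call."""
    return "  console.log({});".format(name.lower())


def expand(utterance):
    """Turn a comma-separated chain of spoken commands into code."""
    lines = []
    open_blocks = 0
    for command in utterance.split(","):
        words = command.split()
        if words[:2] == ["for", "loop"]:
            lines.append(for_loop(words[2]))
            open_blocks += 1
        elif words[:1] == ["log"]:
            lines.append(log(words[1]))
        else:
            raise ValueError("unrecognized command: " + command.strip())
    # Close any blocks the chain opened.
    lines.extend("}" for _ in range(open_blocks))
    return "\n".join(lines)


print(expand("for loop five, log I"))
```

Two short commands expand into a complete three-line loop. The point is the division of labor: the recognizer only has to hear a handful of ordinary words, and ordinary Python does the rest.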

Getting Started – Not Yet

In order to install Natlink, follow the instructions here. Then you can read the documentation here and here, and look at the sample macros here. But don’t bother with the Natlink docs and examples, at least not yet. Most likely, whatever you’re hoping to achieve can be done more cleanly and easily with one of the other software packages mentioned above, VoiceCoder, Vocola, Unimacro or Dragonfly. In the next article, I’ll talk about each of these, and some interesting projects in the Dragonfly family.

 

* Due to app compatibility issues with Dragon 13, I decided to downgrade back to version 12.5, and so it is version 12.5 that I recommend, not version 13.

10 Thoughts on “Introduction to Voice Programming, Part One: DNS + Natlink”

  1. CannotStandChairs on March 19, 2016 at 4:57 am said:

    Dude, thank you so much for writing these tutorials! I got into the idea of using my copy of dragon to possibly let me code 2+ years ago (physical limitations prevent me from getting substantial experience at any one time coding, due to necessary breaks I must take). I had gotten frustrated and given up on the project, and pretty much forgot about it until a few weeks ago when I saw the video of Tavis Rudd doing the amazing things he did on stage using only dragon, some third-party software, and a normal computer rig with microphone.
    I read the first three of these articles before I even started installing python. I figured that if I got to the point where the software was working properly, that is when the information in the fourth article would be much more applicable to me.
    Since it’s been about a year since you’ve written these articles, I was wondering if you have any suggestions in regards to what may have changed? What might be working better, and what might no longer work? If there are now any better open source programs, and finally if Dragon 2013 has caught up and will work sufficiently ( unfortunately, I’ve been running, 2013, although I just realized I have an old copy of Dragon 12 Home Edition!!!! Just figured it out right now!). Maybe I’ll backup/uninstall 2013, reinstall 12, uninstall the parts of Natlink/Vocola/Unimacro I got from here: http://qh.antenna.nl/unimacro/installation/installation.html and then reinstall that as well.
    I simply had not been able to get Natlink or any other window to open with Dragon 13. It has been very frustrating, but I know there must be a way.
    Thanks again, man, this has really been an inspiration for me to learn to program again (which is something I’ve wanted to do ever since I first started making stupid programs and little games using Basic back 15 to 20 years ago, haha). You’ve been a great resource for me; I will be sure to pass on the generosity to others!

    • I’m glad the tutorials helped you so much! 🙂

      To answer your question, not much has changed. The biggest two items are Natlink and Windows 10.

      The Natlink user directory for Dragonfly (if you intend to go that route) is new. Natlink 4.1mike does everything the old way, but as of 4.1oscar, your Dragonfly grammar files are supposed to go in the UserDirectory folder. See the release notes on 4.1oscar for more details.

      Some people (myself included) are having trouble installing any version of Natlink on Windows 10. I have a work computer with a clean install of Windows 10 which I’m still trying to get working, and a home computer, which I installed Natlink on before upgrading, and works fine with Windows 10. When I figure out what’s wrong with the work computer, I’ll probably write an addendum to one of the articles, or a new article about troubleshooting a Windows 10 installation. Some people can install without a hitch though.

      • CannotStandChairs on March 20, 2016 at 1:57 am said:

        I finally got it working! I ended up switching to my Lenovo yoga 700 to install Python, NatLink, Vocola, and Unimacro from one of the sources you provided, http://qh.antenna.nl/unimacro/installation/installation.html

        After getting very frustrated trying to just follow the written instructions on his site, I gave up on trying to get everything to work on my older yoga (which was running Windows 8.1), and tried to do a completely fresh install of everything on my 700 ( which actually runs Windows 10!). I also ended up watching the YouTube video he supplied on his page that showed how to install and configure everything, and it worked! Perhaps you may want to give that a try to see if it will work on your Windows 10.

        You have no idea how excited I am that this is working! I have some familiarity with various programming languages, but Python is completely new to me and a lot of the Vocola and Unimacro stuff is complicated and hard to understand. I am learning slowly though! I’ve even gotten to make a few of my own very simple commands!!!

        Thank you again (so much!) for these articles you’ve written! And thank you for the thoughtful reply and answering my questions.

        I’ve been trying for a week to get this thing to work, and now that it’s running, I have such a high 😀

        • Happy to assist, glad you’re on your way. 🙂

          I know why it doesn’t work on one of my Windows 10 computers. It’s some kind of DLL problem which is going to be a pain to diagnose, but I’ll get there. It’s well outside the scope of Quintijn’s Natlink video, but I appreciate the thought.

  2. Jason Skowronski on March 29, 2016 at 4:27 pm said:

    David, I just want to say thanks for writing this blog because my story is very similar to CannotStandChairs. I tried a few years ago and you had inspired me to try it again. The tools this time around are way better.

    I also got this working on Windows 10! I didn’t have any problems with NatLink installation using the bundled installer with 32-bit python. I had some issues with the dragonfly v.0.6.5 installer because it requires msvcr71.dll which is only available in the old .NET 1.1. I followed these instructions to install from source instead https://www.youtube.com/watch?v=iNAsV4pcnEA&list=PLV6JPhkq1x8JNM6Cw02M_cMueGmh9ma50&index=3

    I also want to say that on Dragon Home 13 v86 it appears they fixed the compatibility issue. It offered me an option after doing a fresh install to not use the dictation box. After I clicked no, it allowed me to dictate into chrome, putty and others without a popup. It also worked fine with Natlink and dragonfly.

    • Hi Jason. I’m glad I was able to help! I’ve just about cleared up my Windows 10 issues now, but thank you for the pointers. And yes, Dragon 13 and 14 aren’t as bad as I initially thought, though they are lacking Select-n-Say on most applications. Anyway, good luck. 🙂

  3. Hi,

    I don’t know if you could help, but I found the above explanation clear and concise and so I hope so. You write very well.

    My father in law has been using Dragon since its inception and does well with it, dictating his professional lectures and notes onto his PC and so I have seen how effective it can be.

    I have a friend in the States whose employer runs the family business, but who is suffering increasing hearing impairment. Taking telephone calls is a particular problem. It occurred to me that technology should be able to help by transcribing the words of callers in real time.

    There appears to be a free service in the States (I’m in the UK) that does provide captions to telephone conversations via the internet, if you buy a special phone. Before I suggest the guy goes down this route, however, I wanted to see if there is an alternative. Relying on the internet for transcription could be unreliable and, from the reviews I have seen, slow. Having a system inbuilt or attached to the telephone would be quicker and more responsive.

    I can’t see that Dragon offer this product. Do you know any manufacturer that does? It would be an obvious use for voice recognition and with an aging population would seem to offer market potential. Or am I missing something that would make current technology inadequate?

    Any advice or comments you have would be very welcome.

    Thanks,

    Naomi

    • Hi Naomi,

      Thank you for the compliments, and thanks for reaching out.

      Unfortunately, I do not think such a service exists yet, or at least I do not know of one. If you’ve seen your father in law dictating with Dragon, you’ve probably noticed that he speaks to Dragon in a certain ultra-clear style that he’s become accustomed to over the years. Dragon is great at recognizing the speech of experienced Dragon users, but not so great at recognizing the sort of speech you’d typically use with another human being.

      The closest thing I can think of to what you’re looking for is transcribed Google Voice voicemails. Info here. I know that’s a far cry from what your friend needs, but I just don’t think we’re there yet with voice recognition.

      Best,

      David

    • Hello Naomi,

      I have a Deaf sister and I help run my brother’s business which provides interpreting and captioning services. I’m also a Dragon user experimenting with some Natlink stuff (though I’m not a coder). It’s true, here in the US the FCC funds captioned telephone calls for the deaf and hard of hearing. You don’t have to get the special land-line phone with display; you can use any smartphone or computer to see what the other person is saying. It’s not the internet that makes these captioned calls unreliable or slow; it’s because low-paid respeakers are doing the transcribing on the other end. They all work in call centers and their computers and respeaking skills are not always optimized for speed and accuracy. Your friend could try an automated captioning solution such as RogerVoice or Ava.me. Those are paid services (after a free trial). Text will appear much faster but accuracy will vary because it’s automated.

      If your friend is tech-savvy and up to experimenting, s/he could try her/his own free automated captioning solution: Open a blank Google Doc (https://docs.google.com/create), click Tools > Voice typing, click the mic icon so it starts listening, then click the webcam icon at the end of the URL address bar and switch the microphone input to Stereo Mix. That way it’s listening to whatever sound is coming out of the computer instead of the computer’s microphone. Then make or receive the phone call on the computer (using Google Hangouts for example). While on the call, tap the mic icon in the Google Doc and it will auto-transcribe whatever audio it hears.

      If it’s a really important phone call, it might be worth it to contract with a realtime captioner, the type of professional that provides realtime captioning for classroom lectures, TV newscasts, or court proceedings for example. That’s a service we provide, cost is $30/30 minutes. http://www.oneinterpreting.com

  4. Pingback: Does BestMatch V still have performance issues with Natlink in Dragon NaturallySpeaking? – 1OO Club
