It’s easy to think about code execution on a single timeline. Code executes in the order it appears in the source. It’s harder to think about code which executes concurrently. What are the benefits? What are the trade-offs? What extra steps need consideration? When should you write concurrent code instead of single-threaded code? In this article, I will attempt to answer these questions, and to explain my answers as if I were talking to a five-year-old.

Basic Concept

In order to explain concurrency, let’s imagine that there is some large job to be done: counting toys in a house. Let’s also imagine that there’s a person in charge of doing this job. Let’s call her Tina. Tina is really good at counting toys, so she can count all the toys in a house in ten minutes. If she starts counting a house at 1:00, she’ll be done at 1:10.

Now let’s imagine that there are two houses full of toys. If Tina starts at 1:00, she’ll take ten minutes to count each house and be done at 1:20. If there are three houses, she’ll be done at 1:30.

Tina can only count one house at a time, and it always takes her ten minutes. But what if she has a friend help her? Tina has a friend named Tamara. Tamara can also count all the toys in a house in ten minutes. So, if there are two houses and they each start counting toys in a different house at 1:00, they’ll both be done at 1:10. If Tina did it by herself, she’d be done at 1:20, but with Tamara, she can finish at 1:10. It’s faster when she works with a friend.

Tina and Tamara are analogous to threads. Some jobs are divisible into parts. (Each house is a “part” here. In a computer, it might be processing a chunk of data.) When jobs are divisible, sometimes it makes sense to split the work among multiple threads in order to speed up the completion time overall. (Note that not ALL jobs are divisible. Think of baking a cake. Some things have to be done before the others. If you bake the ingredients before you mix them, no one will like the resulting cake. Baking and mixing simultaneously will also be a bad time for all. Part of what makes a job divisible is that the parts are order-independent.)

Synchronization and Immutability

Let’s add in a supervisor who constantly asks for updates on the toy counting, Tabatha. Tabatha shows up once in a while at unexpected times and wants to know the count of toys.

Let’s also state that Tina and Tamara have a strong distrust of their own short term memories and so when they count toys, they always use a piece of paper and a pencil, immediately marking the papers when they count a new toy. They both use the same method, writing a two digit number on the paper, and then erasing it when it needs to change. Before one of them even starts counting, she writes in “00” since that’s how many toys she has counted so far. Every time she finds a new toy, she erases the number on the page (one digit at a time) and writes in a new number. When she counts the first toy, she erases “00” and writes “01”. When she counts the second toy, she erases “01” and writes “02”.

Tabatha is amenable to this method. She used to count houses herself before becoming a supervisor, and used the same method back then.

Let’s imagine that there are 22 toys in the house that Tina is counting. Tina started counting at 1:00. Tabatha shows up at 1:09, when Tina is mostly done, and wants to see how many toys Tina has counted so far. On the paper, there is the number “19”. But at the moment Tabatha showed up, Tina counted the 20th toy. Tabatha reads slowly and Tina writes quickly. So, Tabatha reads the first digit of “19”, which is “1” for the tens digit. But before she reads the “9” in “19”, Tina changes both digits so that the number says “20”. Then Tabatha reads the “0” in “20”. Tabatha has read a “1” (in “19”) and a “0” (in “20”). She puts the two digits together and thinks that Tina has counted only 10 toys. Tabatha is now upset because she knows there is something wrong with this, but doesn’t know what happened. We’ll call this the “19/20 problem”.

Here’s an even worse situation. Tina and Tamara are counting their two respective houses, but today they are sharing one piece of paper. The paper says “20” because together, Tina and Tamara have counted 20 toys so far. Tina then counts 1 new toy and Tamara counts 1 new toy at the same time. They both go to write to the paper at the same time, but they don’t wait for each other. Tamara gets there first, sees the “20”, adds 1 in her head, and writes “21”. But before she finishes writing the “1” in “21”, Tina sees the “20”, adds 1 in her head, and writes “21”. Since Tamara started first, she finishes writing first. When Tina finishes writing, the paper says “21”, but it should say “22”. The paper is wrong and since Tina and Tamara are not keeping totals in their heads, the overall total will be wrong. We’ll call this the “21/22 problem”.

Both the 19/20 and 21/22 problems are the sorts of thing that can happen when two threads access the same region of memory at the same time. There are basically two solutions to this.

Synchronization is where everyone agrees in advance that only one person can be using the totals paper at a time. This resolves the data integrity problems, but can also result in deadlocks if implemented incorrectly.
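Translated into code, a minimal Python sketch of the 21/22 problem and the one-pencil-at-a-time agreement looks like this (names are illustrative):

    import threading

    count = 0                      # the shared "piece of paper"
    lock = threading.Lock()        # the agreement: only one person holds the pencil

    def count_toys(toys_in_house):
        global count
        for _ in range(toys_in_house):
            with lock:             # without this, both threads can read the same old
                count += 1         # number and write the same new one (the 21/22 problem)

    tina = threading.Thread(target=count_toys, args=(100_000,))
    tamara = threading.Thread(target=count_toys, args=(100_000,))
    tina.start(); tamara.start()
    tina.join(); tamara.join()
    print(count)                   # 200000 every time with the lock; it can come up short without it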

What is a deadlock? Suppose that in order to avoid the 21/22 scenario, Tina and Tamara agree that only one person can hold the pencil at a time, and the same for the paper: only one person at a time. This way, no one can make the mistake of looking at a partially written number. Suppose Tina picks up the pencil and at the same time, Tamara picks up the paper. Neither of them will be able to finish her own respective task, and they will wait for each other indefinitely because of their agreement. In this example, the deadlock could have been avoided by treating the paper and pencil as a single unit.

There is another problem with synchronization: it causes contention. What if Tabatha wants to know how many toys have been counted before Tina and Tamara are finished? Using synchronization (properly), only one person is allowed to read or write to a paper at a time. So, if Tabatha asks Tina and Tamara for their papers, they must both wait for Tabatha to finish doing her totals before they can start counting again. This wastes time.

Another solution is immutability. Instead of handing Tabatha their papers, Tina and Tamara each stop counting for a moment, make a(n ink) copy of their counting paper, and give it to Tabatha so she can do her totals. Making copies instead of using the originals allows Tina and Tamara to resume counting faster since they don’t have to wait for Tabatha to do her totals. Making the copies in ink instead of pencil is called immutability, and doing this ensures that the 21/22 and 19/20 -type errors don’t happen since no one is overwriting anything on the same paper. It also avoids deadlock, since again, no one is writing on the same paper.
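A matching sketch of the ink-copy idea, again with illustrative names: each worker writes only to her own count, and the supervisor totals frozen copies instead of reading the papers being erased and rewritten.

    class ToyCounter:
        def __init__(self, name):
            self.name = name
            self.count = 0                     # only this worker's own thread writes this

        def count_toy(self):
            self.count += 1

        def snapshot(self):
            return (self.name, self.count)     # the "ink copy": a tuple can't be changed

    def supervisor_total(counters):
        copies = [c.snapshot() for c in counters]   # may be a moment out of date, but the
        return sum(n for _, n in copies)            # numbers can't change mid-addition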

In programmer-speak, any mutable state should be isolated to a single thread.

Trade-Offs

What if when Tina calls for Tamara, Tamara takes 60 minutes to arrive? In that case, it doesn’t make any sense to even call Tamara unless there are 8 or more houses, since Tina will have counted the first 6 by herself by the time Tamara gets there.

Likewise, what if making an ink copy uses a whole pen and Tina has to run to the store for a new pen each time Tabatha asks for an ink copy? Tabatha would be better off just waiting until the end to receive the subtotals unless there are very many houses.

There is overhead to setting up and maintaining threads. It is not always better to use many threads because the overhead may be heavier than the task itself.

Summary

Threads are like little people in a computer who can only do one sequential thing at a time. “Thread safety” means handling race conditions like 19/20 and 21/22 via synchronization or immutability. Neither solution is inherently better (as discussed in the trade-offs section). Determining which to use depends on the situation.

Different tasks are better off single threaded or multithreaded, depending on their characteristics. Tasks which are divisible, heavy enough to outweigh thread overhead, and order-independent are the best candidates for multithreading.

Concurrency can be a little bit tricky, but when done right, the rewards are substantial.

At AngularJS workshops and in React internet articles, I’ve often seen instruction on using a live-reload Node.js server. A live-reload server is one that scans the project path for changes to files, and if something has changed, rebuilds the project and restarts the service. Some of them are even able to refresh the browser, so that, mere seconds after you save a file, you see the page updated. It’s an effective way to work because you get to see your changes right away, without a bunch of intermediary clicks and delays.

In this article, I’m going to explain how to set up a Spring Boot project to run with a live-reload server.

Setting Up a Basic Spring Boot Project


Head on over to Spring Initializr and generate a new Gradle project. (Gradle instead of Maven does matter in this case because we’re going to be using two special Gradle commands.) Select Web and DevTools in the dependencies box.

If you’re upgrading an old project, you’ll need to make sure that:

  • you’re using Gradle 2.5+
  • you have Spring Boot DevTools in your build.gradle file, in the dependencies section

Setting Up a Test Page

Extract the project folder and open it in your favorite text editor or IDE. Create a basic controller class that serves index.html.

Also create index.html in the src/main/resources/static directory.

At this point, you should be able to run the project and see index.html at localhost:8080.

Live Reloading

Now let’s set up live reloading.

  • If you have the project running, stop it.
  • Install the Live Reload browser extension on your browser of choice.
  • Open two terminals.
  • Navigate to the root directory of the project in each terminal.
  • Run the following two Gradle commands, one in each terminal (see the Gradle docs on continuous builds and the Spring Boot docs on the bootRun task for details).

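For a typical project they are Gradle’s continuous build and the Spring Boot plugin’s bootRun task (substitute ./gradlew for gradle if you use the wrapper):

    # terminal 1: rebuild whenever a source file changes
    gradle build --continuous

    # terminal 2: run the app; DevTools restarts it when the rebuilt output appears
    gradle bootRun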

  • Open your browser and navigate to localhost:8080.
  • Click the Live Reload icon in order to enable it for the current page.

That’s it. Now if you edit index.html, you should see the changes reflected instantly!

Extra Notes

  • If you run the project in Eclipse with Spring Boot DevTools included, you will get a SilentExitException on start up. This is normal and harmless, but there is a workaround for it here.
  • Hot Reloading vs Live Reloading
  • If you’re serving authenticated content and need a session to persist through service reboots, try adding Spring Session support and Redis.

In the previous articles in this series, I covered the essentials of getting started and getting involved with voice programming, and some best practices. This time, I’m going to talk about an issue that comes up as you begin to create more complex command sets and grammars: grammar complexity and the dreaded BadGrammar error.

First, a little background. BadGrammar is a kind of error which occurs in voice software which plugs into Dragon NaturallySpeaking, including Natlink, Vocola, and Dragonfly. For a long time, I thought it was caused by adding a sufficiently high number of commands to a grammar. Then the subject came up in the Github Caster chat, and I decided to create some tests.

A Bit of Vocabulary

Before I describe the tests, I need to make sure we’re speaking the same language. For the purposes of this article, a command is a pairing of a spec and an action. A spec is a set of spoken words used to trigger an action. The action, then, is some programmed behavior, such as automatic keypresses or executing Python code. Finally, a grammar is an object composed of one or more commands, which is passed to the speech engine.

The Tests

The first test I did was to create a series of grammars with increasingly large sets of commands in them. I used a random word generator and a text formatter to create simple one-word commands which did nothing else but print their specs on the screen. All tests were done with Dragonfly, so the commands looked like the following.
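Roughly like this, that is, though the rule name and words here are placeholders rather than the generated set:

    from dragonfly import Grammar, MappingRule, Text

    class GeneratedRule(MappingRule):
        mapping = {
            "alpaca": Text("alpaca"),   # spec -> action: just type the spec back out
            "bistro": Text("bistro"),
            "cobalt": Text("cobalt"),
            # ...thousands more generated one-word entries...
        }

    grammar = Grammar("generated test grammar")
    grammar.add_rule(GeneratedRule())
    grammar.load()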

After creating a (slow but usable) grammar with 3000 commands in it, my sufficiently-high-number-of-commands theory was shot. (Previously, I had gotten BadGrammar with about 500 commands.) Instead, it had to be, as Mark Lillibridge had speculated, complexity. So then, how much complexity was allowed before BadGrammar?

Fortunately, Dragonfly has a tool which measures the complexity of grammars. It returns its results in elements, which for our purposes here can be summarized as units of grammar complexity. There are many ways to increase the number of elements of a grammar, but the basic idea is, the more combinatorial possibility you introduce into your commands, the more elements there are (which should surprise no one). For example, the following rule with one Dragonfly Choice extra creates more elements than the above example, and adding either more choices (key/value pairs) to the Choice object or more Choice objects to the spec would create more still.
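For instance, a rule along these lines (names illustrative) has a single Choice extra:

    from dragonfly import MappingRule, Choice, Text

    class ColorRule(MappingRule):
        mapping = {
            "paint <color>": Text("%(color)s"),   # one spec, several spoken variants
        }
        extras = [
            Choice("color", {"red": "red", "green": "green", "blue": "blue"}),
        ]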

CCR grammars create exceptionally large numbers of elements because every command in a CCR grammar can be followed by itself and any other command in the grammar, up to the user-defined maximum number of times. The default maximum number of CCR repetitions (meaning the default number of commands you can speak in succession) in Dragonfly is 16.

With this in mind, I wrote a procedure which creates a series of increasingly complex grammars, scaling up first on number of Choices in a spec, then number of commands in a grammar, then max repetitions. (All tests were done with CCR grammars since they are the easiest to get BadGrammar with.)

The Results

The data turned up some interesting results. The most salient points which emerged were as follows:

  • The relationship between number of repetitions and number of elements which causes BadGrammar can be described by a formula. Roughly, if the number of max repetitions in a CCR grammar minus the square root of the elements divided by 1000 is greater than 23, you get BadGrammar.
  • These results are consistent across at least Dragon NaturallySpeaking versions 12 through 14 and BestMatch recognition algorithms II-V.
  • Multiple grammars can work together to produce BadGrammar. That is, if you have two grammars active which both use 55% of your max complexity, you still get the error.
  • If you use Windows Speech Recognition as your speech recognition engine rather than Dragon NaturallySpeaking, you won’t get a BadGrammar error, but your complex grammars simply won’t work, and they will slow down any other active grammars.

Implications

So what does all of this mean for you, the voice programmer? At the very least, it means that if you get BadGrammar, you can sacrifice some max repetitions in order to maintain your current complexity. (Let’s be honest, who speaks 16 commands in a row?) It also exposes the parameters for solutions and workarounds such as Caster’s NodeRule. It gives us another benchmark by which to judge the next Dragon NaturallySpeaking. Finally, it enables features like complexity warnings both at the development and user levels.

Grammars do have a complexity limit, but it’s a known quantity and can be dealt with as such.

In the prior two articles in this series, I went over the basics of getting started with voice programming, and talked a little bit about the history and community of it. In this article, I’m going to go over best practices.

Let me preface with this. Your personal command set and phonetic design are going to depend on a variety of factors: accent, programming environment and languages, disability (if any), usage style (assistance versus total replacement), etc. The following is a list of guidelines based mostly on my experiences. Your mileage may vary.

Use Command Chains

If I could only impart one of these to you, it would be to use continuous command recognition / command sequences. Get Dragonfly or Vocola and learn how to set it up. Speaking chains of commands is much faster and smoother than speaking individual commands with pauses in between. If you’re not convinced yet, watch Tavis Rudd do it.

Phonetic Distinctness Trumps All

When selecting words as spoken triggers (specs) for actions, keep in mind that Dragon must understand you, and unless you’re a professional news anchor, your pronunciation is probably less than perfect.

  • James Stout points out the use of prefix and suffix words on his blog, Hands-Free Coding. Though they do add syllables to the spec, they make the spec more phonetically distinct. An example of a prefix word might be adding “fun” to the beginning of the name of a function you commonly use. Doing so also gets you in the habit of saying “fun” when a function is coming up, which, believe it or not, is often enough time to think of the rest of the name of the function, allowing for an easy mental slide.

  • Use what you can pronounce. Don’t be afraid to steal words or phonemes from books or even other spoken languages. I personally think Korean is very easy on the tongue with its total lack of adjacent unvoiced consonants. Maybe you like German, or French.
  • Single syllable specs are okay, but if they’re not distinct enough, Dragon may mistakenly hear them as parts of other commands (especially in command chains). As a rule of thumb, low number of syllables is alright, low number of phonemes isn’t.

The Frequency Bump

When you speak sentences into Dragon, it uses a frequency/proximity algorithm to determine whether you said “ice cream” or “I scream”, etc. However, it works differently for words registered as command specs. Spec words get a major frequency bump and are recognized much more easily than words in normal dictation. Take advantage of this and let Dragon do the heavy lifting. Let me give you an example of what I mean.

Dragonfly’s Dictation element and Vocola’s <_anything> allow you to create commands which take a chunk of spoken text as a parameter. The following Dragonfly command prints “hello N” where N is whatever comes after the word “for”.
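A sketch of such a command (not the exact original):

    from dragonfly import MappingRule, Dictation, Text

    class FreeFormRule(MappingRule):
        mapping = {
            "for <stuff>": Text("hello %(stuff)s"),   # "for bar baz" types "hello bar baz"
        }
        extras = [Dictation("stuff")]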

I’m going to refer to these sorts of commands as free-form commands. Given a choice between setting up the following Function action with free-form dictation via the Dictation element, or a set of choices via the Choice element, the Choice element is the far superior um… choice.
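Sketched out with stand-in names, the two setups differ only in the extras:

    from dragonfly import MappingRule, Function, Dictation, Choice

    def do_some_action(parameter):
        print(parameter)                  # stand-in for the real behavior

    class ParameterRule(MappingRule):
        mapping = {
            "do some action <parameter>": Function(do_some_action),
        }
        # Free-form version: anything may follow the spec, so it is easily misheard.
        # extras = [Dictation("parameter")]
        # Choice version: "foo" and "bar" are registered as command words.
        extras = [Choice("parameter", {"foo": "foo", "bar": "bar"})]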

In this example, if you set up <parameter> as a Dictation element, Dragon can potentially mishear either “foo” or “bar”. If you set up <parameter> as a Choice element instead, all of the options in the Choice element (in this case, “foo” and “bar”) get registered as command words just like the phrase “do some action” does, and are therefore far more likely to be heard correctly by Dragon.

Anchor Words and the Free-Form Problem

Let’s say we have a free-form command, like the one mentioned above, and another command with the spec “splitter”. In this hypothetical situation, let’s also say they are both part of the same command chain grammar.

Usually, I would use the “splitter” command to print out the split function, but this time I want to create a variable called “splitter”. If I say “variable splitter”, nothing will happen. This is because, when Dragon parses the command chain, first it recognizes “variable”, then before it can get any text to feed to the “variable” command, the next command (“splitter”) closes off the dictation. This has the effect of crashing the entire command chain.

There are a few ways around this. The first is to simply give up on using free-form commands or specs with common words in command chains. Not a great solution. The second way is to use anchor words.
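A sketch of the second approach:

    from dragonfly import MappingRule, Dictation, Text

    class VariableRule(MappingRule):
        mapping = {
            "variable <name> elephant": Text("var %(name)s"),
        }
        extras = [Dictation("name")]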

In this modified version of the command, “elephant” is being used as an anchor word, a word that tells Dragon “free-form dictation is finished at this point”. So here, I can say, “variable splitter elephant” to produce the text “var splitter”.

Despite the effectiveness of the second workaround, I find myself getting annoyed at having to say some phonetically distinct anchor word all the time, and often use another method: pluralizing the free-form Dictation element, then speaking a command for the backspace character immediately after. For example, to produce the text “var splitter”, I could also say, “variable splitters clear”. (“Clear” is backspace in Caster.)

I am working on a better solution to this problem and will update this article when I finish it.

Reusable Parts

On the Yahoo VoiceCoder group site, Mark Lillibridge proposes two categories of voice programmers, what he calls Type I and Type II. Type I programmers optimize strongly for a very specific programming environment. Type II programmers create more generic commands intended to work in a wide variety of environments. Along with Ben Meyer of VoiceCode.io, I fall into the latter category. My job has me switching between editors and languages constantly, so I try to use lots of commands like the following.
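That is, commands built on keystrokes nearly every editor understands rather than on any one IDE’s menus (specs and bindings illustrative):

    from dragonfly import MappingRule, Key

    class GenericEditingRule(MappingRule):
        mapping = {
            "save it":        Key("c-s"),     # works almost anywhere
            "find it":        Key("c-f"),
            "duplicate line": Key("home, s-end, c-c, end, enter, c-v"),
        }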

I also try to standardize spoken syntax between programming languages. It does take extra mental effort to program by voice, so the less specialized commands you have to learn across the different environments you work in, the better.

What About You?

That’s all I’ve got. Have any best practices or techniques of your own? Leave them in the comments; I’d love to hear them!

UPDATE 3/26/2017: Saltbot now has its own subreddit, /r/saltbot

UPDATE 8/14/2016: Since the last update, Reconman has taken over maintenance and upgrades of Saltbot. Because he has added a lot of features, I am updating this article again.

UPDATE 5/16/2015: Chrome updated their App policy, so if you want to install Saltbot, you have to do so from the App Store.

Due to a recent surge of interest in Saltbot, the betting bot I created for Saltybet.com, I’ve decided to write this guide detailing its use, give its interface a facelift, and make available on Github a substantial but dated chunk of data which I gathered and used to develop the bot.

I’m going to go through its features in the order in which they appear in the UI. First then, are the four modes.

Betting Modes

Saltbot has four different modes which determine its basic behavior: Monk, Scientist, Cowboy, and Lunatic. The names are just for fun. Here’s how they work.

  1. Monk
    All four modes record match information after every match. Monk records information only, and doesn’t place bets.
  2. Scientist
    Scientist is the most accurate of the four modes. It uses all available information gathered from past matches to create a confidence score for each upcoming character. That score is then used to determine the selection and the betting amount. When determining its betting amount, it applies the confidence score to a flat amount which is itself determined by your total winnings meeting certain thresholds. Scientist requires about 5000 recorded matches to be usable.
    It also requires an evolved “chromosome” for its genetic algorithm to be effective. See the “Chromosome Management” section below.
  3. Cowboy
    Cowboy is a dumber (or more focused) version of Scientist. It only takes win percentage into account when making its selection. Also, unlike Scientist and Lunatic, it bets based on a percentage of your total winnings, not a flat amount based on a winnings threshold.
  4. Lunatic
    Lunatic doesn’t use stats at all. It flips a coin to determine its selection, then bets flat amounts, again, based on your winnings reaching certain thresholds.

For any of the modes to work, you have to be logged into a Salty Bet account. If you press F12, you can see their logic and messages in the developer console.

I should mention that the first match out of every hundred will be recorded with some information missing due to the auto refresh feature and the way in which the information is collected. This has little bearing on accuracy because the information which goes missing isn’t very important. However, if you close the Twitch window which the extension launches, lots of information will be missing. Leave it open.

Chromosome Management

In order to get started using Scientist, you have to first initialize the chromosome pool by clicking the “Reset Pool” button, and then set the chromosome evolution in motion by clicking the “Update Genetic Weights” button. While the genetic algorithm is running, it will freeze Saltbot’s UI until you switch tabs or click away from it. For best results, you should let the genetic algorithm run for at least fifty generations.

Between rounds of evolution, the messages box will be updated with three pieces of information: “g”, the generation number for this round of evolution (closing the extension resets this counter but doesn’t reset the pool); the current best chromosome’s accuracy when applied to all recorded matches; and the current best chromosome’s approximate winnings when applied to all recorded matches.

If you like, while the genetic algorithm is running, you can open the extension’s background window and watch the chromosomes evolve in the background window developer console by right clicking any part of the extension UI, and selecting “Inspect Element”. Maybe this is really nerdy, but during the development of this bot, I came to enjoy watching the chromosomes more than the matches.

Records

If you would like to make a copy of your database for backup or analysis, or share your records with your friends, you can use the import and export buttons to do so.

Options

Presently there is only one option: Toggle Video. This is intended for low-bandwidth users or users who wish to let Saltbot bet in a background tab and therefore don’t need the video panel consuming resources.

Betting Controls

In the two years since its creation, many Salty Bettors have turned up at the Github page and asked for more granular control of the automated betting. Reconman responded by adding the Betting Controls section and some options on the Configuration menu. (See below for Configuration menu details.)

  • The “aggressive betting up to” control allows you to multiply bets by 10 until the specified cash threshold is reached. (Not active during tournaments.)
  • The “stop betting at” control lets you stop bets after the specified cash threshold is reached.
  • The “betting multiplier” control lets you increase or decrease all bets by up to an order of magnitude. This feature stacks with the “aggressive betting up to” control, but does not stop at the “aggressive betting up to” threshold. (Not active during tournaments.)

New Features

Reconman has added a lot of new features since the original bot was written. They are as follows.

New: Character Database


The character database is accessed by clicking the grid icon at the top of the SaltBot UI. You can use it to view the raw data that Scientist and Cowboy modes use to make their decisions. You can also search by character name. The characters in the character database come from the character data that you collect each match, and any data you upload to the bot.

(Notes: The “strategy” column records which mode was active for that match. Monk = “obs”, Scientist = “cs”, Cowboy = “rc”, and Lunatic = “ipu”. In the “winner” column, 0 means red and 1 means blue.)

New: Configuration

There is only so much space on the SaltBot UI, and so some items have been moved to the Configuration menu. The Configuration menu can be accessed via the gear icon.

  • Exhibition Betting Toggle: Some players think that betting on Exhibition matches is inherently too random/risky and would prefer the bot not bet on them at all. Bets on Exhibition-mode matches can be toggled off via this menu option.
  • Tournament Options: There are settings to stop Saltbot from betting in tournaments after a certain cash threshold has been reached, and to always go all-in (or not).
  • Player Rankings: This used to be displayed in the F12 developer console, but now has a much cleaner-looking display on the Configuration page. SaltBot tracks player data as well as character data. The most frequent bettors’ betting stats can be viewed via these buttons.

New: Help and Github links

The question mark and Github icons lead here and to SaltBot’s Github pages, respectively. If it’s not apparent from the comments below, I no longer maintain SaltBot and so questions and concerns should be directed to the Github page where Reconman and a few others work on it. Also, Reconman is very patient, but for his sanity, if you have a bug you’d like to report, please read the bug reporting guide. It’s short.

Getting Started

To install the available historical data, download “65k records without exhibitions June 2016.txt” from Github, or one of the other seed data files, and import it with the “Import Records” button. (Alternatively, you can let Monk mode gather your own data for you for a while.) From there, you can switch to Scientist mode and SaltBot will take over for you. Happy betting!

Last time, I talked about why Dragon NaturallySpeaking (version 12.5) is currently the best choice of speech engine for voice programming, and how Natlink ties Dragon to Python. Natlink is a great tool by itself, but it also enables lots of other great voice software packages, some of which are explicitly geared toward voice programming, and all of which also offer accessibility and productivity features. In this article, I’m going to go over the purposes and capabilities of a number of these, their dependencies, and how to get started with them.

I’m also going to rate each piece of software with a star rating out of five. The rating indicates the software’s voice programming potential (“VPP”), not the overall quality of the software.

The Natlink Family

Unimacro, Vocola*, and VoiceCoder all require Natlink and are developed in close proximity with it, so I’ll start with them.

Unimacro (documentation/ download) [VPP: ]

Unimacro is an extension for Natlink which mainly focuses on global accessibility and productivity tasks, like navigating the desktop and web browser. It’s very configurable, requires minimal Python familiarity, and as of release 4.1mike, it has built-in support for triggering AutoHotkey scripts by voice.

Unimacro is geared toward accessibility, not voice programming. Still, it’s useful in a general sense and has potential if you’re an AutoHotkey master.

Vocola (documentation | download) [VPP: ]

Vocola is something of a sister project to Unimacro; their developers work together to keep them compatible. Like Unimacro, Vocola extends Dragon’s command set. Unlike Unimacro, which is modified through the use of .ini files, Vocola allows users to write their own commands. Vocola commands have their own syntax and look like the following.
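A couple of illustrative commands in that syntax (spoken form on the left, keystrokes on the right):

    Copy That = {Ctrl+c};
    Paste That = {Ctrl+v};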

When you start Dragon, Vocola .vcl files are converted to Python and loaded into Natlink. Vocola can also use “extensions”, Python modules full of functions which can be called from Vocola. These extensions are where Vocola’s real power lies. They allow you to map any piece of Python code to any spoken command.

Vocola’s syntax is easy to learn, it’s well-documented, and it gives you full access to Python, so it’s a powerful step up from Natlink. However, it’s not quite as easy to call Python with as Dragonfly is and its community became somewhat displaced when the SpeechComputing.com forum site went down. (Though it has regrouped some at the Knowbrainer.com forums.)

VoiceCoder (download) [VPP: ]

Also sometimes called VoiceCode (and in no way related to voicecode.io), VoiceCoder’s complex history is best documented by Christine Masuoka’s academic paper. VoiceCoder aims to make speaking code as fluid as possible, and its design is very impressive. (Demo video – audio starts late.)

For having been around for so long, it still has a fairly active community. Mark Lillibridge recently created this FAQ on programming by voice with the help of said community.

VoiceCoder does what it does extremely well, but that’s pretty much limited to coding in C/C++ or Python in Emacs. Furthermore, since its website and wiki are down, documentation is sparse.

Dragonfly (documentation | download | setup) [VPP: ]

In 2008, Christo Butcher created Dragonfly, a Python framework for speech recognition. In 2014, he moved it to Github, which has accelerated its development. The real strength of Dragonfly is its modernization of the voice command programming process. Dragonfly makes it easy for anyone with nominal Python experience to get started on adding voice commands to DNS or WSR. Install it, copy some of the examples into the /MacroSystem/ folder, and you’re ready to go. Here’s an example.
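A minimal sketch of such a module (not one of the shipped examples):

    from dragonfly import Grammar, MappingRule, Key, Text, Dictation

    class ExampleRule(MappingRule):
        mapping = {
            "save file":  Key("c-s"),
            "new tab":    Key("c-t"),
            "say <text>": Text("%(text)s"),   # types back whatever was dictated
        }
        extras = [Dictation("text")]

    grammar = Grammar("example commands")
    grammar.add_rule(ExampleRule())
    grammar.load()

    def unload():                             # standard cleanup hook for Natlink modules
        global grammar
        if grammar:
            grammar.unload()
        grammar = None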

It’s pure Python and, unlike vanilla Natlink, its API is very clean and simple. It’s well documented and it has a growing community on Github. Furthermore, the _multiedit.py module enables fine-grained control of CCR. (CCR is “continuous command recognition”, the ability to string spoken commands together and have them executed one after the other, rather than pausing in between commands. Vocola and VoiceCoder can also use CCR, but not with as much flexibility.)

The Dragonfly Family

Due to Dragonfly’s ease of use, a large number of Dragonfly grammar repositories have appeared on Github recently, as well as some larger projects.

Aenea (Github) [VPP: ]

Aenea is software which allows you to use Dragonfly grammars on Linux or Mac via virtual machine (or on any remote computer, for that matter). Getting it set up is pretty complicated but worth it if your workspace of choice (or necessity) is not a Windows machine. Aenea is basically a fully realized version of what Tavis Rudd demonstrated at Pycon.

Damselfly (Github) [VPP: ]

Damselfly lets you use Dragonfly on an installation of Dragon NaturallySpeaking on Wine on Linux, cutting Windows out of the picture entirely.

Caster (download | Github/ documentation) [VPP: ]

Like VoiceCoder, Caster aims to be a general purpose programming tool. Unlike VoiceCoder, Caster aims to work in any programming environment, not just Emacs. It improves upon _multiedit.py by letting you selectively enable/disable CCR subsections. (“Enable Java”, “Disable XML”, etc.) It allows you to run Sikuli scripts by voice. And, like VoiceCoder, it uses Veldicott-style fuzzy string matching to allow dictation of strange/unspeakable symbol names. It is still in active development and contributions or suggestions are welcome.

Some Final Thoughts on Open Source Voice Programming

The open source voice programming community has come a long way since its days of obscure academic papers and mailing lists. Community spaces have developed and produced fruit. The projects listed above are downright awesome, all of them, and I’m sure more are coming. There is still much to be achieved, much low-hanging fruit to be plucked even. All this has happened in fifteen years. I’m looking forward to the next fifteen.

 

* To be more precise, Vocola 2 requires Natlink. There are several versions, and the most recent version does not.

Between 2008 and 2013, I began to suffer from carpal tunnel syndrome. During that same time, I realized how much I love programming. When I saw Tavis Rudd’s Pycon talk, I thought to myself, “That’s awesome,” then, “I wonder if I can make that work.” I have, and I’ve learned a lot in the process. In this series of articles, I’m going to lay out what I’ve learned about voice programming, the software that enables it, and the growing community which uses and maintains that software.

Speech Recognition Engines

The first part of programming by voice is getting a computer to recognize what you’re saying in human readable words. For this, you need a speech recognition engine. There are a lot of voice-recognition projects for Linux. Windows Speech Recognition is decent. Google’s speech recognition is also coming along. But if you intend to program by voice, there’s really only one show in town.

Dragon NaturallySpeaking

Dragon NaturallySpeaking, by Nuance, is a Windows program which turns spoken audio into text. It uses the same technology which powers Apple’s Siri, and it’s available in quite a few languages, such as German, Spanish, Japanese, and of course, English. Dragon is king not because of its fancy features (posting to Twitter/Facebook, controlling Microsoft software, etc.), but simply because it’s a lot more accurate than the competition.

At this point in the article, I’d love to tell you all about how to use Dragon, and what it can do, but Nuance has already done an excellent job of providing instructional materials and videos, so I’ll simply link you to their stuff: Dragon PDF files, Dragon instructional YouTube channel.

Improving Performance

Although Dragon’s recognition is amazing out-of-the-box, there are a few things you can do to improve the speed and accuracy of recognition even more.

  1. Upgrade your hardware.
    Nuance recommends 8 GB of RAM, and users on the Knowbrainer forums have reported significant performance gains with better microprocessors and soundcards. Also, although Dragon supports many kinds of microphones (including your iPhone/Android!), recognition is generally better with a powered USB microphone.
  2. Use optimal profile settings.
    When creating a profile, click on “Advanced” and choose a Medium size vocabulary and BestMatch IV. BestMatch V has performance issues with Natlink.
  3. Train your Dragon.
    Select “Audio” -> “Read text to improve accuracy” and do some readings so Dragon can adjust to your voice. Although Dragon 13 eliminated this step, as mentioned in the previous article, Dragon 12 is best for programming. In my experience, one or two readings are all Dragon needs and it’s diminishing returns after that.
  4. Speak in short phrases.
    Whole sentence dictation is better for natural language sentences (because Dragon is able to deduce from context which words you said), but spoken command chains (for which Dragon’s context is useless) require perfection and can’t be corrected with Dragon’s Select-and-Say.

There is no Programmer Edition

As you can see, Dragon can probably transcribe faster than you can type, and it’s useful for common office tasks and Internet browsing. That said, it isn’t intended for dictating code. In order to dictate the following JavaScript snippet, you would have to say,

“spell F O R space open paren I N T space I equals zero semicolon”

just to get to the first semicolon:

Obviously unworkable. You’d be better off trying to type it out with your feet. Dragon’s technology is geared toward recognizing spoken language syntax, not programming language syntax. (It is possible to create custom commands, including text commands, with Dragon’s Advanced Scripting if you buy the professional version, but the Advanced Scripting language is cumbersome, limited, and really not all that advanced.)

If it’s so terrible at dictating code, why am I recommending* it wholeheartedly?

Natlink

In 1999, Joel Gould created Natlink, an extension for Dragon which allows a user to create custom voice commands using Python rather than Advanced Scripting/ Visual Basic. This was huge. It meant that Dragon’s super-accurate word recognition could be chained to the expressive power of the entire Python language!

Gould went on to document and open-source his work. Natlink has been hosted at SourceForge since 2003 and has been maintained for some time by Quintijn Hoogenboom and a few others as new versions of Dragon have been released. At present, it gets about 300 downloads per month, and for the last few years, the number of downloads it’s gotten annually has increased by about 1000 per year. It’s used by a number of other popular voice software packages, including VoiceCoder, Vocola, Unimacro, and Dragonfly.

With Natlink, instead of spelling out almost every single character manually, you might make a macro which handles the creation of for-loops, and another which handles printing to the console. Then, in order to speak the above JavaScript snippet into existence, you would only say

for loop five, log I

and be finished with the whole thing. Quite an improvement, isn’t it?
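As a rough sketch, using Dragonfly (one of the Natlink-based packages mentioned above) and with illustrative specs and output, those two macros might look like this:

    from dragonfly import Grammar, MappingRule, IntegerRef, Dictation, Text

    class JsRule(MappingRule):
        mapping = {
            # "for loop five" -> a counted for-loop skeleton
            "for loop <n>": Text("for (var i = 0; i < %(n)d; i++) {}"),
            # "log I" -> a console.log call for whatever was dictated
            "log <name>": Text("console.log(%(name)s);"),
        }
        extras = [IntegerRef("n", 1, 100), Dictation("name")]

    grammar = Grammar("javascript helpers")
    grammar.add_rule(JsRule())
    grammar.load()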

Getting Started – Not Yet

In order to install Natlink, follow the instructions here. Then you can read the documentation here and here, and look at the sample macros here. But don’t bother with the Natlink docs and examples, at least not yet. Most likely, whatever you’re hoping to achieve can be done more cleanly and easily with one of the other software packages mentioned above, VoiceCoder, Vocola, Unimacro or Dragonfly. In the next article, I’ll talk about each of these, and some interesting projects in the Dragonfly family.

 

* Due to app compatibility issues with Dragon 13, I decided to downgrade back to version 12.5, and so it is version 12.5 that I recommend, not version 13.

In 2013, my carpal tunnel was beginning to become unbearable. Every day, I would come home with wrists burning and fingertips tingling. In an effort to alleviate my symptoms, I started to try alternate input hardware. In this article, I will describe my experiences with some of these alternate setups.

The “Minority Report” Setup

Using CamSpace and the finger of a glove which I had colored bright green, I put together a sort of ghetto Minority Report interface. Cool as it was, there were two major problems. The first was that it was annoying to hold my hand up in front of the camera constantly. The second was that it was really sensitive to changes in lighting and the angle of my hand.

Optical Finger Mouse

My thought with this one was that maybe I could casually wave my hand over the desk rather than gripping a mouse. It didn’t work out that way. The parts are rather cheap, and just like the older generation of optical mice, the laser has to sit flat on the table, preferably on a mousepad, preferably one that is dark-colored and not reflective.

One Finger Mouse

This one promises to free your hand from the desk altogether, and it does, but unless you have child sized hands, you’re going to strain your thumb trying to reach the trackball (and use it in general). Furthermore, the trackball gets dirty easily and starts sticking, requiring further thumb effort.

Evoluent VerticalMouse

The Evoluent was actually pretty good. It’s probably the best version of a wired old-style mouse that will ever exist. It did allow me to stop pronating my wrist. I ultimately rejected it because (like all mice), it sat far off to the side of my keyboard, requiring me to reach out quite far in order to move the cursor from left to right on the screen.

Kensington Expert Mouse

I decided to try a ball mouse. Looking around on Amazon, the Kensington Expert Mouse seemed to get consistently good reviews, so it was the one I ordered. I thought it would be ridiculously difficult to use, but it wasn’t. (I played all three Mass Effect games with it.) It did cause me to pronate my wrist some, but the pronation was far less than any other mouse except for the Evoluent. Since a ball mouse requires a smaller surface area, I also didn’t have to reach for it. While not perfect, I found it to be easier on my hands than all of the other mice I’d tried. I own two now, in case one breaks. Another bonus is that you can hold it in two hands like a Dreamcast controller, in handshake position, which is about as ergonomic as it gets.

Ergo Touchpad

This thing seemed like it had a lot of potential. If I could mount it anywhere, I could figure out a position which didn’t hurt my hands. I got pretty creative, but in the end, the Ergo Touchpad made both my wrist and my fingers hurt (as opposed to just my wrists).

EyeTech EyeOn

While technically impressive, the problem with eye tracking seems to be human eyes. The EyeOn tracked my eye movements very accurately, but human eyes flicker all over the place instead of settling on their targets like mice do. It’s possible that they might fix that in the software eventually, but as of late 2013, it was pretty unusable.

Webcam Eye Tracking Software

I tried a few different eye tracking software packages too. None of them worked as well as the EyeTech did, which is to say, they were all pretty horrible.

Leap Motion

When I saw the video for the Leap Motion, I got excited. When I tried it, I lost that excitement. It’s a nice toy, nothing more.

Dell E2014T Touch Screen LED-Lit Monitor

Initially I dismissed the possibility of using a touchscreen monitor because I figured reaching for it wouldn’t work, and I still think so, if you’re sitting down. However, because I later wanted a setup in which I could switch between standing and sitting, I purchased the E2014T to use while standing up, and for that it works quite well. My one complaint is that if you don’t touch the screen for a few minutes, it seems to fall asleep and the next few times you touch the screen, it is unresponsive. I often find myself tapping the screen until I see the little tap recognition animation and then going on to do the real tap. Still, it gives my thumbs a break from the trackball.

Microsoft Natural Keyboard Elite

The other ergonomic keyboards I tried, with the exception of the Kinesis, aren’t even worth mentioning here. In terms of comfort, the Natural Keyboard Elite was just the best. I have two now. The wrist pad is at just the right height. The angle of the keys is pretty good. Using it all day still hurt my wrists, but much less than other keyboards.

Kinesis Freestyle

Though all the standard setups for it (even with the accessory set) were less comfortable than the Natural Keyboard Elite, what I like about the Kinesis is that it gives you the ability to experiment.  With a little DIY spirit, anyone should be able to make it into a better keyboard than anything else available, because it’s not one-size-fits-all like the rest of them.

Dragon NaturallySpeaking

What’s better than a comfortable keyboard? No keyboard at all. Though Dragon does have a bit of a learning curve, I’ve found it to be completely worth it and wholeheartedly recommend it as a keyboard alternative. Even for programmers, there are (free) Dragon add-ons which enable programming by voice, including VoiceCode, Aenea, and (my project) Caster.

Concluding Remarks

So what did I ultimately choose? My current setup has Dragon NaturallySpeaking instead of a keyboard, the touchscreen monitor for standing, and the trackball for sitting. I haven’t had anything remotely resembling a standard setup in about 18 months. The result is that the burning and tingling have gone away completely, and all that remains of my carpal tunnel is occasional stiffness and soreness from extended use of the trackball. This is a better recovery* than I hoped for, but I think there’s still a lot of room for innovation. Come on hardware hackers, give me an opportunity to give you my money.

* It’s also worth mentioning here that after using the Natural Elite / Kensington combo for a while, I went to see a physical therapist. She had me wear night splints and did an ergonomic evaluation/ correction of my sitting posture at work. Those two things alone reduced my symptoms by about 60%.

This article recounts the story of how I became one of the wealthiest 100 players on a virtual betting site with over 10,000 active users. I was looking to sharpen up my JavaScript skills when I came across a mention on Hacker News of what turned out to be the perfect learning project opportunity: Salty’s Dream Casino.


Salty’s Dream Casino

A little bit of background is in order. Salty’s Dream Casino, a.k.a. SaltyBet, is a website whose main feature is an embedded Twitch video window and chat. The video window shows a video game called M.U.G.E.N. running 24/7. The game is a fighting game, and if you sign up for an account on SaltyBet.com, you are given 400 “salty bucks” and can bet on who will win the fights. It’s all play money of course, and if you run out, you automatically get a minimum amount so that you can continue betting. There are no human players controlling the characters; they are computer-controlled, some with better or worse AI. (Some have laughably bad AI, but that’s part of the fun.) The characters are all player-created, and there are over 5000 of them spanning five tiers: P, B, A, S, X.

What I saw in SaltyBet was fast iteration for development (most matches are over in 1-2 minutes, the perfect amount of time to fix the logic and come back), a fun project, and data that would translate well into features for some of the machine learning algorithms that I’d been studying.

My Short Trip to the “0.1%”

As I would be needing to inject JavaScript into the site, I decided to go the route of a Google Chrome extension. I didn’t know anything about browser extensions at the time, so that would also be a great learning opportunity. The initial step was to create a basic runtime that ran in parallel to the SaltyBet fighting match cycle. I set it up: the very first version of the bot picked a side randomly and bet 10% of total winnings.

Happy that I’d gotten the extension working, I left it on overnight. When I woke up, my ranking was #512 out of 400,000 total accounts, with $367,000. As I suspected, this was just a great stroke of luck. When I got home that day, the bot had pissed away most of the money.

The Progression of Strategies

In order to create strategies any more advanced than a coin toss, I would need to collect data. I implemented some basic stat collection using Chrome Storage, as well as records import and export, then got to work on the first real strategy, “More Wins”. As its name implies, it simply compared wins and losses in order to determine which character to bet on. If there was a tie, it resorted to a coin toss again.

I plugged in RGraph to see just how much better “More Wins” was doing than a coin toss. I was dismayed to see that although the coin toss strategy had the expected 50% accuracy rate, “More Wins” was at 40%! After modifying the bot to print out its decision-making logic at betting time, I realized that (A) lots of matches were being decided by coin tosses, and more importantly, (B) lots of bad calls were being made because I didn’t have enough data to effectively compare wins and losses. For example, if there were a matchup between a very strong character whom I didn’t have any data on, and a relatively weak character with one win and a bunch of losses, the loser’s single recorded win would trump the zero recorded wins of the champ.

(At this point, of course I could have signed up for a premium account and gotten full access to character statistics, but where’s the fun in that? Besides, I wanted my bot to be able to work with limited data, since premium accounts really just had a larger amount of limited data.)

My solution was to create “More Wins Cautious”, which would only make a bet if it had at least three recorded matches for each character. While MWC did do a few percentage points better, it almost never bet. Not a good solution.

I had a bit more data by this point and had also started to realize that comparing wins and losses both rewarded and penalized popular characters more than it should have. Suppose, for example, that a crowd favorite like “BenJ” shows up constantly and has 30 recorded wins against 45 recorded losses, while a far stronger but rarely-seen character has only 4 recorded wins and no recorded losses.

Comparing raw win counts picks BenJ every time: he is being rewarded for being popular rather than effective. My next strategy, “Ratio Balance”, compared win percentage rather than number of wins. This yielded a fairly significant improvement: 55% accuracy.

Enter Machine Learning

Wanting to apply some of the machine learning material I’d been studying recently, I upgraded the bot to collect more information than just wins and losses. Now, for each match, it recorded match odds, match time (for faster wins and slower losses), the favorite of bettors with premium accounts (who constitute about 5% of total active accounts), and the crowd favorite. The next version of the bot, “Confidence Score” combined all of those features in order to make its decision.

But how to weigh the different features? The problem was a good fit for a genetic algorithm, so I put all of those weights on a chromosome and created a simulator which would go back through all the recorded matches and try out different weighting combinations, selecting for accuracy. The accuracy immediately leapt to 65%! In the days that followed, the chromosome class underwent a lot of changes, but its final form looked like the following.
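The real class lives in the bot’s JavaScript; as a rough sketch of its shape (field names illustrative, weights arbitrary), it held one set of feature weights per tier:

    # One weight per recorded feature, per tier (P, B, A, S, X, plus U).
    chromosome = {
        tier: {
            "win_percentage": 1.0,
            "odds":           0.4,
            "match_time":     0.1,
            "crowd_favor":    0.0,
            "premium_favor":  0.0,
        }
        for tier in ["P", "B", "A", "S", "X", "U"]
    }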

I also had the bot change its betting amount based on its confidence in its choice. This, more than any of the features of the data, turned out to be really important. It caused my winnings to stop fluctuating around the $20,000 mark and to instead fluctuate around the $300,000 mark. Then, I had the simulator select for (money * accuracy) instead of just accuracy, and my virtual wealth moved up into the $450,000 range despite the accuracy decreasing slightly. Of course by this time, I’d realized that most of the 400,000 accounts on SaltyBet were inactive, so I wasn’t really in the 1% yet.

Analysis

Before we move on, let’s take another look at that chromosome. There are a number of interesting facts which emerge, which aren’t intuitively obvious. For this reason, I’ve come to enjoy watching the chromosomes evolve more than watching the actual matches.

  • Win percentage dominates everything else. I actually put an anti-domination measure into the simulator which penalized, by 95%, any chromosome that had one weight worth more than all of the others combined. (Doing so minimized the damage when the bot guessed wrong. Surprisingly, this didn’t hurt the accuracy much: only small fractions of a percent.)
  • Crowd favor and premium account favor are completely worthless.
  • Though success and failure do count differently in different tiers, the distribution is far from uniform. For example, wins and odds in X tier count an order of magnitude more than almost everything else, but match times in X tier aren’t that important.
  • Not shown here is that the chromosome formerly included confidence nerfs, conditions like “both characters are winners/losers” or “not enough information” which would decrease the betting amount if triggered, and also switches to turn the nerfs on and off (like epigenetic DNA). The simulator consistently turned off all of my nerfs, so I got rid of most of them. The true face of non-risk-aversion.

You’ll also notice that there’s a tier, “U”, on the chromosome which I didn’t mention before. Due to some quirks of the site, my information gathering isn’t perfect. So, “U” stands for Unknown.

Why Not Also Track Humans?

I had learned a ton about JavaScript (like closures and hoisting!) and browser extensions, and was pretty happy with the project. My bot swung wildly between $300,000 and $600,000 with 63% accuracy. I started to wonder how accurate the other players were, and realized I could also track them. So I did. I collected accuracy statistics on players for 30 days. This unearthed a few more interesting facts.

  • At 63% accuracy, my bot was in the 95th percentile. The most accurate bettor on the site bets at about 80%.
  • Players with premium accounts bet 6% more accurately than free players, on average.
  • Judging from the number of bets made, there were obviously other bots on the site.
  • Some players who were significantly richer than I was bet with much lower accuracy. One of them bet with 33% accuracy.

That last item in particular interested me. How could this be? … Upsets! I went back to the simulator and pulled out more statistics. By this time, I had quite a bit of data.

  • The average odds on an upset were 3:1.
  • The average odds on a non-upset were 14:1.
  • Upsets constituted 23% of all matches.
  • My bot was able to call 41% of all upsets correctly.
  • My bot was able to call 73% of all nonupsets correctly.

Roughly, the expected return per dollar of a flat bet is each outcome’s payout weighted by how often it happens: a correctly called upset pays about 3 (at 3:1 odds), a correctly called non-upset pays only about 0.07 (at 14:1 odds), and a missed call of either kind loses the dollar.

(3 * 0.41 * 0.23) + (-1 * 0.59 * 0.23) + (0.07 * 0.73 * 0.77) + (-1 * 0.27 * 0.77) = -0.02

If I switched the bot to pure flat bet amounts, it would take a loss, but it would almost break even! With just a little bit of tweaking, it might be able to get into the black in a stable, linear way, rather than all the wild swings around a threshold. (I was still betting 10% of total winnings at this time.) It also occurred to me that, since my bot was on 24/7, there were lower-traffic times during which it could actually move the odds far enough that it would hurt itself. That too would be minimized by flat bets.

I switched the bot over to flat betting amounts based on total winnings. (Meaning, it was allowed to bet $100 until it passed $10,000, then $1000 until it passed $100,000, and so forth). I watched for a while and experimented with different things. What finally seemed to work was applying the confidence score to flat amounts, rather than the original 10% of totals. (So, the amount to bet was now (flat_amount * confidence)). That did the trick: my losses were instantly cut by 10%, which meant I was getting a penny back on every dollar bet, on average. My rank has been steadily rising ever since. No more wild swings or caps, just slow wealth accumulation.


I don’t work on the bot anymore, but I leave it on, 24/7. I come home from work and see another $100,000 accrued, and smile. Sometimes it drops $100,000 instead, but the dips are always temporary. Since I started writing this article, it has accumulated $40,000. If you care to try it out yourself, or perhaps improve it somehow, please fork it. It’s on Github.

Previously, I went over the process of writing an Away3D parser but skipped how to actually interpret the data in a file. In this post, I’m going to talk about the structure of an OBJ file (less complicated than SOUR), how it’s typical of all 3D files, and how in Away3D or any 3D engine one would turn its data into a usable 3D object.

First, let’s actually look at a 3D file, a simple cube saved by Blender. This cube has sides 2 Blender units long and is centered perfectly. (Hence all the 1’s and -1’s.) It has a material file and is UV-mapped. Unlike SOUR, it uses quads (four-pointed faces like rectangles, diamonds, or squares).
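An abridged, illustrative stand-in for the file’s contents (a real export will differ in exact values and ordering, and will carry more UV entries):

    # Exported from Blender
    mtllib cube.mtl
    v 1.000000 1.000000 -1.000000
    v 1.000000 -1.000000 -1.000000
    v -1.000000 -1.000000 -1.000000
    v -1.000000 1.000000 -1.000000
    v 1.000000 1.000000 1.000000
    v 1.000000 -1.000000 1.000000
    v -1.000000 -1.000000 1.000000
    v -1.000000 1.000000 1.000000
    vt 0.000000 0.000000
    vt 1.000000 0.000000
    vt 1.000000 1.000000
    vt 0.000000 1.000000
    usemtl Material
    s off
    f 1/1 2/2 3/3 4/4
    f 5/1 8/2 7/3 6/4
    f 1/4 5/3 6/2 2/1
    # ...three more f lines for the remaining faces...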

Alright, let’s go through it.

#
The lines that begin with a pound sign (#) are comments. They can be deleted and it will have no effect on the file.

mtllib
The lines that begin with “mtllib” refer to the MTL material files exported alongside the OBJ file. In my experience, MTL files are fairly useless, so I’m not going to cover how they work here except to say that they do not contain the PNG or JPG file used for the texture, and that you can delete that line with little effect. Same for “useMtl” lines.

v
The lines that begin with “v” are vertices. You can see that each has an X, Y, and Z value. Furthermore, they are “indexed”. More on that later.


vt
The lines that begin with “vt” are texture vertices. They’re in 2D space, so they only have X and Y coordinates. They will for the rest of this post be called “UVs”, since that’s more common.


useMtl, s
Next are a few more useless lines. The line with “useMtl” was discussed above. The one with “s” has to do with smoothing groups, which applies to 3DS Max only.

f
Next up are the lines starting with “f”. F is for face. This is where it gets interesting. Remember how I said the vertices were indexed? Each corner of a face is written as a pair of numbers: the first refers to a vertex, a “v”, and the second refers to a UV, a “vt”, from the above lists of vertices and UVs. You’ll see that there are eight vertices and that the first number of each pair stays in the range 1-8. See where this is going? The first face is a quad made of vertices 1, 2, 3, and 4.
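In code, turning those indices back into points is just a list lookup (coordinates taken from the illustrative listing above; note that OBJ indices are 1-based):

    vertices = [(1, 1, -1), (1, -1, -1), (-1, -1, -1), (-1, 1, -1),
                (1, 1, 1), (1, -1, 1), (-1, -1, 1), (-1, 1, 1)]

    face = [1, 2, 3, 4]                          # the "v" half of one "f" line
    corners = [vertices[i - 1] for i in face]    # 1-based OBJ index -> 0-based list index
    # corners == [(1, 1, -1), (1, -1, -1), (-1, -1, -1), (-1, 1, -1)]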

By plotting those four points in 3D space, we reconstruct the first of the six quad faces of a cube. “Indexing” means listing each of the eight vertices only once and then referring to that vertex by its “index” number in the face information. This saves space and bandwidth. The alternative would be listing the actual “v” vertex information per face. The UVs are also indexed.

So, in a custom parser, if you were parsing line by line like the OBJ and SOUR parsers do, you’d read a line, split it on spaces via something like AS3’s String.split() function, throw out the first piece, and use the rest to form a vertex or UV. Once you had a list of vertices, UVs, and whatever other data was relevant, you’d use your 3D engine’s methods/functions to construct the Mesh object and put it onscreen. In Away3D, that means building the geometry from those lists, wrapping it in a Mesh with a material, and adding the Mesh to the scene.
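Sketched in Python rather than AS3, and assuming a hypothetical cube.obj like the one above, that line-by-line step looks something like this:

    vertices, uvs, faces = [], [], []

    with open("cube.obj") as obj_file:
        for line in obj_file:
            parts = line.split()                 # split the line on whitespace
            if not parts:
                continue
            tag, data = parts[0], parts[1:]      # throw out the first piece, keep the rest
            if tag == "v":
                vertices.append(tuple(float(x) for x in data))
            elif tag == "vt":
                uvs.append(tuple(float(x) for x in data))
            elif tag == "f":
                # each corner is "vertexIndex/uvIndex"
                faces.append([tuple(int(i) for i in corner.split("/") if i)
                              for corner in data])
            # "#", "mtllib", "usemtl", and "s" lines are simply skipped

    # vertices, uvs, and faces are then handed to the engine to build the final Mesh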

And that’s really it. Parsing is just taking data in one form, and making it readable to whatever needs to use it.