RSS
 

Building and checking lists in JavaScript

16 Sep

Recently, I was building a white-list of strings that I needed to check an input string against. It reminded me to write about something I wanted to write about for a while now.

First, lets take a look at the different ways we can define a list of strings:

Using arrays as lists

Normally, this is the first thing you would think of when building a list of strings. It might look something like this:

var list = [ "item1", "item2", "item3", "item4", "item5" ];

if (list.indexOf("item3") > -1) {
}

Fairly straightforward, right? Array lists are useful if you’re iterating over them and using each value for something, but when it comes to white-listing or black-listing something, it gets a little more complicated.

Advantages

  • Easy to define and maintain.
  • Can iterate over and perform a task on each item
  • Simple method to add values — list.push("item6");.
  • ES5 makes it easy to search for value existence — list.indexOf("item3"); //-> 2.

Disadvantages

  • Can be awkward to avoid adding duplicate values at runtime.
  • Non-ES5 implementations require a shim or a for loop to search for value existence.
  • The larger the array, the more performance significantly degrades.

Using a regular expression

I see this method used quite a lot, it’s a simple regular expression with all the words joined by the pipe symbol | for alternation:

var list = /^(?:item1|item2|item3|item4|item5)$/;

if (list.test("item1")) {
}

This is a decent method for small lists that are intended to be constant and never modified at runtime.

Advantages

  • More widely supported than Array.prototype.indexOf.
  • Modern regex implementations appear to be able to cope well with even large lists.
  • Case-insensitive searches are easier, just add the /i modifier.

Disadvantages

  • Regular expressions require further understanding than basic JavaScript, it’s easy for naive users to make a mistake (such as missing ^ and $).
  • Difficult to modify at runtime, creating a new regexp may require string escaping.
  • Less readable and maintainable when the list is large.
  • No best of both worlds, you can’t use the regex to iterate over and perform a function on each list item.

Using an object

This is my preferred method of defining a list, offering the most flexibility and the fastest lookup methods.

var t = true, list = {item1:t, item2:t, item3:t, item4:t, item5:t};

if ("item3" in list) {
}
if (list.hasOwnProperty("item3")) {
}
if (list.item3) { // "truthy" values only
}

I defined the list with true as the value for each item, but you could assign null or undefined if you want to use the in operator or hasOwnProperty.

Advantages

  • Easy to iterate over and do something with each item/property.
  • Custom values can be used for even more flexibility.
  • Very easy to add/remove items at run time, with no chance of duplicate items.
  • Code is easily readable, writeable and maintainable. Omitting string delimiters for property names makes for the same amount of typing as an Array (if assigning a single character variable/value).

Disadvantages

  • Lots of seemingly redundant code when not actually using values. It’s more typing than should be needed!
  • Potential naming collisions with Object.prototype properties if using the in operator or boolean checks.

Some slight changes to Object literal grammar would make this approach even more perfect. If we could just write { item1, item2, item3, item4, item5 } to have an object initialized with those properties (and undefined values) or { item1: item2: item3: item4: item5: true } to have 5 properties defined with the same value, our syndrome-less carpal tunnels would be thanking us in years to come. I know something like that has been proposed for ECMAScript Harmony, so let’s hope that works out.

Performance

Mostly for myself, I set up a JSperf test to compare the performance between the different lookup methods. Here’s what I learned:

  • Array.prototype.indexOf is, as expected, the slowest. If you’re using a shim for older browsers, expect this to be sickeningly slow.
  • The straight boolean tests are the fastest by a long way, but remember they’re not the strictest test.
  • Opera is able to optimize the regular expression so that it is faster than the in operator and hasOwnProperty.
  • Chrome’s hasOwnProperty function appears to be well optimized, since in all the other browsers, in was the faster of the two.
  • Firefox is pretty slow compared to the other browsers when it comes to boolean checks.

So, there you have it. Don’t use arrays for lists and lookups because Array.prototype.indexOf isn’t as fast as other methods you can use. In fact, if you’re bothered about speed at all, use one of the object methods. Some methods aren’t always appropriate, for instance if you want to assign custom values to your lists they might not always be truthy. You’re always safe with hasOwnProperty, but if you control where the values are coming from (e.g. list[element.tagName]) go with the truthy values for best performance.

 

Tags: ,

Leave a Reply

 

 
  1. Nikita Vasilyev

    September 16th, 2011 at 10:39 am

    One more way to do it:

    ” item1 item2 item3 item4 item5 “.indexOf(” item3 “)

    Overal, it’s as slow as array.indexOf. http://jsperf.com/lists-indexof-vs-in-operator/4

     
    • Andy

      September 17th, 2011 at 12:57 pm

      Nikita,

      Yeah, that has an additional downside that the search string could give false positives if it contains the separator string, making it much less viable for a general solution. Thanks for the jsperf update!

       
  2. Anton Suprun

    September 19th, 2011 at 3:27 pm

    I think that the array method is the most readable and most of the times readability is more important. If you really need optimizations, you could create your own class that accepts an array and initializes it internally as an object. That’s what I would do.

     
    • Andy

      September 20th, 2011 at 3:33 pm

      A “class” was definitely something I was thinking about. I don’t know if I’d agree with “most of the times readability is more important [than performance]“, there has to be a cut-off point where you say “nope, this isn’t efficient enough for my liking”. You can always comment your code well enough to explain what’s going on. Whilst arrays are much more readable than any other method, I would argue that they’re not significantly more readable than objects to warrant the significant performance loss indexOf will cost you.

      I was (sort of) a fan of this harmony proposal which would allow object literals to be written like array literals. I *think* it might have been turned down, though.

       
  3. Edo

    September 19th, 2011 at 3:37 pm

    Would have been good to mention Weakmaps here. It has advantages and disadvantages, just like the solutions listed in this post.

     
    • Andy

      September 20th, 2011 at 3:21 pm

      Hi Edo,

      WeakMaps aren’t really a suggestible alternative since at the moment there’s only a proposal for natives, although SpiderMonkey/Firefox 6 has an implementation. Of course, you could write your own implementation but it would complicate the post more than I would like.

       
  4. Matt

    September 19th, 2011 at 6:49 pm

    Are there any problems accessing an object like a hash?

    if (list["item3"])

    It seems like I run into issues with older versions of IE when object properties are undefined, or when you do something like this:

    var userInputString = prompt();
    if (list[userInputString]) {}

     
    • Andy

      September 20th, 2011 at 3:35 pm

      Can you expand on that a little more, perhaps explain the issue in more detail? How old a version of IE are we talking?