Open Source Report Card

July 21, 2013 on 1:21 pm | In Programming | No Comments

I discovered the Open Source Report Card website today. Pretty cute!

Update (7/25/2013): I also claimed my projects on

Semantic Markup, Part 2

February 3, 2012 on 9:29 pm | In Programming | No Comments

Yesterday, I finally connected the dots between jQuery templates and our semantic JSP tag framework at work. A while ago, I built a JavaScript API which provides the equivalent functionality in the browser, but it is bulky, requiring the instantiation of a class for every tag. Today, I discovered that I could send the unprocessed JSP tags to the browser, and the tags would show up in the DOM. Since there is a 1:1 mapping between JSP tags and JavaScript classes, I can parse the DOM subtree, instantiate the equivalent JavaScript classes, and render them to replace the JSP tag subtree with real markup.

This makes building apps for NodeJS so much easier, because we can write pages using the semantic tags and then generate the HTML, either on the server or on the client.

The simplest hacks are the most satisfying. :)

Update (2/7/2012): Internet Explorer refuses to accept any tags outside a pre-defined set. (This breaks HTML5 tags, too!) To work around it, we switched from plain JSP tags (<foo:bar type="baz">) to divs (<div formof="bar-baz">).

Fearless Refactoring

January 1, 2012 on 10:52 am | In Programming | No Comments

I was going to title this “Unit Testing,” but then I realized that would be giving the wrong emphasis, because unit testing is only useful when refactoring or enhancing.

If you build a system that will never change after it is deployed, then of course manual testing is sufficient, because there is no point in automating something that you will never do a second time. However, every programmer reading this is now either laughing or shaking their head, because software never stops changing, so automated testing is a huge time saver.

Unfortunately, I see many unit testing practices that actually hinder modifications to software. Test classes for Java beans are the worst, since testing getters and setters doesn’t add any value at all, but any unit tests for internal implementation details hinder refactoring.

The ideal unit tests validate the API that is exposed by a subsystem. This allows the internals to be refactored or even completely re-implemented without modifications to the tests. For example, a REST service can be tested by validating the output of each URL for various valid and invalid inputs. The same applies to graph algorithms, data compression, image processing, etc., which are implemented as deep libraries with tiny public APIs. (This is only one of many reasons to keep the API of any library as small as possible.)

I do agree that separate tests for internal APIs can be useful for isolating an error in a complex subsystem, but these should be used sparingly, because the testing code may have to be thrown out during a major refactoring. (A better solution is often to split a large, complex subsystem into smaller, reusable components.)

Excessive need for mocks is a big red flag. It can be helpful to mock an HTTP request/response pair or a database connection, but if I have to mock a large fraction of the Struts2 framework to test five lines of my code, and then Struts2 changes so my mocks no longer emulate the actual framework behavior, then the test is worse than useless. It is misleading.

So the next time you sit down to write tests, ask yourself whether the tests will help or hinder your next major refactoring effort.

Thanks to Tim Bray’s article for the kick.


Some argue that one should not even mock database connections. When building a framework that has plugin points, it is better to build sample plugins rather than use mocks.

UIs pose a different set of problems, because automation requires emulating user actions, which is significantly more cumbersome than mocking input data. I prefer to test UIs manually for two reasons. First, forcing myself to repeatedly use the interface helps me identify the annoying parts, so I can improve them. Second, for web applications, a critical part of validation is checking that everything looks right, and I don’t know of any way to automate this.


March 1, 2011 on 2:11 pm | In Deep Thoughts, Programming | 2 Comments

I code for a living. I code for a hobby. It’s almost to the point where I could say, “I code, therefore I am,” except that I do have a family :)

A while back, somebody in management, who used to be technical, tried to explain that, while doing it is all well and good, it’s even better to get other people to do it for you. My internal reaction was, “No thanks! That’s my worst nightmare!” and I immediately dismissed it.

Today, I sat in a design review meeting that was scheduled for one hour, but ran over to two, and I was in a really bad mood afterwards. I asked myself, “How can my architect colleagues stand to do this all day, every day?”

Then it hit me: For many people, coding is just a job. If you’re not passionate about it, and you can earn more doing something else, why not move up?

Of course, I’m hardly the first person to realize this or blog about it, but then I connected it with my reaction to the programmer-turned-manager: For those of us who can see the beauty in code, moving up to a position that takes us away from the coding is a fate worse than death.

If this sounds like utter nonsense, consider the possibility that maybe you just can’t see the beauty in code. Please don’t take offense. I can’t look at a painting and tell you if it’s great. I can’t look at a building and tell you if the architecture is great. I can’t listen to a symphony and tell you if it’s great. I can’t look at a business plan and tell you if it’s great. But I can study a piece of code or the design of a user interface and tell you if it’s great. And this quest for greatness keeps me going. I refuse to settle for second rate, no matter how many iterations it takes. Forcing me to do something else for a living would be like taking a fish out of water.

Naming member variables

February 11, 2011 on 8:00 pm | In Programming | No Comments

I just got bitten by the following code:

    public void setDebug(String debugInfo) {
        this.debug = debug;  // bug: assigns the member to itself; debugInfo is never used
    }

It happens to be in Java, but it would fail equally well in C++. It ought to have been a compile error, but since it wasn’t, it took me a long time to notice the problem. It’s not on my list of issues to watch out for, because it’s supposed to be the compiler’s job.

This is yet another reason to always use a unique prefix for member variables. (The primary reason, of course, is to make the code easier for humans to parse.)
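To make this concrete, here is a minimal sketch of the same setter with a prefixed member. The class name DebugSettings and the "its" prefix are assumptions chosen for illustration; any prefix that cannot collide with a parameter name should work the same way.

```java
// Hypothetical class; the "its" prefix is just one possible convention.
class DebugSettings {
    private String itsDebug;

    public void setDebug(String debugInfo) {
        // With the prefix, the original typo would read `itsDebug = debug;`,
        // which fails to compile because no variable named `debug` exists.
        itsDebug = debugInfo;
    }

    public String getDebug() {
        return itsDebug;
    }
}
```

The point is that the slip turns from a silent self-assignment into an "unknown symbol" error, which is exactly the compiler check that was missing.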

Tabs vs. Spaces

February 7, 2011 on 11:35 am | In Programming | No Comments

The debate over using tabs vs. spaces for indenting has raged for decades and is unlikely to ever disappear. (Thankfully, jwz clarified the issues.) I use tabs for my personal projects, but I also work on projects where the default is spaces. Some large projects even have a mix! At work, we had the debate a couple of years ago when I discovered that there was a mix, but since we couldn’t reach a consensus, and since the Subversion diffs would be ruined by converting everything either way, we decided to punt by agreeing to keep each file consistent. I try to do this whenever I work on mixed open source projects, too.

Unfortunately, maintaining consistency can be very tiresome. Checking every file every time you open it is annoying and error prone. Embeddable options like Emacs’ indent-tabs-mode are widely supported by editors, but they must be placed in every file, and many projects do not use them.

I think I have finally found a nice way to handle this in Code Crusader. Like all decent editors, it supports both a global preference for tabs/spaces and the ability to override this for individual files. When a file is opened, Code Crusader can check if the indentation consistently matches the global preference. If there is a mix of spaces and tabs, it makes whitespace visible. If more than 50% of the lines are the opposite of the global preference, it automatically overrides the global preference for that file.
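The heuristic could be sketched roughly as follows. This is not Code Crusader's actual code; the class and method names are hypothetical, and two simplifying assumptions are made: a line is classified only by its first character, and the 50% threshold is measured against indented lines rather than all lines.

```java
import java.util.List;

// Hypothetical sketch of the heuristic; not Code Crusader's actual code.
class IndentCheck {
    enum Style { TABS, SPACES }

    // Result of scanning one file.
    static final class Result {
        final boolean showWhitespace;  // true if the file mixes tabs and spaces
        final Style effectiveStyle;    // style to use while editing this file

        Result(boolean showWhitespace, Style effectiveStyle) {
            this.showWhitespace = showWhitespace;
            this.effectiveStyle = effectiveStyle;
        }
    }

    static Result analyze(List<String> lines, Style globalPreference) {
        int tabLines = 0;
        int spaceLines = 0;
        for (String line : lines) {
            if (line.startsWith("\t")) {
                tabLines++;
            } else if (line.startsWith(" ")) {
                spaceLines++;
            }
            // Lines with no leading whitespace say nothing about indent style.
        }

        boolean mixed = tabLines > 0 && spaceLines > 0;
        int indented = tabLines + spaceLines;
        int opposite = (globalPreference == Style.TABS) ? spaceLines : tabLines;

        // Override only when more than 50% of the indented lines disagree
        // with the global preference (assumption: unindented lines are ignored).
        Style effective = globalPreference;
        if (indented > 0 && 2 * opposite > indented) {
            effective = (globalPreference == Style.TABS) ? Style.SPACES : Style.TABS;
        }
        return new Result(mixed, effective);
    }
}
```

A file can trigger both outcomes at once: mostly spaces with a few stray tabs would be edited with spaces and shown with visible whitespace.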

Once this is implemented (in the upcoming release), my personal projects can continue to use tabs, while any file that uses only spaces will automatically stay consistent, and a file that has a mix of spaces and tabs will be highlighted for cleaning.

Always separate Code and Data

January 28, 2011 on 11:16 am | In Programming | No Comments

Is HTML code or data? Normally, the answer would be code, but when writing Java, it turns out that the correct answer is data. It took me a while to convince my Java colleagues to store the markup in property files (with tokens for parameter insertion), but the benefits for our web application framework have been enormous:

  1. Keeping them separate makes both much easier to read. (This is true for any separation of code and data.)
  2. The Java code required to implement 98% of our JSP tags was minimal: one base class and a tiny class (with only configuration) for each tag.
  3. Defining categories of tags (<yt:some-tag-category type="...">) made this even simpler: one Java class for the entire category, and the actual keys retrieved from the property file constructed (by the base class) based on the type parameter: category+type,open & category+type,close
  4. Since there was so little Java, and the HTML was stored separately, it was very easy for front-end developers (who didn’t know any Java) to create new tags and modify the existing tags. (Creating a new type for an existing category required zero Java code.)
  5. I was able to write a converter to automatically generate JavaScript functions equivalent to all the JSP tags. This allowed dynamic content creation on the client without either (1) duplicating markup (unmaintainable) or (2) server-side calls (slow).
  6. I was able to write another converter to automatically generate PHP functions equivalent to all the JSP tags. Instant support for another language!
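
As a rough illustration of points 2 and 3, a category tag backed by a property file might look like the sketch below. Everything in it is an assumption made for the example: the class name, the "box" category, the dotted key format, and the {0} token syntax (handled here by java.text.MessageFormat). The actual framework's key format and token syntax may well differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

// Hypothetical sketch: one class serves an entire tag category, and the
// markup lives in a property file (inlined here to keep the example self-contained).
class CategoryTag {
    private static final String MARKUP =
        "box.warning.open=<div class=\"box warning\" title=\"{0}\">\n" +
        "box.warning.close=</div>\n" +
        "box.info.open=<div class=\"box info\" title=\"{0}\">\n" +
        "box.info.close=</div>\n";

    private final Properties markup = new Properties();
    private final String category;

    CategoryTag(String category) {
        this.category = category;
        try {
            markup.load(new StringReader(MARKUP));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the key from category + type and fills in the tokens.
    String open(String type, Object... args) {
        return render(category + "." + type + ".open", args);
    }

    String close(String type) {
        return render(category + "." + type + ".close");
    }

    private String render(String key, Object... args) {
        String template = markup.getProperty(key);
        if (template == null) {
            throw new IllegalArgumentException("No markup for key: " + key);
        }
        return MessageFormat.format(template, args);
    }
}
```

Adding a new type to the category then means adding two lines to the property file, with no change to the Java class.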

JavaScript Scoping

August 16, 2010 on 1:36 pm | In Programming | No Comments

I’ve been working my way through Douglas Crockford’s video series on JavaScript. This morning, while waking up, I found myself wondering how to implement closures with static scoping.

As background, Brendan Eich was forced to create JavaScript in a matter of weeks, so we should all be grateful that he didn’t implement dynamic scoping. This is the simplest to build, since the rule is “If the variable wasn’t defined in this stack frame, search backwards through all the stack frames to find the requested name.” Unfortunately, it’s a nightmare to work with, since it’s very difficult to predict what the stack will contain when your function is called. Language designers have learned this the hard way. If I remember correctly, Lisp and Perl both started with dynamic scoping.

So JavaScript implements static scoping. A scope is created every time a function is defined — not every time an open curly brace is encountered. This allows you to look at your source code and know what will be available in the closure: the variables defined in the function, the variables defined in the enclosing function, etc. — all the way up to global scope.

What is the simplest implementation of static scoping?

Start with a stack frame (hash map) at global scope. When the execution reaches a statement that defines a new function (scope), then save a reference to the current stack frame as the function’s enclosing scope. When a function is executed, store a reference to its enclosing scope in the stack frame. If the function defines another function, then this new function will save a reference to the current stack frame, which has a reference to its enclosing scope, etc. — all the way up to global scope. With garbage collection, stack frames which are referenced by function instances or other stack frames will not be lost when the executing function exits.
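
The scheme above can be sketched in Java, relying on Java's own garbage collector to keep referenced frames alive. All names here (Frame, Body, Closure, ScopeDemo) are hypothetical, and this is only a toy model of the idea, not how any real JavaScript engine is built.

```java
import java.util.HashMap;
import java.util.Map;

// A stack frame: a hash map of variables plus a link to the enclosing scope.
class Frame {
    private final Map<String, Object> vars = new HashMap<>();
    private final Frame enclosing;  // null at global scope

    Frame(Frame enclosing) {
        this.enclosing = enclosing;
    }

    void define(String name, Object value) {
        vars.put(name, value);
    }

    // Static scoping: walk the chain of enclosing scopes, not the call stack.
    Object lookup(String name) {
        for (Frame f = this; f != null; f = f.enclosing) {
            if (f.vars.containsKey(name)) {
                return f.vars.get(name);
            }
        }
        throw new IllegalStateException("Undefined variable: " + name);
    }
}

// The code of a function, run against the frame created for one call.
interface Body {
    Object run(Frame frame);
}

class Closure {
    private final Frame definedIn;  // saved when the function is defined
    private final Body body;

    Closure(Frame definedIn, Body body) {
        this.definedIn = definedIn;
        this.body = body;
    }

    // Each call gets a new frame whose enclosing scope is the defining frame.
    Object call() {
        return body.run(new Frame(definedIn));
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        Frame global = new Frame(null);
        global.define("x", "global");

        // outer() defines a local x and returns an inner function that reads x.
        Closure outer = new Closure(global, frame -> {
            frame.define("x", "outer");
            return new Closure(frame, callFrame -> callFrame.lookup("x"));
        });

        Closure inner = (Closure) outer.call();
        System.out.println(inner.call());  // prints "outer"
    }
}
```

The inner function still sees the outer function's x after outer has returned, because the closure holds a reference to that frame and so it is never collected.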


January 21, 2010 on 11:45 am | In Programming | No Comments

While driving to work this morning, I finally separated out what has been bothering me about Lisp. Functional programming proponents tend to make a big deal about avoiding side effects because it avoids long range coupling between functions and it makes list iteration trivially parallelizable. This is good for program maintenance and effective utilization of all the cores in a CPU.

Side effects are eliminated by avoiding (1) global variables and (2) functions that modify their arguments.

Point #1 is relatively easy in any language (except assembly), but it does require discipline, since global resources like files and databases are in principle always directly accessible, even if an encapsulating interface exists.

Point #2 is very difficult/painful in strongly typed languages like C++ and Java but very easy in untyped languages like Lisp, Perl, and JavaScript, mainly because untyped languages make it very easy to return a heterogeneous list of values/objects.

This is what has been bothering me about Lisp in the back of my mind: a list is the most natural return value, both because the syntax is simple and because most standard functions operate on lists. A homogeneous list is fine, of course, but a list of heterogeneous values is terrible because (1) it is hard to remember what is returned in each slot and (2) without compile time checks, modifying a function to insert a new value into the returned list creates a maintenance nightmare. You have to manually find and update every use of the modified function, and if you miss one, you have a subtle bug.

How can we avoid all this trouble? Using a map instead of a list alleviates the problems because (1) well chosen key names are easier to remember and (2) most of the existing uses of the function probably will not need to be updated because they do not care about the new value and the original values will still be accessible via the same keys.
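
The difference can be sketched in Java, used here only to keep the examples on this page in one language. The function names, keys, and placeholder values are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReturnStyles {
    // Positional result: callers must remember that slot 0 is the name
    // and slot 1 is the size. Inserting a new slot shifts the later ones.
    static List<Object> statAsList(String path) {
        return Arrays.asList(path, 42L);  // 42L is a placeholder size
    }

    // Keyed result: adding "owner" later does not disturb callers that
    // only read "name" and "size".
    static Map<String, Object> statAsMap(String path) {
        Map<String, Object> result = new HashMap<>();
        result.put("name", path);
        result.put("size", 42L);        // placeholder size
        result.put("owner", "nobody");  // added later; existing callers unaffected
        return result;
    }
}
```

A caller written against the map before "owner" existed keeps working unchanged, whereas a caller of the list version breaks silently if a new value is inserted ahead of the slot it reads.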

Unfortunately, working with maps (or hash tables) in Lisp is a lot messier than working with lists. In Perl and JavaScript maps are part of the language syntax.

The Long Road to Hell

November 29, 2009 on 8:15 pm | In Programming | No Comments

A friend just introduced me to Variadic Templates in C++0x. When will they admit that they just plain started from the wrong place? Their example of how to print a comma-separated list of arbitrary values, which is impossible in C++ without variadic templates, is so trivial in an untyped language that nobody would bother to discuss it.

Has it really never occurred to the C++ crowd that they should start with a loosely typed language and then add in an option for compile-time type checking? Objective C actually does this. A function parameter can be id, which means it accepts anything, or it can be a type, which means that it accepts that class or any subclass, or it can be a Protocol, which means that it accepts any class which implements the required methods. (Protocols also feature prominently in the new language, Go.)

The only flaws I can see in Objective C are (1) id only accepts objects, not primitives, and (2) constructors return id, so there is no type checking when an object is created. Java fixes the former via autoboxing, but Java doesn’t have the concept of a Protocol and reflection is very painful.

One could argue that another flaw in Objective C is the lack of private functions. However, this can be solved the same way as in JavaScript: static functions, which are accessible only to other functions declared in the same source file.
