TODO comments best practices

// TODO: filter employees by role

The best use for TODO comments is when you are actively working on a change set, adding TODOs that will be replaced with actual code as you work. These TODOs will be removed before you commit to source control.

In this case you add a TODO, signifying an intention to return later on to complete the code.

Before committing code to source control, it’s always a good practice to review all your changes with a diff tool. This is the perfect time to scan for leftover TODOs in the code and implement them.

Another case for TODOs is for something that will truly be implemented in a later commit. Before leaving those kinds of TODOs in the code ask yourself if you shouldn’t rather create an issue in your issue tracker. Otherwise this second kind of TODOs tend to accumulate and pollute the source code. They experience the same kind of code rot as old commented out code does.

TODOs that you leave after a commit should be for changes that are small, implementable in the foreseeable future and relating to source code rather than business processes.

Before leaving those kinds of TODOs in the code ask yourself if you shouldn’t rather create an issue in your issue tracker.

On larger projects you tend to see TODO comments which are several years old. Just today I found one that source control indicated was 3 years old. The person responsible for the comment may not even be working here anymore. While reading this specific comment could give me a general idea of what should be accomplished it does not do so clearly enough that I can go do it. Or why for that matters it needs to be done. You tend to see these often enough.

This is similar to commented out code, where nobody knows why the code was commented out in the first place and thus leave it as it is thinking it may be important. As time goes by there is less and less chance of someone actually knowing why the code was commented out. So it stays there until someone is brave enough to just delete the useless comment (how can it be useful if no one knows why it’s commented out). Keeping track of old code is the source control’s job anyway.

Absolutes

As time goes on, I’m distancing myself more and more from rules that you must always follow or guidelines that make you lean towards absolutes.

  • Commenting every method and variable.
  • Documenting all classes.
  • Coding guidelines that need to be applied everywhere exactly the same way.
  • Over 90% test coverage.

The problem with these rules is that they don’t have any reasoning or intelligence behind them. You must comment everything even if it doesn’t make sense because some tool is checking for it. You must strive for 100% code coverage even if it means writing tests with little business value along the way.

Our strength as developers is our reasoning and our judgement.

I like where these guys are going : Why Inch doesn’t use percentages. They have the right idea. Inch is a documentation coverage tool which gives a you a letter grade rather than a percentage score. Their design takes into account the fact that not everything needs to be documented. Having a grade is less precise that a numerical score but that’s what makes it better. Having a percentage score, particularly with an enforced lower bound will often lead to people gaming the system or fussing too much over the details. Code metrics should be about guiding you to write better code.

Another interesting link on the subject is The code documentation fallacy. The author makes the point that given the choice between 10% or 90% comment coverage, it is better to have 10% because it means the comments that are there have a reason for being there. On the other hand if comments are enforced, most of them will probably be noise.

I can relate to this. I have worked on a large project which required 100% comment coverage of all public classes, methods, method parameters and return values. The team was using a refactoring tool that amongst other things allowed to generate these comments automatically.

I would say about 9 in 10 comments were auto-generated and looked something like this:

/// <summary>
///  The GetName.
///  Returns The GetName.
/// </summary>
public string GetName()

It’s surprising how fast the human brain learns to filter out these useless comments until it doesn’t even notice them. When these comments actually contained useful information they were being gleaned over.

The reverse, absolutely no comments at all, is just as bad. The problem here isn’t the comments or the lack thereof but the inflexible rules.

Google understands this concept as well. If you look at their coding style guidelines documents, (C++, Python, JavaScript) you will see the following:

Use common sense and BE CONSISTENT.

If you are editing code, take a few minutes to look at the code around you and determine its style. If they use spaces around their if clauses, you should, too. If their comments have little boxes of stars around them, make your comments have little boxes of stars around them too.

The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you are saying, rather than on how you are saying it. We present global style rules here so people know the vocabulary. But local style is also important.

Notice the mention about local style and the focus on the reasons for having guidelines.

If the person working on the web service layer, knowing the guidelines, truly believes he should use a different naming scheme for his methods I’d be inclined to trust him. He knows his specific context.

Context and judgement trumps blind adherence to rules. Aren’t they called guidelines anyway?

The world isn’t all back and white. We have to trust ourselves.

Programming in a bilingual environment, which language to use?

If, like me, you are working in an environment where English is not the main language you have encountered the problem of which natural language to use when programming. This is often an issue amongst bilingual teams.

There are three possibilities:

  • You can have all the code in English
  • All in the “native” language (French, Spanish, German, …)
  • Or in a mix of both

I have had work experience with all three situations and debated the issue with various people. After these experiences I have come to the conclusion that for traditional line of business software development the best approach is a mix of both languages.

If you are writing line of business software and the intended users work in another language than English you should use the user’s language for all business domain terms.

Otherwise you will have to translate these terms. Sometimes there will be no existing translation or it will be unknown to most of the users themselves. The developers will use these translated terms on a daily basis while working with the code.

You’ll have a situation where the developers and the users do not use the same vocabulary. Things will eventually get lost in translation, people will need to do constant mental mapping when talking amongst themselves and users will be puzzled when developers use a translation they have never heard of.

One of the common arguments against mixing two languages is the inconsistency it often breeds. Sometimes a word is written in English, later the same word is written in the other language. This creates confusion when people try to search through the solution to find something. This is a frequent problem and it needs to be addressed.

The best way to address it is to have clear guidelines about which words should be in English and which words in the other language.

I have found that what works best is having everything but the words of the problem domain (domain model, business domain, or whatever you call it) in English. This works well because it avoids problem areas like pronouns and prepositions. Two types of words that are susceptible from going either way.

It also keeps all of the software terms in English. Since most literature (online or offline) is in English, developers will appreciate using familiar and searchable terms.

Of course if you are working with other people and your only common language is English or you intend to put your code on the internet (ie: GitHub). You should write all of the code in English.

The same situation applies if your users are English speakers.

Likewise if you are writing code yourself for a personal project you can do as you please and in these case I always write everything in English.

Pros and cons of various choices

All in english
Pros: consistency, more people can understand the code, allows for the use of the more common English terms to represent software concepts

Cons:
if your clients or users use business domain terms in another language you will be faced with translation problems

All in the other language
Pros: consistency, business domain terms are in the language of the users, may be easier for developers who do not have a firm grasp of English

Cons: software terms (ie: Load, Save, Factory, Adapter, Proxy) need to be translated, programming language keywords are in English which makes it impossible to use a single language throughout the source code

A mix of both languages
Pros: the best of both worlds, software terms in English, business terms in the other language, reading the source code allows to easily distinguish between business and software concerns

Cons: often leads to inconsistency when not everyone applies the same rules all of the time

Reek vs RuboCop

Or code linters / analysers for Ruby

I initially stumbled upon RuboCop, a code quality analyzer for Ruby by chance. My interest was immediately piqued.

After trying it out but before committing to use it I searched for alternatives and found Reek, a code smell detector for Ruby.

Both project analyse Ruby source code and give you a series of warnings. Both can integrate with Rake and both have plugins or third parties that allow them to integrate with editors like Vim.

My first instinct was to search for “reek vs rubocop”, a search that led to no results. I was expecting to see some blog posts comparing the two.

I then proceeded to try them both on the same set of files. I then realised they are different programs that can complement each other rather then replace each other. While they will sometimes overlap they each have a different focus.

RuboCop

RuboCop’s main focus is on coding style and adherence to the Ruby Style Guide. Typical warnings will be about:

  • lines being too long
  • methods having too many lines
  • following Ruby idioms prescribed by the guide, like which types of quote to use or preferring each over for for iteration.

Feature wise RuboCop has more features. It has many output formatters and more options to work with. The output is also in color which is nice. Oh and I really like the name, seriously 🙂

Reek

Reek for it’s part will focus on code smells that could lead to refactorings and that aren’t style related.

Examples include:

  • methods with too many parameters
  • code repetition that could be extracted in a new method
  • unused parameters

Like RuboCop, Reek can be configured via a config file, but overall RuboCop has a bigger feature set (which may or may not be a good thing).

While RuboCop will help you enforce style guidelines on your project, Reek will allow you to find bad code that can be made better through refactoring.

Overlap

Both will give out a warning if a class has no class level comment. Both will also give a warning if there are too many lines/statements in a method.

RuboCop will count the number of lines in a method and warn if that number is over 10, Reek on the other hand will count the number of statements in a method.

There are also other overlaps.

They will usually give you a varying number of warnings for the same file. This has a tendency to go over both ways but in the end which one will give you the most warnings will depend on your coding style.

Example

Here is the default output for both linters on the following file:

class Particle
  attr_reader :stability_radius

  def initialize(stability_radius=1)
    @stability_radius = stability_radius
  end

  def drop(height_map, x, y, size_x)
    drop_point = x + y * size_x

    if height_map[drop_point] == 0
      height_map[drop_point] += 1
    else
      agitate(height_map, x, y, size_x, drop_point)
    end
  end

  private

  def agitate(height_map, x, y, size_x, drop_point)
    # compile list of lower level neighbors
    lower_neighbors = []

    for index_x in x - @stability_radius..x + @stability_radius
      for index_y in y - @stability_radius..y + @stability_radius
        check_and_add_neighbor(index_x, index_y, size_x, lower_neighbors, drop_point, height_map)
      end
    end

    if (lower_neighbors.length == 0) then
      # no lower neighbors, leave particle on drop point
      height_map[drop_point] += 1
    else
      # randomly select one of these neighbors and put particle to this neighbor
      selected = lower_neighbors.shuffle.first
      height_map[selected] += 1
    end
  end

  def check_and_add_neighbor(x, y, size_x, lower_neighbors, drop_point, height_map)
    unless height_map[x + y * size_x].nil?
      if height_map[x + y * size_x] < height_map[drop_point]
        lower_neighbors.push(x + y * size_x)
      end
    end
  end
end

Running:

rubocop particle.rb

Will produce the following output:

Inspecting 1 file
C

Offences:

particle.rb:1:1: C: Missing top-level class documentation comment.
class Particle
^^^^^
particle.rb:4:34: C: Surrounding space missing in default value assignment.
def initialize(stability_radius=1)
^
particle.rb:20:3: C: Method has too many lines. [12/10]
def agitate(height_map, x, y, size_x, drop_point)
^^^
particle.rb:24:5: C: Prefer *each* over *for*.
for index_x in x - @stability_radius..x + @stability_radius
^^^
particle.rb:25:7: C: Prefer *each* over *for*.
for index_y in y - @stability_radius..y + @stability_radius
^^^
particle.rb:26:80: C: Line is too long. [97/79]
check_and_add_neighbor(index_x, index_y, size_x, lower_neighbors, drop_point, height_map)
^^^^^^^^^^^^^^^^^^
particle.rb:30:5: C: Never use then for multi-line if/unless.
if (lower_neighbors.length == 0) then
^^^
particle.rb:30:8: C: Don't use parentheses around the condition of an if/unless/while/until
if (lower_neighbors.length == 0) then
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
particle.rb:34:80: C: Line is too long. [80/79]
# randomly select one of these neighbors and put particle to this neighbor
^
particle.rb:40:29: C: Avoid parameter lists longer than 5 parameters.
def check_and_add_neighbor(x, y, size_x, lower_neighbors, drop_point, height_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
particle.rb:40:80: C: Line is too long. [83/79]
def check_and_add_neighbor(x, y, size_x, lower_neighbors, drop_point, height_map)
^^^^

1 file inspected, 11 offences detected

Similarly running :

reek particle.rb

Will produce :

particle.rb -- 20 warnings:
Particle has no descriptive comment (IrresponsibleModule)
Particle takes parameters [height_map, size_x, x, y] to 3 methods (DataClump)
Particle#agitate has 5 parameters (LongParameterList)
Particle#agitate has the parameter name 'x' (UncommunicativeParameterName)
Particle#agitate has the parameter name 'y' (UncommunicativeParameterName)
Particle#check_and_add_neighbor calls (x + (y * size_x)) 3 times (DuplicateMethodCall)
Particle#check_and_add_neighbor calls (y * size_x) 3 times (DuplicateMethodCall)
Particle#check_and_add_neighbor calls height_map[(x + (y * size_x))] twice (DuplicateMethodCall)
Particle#check_and_add_neighbor doesn't depend on instance state (UtilityFunction)
Particle#check_and_add_neighbor has 6 parameters (LongParameterList)
Particle#check_and_add_neighbor has the parameter name 'x' (UncommunicativeParameterName)
Particle#check_and_add_neighbor has the parameter name 'y' (UncommunicativeParameterName)
Particle#check_and_add_neighbor performs a nil-check. (NilCheck)
Particle#check_and_add_neighbor refers to height_map more than self (FeatureEnvy)
Particle#check_and_add_neighbor refers to x more than self (FeatureEnvy)
Particle#check_and_add_neighbor refers to y more than self (FeatureEnvy)
Particle#drop has 4 parameters (LongParameterList)
Particle#drop has the parameter name 'x' (UncommunicativeParameterName)
Particle#drop has the parameter name 'y' (UncommunicativeParameterName)
Particle#drop refers to height_map more than self (FeatureEnvy)

You can get a good feel for the differences between the two by scanning the types of errors produced.

In practice

While I am in favor of using such tools I also believe good judgement always trumps blindly following rules. It’s best to carefully select a subset of rules to adhere by rather than simply try to apply everyone of them. Also be ready to make exceptions even on those rules you have selected.

Trust developers to choose the best formatting for the code they are currently writing even if it goes against your general guidelines. They know more about the current context they are in than anyone.