Chapter 4: Comments

The chapter about comments is not inherently bad - it contains enough caveats and examples of bad vs good comments. However, Martin poisons the well from the start:

Comments are not like Schindler's List. They are not "pure good." Indeed, comments are, at best, a necessary evil. If our programming languages were expressive enough, or if we had the talent to subtly wield those languages to express our intent, we would not need comments very much—perhaps not at all.

The proper use of comments is to compensate for our failure to express ourself in code. Note that I used the word failure. I meant it. Comments are always failures. We must have them because we cannot always figure out how to express ourselves without them, but their use is not a cause for celebration.

So when you find yourself in a position where you need to write a comment, think it through and see whether there isn't some way to turn the tables and express yourself in code. Every time you express yourself in code, you should pat yourself on the back. Every time you write a comment, you should grimace and feel the failure of your ability of expression.

This is the books second worst idea: comments are failure to write good code.

I believe this is the main takeaway that developers get out the chapter and the book. Declaring "comments are failure" has led developers to avoid a crucial tool for design, abstraction, and documentation. It's contributed to a culture where comments are undervalued and often ignored (notice how IDEs gray them out by default, even when they contain crucial design information or provide essential context).

Martin implicitly promotes "self-documenting code." But code alone can never provide all context. Even perfect code can only show what's there - not what's deliberately excluded or what was considered and rejected.

This anti-comment position drives verbose, over-descriptive naming:

private static int smallestOddNthMultipleNotLessThanCandidate(int candidate, int n) { }

Is smallestOddNthMultipleNotLessThanCandidate really a name or just inlined comment ?

Names are part of the code, but they’re just one step removed from comments: the compiler doesn’t check or enforce their semantic meaning, so they can easily fall out of sync — just like comments.

private static int smallestOddNthMultipleNotLessThanCandidate(int candidate, int n) { 
    return Integer.MAX_VALUE; // I lied ┌∩┐(◣_◢)┌∩┐
}

Example: Prime Number Generator

Here's the original code:

/**
 * This class Generates prime numbers up to a user specified
 * maximum.  The algorithm used is the Sieve of Eratosthenes.
 * <p>
 * Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
 * d. c. 194, Alexandria.  The first man to calculate the
 * circumference of the Earth.  Also known for working on
 * calendars with leap years and ran the library at Alexandria.
 * <p>
 * The algorithm is quite simple.  Given an array of integers
 * starting at 2.  Cross out all multiples of 2.  Find the next
 * uncrossed integer, and cross out all of its multiples.
 * Repeat until you have passed the square root of the maximum
 * value.
 *
 * @author Alphonse
 * @version 13 Feb 2002 atp
 */
 public class GeneratePrimes     
     /**
     * @param maxValue is the generation limit.
     */
     public static int[] generatePrimes(int maxValue) {
     if (maxValue >= 2) // the only valid case
     {
         // declarations
         int s = maxValue + 1; // size of array
         boolean[] f = new boolean[s];
         int i;
         // initialize array to true.
         for (i = 0; i < s; i++)
             f[i] = true;

         // get rid of known non-primes
         f[0] = f[1] = false;

         // sieve
         int j;
         for (i = 2; i < Math.sqrt(s) + 1; i++) {
             if (f[i]) // if i is uncrossed, cross its multiples.
             {
                 for (j = 2 * i; j < s; j += i)
                     f[j] = false; // multiple is not prime
             }
         }

         // how many primes are there?
         int count = 0;
         for (i = 0; i < s; i++) {
             if (f[i])
                 count++; // bump count.
         }

         int[] primes = new int[count];

         // move the primes into the result
         for (i = 0, j = 0; i < s; i++) {
             if (f[i])  // if prime
                 primes[j++] = i;
         }

         return primes;  // return the primes
     } else // maxValue < 2
         return new int[0]; // return null array if bad input.
     }

I can grant that this is pretty convoluted code, but honestly I'm not sure if it was written this way or it was intentionally obfuscated.

The good parts: the messines of the code is contained in a single place and API is safe and clear.

Comment about the library of Alexandria is cute, I would also add a link to the Wikipedia page.

The Martins refactoring:


/**
 * This class Generates prime numbers up to a user specified
 * maximum.  The algorithm used is the Sieve of Eratosthenes.
 * Given an array of integers starting at 2:
 * Find the first uncrossed integer, and cross out all its
 * multiples.  Repeat until there are no more multiples
 * in the array.
 */
public class PrimeGenerator {
    private static boolean[] crossedOut;GLOBAL MUTABLE STATE!
This is intermediate state
private static int[] resultGLOBAL MUTABLE STATE!
This is final state of computation
public static int[] generatePrimes(int maxValue) { if (maxValue < 2) return new int[0]; else { uncrossIntegersUpTo(maxValue); crossOutMultiples(); putUncrossedIntegersIntoResult(); return result; } } private static void uncrossIntegersUpTo(int maxValue) { crossedOut = new boolean[maxValue + 1]; for (int i = 2; i < crossedOut.length; i++) crossedOut[i] = false; } private static void crossOutMultiples() { int limit = determineIterationLimit(); for (int i = 2; i <= limit; i++) if (notCrossed(i)) crossOutMultiplesOf(i); } private static int determineIterationLimit() { // Every multiple in the array has a prime factor that // is less than or equal to the root of the array size, // so we don't have to cross out multiples of numbers // larger than that root. double iterationLimit = Math.sqrt(crossedOut.length);Nitpick, but why is this extraction needed? return (int) iterationLimit; } private static void crossOutMultiplesOf(int i) { for (int multiple = 2*i; multiple < crossedOut.length; multiple += i) crossedOut[multiple] = true; } private static boolean notCrossed(int i) { return crossedOut[i] == false; } private static void putUncrossedIntegersIntoResult() { result = new int[numberOfUncrossedIntegers()]; for (int j = 0, i = 2; i < crossedOut.length; i++) if (notCrossed(i)) result[j++] = i; } private static int numberOfUncrossedIntegers() { int count = 0; for (int i = 2; i < crossedOut.length; i++) if (notCrossed(i)) count++; return count; } }

Right off the bat: The refactoring introduced serious thread-safety issues by using static global mutable state (crossedOut and result - static fields)

The rule of thumb: if you see mutable static variables - RUN!

This code can not be used as is in multi-threaded environment. You will have to remedy this by using global lock, so that at most one thread in the application can be computing this.

Is there reason to introduce this scalability limit and take performance hit? No.
This is self-inflicted pain. There is 0 reasons for this design.

Second: are those small methods really necessary?

private static boolean notCrossed(int i) {
    return crossedOut[i] == false;
}

Compare: if (notCrossed(i))   vs   if (crossedOut[i] == false). Is the first option really more readable?
If anything, now he have polluted the domain with semi-similar words:

  • not crossed
  • crossed out
  • uncrossed

Is "notCrossed" same thing as "not crossedOut"? Is it same thing as "uncrossed"? One would have to look into implementation of all those small methods to understand if there is a difference or not.

This exemplifies the problem with too many small entities: human working memory holds 4-7 objects. Flood it with pebbles, and you'll hit cognitive overload. By excessive method extraction Martin introduced inconsistencies and confusion in terminlogy.

Lets remove obfuscation from the original method:

public class GeneratePrimes {
    public static int[] generatePrimes(int maxValue) {
        if (maxValue < 2) {  // first two numbers are not prime
            return new int[0];
        }

        boolean[] crossedOut = new boolean[maxValue + 1];
        Arrays.fill(crossedOut, 2, crossedOut.length, Boolean.TRUE); // first two numbers are not prime

        // the algorithm
        for (int i = 2; i < Math.sqrt(crossedOut.length) + 1; i++) {
            if (crossedOut[i]) {
                for (int j = 2 * i; j < crossedOut.length; j += i)
                    crossedOut[j] = false; // multiple is not prime
            }
        }

        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < crossedOut.length; i++) {
            if (crossedOut[i]) result.add(i);
        }
        return result.toArray(new int[result.size()]);
    }
}

Does this look complicated?