Why your technical documentation sucks

August 16, 2022


_This article was originally presented as a “brown bag talk”, an internal series of talks at the office where employees meet over lunch to share and discuss technical topics that interest them. It was also originally published as a post here, but I’ve slightly tweaked it to be a little more, ahem, opinionated_ (I’ll know I’ve peaked when I can say “fuck” on the company blog). Enjoy!

In today’s post we’re going to (briefly) explore one of my favourite topics: technical documentation. (If you’ve ever interacted with really good documentation, you’ll understand perfectly the kind of reverence that I have for it.) More specifically, we’re going to look at some reasons why your technical documentation sucks. Along the way, we’ll also (sort of) touch on how we can make things better. The post is a bit hyperbolic and tongue-in-cheek, but the lessons aren’t: they remain useful whether you’re a team of one or a team of one thousand.

This will also have a distinctly Pythonic slant, since this is the language I spend most of my time working in.

Without further delay, here’s why your documentation sucks.

1. You think you don’t need documentation

Documentation is like sex. Even when it’s bad, it’s better than nothing.
— Someone on the internet.

You 100% need to document your code. No matter how good your product, library, or tool, if your documentation sucks, people aren’t going to use it. Period. It’s that simple. If you force people to use your tool without good documentation, they won’t just be ineffective with it; they’ll also dislike you (and they’ll dislike you a lot). Perhaps worst of all, if you’re forced to work on your own codebase without excellent documentation, you’ll definitely end up disliking yourself.

Not having documentation is like trying to wander around in a dark cave without a torch: it’s torturous. (Ha! Puns!) All you do is bump, trip, and fumble while getting increasingly confused. It quickly descends into frustration and misinterpretation of how things actually work, and it is only made worse when you encounter a project for the first time (which is the same as returning to a project a few months later).

If you think you don’t need documentation, then your documentation sucks. And not having documentation also means your documentation sucks.

2. You think documentation only refers to docstrings and comments

This is the most common mistake I see beginner programmers make. (I used to be one after all!) But is documentation only docstrings and comments?

What about things like:

Good variable names (especially if you work in a dynamically-typed language!)
Clean code
Good tests
Good examples (technically considered documentation, but still!)

Anything that helps people understand the behaviour and intention of your code is documentation!

An example:

# This is a lot less descriptive...
my_dict = {
    'mary jane': True,
    'john smith': False,
    'billy bob': True,
}

# ... than this
people_i_have_murdered = {
    'mary jane': True,
    'john smith': False,
    'billy bob': True,
}

Good unit tests can also document how things are supposed to behave:

def test_my_python_phone():
    """Test my super cool Python phone."""

    phone = PythonPhone()
    phone.enter_number("0832911234")
    phone.dial()

    # The phone should ring once a number has been entered and dialled.
    assert phone.did_ring()

You should try and use everything at your disposal to make your code easy to understand. Documentation extends quite far beyond just docstrings and comments. Don’t ignore variable names, clean code and good tests. If you do, your documentation sucks.
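To make the point about tests concrete, here is a minimal, self-contained sketch. The `PythonPhone` class is invented purely so the tests can run (the original snippet never shows it), and the "refuse to dial without a number" rule is an assumption added for illustration. Notice how the test names alone tell you what the phone is supposed to do:

```python
class PythonPhone:
    """A hypothetical phone, invented here only to make the tests runnable."""

    def __init__(self):
        self._number = None
        self._rang = False

    def enter_number(self, number):
        """Store the number to call."""
        self._number = number

    def dial(self):
        """Ring the stored number, or refuse if none was entered."""
        if self._number is None:
            raise ValueError("Enter a number before dialling.")
        self._rang = True

    def did_ring(self):
        """Return True if the last call rang."""
        return self._rang


def test_phone_rings_after_dialling_a_number():
    phone = PythonPhone()
    phone.enter_number("0832911234")
    phone.dial()
    assert phone.did_ring()


def test_phone_refuses_to_dial_without_a_number():
    phone = PythonPhone()
    try:
        phone.dial()
    except ValueError:
        pass  # Expected: there is nothing to dial yet.
    else:
        raise AssertionError("dial() should fail when no number was entered")
```

Someone who has never seen `PythonPhone` can read the two test names and learn both the happy path and one failure mode without opening the class.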

3. You don’t know how or when to use comments

Another common beginner mistake. (Sadly, this is often how people get taught at university, so it might not be entirely your fault.) Your team asks you to document your code, and you produce something like this:

def add_unique_number_to_list(list_, number):
    
    # Check if number in the list
    if number not in list_: # if the number is not in the list
        list_.append(number)  # Then append the number to the list
    return list_  # return the list

I know your intentions are noble when you do this. But, guess what: I can read code. And so can your teammates.

Instead, use comments to explain something that is surprising, or unexpected, or is deliberate due to something non-obvious.

As an example, take a look at this snippet from one of our Django codebases:

query = Q()
for gene_result in context["profile"].result_set.all():
    matching_logic = MatchingLogicLookup.objects.get(
        file_header=gene_result.rule.symbol
    )

    # A clever way to bundle all of your logic together, but only
    # execute one query.
    # https://stackoverflow.com/a/20177875
    query = query | Q(
        rule_type_logic__result=gene_result.result,
        rule_type_logic__symbol=matching_logic.logic_match,
    )

If you’re not familiar with Django, you might not know that you can build up a single query by using the | operator on multiple Q objects. It’s a neat little trick that saves us from running several filtering queries one after another to narrow down our results. To make this clear to readers, the developer added a comment (and a reference to a StackOverflow answer) to explain what the intention is, and why it’s there. This is good!

Take another example, from the same project:

# We *must* store the session in a variable first!
# https://docs.djangoproject.com/en/4.0/topics/testing/tools/#django.test.Client.session
session = self.client.session
session["is_otp_verified"] = True
session.save()

When modifying a session through Django’s test client, the changes will not be saved unless the session is assigned to a variable first, because (as the linked documentation explains) a new session object is created every time the property is accessed. An unsuspecting developer might come along and think they can modify self.client.session directly, but then later be confused when the changes to the session aren’t persisted. So the developer (in this case me, since I was the one bitten by this particular detail) added the comment to save future developers some confusion if they weren’t aware of this particular Django quirk.
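If the quirk still feels abstract, the Django documentation linked in the comment attributes it to a new session object being created on every access of the property. The sketch below imitates that behaviour in plain Python. `FakeClient` and `FakeSession` are hypothetical stand-ins invented for illustration; they are not Django’s actual implementation:

```python
class FakeSession(dict):
    """Hypothetical session: a dict that can write itself back to a store."""

    def __init__(self, store):
        super().__init__(store)  # start from a copy of what is stored
        self._store = store

    def save(self):
        self._store.clear()
        self._store.update(self)


class FakeClient:
    """Hypothetical test client; NOT Django's real code."""

    def __init__(self):
        self._store = {}  # pretend this is the session backend

    @property
    def session(self):
        # A brand-new session object is built on every access.
        return FakeSession(self._store)


client = FakeClient()

# Each access builds a new object, so this change is thrown away:
client.session["is_otp_verified"] = True
client.session.save()
lost = "is_otp_verified" in client.session

# Holding on to one object makes the change stick:
session = client.session
session["is_otp_verified"] = True
session.save()
kept = "is_otp_verified" in client.session

print(lost, kept)  # False True
```

Nothing in the first attempt looks wrong at a glance, which is exactly why the comment earns its place.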

Again — use comments to explain things that are surprising!
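Applying the same rule to the earlier `add_unique_number_to_list` example, one possible rewrite drops the line-by-line narration and keeps a single comment for the one thing a reader might not expect. Treating the in-place mutation as deliberate is an assumption made for the sake of the example:

```python
def add_unique_number_to_list(list_, number):
    # Deliberately mutates and returns the caller's list rather than a copy,
    # so existing references to the list see the new number too.
    if number not in list_:
        list_.append(number)
    return list_


numbers = [1, 2]
result = add_unique_number_to_list(numbers, 3)
print(result is numbers, numbers)  # True [1, 2, 3]
```

The code says what happens; the comment says why it happens that way.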

And you don’t have to take it from me either. I stole the idea:

A delicate matter, requiring taste and judgement. I tend to err on the side of eliminating comments, for several reasons. First, if the code is clear, and uses good type names and variable names, it should explain itself. Second, comments aren’t checked by the compiler, so there is no guarantee they’re right, especially after the code is modified. A misleading comment can be very confusing. Third, the issue of typography: comments clutter code.
— Rob Pike, “Notes on Programming in C”

Use comments wisely. If you don’t, your documentation sucks.

4. You think documentation is a substitute for confusing code

This is almost the opposite case of Rule 2: your code works, but it’s poorly written and therefore confusing. So you think to yourself, “Ah hah! Rather than refactor this (because that’s effort), maybe I can add some documentation to explain what’s happening. That should be ok, until I come back to it.” (You won’t.)

And you’d be wrong.

Take the following implementation of Fizz Buzz:

# Get the number from 1 to 100 inclusive.
# If the number is a multiple of three, replace it with Fizz.
# If the number is a multiple of five, replace it with Buzz.
[(i%3//2*'Fizz'+i%5//4*'Buzz'or-~i)for i in range(100)]

From the comments, I understand what the code is supposed to do, but I have no clue how it actually works.

What happens if I need to modify its behaviour?
What happens if I forget how it works (you will eventually)?
What happens if it has a bug? How do I figure out what’s gone wrong?
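For contrast, here is one way the same behaviour could be written so that those questions have easy answers. This is just a sketch; the function names are made up for the example:

```python
def fizz_buzz_value(number):
    """Return "Fizz", "Buzz", "FizzBuzz", or the number itself."""
    word = ""
    if number % 3 == 0:
        word += "Fizz"
    if number % 5 == 0:
        word += "Buzz"
    # An empty string is falsy, so plain numbers fall through unchanged.
    return word or number


def fizz_buzz(limit=100):
    """Return the Fizz Buzz sequence for 1 to ``limit`` inclusive."""
    return [fizz_buzz_value(number) for number in range(1, limit + 1)]


print(fizz_buzz(5))  # [1, 2, 'Fizz', 4, 'Buzz']
```

It is longer, but changing a rule, finding a bug, or remembering how it works in six months no longer requires decoding operator tricks.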

If you can’t write clean code, you probably won’t be able to write good documentation anyway:

A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.
— Kevlin Henney

So focus on your fundamentals! If you can’t write clean code, your code sucks.

And your documentation probably also sucks.

5. You don’t know about PEP-257

Our first Python-specific point 🐍.

A lot of Python developers know about PEP-8, which is the official style guide for Python code. But far fewer of them know about the documentation equivalent: PEP-257.

It’s worth reading through PEP-257 (it’s not long!), but my favourite example is one I see even senior developers occasionally get wrong: use the imperative mood in the first line of a docstring:

The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.
— extract from PEP-257

If you don’t follow PEP-257, then your documentation sucks.
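As a quick illustration of the imperative mood, compare the two docstrings below. Both functions are hypothetical and exist only to show the wording difference:

```python
import os.path


def get_extension_descriptive(path):
    """Returns the file extension of the path."""  # descriptive: avoid this
    return os.path.splitext(path)[1]


def get_extension(path):
    """Return the file extension of ``path``, including the leading dot."""
    return os.path.splitext(path)[1]


print(get_extension("report.txt"))  # .txt
```

The second docstring reads as a command (“Return ...”), which is what PEP-257 asks for.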


6. You don’t choose and follow a well-known docstring format

Luckily for us, the Python ecosystem is so large and mature that a number of really smart people have already spent a lot of time thinking about things like documentation. To my knowledge, there are three major docstring formats in Python that most projects will use:

reStructuredText
Numpydoc (my personal favourite)
Google Python Style Guide

Each style guide has a clear specification that you can (and should) follow.

Just because these docstring style guides exist doesn’t de facto mean they’re good, but standardising on accepted formats at least nets you a few easy wins:

Documentation generators (e.g. Sphinx) will typically support one of the major formats.
Teammates will typically be familiar with one or more of the formats already.
It’s easy to find good examples online.

Let’s take a brief look at what these three major formats look like:

# reStructuredText
def add(a, b):
    """
    Add two numbers together.
    
    :param float a: The first number.
    :param float b: The second number.
    
    :returns: The sum of ``a`` and ``b``.
    :rtype: float
    """
    
    return a + b


# NumpyDoc
def add(a, b):
    """
    Add two numbers together.
    
    Parameters
    ----------
    a : float
        The first number.
    b : float
        The second number.
        
    Returns
    -------
    float
        The sum of `a` and `b`.
    
    """
    
    return a + b


# Google 
def add(a, b):
    """
    Add two numbers together.
   
    Args:
        a (float): The first number.
        b (float): The second number.
    
    Returns:
        float: The sum of `a` and `b`
    """
    
    return a + b

It doesn’t matter too much which format you choose, as long as you choose one and stick to it!

Except…

7. You choose the wrong docstring format

Except maybe it does matter which docstring format you choose?

I personally think that reStructuredText is the wrong format. (Fight me.)

Here’s why:

It’s rather “ugly” in the source code.
The specification is confusing (years later, I still need to look up how certain directives work).
It’s hard to find good Python examples online.
It seems quite fragile to parse (possibly as a side-effect of the confusing spec).

Fun fact: We (now ironically) chose to standardize on reStructuredText at my workplace. This was at a time before NumpyDoc or the Google Python Style Guide were popular. We’re in the process of considering moving to a more readable format, now that better things are available.

8. You do the bare minimum

Consider the following function (pulled again from one of our codebases):

def create_model(name,
                 artifact,
                 requirements_path,
                 script_path,
                 version_number=None):
    """Create and store a model on the model store.

    :param str name: The model name.
    :param artifact: The path to the artifact.
    :param str requirements_path: A path to a pip requirements file.
    :param str script_path: A path to the model script.
    :param int version_number: Optional. The model version.
    :return: The ModelReference object.
    :rtype: ModelReference
    """

And now compare that with the following:

def create_model(name,
                 artifact,
                 requirements_path,
                 script_path,
                 version_number=None):
    """Create and store a model on the model store.

    This also requires specifying a path to a pip requirements file, as well as
    a path to a python script that contains `load` and `predict` functions.

    The `load` function should accept a single `path` argument that will load
    and return the instantiated model object. The `predict` function must
    accept a `clf` argument, which will be the instantiated model, and a `data`
    argument that contains input data that the model must use to make
    predictions. 
    
    An example of a script.py file is shown below:

    :Example:
    # script.py
    import cloudpickle

    def load(path):
        with open(path, 'rb') as f:
            clf = cloudpickle.load(f)
        return clf

    def predict(clf, data):
        return clf.predict(data)

    :param str name: A name for the model.
    :param [str,Object] artifact: Either a path to a model artifact, or a model
        object. In the latter case, the model object will be pickled.
    :param str requirements_path: A path to a pip requirements file that
        specifies the model dependencies.
    :param str script_path: A path to a python file with filename `script.py`
        that contains a `load` and `predict` function.
    :param int version_number: Optional. A version number for the model. If not
        provided, an incrementing version number is automatically generated.
    :return: The ModelReference object that references the model on the model
        store.
    :rtype: ModelReference
    """

Which would you rather have? I think it’s fairly obvious. More explanation, more reasoning, more exposed “thinking process” is almost universally preferred. You’ll see this echoed across all of the major Python projects: scikit-learn, numpy, pandas, etc. all often have more documentation than code. This isn’t an accident!

Some other points to keep in mind when going beyond the bare minimum:

Poor grammar and incoherent writing are signs of a poor thinking process and a lack of understanding.
Good documentation takes time, and requires effort. Do it anyway.
Deep expertise is not a prerequisite for good documentation.

Always do more than the bare minimum. You’ll be thanked by your teammates, users, and yourself.

If you do the bare minimum, your documentation sucks.

9. You’re unaware of documentation’s biggest flaw

Even if you assume the perfect build system (excellent unit tests, linting, style checks, and so on), you’ll still have the following issues with documentation:

Incorrect or bad documentation doesn’t cause the build to fail.
Out-of-date documentation doesn’t cause the build to fail.
You will forget to update the documentation once you’ve changed the code.

When it comes to docstrings and comments, documentation is code. But it’s code that doesn’t get executed or tested (yes, I know about doctests; they’ll help with testing examples, but not with prose). As a result, you have to be extra vigilant and disciplined when it comes to maintaining your documentation. Make a point of revisiting it, and make sure it’s still up to date and relevant. Otherwise you might inadvertently do the worst thing documentation can do (besides not existing): point someone in the wrong direction.
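To show what doctests do and don’t cover, here is a small sketch using the standard library’s `doctest` module. The `add` function mirrors the earlier docstring examples, and `run_docstring_examples` is a helper name made up for this post:

```python
import doctest


def add(a, b):
    """
    Add two numbers together.

    This sentence is prose, so nothing will notice if it becomes wrong.

    >>> add(1, 2)
    3
    """
    return a + b


def run_docstring_examples():
    """Run the example in ``add`` and return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_docstring_examples())  # (0, 1)
```

If `add` started returning the wrong answer, the `>>>` example would fail. The prose sentence above it could drift as far from the truth as it likes and the build would stay green.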

Till next time,
Michael.

PS: I’m hoping to write more in the near-future. Life has thrown a lot at me, but I’ve been constantly surprised at the peace of mind I achieve when I get to write about things I care about. So, holding thumbs, there’ll be more to share soon.