Why your technical documentation sucks


_This article was originally presented as a “brown bag talk”, part of an internal series of talks at the office where employees meet over lunch to share and discuss technical topics that interest them. It was also originally published as a post on the company blog, but I’ve tweaked it slightly to be a little more, ahem, opinionated_ (I’ll know I’ve peaked when I can say “fuck” there). Enjoy!


In today’s post we’re going to (briefly) explore one of my favourite topics: technical documentation (if you’ve ever interacted with really good documentation, you’ll understand perfectly the kind of reverence I have for it). More specifically, we’re going to look at some reasons why your technical documentation sucks. Along the way, we’ll also (sort of) touch on how we can make things better. The post is a bit hyperbolic and tongue-in-cheek, but the lessons aren’t: they remain useful whether you’re a team of one or a team of one thousand.

This will also have a distinctly Pythonic slant, since this is the language I spend most of my time working in.

Without further delay, here’s why your documentation sucks.

1. You think you don’t need documentation

Documentation is like sex. Even when it’s bad, it’s better than nothing.

— Someone on the internet.

You 100% need to document your code. No matter how good your product, library, or tool, if your documentation sucks, people aren’t going to use it. Period. It’s that simple. If you force people to use your tool without good documentation, they won’t just be ineffective with it; they’ll also dislike you (and they’ll dislike you a lot). Perhaps worst of all, if you’re forced to work on your own codebase without excellent documentation, you’ll definitely end up disliking yourself.

Not having documentation is like wandering around a dark cave without a torch: it’s torturous (ha! puns!). All you do is bump, trip, and fumble around, getting increasingly confused. It quickly descends into frustration and misunderstandings about how things actually work, and it’s only made worse when you encounter a project for the first time (or return to a project a few months later).

If you think you don’t need documentation, then your documentation sucks. And not having documentation also means your documentation sucks.

2. You think documentation only refers to docstrings and comments

This is the most common mistake I see beginner programmers make. But is documentation only docstrings and comments?

What about things like descriptive variable names, clean code, and good tests?

Anything that helps people understand the behaviour and intention of your code is documentation!

An example of descriptive variable names:

# This is a lot less descriptive...
my_dict = {
    'mary jane': True,
    'john smith': False,
    'billy bob': True,
}

# ... than this
people_i_have_murdered = {
    'mary jane': True,
    'john smith': False,
    'billy bob': True,
}

And some good unit tests that document how things are supposed to behave:

def test_my_python_phone():
    """Test my super cool Python phone."""
    phone = PythonPhone()
    phone.enter_number("0832911234")
    phone.dial()

    assert phone.did_ring()

(and please, remember to write tests!).
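The test above leans on a `PythonPhone` class that the post never defines. To make the idea concrete, here’s a minimal stub under that assumption (the class, its methods, and the “any non-empty number rings” rule are all hypothetical), plus a second test whose name and assertion spell out an edge case a reader would otherwise have to guess:

```python
class PythonPhone:
    """A hypothetical phone, just detailed enough to exercise the tests."""

    def __init__(self):
        self._number = None
        self._rang = False

    def enter_number(self, number):
        """Store the number to dial."""
        self._number = number

    def dial(self):
        """Dial the stored number; in this stub, any non-empty number rings."""
        self._rang = bool(self._number)

    def did_ring(self):
        """Return whether the last dial attempt rang."""
        return self._rang


def test_dialling_without_a_number_does_not_ring():
    """Dialling before entering a number should not ring."""
    phone = PythonPhone()
    phone.dial()

    assert not phone.did_ring()
```

Run it with pytest (or just call the function). The test name alone tells a newcomer how the phone behaves in this case, without them reading `dial` at all.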

Use everything at your disposal to make your code easy to understand. Documentation extends quite far beyond docstrings and comments. Don’t ignore variable names, clean code and good tests. Otherwise your documentation will suck.

3. You don’t know how and when to use comments

Another common beginner mistake (this is often how you’re taught at university, so it might not be entirely your fault). Your team asks you to document your code. And you produce something like this:

def add_unique_number_to_list(list_, number):
    
    # Check if number in the list
    if number not in list_: # if the number is not in the list
        list_.append(number)  # Then append the number to the list
    return list_  # return the list

Now, I know your intentions are good when you write code like this. But I can read code. And guess what, your teammates can read code too.
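For contrast, here’s one way the same function might look when the code is left to speak for itself. This is a sketch, not code from the post: the docstring and the single comment are kept only because they say something the code doesn’t, namely that the caller’s list is mutated in place.

```python
def add_unique_number_to_list(list_, number):
    """Append ``number`` to ``list_`` unless it's already present.

    Return the same list object that was passed in.
    """
    if number not in list_:
        # Mutates the caller's list in place; the return value is only a convenience.
        list_.append(number)
    return list_


numbers = [1, 2]
add_unique_number_to_list(numbers, 2)  # numbers is still [1, 2]
add_unique_number_to_list(numbers, 3)  # numbers is now [1, 2, 3]
```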

Instead, use comments to explain something that is surprising or unexpected, or that is deliberate for a non-obvious reason. For example, take a look at this snippet from one of our Django codebases:

query = Q()
for result in context["profile"].result_set.all():
    matching_logic = MatchingLogicLookup.objects.get(
        file_header=result.rule.symbol
    )

    # A clever way to bundle all of your logic together, but only
    # execute one query.
    # https://stackoverflow.com/a/20177875
    query = query | Q(
        rule_type_logic__result=result.result,
        rule_type_logic__symbol=matching_logic.logic_match,
    )

If you’re not familiar with Django, you might not know that you can build up a single query by combining multiple Q objects with the | operator, instead of narrowing down your selection step by step. It’s a neat little trick, so the developer added a comment (and a link to a StackOverflow answer!) explaining what the intention is and why it’s there.

Take another example:

# We *must* store the session in a variable first!
# https://docs.djangoproject.com/en/4.0/topics/testing/tools/#django.test.Client.session
session = self.client.session
session["is_otp_verified"] = True
session.save()

When modifying the session through Django’s test client, your changes won’t be saved unless you store the session in a variable first, because a new SessionStore is created every time the self.client.session property is accessed. An unsuspecting developer might come along, think they can modify the session directly via self.client.session, and later be confused about why the session isn’t saved. Again: use comments to explain things that are surprising!
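If the property behaviour sounds abstract, here’s a stdlib-only analogy. `FakeClient` and `FakeSessionStore` are hypothetical stand-ins, not Django’s classes; they only mimic the one relevant detail, that every access to `session` returns a brand-new store:

```python
class FakeSessionStore(dict):
    """Hypothetical session store: a dict copy of the backing data, plus save()."""

    def __init__(self, backing):
        super().__init__(backing)  # work on a copy, like a freshly loaded session
        self._backing = backing

    def save(self):
        """Write this store's contents back to the backing data."""
        self._backing.clear()
        self._backing.update(self)


class FakeClient:
    """Hypothetical test client whose ``session`` property builds a new store each time."""

    def __init__(self):
        self._saved_session = {}

    @property
    def session(self):
        return FakeSessionStore(self._saved_session)


client = FakeClient()

# Wrong: each line below touches a *different* store, so the change is lost.
client.session["is_otp_verified"] = True
client.session.save()
print(client.session.get("is_otp_verified"))  # None

# Right: keep one store in a variable, modify it, then save it.
session = client.session
session["is_otp_verified"] = True
session.save()
print(client.session.get("is_otp_verified"))  # True
```

Without a comment, the “right” version looks like a pointless extra variable, which is exactly why it deserves one.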

And you don’t have to take it from me either:

A delicate matter, requiring taste and judgement. I tend to err on the side of eliminating comments, for several reasons. First, if the code is clear, and uses good type names and variable names, it should explain itself. Second, comments aren’t checked by the compiler, so there is no guarantee they’re right, especially after the code is modified. A misleading comment can be very confusing. Third, the issue of typography: comments clutter code.

— Rob Pike, “Notes on Programming in C”

Use comments wisely, otherwise your documentation will suck.

4. You think documentation is a substitute for confusing code

This is almost the opposite case of Rule 2: your code works, but it’s poorly written and confusing. So you think to yourself, “Ah hah! Rather than fix this, I’ll just add some documentation to explain what’s happening. That should be ok.”

And you’d be wrong.

Take the following implementation of Fizz Buzz:

# Get the numbers from 1 to 100 inclusive.
# If the number is a multiple of three, replace it with Fizz.
# If the number is a multiple of five, replace it with Buzz.
[(i%3//2*'Fizz'+i%5//4*'Buzz'or-~i)for i in range(100)]

From the comments, I understand what the code is supposed to do, but I have no clue how it actually works.

What happens if I need to modify its behaviour?
What happens if I forget how it works (you will eventually)?
What happens if it has a bug? How do I figure out what’s gone wrong?
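Compare that with a sketch of the same behaviour written to be read. One deliberate difference: this version returns strings for every number, while the one-liner mixes strings and integers.

```python
def fizz_buzz(n):
    """Return "Fizz", "Buzz", "FizzBuzz", or ``n`` as a string."""
    if n % 3 == 0 and n % 5 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


results = [fizz_buzz(n) for n in range(1, 101)]
print(results[:15])
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz', '11', 'Fizz', '13', '14', 'FizzBuzz']
```

Each of those questions now has an easy answer: to change a rule, edit one branch; to hunt a bug, step through four plain `if` statements. No comments required.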

As a side note, if you can’t write clean code, you probably won’t write good documentation either:

A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.

— Kevlin Henney

So focus on your fundamentals! If you can’t write clean code, your code sucks.

But your documentation also probably sucks.

5. You don’t know about PEP-257

Our first Python-specific point 🐍.

A lot of Python developers know about PEP-8, the official style guide for Python code. What fewer Python developers know is that there’s also a docstring equivalent: PEP-257.

It’s worth reading through PEP-257 (it’s not long!), but my favourite rule is one I see even senior developers ignore: use the imperative mood in the first line of a docstring:

The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.

— extract from PEP-257
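In practice the difference is often a single letter. A small sketch; the functions themselves are just an illustration:

```python
import os


# Avoid: the summary line describes what the function does.
def home_directory_descriptive():
    """Returns the current user's home directory."""
    return os.path.expanduser("~")


# Prefer: imperative mood, a single phrase ending in a period.
def home_directory():
    """Return the current user's home directory."""
    return os.path.expanduser("~")
```

If you’d rather have a linter nag you than rely on memory, pydocstyle’s D401 check flags first lines that aren’t in the imperative mood.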

Follow PEP-257, otherwise your documentation sucks.


6. You don’t choose and follow a common docstring format

Luckily for us, the Python ecosystem is so large and mature that a number of really smart people have already spent a lot of time thinking about things like documentation. To date, there are three major docstring formats in Python that most projects use: reStructuredText, NumpyDoc, and the Google Python Style Guide.

Each style guide has a clear specification that you can (and should) follow.

Just because these docstring style guides exist doesn’t necessarily mean they’re good, but standardising on an accepted format nets you a few easy wins.

Let’s take a look at what these three major formats look like:

# reStructuredText
def add(a, b):
    """
    Add two numbers together.
    
    :param float a: The first number.
    :param float b: The second number.
    
    :returns: The sum of ``a`` and ``b``.
    :rtype: float
    """
    
    return a + b


# NumpyDoc
def add(a, b):
    """
    Add two numbers together.
    
    Parameters
    ----------
    a : float
        The first number.
    b : float
        The second number.
        
    Returns
    -------
    float
        The sum of `a` and `b`.
    
    """
    
    return a + b


# Google 
def add(a, b):
    """
    Add two numbers together.
   
    Args:
        a (float): The first number.
        b (float): The second number.
    
    Returns:
        float: The sum of `a` and `b`.
    """
    
    return a + b

It doesn’t matter too much which format you choose, as long as you choose one and stick to it! Except when…

7. You choose the wrong docstring format

Except maybe it does matter which docstring format you choose?

I personally (fight me!) think that reStructuredText is the wrong format. Here’s why:

Fun fact: we standardised on reStructuredText at my workplace. This was at a time before NumpyDoc or the Google Python Style Guide were as popular as they are now. Now that better options are available, we’re considering moving to a more readable format.

8. You do the bare minimum

Consider the following function (pulled again from one of our codebases):

def create_model(name,
                 artifact,
                 requirements_path,
                 script_path,
                 version_number=None):
    """Create and store a model on the model store.

    :param str name: The model name.
    :param artifact: The path to the artifact.
    :param str requirements_path: A path to a pip requirements file.
    :param str script_path: A path to the model script.
    :param int version_number: Optional. The model version.
    :return: The ModelReference object.
    :rtype: ModelReference
    """

And now compare that with the following:

def create_model(name,
                 artifact,
                 requirements_path,
                 script_path,
                 version_number=None):
    """Create and store a model on the model store.

    This also requires specifying a path to a pip requirements file, as well as
    a path to a Python script that contains ``load`` and ``predict`` functions.

    The ``load`` function should accept a single ``path`` argument, and load
    and return the instantiated model object. The ``predict`` function must
    accept a ``clf`` argument, which will be the instantiated model, and a
    ``data`` argument that contains the input data the model must use to make
    predictions.

    An example of a ``script.py`` file is shown below::

        # script.py
        import cloudpickle

        def load(path):
            with open(path, 'rb') as f:
                clf = cloudpickle.load(f)
            return clf

        def predict(clf, data):
            return clf.predict(data)

    :param str name: A name for the model.
    :param [str,Object] artifact: Either a path to a model artifact, or a model
        object. In the latter case, the model object will be pickled.
    :param str requirements_path: A path to a pip requirements file that
        specifies the model dependencies.
    :param str script_path: A path to a Python file with filename ``script.py``
        that contains ``load`` and ``predict`` functions.
    :param int version_number: Optional. A version number for the model. If not
        provided, an incrementing version number is automatically generated.
    :return: The ModelReference object that references the model on the model
        store.
    :rtype: ModelReference
    """

Which would you rather have? I think it’s fairly obvious. More explanation, more reasoning, more exposed “thinking process” is almost universally preferred. You’ll see this echoed across all of the major Python projects: scikit-learn, NumPy, pandas, and others frequently have functions whose docstrings are longer than their code. This isn’t an accident!

Some other points to keep in mind when going beyond the bare minimum:

Always do more than the bare minimum. You’ll be thanked by your teammates, your users, and yourself.

If you do the bare minimum, your documentation sucks.

9. You’re unaware of documentation’s biggest flaw

Even if you assume the perfect build system (excellent unit tests, linting, style checks, and so on), you’ll still have the following issues with documentation:

When it comes to docstrings and comments, documentation is code. But it’s code that doesn’t get executed or tested (yes, I know about doctests; they’ll help with testing examples, but not with prose). As a result, you have to be extra vigilant and disciplined when it comes to maintaining your documentation. Make a point of revisiting it, making sure it’s still up to date and relevant. Otherwise you might inadvertently do the worst thing documentation can do (besides not existing): point someone in the wrong direction.
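To make that concrete, here’s a small sketch using the standard library’s doctest module. The `add` function, its deliberately wrong sentence about integers, and the `run_doctests` helper are invented for illustration: the example in the docstring is executed and passes, while the prose next to it is never checked.

```python
import doctest


def add(a, b):
    """Add two numbers together.

    Both arguments must be integers.

    >>> add(1, 2)
    3
    """
    # The "must be integers" sentence above is stale: floats work fine,
    # and nothing will ever flag it.
    return a + b


def run_doctests(func):
    """Run the ``>>>`` examples in ``func``'s docstring; return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    # Pass the function in explicitly so its examples can call it by name.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    failed, attempted = runner.summarize(verbose=False)
    return failed, attempted


print(run_doctests(add))  # (0, 1): the example passes
print(add(0.5, 0.25))     # 0.75: so much for "must be integers"
```

The doctest gives you a green tick, and the misleading sentence sails through with it.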


Till next time,
Michael.

PS: I’m hoping to write more in the near future. Life has thrown a lot at me, but I’ve been constantly surprised at the peace of mind I achieve when I get to write about things I care about. So, holding thumbs, there’ll be more to share soon.