Blisteringly-quick mocking when testing Jupyter Notebooks
Notebooks are good for two things:
- Illustrative examples that provide code within the context of documentation (sometimes referred to as Literate Programming)
- In-line plots.
For everything else, you should be using a “proper” Python module that has some tests – at least for the important bits.
This was the thesis of Joel Grus’ hilarious talk “I don’t like notebooks”, which I happen to largely agree with. And I think it’s excellent advice…
But what happens when you are using your Notebooks to provide examples (say, as part of a package) and you want to make sure that they at least run without throwing an exception? You want to know that none of your code changes has accidentally broken your examples.
Things are complicated further if your notebook needs to be executed in a separate environment to the one you’ll be testing in. This potentially means dependencies need to be mocked out somehow. Inside a notebook 😱.
I don’t have a perfect solution, but I do have something that’s minimally invasive, only relies on standard libraries (great!), and is easy for developers to use (double great!). At the very least, it’s been useful for a couple of developers where I work.
Breaking it down.
We need to answer two questions:
- How can we test notebooks?
- How do I mock out dependencies in my notebook?
1. How can we test notebooks?
As I learned recently, there’s quite a nice tool for that: nbval.
From the project README:
The plugin adds functionality to py.test to recognise and collect Jupyter notebooks. The intended purpose of the tests is to determine whether execution of the stored inputs match the stored outputs of the .ipynb file. Whilst also ensuring that the notebooks are running without errors.
This is great. nbval even allows you to skip certain cells and sanitize outputs. Sanitizing matters because some output changes on every run and would otherwise cause a spurious failure: the __repr__ of an object that prints its memory location, for example, or logging lines that report the current date and time.
This provides a lot more flexibility than bluntly using nbconvert to turn your Notebook into an executable .py file that you run and check for error codes.
So we’ll go with nbval.
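To make the sanitizing point concrete, here is a minimal sketch of the idea in plain Python. It is not how nbval implements or configures sanitizing (check its documentation for that); the Model class and the sanitize helper are hypothetical names used only for illustration.

```python
import re


# Hypothetical class, used only to get a default repr that contains an address.
class Model:
    pass


def sanitize(text):
    """Replace hexadecimal memory addresses with a fixed placeholder."""
    return re.sub(r"0x[0-9a-fA-F]+", "0xADDRESS", text)


first = Model()
second = Model()

# Two live objects sit at different addresses, so their raw reprs differ.
raw_outputs_match = repr(first) == repr(second)

# Once the address is masked, the two strings compare equal.
sanitized_outputs_match = sanitize(repr(first)) == sanitize(repr(second))

print(raw_outputs_match, sanitized_outputs_match)
```

The same reasoning applies to timestamps in log lines: mask the part that is expected to change, then compare what remains.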
2. How do I mock out dependencies in my notebook?
Here we implement a little bit of minor magic.
We write ourselves a decorator:
import os
import warnings


def if_testing_mock_with(replacement_func):
    """Swap the decorated function for replacement_func when TESTING is 'true'."""
    def wrapper(function):
        try:
            if os.environ['TESTING'].lower() == 'true':
                return replacement_func
            return function
        except KeyError:
            # Feel free to remove this warning if you prefer
            warnings.warn('TESTING environment variable not found. Assuming TESTING == False')
            return function
    return wrapper
Our decorator checks if the TESTING environment variable is present and set to true (the comparison is case-insensitive). If that is the case, then we replace the decorated function with replacement_func. If this isn’t the case, then we use the original function.
Let’s look at an example.
Imagine you had the following piece of code in your notebook that has to fetch a large file from a filestore (maybe S3, or Blobstore) somewhere:
def fetch_remote_dataframe(remote_file_path):
    """
    Access a remote .csv file and return a dataframe.
    """
    file_contents = get_file_contents(remote_file_path)  # Hit up a remote server
    df = pd.read_csv(file_contents)
    return df
To stub it out while testing, we add our decorator to it and specify a replacement function:
import pandas as pd


def fetch_fake_data(remote_file_path):
    """Fetch some fake data"""
    return pd.DataFrame({
        'a': [1, 2, 3],
        'b': [4, 5, 6]
    })


@if_testing_mock_with(fetch_fake_data)
def fetch_remote_dataframe(remote_file_path):
    """
    Access a remote .csv file and return a dataframe.
    """
    file_contents = get_file_contents(remote_file_path)  # Hit up a remote server
    df = pd.read_csv(file_contents)
    return df
You can freely omit the TESTING environment variable in your development environment, and specify TESTING=true in your testing environment (for example, on your build server that runs tests), and your function will be magically stubbed out, or not, depending on the environment the notebook is run in.
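Here is a small self-contained sketch of that switch in action. It uses a condensed variant of the decorator (no warning when the variable is missing) and made-up function names, so treat it as an illustration under those assumptions. Note that the variable is read once, when the function is defined, so it must be set before the cell that defines the function runs.

```python
import os


def if_testing_mock_with(replacement_func):
    """Condensed variant of the decorator, without the warning."""
    def wrapper(function):
        if os.environ.get("TESTING", "").lower() == "true":
            return replacement_func
        return function
    return wrapper


def fake_answer():
    return "fake"


# Remember the original value so the demo leaves the environment untouched.
original_value = os.environ.get("TESTING")

# Simulate the build server: the variable is set before the function is defined.
os.environ["TESTING"] = "true"


@if_testing_mock_with(fake_answer)
def slow_answer():
    return "real"


result_when_testing = slow_answer()

# Simulate a development machine where testing mode is off.
os.environ["TESTING"] = "false"


@if_testing_mock_with(fake_answer)
def slow_answer_again():
    return "real"


result_when_not_testing = slow_answer_again()

# Restore whatever was there before the demo.
if original_value is None:
    del os.environ["TESTING"]
else:
    os.environ["TESTING"] = original_value

print(result_when_testing, result_when_not_testing)
```

Changing TESTING after a function has been decorated has no effect on that function; you would need to re-run the defining cell.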
All that’s left to do is call your notebook using nbval during your build (check the documentation for how to do that; it’s not hard), and you’re done.
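As a rough sketch of what that build step might look like when driven from Python: the helper names and the notebook path below are hypothetical, and the approach assumes pytest and nbval are installed and that the notebook kernel inherits the environment of the process that launches it. The --nbval flag is the one the nbval plugin adds to pytest.

```python
import os
import subprocess
import sys


def build_nbval_command(notebook_path):
    """Build the pytest invocation that asks nbval to collect one notebook."""
    return [sys.executable, "-m", "pytest", "--nbval", notebook_path]


def testing_environment():
    """Copy the current environment and switch on the TESTING flag."""
    env = dict(os.environ)
    env["TESTING"] = "true"
    return env


def run_notebook_tests(notebook_path):
    """Run nbval on the notebook and return pytest's exit code."""
    completed = subprocess.run(
        build_nbval_command(notebook_path),
        env=testing_environment(),
    )
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical path; point this at your own notebook.
    sys.exit(run_notebook_tests("examples/demo.ipynb"))
```

On most build servers you would more likely set TESTING=true in the job configuration and call pytest directly; the wrapper just keeps the two pieces together in one place.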
Hope it helps next time you need to do something similar!
Till next time, Michael.