Learn how you can streamline authoring, testing, and maintenance of technical documentation with Jupyter Notebooks, nbdev, and Docusaurus. We also discuss open source content management tools that we use to optimize SEO and interoperate with Wordpress. Discover tools that will not only drastically improve your documentation workflow but also make the process of writing technical documentation delightful.
Documentation quality is one of the most important factors that developers use to decide if they should adopt a technology. Despite its importance, developers often struggle to maintain high-quality documentation because it can be challenging to keep prose, code and code outputs in sync.
This is in part due to a fundamental limitation involved with most documentation infrastructure: code and associated output are generally manually copied into external files. This presents several challenges:
- Code and associated outputs quickly become stale as your API changes.
- It is challenging to systematically test all of the code examples in the documentation.
- Editing code examples requires a laborious process of writing and prototyping them somewhere else and eventually copy-pasting them into Markdown. This slows down iteration speed and distracts from the task of writing.
We have always treated the documentation for Metaflow as a top priority. This is reflected in recent efforts, which include a new book, and the site outerbounds.com, containing additional examples, how-to guides and tutorials. When we started to scale up our efforts, we knew we needed better tools. We wanted:
- A modern, customizable Content Management System (CMS) that allows us to publish a wide range of content: code examples, prose and blog articles.
- The ability to test and refresh outputs of code snippets automatically.
- An authoring experience that doesn’t involve copying and pasting code, with as close to a WYSIWYG experience as possible. Good authoring tools are important both for encouraging documentation and for allowing the process to scale by removing friction.
Static Site Generator + Design
For our front-end framework, we chose to customize Docusaurus. Docusaurus is a modern, beautiful static site generator developed at Facebook specifically for technical documentation and blog posts. We liked Docusaurus for its extensibility and for letting us use custom React components. When customizing our site, we drew inspiration from both Stripe and spaCy: we loved the ability to view text and code side by side, which maximizes vertical real estate and minimizes context switching. Below is a screenshot of how we implemented this (you can view the live page here):
As you can see, we prioritized having a minimalistic design where code is present on the right-hand side without distraction, with the options of adding additional callouts and line highlighting in code.
Furthermore, Docusaurus uses MDX, which is more flexible than plain Markdown. Thanks to MDX, we were able to implement the two-pane design shown above and give the author fine-grained control over placement. Docusaurus has provided an amazing amount of flexibility while at the same time including features specific to technical documentation, such as code highlighting and magic comments, and being very fast compared to other static site generators.
The prevailing medium through which code and prose are co-located in traditional documentation systems is the Markdown file. This makes testing, refreshing and authoring code examples challenging, since Markdown is a fixed, static environment. Thankfully, there already exists a medium that naturally supports writing prose and code together in a live environment where code can also be run and tested for many scenarios: Jupyter Notebooks. However, notebooks do not offer facilities for documentation out of the box. While there are several projects such as JupyterBook, Sphinx and Quarto that offer publishing from notebooks, we needed the following additional capabilities:
- Ability to add custom pre and post-processing directives to code cells to control how code and outputs are displayed.
- Ability to write tests in-situ and configure their visibility and characteristics.
- Facilities for rendering API docs interspersed with code examples or prose that can be automatically inferred from docstrings.
- Freedom to use any static site generator.
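To make the idea of cell directives concrete, here is a minimal sketch of how directive comments at the top of a code cell could control what gets rendered. The `#|` prefix and the directive names `hide`, `hide_input` and `hide_output` are assumptions chosen for illustration, and `parse_cell` and `render_cell` are hypothetical helpers, not functions from nbdev.

```python
from dataclasses import dataclass, field

# Hypothetical directive prefix and names, chosen for illustration only.
DIRECTIVE_PREFIX = "#|"
FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


@dataclass
class ParsedCell:
    directives: list = field(default_factory=list)
    source: str = ""


def parse_cell(source: str) -> ParsedCell:
    """Split leading directive comments from the rest of a code cell."""
    lines = source.splitlines()
    directives = []
    body_start = 0
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(DIRECTIVE_PREFIX):
            break  # directives are only recognized at the top of the cell
        directives.append(stripped[len(DIRECTIVE_PREFIX):].strip())
        body_start = index + 1
    return ParsedCell(directives, "\n".join(lines[body_start:]))


def render_cell(source: str, output: str) -> str:
    """Render a cell and its output as Markdown, honoring the directives."""
    cell = parse_cell(source)
    if "hide" in cell.directives:
        return ""
    parts = []
    if "hide_input" not in cell.directives:
        parts.append(f"{FENCE}python\n{cell.source}\n{FENCE}")
    if "hide_output" not in cell.directives and output:
        parts.append(f"{FENCE}\n{output}\n{FENCE}")
    return "\n\n".join(parts)
```

Because the directives are ordinary comments, the cell still runs unchanged in Jupyter; only the publishing step interprets them.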
To accomplish these goals, we extended nbdev, a documentation-first Python development framework by fast.ai. Concretely, we extended this framework to allow for documenting existing code bases that were written outside nbdev. We are also intimately familiar with this framework, as we employ one of nbdev’s core contributors! To get an idea of how authoring in notebooks works, below is a whirlwind tour.
In this example, we can see that we write prose + code in a notebook, which gets immediately rendered in the docs:
Since Metaflow is designed primarily as a command line tool, we make sure you can render Python scripts in the docs like this:
We can toggle visibility of cell inputs and outputs:
We can even create interactive plots:
Finally, we make testing our documentation easy. We embed tests directly into notebooks, right next to the code snippets or examples. We then programmatically run all of the code and the tests (execution can be configured with fine-grained rules):
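As a rough illustration of what running a notebook as a test suite means, the sketch below executes every code cell of a notebook in order, in one shared namespace, and collects the cells that raise. It relies only on the fact that an `.ipynb` file is JSON with a `cells` list; `run_notebook` is a hypothetical helper, not the nbdev test runner, and it does not handle IPython magics or shell commands.

```python
import json


def run_notebook(nb_json: str) -> list:
    """Run all code cells in order; return (cell index, error) for failures."""
    notebook = json.loads(nb_json)
    namespace = {}  # shared across cells, like a live kernel
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return failures
```

A cell containing a plain `assert` next to an example then acts as an in-place test: if the API changes and the assertion no longer holds, the cell index shows up in the returned list, and a CI job can fail on a non-empty result.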
In addition to code snippets for tutorials and how-to guides, we wanted to provide API documentation. Instead of writing this API documentation from scratch, we wanted to render it from existing docstrings, but still maintain the flexibility to interleave code snippets and prose where necessary. We were able to use nbdev and Docusaurus to introspect and render docstrings for various objects. For example, the API docs for the Metaflow client are created from this notebook.
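The general technique of rendering API docs from docstrings can be sketched with Python's standard `inspect` module. The helper `render_api_doc` and the Markdown layout it emits are assumptions for illustration; nbdev's own rendering is richer and is not reproduced here.

```python
import inspect


def render_api_doc(obj) -> str:
    """Render an object's signature and docstring as a Markdown section."""
    name = getattr(obj, "__name__", repr(obj))
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        signature = ""  # some objects expose no introspectable signature
    doc = inspect.getdoc(obj) or "*No docstring.*"
    return f"### `{name}{signature}`\n\n{doc}\n"


def greet(name: str, punctuation: str = "!") -> str:
    """Return a greeting for `name`."""
    return f"Hello, {name}{punctuation}"


print(render_api_doc(greet))
```

Since the section is generated from the live object, calling a helper like this inside a notebook cell keeps the rendered signature in sync with the code, and prose or examples can sit in the cells around it.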
A key aspect of documentation is collaboration. We wanted to enable people with different skill sets such as SEO specialists, marketers and copy editors to participate in content authoring. A tool for authoring and analyzing content that is popular among non-developers is Wordpress. Wordpress offers features for SEO analysis as well as a review and editing workflow that is approachable for many.
In order to preserve the ability to author content in Wordpress, we created wp2md, a simple CLI tool that allows you to export blog articles from Wordpress into Docusaurus-compliant Markdown, along with assets like embedded images and videos. Below is an example of how wp2md works:
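For readers unfamiliar with what Docusaurus-compliant Markdown looks like, the sketch below writes a post as a Markdown document with a front matter block. This is not how wp2md is implemented: the `post` dictionary, its keys (`title`, `excerpt`, `body_markdown`) and both helper functions are hypothetical, and the body is assumed to be converted to Markdown already.

```python
import json
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def to_docusaurus_markdown(post: dict) -> str:
    """Build a Markdown document with a front matter block from a post."""
    front_matter = {
        "title": post["title"],
        "slug": slugify(post["title"]),
        "description": post.get("excerpt", ""),
    }
    lines = ["---"]
    # json.dumps produces double-quoted strings, which are valid YAML scalars.
    lines += [f"{key}: {json.dumps(value)}" for key, value in front_matter.items()]
    lines += ["---", "", post["body_markdown"].strip(), ""]
    return "\n".join(lines)
```

Keeping metadata such as the title, slug and description in front matter is what lets later tooling inspect articles without parsing the body.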
Another piece of Wordpress functionality we wanted to mirror is SEO analysis with Yoast. Yoast analyzes your content and helps flag issues in your site such as:
- Missing metadata such as open graph tags or authors
- Titles, descriptions and content with inappropriate length
- Duplicate entities such as titles, slugs, etc.
- Broken links
Since many of these checks are rule-based, we created mdseo, a CLI tool that analyzes Markdown content against these same rules. The upshot of having a CLI tool rather than a GUI is that we can run these checks in CI alongside our other tests to spot issues. Furthermore, we were able to add customized flags to allow users to optionally ignore certain rules per article.
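A rule-based check of this kind can be sketched in a few lines. The code below is not mdseo: the rule names, the `seo_ignore` front matter key used for per-article opt-outs, and the simplified `key: value` parsing (not full YAML) are all assumptions for illustration. The 50-character minimum for descriptions is the threshold mentioned in this article.

```python
MIN_DESCRIPTION_LENGTH = 50  # threshold mentioned in this article


def parse_front_matter(markdown: str) -> dict:
    """Parse simple `key: value` front matter between `---` lines (not full YAML)."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, separator, value = line.partition(":")
        if separator:
            meta[key.strip()] = value.strip().strip("\"'")
    return meta


def check_article(markdown: str) -> list:
    """Return one message per violated rule, skipping rules the article ignores."""
    meta = parse_front_matter(markdown)
    # Hypothetical opt-out: a comma-separated list of rule names.
    ignored = {rule.strip() for rule in meta.get("seo_ignore", "").split(",")}
    problems = {}
    if not meta.get("title"):
        problems["missing-title"] = "front matter has no title"
    description = meta.get("description", "")
    if not description:
        problems["missing-description"] = "front matter has no description"
    elif len(description) < MIN_DESCRIPTION_LENGTH:
        problems["short-description"] = (
            f"description is {len(description)} characters, "
            f"below the minimum of {MIN_DESCRIPTION_LENGTH}"
        )
    return [f"{rule}: {msg}" for rule, msg in problems.items() if rule not in ignored]
```

In CI, a wrapper script would run a check like this over every Markdown file and exit with a non-zero status if any list is non-empty, which fails the build in the same way a failing unit test would.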
Here is a screenshot of mdseo running in GitHub Actions letting us know that a specific article has a description that is less than 50 characters long:
This authoring flow has been central to allowing us to author high-quality documentation quickly. We’ve open sourced nbdoc, a reference example of how we use nbdev and Docusaurus. Besides us, a few other companies have adopted the framework successfully:
David Berg, Senior Software Engineer at Netflix
Prior to using nbdev, documentation was the most cumbersome aspect of our software development process… Using nbdev allows us to spend more time creating rich prose around the many code snippets guaranteeing the whole experience is robust. nbdev has turned what was once a chore into a natural extension of the notebook-based testing we were already doing.
Roxanna Pourzand, Product Manager at Transform
We’re so excited about using nbdev. Our product is technical so our resulting documentation includes a lot of code-based examples. Before nbdev, we had no way of maintaining our code examples and ensuring that it was up-to-date for both command inputs and outputs. It was all manual. [Now, we] have this under control in a sustainable way. Since we’ve deployed these docs, we also had a situation where we were able to identify a bug in one of our interfaces, which we found by seeing the error that was output in the documentation!
It should be noted that most users may want to follow the development of nbdev rather than trying to use nbdoc, as many of these features (documenting existing codebases, custom directives, freedom to use any static site generators, etc.) have been upstreamed into nbdev.
Try This at Home
Most of the features described in this article have been upstreamed into nbdev, a community-maintained project from fastai. Furthermore, wp2md and mdseo are open source projects. For those interested in a community-maintained, general-purpose authoring tool for documentation, blogs, and Python packages, we recommend following nbdev.
Get In Touch
If your company needs well-documented, open-source infrastructure for data science and machine learning, join our Slack for support and feedback, and check out Metaflow.