Thoughts on Source-located Documentation


Good documentation can be invaluable in the long term, but it takes effort that can be difficult to justify in the short term. Not all projects are expansive libraries or frameworks; often they're small applications with only a couple of pieces of critical business logic. In those cases especially, it may be tempting to forgo external documentation, which will of course come back to bite you when you need to onboard new people onto the project or explain a particular behavior to your stakeholders over and over again.

We have plenty of tools for building and hosting static pages and wikis from dedicated files, but when it comes to documenting business logic, they require extreme vigilance to avoid discrepancies between the docs and the actual behavior. Misleading docs are often worse than no docs at all.

To avoid accumulating discrepancies over time, it helps to put your docs near the code itself, so the engineer who modifies it can quickly spot new inaccuracies in the docs and fix them.

Module, class and function docstrings, or well-established comment conventions, are a fine place for such docs, but the namespace hierarchy of the codebase most likely won't also be the right hierarchy for the documentation. Furthermore, a docstring's primary job is to provide information for actual development use, so usurping it for business documentation would come at a palpable cost. There is of course room for a middle ground, where business logic documentation is embedded within the docstrings as long as we leave ourselves the option not to expose our development-related material to the outside.


doc_note_untangler is a small proof-of-concept tool I made for extracting specially marked docstring contents out of a Python codebase and rebuilding them into a static webpage.

In short, the goal is to keep the business logic documentation co-located with its implementation without having to expose the codebase's structure and internal API docs along with it.
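To make the idea a bit more concrete, here is a minimal sketch of what such an extraction step could look like. It is not the tool's actual implementation: the "[note: <page>] ... [/note]" marker syntax, the extract_notes function and the example source are all made up for illustration. It only relies on the standard library's ast and re modules.

```python
import ast
import re
from collections import defaultdict

# Hypothetical marker syntax, invented for this sketch: a block opens with
# "[note: <page title>]" and closes with "[/note]".
NOTE_PATTERN = re.compile(
    r"\[note:\s*(?P<page>[^\]]+)\](?P<body>.*?)\[/note\]", re.DOTALL
)

# Only these node types can carry a docstring.
DOCSTRING_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def extract_notes(source: str) -> dict[str, list[str]]:
    """Collect marked docstring blocks from Python source, grouped by page title."""
    notes = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_NODES):
            continue
        docstring = ast.get_docstring(node)  # indentation-cleaned text, or None
        if not docstring:
            continue
        for match in NOTE_PATTERN.finditer(docstring):
            # The page title decides where the note ends up in the output,
            # independently of where the code lives in the codebase.
            notes[match["page"].strip()].append(match["body"].strip())
    return dict(notes)


# A made-up module: the docstring mixes developer-facing text with one marked
# business note.
EXAMPLE_SOURCE = '''
def shipping_cost(order_total):
    """Return the shipping cost for an order.

    Uses the flat-rate table in rates.py.

    [note: Shipping]
    Orders above 100 EUR ship for free.
    [/note]
    """
    return 0 if order_total > 100 else 5
'''

if __name__ == "__main__":
    for page, blocks in extract_notes(EXAMPLE_SOURCE).items():
        print(page)
        for block in blocks:
            print("  " + block)
```

Only the marked block leaves the codebase; the developer-facing part of the docstring stays where it is. Grouping by page title rather than by module path is what lets the documentation hierarchy differ from the namespace hierarchy.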

This idea is vaguely similar to literate programming, but the priorities are reversed. In literate programming, the documentation runs the show, and the code snippets are embedded in it. In the Untangler concept, the codebase structure remains untouched, and the documentation is just along for the ride.

If you're interested in trying it, a PyPI package is available, but you may get more use out of it if you just copy-paste the source code into your own project and hack it up to best fit your needs.