Sunday, January 5, 2020

In the first two years of the history of Unix, no documentation existed

I decided to write this post while working through some research on TensorFlow 1.0/2.0 notebooks that use Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) for AI story generation and style cloning.  I became a bit frustrated because the API changed quite a bit last year, and some of the core documentation and system help was incorrect.

Project developers, open source or not, hopefully learn from their mistakes and improve their code.  It's exciting to start building something amazing, then see it take off and grow more legs, and minds.  Like the spider (or graph) that is the World Wide Web, a codebase can go in many directions.

Clone it, use it, break it, fork it, build it, test it, hack and change it
Test it, doc it, run it, use it, build it, upgrade then docs outdated.



With open source projects, new developers enter the team and old committers take on different roles.  This software maturity cycle, team churn, and feature/bug improvements inevitably result in some issues with stale documentation, examples, and tutorials.  This is especially prevalent with cloud and fast-moving technology, where by the time the user and technical docs would be complete, it's time to move on to the *Next Big Thing*.

In the end, the code is the truth, the behaviour of the code is the truth and the code is the documentation of said truth.  Like Schrödinger's cat, the code is dead and alive at the same time.  It's only at the point of observation that the code becomes physically manifested.

Whether it is Markdown or PDF, wiki or man pages, current documentation is important.  Even, and especially, in an Agile world.

The following 8 points are tricks for digging through the upgrade changes in an open source API that cause stale code examples or dependencies to break.

1. RTFM

There is usually a guide for upgrading from one major version to another, and the guilt of the developers who have broken an API for the greater good of the project should easily surface the changes in the developer guidelines and release notes.

I once spent a few days pulling out the words "RTFM" from client code, and fixing documentation that could be "questionably unprofessional" to the client on the receiving end of the source code.  It's important to make your documentation as professional and readable as possible for a wide audience of (possibly) non-developers.  

2. "man" up, get help() and command discovery.

A man page, or manual page, is the built-in help for Linux, and man is also available as an alias for the help system in PowerShell.  When a dependency or module is loaded, you can walk through the API like the chapters of a good book by using the help system.  In Python, you can use the help() function to review the documentation, or just search for the API in the source code in your favourite editor or on GitHub.
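As a rough sketch of that kind of discovery in Python, the snippet below pulls the docstring, the live signature, and the public names of a module using only the standard library.  The `json` module is just a stand-in (an assumption for illustration) so the example runs anywhere; swap in whichever package changed underneath you.

```python
import inspect
import json  # stand-in module; replace with the library you are investigating

# help(json.dumps) prints this docstring interactively;
# inspect.getdoc returns the same text as a string you can search.
doc = inspect.getdoc(json.dumps)
print(doc.splitlines()[0])

# The signature comes from the installed code, not from a tutorial,
# so it shows which parameters this version really accepts.
signature = inspect.signature(json.dumps)
print(signature)

# Public names exposed by the module: a quick map of the API surface.
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)
```

Comparing the printed signature against a stale example is often the fastest way to spot a renamed or removed argument.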

3. Generate the documentation.

In the first two years of the history of Unix, no documentation existed.  "One literally had to work beside the originators to learn it."

You don't necessarily have to depend on the developer to build external documentation.  Create an up-to-date API document using document generation tools.  For Python, this could be tools like pydoc, Sphinx, pdoc or Doxygen.

https://medium.com/@peterkong/comparison-of-python-documentation-generators-660203ca3804

Use man, help() and other tools to review the implementation and hopefully some decently commented code.
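Here is a minimal sketch of generating a reference yourself with pydoc from the standard library.  The helper name `write_api_reference` and the choice of `json` as the target module are assumptions made for illustration, not part of any project's tooling.

```python
import importlib
import pydoc
import tempfile
from pathlib import Path


def write_api_reference(module_name: str, out_dir: Path) -> Path:
    """Render the installed module's documentation to a plain-text file."""
    module = importlib.import_module(module_name)
    # render_doc builds the same text that help() would show for the module.
    text = pydoc.render_doc(module, renderer=pydoc.plaintext)
    out_path = out_dir / f"{module_name}.txt"
    out_path.write_text(text, encoding="utf-8")
    return out_path


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = write_api_reference("json", Path(tmp))
    reference = path.read_text(encoding="utf-8")
    print(reference[:300])
```

Because the text is rendered from the installed module, it reflects the version you are actually running.  Running `python -m pydoc -w json` from a shell writes an HTML version into the current directory instead.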

4. Browse through the GitHub issues or Jira tickets for technical guides.

Much of the documentation of an open-source project lives in the tribal knowledge repo that is GitHub or Jira.

5. Read the Request for Comments (RFC) documents and any community discussions.  

Often on larger projects it is essential for teams to collaborate on new features and APIs to create standards or best practices.

The Tensorflow RFCs
https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md

6. Ask for help.

The developers, committers, contributors and user community of large projects are likely more than willing to help with your questions, provided you read the manual first.

7. Watch for projects without a lot of activity

Code on GitHub that has not been touched in more than 5 years has probably been forked down another path, and the original may no longer be maintained.
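The five-year rule of thumb can be sketched as a small check.  The function name `is_stale` and the sample dates are hypothetical; in practice the timestamp would come from the repository itself, for example the output of `git log -1 --format=%cI`.

```python
from datetime import datetime, timezone

# Rough five-year threshold from the rule of thumb above (leap days ignored).
STALE_AFTER_DAYS = 5 * 365


def is_stale(last_commit_iso: str, now: datetime) -> bool:
    """Return True when the last commit is older than the threshold."""
    last_commit = datetime.fromisoformat(last_commit_iso)
    return (now - last_commit).days > STALE_AFTER_DAYS


# Hypothetical timestamps, checked against a fixed "today" for repeatability.
now = datetime(2020, 1, 5, tzinfo=timezone.utc)
old_project = is_stale("2013-06-01T09:30:00+00:00", now)
active_project = is_stale("2019-11-20T17:45:00+00:00", now)
print(old_project, active_project)
```

A stale result is only a hint to go look at the list of forks before trusting the examples.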

8. Watch for projects with too much activity.

Monster projects may have bigger challenges with documentation than smaller projects.  I see this a lot with projects in the Apache and Cloud Native ecosystems.  Projects like this become seriously complex, and stale documentation can snowball into a bigger problem.  FAQs and cheat sheets are important for large projects.

TL;DR

In the end, you'll probably find some poor soul on Stack Overflow or GitHub with the same questions you have about why this or that example no longer works.  Figure out the answer, then contribute some documentation to the project if you can.

In the case of TensorFlow, use your favourite RSS reader (Feedly?) to watch the blog.  https://blog.tensorflow.org/2019/02/upgrading-your-code-to-tensorflow-2-0.html





