docbook and the NetBSD website

Recently, I’ve been doing a lot of maintenance on the NetBSD website. It contains a boatload of documentation, much of which was originally written in the 2000s. It has some special requirements: it has to work in text-based web browsers like lynx, or even with no working browser installed at all, using just ftp(1) to download plain text over HTTP. Naturally, the most important parts are static, suitable for serving from the standard NetBSD HTTP server, which runs from inetd by default.

Most of the NetBSD website is written in a language called docbook-xsl. This would be an unpopular choice today, as most people don’t like writing XML, view it as old fashioned, or don’t see the point when you can just write HTML. Convincing developers to write XML is difficult when they can submit Markdown to a Wiki and get similar HTML output.

However, it has some advantages.

  • it provides semantic information far beyond what Markdown, or even HTML, can express: a command the reader types is marked as <userinput>, a file as <filename>, a variable as <varname>.

  • it understands chapters, sections, tables of contents, and can be easily converted into book form. For example, the NetBSD Guide is available in PDF, PostScript, and plain text for offline reading.

There was an attempt to convert large parts of the Guide and website to Wiki-ish Markdown, but it was eventually reverted: some people strongly objected to losing the ability to generate a nice book, and the conversion lost syntax and versatility.

Example 1. Some DocBook XSL from the NetBSD site
<sect2 id="chap-net-practice-pppoe-vlan">
    <title>Configuring a VLAN</title>

    <para>A typical PPPoE connection requires a VLAN ID to be set
    on the external interface.
    On &os; this is accomplished by creating a &man.vlan.4; interface:</para>

    <screen>&rprompt; <userinput>ifconfig vlan0 create</userinput>
&rprompt; <userinput>ifconfig vlan0 vlan 6 vlanif pppoe0</userinput></screen>

    <para>The equivalent configuration can be stored in a file to be loaded
at boot-time:</para>

    <example id="ex-ifconfig-vlan0">
      <title><filename>/etc/ifconfig.vlan0</filename></title>
      <programlisting>create
vlan 6 vlanif pppoe0</programlisting>
    </example>

    <para>To ensure that <varname>vlan0</varname> is created at
    the appropriate time, refer to <xref linkend="chap-net-practice-order" />.</para>
</sect2>
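Because the markup is semantic, a program can query it. As a minimal sketch using Python’s standard xml.etree.ElementTree, the helpers below (their names are hypothetical) pull every command the reader is expected to type, and every file mentioned, out of a trimmed copy of the example above. The DocBook entities are left out, since ElementTree cannot expand them without the DocBook DTD.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of Example 1. Entities such as &rprompt; and &os; are
# omitted because ElementTree cannot resolve them without the DTD.
SECTION = """\
<sect2 id="chap-net-practice-pppoe-vlan">
  <title>Configuring a VLAN</title>
  <screen># <userinput>ifconfig vlan0 create</userinput>
# <userinput>ifconfig vlan0 vlan 6 vlanif pppoe0</userinput></screen>
  <example id="ex-ifconfig-vlan0">
    <title><filename>/etc/ifconfig.vlan0</filename></title>
    <programlisting>create
vlan 6 vlanif pppoe0</programlisting>
  </example>
</sect2>
"""


def user_commands(xml_text: str) -> list[str]:
    """Return the text of every <userinput> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("userinput")]


def referenced_files(xml_text: str) -> list[str]:
    """Return the text of every <filename> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("filename")]
```

The same query has no equivalent in Markdown, where a command and a file name are both just `code`.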

Markdown

When I started programming, I was unaware of things like GitHub, or they didn’t exist yet - I’m not sure. I learned to code in a computer game hacking community, and the most common code distribution method was forum post attachments. We were doing free software without knowing or caring about the free software movement. When I started working on NetBSD, it was an easy transition - we were sending patches via email instead of forum posts. So, my first encounter with Markdown was on social networking sites like Reddit, where new users visibly struggle with things like link syntax and whitespace, and emergency edits to posts are common.

Languages like docbook-xsl and mdoc have much clearer rules about what causes whitespace to appear: you denote paragraphs explicitly, with directives such as <para> in DocBook or .Pp in mdoc.
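Markdown’s rules, by comparison, partly depend on whitespace you cannot see: a line ending in two or more spaces, followed by more paragraph text, produces an HTML <br>. A minimal Python sketch that flags such lines (the function name is hypothetical, and it ignores code blocks and other contexts where the rule does not apply):

```python
def hard_break_lines(markdown: str) -> list[int]:
    """Return the 1-based numbers of lines that Markdown ends with a <br>.

    A line ending in two or more spaces becomes a hard line break when more
    paragraph text follows it. Code blocks and other special contexts are
    not handled in this sketch.
    """
    lines = markdown.split("\n")
    found = []
    for i, line in enumerate(lines):
        # The break only happens if the next line continues the paragraph.
        has_next_text = i + 1 < len(lines) and lines[i + 1].strip() != ""
        if line.endswith("  ") and line.strip() and has_next_text:
            found.append(i + 1)
    return found
```

In an editor, "first line  " and "first line" look identical, which is exactly the kind of thing that leads to emergency edits.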

In Markdown, as it was intended to be used, you mark a heading with one to six # characters (or, for the top two levels, underline it with = or -), and the renderer spits out an HTML h1 through h6 tag. It was not designed with automatic section nesting in mind, nor with blocks beyond what might appear in a typical blog post.
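To make the flatness concrete, here is a deliberately simplified Python sketch that handles only ATX headings (the # form) and ignores Setext headings, inline formatting, and everything else a real Markdown renderer does; the function name is hypothetical. Each heading becomes one standalone tag, and nothing in the output says where the section it introduces ends.

```python
import re
from typing import Optional

# An ATX heading: one to six '#' characters, whitespace, the heading text,
# then an optional closing run of '#' that must be preceded by whitespace.
ATX_HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def heading_to_html(line: str) -> Optional[str]:
    """Translate one ATX heading line into a single HTML heading tag.

    Returns None for anything that is not an ATX heading. The output is a
    lone tag: unlike DocBook's <sect2>...</sect2>, it does not enclose the
    section's content.
    """
    match = ATX_HEADING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    return f"<h{level}>{match.group(2)}</h{level}>"
```

"## Configuring a VLAN" becomes "<h2>Configuring a VLAN</h2>"; any nesting has to be inferred afterwards from the sequence of heading levels.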

We could convert the DocBook XSL to Markdown and keep the essentials of emphasis, code, lists, and headings, but we would lose the rest:

Example 2. The example from the NetBSD website rewritten in Markdown
Configuring a VLAN
-------------------

A typical PPPoE connection requires a VLAN ID to be set on the
external interface.

On NetBSD this is accomplished by
creating a [vlan(4)](https://man.NetBSD.org/vlan.4) interface:

        # ifconfig vlan0 create
        # ifconfig vlan0 vlan 6 vlanif pppoe0

The equivalent configuration can be stored in a file to be loaded
at boot-time, `/etc/ifconfig.vlan0`:

        create
        vlan 6 vlanif pppoe0

To ensure that `vlan0` is created at the appropriate time, refer to
[The NetBSD Guide: Chapter XX: Networking in Practice - ensuring interface creation order](link-to-page.html#link-to-section)

When I first encountered Markdown, I thought some parts of it were obviously good, but others made me pine for BBCode. In my opinion, it’s an acceptable language for simple formatting in forum posts, but not capable enough for technical documentation.

So, what else is there?

asciidoc

asciidoc can be thought of as DocBook’s answer to Markdown (although it predates Markdown by a couple of years). Compared to Markdown, some aspects of the syntax differ, and it is suited to printed documentation, not just HTML conversion. Many more features are standardized, including tables, examples, cross-references, tables of contents, and various macros.

From an initial study, I don’t think asciidoc includes built-in syntax to distinguish things like variable names, user input, shell prompts, and so on, but it is possible to apply custom styles (roles). In the example below, I’ve used the filename and varname styles.

Tools to convert docbook-xsl to asciidoc exist, although I’m unsure how much manual fiddling is needed after the initial conversion. asciidoc itself can be converted to docbook-xsl, HTML, LaTeX, PDF, PostScript, EPUB, roff man, and plain text. If we had to convert the NetBSD website to another format, asciidoc would likely be the natural choice.

Example 3. The example from the NetBSD website rewritten in asciidoc
[sect2, id="chap-net-practice-pppoe-vlan"]
Configuring a VLAN
------------------

A typical PPPoE connection requires a VLAN ID to be set
on the external interface.

On NetBSD, this is accomplished by creating a
https://man.NetBSD.org/vlan.4[vlan(4)] interface:

.Creating a VLAN with ifconfig
====
        # ifconfig vlan0 create
        # ifconfig vlan0 vlan 6 vlanif pppoe0
====

The equivalent configuration can be stored in a file to be loaded
at boot-time:

.[filename]#`/etc/ifconfig.vlan0`#
[id="ex-ifconfig-vlan0"]
====
        create
        vlan 6 vlanif pppoe0
====

To ensure that [.varname]#`vlan0`# is created at
the appropriate time, refer to <<chap-net-practice-order>>.

The HTML document you’re currently reading was written in and generated by asciidoc.

roff_mdoc

roff_mdoc is another format that provides an equivalent amount of semantic information, and most NetBSD developers are already familiar with it. It’s a very old and simple format, but a powerful one. It’s the traditional format for manual pages on BSD systems.

Note
Another format more commonly used for man pages on non-BSD systems, roff man, provides less semantic information, and is what Markdown-to-man converters typically generate. Subjectively, I also think its rendering is less pretty.

Most people associate manual pages with command line utilities, but they are just as well suited to miscellaneous documentation (typically found in man section 7, or section 5 for configuration documentation), and can be split across as many files as you’d like. mdoc supports features that make it well suited to conversion to HTML, such as citations and links. With mandoc, it can easily be converted to HTML, PDF, PostScript, roff man, Markdown, and, of course, plain text.
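As a rough sketch of that workflow, the Python below renders one mdoc page into each of those formats by running mandoc once per output mode. It assumes mandoc is installed; the -T values are mandoc’s own output modes, while the file extensions, the helper names, and the page name networking.7 used below are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# mandoc output modes (-T) for the formats listed above, mapped to a file
# extension. The extensions are an arbitrary choice for this sketch.
FORMATS = {
    "html": "html",
    "pdf": "pdf",
    "ps": "ps",
    "man": "man",
    "markdown": "md",
    "ascii": "txt",  # plain text
}


def mandoc_argv(fmt: str, page: str) -> list[str]:
    """Build the mandoc command line that renders `page` in output mode `fmt`."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported output mode: {fmt}")
    return ["mandoc", "-T", fmt, page]


def render_all(page: str) -> list[Path]:
    """Render one mdoc page into every format; needs mandoc on PATH."""
    if shutil.which("mandoc") is None:
        raise RuntimeError("mandoc is not installed")
    written = []
    for fmt, ext in FORMATS.items():
        # mandoc writes the rendered page to standard output.
        result = subprocess.run(mandoc_argv(fmt, page), check=True, capture_output=True)
        out = Path(page).with_suffix("." + ext)  # networking.7 -> networking.html
        out.write_bytes(result.stdout)
        written.append(out)
    return written
```

Calling render_all("networking.7") would write networking.html, networking.pdf, and so on next to the source file, all from a single mdoc source.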

While mdoc is intended for manual pages, other roff macro packages are better suited to, for example, books. Many technical books were written in roff, especially books related to Unix. NetBSD includes /usr/share/doc/usd/vi/vitut.txt, a tutorial on using vi(1) written using the "ms" macro package, and the HTML "installation notes" included in releases are also roff-generated.

I think mdoc is a good format for programming library documentation. Although man pages are unpopular outside of the C and Perl communities, other languages fit just as well - it’s just up to you to decide how to split up and organize your pages. NetBSD includes documentation for Lua libraries in mdoc format.

One problem is that mdoc is less "natural" than asciidoc or Markdown, because it is based on abbreviated macros. I usually keep the reference manual page, mdoc(7), open while writing, which makes it easy enough to look things up, and I check the end result with mandoc’s lint mode (mandoc -T lint).

Other than this, the actual syntax is very minimal:

Example 4. The example from the NetBSD website rewritten in mdoc
.Ss Configuring a VLAN
A typical PPPoE connection requires a VLAN ID to be set on the
external interface.
.Pp
On
.Nx
this is accomplished by
creating a
.Xr vlan 4
interface:
.Bd -literal -offset indent
# ifconfig vlan0 create
# ifconfig vlan0 vlan 6 vlanif pppoe0
.Ed
.Pp
The equivalent configuration can be stored in a file to be loaded
at boot-time,
.Pa /etc/ifconfig.vlan0 :
.Bd -literal -offset indent
create
vlan 6 vlanif pppoe0
.Ed
.Pp
To ensure that
.Dv vlan0
is created at the appropriate time, refer to
.Qq Sx Ensuring interface creation order .

I kinda wish I’d written this post in roff ;)

other formats

Other formats suited to technical documentation exist, such as reStructuredText, another Markdown-like format with many more features for technical writing and document generation. I would consider these, but for the special case of converting the NetBSD website, it helps to have (a) an easy transition from docbook-xsl, and (b) a format existing developers already know.

It’s regrettable that, while there are formats with syntax very similar to Markdown’s that are better suited to technical documentation, Markdown is often the only option that gets considered.