Literate Programming

Jon Breuer - September 7, 2024.

I've been fascinated with literate programming since I first read about it.

Donald Knuth created this programming paradigm while writing TeX, the typesetting system he built to publish his landmark work "The Art of Computer Programming". (http://www.literateprogramming.com/knuthweb.pdf) (http://www.literateprogramming.com/lpsimp.pdf) Most program code is written first for the compiler, to generate a desired program, and documentation consists of comments added on top that the compiler can ignore at its leisure. Literate Programming inverts this so that the book or the article or the blog post is the primary product and the program is a side effect of compiling the embedded examples. This works great for Knuth as he's trying to write his book, and being able to compile the examples out of the text proves that the examples are still correct.

I've tried to use WEB and CWEB, but the workflow for those is to take *.web files and generate TeX/LaTeX files, which I don't know how to read or write. I tried installing some converters, and after fighting for several days I could generate a PDF file, but not the HTML web page I really wanted. I forget now - maybe I did get an HTML file out of it once, but it was a big hassle.

So, I want to experiment with literate programming and I, a programmer, see my personal need for a WEB program to use in that pursuit. It's a match made in digital heaven.

I needed to decide what the source language for this article should be. I strongly considered making the source language *.md Markdown with translation into HTML, but I don't really know Markdown and that would involve me finding/building a second tool to effect the translation.

The common GNU CWEB is specialized for C programs and LaTeX. I jump around between programming languages, so I want my tool to be as language agnostic as possible.

So, I'm writing this in HTML. The original literate programming consisted of TANGLE and WEAVE. WEAVE takes a *.web file, prettifies the code examples a little, generates a table of contents and an index, and weaves out TeX printer layout files. TANGLE takes that same *.web file and tangles the disparate code examples into functional code in the target language (Pascal, in the original WEB). The later tool CWEB keeps the same split with its CTANGLE and CWEAVE programs.
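
To make the two roles concrete, here is a minimal sketch in D. It is not Knuth's TANGLE or WEAVE. It assumes a toy document format, invented for this example, in which lines indented by four spaces are code and everything else is prose; the names tangle and weave are only illustrative.

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.stdio : writeln;
import std.string : lineSplitter;

// Toy format (an assumption for this sketch): lines indented by four
// spaces are code, every other line is prose.
enum codePrefix = "    ";

// "Tangle": keep only the code lines, so a compiler can consume them.
string tangle(string doc)
{
    auto result = appender!string();
    foreach (line; doc.lineSplitter)
    {
        if (line.startsWith(codePrefix))
        {
            result.put(line[codePrefix.length .. $]);
            result.put("\n");
        }
    }
    return result.data;
}

// "Weave": keep everything, but mark the code up for a human reader.
// A real weaver would also escape characters such as '<' in the code.
string weave(string doc)
{
    auto result = appender!string();
    foreach (line; doc.lineSplitter)
    {
        if (line.startsWith(codePrefix))
            result.put("<pre>" ~ line[codePrefix.length .. $] ~ "</pre>\n");
        else
            result.put(line ~ "\n");
    }
    return result.data;
}

void main()
{
    string doc = "Say hello.\n    writeln(\"hi\");\n";
    writeln(tangle(doc)); // the code only
    writeln(weave(doc));  // the prose plus the marked-up code
}
```

Both functions read the same input; the only difference is which reader, compiler or human, the output is meant for.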

Bootstrapping WEB 0

An obvious problem. I am writing this post, but I don't yet have a WEB program that can convert it into code. This will require some bootstrapping.

For WEB0.exe, copy and paste the following code block and compile it:

// __main__
////////////
// WEB0.D
//
// This is a level 0 bootstrapping Literate Programming thing.  
// It will snip this code sample out of the document and generate the target file.
// This code is in Digital Mars D 2.0, but the eventual WEB program is intended
// to be as language agnostic as possible.
//
module web0;

private import std.algorithm;
private import std.file;
private import std.stdio;
private import std.string;

void main(string[] args)
{
    if(args.length < 3) {
        writefln("Usage: WEB0 inputFile outputCodeFile");
        return; // Without this, args[1] below would be out of range.
    }
    
    string fileContents = cast(string) std.file.read(args[1]);
    if(fileContents.length == 0) {
        writefln("Unable to read file '%s'.", args[1]);
        return;
    }
    
    // Generate these strings so they don't appear in the source and won't be matched by the eventual WEB0.
    const string startTag = "@" ~ "p";
    const string endTag = "@" ~ ">";
    
    // indexOf (std.string) returns a code-unit offset, or -1 if not found,
    // which is what the slice below needs.
    const ptrdiff_t blockStartIndex = indexOf(fileContents, startTag);
    const ptrdiff_t blockEndIndex = indexOf(fileContents, endTag);
    if(blockStartIndex < 0 || blockEndIndex < blockStartIndex) {
        writefln("Unable to find code start '%s' or end '%s' tags.", startTag, endTag);
        return;
    }
    
    string outputFilename = args[2];
    std.file.write(outputFilename, fileContents[blockStartIndex + startTag.length .. blockEndIndex]);
}
//

In WEB0, the @p and @> tags start and end the code section. I've commented them out so if you copy and paste them, the code will still compile.
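
To see what the start and end markers select, here is a small sketch that runs the same kind of slicing on an in-memory string instead of a file. It assumes the markers are @p and @>. The extractFirstBlock name and the sample text are made up for illustration, and unlike WEB0 it looks for the end marker only after the start marker.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Returns the text between the first start marker and the next end marker,
// or null when either marker is missing.
string extractFirstBlock(string text, string startTag, string endTag)
{
    const ptrdiff_t start = text.indexOf(startTag);
    if (start < 0)
        return null;
    const size_t bodyStart = start + startTag.length;
    // Search for the end marker only in the text after the start marker.
    const ptrdiff_t end = text.indexOf(endTag, bodyStart);
    if (end < 0)
        return null;
    return text[bodyStart .. end];
}

void main()
{
    // Assumed markers, built by concatenation so that this listing
    // itself contains no literal marker.
    const string startTag = "@" ~ "p";
    const string endTag = "@" ~ ">";
    string page = "prose " ~ startTag ~ "int x = 1;" ~ endTag ~ " more prose";
    writeln(extractFirstBlock(page, startTag, endTag)); // prints: int x = 1;
}
```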

I am using version 2 of DMD from here: DMD 2 Downloads

After building WEB0, run it against this file. It should output a new copy of the D source file.

C:\literate> web0 literate_programming_0.html web0.d
C:\literate> dmd web0.d
C:\literate> web0 literate_programming_0.html web0.d

We can do better

Now I have a tool that can read this file, this actual *.HTML file you are reading, and use it to generate the source for that tool.

Things this tool can't do:

<ul>
    <li>The next versions of WEB will be more complicated, so I'll need to parse every @p section, not just the first one. - <a href="literate_programming_1.html">WEB 1</a>
    <li>Generate &lt;pre&gt; sections from @p. - <a href="literate_programming_2.html">WEB 2</a>
    <li>The code needs to be prettified.  This won't happen for a while. (Wrong <a href="literate_programming_2.html">WEB 2</a>)
    <li>If we're parsing every @p, then I'll need to support escaping @ to write this file. - <a href="literate_programming_3.html">WEB 3</a>
    <li>Weave sections out of order. - <a href="literate_programming_3.html">WEB 3</a>
    <li>Table of contents - <a href="literate_programming_4.html">WEB 4</a>
    <li>Index - <a href="literate_programming_4.html">WEB 4</a>
    <li>Progressively updated tutorial sources. - <a href="literate_programming_4.html">WEB 4</a>
    <li>Bolded terms
    <li>Separate files.
    <li>Cleaner method of inserting display back inside code.
</ul>
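
<p>As a rough idea of what the first item involves, here is a sketch of collecting every marked section rather than only the first. It is not the WEB 1 implementation from the next page. It assumes the markers are @p and @>, and the extractAllBlocks name is made up for illustration.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Collects the body of every start/end marker pair, in document order.
// Stops quietly at the first start marker that has no matching end marker.
string[] extractAllBlocks(string text, string startTag, string endTag)
{
    string[] blocks;
    size_t cursor = 0;
    while (true)
    {
        const ptrdiff_t start = text.indexOf(startTag, cursor);
        if (start < 0)
            break;
        const size_t bodyStart = start + startTag.length;
        const ptrdiff_t end = text.indexOf(endTag, bodyStart);
        if (end < 0)
            break;
        blocks ~= text[bodyStart .. end];
        // Resume the search just past the end marker we consumed.
        cursor = end + endTag.length;
    }
    return blocks;
}

void main()
{
    // Assumed markers, built by concatenation as in WEB0.
    const string startTag = "@" ~ "p";
    const string endTag = "@" ~ ">";
    string page = "intro " ~ startTag ~ "first" ~ endTag
        ~ " middle " ~ startTag ~ "second" ~ endTag ~ " outro";
    foreach (block; extractAllBlocks(page, startTag, endTag))
        writeln(block);
}
```

<p>Once every marker in the file is treated as significant, any literal @ in the prose needs some escape, which is why escaping appears on the same list.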

<p>I'm running into an issue where the program I've bootstrapped so far can't handle the next version I wish I was writing until I write that new version.  But of course.  Splitting this article into multiple pages reduces the complexity of the evolving program.

<p>
<span><b>Top:</b><a href="literate_programming_0.html">WEB 0</a></span>
<span><b>Next:</b><a href="literate_programming_1.html">WEB 1 - Multiple sections</a></span>
</body>
</html>