Literate Programming 4

Prev:WEB 3 Top:WEB 0 Next:WEB 5

WEB 4 - Indexes and Tables of Contents

Jon Breuer - September 10, 2024.

I've been looking forward to a parser that can tangle code samples out of order. On the one hand it will let me talk about the interesting things first and on the other hand it could let me define a function, discuss it, then define it a second time and have the compiler pick up the updated definition.

Now that the parser can handle more tags, I'll be writing this file in WEB as much as possible with the HTML stuff kept to a minimum.

I really want to redefine existing sections and dynamically add content into group sections.

This is the source, and this is the output.

Existing code

Polishing the display

Here's where I insert the new index/table of contents work. Each header includes a hyperlink target and the code blocks are marked for show/hide.

New Display Work in parse_web_then_tangle_and_weave

    foreach(SSection section; fileSections) {
        insertTableOfContents
        insertIndex
        Better Headers

        if(section.type != ESectionType.CODE) {
            string paragraphReducer(string output, SBlock block) {
                if(block.type == ESectionType.INDEX_TERM) {
                    return output ~ "<b id='"~section.name~block.content~"'><i>" ~ block.content.strip("|") ~ "</i></b>";
                } else if(block.type == ESectionType.BOLD) {
                    return output ~ "<b>" ~ block.content ~ "</b>";
                } else if(block.type == ESectionType.PRE) {
                    return output ~ "<i>" ~ block.content ~ "</i>";
                } else {
                    return output ~ block.content;
                }
            }
            string content = reduce!paragraphReducer("", section.contents);
            outputDisplayContents ~= content;
        
        } else {
            Better Code Blocks
            
            foreach(block; section.contents) {
                string outputContent = formatCodeForDisplay(block.content, block.lineNumber);
                if(block.type == ESectionType.IDENTIFIER) {
                    outputDisplayContents ~= "<b><i>" ~ outputContent ~ "</i></b>";
                } else {
                    outputDisplayContents ~= outputContent;
                }
            }
            
            outputDisplayContents ~= "</pre>";
        }
    }
    

Index entries need hyperlink anchors to link back to, so each header and code section gets an id matching its name.

Better Headers

    if(section.type == ESectionType.HEADER) {
        outputDisplayContents ~= "<h3 id=\"" ~ section.name ~ "\">" ~ section.name ~"</h3>" ~ "<p>";
    } else if(section.type == ESectionType.CODE) {
        outputDisplayContents ~= "<p id=\"" ~ section.name ~ "\"><b>" ~ section.name ~"</b>";
    }

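This inserts a little JavaScript button to hide and show each code block. The button is currently compiled out with static if(false), so only the id on the pre element is emitted.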

Better Code Blocks

    static if(false) {
        outputDisplayContents ~= format(" <button onclick=\"toggle_element_hidden('%s_code')\">Show/Hide Code</button>", section.name);
    }
    outputDisplayContents ~= "<pre id=\"" ~ section.name ~ "_code" ~ "\">";

I want to highlight the existing blocks where I made a change.

I want to generate something different than HTML and D.

Code Scanner Bug

Bug! I've noticed that code sections have to be separated by non-code sections. Here's the fix in the slurp_section function.

slurp identifier subsection

                slurp: read identifier
                slurp: check for new definition
                slurp: insert subsection

Save the scanner index before reading the identifier.

slurp: read identifier

                int preIdentifierIndex = index;
                index += 2;
                SBlock[] identifierBlocks = slurp_section(contents, index, lineNumber, false, sectionType);
                assert(identifierBlocks.length == 1);
                string identifier = identifierBlocks[0].content;

A reference to an identifier inside a block looks like @<name@>, and the definition of a new identifier looks like @<name@>=. If the previous code section is ending because a new one is starting, roll the scanner back to before the identifier and let the new section start reading.

slurp: check for new definition

                if(contents[index..$].startsWith("@>=")) {
                    // The end of one block has bumped into the start of another. Roll back.
                    index = preIdentifierIndex;
                    break;
                }
                

Now that we're sure we're still in the old section, add the new identifier.

slurp: insert subsection

                // Now that we're sure this is a reference to an identifier and not a definition of a new identifier, continue.
                results ~= SBlock(ESectionType.IDENTIFIER, lineNumber, identifier);
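
Here is a minimal standalone sketch of that roll-back idea, separate from the real slurp_section. The Scan struct and scanIdentifier function are hypothetical names made up for this illustration, and the sketch assumes the token is always terminated by @>.

```d
import std.algorithm.searching : startsWith;
import std.string : indexOf;

/// Hypothetical result type for this sketch (not part of the real parser).
struct Scan {
    bool isReference;  // true: "@<name@>" reference, false: start of a new definition
    string name;       // text between "@<" and "@>"
    size_t index;      // where the caller should resume scanning
}

/// Reads an "@<name@>" token that starts at `index`. If the token is really
/// "@<name@>=", the saved index is returned so the caller can close the
/// current section and re-read the token as a definition.
Scan scanIdentifier(string contents, size_t index) {
    immutable size_t preIdentifierIndex = index;  // save before consuming anything
    index += 2;                                   // skip "@<"
    immutable ptrdiff_t close = contents.indexOf("@>", index);
    assert(close >= 0, "unterminated identifier");  // assumption: input is well formed
    string name = contents[index .. close];
    if (contents[close .. $].startsWith("@>=")) {
        return Scan(false, name, preIdentifierIndex);  // new definition: roll back
    }
    return Scan(true, name, close + 2);  // plain reference: continue after "@>"
}

unittest {
    auto reference = scanIdentifier("a @<foo@> b", 2);
    assert(reference.isReference && reference.index == 9);

    auto definition = scanIdentifier("x @<bar@>= y", 2);
    assert(!definition.isReference && definition.index == 2);  // rolled back
}
```

The important part is that the index is saved before anything is consumed, so rolling back costs nothing more than an assignment.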
                

Generating a Table of Contents

We're going to start generating a table of contents. I think WEB uses |special word| to generate a separate index. I've inserted a special token __table_of_contents__ to control where the TOC gets generated. Header and code sections both have titles, so I can insert them in the TOC. Headers define major sections, so I indent sub-sections below them. Luckily HTML renders an empty list <ul></ul> as no space at all, so I can start inside an empty sub-list and let the first header close it. (This saves me tracking whether the first header has been seen.)

insertTableOfContents

        if(section.name == "__table_of_contents__") {
            outputDisplayContents ~= "<h3>Table of Contents:</h3>";
            outputDisplayContents ~= "<ul><ul>";
            foreach(SSection tocSection; fileSections) {
                if(tocSection.name.startsWith("__")) {
                    continue;
                }
                if(tocSection.type == ESectionType.HEADER) {
                    outputDisplayContents ~= "</ul><li><b><a href=\"#"~tocSection.name~"\">" ~ tocSection.name ~"</a></b><ul>";
                } else if(tocSection.type == ESectionType.CODE) {
                    outputDisplayContents ~= "<li><a href=\"#"~tocSection.name~"\">" ~ tocSection.name ~"</a>";
                }
            }
            outputDisplayContents ~= "</ul></ul>";
            continue;
        }
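
To see the empty-list trick in isolation, here is a small sketch that builds the same kind of markup from a flat list. Kind, Entry and buildToc are hypothetical stand-ins for ESectionType, SSection and the inline code above. They exist only for this example.

```d
import std.algorithm.searching : startsWith;

enum Kind { header, code }                // hypothetical stand-in for ESectionType
struct Entry { Kind kind; string name; }  // hypothetical stand-in for SSection

/// Builds a nested TOC. The list starts inside an empty sub-list, so every
/// header can unconditionally close the previous sub-list and open a new one.
string buildToc(Entry[] entries) {
    string html = "<ul><ul>";  // outer list plus an empty placeholder sub-list
    foreach (entry; entries) {
        if (entry.name.startsWith("__")) {
            continue;  // control tokens such as __index__ are not listed
        }
        string link = "<a href=\"#" ~ entry.name ~ "\">" ~ entry.name ~ "</a>";
        if (entry.kind == Kind.header) {
            html ~= "</ul><li><b>" ~ link ~ "</b><ul>";  // close sub-list, start a new one
        } else {
            html ~= "<li>" ~ link;  // code sections sit inside the current sub-list
        }
    }
    return html ~ "</ul></ul>";
}

unittest {
    // With no entries only the empty placeholder lists remain.
    assert(buildToc(null) == "<ul><ul></ul></ul>");
}
```

Section names are used directly as link targets here, as in the original, so a name containing a double quote would break the attribute.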
        

Table of Contents:

Generating an Index

An index is the same thing as a TOC except the list is alphabetical. (Version 1 has duplicates here from both the definition and references.)

insertIndex

        if(section.name == "__index__") {
            Print the index header
            Scan sections for index targets
            Sort index alphabetically
            Print index entries
        
            outputDisplayContents ~= "</ul>";
            continue;
        }

 

Print the index header

    outputDisplayContents ~= "<h3>Index:</h3>";
    outputDisplayContents ~= "<ul>";

Like the Table of Contents, we're linking to Header and Code sections. I've added tagging for Index Terms so they get added to the index as well.

Scan sections for index targets

            string[] references;
            foreach(SSection indexSection; fileSections) {
                if(indexSection.name.startsWith("__")) {
                    continue;
                }
                if(indexSection.type == ESectionType.HEADER || indexSection.type == ESectionType.CODE) {
                    references ~= indexSection.name~"@"~indexSection.name;
                }
                foreach(SBlock block; indexSection.contents) {
                    if(block.type == ESectionType.IDENTIFIER ) {
                        references ~= block.content ~"@"~indexSection.name;
                    }
                    if(block.type == ESectionType.INDEX_TERM) {
                        references ~= block.content.strip("|") ~"@"~indexSection.name~block.content;
                    }
                }
            }
          

Sort index alphabetically

            import std.algorithm.mutation : SwapStrategy;
            auto sortedReferences = sort!("a.toUpper < b.toUpper", SwapStrategy.stable) (references);
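
A standalone sketch of the same sort, using a lambda instead of the string predicate. sortReferences is a hypothetical helper name. It copies the array so the caller's order is left alone.

```d
import std.algorithm.mutation : SwapStrategy;
import std.algorithm.sorting : sort;
import std.uni : toUpper;

/// Returns a case-insensitively sorted copy of "term@section" strings.
/// The stable strategy keeps entries that compare equal in their original order.
string[] sortReferences(string[] references) {
    auto copy = references.dup;
    copy.sort!((a, b) => a.toUpper < b.toUpper, SwapStrategy.stable)();
    return copy;
}

unittest {
    // Case is ignored, so "alpha" and "Alpha" group together ahead of "beta".
    assert(sortReferences(["beta@B", "Alpha@A", "alpha@Z"])
        == ["Alpha@A", "alpha@Z", "beta@B"]);
}
```

Because the whole "term@section" string is the sort key, usages of the same term end up ordered by section name, not by where they appear in the document.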
      

Print index entries

            string lastReference = "";
            int referenceCount = 0;
            foreach(string reference; sortedReferences) {
                Print a title for each index entry
                Count the links to each usage
            }

I cached the references as "block @ section" for ease in storage and sorting. Parse them out. Then check if we've hit a new term to start a new index entry versus adding numbered subscripts.

Print a title for each index entry

                string[] components = reference.split("@");
                if(components[0] != lastReference) {
                    lastReference = components[0];
                    outputDisplayContents ~= "<li><b>" ~ components[0] ~ ":" ~ "</b>";
                    referenceCount = 0;
                }
                
                outputDisplayContents ~= " <a href=\"#" ~ components[1] ~ "\">" ;
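
The "Count the links to each usage" section isn't shown on this page. As a guess at its shape, here is a self-contained sketch that groups sorted "term@section" strings and numbers the links under each term. renderIndex is a hypothetical name, and the numbered link text is an assumption, not the original code.

```d
import std.array : split;
import std.conv : to;

/// Renders sorted "term@section" strings as a list with one entry per term
/// and a numbered link for each usage of that term.
string renderIndex(string[] sortedReferences) {
    string html = "<ul>";
    string lastTerm = "";
    int referenceCount = 0;
    foreach (reference; sortedReferences) {
        string[] components = reference.split("@");
        if (components.length != 2) {
            continue;  // malformed entry, skipped in this sketch
        }
        if (components[0] != lastTerm) {
            // A new term starts a new list item and restarts the numbering.
            lastTerm = components[0];
            html ~= "<li><b>" ~ lastTerm ~ ":</b>";
            referenceCount = 0;
        }
        referenceCount++;
        html ~= " <a href=\"#" ~ components[1] ~ "\">"
            ~ referenceCount.to!string ~ "</a>";
    }
    return html ~ "</ul>";
}

unittest {
    assert(renderIndex(["foo@A", "foo@B"])
        == "<ul><li><b>foo:</b> <a href=\"#A\">1</a> <a href=\"#B\">2</a></ul>");
}
```

Splitting on "@" means a section name that itself contains an at sign produces more than two components, which this sketch simply skips.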
 

Index:

Explicit Main Sections

The @p tag is renamed __main__. Make sure there's exactly one, and that it is a code section.

New support for main sections

    SSection[] mainSection = find_matching_identifiers(fileSections, "__main__");
    if(mainSection.length != 1) {
        writefln("ERROR: Exactly 1 __main__ section needed.  %d found.", mainSection.length);
        foreach(section; mainSection) {
            writefln("%s found at line %d", section.name, section.contents[0].lineNumber);
        }
        return;
    }
    if(mainSection[0].type != ESectionType.CODE) {
        writefln("ERROR: __main__ section needs to be code.");
        return;
    }

Better Section Management

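I've wanted to append to sections, the way includes accumulate. I don't want to call out each include up front; I want to add includes throughout the demonstration and have them accumulate at the top. For tutorial purposes, I'd also like to define a section, then expand or replace it. Knuth's original WEB supports truncated identifiers: the reference may be a full sentence while the name in the definition is truncated with "...". Here a definition whose name ends in "...+" appends to the matching section, and one ending in "...!" replaces everything matched so far.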

Find Matching Identifiers

SSection[] find_matching_identifiers(SSection[] sections, string identifier) {
    SSection[] results;
    
    foreach(section; sections) {
        // A section might be name... or name...! or name...+. Check each form.
        if(section.name.endsWith("...")) {
            if(!identifier.startsWith(section.name[0..$-3])) {
                continue;
            }
        } else if(section.name.endsWith("...!") || section.name.endsWith("...+")) {
            if(!identifier.startsWith(section.name[0..$-4])) {
                continue;
            }
        } else if(section.name != identifier) {
            continue;
        }
        
        // We've found a name that matches.
        if(section.name.endsWith("...+")) {
            // Append tag.
        } else if(section.name.endsWith("...!")) {
            // Replace tag.
            results = [];
        } else if(!results.empty) {
            writefln("WARNING: Multiple matches for '%s' found.  Use ...+ to append or ...! to replace.", identifier);
        }
        results ~= section;
    }
    return results;
}
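
To check the matching rules without the rest of the parser, here is a reduced sketch that works on bare section names and returns the indices of the sections that would be emitted. resolve is a hypothetical helper. It leaves out the duplicate-match warning.

```d
import std.algorithm.searching : endsWith, startsWith;

/// Resolves `identifier` against section names in document order.
/// "name...+" appends, "name...!" replaces everything matched so far,
/// "name..." is a truncated name, and anything else must match exactly.
size_t[] resolve(string[] sectionNames, string identifier) {
    size_t[] results;
    foreach (i, name; sectionNames) {
        immutable bool isAppend = name.endsWith("...+");
        immutable bool isReplace = name.endsWith("...!");
        bool matches;
        if (isAppend || isReplace) {
            matches = identifier.startsWith(name[0 .. $ - 4]);
        } else if (name.endsWith("...")) {
            matches = identifier.startsWith(name[0 .. $ - 3]);
        } else {
            matches = name == identifier;
        }
        if (!matches) {
            continue;
        }
        if (isReplace) {
            results = null;  // discard earlier definitions
        }
        results ~= i;
    }
    return results;
}

unittest {
    // Only the last "...!" definition survives a chain of replacements.
    auto names = ["Test replacing", "Test replacing...!", "Test replacing...!"];
    assert(resolve(names, "Test replacing") == [2]);
}
```

Matching is by prefix, so a very short truncated name such as "Test..." would also match both "Test appending" and "Test replacing".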

Comments to test appending and replacing

// There should be 2 appended comments, 1 replaced comment, and 1 partial comment. This message will repeat.
    Test appending
    Test replacing
// There should be 2 appended comments, 1 replaced comment, and 1 partial comment. This message is the repetition.

Test appending

// This is the original named appending

Test appending...+

// This is the first appended comment

Test appending...+

// This is the second appended comment

Test replacing

// This is the wrong replaced comment

Test replacing...!

// This is an unseen replaced comment

Test replacing...!

    // This is the correct replaced comment
    Test partial names

Test partial ...

// Partial names appended

The result is something like this:

Results:

    // There should be 2 appended comments, 1 replaced comment, and 1 partial comment. This message will repeat.
    // This is the original named appending
    // This is the first appended comment
    // This is the second appended comment
    // This is the correct replaced comment
    // Partial names appended
    // There should be 2 appended comments, 1 replaced comment, and 1 partial comment. This message is the repetition.

Custom Index Terms

I'm tired of version 1 of each parser choking on a tag and me having to escape it either temporarily or permanently. Define tokens here.

New character definitions

    const dchar CHAR_PIPE = '|';
    const dchar CHAR_AT = '@';
    const dchar CHAR_NEWLINE = '\n';

Handle embedded index points

    else if(contents[index] == CHAR_PIPE && sectionType != ESectionType.CODE) { 
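        // Bounds check so a pipe at the very end of the input doesn't read past the array.
        if(index + 1 < contents.length && contents[index + 1] == CHAR_PIPE) {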
            currentBlock ~= contents[index];
            // Skip the escaped pipe symbol.
            index++;
        } else {
            // Save the previous block.
            results ~= SBlock(sectionType, startLineNumber, currentBlock);
            currentBlock = "";
            startLineNumber = lineNumber;
            
            currentBlock ~= contents[index];
            for(index++; index < contents.length; index++) {
                currentBlock ~= contents[index];
                if(contents[index] == CHAR_NEWLINE) {
                    lineNumber++;
                }
                if(contents[index] == CHAR_PIPE) {
                    if(index < contents.length - 1 && contents[index + 1] == CHAR_PIPE) {
                        // Skip the escaped pipe symbol.
                        index++;
                    } else {
                        break;
                    }
                } else if(contents[index] == CHAR_AT) {
                    if(index < contents.length - 1 && contents[index + 1] == CHAR_AT) {
                        // Skip the escaped at symbol.
                        index++;
                    } else {
                        writefln("WARNING: At symbol encountered before end of piped index term.");
                        break;
                    }
                }
            }
            if(index >= contents.length || contents[index] != CHAR_PIPE) {
                writefln("WARNING: Close pipe expected near line %d.", startLineNumber);
                break;
            }
            
            // Save the indexed identifier.
            results ~= SBlock(ESectionType.INDEX_TERM, startLineNumber, currentBlock);
            currentBlock = "";
            startLineNumber = lineNumber;
        }
    }
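
The same idea in a reduced form: a sketch that splits a paragraph into plain text and index terms, treating a doubled pipe as an escaped literal pipe. Piece and splitIndexTerms are hypothetical names. Unlike the real code, this version stores the term without its surrounding pipes, ignores at symbols and line numbers, and keeps an unterminated term as plain text.

```d
/// Hypothetical block type for this sketch.
struct Piece {
    bool isIndexTerm;
    string text;
}

/// Splits `text` into plain pieces and |index term| pieces.
Piece[] splitIndexTerms(string text) {
    Piece[] pieces;
    string current;
    bool inTerm = false;
    for (size_t i = 0; i < text.length; i++) {
        if (text[i] != '|') {
            current ~= text[i];
            continue;
        }
        if (i + 1 < text.length && text[i + 1] == '|') {
            current ~= '|';  // "||" is an escaped literal pipe
            i++;
            continue;
        }
        // A single pipe opens or closes an index term.
        if (inTerm || current.length > 0) {
            pieces ~= Piece(inTerm, current);
        }
        current = "";
        inTerm = !inTerm;
    }
    if (inTerm) {
        current = "|" ~ current;  // unterminated term: keep it as plain text
    }
    if (current.length > 0) {
        pieces ~= Piece(false, current);
    }
    return pieces;
}

unittest {
    assert(splitIndexTerms("see |tangle| and a||b")
        == [Piece(false, "see "), Piece(true, "tangle"), Piece(false, " and a|b")]);
}
```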

Styled Text

'@^' starts a bold run and '@.' starts a PRE-styled run (currently rendered in italics); '@>' ends either one. This code is inserted into slurp_section.

I am not sure how this will work with WEB3. A thing to test.

This is the next unstyled paragraph.

slurp styled section

        } else if(contents[index + 1] == '^' || contents[index + 1] == '.') {
            ESectionType styleType = contents[index + 1] == '^' ? ESectionType.BOLD : ESectionType.PRE;
            
            Save off the previous block of text
            parse the styled section
            skip the close tag around the style section
            insert the style block into the section
    

parse the styled section

            SBlock[] textBlocks = slurp_section(contents, index, lineNumber, false, sectionType);
            assert(textBlocks.length == 1);
            string textBlock = textBlocks[0].content;
                

skip the close tag around the style section

            if(contents[index..$].startsWith("@>")) {
                index += 1;
            } else {
                writefln("Styled text '%s' opened without close tag at %s", textBlock, contents[index..min($, index + 10)]);
                break;
            }

insert the style block into the section

                results ~= SBlock(styleType, lineNumber, textBlock);
   
   

Save off the previous block of text

                results ~= SBlock(sectionType, startLineNumber, currentBlock);
                currentBlock = "";
                startLineNumber = lineNumber;
                index += 2;
        
