Literate Programming 7

Prev:WEB 6 Top:WEB 0 Next:WEB 8

WEB 7 - Quality of Life Improvements

Jon Breuer - September 10, 2024.

Table of Contents:

Existing features

New Features:

Desired features

I used WEB4 to write documentation for my Joy interpreter, and I hit a few rough points I'd like to improve.

Including previous code

Of this wishlist, including WEB4's source is the priority. I'll have to hack my changes into it to get started.

In the main parsing loop, I'll insert a new handler for @#, which will be my include/pragma/define tag. My extensions will live under it.

Existing parse_web_then_tangle_and_weave

void parse_web_then_tangle_and_weave(...)
{
...
    while(charIndex < fileContents.length) {
        dchar ch = fileContents[charIndex];
        
        if(ch == '@') { 
            if(chNext == '@') {
                // Escaped At
            } else if(chNext == 'p') {
                // Program Start
            } else if(chNext == '>') {
                // End Code Section
            } else if(chNext == '<') {
                // Start Code Section
            } else if(chNext == '*') {
                // Headers
            } else if (chNext == '#') {
                // A pragma section will instruct the parser.
                Process pragma section
  
            } else {
                // Anything else is documentation.
            }

        } else {
            // Anything else is documentation.
        }
    }
    ...
}

I'm imagining pragmas that look like @#include filename@> or @#define new type@>. Read the entire pragma command header, then determine what to do with it.
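Reading the header and then deciding what to do amounts to a small dispatch on the command's leading keyword. Below is a minimal sketch in D of that step, assuming the command text arrives without the surrounding @# and @> (as the startsWith checks later suggest). The names PragmaKind, SPragma, and parsePragma are hypothetical, and trimming the argument is my own addition; WEB7 itself slices the command and branches inline.

```d
import std.algorithm.searching : startsWith;
import std.string : strip;

// Hypothetical names for illustration; WEB7 branches inline on
// command.startsWith(...) rather than building a struct.
enum PragmaKind { INCLUDE, DEFINE, UNKNOWN }

struct SPragma {
    PragmaKind kind;
    string argument; // text after the keyword, surrounding whitespace trimmed
}

SPragma parsePragma(string command)
{
    enum includeKeyword = "include ";
    enum defineKeyword = "define ";

    if (command.startsWith(includeKeyword)) {
        // Same slice as command[8..$], plus trimming.
        return SPragma(PragmaKind.INCLUDE, command[includeKeyword.length .. $].strip);
    } else if (command.startsWith(defineKeyword)) {
        // Same slice as command[7..$], plus trimming.
        return SPragma(PragmaKind.DEFINE, command[defineKeyword.length .. $].strip);
    }
    return SPragma(PragmaKind.UNKNOWN, command);
}

unittest
{
    auto parsed = parsePragma("include notes.w");
    assert(parsed.kind == PragmaKind.INCLUDE && parsed.argument == "notes.w");
}
```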

Process pragma section

Read pragma command
Process include pragma

The existing slurp_section function is already built to read the whole section. I'll use that.

Read pragma command

SBlock[] pragmaBlocks = slurp_section(fileContents, charIndex, lineNumber, inputFilename, false, ESectionType.IDENTIFIER);
assert(pragmaBlocks.length == 1);
string command = pragmaBlocks[0].content;

Include is the important one.

Process include pragma

if(command.startsWith("include ")) {
    Extract filename from include command
    Read included file contents
    Parse included file
    Insert a comment in the documentation
    Insert code from included file.
}

I'll treat the rest of the identifier as the filename, so there are no quotes to strip.

Extract filename from include command

string includeFilename = command[8..$];

This is the same command I use in main() to read the primary input file.

Read included file contents

string includeFileContents; 
try {
    includeFileContents = cast(string) std.file.read(includeFilename);
} catch(Exception err) {
    writefln("Error: %s", err.msg);
    return [];
}
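If the error path ever needs to live outside parse_web, the read can be wrapped in a helper that reports success instead of returning an empty section list. This is a sketch with a hypothetical readIncludedFile function; it swaps cast(string) std.file.read for std.file.readText, which also validates UTF-8, so a malformed file is reported rather than silently cast.

```d
import std.file : readText;
import std.stdio : writefln;

// Hypothetical helper: fills `contents` and returns true on success, or
// prints the error and returns false so the caller can bail out.
bool readIncludedFile(string includeFilename, out string contents)
{
    try {
        // readText throws on a missing file and on invalid UTF-8.
        contents = readText(includeFilename);
        return true;
    } catch (Exception err) {
        writefln("Error: %s", err.msg);
        return false;
    }
}

unittest
{
    import std.file : tempDir, write, remove;
    import std.path : buildPath;

    string path = buildPath(tempDir(), "web7_include_demo.w");
    write(path, "@p some included text");
    scope(exit) remove(path);

    string text;
    assert(readIncludedFile(path, text));
    assert(text == "@p some included text");
}
```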

Use the existing parse_web function.

Parse included file

string parsedIncludeDisplay = "";
string parsedIncludeCode = "";
SSection[] includedSections = parse_web(parsedIncludeDisplay, parsedIncludeCode, includeFileContents, includeFilename);

I'm torn, but I don't want all the discussion from the previous file, only its code. The writer should include a hyperlink back to the previous document, because while parsing the source I can't know what its published name will be.

Insert a comment in the documentation

fileSections ~= SSection("Included " ~ includeFilename, ESectionType.PARAGRAPH, [SBlock(ESectionType.PARAGRAPH, lineNumber, "Included " ~ includeFilename)]);

The whole point. Insert the code here.

Insert code from included file.

// Keep only the included file's CODE sections; its prose stays behind.
auto filtered = includedSections.filter!(section => section.type == ESectionType.CODE);
foreach(SSection section; filtered) {
    fileSections ~= section;
}
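Since std.array is imported for Web7 anyway, the filter-then-foreach can also be written as a single append by materializing the lazy range with .array. The sketch uses cut-down stand-ins for ESectionType and SSection and a hypothetical codeSectionsOnly helper; it illustrates the idiom and is not WEB7's code.

```d
import std.algorithm.iteration : filter;
import std.array : array;

// Minimal stand-ins for WEB4's types; the real SSection also carries
// a list of SBlocks, and the real ESectionType has more members.
enum ESectionType { PARAGRAPH, CODE }

struct SSection {
    string name;
    ESectionType type;
}

// Hypothetical helper: keep only CODE sections, dropping the included
// file's prose, and turn the lazy filter range into an array.
SSection[] codeSectionsOnly(SSection[] includedSections)
{
    return includedSections
        .filter!(section => section.type == ESectionType.CODE)
        .array;
}

unittest
{
    SSection[] fileSections;
    auto included = [
        SSection("Intro", ESectionType.PARAGRAPH),
        SSection("Parser", ESectionType.CODE),
        SSection("Main", ESectionType.CODE),
    ];
    fileSections ~= codeSectionsOnly(included);
    assert(fileSections.length == 2);
}
```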

Skip code included from literate_programming_4_source

Library Imports...+

import std.array;

Comments to test ...+

// This is new code for Web7.

Huh. I printed the internal representation, and the block types aren't what I expect: the second block, a sentence of prose, is tagged CODE.

    SSection("", PARAGRAPH, [
        SBlock(PARAGRAPH, 18, "

"), SBlock(CODE, 16, "I really want to redefine existing sections and dynamically add content into group sections.\r\n\r\n")])

Escaping @s

Did some debugging. The problem isn't that the parser reads the second @ as a command; it's that stray angle brackets like <content> end up in the HTML file. My code is escaped, but my prose isn't. Hmm. This is a deeper design issue: I want @<Token@> to render as text and <b> to make text bold.

The program is doing the right thing, but can I add a warning for when I might be doing the wrong thing? @@<tag> won't render correctly because HTML thinks it's a tag, whereas < hello there> (with a space after the <) displays as text.

Existing parse_web

SSection[] parse_web(ref string outputDisplayContents, ref string outputCodeContents, string fileContents, string inputFilename)
{    
...
    while(charIndex < fileContents.length) {
        dchar ch = fileContents[charIndex];
        if(ch == '@') { 
            dchar chNext = charIndex < fileContents.length - 1 ? fileContents[charIndex + 1] : 0;
            if(chNext == '@') {
                // It's just an escaped at.  Continue parsing.

Let's insert an extra check for a raw less-than sign that may be interpreted as the start of an HTML tag.

parse_web warn about stray HTML tags

if(charIndex < fileContents.length - 1 && fileContents[charIndex] == '<' && fileContents[charIndex + 1] != ' ') {
    // I think the writer wrote at, at, lessthan to try and escape the at, but the lessthan will be problematic in HTML.
    writefln("Warning: Using %s%stag in an HTML file may have unintended effects.  Near line %d in %s.", CHAR_AT, '<', lineNumber, inputFilename);
}

And we need the same fix inside slurp_section.

slurp_section warn about stray HTML tags

if(index + 2 < contents.length && contents[index + 1] == '<' && contents[index + 2] != ' ') {
    // I think the writer wrote at, at, lessthan to try and escape the at, but the lessthan will be problematic in HTML.
    writefln("Warning: Using %s%stag in an HTML file may have unintended effects.  Near line %d in %s.", CHAR_AT, '<', lineNumber, inputFilename);
}

Let's test this: @<an error@>? It works:

Warning: Using @<tag in an HTML file may have unintended effects.  Near line 207
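Both checks look for an escaped @@ immediately followed by < and a non-space character. As a standalone sketch, the same rule can be run over a whole text to list the offending lines; strayTagLines is a hypothetical name, and it bounds-checks the character after < before reading it.

```d
// Hypothetical whole-text version of the "@@<" check: returns the 1-based
// line numbers where an escaped at-sign is followed by '<' and a
// non-space character, which a browser would likely parse as a tag.
size_t[] strayTagLines(string text)
{
    size_t[] lines;
    size_t line = 1;
    for (size_t i = 0; i < text.length; ++i) {
        if (text[i] == '\n') {
            ++line;
        } else if (text[i] == '@' && i + 1 < text.length && text[i + 1] == '@') {
            // Check bounds before reading i + 3, so a trailing "@@<"
            // cannot index past the end of the text.
            if (i + 3 < text.length && text[i + 2] == '<' && text[i + 3] != ' ')
                lines ~= line;
            ++i; // skip the second '@' of the escape
        }
    }
    return lines;
}

unittest
{
    auto found = strayTagLines("fine\n@@<tag> here\n@@< hello there>");
    assert(found.length == 1 && found[0] == 2);
}
```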

Custom Formatters

This implementation can do bold text, italic text, and bold italic terms. I need an inline code style and maybe another. I want to define a style and be able to apply it.

@#define A=<b><i>...</i></b>@>
...
@ This is a paragraph with @Asome styling@>.

would yield: This is a paragraph with some styling.
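To pin down the intended behavior, here is a deliberately simplified sketch that expands @X...@> spans directly into a string, assuming single-character style keys and a pre/post pair of tags per key. expandStyles and this SFormat are hypothetical; WEB4's real parser builds SBlocks and renders them later.

```d
import std.string : indexOf;

struct SFormat {
    string pre;
    string post;
}

// Hypothetical, simplified model of the intended output: expands each
// "@Xtext@>" whose one-character key X is registered, copying all other
// text through unchanged. This is not WEB4's parser, which builds SBlocks.
string expandStyles(string text, SFormat[string] formats)
{
    string output;
    size_t i = 0;
    while (i < text.length) {
        if (text[i] == '@' && i + 1 < text.length) {
            SFormat* format = text[i + 1 .. i + 2] in formats;
            if (format !is null) {
                ptrdiff_t closing = text[i + 2 .. $].indexOf("@>");
                if (closing >= 0) {
                    size_t contentEnd = i + 2 + cast(size_t) closing;
                    output ~= format.pre ~ text[i + 2 .. contentEnd] ~ format.post;
                    i = contentEnd + 2; // step past the closing "@>"
                    continue;
                }
            }
        }
        output ~= text[i];
        ++i;
    }
    return output;
}

unittest
{
    SFormat[string] formats;
    formats["A"] = SFormat("<b><i>", "</i></b>");
    assert(expandStyles("with @Asome styling@>.", formats)
        == "with <b><i>some styling</i></b>.");
}
```

An unregistered key or a missing @> leaves the text untouched, which keeps stray @ characters in prose harmless in this sketch.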

The New Display Work section of parse_web_then_tangle_and_weave contains this:

Existing code

    foreach(SSection section; fileSections) {
        if(section.type != ESectionType.CODE) {
            string paragraphReducer(string output, SBlock block) {
                if(block.type == ESectionType.INDEX_TERM) {
                    return output ~ "<b id='"~section.name~block.content~"'><i>" ~ block.content.strip("|") ~ "</i></b>";

                } else if(block.type == ESectionType.BOLD) {
                    return output ~ "<b>" ~ block.content ~ "</b>";
                } else if(block.type == ESectionType.PRE) {
                    return output ~ "<i>" ~ block.content ~ "</i>";

I'll hack WEB4 so I can improve those tags.

BOLD and PRE used to be separate types; I'm going to fold them into a single CUSTOM type.

Section Type Formats...!

    CUSTOM,

An SBlock can now have the type CUSTOM, with the exact formatter named in this new field:

Custom Type Format

string customFormat;

SFormat is a new type that holds the tags placed before and after the content.

Definitions...+

struct SFormat{
    string pre;
    string post;
};
Global Variables

Global Variables

SFormat[string] Custom_formats;

To preserve backward compatibility, bold and italic are defined as custom formatters.

Initialize custom formats

Custom_formats["^"]=SFormat("<b>", "</b>");
Custom_formats["."]=SFormat("<i>", "</i>");

The existing Detect a new text style section has hard-coded values:

Existing code

    } else if(contents[index + 1] == '^' || contents[index + 1] == '.') {
        ESectionType styleType = contents[index + 1] == '^' ? ESectionType.BOLD : ESectionType.PRE;

Our new version detects any registered formatter and uses it.

Detect a new text style...!

    } else if("" ~ contents[index + 1] in Custom_formats) {
        ESectionType styleType = ESectionType.CUSTOM;
        string customFormat = "" ~ contents[index + 1];
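The lookup above examines only the single character after @, turned into a string with "" ~. A sketch of the same rule, using a hypothetical detectStyleKey helper, makes one consequence visible: as far as I can tell from this excerpt, a define identifier longer than one character can be registered but never matched.

```d
struct SFormat {
    string pre;
    string post;
}

// Hypothetical helper mirroring "Detect a new text style": given the index
// of an '@', return the one-character key after it if a formatter is
// registered for that key, else null.
string detectStyleKey(string contents, size_t index, SFormat[string] formats)
{
    if (index + 1 >= contents.length)
        return null;
    // A one-character slice is already a string, so no "" ~ is needed.
    string key = contents[index + 1 .. index + 2];
    return (key in formats) ? key : null;
}

unittest
{
    SFormat[string] formats;
    formats["^"] = SFormat("<b>", "</b>");
    formats["AB"] = SFormat("<u>", "</u>");
    assert(detectStyleKey("@^bold@>", 0, formats) == "^");
    // Only one character is examined, so the two-character key never matches.
    assert(detectStyleKey("@ABtext@>", 0, formats) is null);
}
```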

Then record the format on the block of text.

insert the style block into the section...!

    results ~= SBlock(styleType, lineNumber, textBlock, customFormat);

When styling the text, look up the formatter and apply it.

style display text...!

} else if(block.type == ESectionType.CUSTOM && block.customFormat in Custom_formats) {
    SFormat format = Custom_formats[block.customFormat];
    return output ~ format.pre ~ block.content ~ format.post;
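In D, the in operator on an associative array returns a pointer to the value, or null when the key is missing, so the membership test and the fetch in style display text can share one lookup. This is a sketch with a hypothetical applyFormat helper; falling back to the unformatted content for an unknown key is my assumption, not something the section above does.

```d
struct SFormat {
    string pre;
    string post;
}

// Hypothetical helper: `in` yields a pointer (null when the key is absent),
// so one lookup both tests membership and fetches the formatter.
// Unknown keys fall back to the plain content.
string applyFormat(SFormat[string] formats, string key, string content)
{
    if (auto format = key in formats)
        return format.pre ~ content ~ format.post;
    return content;
}

unittest
{
    SFormat[string] customFormats;
    customFormats["^"] = SFormat("<b>", "</b>");
    customFormats["."] = SFormat("<i>", "</i>");
    assert(applyFormat(customFormats, ".", "pre") == "<i>pre</i>");
}
```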

Let's check that the existing formats are working and hint that the future one works too.

Let's implement define.

Process pragma section...+

Process define pragma

Process define pragma

else if(command.startsWith("define ")) {
    split define into identifier and format
    split define format into pre and post
    register new define format
}

split define into identifier and format

    string defineExpression = command[7..$];
    auto splitOnEqual = defineExpression.findSplit("=");
    string defineIdentifier = splitOnEqual[0];

split define format into pre and post

    auto splitAroundContent = splitOnEqual[2].findSplit("...");
    string pre = splitAroundContent[0];
    string post = splitAroundContent[2];

register new define format

    SFormat format = {pre, post};
    Custom_formats[defineIdentifier] = format;

Hooray! I can now define a code style, and short code snippets or identifiers will be formatted with it. Does formatting work with @<example WEB tags@> inside? Yes!
