Literate Programming 3


WEB 3 - Generating HTML

Jon Breuer - September 8, 2024.

Now I'd like to write parts of this program out of order. More precisely, I want them in conceptual order instead of compilation order. This will require naming sections and then slurping multiple named sections into a group.

Also, now that I can clean up the code for the display HTML, the source is here (literate_programming_3_source.html) and this page is the generated file. @*Existing code. @

I've cut out the file processor and moved it into a new function we'll write after the discussion.
@p
string outputDisplayContents = "";
string outputCodeContents = "";
parse_web_then_tangle_and_weave(outputDisplayContents, outputCodeContents, fileContents, inputFilename);
@>

Existing code:

I also found two places where a comment with no trailing newline, taking up the entire remaining string, would cause an infinite parse loop. The fix is to fall back to the length of the remaining string when no newline is found.

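// countUntil returns ptrdiff_t, and -1 when there is no newline left.
ptrdiff_t lineLength = countUntil(scanner, "\n");
if(lineLength < 0) {
    // The comment runs to the end of the input, so consume all of it.
    lineLength = scanner.length;
}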
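To show why the fallback matters, here is a minimal sketch of a line-by-line scan that terminates even when the last comment has no newline. The helper name `firstLineLength` and the sample input are made up for illustration; they are not part of the tool.

```d
import std.algorithm.searching : countUntil;
import std.stdio : writeln;

// Hypothetical helper: length of the first line of `scanner`,
// or of the whole remaining input when no newline is left.
size_t firstLineLength(string scanner) {
    ptrdiff_t lineLength = countUntil(scanner, "\n");
    if (lineLength < 0) {
        lineLength = scanner.length;
    }
    return lineLength;
}

void main() {
    // The last comment has no trailing newline.
    string scanner = "// first\n// last comment, no newline";
    while (scanner.length > 0) {
        size_t n = firstLineLength(scanner);
        writeln(scanner[0 .. n]);
        // Skip the line, plus its newline if there is one.
        scanner = scanner[(n < scanner.length ? n + 1 : n) .. $];
    }
}
```

Without the fallback, the final iteration would see a length of -1 and never advance past the last comment.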
@*Next Steps. I've been re-reading (http://www.literateprogramming.com/lpsimp.pdf), which describes a modified LP system called NoWeb. I also reread (http://www.literateprogramming.com/knuthweb.pdf) and noticed differences between what I'm attempting and what Knuth accomplished.

I have been treating *.WEB files as TeX files with added WEB markup, as though Knuth were writing primarily in TeX and pulling the code examples out of it. That's what my files are: HTML with code insertions. But this isn't quite accurate for WEB. Every section of a *.WEB file is a WEB command, and the display sections support embedded display language constructs.

Codes I don't support.

WEB also supports a fair amount of recursive embedding, which I don't support yet.

My work thus far has been taking WEB0 and using it to compile both literate_programming_0.html and literate_programming_1.html, then continuing that process with WEB1. I think I need to create a side example file that has the features I want WEB3 to support.

Snippet from examples.

    @p // Start my program.
    @c
    @<Header files to include@>@/
    @<Global variables@>@/
    @<Functions@>@/
    @<The main program@>
    
    @ Any other section to break the first definitions.

I don't know what @/ means.

@@<The main...@@>=
main (argc,argv)
    int argc; /* the number of arguments on the \UNIX/ command line */
    char **argv; /* the arguments themselves, an array of strings */
{
  @@<Variables local to |main|@@>@@;
  prog_name=argv[0];
  @@<Set up option selection@@>;
  writefln("asdf");
  
  @@<Process all the files@@>;

  writefln("sdfg");

  @@<Print the grand totals if there were multiple files @@>;
  exit(status);
}
@ Inferences: @; means insert the semicolon into the C/Pascal output, but don't render it in the display file.

I think I can treat @<blah@> as a reference to an identifier and @<blah@>=content@ as the definition. My tool thus far has been a glorified search-and-replace, but nested, out-of-order identifiers will require more careful work.
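The reference-versus-definition rule can be sketched in isolation. This is a minimal sketch, assuming the text handed in starts right after an opening `@<` and that the markers appear unescaped; the `IdentifierUse` enum and the `classify` function are hypothetical names, not part of the tool.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.string : indexOf;

enum IdentifierUse { reference, definition, malformed }

// Hypothetical classifier: `text` must begin right after an opening "@<".
// On success `name` receives the identifier between "@<" and "@>".
IdentifierUse classify(string text, out string name) {
    ptrdiff_t close = text.indexOf("@>");
    if (close < 0) {
        return IdentifierUse.malformed;
    }
    name = text[0 .. close];
    // A '=' directly after the close tag marks a definition.
    return text[close + 2 .. $].startsWith("=")
        ? IdentifierUse.definition
        : IdentifierUse.reference;
}

void main() {
    string name;
    writeln(classify("Global variables@>@/", name), ": ", name);
    writeln(classify("The main...@>=\nmain (argc,argv)", name), ": ", name);
}
```

The point is that the two cases look identical until the character after the close tag, so the parser has to read the whole identifier before it knows which one it has.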

My test snippet: (I'll copy and paste this out.) literate_programming_3_test.web

Skip to the results

@*Advanced Parsing. And now, let's rebuild the parser. We're going to need to track the different sections that we're assembling.

* A section is either a display section or a code section. A display section should have 1 block of text. A code section might have many.

@p
struct SSection {
    string name;
    bool isCode;
    SBlock[] contents;
};

struct SBlock {
    bool isIdentifier;
    string content;
};
@>
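As an illustration of how these two structs fit together, here is a minimal sketch that builds one display section and one code section by hand. The section names and contents are invented for the example, and `identifierCount` is a hypothetical helper.

```d
import std.stdio : writeln;

struct SSection {
    string name;
    bool isCode;
    SBlock[] contents;
}

struct SBlock {
    bool isIdentifier;
    string content;
}

// Hypothetical helper: how many blocks in a section are identifier references.
size_t identifierCount(SSection section) {
    size_t total = 0;
    foreach (block; section.contents) {
        if (block.isIdentifier) {
            total++;
        }
    }
    return total;
}

void main() {
    // A display section holds a single block of text.
    auto display = SSection("", false, [SBlock(false, "Some prose.")]);

    // A code section alternates literal code with references to other sections.
    auto code = SSection("The main program", true, [
        SBlock(false, "int main() {\n"),
        SBlock(true, "Process all the files"),
        SBlock(false, "\n}\n"),
    ]);

    writeln(display.contents.length, " display block(s)");
    writeln(code.name, ": ", identifierCount(code), " identifier reference(s)");
}
```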

This is the parsing function.
@p
void parse_web_then_tangle_and_weave(ref string outputDisplayContents, ref string outputCodeContents, string fileContents, string inputFilename) {
    SSection[] fileSections;
@>

We will scan through every character; @@ control characters determine which kind of section a block of text becomes.
@p
    int charIndex = 0;
    while(charIndex < fileContents.length) {
        dchar ch = fileContents[charIndex];
        if(ch == '@@') {
            dchar chNext = charIndex < fileContents.length - 1 ? fileContents[charIndex + 1] : 0;
@>

Every @@ code is a two-character pair, and we'll skip both characters every time. No extra work is needed for an escaped @@@@ sequence.
@p
            charIndex += 2;
            if(chNext == '@@') {
@>
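For comparison, here is a minimal sketch of what collapsing the escape looks like on its own, outside the parser. `unescapeAt` is a hypothetical helper, not something the tool defines, and it deliberately ignores every other control code.

```d
import std.stdio : writeln;

// Hypothetical helper: collapse each doubled at-sign into a single one.
// A lone '@' (a real control code) is copied through unchanged.
string unescapeAt(string text) {
    string result;
    for (size_t i = 0; i < text.length; i++) {
        result ~= text[i];
        // If this '@' is the first half of an escaped pair, skip the second half.
        if (text[i] == '@' && i + 1 < text.length && text[i + 1] == '@') {
            i++;
        }
    }
    return result;
}

void main() {
    writeln(unescapeAt("user@@example.com"));
}
```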

Program blocks get turned into code.
@p
            } else if(chNext == 'p') {
                fileSections ~= SSection("__main__", true, slurp_section(fileContents, charIndex, true));
@>

Close blocks don't need extra processing.
@p
            } else if(chNext == '>') {
@>

Definition blocks.
@p
            } else if(chNext == '<') {
                SBlock[] identifierBlocks = slurp_section(fileContents, charIndex, false);
                assert(identifierBlocks.length == 1);
                string identifier = identifierBlocks[0].content;
                SBlock[] sectionContents;
                if(fileContents[charIndex..charIndex + 3] == "@@>=") {
                    charIndex += 3;
                    sectionContents = slurp_section(fileContents, charIndex, true);
                } else {
                    writefln("Identifier '%s' invoked outside program and not a definition.", identifier);
                }
                fileSections ~= SSection(identifier, true, sectionContents);
@>

Display blocks with titles.
@p
            } else if(chNext == '*') {
                writefln("title section");
                int titleEndingPeriod = countFromPosUntil(fileContents, charIndex, ".");
                string title = "";
                if(titleEndingPeriod > 0) {
                    title = fileContents[charIndex..titleEndingPeriod];
                    charIndex = titleEndingPeriod + 1;
                }
                fileSections ~= SSection(title, false, slurp_section(fileContents, charIndex, false));
@>
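The title rule (everything up to the first period) can be tried on its own. This is a minimal sketch, assuming `pos` points just past the `@*` marker; `takeTitle` is a hypothetical stand-in and does not reproduce `countFromPosUntil`, which is not shown here.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical stand-in for the title step: returns the text between `pos`
// and the next period, and advances `pos` past that period.
// Returns "" and leaves `pos` alone when there is no period.
string takeTitle(string contents, ref size_t pos) {
    ptrdiff_t relative = contents[pos .. $].indexOf('.');
    if (relative < 0) {
        return "";
    }
    size_t period = pos + relative;
    string title = contents[pos .. period];
    pos = period + 1;
    return title;
}

void main() {
    string contents = "@*Advanced Parsing. And now, let's rebuild the parser.";
    size_t pos = 2; // just past "@*"
    writeln("title: ", takeTitle(contents, pos));
    writeln("rest:", contents[pos .. $]);
}
```

One consequence of this rule is that a title containing a period of its own, such as a version number, would be cut short at that period.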

Unknown blocks get turned into display.
@p
            } else {
                writefln("paragraph section");
                // '@@ ' will be converted into a section.
                fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false));
            }
@>

Raw text without a starting @@ isn't legal WEB, but I default it to display text.
@p
        } else {
            writefln("raw display section");
            fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false));
        }
    }
    int lineNumber = 0; //TODO:// Move this up for debugging.
@>

After parsing out the sections, format them for output.

Because this file is HTML with the occasional WEB command, and thus far I've been using @@p as the "any code" marker, @@p gets a header inserted after every snippet. Add a hack to remove duplicates.
@p
    bool isSectionNamedMain(SSection section) {
        return section.name == "__main__";
    }
    int numMainSections = count!isSectionNamedMain(fileSections);
    bool hideMainHeaders = numMainSections > 1;
    writefln("There are %d main sections.", numMainSections);
@>

Start formatting.
@p
    foreach(SSection section; fileSections) {
@>

Display sections get stuck into the display output.
@p
        if(!section.name.empty) {
            // A hack for this version.
            if(hideMainHeaders && section.name == "__main__") {
            } else {
                outputDisplayContents ~= "

" ~ section.name ~"

" ~ "

";
            }
        }
        if(section.isCode == false) {
            string content = reduce!("a ~ b.content")("", section.contents);
            outputDisplayContents ~= content;
            lineNumber += count(content, '\n');
        } else {
@>

Code sections get formatted into the display and stuck into the code.
@p
            outputDisplayContents ~= "

";
            
            foreach(block; section.contents) {
                string outputContent = formatCodeForDisplay(block.content, lineNumber);
                if(block.isIdentifier) {
                    outputDisplayContents ~= "" ~ outputContent ~ "";
                } else {
                    outputDisplayContents ~= outputContent;
                }
            }
            
            
            outputDisplayContents ~= "
";
        }
    }

    foreach(section; fileSections) {
        if(section.isCode && section.name == "__main__") {
            foreach(block; section.contents) {
                if(block.isIdentifier) {
                    outputCodeContents ~= expand_code_identifier(fileSections, block.content);
                } else {
                    outputCodeContents ~= block.content;
                }
            }
        }
    }
}
@>

@*The SectionReader. A section is terminated by the start of a new section, so scan for @@. This function will get more complicated soon.
@p
SBlock[] slurp_section(string contents, ref int offset, bool recurse) {
    SBlock[] results;
    string currentBlock = "";
    int index = offset;
    for(; index < contents.length; index++) {
        if(contents[index] == '@@') {
            if(recurse && contents[index + 1] == '<') {
                results ~= SBlock(false, currentBlock);
                currentBlock = "";
                int preIdentifierIndex = index;
                index += 2;
                SBlock[] identifierBlocks = slurp_section(contents, index, false);
                assert(identifierBlocks.length == 1);
                string identifier = identifierBlocks[0].content;
                if(contents[index..$].startsWith("@@>=")) {
                    // The end of one block has bumped into the start of another. Roll back.
                    index = preIdentifierIndex;
                    break;
                }
                // Now that we're sure this is a reference to an identifier and not a definition of a new identifier, continue.
                results ~= SBlock(true, identifier);
                if(contents[index..$].startsWith("@@>")) {
                    index += 2;
                } else {
                    writefln("Identifier '%s' invoked without close tag. at %s", identifier, contents[index..min($, index + 10)]);
                    break;
                }
            } else if(contents[index + 1] != '@@') {
                break;
            } else {
                currentBlock ~= contents[index];
                // Skip the escaped at symbol.
                index++;
            }
        } else {
            currentBlock ~= contents[index];
        }
    }
    results ~= SBlock(false, currentBlock);
    offset = index;
    return results;
}
@>

@*Expand Code. Recursively expand an identifier into its definitions.
@p
string expand_code_identifier(SSection[] sections, string identifier) {
    string output;
    output ~= "/* from "~identifier~" */";
    bool identifiersMatch(SSection section) {
        return section.name == identifier;
    }
    SSection[] definition = find!identifiersMatch(sections);
    if(definition.empty) {
        writefln("Unable to find identifier '%s'.", identifier);
        return format("** %s is undefined **", identifier);
    }
    foreach(block; definition[0].contents) {
        // TODO:// Fix This: outputCodeContents ~= format("#line %d \"%s\"", lineNumber, inputFilename);
        if(block.isIdentifier) {
            output ~= expand_code_identifier(sections, block.content);
        } else {
            output ~= block.content;
        }
    }
    return output;
}
@>

@*Results.
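To see the recursive expansion in isolation before looking at the output, here is a minimal standalone sketch. It assumes a simplification that the tool does not make: definitions are kept in an associative array keyed by name instead of being searched for in an `SSection[]`. The `expand` function and the two sample definitions are made up for the example.

```d
import std.stdio : writeln;

struct SBlock {
    bool isIdentifier;
    string content;
}

// Hypothetical simplification: look definitions up by name in an
// associative array and splice them in recursively.
string expand(SBlock[][string] definitions, string identifier) {
    auto found = identifier in definitions;
    if (found is null) {
        return "** " ~ identifier ~ " is undefined **";
    }
    string output;
    foreach (block; *found) {
        // Identifier blocks are replaced by their own expansion;
        // everything else is copied through verbatim.
        output ~= block.isIdentifier
            ? expand(definitions, block.content)
            : block.content;
    }
    return output;
}

void main() {
    SBlock[][string] definitions = [
        "The main program": [
            SBlock(false, "void main() { "),
            SBlock(true, "Say hello"),
            SBlock(false, " }"),
        ],
        "Say hello": [SBlock(false, "writeln(\"hello\");")],
    ];
    writeln(expand(definitions, "The main program"));
}
```

Note that a section which refers to itself, directly or through another section, would recurse without end in this sketch; it does nothing to detect cycles.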

Results: Back to the test code

The webpage documentation:

The D source code:
