Jon Breuer - September 8, 2024.
Now I'd like to write parts of this program out of order, or rather, in conceptual order instead of compiling order. This will require naming sections and then slurping multiple named sections into a group.
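A minimal sketch of the tangling side of that idea, in D like the rest of the program. It assumes each section carries a name and a list of literal/reference blocks (mirroring the SSection and SBlock structs the parser builds). `expandAll` is a hypothetical name, not the program's function: it appends every code section that shares a name, in file order, so one name can be defined in pieces and still come out as a single block.

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : canFind;
import std.format : format;

// Hypothetical mirrors of the parser's SBlock and SSection structs.
struct SBlock
{
    bool isIdentifier; // true: content names another section to expand
    string content;
}

struct SSection
{
    string name;
    bool isCode;
    SBlock[] contents;
}

// Expands `identifier` by appending every code section with that name,
// in file order, recursing into references. `stack` holds the names being
// expanded, so a cycle produces a marker instead of infinite recursion.
string expandAll(SSection[] sections, string identifier, string[] stack = null)
{
    if (stack.canFind(identifier))
        return format("/* ** cycle through %s ** */", identifier);
    string output;
    bool found = false;
    foreach (section; sections.filter!(s => s.isCode && s.name == identifier))
    {
        found = true;
        foreach (block; section.contents)
            output ~= block.isIdentifier
                ? expandAll(sections, block.content, stack ~ identifier)
                : block.content;
    }
    return found ? output : format("/* ** %s is undefined ** */", identifier);
}
```

Grouping by name is what makes conceptual order possible: a section such as @<Functions@> can be extended next to each discussion and still tangle into one place. The cycle check is an extra safeguard in this sketch, not something the program does.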
Also, now that I can clean up the code for the display HTML, the source is here (literate_programming_3_source.html) and this is the generated file.
@*Existing code. @
I've cut out the file processor and moved it into a new function that we'll write after the discussion.
@p
string outputDisplayContents = "";
string outputCodeContents = "";
parse_web_then_tangle_and_weave(outputDisplayContents, outputCodeContents, fileContents, inputFilename);
@>
Existing code:
I also found two places where a comment with no newline, running to the end of the string, would cause an infinite parse loop.
// countUntil returns -1 when no newline remains; treat that as "the rest of the string".
ptrdiff_t lineLength = countUntil(scanner, "\n");
if(lineLength < 0) { lineLength = scanner.length; }
@*Next Steps. I've been rereading (http://www.literateprogramming.com/lpsimp.pdf), which describes NoWeb, a modified LP system. I also reread (http://www.literateprogramming.com/knuthweb.pdf) and noticed differences between what I'm attempting and what Knuth accomplished.
I have been treating *.WEB files as TeX files with added WEB markup, as if Knuth were writing primarily in TeX and pulling the code examples out of it. That's what my files are: HTML with code insertions. But that isn't quite accurate for WEB. Every section of a *.WEB file is a WEB command, and the display sections support embedded display-language constructs.
Codes I don't support.
WEB also supports a fair amount of recursive embedding, which I don't support yet.
My work so far has been taking WEB0 and using it to compile both literate_programming_0.html and literate_programming_1.html, then continuing that process with WEB1. I think I need to create a separate example file that has the features I want WEB3 to support.
Snippet from examples.
@p
// Start my program.
@c
@<Header files to include@>@/
@<Global variables@>@/
@<Functions@>@/
@<The main program@>
@ Any other section to break the first definitions.
I don't know what @/ means.
@@<The main...@@>=
main (argc,argv)
    int argc; /* the number of arguments on the \UNIX/ command line */
    char **argv; /* the arguments themselves, an array of strings */
{
    @@<Variables local to |main|@@>@@;
    prog_name=argv[0];
    @@<Set up option selection@@>;
    writefln("asdf");
    @@<Process all the files@@>;
    writefln("sdfg");
    @@<Print the grand totals if there were multiple files @@>;
    exit(status);
}
@ Inferences: @; means insert the semicolon into C/Pascal, but don't render it in the display file. (The CWEB manual says otherwise about the first half: @; is an invisible semicolon that only guides CWEAVE's formatting, and CTANGLE emits nothing for it, so it never reaches the C code.)
I think I can treat @<blah@> as a reference to an identifier and @<blah@>=content@ as the definition. My tool so far has been a glorified search-and-replace, but nested, out-of-order identifiers will require more careful work.
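A minimal sketch of that distinction, assuming the text handed in starts at an @< tag. `readIdentifier` and `IdentifierTag` are hypothetical names. The only difference between a reference and a definition is whether = immediately follows the closing @>.

```d
import std.algorithm.searching : findSplit, startsWith;

// Hypothetical result type: the name between @< and @>, plus whether the
// tag opens a definition ("@<name@>=") rather than referencing one.
struct IdentifierTag
{
    string name;
    bool isDefinition;
}

// Assumes `text` begins with "@<". An unterminated tag yields the rest of
// the text as the name and is treated as a reference.
IdentifierTag readIdentifier(string text)
{
    assert(text.startsWith("@<"), "expected an @< tag");
    auto parts = text[2 .. $].findSplit("@>");
    // parts[0]: the name, parts[1]: "@>" if found, parts[2]: what follows it.
    return IdentifierTag(parts[0], parts[2].startsWith("="));
}
```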
My test snippet (I'll copy and paste this out): literate_programming_3_test.web
Skip to the results
@*Advanced Parsing. And now, let's rebuild the parser. We're going to need to track the different sections that we're assembling.
* A section is either a display section or a code section. A display section should have one block of text. A code section may have many: literal code interleaved with references to identifiers.
@p
struct SSection {
    string name;
    bool isCode;
    SBlock[] contents;
};
struct SBlock {
    bool isIdentifier;
    string content;
};
@>
This is the parsing function.
@p
void parse_web_then_tangle_and_weave(ref string outputDisplayContents, ref string outputCodeContents, string fileContents, string inputFilename) {
    SSection[] fileSections;
@>
We will scan through every character; @@ control characters determine which section a block of text becomes.
@p
    int charIndex = 0;
    while(charIndex < fileContents.length) {
        dchar ch = fileContents[charIndex];
        if(ch == '@@') {
            dchar chNext = charIndex < fileContents.length - 1 ? fileContents[charIndex + 1] : 0;
@>
Every @@ starts a two-character pair, and we skip both characters every time. No extra work is needed for an escaped @@@@ sequence.
@p
            charIndex += 2;
            if(chNext == '@@') {
@>
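To see why skipping both characters is enough, here is a minimal scanner (`controlCodes` is a hypothetical helper, not part of the program) that records the character after each at sign and always advances by two. The second at sign of an escaped pair is consumed together with the first, so it can never be misread as the start of a new control code.

```d
// Hypothetical helper: list the character that follows each '@'.
// Each '@' and its partner are consumed together, so in "@@x" the second
// '@' is never re-read as the start of a new "@x" control code.
// Indexing is by UTF-8 code unit, which is fine for ASCII control codes.
dstring controlCodes(string text)
{
    dchar[] codes;
    size_t i = 0;
    while (i < text.length)
    {
        if (text[i] == '@' && i + 1 < text.length)
        {
            codes ~= text[i + 1];
            i += 2; // skip both characters of the pair
        }
        else
        {
            i++;
        }
    }
    return codes.idup;
}
```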
Program blocks get turned into code.
@p
            } else if(chNext == 'p') {
                fileSections ~= SSection("__main__", true, slurp_section(fileContents, charIndex, true));
@>
Close blocks don't need extra processing.
@p
            } else if(chNext == '>') {
@>
Definition blocks.
@p
            } else if(chNext == '<') {
                SBlock[] identifierBlocks = slurp_section(fileContents, charIndex, false);
                assert(identifierBlocks.length == 1);
                string identifier = identifierBlocks[0].content;
                SBlock[] sectionContents;
                // startsWith avoids slicing past the end of the file when the tag is near the end.
                if(fileContents[charIndex..$].startsWith("@@>=")) {
                    charIndex += 3;
                    sectionContents = slurp_section(fileContents, charIndex, true);
                } else {
                    writefln("Identifier '%s' invoked outside program and not a definition.", identifier);
                }
                fileSections ~= SSection(identifier, true, sectionContents);
@>
Display blocks with titles.
@p
            } else if(chNext == '*') {
                writefln("title section");
                int titleEndingPeriod = countFromPosUntil(fileContents, charIndex, ".");
                string title = "";
                if(titleEndingPeriod > 0) {
                    title = fileContents[charIndex..titleEndingPeriod];
                    charIndex = titleEndingPeriod + 1;
                }
                fileSections ~= SSection(title, false, slurp_section(fileContents, charIndex, false));
@>
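A minimal sketch of the title rule, assuming the text handed over starts right after the title marker. `splitTitle` and `TitledText` are hypothetical names; the program itself works on absolute positions in the whole file via countFromPosUntil. The title runs up to the first period, and the period itself is dropped.

```d
import std.string : indexOf;

// Hypothetical pair of a section title and the remaining section text.
struct TitledText
{
    string title;
    string rest;
}

// Mirrors the parser's rule: the title ends at the first period. With no
// period, or a period in the very first position, there is no title.
TitledText splitTitle(string text)
{
    immutable ptrdiff_t period = text.indexOf('.');
    if (period <= 0)
        return TitledText("", text);
    return TitledText(text[0 .. period], text[period + 1 .. $]);
}
```

One consequence of this rule: a title that itself contains a period, such as a version number, gets cut short at that period.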
Unknown blocks get turned into display.
@p
            } else {
                writefln("paragraph section");
                // '@@ ' will be converted into a section.
                fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false));
            }
@>
Raw text without a starting @@ isn't legal WEB, but I default it to display text.
@p
        } else {
            writefln("raw display section");
            fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false));
        }
    }
    int lineNumber = 0; //TODO:// Move this up for debugging.
@>
After parsing out the sections, format them for output.
Because this file is HTML with occasional WEB commands, and so far I've been using @@p as the "any code" marker, every @@p snippet would get its own header. Add a hack to hide those duplicate headers when there is more than one main section.
@p
    bool isSectionNamedMain(SSection section) { return section.name == "__main__"; }
    // count returns size_t, which doesn't implicitly narrow to int on 64-bit targets.
    size_t numMainSections = count!isSectionNamedMain(fileSections);
    bool hideMainHeaders = numMainSections > 1;
    writefln("There are %d main sections.", numMainSections);
@>
Start formatting.
@p
    foreach(SSection section; fileSections) {
@>
Named sections get a header, and display sections get stuck into the display output.
@p
        if(!section.name.empty) {
            // A hack for this version.
            if(hideMainHeaders && section.name == "__main__") {
            } else {
                outputDisplayContents ~= "
";
            }
        }
        if(section.isCode == false) {
            string content = reduce!("a ~ b.content")("", section.contents);
            outputDisplayContents ~= content;
            lineNumber += count(content, '\n');
        } else {
@>
Code sections get formatted into the display output. The tangled code is assembled afterwards by expanding the identifier references in each __main__ section.
@p
            outputDisplayContents ~= "
";
            foreach(block; section.contents) {
                string outputContent = formatCodeForDisplay(block.content, lineNumber);
                if(block.isIdentifier) {
                    outputDisplayContents ~= "" ~ outputContent ~ "";
                } else {
                    outputDisplayContents ~= outputContent;
                }
            }
            outputDisplayContents ~= "";
        }
    }
    foreach(section; fileSections) {
        if(section.isCode && section.name == "__main__") {
            foreach(block; section.contents) {
                if(block.isIdentifier) {
                    outputCodeContents ~= expand_code_identifier(fileSections, block.content);
                } else {
                    outputCodeContents ~= block.content;
                }
            }
        }
    }
}
@>
@*The SectionReader. A section is terminated by the start of a new section, so scan for @@. This function will get more complicated soon.
@p
SBlock[] slurp_section(string contents, ref int offset, bool recurse) {
    SBlock[] results;
    string currentBlock = "";
    int index = offset;
    for(; index < contents.length; index++) {
        if(contents[index] == '@@') {
            // A trailing '@@' at the very end of the file ends the section instead of reading past the end.
            immutable char next = index + 1 < contents.length ? contents[index + 1] : '\0';
            if(recurse && next == '<') {
                results ~= SBlock(false, currentBlock);
                currentBlock = "";
                int preIdentifierIndex = index;
                index += 2;
                SBlock[] identifierBlocks = slurp_section(contents, index, false);
                assert(identifierBlocks.length == 1);
                string identifier = identifierBlocks[0].content;
                if(contents[index..$].startsWith("@@>=")) {
                    // The end of one block has bumped into the start of another. Roll back.
                    index = preIdentifierIndex;
                    break;
                }
                // Now that we're sure this is a reference to an identifier and not a definition of a new identifier, continue.
                results ~= SBlock(true, identifier);
                if(contents[index..$].startsWith("@@>")) {
                    // Step onto the '>'; the loop's index++ then moves past it.
                    // Adding 2 here would silently drop the character right after the close tag.
                    index += 1;
                } else {
                    writefln("Identifier '%s' invoked without close tag. at %s", identifier, contents[index..min($, index + 10)]);
                    break;
                }
            } else if(next != '@@') {
                break;
            } else {
                currentBlock ~= contents[index];
                // Skip the escaped at symbol.
                index++;
            }
        } else {
            currentBlock ~= contents[index];
        }
    }
    results ~= SBlock(false, currentBlock);
    offset = index;
    return results;
}
@>
@*Expand Code.
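expand_code_identifier walks the blocks that slurp_section produced, so it helps to see that split on its own. This is a simplified, hypothetical `splitCodeSection` (not the program's function): it handles identifier references and escaped at signs, but not the roll-back when a new definition starts. Because it advances its index by hand, it resumes exactly after each closing @> without a loop increment skipping a character.

```d
import std.algorithm.searching : startsWith;
import std.string : indexOf;

struct SBlock
{
    bool isIdentifier; // true: content is the name inside @<...@>
    string content;
}

// Hypothetical, simplified take on slurp_section for one code section:
// split it into literal text blocks and "@<name@>" reference blocks.
// "@@" becomes a literal '@'; any other control code ends the section.
SBlock[] splitCodeSection(string text)
{
    SBlock[] blocks;
    string current;
    size_t i = 0;
    while (i < text.length)
    {
        if (text[i] != '@')
        {
            current ~= text[i];
            i++;
        }
        else if (text[i .. $].startsWith("@@"))
        {
            current ~= '@'; // an escaped at sign is literal text
            i += 2;
        }
        else if (text[i .. $].startsWith("@<"))
        {
            immutable size_t nameStart = i + 2;
            immutable ptrdiff_t found = text[nameStart .. $].indexOf("@>");
            if (found < 0)
                break; // unterminated reference: stop scanning here
            immutable size_t nameEnd = nameStart + cast(size_t) found;
            blocks ~= SBlock(false, current);
            current = "";
            blocks ~= SBlock(true, text[nameStart .. nameEnd]);
            i = nameEnd + 2; // resume exactly after "@>"
        }
        else
        {
            break; // any other control code starts a new section
        }
    }
    blocks ~= SBlock(false, current);
    return blocks;
}
```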
Recursively expand an identifier into its definition. For now, only the first section with a matching name is used.
@p
string expand_code_identifier(SSection[] sections, string identifier) {
    string output;
    output ~= "/* from "~identifier~" */";
    bool identifiersMatch(SSection section) { return section.name == identifier; }
    SSection[] definition = find!identifiersMatch(sections);
    if(definition.empty) {
        writefln("Unable to find identifier '%s'.", identifier);
        return format("** %s is undefined **", identifier);
    }
    foreach(block; definition[0].contents) {
        // TODO:// Fix This: outputCodeContents ~= format("#line %d \"%s\"", lineNumber, inputFilename);
        if(block.isIdentifier) {
            output ~= expand_code_identifier(sections, block.content);
        } else {
            output ~= block.content;
        }
    }
    return output;
}
@>
@*Results.
Results: Back to the test code
The webpage documentation:
The D source code: