Prev:WEB 2 Top:WEB 0 Next:WEB 4
Jon Breuer - September 8, 2024.
Now I'd like to write parts of this program out of order. Or I want them in conceptual order instead in compiling order. This will require naming sections and then slurping multiple named sections into a group.
Also, now that I can clean up the code for the display html, the source is here (literate_programming_3_source.html) and this is the generated file. @*Existing code. @
I've cut out the file processor and moved it into a new function we'll write after the discussion. @p string outputDisplayContents = ""; string outputCodeContents = ""; parse_web_then_tangle_and_weave( outputDisplayContents, outputCodeContents, fileContents, inputFilename); @>
Existing code:
There were also two places I found where a comment without a newline, taking up the entire string would cause an infinite parse loop.
int lineLength = countUntil(scanner, "\n"); if(lineLength < 0) { lineLength = scanner.length; }@*Next Steps. I've been re-reading (http://www.literateprogramming.com/lpsimp.pdf) and that's a modified LP system called NoWeb. I reread (http://www.literateprogramming.com/knuthweb.pdf) and noticed differences between what I'm attempting and what Knuth accomplished.
I have been treating *.WEB files as TeX files with added WEB markup, that Knuth is writing primarily in TeX with the examples being pulled. That's what my files are - HTML with code insertions. This isn't quite accurate. Every section of a *.WEB file is a WEB command, and the display sections support embedded display language constructs.
Codes I don't support.
WEB also supports a fair amount of recursive embedding which I don't yet.
My work thus far has been taking WEB0 and using it to compile both literate_programming_0.html and literate_programming_1.html, then continuing that process with WEB1. I think I need to create a side example file that has the features I want WEB3 to support.
Snippet from examples.
@p // Start my program.
@c
@<Header files to include@>@/
@<Global variables@>@/
@<Functions@>@/
@<The main program@>
@ Any other section to break the first definitions.
I don't know what @/ means.
@@<The main...@@>=
main (argc,argv)
int argc; /* the number of arguments on the \UNIX/ command line */
char **argv; /* the arguments themselves, an array of strings */
{
@@<Variables local to |main|@@>@@;
prog_name=argv[0];
@@<Set up option selection@@>;
writefln("asdf");
@@<Process all the files@@>;
writefln("sdfg");
@@<Print the grand totals if there were multiple files @@>;
exit(status);
}
@ Inferences: @ means insert the semicolon into C/Pascal, but don't render it in the display file.
I think I can treat @<blah@> as a reference to an identifier and @<blah@>=content@ as the definition. My tool thus far has been a glorified search-and-replace, but nested out of order identifiers will require more careful work.
My test snippet: (I'll copy and paste this out.) literate_programming_3_test.web Skip to the results @*Advanced Parsing. And now, let's rebuild the parser. We're going to need to track the different sections that we're assembling. * A section is either a display section or a code section. A display section should have 1 block of text. A code section might have many. @p struct SSection { string name; bool isCode; SBlock[] contents; }; struct SBlock { bool isIdentifier; string content; }; @>
This is the parsing function. @p void parse_web_then_tangle_and_weave(ref string outputDisplayContents, ref string outputCodeContents, string fileContents, string inputFilename) { SSection[] fileSections; @>
We will scan through every character, @@ control characters determine which section a block of text becomes. @p int charIndex = 0; while(charIndex < fileContents.length) { dchar ch = fileContents[charIndex]; if(ch == '@@') { dchar chNext = charIndex < fileContents.length - 1 ? fileContents[charIndex + 1] : 0; @>
Every @@ character is a pair and we'll skip both of them every time. No extra work is needed for an escaped @@@@ sequence. @p charIndex += 2; if(chNext == '@@') { @>
Program blocks get turned into code. @p } else if(chNext == 'p') { fileSections ~= SSection("__main__", true, slurp_section(fileContents, charIndex, true)); @>
Close blocks don't need extra processing. @p } else if(chNext == '>') { @>
Definition blocks. @p } else if(chNext == '<') { SBlock[] identifierBlocks = slurp_section(fileContents, charIndex, false); assert(identifierBlocks.length == 1); string identifier = identifierBlocks[0].content; SBlock[] sectionContents; if(fileContents[charIndex..charIndex + 3] == "@@>=") { charIndex += 3; sectionContents = slurp_section(fileContents, charIndex, true); } else { writefln("Identifier '%s' invoked outside program and not a definition.", identifier); } fileSections ~= SSection(identifier, true, sectionContents); @>
Display blocks with titles. @p } else if(chNext == '*') { writefln("title section"); int titleEndingPeriod = countFromPosUntil(fileContents, charIndex, "."); string title = ""; if(titleEndingPeriod > 0) { title = fileContents[charIndex..titleEndingPeriod]; charIndex = titleEndingPeriod + 1; } fileSections ~= SSection(title, false, slurp_section(fileContents, charIndex, false)); @>
Unknown blocks get turned into display. @p } else { writefln("paragraph section"); // '@@ ' will be converted into a section. fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false)); } @>
Raw text without a starting @@ isn't legal WEB, but I default it to display text. @p } else { writefln("raw display section"); fileSections ~= SSection("", false, slurp_section(fileContents, charIndex, false)); } } int lineNumber = 0; //TODO:// Move this up for debugging. @>
After parsing out the sections, format them for output.
Because this file is HTML with the occasional bit of WEB commands and thus far I've been using @@p as the "any code" marker, @@p gets a header inserted after every snippet. Add a hack to remove duplicates. @p bool isSectionNamedMain(SSection section) { return section.name == "__main__"; } int numMainSections = count!isSectionNamedMain(fileSections); bool hideMainHeaders = numMainSections > 1; writefln("There are %d main sections.", numMainSections); @>
Start formatting. @p foreach(SSection section; fileSections) { @>
Display sections get stuck into the display output. @p if(!section.name.empty) { // A hack for this version. if(hideMainHeaders && section.name == "__main__") { } else { outputDisplayContents ~= "
"; } } if(section.isCode == false) { string content = reduce!("a ~ b.content")("", section.contents); outputDisplayContents ~= content; lineNumber += count(content, '\n'); } else { @>
Code sections get formatted into the display and stuck into the code. @p outputDisplayContents ~= "
";
foreach(block; section.contents) {
string outputContent = formatCodeForDisplay(block.content, lineNumber);
if(block.isIdentifier) {
outputDisplayContents ~= "" ~ outputContent ~ "";
} else {
outputDisplayContents ~= outputContent;
}
}
outputDisplayContents ~= "";
}
}
foreach(section; fileSections) {
if(section.isCode && section.name == "__main__") {
foreach(block; section.contents) {
if(block.isIdentifier) {
outputCodeContents ~= expand_code_identifier(fileSections, block.content);
} else {
outputCodeContents ~= block.content;
}
}
}
}
}
@>
@*The SectionReader.A section is terminated by the start of a new section, so scan for @@. This function will get more complicated soon.
@p
SBlock[] slurp_section(string contents, ref int offset, bool recurse)
{
SBlock[] results;
string currentBlock = "";
int index = offset;
for(; index < contents.length; index++) {
if(contents[index] == '@@') {
if(recurse && contents[index + 1] == '<') {
results ~= SBlock(false, currentBlock);
currentBlock = "";
int preIdentifierIndex = index;
index += 2;
SBlock[] identifierBlocks = slurp_section(contents, index, false);
assert(identifierBlocks.length == 1);
string identifier = identifierBlocks[0].content;
if(contents[index..$].startsWith("@@>=")) {
// The end of one block has bumped into the start of another. Roll back.
index = preIdentifierIndex;
break;
}
// Now that we're sure this is a reference to an identifier and not a definition of a new identifier, continue.
results ~= SBlock(true, identifier);
if(contents[index..$].startsWith("@@>")) {
index += 2;
} else {
writefln("Identifier '%s' invoked without close tag. at %s", identifier, contents[index..min($, index + 10)]);
break;
}
} else if(contents[index + 1] != '@@') {
break;
} else {
currentBlock ~= contents[index];
// Skip the escaped at symbol.
index++;
}
} else {
currentBlock ~= contents[index];
}
}
results ~= SBlock(false, currentBlock);
offset = index;
return results;
}
@>
@*Expand Code. Recursively expand an identifier into its definitions.
@p
string expand_code_identifier(SSection[] sections, string identifier)
{
string output;
output ~= "/* from "~identifier~" */";
bool identifiersMatch(SSection section) {
return section.name == identifier;
}
SSection[] definition = find!identifiersMatch(sections);
if(definition.empty) {
writefln("Unable to find identifier '%s'.", identifier);
return format("** %s is undefined **", identifier);
}
foreach(block; definition[0].contents) {
// TODO:// Fix This: outputCodeContents ~= format("#line %d \"%s\"", lineNumber, inputFilename);
if(block.isIdentifier) {
output ~= expand_code_identifier(sections, block.content);
} else {
output ~= block.content;
}
}
return output;
}
@>
@*Results.
Results: Back to the test code
The webpage documentation:
The D source code: