Jon Breuer - September 8, 2024.
The original TANGLE and WEAVE programs generate both document and program from the source *.WEB file. Thus far I've been using the HTML as my source, but that leaves me formatting my <pre> sections manually. A good WEB program would format them for me. That will finally get rid of the ugly and redundant <pre>@p blocks in my code.
The HTML output from this file contains nested <pre> blocks so the code looks only half-way correct in this file and a different half correct in the generated file. I'll add a hack style here to fix part of it.
<style> pre pre { margin: -0.5em; } </style>
This is the header. It hasn't changed a lot from version to version.
@p
////////////
// WEB2.D
//
// This is a level 2 bootstrapping Literate Programming thing.
// It will start generating HTML from WEB files.
//
module web2;
@>
Normal includes. These will slowly grow and it would be convenient to call them out where they become useful.
@p
private import std.algorithm; // Needed for countUntil and searching
private import std.file;      // Needed for file input and output
private import std.stdio;     // Needed for error reporting and my debugging
private import std.string;    // These programs are all about string processing.
@>
This webpage would be so much prettier to read if comments and strings were syntax colored.
The normal countUntil operates from the start of the string, but I need a variant that can be moved progressively through the string. Converting to the "last half" slice of the array and back to "index within whole array" is just a bit cumbersome at the call site.
@p
ptrdiff_t countFromPosUntil(string haystack, ptrdiff_t startIndex, string needle) {
    ptrdiff_t offset = countUntil(haystack[startIndex..haystack.length], needle);
    if(offset < 0) {
        return offset;
    }
    return startIndex + offset;
}
@>
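As a rough illustration of why the helper is convenient, here is a minimal sketch that walks a string and collects every match position. `findAll` is a hypothetical helper that is not part of WEB2, and the sample input is ASCII only, since `countUntil` counts decoded characters rather than bytes.

```d
import std.algorithm : countUntil;

// Copy of the helper above so this sketch compiles on its own.
ptrdiff_t countFromPosUntil(string haystack, ptrdiff_t startIndex, string needle)
{
    ptrdiff_t offset = countUntil(haystack[startIndex .. haystack.length], needle);
    if (offset < 0)
    {
        return offset;
    }
    return startIndex + offset;
}

// Hypothetical helper: every index at which needle occurs in haystack.
ptrdiff_t[] findAll(string haystack, string needle)
{
    ptrdiff_t[] hits;
    for (ptrdiff_t pos = countFromPosUntil(haystack, 0, needle);
         pos != -1;
         pos = countFromPosUntil(haystack, pos + cast(ptrdiff_t) needle.length, needle))
    {
        hits ~= pos;
    }
    return hits;
}

unittest
{
    // The two tags start at index 1 and index 5.
    auto hits = findAll("a<p>b<p>c", "<p>");
    assert(hits.length == 2 && hits[0] == 1 && hits[1] == 5);
}
```

The call site only ever deals with indexes into the whole string, which is the point of wrapping the slice arithmetic.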
Start of program and basic error handling.
@p
void main(string[] args) {
    if(args.length != 4) {
        writefln("Usage: WEB1 inputFile outputHTMLFile outputCodeFile");
    }
    const string inputFilename = args[1];
    string fileContents = cast(string) std.file.read(inputFilename);
    if(fileContents.length == 0) {
        writefln("Unable to read file '%s'.", inputFilename);
        return;
    }
    // Generate these strings so they don't appear in the source.
    const string startTag = "@" ~ "p";
    const string endTag = "@" ~ ">";
@>
Now we are generating two files as we loop over the @@p blocks.
@p
    string outputCodeContents = "";
    string outputDisplayContents = "";
    ptrdiff_t blockEndIndex = 0;
    ptrdiff_t startDisplayIndex = 0;
    int lineNumber = 0;
    for( ptrdiff_t blockStartIndex = countUntil(fileContents, startTag);
         blockStartIndex != -1;
         blockStartIndex = countFromPosUntil(fileContents, blockEndIndex, startTag)) {
@>
Copy the interstitials into the display file. A future version will really need to start tangling the code blocks out of order. Not every bit needs to be explained.
@p outputDisplayContents ~= fileContents[startDisplayIndex .. blockStartIndex]; @>
We will also start counting newlines so we can insert the source line numbers into the intermediate code files.
@p lineNumber += count(fileContents[startDisplayIndex .. blockStartIndex], '\n'); @>
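For reference, a minimal sketch of turning a character index into a line number by counting newlines. `lineNumberAt` is a hypothetical helper, and the 1-based numbering is an assumption chosen for the sketch; WEB2 itself keeps a running total instead.

```d
import std.algorithm : count;

// Hypothetical helper: line number of the character at index,
// assuming the first line of the file is line 1.
size_t lineNumberAt(string text, size_t index)
{
    return 1 + count(text[0 .. index], '\n');
}

unittest
{
    assert(lineNumberAt("a\nb\nc", 0) == 1); // 'a' is on the first line
    assert(lineNumberAt("a\nb\nc", 2) == 2); // 'b' follows one newline
    assert(lineNumberAt("a\nb\nc", 4) == 3); // 'c' follows two newlines
}
```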
The new version of escaping @ symbols will remove the doubling up from the HTML.
@p
        if(blockStartIndex > 0 && fileContents[blockStartIndex - 1] == '@') {
            // Don't parse escaped at symbols or examples.
            blockEndIndex = blockStartIndex + 1;
            startDisplayIndex++;
            continue;
        }
@>
Extra robust close tag tracking.
@p
        string codeSourceContents = "";
        for(blockEndIndex = blockStartIndex + startTag.length;
            blockEndIndex < fileContents.length;
            blockEndIndex++) {
            if(fileContents[blockEndIndex] == '@' && blockEndIndex < fileContents.length - 1) {
                dchar nextChar = fileContents[blockEndIndex + 1] ;
                if(nextChar == '@') {
                    // Escaped @
                    blockEndIndex++;
                } else if(nextChar == '>') {
                    blockEndIndex++;
                    break;
                }
            }
            codeSourceContents ~= fileContents[blockEndIndex];
        }
        if(blockEndIndex >= fileContents.length) {
            writefln("Start tag without end tag found at location %d near line %d.", blockStartIndex, lineNumber);
            return;
        }
@>
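The close-tag scan can be looked at in isolation. The following is a sketch under assumptions rather than the code WEB2 runs: `Block` and `scanBlock` are hypothetical names, and the end tag is assembled from pieces so the literal tag never appears in this file.

```d
// Hypothetical result type for this sketch.
struct Block
{
    string contents;    // text before the end tag, with doubled @ collapsed to one
    ptrdiff_t endIndex; // index of the '>' of the end tag, or -1 if there is none
}

// Scan source from bodyStart until an unescaped end tag is found.
Block scanBlock(string source, size_t bodyStart)
{
    string contents;
    for (size_t i = bodyStart; i < source.length; i++)
    {
        if (source[i] == '@' && i + 1 < source.length)
        {
            if (source[i + 1] == '@')
            {
                i++; // escaped @: skip the first one and keep the second
            }
            else if (source[i + 1] == '>')
            {
                return Block(contents, cast(ptrdiff_t)(i + 1));
            }
        }
        contents ~= source[i];
    }
    return Block(contents, -1);
}

unittest
{
    immutable string endTag = "@" ~ ">";

    auto plain = scanBlock("x = 1; " ~ endTag ~ " tail", 0);
    assert(plain.contents == "x = 1; " && plain.endIndex == 8);

    auto escaped = scanBlock("a@@b" ~ endTag, 0);
    assert(escaped.contents == "a@b" && escaped.endIndex == 5);

    assert(scanBlock("abc", 0).endIndex == -1); // no end tag
}
```

Returning -1 for a missing end tag keeps the error report at the call site, where the line number is known.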
Copying to code works the same as before.
@p
        outputCodeContents ~= format("#line %d \"%s\"", lineNumber, inputFilename);
        outputCodeContents ~= codeSourceContents;
@>
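A `#line` directive tells the D compiler to report the following lines against a different file name and line number, so errors point back at the literate source instead of the generated code. A minimal sketch of building one is below. `lineDirective` is a hypothetical helper, and the trailing newline is an assumption of the sketch; the chunk above relies on the code block supplying its own line break.

```d
import std.format : format;

// Hypothetical helper: a #line directive on a line of its own.
string lineDirective(int lineNumber, string filename)
{
    return format("#line %d \"%s\"\n", lineNumber, filename);
}

unittest
{
    assert(lineDirective(12, "web2.html") == "#line 12 \"web2.html\"\n");
}
```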
But we need to start generating the prettified <pre> tags ourselves. Luckily, D lets me declare functions out of order, so I can insert future work here.
@p
        outputDisplayContents ~= "<"~"pre>";
        outputDisplayContents ~= formatCodeForDisplay(codeSourceContents, lineNumber);
        outputDisplayContents ~= "<"~"/pre>";
@>
Update the line number and the display index.
@p
        lineNumber += count(codeSourceContents, '\n');
        startDisplayIndex = blockEndIndex + endTag.length;
@>
End the parsing for loop.
@p } @>
Add the tail of the file to the display.
@p outputDisplayContents ~= fileContents[startDisplayIndex..fileContents.length]; @>
Write the final results.
@p
    string outputDisplayFilename = args[2];
    std.file.write(outputDisplayFilename, outputDisplayContents);
    string outputCodeFilename = args[3];
    std.file.write(outputCodeFilename, outputCodeContents);
@>
End the main function.
@p } @>
C:\literate> web1 literate_programming_2.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
Here is the result: literate_programming_2b.html
There are two bugs in this program. First, because this HTML source file has the <pre> tags already in place, the generated HTML has double-nested <pre> blocks. Second, my line counts are off by one, but I haven't looked for why yet.
Now I want to start color formatting the code.
@p
private import std.ascii; // Character type checks.

//TODO:// Escape HTML codes so they don't mis-render.
//TODO:// Trim line lengths to fit.
string formatCodeForDisplay(string source, int lineNumber) {
    string output = "";
    // Calling out future work...
    string scanner = escapeHTMLCharacters(source);
    scanner: while(!scanner.empty) {
@>
Each section of the scanner converts a different part of the code into colored blocks. First, comments. Match a // and scan until the end of the line.
@p
        if(scanner.startsWith("//")) {
            // Color comments.
            output ~= "<"~"span class=\"code_comment\">";
            int lineLength = countUntil(scanner, "\n");
            output ~= scanner[0..lineLength - 1];
            output ~= "<"~"/span>";
            scanner = scanner[lineLength..scanner.length];
@>
Next, strings. I handle both single and double quotes and then skip escaped characters and scan for end of string.
@p
        } else if(scanner.startsWith("\"") || scanner.startsWith("\'")) {
            // Color strings.
            char stringType = scanner[0];
            output ~= "<"~"span class=\"code_string\">";
            int stringLength = 1;
            while(stringLength < scanner.length && scanner[stringLength] != stringType) {
                if(scanner[stringLength] == '\\') {
                    stringLength += 1;
                }
                stringLength += 1;
            }
            if(stringLength >= scanner.length) {
                writefln("Unable to find close quote for string %s near line %d", scanner[0..min(scanner.length, 20)], lineNumber);
                break scanner;
            }
            output ~= scanner[0..stringLength + 1];
            output ~= "<"~"/span>";
            scanner = scanner[stringLength + 1..scanner.length];
@>
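The string rule can be tried on its own. This is a sketch under assumptions, not the code WEB2 runs: `literalLength` is a hypothetical helper that returns the length of a quoted literal at the start of the text, both quotes included, or -1 if the text does not start with a quote or the closing quote is missing.

```d
// Hypothetical helper: length of the string or character literal at the
// start of text, counting both quotes; -1 if there is no complete literal.
ptrdiff_t literalLength(string text)
{
    if (text.length == 0 || (text[0] != '"' && text[0] != '\''))
    {
        return -1;
    }
    immutable char quote = text[0];
    size_t i = 1;
    while (i < text.length && text[i] != quote)
    {
        if (text[i] == '\\')
        {
            i++; // skip the escaped character as well
        }
        i++;
    }
    return i < text.length ? cast(ptrdiff_t)(i + 1) : -1;
}

unittest
{
    assert(literalLength("\"a\\\"b\" rest") == 6); // the escaped quote does not end the literal
    assert(literalLength("'x'") == 3);
    assert(literalLength("\"open") == -1);         // no closing quote
    assert(literalLength("abc") == -1);            // not a literal at all
}
```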
Identifiers were a bit tricky. I have a list of known identifiers and I have to check that the block of text starts with an alpha character and continues. I made several mistakes here, scanning for whitespace instead of non-identifier, and allowing partial matches like fo and format instead of for.
I'm still breaking my HTML tags apart to keep the browser from mis-rendering this code.
@p
        } else {
            if(isAlpha(scanner[0])) {
                bool isNotIdentifier(dchar ch) {
                    return !(isAlpha(ch) || isDigit(ch) || ch == '_');
                }
                int wordLength = countUntil!isNotIdentifier(scanner);
                const string[] identifiers = [
                    "const", "bool", "break", "char", "dchar", "else",
                    "for", "if", "import", "int", "main", "module",
                    "private", "return", "string", "std", "void", "while",
                ];
                if(wordLength > 0 && !findAmong(identifiers, [scanner[0..wordLength]]).empty) {
                    // Special identifiers
                    output ~= "<"~"span class=\"code_identifier\">";
                    output ~= scanner[0..wordLength + 1];
                    output ~= "<"~"/span>";
                } else {
                    output ~= scanner[0..wordLength + 1];
                }
                scanner = scanner[wordLength + 1..scanner.length];
@>
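The partial-match mistake is easy to reproduce. Here is a minimal sketch of the whole-word approach: cut out the complete leading word first, then compare it against the list. `leadingWord` and `isKnownWord` are hypothetical names, and the word list is shortened for the example.

```d
import std.algorithm : canFind;
import std.ascii : isAlpha, isDigit;

// Hypothetical helper: the identifier at the start of text, or "" if none.
string leadingWord(string text)
{
    if (text.length == 0 || !isAlpha(text[0]))
    {
        return "";
    }
    size_t end = 1;
    while (end < text.length
        && (isAlpha(text[end]) || isDigit(text[end]) || text[end] == '_'))
    {
        end++;
    }
    return text[0 .. end];
}

// Hypothetical helper: true only when the whole word is in the list.
bool isKnownWord(string word)
{
    static immutable string[] known = ["for", "if", "int", "return", "string", "while"];
    return known.canFind(word);
}

unittest
{
    assert(leadingWord("for(int i = 0;") == "for");
    assert(isKnownWord(leadingWord("for(int i = 0;")));
    assert(leadingWord("format(x)") == "format");
    assert(!isKnownWord(leadingWord("format(x)"))); // starts with "for" but is not "for"
    assert(!isKnownWord("fo"));                     // a prefix of "for" is not a match
}
```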
Unknown character. Advance and test again.
@p
            } else {
                output ~= scanner[0];
                scanner = scanner[1..scanner.length];
            }
        }
    }
    return output;
}
@>
And the result is now something like this.
void main(string[] args) {
    if(args.length != 4) {
        writefln("Usage: WEB1 inputFile outputHTMLFile outputCodeFile");
    }
    const string inputFilename = args[1];
    string fileContents = cast(string) std.file.read(inputFilename);
    if(fileContents.length == 0) {
        writefln("Unable to read file '%s'.", inputFilename);
        return;
    }
    // Generate these strings so they don't appear in the source.
    const string startTag = "@" ~ "p";
    const string endTag = "@" ~ ">";
I keep having to break out <pre> and <span> tags into parts so the browser doesn't choke on them. Proper text escaping will fix that.
@p
string escapeHTMLCharacters(string source) {
    string output;
    string scanner = source;
    foreach(dchar ch; source) {
        if(countUntil("<>&", ch) >= 0) {
            if(ch == '<') {
                output ~= "&lt;";
            } else if(ch == '>') {
                output ~= "&gt;";
            } else if(ch == '&') {
                output ~= "&amp;";
            } else {
                writefln("BUG: Only partly implemented support for '%s'.", ch);
            }
        } else {
            output ~= ch;
        }
    }
    return output;
}
@>
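To show what the escaping does to a line of code, here is a minimal sketch of the same three replacements written with a switch. `escapeHtml` is a hypothetical stand-in for the function above, not a replacement for it.

```d
// Hypothetical stand-in: replace the three characters HTML treats specially.
string escapeHtml(string source)
{
    string output;
    foreach (dchar ch; source)
    {
        switch (ch)
        {
            case '<': output ~= "&lt;";  break;
            case '>': output ~= "&gt;";  break;
            case '&': output ~= "&amp;"; break;
            default:  output ~= ch;      break;
        }
    }
    return output;
}

unittest
{
    assert(escapeHtml("a < b && c > d") == "a &lt; b &amp;&amp; c &gt; d");
    assert(escapeHtml("plain text") == "plain text");
}
```

The escaping has to run before any span tags are added, otherwise the tags themselves would be escaped; that is why formatCodeForDisplay calls it first.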
Now I'm about to leave HTML behind as the source. WEB3 will work from *.WEB source and generate HTML and D code as the outputs.
I am occasionally creating bugs - mostly infinite loops in WEB2 which leave me with no working WEB2. I have to run WEB1 on the source first before I can get a working WEB2 again.
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
core.exception.ArraySliceError@literate_programming_2.html(312): slice [0 .. 4294967295] extends past source array of length 398
----------------
0x00419FEB
0x004019CC
0x004012E8
0x00426263
0x004261C3
0x00426032
0x0041B724
0x00401C13
0x75907BA9 in BaseThreadInitThunk
0x77D0C10B in RtlInitializeExceptionChain
0x77D0C08F in RtlClearBits
C:\literate> web1 literate_programming_2.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate>