Jon Breuer - September 8, 2024.
The original TANGLE and WEAVE programs generate both document and program from the source *.WEB file. Thus far I've been using the HTML as my source, but that leaves me formatting my <pre> sections manually. A good WEB program would format them for me. That will finally get rid of the ugly and redundant <pre>@p blocks in my code.
The HTML output from this file contains nested <pre> blocks so the code looks only half-way correct in this file and a different half correct in the generated file. I'll add a hack style here to fix part of it.
<style>
pre pre {
margin: -0.5em;
}
</style>
This is the header. It hasn't changed a lot from version to version.
@p
////////////
// WEB2.D
//
// This is a level 2 bootstrapping Literate Programming thing.
// It will start generating HTML from WEB files.
//
module web2;
@>
Normal includes. These will slowly grow and it would be convenient to call them out where they become useful.
@p
private import std.algorithm; // Needed for countUntil and searching
private import std.file; // Needed for file input and output
private import std.stdio; // Needed for error reporting and my debugging
private import std.string; // These programs are all about string processing.
@>
This webpage would be so much prettier to read if comments and strings were syntax colored.
The normal countUntil operates from the start of the string, but I need a variant that can be moved progressively through the string. Converting to the "last half" slice of the array and back to "index within whole array" is just a bit cumbersome at the call site.
@p
ptrdiff_t countFromPosUntil(string haystack, ptrdiff_t startIndex, string needle)
{
ptrdiff_t offset = countUntil(haystack[startIndex..haystack.length], needle);
if(offset < 0) {
return offset;
}
return startIndex + offset;
}
@>
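As a usage sketch, here is the helper stepping through a string to collect every match. The `findAll` wrapper and the sample input are hypothetical and exist only for this example; the input is plain ASCII, which keeps positions and byte indexes the same.

```d
import std.algorithm : countUntil;
import std.stdio : writeln;

// Same helper as in the text: search from startIndex and report the
// position relative to the whole haystack, or -1 if there is no match.
ptrdiff_t countFromPosUntil(string haystack, ptrdiff_t startIndex, string needle)
{
    ptrdiff_t offset = countUntil(haystack[startIndex .. $], needle);
    return offset < 0 ? offset : startIndex + offset;
}

// Hypothetical wrapper for this example only: collect every match position.
ptrdiff_t[] findAll(string haystack, string needle)
{
    ptrdiff_t[] positions;
    for (ptrdiff_t pos = countFromPosUntil(haystack, 0, needle);
         pos != -1;
         pos = countFromPosUntil(haystack, pos + needle.length, needle))
    {
        positions ~= pos;
    }
    return positions;
}

void main()
{
    writeln(findAll("a#b#c", "#")); // prints [1, 3]
}
```

The call site only ever deals with whole-string indexes, which is the point of wrapping the slice arithmetic.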
Start of program and basic error handling.
@p
void main(string[] args)
{
if(args.length != 4) {
writefln("Usage: WEB2 inputFile outputHTMLFile outputCodeFile");
// Without this return, args[1] below would be read out of range.
return;
}
const string inputFilename = args[1];
string fileContents = cast(string) std.file.read(inputFilename);
if(fileContents.length == 0) {
writefln("Unable to read file '%s'.", inputFilename);
return;
}
// Generate these strings so they don't appear in the source.
const string startTag = "@" ~ "p";
const string endTag = "@" ~ ">";
@>
Now we are generating two files as we loop over the @@p blocks.
@p
string outputCodeContents = "";
string outputDisplayContents = "";
ptrdiff_t blockEndIndex = 0;
ptrdiff_t startDisplayIndex = 0;
int lineNumber = 0;
for(
ptrdiff_t blockStartIndex = countUntil(fileContents, startTag);
blockStartIndex != -1;
blockStartIndex = countFromPosUntil(fileContents, blockEndIndex, startTag)) {
@>
Copy the interstitials into the display file. A future version will really need to start tangling the code blocks out of order. Not every bit needs to be explained.
@p
outputDisplayContents ~= fileContents[startDisplayIndex .. blockStartIndex];
@>
We will also start counting newlines so we can insert the source line numbers into the intermediate code files.
@p
lineNumber += count(fileContents[startDisplayIndex .. blockStartIndex], '\n');
@>
The new version of escaping @ symbols will remove the doubling up from the HTML.
@p
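if(blockStartIndex > 0 && fileContents[blockStartIndex - 1] == '@') {
// Don't parse escaped at symbols or examples.
// The first '@' is already in the display output; resume after the second one.
blockEndIndex = blockStartIndex + 1;
startDisplayIndex = blockStartIndex + 1;
continue;
}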
@>
Extra robust close tag tracking.
@p
string codeSourceContents = "";
for(blockEndIndex = blockStartIndex + startTag.length; blockEndIndex < fileContents.length; blockEndIndex++) {
if(fileContents[blockEndIndex] == '@' && blockEndIndex < fileContents.length - 1) {
dchar nextChar = fileContents[blockEndIndex + 1] ;
if(nextChar == '@') {
// Escaped @
blockEndIndex++;
} else if(nextChar == '>') {
blockEndIndex++;
break;
}
}
codeSourceContents ~= fileContents[blockEndIndex];
}
if(blockEndIndex >= fileContents.length) {
writefln("Start tag without end tag found at location %d near line %d.", blockStartIndex, lineNumber);
return;
}
@>
Copying to code works the same as before.
@p
outputCodeContents ~= format("#line %d \"%s\"", lineNumber, inputFilename);
outputCodeContents ~= codeSourceContents;
@>
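For reference, a minimal sketch of turning a character position into a line directive by counting newlines. It assumes 1-based line numbers, as compilers report them, and ends the directive with its own newline; `lineNumberAt`, `lineDirective`, and the file name are hypothetical names for this example, not part of WEB2.

```d
import std.algorithm : count;
import std.format : format;
import std.stdio : write;

// Hypothetical helper: 1-based line number of the character at index.
size_t lineNumberAt(string text, size_t index)
{
    return 1 + text[0 .. index].count('\n');
}

// Hypothetical helper: directive naming the line that starts at index.
string lineDirective(string text, size_t index, string filename)
{
    return format("#line %d \"%s\"\n", lineNumberAt(text, index), filename);
}

void main()
{
    // Index 6 is the first character of the second line.
    write(lineDirective("first\nsecond\nthird", 6, "example.web"));
}
```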
But we need to start generating the prettified <pre> tags ourselves. Luckily, D lets me declare functions out of order, so I can insert future work here.
@p
outputDisplayContents ~= "<"~"pre>";
outputDisplayContents ~= formatCodeForDisplay(codeSourceContents, lineNumber);
outputDisplayContents ~= "<"~"/pre>";
@>
Update the line number and the display index.
@p
lineNumber += count(codeSourceContents, '\n');
startDisplayIndex = blockEndIndex + endTag.length;
@>
End the parsing for loop.
@p
}
@>
Add the tail of the file to the display.
@p
outputDisplayContents ~= fileContents[startDisplayIndex..fileContents.length];
@>
Write the final results.
@p
string outputDisplayFilename = args[2];
std.file.write(outputDisplayFilename, outputDisplayContents);
string outputCodeFilename = args[3];
std.file.write(outputCodeFilename, outputCodeContents);
@>
End the main function.
@p
}
@>
C:\literate> web1 literate_programming_2.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
Here is the result: literate_programming_2b.html
There are two bugs in this program. First, because this HTML source file already has the <pre> tags in place, the generated HTML has double-nested <pre> blocks. Second, my line counts are off by one, but I haven't looked into why yet.
Now I want to start color formatting the code.
@p
private import std.ascii; // Character type checks.
//TODO:// Escape HTML codes so they don't mis-render.
//TODO:// Trim line lengths to fit.
string formatCodeForDisplay(string source, int lineNumber)
{
string output = "";
// Calling out future work...
string scanner = escapeHTMLCharacters(source);
scanner: while(!scanner.empty) {
@>
Each section of the scanner converts a different part of the code into colored blocks. First, comments: match a // and scan until the end of the line.
@p
if(scanner.startsWith("//")) {
// Color comments.
output ~= "<"~"span class=\"code_comment\">";
ptrdiff_t lineLength = countUntil(scanner, "\n");
if(lineLength < 0) {
// Comment on the last line with no trailing newline.
lineLength = scanner.length;
}
output ~= scanner[0..lineLength];
output ~= "<"~"/span>";
scanner = scanner[lineLength..scanner.length];
@>
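The end-of-line edge cases are easier to see in isolation. This is a sketch only: `lineCommentLength` is a hypothetical helper, and it assumes a comment may end at a newline, at a Windows CRLF pair, or at the end of the text with no newline at all.

```d
import std.algorithm : countUntil, startsWith;
import std.stdio : writeln;

// Hypothetical helper: length of the line comment at the start of text,
// not counting the line break. Returns 0 if text does not start with "//".
size_t lineCommentLength(string text)
{
    if (!text.startsWith("//"))
        return 0;
    ptrdiff_t end = countUntil(text, "\n");
    // countUntil returns -1 when the comment runs to the end of the text.
    size_t length = end < 0 ? text.length : end;
    // Leave a carriage return outside the comment as well.
    if (length > 0 && text[length - 1] == '\r')
        length--;
    return length;
}

void main()
{
    writeln(lineCommentLength("// hi\nint x;")); // prints 5
    writeln(lineCommentLength("// end"));        // prints 6
}
```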
Next, strings. I handle both single and double quotes and then skip escaped characters and scan for end of string.
@p
} else if(scanner.startsWith("\"") || scanner.startsWith("\'")) {
// Color strings.
char stringType = scanner[0];
output ~= "<"~"span class=\"code_string\">";
int stringLength = 1;
while(stringLength < scanner.length && scanner[stringLength] != stringType) {
if(scanner[stringLength] == '\\') {
stringLength += 1;
}
stringLength += 1;
}
if(stringLength >= scanner.length) {
writefln("Unable to find close quote for string %s near line %d", scanner[0..min(scanner.length, 20)], lineNumber);
break scanner;
}
output ~= scanner[0..stringLength + 1];
output ~= "<"~"/span>";
scanner = scanner[stringLength + 1..scanner.length];
@>
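The escape-skipping scan can be tried on its own. A minimal sketch, assuming the same rules as the text (single or double quotes, a backslash escapes the next character); `quotedLength` is a hypothetical name used only here.

```d
import std.stdio : writeln;

// Hypothetical helper: length of the quoted literal at the start of text,
// including both quotes, or 0 if there is no literal or no closing quote.
size_t quotedLength(string text)
{
    if (text.length == 0 || (text[0] != '"' && text[0] != '\''))
        return 0;
    immutable char quote = text[0];
    for (size_t i = 1; i < text.length; i++)
    {
        if (text[i] == '\\')
            i++; // skip the escaped character
        else if (text[i] == quote)
            return i + 1;
    }
    return 0;
}

void main()
{
    // The literal is "a\"b", six characters including both quotes.
    writeln(quotedLength("\"a\\\"b\" rest")); // prints 6
}
```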
Identifiers were a bit tricky. I have a list of known identifiers, and I have to check that the block of text starts with an alpha character and continues with identifier characters. I made several mistakes here: scanning for whitespace instead of for the first non-identifier character, and allowing partial matches such as fo and format to count as for.
I'm still breaking my HTML tags apart to keep the browser from mis-rendering this code.
@p
} else {
if(isAlpha(scanner[0])) {
bool isNotIdentifier(dchar ch) { return !(isAlpha(ch) || isDigit(ch) || ch == '_'); }
ptrdiff_t wordLength = countUntil!isNotIdentifier(scanner);
if(wordLength < 0) {
// The word runs to the end of the source.
wordLength = scanner.length;
}
const string[] identifiers = [ "const", "bool", "break", "char", "dchar", "else",
"for", "if", "import", "int", "main", "module", "private", "return", "string",
"std", "void", "while", ];
if(wordLength > 0 && !findAmong(identifiers, [scanner[0..wordLength]]).empty) {
// Special identifiers
output ~= "<"~"span class=\"code_identifier\">";
output ~= scanner[0..wordLength];
output ~= "<"~"/span>";
} else {
output ~= scanner[0..wordLength];
}
// Resume at the first character after the word so it gets scanned too.
scanner = scanner[wordLength..scanner.length];
@>
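The partial-match pitfall comes down to comparing the whole leading word against the list, never a prefix of it. A small sketch with an assumed, shortened keyword list; `identifierLength` and `startsWithKeyword` are hypothetical helpers for this example.

```d
import std.algorithm : canFind, countUntil;
import std.ascii : isAlpha, isDigit;
import std.stdio : writeln;

// Assumed, shortened keyword list for this example only.
immutable string[] keywords = ["for", "if", "int", "return"];

// Length of the identifier at the start of text, or 0 if there is none.
size_t identifierLength(string text)
{
    if (text.length == 0 || !isAlpha(text[0]))
        return 0;
    ptrdiff_t end = countUntil!(ch => !(isAlpha(ch) || isDigit(ch) || ch == '_'))(text);
    // countUntil returns -1 when the identifier runs to the end of the text.
    return end < 0 ? text.length : end;
}

// True only when the whole leading identifier is a keyword.
bool startsWithKeyword(string text)
{
    return keywords.canFind(text[0 .. identifierLength(text)]);
}

void main()
{
    writeln(startsWithKeyword("for(;;)"));    // prints true
    writeln(startsWithKeyword("format(x)"));  // prints false
}
```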
Unknown character. Advance and test again.
@p
} else {
output ~= scanner[0];
scanner = scanner[1..scanner.length];
}
}
}
return output;
}
@>
And the result is now something like this.
void main(string[] args)
{
if(args.length != 4) {
writefln("Usage: WEB1 inputFile outputHTMLFile outputCodeFile");
}
const string inputFilename = args[1];
string fileContents = cast(string) std.file.read(inputFilename);
if(fileContents.length == 0) {
writefln("Unable to read file '%s'.", inputFilename);
return;
}
// Generate these strings so they don't appear in the source.
const string startTag = "@" ~ "p";
const string endTag = "@" ~ ">";
I keep having to break out <pre> and <span> tags into parts so the browser doesn't choke on them. Proper text escaping will fix that.
@p
string escapeHTMLCharacters(string source)
{
string output;
string scanner = source;
foreach(dchar ch; source) {
if(countUntil("<>&", ch) >= 0) {
if(ch == '<') {
output ~= "&lt;";
} else if(ch == '>') {
output ~= "&gt;";
} else if(ch == '&') {
output ~= "&amp;";
} else {
writefln("BUG: Only partly implemented support for '%s'.", ch);
}
} else {
output ~= ch;
}
}
return output;
}
@>
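To see what escaping buys, here is a compact sketch of the same mapping written as a switch, followed by a literal tag being wrapped without any splitting. `escapeForHtml` is a hypothetical name for this example; the three entities are the standard HTML ones.

```d
import std.stdio : writeln;

// Hypothetical variant for this example: map the three special characters
// to their HTML entities and pass everything else through unchanged.
string escapeForHtml(string source)
{
    string output;
    foreach (char ch; source)
    {
        switch (ch)
        {
            case '<': output ~= "&lt;"; break;
            case '>': output ~= "&gt;"; break;
            case '&': output ~= "&amp;"; break;
            default:  output ~= ch; break;
        }
    }
    return output;
}

void main()
{
    string code = "if(a < b && c > d) {}";
    // Once the code is escaped, the wrapper tags can be written whole.
    writeln("<pre>" ~ escapeForHtml(code) ~ "</pre>");
}
```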
Now I'm about to leave HTML behind as the source. WEB3 will work from *.WEB source and generate HTML and D code as the outputs.
I am occasionally creating bugs - mostly infinite loops in WEB2 which leave me with no working WEB2. I have to run WEB1 on the source first before I can get a working WEB2 again.
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
core.exception.ArraySliceError@literate_programming_2.html(312): slice [0 .. 4294967295] extends past source array of length 398
----------------
0x00419FEB
0x004019CC
0x004012E8
0x00426263
0x004261C3
0x00426032
0x0041B724
0x00401C13
0x75907BA9 in BaseThreadInitThunk
0x77D0C10B in RtlInitializeExceptionChain
0x77D0C08F in RtlClearBits
C:\literate> web1 literate_programming_2.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate> dmd web2
C:\literate> web2 literate_programming_2.html literate_programming_2b.html web2.d
C:\literate>