This is the second in a series of articles aimed at the professional programmer. In my last article I introduced "Model-Oriented Programming", or MOP, and touched on the subject of code generation.
In this article we will construct a simple modeling language that produces web sites. The goal is to generate the HTML for certain types of web site very rapidly, from a simple high-level description. You will learn how to write a simple code generator using the GSL scripting language.
You will need about one day to understand and start using these techniques in your own projects.
If you did not catch the first article in this series, you will want to read it before starting this one.
I'm going to propose a simple abstract model for a web site, as an example. When you understand this example, you'll have a much better idea of how we design new models, so that you can design your own.
To start with, I'll explain how I design a new model, and then I'll take you through the steps of building a code generator that brings it to life.
Our model lets us build simple web sites. A web site is a mixture of different types of document, for instance:
And so on. When we make a new model, it's worth asking the question, "how would I make a thousand of these?", that is, a thousand web sites. Well, we'd have lots of content, which would be different for each web site, possibly with some common parts. The content could definitely be based on standard templates - it's unlikely we'd make each of a thousand sites entirely from scratch.
If we used JavaScript menus, we'd presumably use the same code in each site, changing only the menu content to match the structure of the site.
Most likely we'd use a unique CSS stylesheet for each site, to give each site a unique look and feel, but they could also be based on a standard template.
Finally, the images and icons would be a mixture of standard graphics and customised graphics, depending on how pretty we want each site to look.
Our model is going to be the basis for code generation, that is, the mass production of as much of the above as is reasonable. To do this, we need to make a compact and efficient statement of exactly what is needed to produce each web site.
It's like constructing a thousand houses. It's expensive to design and build each house as a unique thing. It's much cheaper to make a single common plan, and then for each house, state the differences. So one house might have a different roof shape, while another has larger windows, but all houses share the same materials, wall and floor construction, and so on.
When we mass produce something, we're clearly aiming for low cost and consistent, and hopefully high, quality. It's the same with code generation. So, let's get to our web site model. What information do we actually need to specify?
The next step is to sketch a model that can hold this information in a useful way. Remember that we use XML as a modeling language. So, we invent an XML syntax for our model. For each page, I'd like to write something like this:
<page
    name = "name of page"
    title = "Title text goes here"
    subtitle = "Subtitle text goes here"
    >
<content>
Content HTML goes here
</content>
</page>
When I design new XML languages like the above, I use entity attributes to hold single-line properties, and child entities to hold multi-line properties or properties that can occur more than once. It just seems more elegant than putting properties in child entities, since this implies those properties can occur many times. It does not make sense for a page to have more than one name, title, subtitle, or image in our model, so we define these as attributes of the page entity. The iMatix MOP tools use this style very heavily.
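To make the attribute-versus-child convention concrete, here is a minimal sketch in Python rather than GSL, using the standard xml.etree.ElementTree module. The sample page text and the variable names are illustrative assumptions, not part of the iMatix tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical page definition written in the style proposed above.
PAGE_XML = """
<page name="index" title="Local Grocer" subtitle="Visit the Local Grocer">
  <content>
    <h3>Always open</h3>
    <p>Come back tomorrow.</p>
  </content>
</page>
""".strip()

page = ET.fromstring(PAGE_XML)

# Single-line, single-valued properties are read as attributes.
name = page.get("name")
title = page.get("title")
subtitle = page.get("subtitle")

# Multi-line or repeatable properties are read as child entities.
content_blocks = page.findall("content")

print(name, "|", title, "|", subtitle)
print(len(content_blocks), "content block(s)")
```

An attribute can appear only once on an element, so the XML syntax itself enforces "one name, one title, one subtitle per page", while child entities stay free to repeat.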
Once we've defined a set of pages, how do we tie these together into a web site? Let's use a second model for the overall web site:
<site copyright = "copyright statement goes here">
<section name = "name of section">
    <page name = "name of page" />
    ...
</section>
...
</site>
I've defined a <section> tag that breaks the pages into groups. Now let's jump right in and make ourselves a web site. There's no better way to test a model than to try using it. As an example, I'll make a new web site for my local grocer, who has decided, finally, to go on-line.
We'll make the web site as several XML files. This is a design choice. We could also make the site as a single large XML file. It's a trade-off between ease of use (a single file is easier in smaller cases) and scalability (it's not practical to edit a large site with hundreds of pages as a single file).
To start with, we'll define the overall site like this:
<?xml version = "1.0" ?>
<site
    copyright = "Copyright © Local Grocer"
    script = "sitegen_1.gsl"
    >
<section name = "Welcome">
    <page name = "index" />
</section>
<section name = "Products">
    <page name = "fruit" />
    <page name = "vegetables" />
</section>
</site>
Note the first line, which defines the file as XML, and the 'script' attribute, which tells GSL what script to run to process the data. We've defined three pages, so next we will write three more short XML files, one very simple version of each page. First the index page:
<page
    name = "index"
    title = "Local Grocer"
    subtitle = "Visit the Local Grocer"
    >
<content>
<h3>Close to you</h3>
<p>We're just around the corner, if you live near by.</p>
<h3>Always open</h3>
<p>And if we're closed, just come back tomorrow.</p>
<h3>Cheap and convenient</h3>
<p>Much cheaper and easier than growing your own vegetables and fruit.</p>
</content>
</page>
Next, the fruit page:
<page
    name = "fruit"
    title = "Our Fruit Stand"
    subtitle = "Luscious Tropical Fruits"
    >
<content>
<h3>Always fresh</h3>
<p>Just like it was plucked from the tree last month.</p>
<h3>Special deal</h3>
<p>Any five pieces of fruit, for the price of ten!</p>
<h3>Money back if not satisfied</h3>
<p>We'll give you your money back if we're not satisfied with it!</p>
</content>
</page>
and last the vegetable page:
<page
    name = "vegetables"
    title = "Our Vegetables"
    subtitle = "Healthy Organic Vegetables"
    >
<content>
<h3>100% organic vegetables</h3>
<p>All vegetables made from carbon, oxygen, and hydrogen molecules with trace elements.</p>
<h3>Country fresh style</h3>
<p>We don't know what that means, but it sounded nice!</p>
<h3>Unique take-away concept</h3>
<p>Now you can consume your vegetables in the comfort of your own home.</p>
</content>
</page>
Finally, here is the first draft of the web generation script. It does not generate any output yet. It simply loads the web site data into a single XML tree and saves that tree to a file called root.xml, so that we can see what live data the script is actually working with:
.# Since we run the script off the XML file, it starts in
.# template mode.
.template 0
for section
    for page
        # Load XML <page> data
        xml to section from "$(page.name).xml"
        # Delete old <page> tag
        delete page
    endfor
endfor
save root
.endtemplate
Let's look at what this script does. First, it switches off template mode so we can write ordinary GSL without starting each line with a dot. GSL starts scripts in template mode if they are launched from the XML file. It's useful in many cases but not here. So, we wrap the whole script in '.template 0' and '.endtemplate'.
Second, the script works through each section and page, and loads the XML data for that page. It does this using two commands, 'xml' and 'delete'. The first loads XML data from a file into the specified scope (<section>, in this case), and the second deletes the current page (since the loaded data also contains a <page> tag).
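For readers who want to see the same load-and-replace step outside GSL, here is a rough Python analogue. It is only a sketch: the page files are held in a dictionary (the hypothetical PAGE_FILES) instead of being read from disk, and load_site is an invented helper name.

```python
import xml.etree.ElementTree as ET

SITE_XML = """<site copyright="Copyright Local Grocer">
  <section name="Welcome"><page name="index"/></section>
  <section name="Products"><page name="fruit"/><page name="vegetables"/></section>
</site>"""

# Stand-in for index.xml, fruit.xml and vegetables.xml (assumption: kept
# in memory so that the sketch is self-contained).
PAGE_FILES = {
    "index.xml": '<page name="index" title="Local Grocer"><content><p>Welcome</p></content></page>',
    "fruit.xml": '<page name="fruit" title="Our Fruit Stand"><content><p>Fruit</p></content></page>',
    "vegetables.xml": '<page name="vegetables" title="Our Vegetables"><content><p>Veg</p></content></page>',
}

def load_site(site_xml, page_files):
    """Replace each <page> stub with the full page loaded from its file."""
    site = ET.fromstring(site_xml)
    for section in site.findall("section"):
        # Work on a snapshot of the stubs, since the section changes as we go.
        for stub in list(section.findall("page")):
            full_page = ET.fromstring(page_files[stub.get("name") + ".xml"])
            section.append(full_page)  # roughly: xml to section from "..."
            section.remove(stub)       # roughly: delete page
    return site

site = load_site(SITE_XML, PAGE_FILES)
# Roughly what 'save root' gives us: the whole merged tree as text.
print(ET.tostring(site, encoding="unicode"))
```

After the loop, every page in the tree carries its title and content, which is the data the later generation step works from.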
Finally, the script saves the whole XML tree to a file. If you want to try the next steps you must have installed GSL, as I described in the last article. Run the script like this:
gsl site
GSL looks for the file called 'site.xml'. When the script has run, take a look at root.xml. This shows you what we're going to work with to generate the real HTML.
When we generate output, we insert variable values into the generated text. This is very much like using shell variables.
GSL does automatic case conversion on output variables. This is very useful when we generate programming languages. For example, the $(name) form outputs a variable in lower case:
output "$(filename).c"
The $(NAME) form outputs the same value in uppercase:
#if defined ($(FILENAME)_INCLUDED)
And the $(Name) form outputs the variable in 'title' case, i.e. the first letter is capitalised:
################# $(Filename) #################
One side-effect of automatic case conversion is that we'll often get variables converted to lower case simply because we used the $(name) form. If we don't want a variable to be automatically case converted, we use this form: $(name:). This is also called the 'empty modifier'.
A second side-effect of automatic case conversion is that variable names are not case sensitive. By default GSL ignores the case of variable names so that $(me) and $(ME) refer to the same variable.
But putting empty modifiers in every variable expansion gets tiresome, and GSL lets us switch off automatic case conversion, using this instruction:
ignorecase = 0
This tells GSL, "variable names are case-sensitive, and do not convert variable values on output".
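The default case rules can be imitated in a few lines of Python, which may help if you want to experiment with them away from GSL. This is a simplified sketch, not GSL's implementation: the expand function is hypothetical, it handles only the four forms described above, and for the $(Name) form it lowercases everything after the first letter, which is an assumption.

```python
import re

def expand(text, variables):
    """Expand $(name) references using the default case rules described above."""
    # Variable names are matched without regard to case.
    lookup = {key.lower(): value for key, value in variables.items()}

    def substitute(match):
        ref, empty_modifier = match.group(1), match.group(2)
        value = lookup[ref.lower()]
        if empty_modifier:
            return value                      # $(name:) leaves the value as-is
        if ref.isupper():
            return value.upper()              # $(NAME)
        if ref.islower():
            return value.lower()              # $(name)
        if ref[0].isupper():
            return value[:1].upper() + value[1:].lower()  # $(Name)
        return value

    return re.sub(r"\$\((\w+)(:?)\)", substitute, text)

variables = {"filename": "MyFile"}
print(expand("$(filename).c", variables))
print(expand("#if defined ($(FILENAME)_INCLUDED)", variables))
print(expand("$(Filename)", variables))
print(expand("$(filename:)", variables))
```

The sketch does not model 'ignorecase = 0'; in that mode names would be looked up exactly as written and values would be emitted unchanged.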
In our first draft we loaded each page into the XML tree and deleted the original page definition. That was this text:
for section
    for page
        xml to section from "$(page.name).xml"
        delete page
    endfor
endfor
To generate output for each page, we iterate through the sections one more time. Because the first loop deletes the old <page> entities and loads new ones from the XML definitions, we need a second, separate loop over the sections and pages. This is the code that generates the output for each page:
for section
    for page
        include "template.gsl"
    endfor
endfor
The include command executes GSL code in another file. We're going to do all the hard work in a separate file, which I've called template.gsl, so that it's easy to change the HTML generation independently from the top-level GSL code. This is good practice for several reasons:
It's nice, in larger projects, that each big code generation task sits in its own file where it can be owned by a single person.
We can add more templates - to produce other types of output - for the same model very easily and safely.
And you'll see in later examples that we tend to write a single GSL file for each output we want to produce. In XNF - the tool we use for larger-scale code generation projects - these scripts are called "targets".
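The idea of several independent targets working from one model can be sketched in Python as follows. Everything here is hypothetical: the in-memory MODEL, the two target functions, and the output file names are illustrations only and have nothing to do with how XNF is implemented.

```python
# Hypothetical in-memory model: (section name, [(page name, page title), ...])
MODEL = [
    ("Welcome", [("index", "Local Grocer")]),
    ("Products", [("fruit", "Our Fruit Stand"), ("vegetables", "Our Vegetables")]),
]

def html_links_target(model):
    """One target: an HTML list of links to every page."""
    items = [
        '<li><a href="%s.html">%s</a></li>' % (name, title)
        for _section, pages in model
        for name, title in pages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"

def text_sitemap_target(model):
    """Another target: a plain-text outline of the same model."""
    lines = []
    for section, pages in model:
        lines.append(section)
        lines.extend("  - %s (%s.html)" % (title, name) for name, title in pages)
    return "\n".join(lines) + "\n"

# Each output file has its own target; adding one does not touch the others.
TARGETS = {"links.html": html_links_target, "sitemap.txt": text_sitemap_target}

def generate_all(model):
    """Run every target over the same model and collect the results."""
    return {filename: target(model) for filename, target in TARGETS.items()}

for filename, text in generate_all(MODEL).items():
    print("---", filename)
    print(text)
```

Keeping each target in its own function (or, in GSL, its own file) is what lets one person own it and change it without risk to the rest.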
The HTML template looks like this:
.template 1
.echo "Generating $(page.name) page..."
.output "$(page.name).html"
<!DOCTYPE...>
<html>
...
</html>
.endtemplate
Most of it is fairly straightforward, though you do need to understand how XHTML and CSS work (and I'm not going to explain that here).
The template starts by setting template mode on. This means that any GSL commands we want to use here must start with a dot. It makes the HTML very easy to read and to maintain.
Let's look at the chunk of code that produces the site index. This is - in our version of the web site generator - a menu that is embedded into each page. The CSS stylesheet can place this menu anywhere on the page. Here is the GSL code that generates it:
.for site.section
<h3 class="menu_heading">$(section.name)</h3>
<ul class="menu_item">
.   for page
<li><a class="menu_item" href="$(page.name).html">$(page.title)</a></li>
.   endfor
</ul>
.endfor
The interesting thing here is that we say 'for site.section' in order to iterate through the sections. The 'site.' prefix is a parent scope name: it tells GSL "look for all sections in the current site". If we left out the scope name, GSL would look for sections in the current scope (the page) and find nothing. This is a common beginner's error.
Note that the parent scope is not always needed. These two blocks do exactly the same thing:
.for site.section
.   for page
.   endfor
.endfor
and:
.for site.section
.   for section.page
.   endfor
.endfor
But the first form is simpler and I recommend you drop explicit parent scope names when you are "tunneling into" the XML data tree.
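The scoping point is easy to reproduce in other tools. Below is a small Python sketch of the same menu loop, using xml.etree.ElementTree on an already loaded site tree. The render_menu helper and the sample data are assumptions for illustration. The key detail is that the outer loop starts from the site, not from the current page.

```python
import xml.etree.ElementTree as ET
from html import escape

# A site tree as it looks after the pages have been loaded (sample data).
SITE_XML = """<site copyright="Copyright Local Grocer">
  <section name="Welcome">
    <page name="index" title="Local Grocer"/>
  </section>
  <section name="Products">
    <page name="fruit" title="Our Fruit Stand"/>
    <page name="vegetables" title="Our Vegetables"/>
  </section>
</site>"""

def render_menu(site):
    """Build the menu HTML by walking sections from the site downwards."""
    lines = []
    for section in site.findall("section"):      # like: .for site.section
        lines.append('<h3 class="menu_heading">%s</h3>' % escape(section.get("name")))
        lines.append('<ul class="menu_item">')
        for page in section.findall("page"):     # like: . for page
            lines.append(
                '<li><a class="menu_item" href="%s.html">%s</a></li>'
                % (page.get("name"), escape(page.get("title")))
            )
        lines.append("</ul>")
    return "\n".join(lines)

site = ET.fromstring(SITE_XML)
print(render_menu(site))

# The beginner's error: looking for sections below the current page.
current_page = site.find("section/page")
print(current_page.findall("section"))  # a page has no sections beneath it
```

Searching from the page finds nothing because sections are parents of pages, not children, which is the situation the 'site.' prefix resolves in GSL.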
Near the end of the template you see this construction:
.for content
$(content.string ())
.endfor
What is going on here? The answer is, we're grabbing the whole <content> block, including all the XML it contains, as a single string. Conveniently, XHTML is also XML, so we can read the XHTML content block as part of our XML data file. As a bonus, GSL will also validate it and tell you if there are errors, such as missing or malformed tags.
The scope string() function returns a string that holds the XML value of the specified entity. For the index page, it returns this value (as a single string):
<content><h3>Close to you</h3><p>We're just around the corner, if you live near by.</p><h3>Always open</h3><p>And if we're closed, just come back tomorrow.</p><h3>Cheap and convenient</h3><p>Much cheaper and easier than growing your own vegetables and fruit.</p></content>
When we enclose this in $( and ), it writes the string to the current output file. Thus we generate the body of the web page.
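As a point of comparison, the following Python sketch does something similar with xml.etree.ElementTree: it serialises the whole <content> entity back to a string and shows that a malformed block is rejected when parsed. The helper names content_as_string and is_well_formed are hypothetical, and the sample page is shortened.

```python
import xml.etree.ElementTree as ET

PAGE_XML = (
    '<page name="index">'
    "<content><h3>Always open</h3><p>Come back tomorrow.</p></content>"
    "</page>"
)

def content_as_string(page):
    """Return every <content> entity, tags included, as one string."""
    return "".join(
        ET.tostring(content, encoding="unicode")
        for content in page.findall("content")
    )

def is_well_formed(xml_text):
    """Report whether a block of XHTML parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

page = ET.fromstring(PAGE_XML)
print(content_as_string(page))
print(is_well_formed("<content><p>missing close tag</content>"))
```

Because the content is parsed as XML before it is written out, a missing or mismatched tag is caught at generation time instead of turning up later in a browser.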
In our first draft we read the XML data from several files and we constructed a single tree with all the data we needed to generate code. This two-pass approach is the way I recommend you construct all GSL code generators:
The final web site generator consists of three pieces, shown in listings 6, 7, and 8:
Here is the revised web site generator.
.# Since we run the script off the XML file, it starts in
.# template mode.
.template 0
ignorecase = 0
for section
    for page
        xml to section from "$(page.name).xml"
        delete page
    endfor
endfor
for section
    for page
        include "template.gsl"
    endfor
endfor
.endtemplate
Here is the template for the HTML output.
.# This whole script runs in template mode.
.#
.template 1
.echo "Generating $(page.name) page..."
.output "$(page.name).html"
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
  <title>$(page.title)</title>
  <link rel="stylesheet" href="default.css" type="text/css"/>
</head>
<body>
  <div id="left_container">
    <div id="logo_container">
      <a href="index.html"><img id="logo" src="$(page.name).jpg"/></a>
    </div>
    <div id="menu_container">
.for site.section
      <h3 class="menu_heading">$(section.name)</h3>
      <ul class="menu_item">
.   for page
        <li><a class="menu_item" href="$(page.name).html">$(page.title)</a></li>
.   endfor
      </ul>
.endfor
      <h3 class="menu_heading">Copyright</h3>
    </div>
    <div id="copyright">
      <p>$(copyright)</p>
    </div>
    <h3 class="menu_heading"> </h3>
  </div>
  <div id="right_container">
    <div id="title_container">
      <h1 id="title">$(page.title)</h1>
      <h2 id="title">$(page.subtitle)</h2>
    </div>
    <div id="content_container">
      <!-- Page content -->
.for content
      $(content.string ())
.endfor
      <!-- End page content -->
    </div>
  </div>
</body>
</html>
.endtemplate
Here is the CSS file. This is not generated; I assume you'll copy and modify it for each web site, since it defines all the look and feel:
/* Global defaults */
* { margin: 0; padding: 0; }
BODY { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt; }

/* Left column container */
#left_container { float: left; width: 220px; margin: 0; }

/* Right column container */
#right_container { margin-left: 220px; }

/* Logo (left, top) container */
#logo_container { height: 100px; }

/* Logo itself */
#logo { display: block; padding: 6pt; margin-left: auto; margin-right: auto; }

/* Menu (left, bottom) container */
#menu_container { color: black; background-color: #b9cdd8; }
H3.menu_heading { color: white; background-color: #01415c; font-size: 10pt; line-height: 16pt; font-variant: small-caps; text-indent: 10pt; }
UL.menu_item { font-variant: normal; list-style: none; border-width: 0 0 1pt 0; border-style: solid; border-color: white; line-height: 16pt; }
UL.menu_item LI { border-width: 1pt 0 0 0; border-style: solid; border-color: white; text-indent: 15pt; line-height: 15pt; }
#title_container { color: white; background-color: #01415c; height: 100px; position: relative; }
H1#title { width: 80%; position: absolute; font-variant: small-caps; margin-left: 20pt; margin-top: 20pt; font-size: 18pt; }
H2#title { width: 80%; color: #b9cdd8; position: absolute; font-variant: small-caps; text-align: right; margin-top: 45pt; margin-left: 20pt; font-size: 12pt; border-width: 1pt 0 0 0; border-style: dashed; border-color: #b9cdd8; }

/* Content (right, bottom) container */
#content_container { width: 80%; margin: 20pt; }
#content_container H1 { margin-top: 12pt; background-color: #b9cdd8; font-size: 14pt; font-variant: small-caps; text-indent: 10pt; }
#content_container H2 { margin-top: 12pt; font-variant: small-caps; font-size: 12pt; padding-left: 10pt; }
#content_container H3 { margin-top: 11pt; font-variant: small-caps; font-size: 11pt; padding-left: 10pt; }
#content_container H4 { margin-top: 10pt; font-variant: small-caps; font-size: 10pt; padding-left: 10pt; }
#content_container UL { margin: 1em; margin-left: 2em; margin-right: 2em; }
#content_container LI { margin-left: 2em; }
#content_container P { margin: 1em; margin-left: 2em; margin-right: 2em; }
#content_container PRE { background-color: #E0E0E0; margin: 1em; margin-left: 4em; margin-right: 4em; }
#content_container TABLE { margin-left: 3em; }
#content_container TD { padding-left: 1em; }

/* Disclaimer (bottom right, below content) */
#copyright P { font-size: 7pt; background-color: #b9cdd8; border-width: 1pt 1pt 1pt 1pt; border-style: solid; border-color: #b9cdd8; margin: 0pt; padding: 1em; color: #01415c; }

/* Links */
A:active { text-decoration: none; font-weight: bold; color: #01415c; }
A:link { text-decoration: none; font-weight: bold; color: #01415c; }
A:visited { text-decoration: none; font-weight: bold; color: #01415c; }
A[HREF]:hover { background-color: #b9cdd8; color: black; }
A.menu_item:active { text-decoration: none; color: black; }
A.menu_item:link { text-decoration: none; color: black; }
A.menu_item:visited { text-decoration: none; color: black; }
A.menu_item[HREF]:hover { color: red; }
A:link IMG, A:visited IMG { border-style: none; }
To build the final web site, make sure that site.xml specifies the correct script:
<site copyright = "Copyright © Local Grocer" script = "sitegen.gsl" >
And then build the web site using the same command as previously:
gsl site
The HTML template and the CSS file are made for each other. Note that:
It's an interesting exercise to re-implement our code generator using other code generation tools. For example, if you're familiar with XSLT, try building the web site generator using that. You may find you need to cheat, for example putting the whole web site model into a single file.
I've shown you how to design a simple model, and bring it to life using GSL. This web site generator is actually based on one that I use for some of my own web sites. You can extend this model in many directions, for instance:
But most of all, the point of this example is to teach you how to use GSL in your daily work. As you've seen, it's easy to create models, and it's easy to change them. This is the secret of code generation - you don't need to get it right the first time. Models are hard to get right. So go ahead and experiment, since GSL makes it cheap to change your mind.
In this article we defined a simple model for a web site, and we built a code generation toolset for that model. In our very simple case, the toolset consists of about 100 lines of GSL. Using that, we can turn fifty lines of modeling language into about three times that amount of perfect HTML.
In the next article in this series we'll design a much more sophisticated model using the XNF (XML Normal Form) modeling tool, which is a MOP tool that we use to build MOP tools. I'll go into more detail on different aspects of GSL's syntax and show you how to turn MOP theory into real practice on a medium-scale problem.