Can You XML a DSL, And Would You Want To?

For me, one of the principles of programming has always been: “Use the programming language best suited for the job; and if there is none, create a new language.” I believe this is directly related to the idea of Domain-Specific Languages (DSLs), which are all the rage right now. Creating a new language that is best suited for a specific problem domain: that really says it all, doesn’t it? Still, it’s a quote that I first heard when I started my programming career about twenty years ago, and it may very well be even older than that. How did people go about creating a language back then? (I mean parsing and compiling or interpreting it, not designing the language.) And how do we do it today? Has anyone ever asked the questions:

“What programming language is best suited for implementing another (DS) language?”
“At what point will creating and maintaining a new (DS) language result in less work than using a second-best language for solving the problem?”
“Can we describe a new language that is meant specifically for creating DSLs, a meta-DSL?”

I was led to these questions after reading a post by Obie Fernandez on his blog. He describes how his team created a DSL that describes a data model. Depending on its context, the DSL code will either create a table creation script, a script for creating stored procedures, or XHTML code for display in a browser. His description is not all too elaborate, I must say, so maybe I’m missing some interesting feature here. But it sounded to me like nothing more than doing some data transformations. Wouldn’t XML and some XSLT scripts have done the job here? Also, if you create a DSL for this, you have the added disadvantage that you’ll have to maintain the DSL implementation as well (even if the DSL is easy to write in Ruby). Three DSL implementations even, one for each context you can use. The idea of contexts is nice, until you need to implement an extra feature in one context that conflicts with the others. Unfortunately, Obie never got round to answering this question, which I put in my comment on his blog post.

I believe a lot of XML code written out there could actually be called a DSL in its own right. Someone wrote the other day (in one of the 40,000 blogs I try to read now and then) that XML is Java’s tool for creating DSLs; and I agree up to a point. Up to a point. I mean, I write XML every day (in the form of JSPs), so by now I’m used to writing conditional statements and loops in the form of XML tags, but it’s hardly elegant if you think about it. It’s actually plain ugly. (And if you think that’s not ugly enough, go take a look at the next step in XML programming: Jelly, executable XML. Yuck!)

Mixing data and code is (in principle) a bad thing, I think; whether you start off with data and mix in code (like in XML) or you start with creating a DSL and add in data (like in Obie’s example). I would prefer to put the data in XML (or YAML, or a database, or some other form of structured data) and keep the code separate. This will also give you better insight into whether you’ll need to create a DSL for processing the data, or if you can simply use XSLT.
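
To make that separation concrete, here is a minimal sketch in Python. It is not Obie’s Ruby DSL, whose details I don’t know; the JSON layout, the table and column names, and the functions `load_model`, `to_create_table` and `to_xhtml` are all made up for illustration. The data model lives in plain structured data, and each output is a separate transformation over it.

```python
import json
from html import escape

# Hypothetical data model, kept as plain data (it could equally be XML or YAML).
MODEL_JSON = """
{
  "table": "customer",
  "columns": [
    {"name": "id", "type": "INTEGER"},
    {"name": "name", "type": "VARCHAR(80)"}
  ]
}
"""

def load_model(text):
    """Parse the JSON description into ordinary dicts and lists."""
    return json.loads(text)

def to_create_table(model):
    """One transformation: emit a table creation script."""
    cols = ",\n".join(f"  {c['name']} {c['type']}" for c in model["columns"])
    return f"CREATE TABLE {model['table']} (\n{cols}\n);"

def to_xhtml(model):
    """Another transformation over the same data: emit an XHTML table."""
    rows = "".join(
        f"<tr><td>{escape(c['name'])}</td><td>{escape(c['type'])}</td></tr>"
        for c in model["columns"]
    )
    return f"<table><caption>{escape(model['table'])}</caption>{rows}</table>"

model = load_model(MODEL_JSON)
print(to_create_table(model))
print(to_xhtml(model))
```

Adding a third output (say, stored procedures) would mean adding one more function, without touching the data or the other two generators.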

P.S. “Simply use XSLT”? Did I just write that down? I take it back! I can fully understand how having to write XSLT can drive any sane person to code his own DSL. In assembler, if need be!

2006-03-23. 3 responses.

Comments

Obie Fernandez on 2006-03-27 at 05:55

More on Business DSLs in Ruby…

This entry is mainly in response to Danny, who is pondering my description of writing DSLs in Ruby … He refers to some questions he left me as comments, asking how writing DSLs in Ruby is different from using XML/XSLT. It’s really different ac…

John Lam on 2006-04-03 at 04:12

Many folks (myself included) started out going down the XML path that you’re at now. Sometime in the future, I suspect you will get the epiphany that code and data aren’t really meant to be as separate as your current implementation languages make them out to be. At that point, I think you’ll start looking for confirmation of your epiphany, and you’ll likely find it in strange places (at least from where you’re sitting right now). It’s a very old idea. But it’s a very powerful idea.

Ravi Mohan on 2006-04-18 at 11:33

“He describes how his team created a DSL that describes a data model. Depending on its context, the DSL code will either create a table creation script, a script for creating stored procedures, or XHTML code for display in a browser. His description is not all too elaborate, I must say, so maybe I’m missing some interesting feature here. But it sounded to me like nothing more than doing some data transformations.”

Your instinct is correct. It *is* just a transformation, though there are a few subtleties.

This is how (all) language processing works. Here is a simplified explanation.

A string is transformed into a data structure. This process is called parsing. Various traversals can then be performed on the resulting data structure (an “abstract syntax tree”) to produce different effects.

e.g. (pseudocode):

aString = "2 + 3"

ast = parse(aString)
// results in a data structure like { operator: "+", operand1: 2, operand2: 3 }

// each function below is one traversal of the same tree
def print(ast) { puts ast.operand1, ast.operator, ast.operand2 }

def evaluate(ast) { return ast.operand1 + ast.operand2 }  // assumes the operator is "+"

def generate_html(ast) { … }

def generate_sql(ast) { … }

def whatever_you_want(ast) { … }

print(ast) => 2+3

evaluate(ast) => 5

etc.
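
The pseudocode above can be made runnable. What follows is only a minimal sketch in Python, assuming a single binary expression over integers with `+`, `-` or `*`; the names `BinOp`, `parse`, `show` and the exact HTML and SQL output formats are illustrative choices, not anything taken from Obie’s code.

```python
from dataclasses import dataclass
import operator

@dataclass
class BinOp:
    """A tiny abstract syntax tree: one operator applied to two integers."""
    op: str
    left: int
    right: int

# Supported operators and the functions that implement them.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def parse(text):
    """Turn a string such as ' 2 + 3 ' into a BinOp node."""
    tokens = text.split()
    if len(tokens) != 3 or tokens[1] not in OPS:
        raise ValueError(f"cannot parse {text!r}")
    left, op, right = tokens
    return BinOp(op, int(left), int(right))

def show(node):
    """Traversal 1: render the tree back as source text."""
    return f"{node.left}{node.op}{node.right}"

def evaluate(node):
    """Traversal 2: compute the value of the expression."""
    return OPS[node.op](node.left, node.right)

def generate_html(node):
    """Traversal 3: render the tree as an HTML fragment."""
    return f"<span>{node.left} <b>{node.op}</b> {node.right}</span>"

def generate_sql(node):
    """Traversal 4: render the tree as a SQL statement."""
    return f"SELECT {node.left} {node.op} {node.right};"

tree = parse(" 2 + 3 ")
print(show(tree))      # 2+3
print(evaluate(tree))  # 5
```

The parser runs once; every additional output format is just another function over the same `BinOp` value.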

So in Obie’s case, the “DSL” code is transformed into a data structure, either explicitly or implicitly by the Ruby parser.

Then various transformations (which are traversals on the data structure, either explicit or implicit) generate different results.
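
As an analogy for the “implicit” case, here is a sketch in Python rather than Ruby, and not a description of what Obie’s code actually does. The host language’s own parser (Python’s standard `ast` module) builds the tree, and two hand-written traversals walk it. The function names `evaluate` and `to_text` and the restriction to integers with `+`, `-` and `*` are assumptions made for the example.

```python
import ast
import operator

# Map the host parser's operator node types to behaviour and to symbols.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def evaluate(node):
    """Traversal 1: compute the value of an integer arithmetic expression."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

def to_text(node):
    """Traversal 2: render the same tree as fully parenthesised text."""
    if isinstance(node, ast.Expression):
        return to_text(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
        symbol = SYMBOLS[type(node.op)]
        return f"({to_text(node.left)} {symbol} {to_text(node.right)})"
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

# No hand-written parser here: the host language parses the string for us.
tree = ast.parse("2 + 3 * 4", mode="eval")
print(evaluate(tree))  # 14
print(to_text(tree))   # (2 + (3 * 4))
```

Operator precedence comes for free from the host parser, which is part of why reusing it is attractive.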

Yes, for any DSL (or any programming language) you can write an equivalent XML document (representing the data structure) and XSL stylesheet (representing the transform). This might be a *lot* more work than just writing it in Ruby (or Lisp or whatever), but the concept is the same.
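
A rough illustration of that equivalence, with loud caveats: the `<binop>`/`<operand>` vocabulary below is invented for this example, and plain Python functions over `xml.etree.ElementTree` stand in for the XSL transforms, because the Python standard library has no XSLT processor. The point is only that the XML document plays the role of the syntax tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the syntax tree for "2 + 3".
EXPR_XML = """
<binop operator="+">
  <operand>2</operand>
  <operand>3</operand>
</binop>
"""

def evaluate(elem):
    """Transform 1: compute the value (only "+" is supported in this sketch)."""
    if elem.tag == "operand":
        return int(elem.text)
    if elem.tag == "binop" and elem.get("operator") == "+":
        return sum(evaluate(child) for child in elem)
    raise ValueError(f"unsupported element: {elem.tag}")

def to_text(elem):
    """Transform 2: render the tree back as source text."""
    if elem.tag == "operand":
        return elem.text.strip()
    if elem.tag == "binop":
        return elem.get("operator").join(to_text(child) for child in elem)
    raise ValueError(f"unsupported element: {elem.tag}")

root = ET.fromstring(EXPR_XML.strip())
print(to_text(root))   # 2+3
print(evaluate(root))  # 5
```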

Having said that, John’s comment is right on the mark. Though there is nothing particularly new or tough about Ruby DSLs, the ideas that “code == data” and that, given a powerful enough language, one could use the same language to represent both code and data are strange to many developers.
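
A small sketch of what “code == data” can look like, written in Python with Lisp-style nested lists. The list layout, the two supported operators and the function names `evaluate` and `double_constants` are assumptions for illustration only. The same value is run as a program and also rewritten as ordinary data.

```python
from functools import reduce
import operator

OPS = {"+": operator.add, "*": operator.mul}

# A program written directly as a nested list: [operator, argument, argument, ...]
program = ["+", 2, ["*", 3, 4]]

def evaluate(expr):
    """Run the list as code."""
    if isinstance(expr, list):
        op, *args = expr
        return reduce(OPS[op], (evaluate(arg) for arg in args))
    return expr

def double_constants(expr):
    """Treat the same list as data: build a new program with every number doubled."""
    if isinstance(expr, list):
        op, *args = expr
        return [op] + [double_constants(arg) for arg in args]
    return expr * 2

print(evaluate(program))                    # 14
print(double_constants(program))            # ['+', 4, ['*', 6, 8]]
print(evaluate(double_constants(program)))  # 52
```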

Hope that helped.