Thursday, March 20, 2008

Attributes and items

So, here I am trying to further my metaobject protocol library. I'm now in the eat-my-own-dog-food phase of sorts, as - after having written a MOP for POJOs, I'm trying to write actual MOPs for some dynamic language implementations, most notably, Jython and JRuby. And pretty soon, I hit an obvious design problem (which is okay, really - this is still an exercise in exploring the right approach). Namely, lots of languages actually have two namespaces when it comes to accessing data belonging to an object. I'll refer to one of them as "attributes" (as that's what they're called in both Ruby and Python) and the other are "items", which are elements of some container object. All objects have attributes, but only some objects (containers) have items. Some languages don't make a distinction, most notably, JavaScript. In JavaScript, all objects are containers and they only have items (and an item can be a function, in which case it functions as a method on the object). Other languages (Ruby, Python) will distinguish between the two; the containers are arrays/lists and hashes/dictionaries/maps. As a matter of fact, it helps thinking of Java as having the distinction - JavaBeans properties are the attributes, and arrays, Maps, and Lists will have items.

I'd like to think that most people's mental model of objects actually distinguishes the two.

Now, to make matters a bit more complicated, in lots of languages the container API is actually just a syntactic sugar. Give an object a [] and a []= method, and it's a container in Ruby! Give it __getitem__, __setitem__, and few others, and it's a container in Python! Honestly, this is okay - as a byproduct of duck typing, one shouldn't expect to there be any sort of an explicit declaration, right?

For ordered containers, most languages will also make it possible to manipulate subranges as well.

Bottom line is, I feel this is a big deal to solve in interoperable manner, as the raison d'être of this library is to allow interoperability between programs written in different languages within a single JVM; I imagine in most cases the programs will pass complex data structures built out of lists and dictionaries to one another, so it feels... essential to get this right. It also feels like something that can rightfully belong in a generic MOP as most languages do have the concept of ordered sequences and associative arrays. Of course, I might also be wrong here; it is also an essential goal to not end up with a baroque specification that contains everything plus the kitchen sink.

So, here am I wondering whether this is something that can be made sufficiently unified across the languages to the point that if a Ruby program is given a Python dictionary, and it calls []= on it, it actually ends up being translated into a __getitem__ call. The goal seems worthwhile, and is certainly possible but I'm not entirely sure how much of an effort will it take. There's only one way to find out though :-)

No comments: