Tuesday, June 17, 2008

Abstract Base Classes vs Model Table Inheritance

by Kevin Fricovsky

With the recent QuerySet-Refactor branch merged into trunk, the Django developer community has finally been gifted with one of the more popular outstanding ticket requests it's been waiting for... model inheritance. Funny thing is, people don't really seem to care, aren't ready to try it yet, or don't fully grok it yet. Well, that's no good, because it promotes code reuse, reduces code, and increases developer productivity. So let's take a look at the three existing options for extending your model classes, take a peek at what's going on in our database, and see what type of syntactic sugar (or magic) we're gaining. Then you can make the decision as to which is best for you.

Introduction

This isn't going to be an introduction to Model Table Inheritance (MTI, which the Django docs call multi-table inheritance) or Abstract Base Classes (ABC); that's what the documentation is for. This also isn't going to get into the discussion or debate about "issues" or "gotchas" that exist. Read the docs, play around with it yourself, and visit the forums if you have any questions or concerns.

My goal today is to review the 3 options that exist for reducing code (keeping things DRY) — composition (MTI), inheritance (ABC), and relationship (OneToOneField). We'll take a look at the model definition (which we use to define our database schema), the resulting sql and the syntax for referencing the properties of our model instances. Let's get right into it...

Actually, before we begin... this and a few other local project concepts will be used as the basis for future tutorials. We'll explore evolving the models, db performance tweaks, constraints, etc. My code here is meant for example only. So yea, grasp the concepts ... do as I say but not as I do kinda thing. Or maybe neither. Now let's get into it...

Model Table Inheritance (MTI)

We're going to reference two model classes I use to power this blog — ContentItemBase and Post. Let's take a peek at ContentItemBase first as I employ it as the "base" for all the content classes I use throughout my blog (and I use a lot - Post, Link, Video, Photo, Music, etc).

class ContentItemBase(models.Model):
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    pass

What I'm doing here is defining an object that holds the properties all my custom content classes have in common. Each content item has a title, for example. Each has tags, a status, a comment_status, a slug, etc. So why would I want to duplicate my code when I can define one class that is reusable? Let's not repeat ourselves and reduce whenever possible, right? Ah, why did I even bother to ask - you already know the answer. So let's take a look at a truncated version of my Post model...

class Post(ContentItemBase):
  #TODO: document  
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','Restructured Text'),
  )

  teaser = models.TextField(_('teaser'), blank=True, null=False)

Well, woopdy doo, who cares? I don't see anything special here. Wrong. Take a look at the class declaration. Instead of inheriting from models.Model, this time we're inheriting from ContentItemBase. What does this mean? Simple: my Post class now picks up all the properties, methods, etc. that I defined in my parent ContentItemBase. So, for example, I didn't have to reproduce the title property definition and so forth. This child Post class is a ContentItemBase and therefore shares its members. That's grand, right? Let's take a look at the generated database schema definition for both of these...

CREATE TABLE "dizzy_posts" (
    "contentitembase_ptr_id" integer NOT NULL UNIQUE PRIMARY KEY REFERENCES "dizzy_contentitembase" ("id"),
    "teaser" text NOT NULL,
    ...
);

CREATE TABLE "dizzy_contentitembase" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL
);

See anything interesting? I do. Behind the scenes Django created two tables, one for the ContentItemBase model and another for the Post model. Ohhhh, so there is a one-to-one relationship here. Yuppo. The dizzy_posts table has a column named contentitembase_ptr_id. This column holds the ID of the associated dizzy_contentitembase record. Why do we care? We care because that means we have a join between the tables. Is this a bad thing? No, not really. But depending on your DB architecture you may want to stay informed about how many joins are being executed, how often, etc. These are all general performance concerns, and while most of the time it's nothing to break a sweat over, it's just good to be aware. Let's take a look at how we'd reference the title property via the Python interpreter, and at the generated SQL...

First we'll grab a record from the database with an ID = 1.

>>> p = Post.objects.get(id=1)
>>> p.title
u'moo'

Next let's take a look at what's going on under the hood. What SQL was just executed to build the resulting Post instance? Wait, no. Let's not. Did you just see that? I made a reference to the title property without having to reference the relationship the database created for us. There's the syntactical magic we're now gaining from QuerySet-Refactor. If you were using a ForeignKey or a OneToOneField to maintain the relationship you would have to do something like post.contentitembase.title, which actually makes a lot of sense because it's a foreign key in the database and the syntax represents that relationship. But in this example, as far as the model API knows, title is a member of Post and easily accessible. So let's take a look at the generated SQL.

>>> # connection.queries is only populated when settings.DEBUG is True
>>> from django.db import connection
>>> connection.queries[-1]
{'time': '0.001', 'sql': u'SELECT "dizzy_contentitembase"."id", "dizzy_contentitembase"."title", 
"dizzy_contentitembase"."slug", "dizzy_contentitembase"."created_on", "dizzy_contentitembase"."updated_on", 
"dizzy_contentitembase"."publish_on", "dizzy_contentitembase"."tags", "dizzy_contentitembase"."status", 
"dizzy_contentitembase"."comment_status", "dizzy_posts"."contentitembase_ptr_id", "dizzy_posts"."teaser"
 FROM "dizzy_posts" INNER JOIN "dizzy_contentitembase" ON ("dizzy_posts"."contentitembase_ptr_id" = 
"dizzy_contentitembase"."id") WHERE "dizzy_posts"."contentitembase_ptr_id" = 1  ORDER BY 
"dizzy_contentitembase"."publish_on" DESC'}

Alright now, I see what's going on. There's that join right there: INNER JOIN "dizzy_contentitembase" ON ("dizzy_posts"."contentitembase_ptr_id" = "dizzy_contentitembase"."id"). So what do we get with MTI? We get the added benefit of intuitive syntax on the model, and we get separate database tables for parent and child, associated through a one-to-one key. Is this good? Is this bad? Nah, we can discuss that some other time. Let's move on to Abstract Base Classes.
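If you want to poke at that two-table layout without a Django project handy, here's a minimal sketch using Python's built-in sqlite3 module. It's an assumption-laden stand-in, not what Django runs: I only kept the title and teaser columns, the sample row is made up, and the SQL is hand-written rather than generated by the ORM.

```python
import sqlite3

# In-memory database; table and column names mirror the MTI schema above,
# trimmed down to two columns for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dizzy_contentitembase (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(100) NOT NULL
);
CREATE TABLE dizzy_posts (
    contentitembase_ptr_id INTEGER NOT NULL PRIMARY KEY
        REFERENCES dizzy_contentitembase (id),
    teaser TEXT NOT NULL
);
INSERT INTO dizzy_contentitembase (id, title) VALUES (1, 'moo');
INSERT INTO dizzy_posts (contentitembase_ptr_id, teaser) VALUES (1, '');
""")

# Parent and child columns live in different tables, so reading a full
# "Post" row means joining on the pointer column.
row = conn.execute(
    """
    SELECT b.title, p.teaser
    FROM dizzy_posts AS p
    INNER JOIN dizzy_contentitembase AS b
        ON p.contentitembase_ptr_id = b.id
    WHERE p.contentitembase_ptr_id = ?
    """,
    (1,),
).fetchone()
print(row)
```

The point is simply that the title lives in one table and the teaser in another, so every full Post read pays for the join.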

Abstract Base Classes

Sticking with the same theme, we'll continue to use the ContentItemBase and Post models as our example classes. This time we're going to see how ABC differs from MTI. It should be pretty straightforward. Let's take a look first at the ContentItemBase definition.

class ContentItemBase(models.Model):
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    abstract = True

Hmmm.... not much changed. Wrong. One very important thing changed. The Meta inner class definition now contains abstract = True. Previously we implemented a pass and ignored any custom Meta definitions. By setting this value here you are telling Django something about this class, hence the reason it's defined in the Meta inner class. You're saying "Hey! This is an abstract class and you know what, treat me like one, damn it!" So, it's a small change but we'll see soon that it's a big one. Let's take a look next at the Post model...

class Post(ContentItemBase):
  #TODO: document  
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','Restructured Text'),
  )

  teaser = models.TextField(_('teaser'), blank=True, null=False)

Errr...? This is exactly the same as the MTI example. Yes! Nothing changed on the child Post class. It too inherits from ContentItemBase, but we didn't have to do anything different here. So what's the big whooohaa then? Let's move to the database...

CREATE TABLE "dizzy_posts" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL,
    "teaser" text NOT NULL,
    ...
);

Hello! There's no dizzy_contentitembase table? Nope. When using ABC, Django "flattens" the relationship between parent and child classes and essentially merges the two in the backend datastore. What does this mean, Einstein? Correct, no joins when querying for your inherited members. Let's dive into the code...

First, let's grab a Post instance from the database.

>>> p = Post.objects.get(id=1)
>>> p.title
u'moo'

Good. So, we have the same nice, simple, intuitive property reference syntax here, and why wouldn't we? The title field being referenced is a column on the entity's own table and doesn't require any joining via a database relationship. Let's take a look at the resulting SQL.

>>> connection.queries[-1]
{'time': '0.000', 'sql': u'SELECT "dizzy_posts"."id", "dizzy_posts"."title", "dizzy_posts"."slug", 
"dizzy_posts"."created_on", "dizzy_posts"."updated_on", "dizzy_posts"."publish_on", "dizzy_posts"."tags", 
"dizzy_posts"."status", "dizzy_posts"."comment_status", "dizzy_posts"."teaser" FROM "dizzy_posts" WHERE 
"dizzy_posts"."id" = 1  ORDER BY "dizzy_posts"."publish_on" DESC'}

See that? No joins mom! Pretty slick. Not much more to say here other than, good job fellas! I like this one a lot.
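To recap how that one abstract flag changes the table layout, here's a toy sketch in plain Python. To be loud about it: this is not how Django does it internally. plan_tables and its tuple format are hypothetical names I made up purely to illustrate the rule "abstract parents donate their fields, concrete parents get a pointer column".

```python
def plan_tables(models):
    """Toy planner: map each concrete model name to its table's columns.

    `models` is a list of (name, parent_name_or_None, abstract, fields)
    tuples, with parents listed before their children.
    """
    tables = {}       # concrete model name -> column names
    passed_down = {}  # abstract model name -> fields its children absorb
    for name, parent, abstract, fields in models:
        if parent in passed_down:
            # Abstract parent: its fields are copied into this model.
            fields = passed_down[parent] + fields
        if abstract:
            # Abstract models get no table of their own.
            passed_down[name] = fields
            continue
        if parent in tables:
            # Concrete parent: point at its row instead of copying fields.
            key = parent.lower() + "_ptr_id"
        else:
            key = "id"
        tables[name] = [key] + fields
    return tables


base_fields = ["title", "slug"]

# Same two classes both times; only the abstract flag differs.
mti = plan_tables([
    ("ContentItemBase", None, False, base_fields),
    ("Post", "ContentItemBase", False, ["teaser"]),
])
abc = plan_tables([
    ("ContentItemBase", None, True, base_fields),
    ("Post", "ContentItemBase", False, ["teaser"]),
])
print(mti)
print(abc)
```

With the flag off you get two tables linked by contentitembase_ptr_id; with it on you get a single Post table carrying every column, which is why the join disappears.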

Classic OneToOne Field

Before I let you go we need to take a look at the classic way we'd implement this solution. Well, there are two ways we could implement this in the "classic" way. One would be to copy and paste each property definition into each model (talk about not being DRY), and the other is to create a direct relationship between the ContentItemBase and Post models. Sounds kinda like MTI, doesn't it? Well, it's very close but not quite the cigar. I'll keep this one short, since most of you already know this design.

Again, we'll start by taking a peek at the ContentItemBase model definition. This is exactly the same as the MTI solution.

class ContentItemBase(models.Model):
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    pass

Now let's take a look at the Post model definition. Pay attention, because this one is a little bit different.

class Post(models.Model):
  #TODO: document
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','Restructured Text'),
  )

  contentitembase = models.OneToOneField(ContentItemBase, primary_key=True)
  teaser = models.TextField(_('teaser'), blank=True, null=False)

See what we did here? We added a new property called contentitembase and defined it as a OneToOneField with its association to the ContentItemBase model. So what does that look like behind the scenes?

CREATE TABLE "dizzy_posts" (
    "contentitembase_id" integer NOT NULL UNIQUE PRIMARY KEY REFERENCES "dizzy_contentitembase" ("id"),
    "teaser" text NOT NULL,
    ...
);


CREATE TABLE "dizzy_contentitembase" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL
);

So, we're very close to MTI's table schema here. What we have now is an association between the dizzy_posts and dizzy_contentitembase tables via the foreign key dizzy_posts.contentitembase_id. So yes, there are differences here between MTI and OneToOne, starting with the name of the key column (contentitembase_id rather than contentitembase_ptr_id). So let's take a peek at the generated SQL...

>>> post = Post.objects.get(pk=1)
>>> connection.queries[-1]
{'index': 16, 'time': '0.000', 'sql': u'SELECT "dizzy_posts"."contentitembase_id", "dizzy_posts"."teaser" FROM "dizzy_posts" WHERE "dizzy_posts"."contentitembase_id" = 1 '}
>>> post.contentitembase.title
u'foo'
>>> connection.queries[-1]
{'index': 17, 'time': '0.000', 'sql': u'SELECT "dizzy_contentitembase"."id", "dizzy_contentitembase"."title", 
"dizzy_contentitembase"."slug", "dizzy_contentitembase"."created_on", "dizzy_contentitembase"."updated_on", 
"dizzy_contentitembase"."publish_on", "dizzy_contentitembase"."tags", "dizzy_contentitembase"."status", 
"dizzy_contentitembase"."comment_status" FROM "dizzy_contentitembase" WHERE "dizzy_contentitembase"."id" 
= 1 '}

Now that's a little different. On the first line, where I retrieve the post, I chose to use the primary key (pk) value, and in this case that's the ContentItemBase ID and not a separate Post id. We set the Post's contentitembase property on the model to a OneToOneField and also defined the field to be the primary key. This enforces database integrity by allowing only one ContentItemBase to be associated with only one Post. For database retrieval we have a handful of options. There are a few ways we could generate additional indexes on the Post table beyond the existing primary key, and if needed we could traverse the relationship by querying the ContentItemBase to retrieve the requested Post via its slug, for example, but that's not for this tutorial (although you can read more and review great examples in the documentation). One important note not to overlook here: using the syntax above to retrieve the Post instance via its pk required an additional query to retrieve the title of the post. Again, this is just one of the performance concerns you need to keep in the back of your head when designing your models and therefore your database tables.
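That extra query is easy to see for yourself outside of Django. Below is a rough sketch with Python's sqlite3 module that counts statements for the "fetch the post, then follow the relation" pattern versus one joined query. The run helper, the trimmed-down tables, and the sample row are all my own assumptions for illustration; the SQL is hand-written, not what the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dizzy_contentitembase (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(100) NOT NULL
);
CREATE TABLE dizzy_posts (
    contentitembase_id INTEGER NOT NULL PRIMARY KEY
        REFERENCES dizzy_contentitembase (id),
    teaser TEXT NOT NULL
);
INSERT INTO dizzy_contentitembase (id, title) VALUES (1, 'foo');
INSERT INTO dizzy_posts (contentitembase_id, teaser) VALUES (1, '');
""")

executed = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Hypothetical helper: log the statement, then return the first row."""
    executed.append(sql)
    return conn.execute(sql, params).fetchone()


# Lazy style: load the post, then follow the relation when title is needed.
post_id, teaser = run(
    "SELECT contentitembase_id, teaser FROM dizzy_posts"
    " WHERE contentitembase_id = ?",
    (1,),
)
(lazy_title,) = run(
    "SELECT title FROM dizzy_contentitembase WHERE id = ?", (post_id,)
)
lazy_queries = len(executed)

# Eager style: one statement that joins both tables up front.
executed.clear()
joined_title, _ = run(
    "SELECT b.title, p.teaser FROM dizzy_posts AS p"
    " INNER JOIN dizzy_contentitembase AS b ON p.contentitembase_id = b.id"
    " WHERE p.contentitembase_id = ?",
    (1,),
)
eager_queries = len(executed)

print(lazy_queries, eager_queries)
```

Two round trips versus one. In Django itself, select_related() is the queryset method meant for pulling related rows in with a join; check the docs for how it behaves in the version you're running.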

Conclusion

What I didn't want to get into too much here today are the arguments for and against composition and inheritance. In conversations with other developers regarding this topic I've heard that issues exist with inheritance and that there are bugs/issues in the admin, but I can't speak to any of those right now, and I don't bring them up to cry wolf. I just want you to be aware. Spend time testing out the water. Go start a discussion on IRC.

For example, I need to ask some people about polymorphism here. What are the obstacles in dealing with parent/child relationships across the object graph? I assume if I request a Parent but receive a Child via a Manager class, for example, that I can somehow make reference from one to the other. But honestly, I haven't had to get into that level of granularity yet... so I have more testing to do myself. But the more people talk about it, the more people we'll start to see implementing it. I think the Django team did a terrific job here. Now, whether or not one solution is better than the other isn't for me to tell you. It's for you to find out on your own. If you're interested in follow-up reading, Eric Florenzano has written up a good post discussing his opinions on composition vs inheritance and even goes into the concept of mixins in another post.
