Data modeling is the science (and art) of creating the database schema that most closely matches the real-world objects involved in your project. Part of this is defining how the objects relate to one another. Let’s say your application tracks Items and Categories. If each item can only belong to one category, then you have a one-to-many relationship; categories have many items. But if an item can appear in more than one category, you have a many-to-many relationship.
There are two ways to handle many-to-many relationships in Ruby on Rails, and this article will cover both.
The simplest approach applies when you don’t need to store any information about the relationship itself. You just want to know which items are in each category, and which categories each item belongs to. Rails calls this a has_and_belongs_to_many association. We declare it in both models and create a join table in the database. Here are the models:
# app/models/category.rb
class Category < ActiveRecord::Base
  has_and_belongs_to_many :items
end

# app/models/item.rb
class Item < ActiveRecord::Base
  has_and_belongs_to_many :categories
end
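To make the idea concrete before building the real table, here is a minimal plain-Ruby sketch of what a join table holds: nothing but pairs of ids. It does not use ActiveRecord, and the `JOIN_ROWS` constant, the two helper methods, and the id values are all hypothetical, made up purely for illustration.

```ruby
# Plain-Ruby illustration (no ActiveRecord): a join table is just id pairs.
# JOIN_ROWS, items_in and categories_of are hypothetical names.
JOIN_ROWS = [
  { category_id: 1, item_id: 10 },
  { category_id: 1, item_id: 11 },
  { category_id: 2, item_id: 10 } # item 10 sits in two categories
].freeze

# All item ids filed under one category.
def items_in(category_id)
  JOIN_ROWS.select { |row| row[:category_id] == category_id }
           .map { |row| row[:item_id] }
end

# All category ids a given item belongs to.
def categories_of(item_id)
  JOIN_ROWS.select { |row| row[:item_id] == item_id }
           .map { |row| row[:category_id] }
end
```

Both lookups read the same rows, just filtered on a different column, which is roughly the work the association does for you on each side of the relationship.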
Next, let’s create the join table by generating a new migration. From the command line:
script/generate migration AddCategoriesItemsJoinTable
Now we’ll edit the migration file it creates:
class AddCategoriesItemsJoinTable < ActiveRecord::Migration
  def self.up
    create_table :categories_items, :id => false do |t|
      t.integer :category_id
      t.integer :item_id
    end
  end

  def self.down
    drop_table :categories_items
  end
end
Note the :id => false option, which keeps the migration from generating a primary key column. The table name combines the names of the two tables being joined, in alphabetical order. This convention is how Rails finds the join table automatically.
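The naming rule can be sketched in a couple of lines of plain Ruby. This is a simplified illustration only: `default_join_table` is a hypothetical helper, not a Rails method, and Rails' own logic handles more cases than a plain sort.

```ruby
# Simplified sketch of the naming convention described above.
# default_join_table is a hypothetical helper, not part of Rails.
def default_join_table(first_table, second_table)
  # Sort the two table names alphabetically, then join with an underscore.
  [first_table.to_s, second_table.to_s].sort.join("_")
end

default_join_table(:items, :categories) # => "categories_items"
```

The argument order doesn't matter, which is why both models can locate the same table without being told its name.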
The other way to set up a many-to-many relationship is for when you need, or think you will need, to track information about the relationship itself. When was item X added to category Y? That can’t be stored in the categories or items tables, because it describes the relationship rather than either object. In Rails, this is called a has_many :through association, and it’s really just as easy as the first way.
First, we’re going to create a new model that defines the relationship between items and categories. For lack of a better name, let’s call it a Categorization. Set up your models like this:
# app/models/category.rb
class Category < ActiveRecord::Base
  has_many :categorizations
  has_many :items, :through => :categorizations
end

# app/models/item.rb
class Item < ActiveRecord::Base
  has_many :categorizations
  has_many :categories, :through => :categorizations
end

# app/models/categorization.rb
class Categorization < ActiveRecord::Base
  belongs_to :category
  belongs_to :item
end
We’re connecting both original models to :categorizations, and then connecting them to each other via the intermediary Categorization model. Now, instead of a join table whose only function is connecting the others, we add a full-fledged table to manage our new model:
class CreateCategorizations < ActiveRecord::Migration
  def self.up
    create_table :categorizations do |t|
      t.integer :category_id
      t.integer :item_id
      t.timestamps
    end
  end

  def self.down
    drop_table :categorizations
  end
end
We still have the two foreign key integer columns, but we’ve removed :id => false so this table will have an id column of its own. We also added timestamps, so we’ll be able to tell when an item was added to a specific category. I also created a migration that removes the old categories_items table, but it’s not shown here.
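As a rough picture of what the extra columns buy us, here is a plain-Ruby stand-in for rows of the categorizations table. It does not touch ActiveRecord; the `CategorizationRow` struct, the `added_at` helper, and the sample dates are hypothetical and invented for this sketch.

```ruby
# Plain-Ruby stand-in for rows of the categorizations table (no ActiveRecord).
# CategorizationRow, SAMPLE_ROWS and added_at are hypothetical names.
CategorizationRow = Struct.new(:id, :category_id, :item_id, :created_at)

SAMPLE_ROWS = [
  CategorizationRow.new(1, 1, 10, Time.utc(2024, 1, 5)),
  CategorizationRow.new(2, 2, 10, Time.utc(2024, 3, 9))
].freeze

# When was the item added to the category? Returns nil if it never was.
def added_at(category_id, item_id)
  row = SAMPLE_ROWS.find do |r|
    r.category_id == category_id && r.item_id == item_id
  end
  row && row.created_at
end
```

Each row now has its own identity and its own data, so the question "when was item X added to category Y?" has somewhere to live.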
Which is Better?
The simpler has_and_belongs_to_many approach has a small advantage when you *know* you’re not going to need to track info about the relationship itself. If this is the case, there’s a very slight performance gain because you’re not loading an extra model class at runtime.
More often than not, however, you’re going to eventually want to track relationship-specific data. We used the example of tracking when a relationship was created. Another would be if you want to track, over time, how many times a visitor clicks on an item under each category. That counter needs to be stored in the Categorization model, and that’s a reason not to use the simpler has_and_belongs_to_many approach.
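The click counter idea can be sketched in plain Ruby as a count kept per category and item pair. This is only an illustration under assumptions: in the Rails application the count would presumably be an integer column on the categorizations table, and the `ClickTracker` class and its method names are hypothetical.

```ruby
# Plain-Ruby sketch of a per-relationship click counter (hypothetical names).
# The count belongs to the [category, item] pair, not to either one alone.
class ClickTracker
  def initialize
    # Key is the [category_id, item_id] pair; unseen pairs read as 0.
    @clicks = Hash.new(0)
  end

  # Record one click on an item viewed under a given category.
  def record_click(category_id, item_id)
    @clicks[[category_id, item_id]] += 1
  end

  # Number of clicks recorded for that pairing so far.
  def clicks_for(category_id, item_id)
    @clicks[[category_id, item_id]]
  end
end
```

The same item can rack up different counts under different categories, which is exactly the kind of data a bare join table has no column for.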
I’ve created an example application (get it here) with tags for each version –