Posts Tagged ‘Database’

Review of the “Database is Your Friend” Workshop by Xavier Shay

August 23, 2010

This Saturday I had the privilege of attending Xavier Shay’s “Database is Your Friend” workshop right here in Kansas City. It explores the enterprise domain of high-traffic, mission-critical databases in Rails. At $350, the price tag trumps just about every regional Ruby or Rails conference I’ve heard of, but it was worth the price, and more. It would have been well worth the expense of travel and lodging if it had been out of town.

I’ve never had such a thorough learning experience. Xavier had set up a git repository with several branches of a basic Rails app, which we all cloned. He would give a 10-15 minute overview of a difficult concept, then we would check out a given branch and spend about 20 minutes completing whatever part of the code Xavier had omitted. We used test-driven development much of the time, which forced us to fully understand the cause and effect of each technique.

Xavier was in constant motion, visiting each of the six students to ensure everybody got it. After all, the exercises themselves were mission critical, because the following lessons built upon them. After experiencing this, I wonder why all teaching isn’t done this way. The “Long Lecture, Here’s Your Homework, Now Get Out” system I remember from college could benefit a lot from this.

It’s obvious a lot of love went into the design of the workshop. The flow was so well planned that several times, one of us would ask a question and Xavier would answer that the next segment addressed it. The result was a natural progression of solving more and more complex problems.

If you struggle with (or wonder about) data integrity and high-traffic database issues, take the opportunity to learn from Xavier Shay on his current tour.

PS –

While this workshop is definitely worth travelling for, I didn’t have to. Wes Garrison of Databasically, who helps organize our monthly Ruby meetings, took the initiative. Xavier normally attaches his workshop to conferences, but Wes looked at the schedule and saw that Xavier had a small gap between his Chicago and Austin dates. Wes offered to arrange both travel and lodging for Xavier, who luckily agreed to squeeze another workshop into his busy schedule. Wes, thank you.


Basic many-to-many Associations in Rails

January 29, 2010

View the Source Code

Many-to-many relationships

Data modeling is the science (and art) of creating the database schema that most purely matches the real world objects involved in your project. Part of this is defining how the objects relate to one another. Let’s say your application tracks Items and Categories. If each item can only belong to one category, then you have a one-to-many relationship; categories have many items. But if an item can appear in more than one category, you have a many-to-many relationship.

There are two ways to handle many-to-many relationships in Ruby on Rails, and this article will cover both.

has_and_belongs_to_many

The simplest approach works when you don’t need to store any information about the relationship itself: you just want to know which items are in each category, and which categories each item belongs to. This is called “has_and_belongs_to_many”. We use has_and_belongs_to_many associations in our models, and create a join table in our database. Here are your models:

# app/models/category.rb
class Category < ActiveRecord::Base
  has_and_belongs_to_many :items
end

# app/models/item.rb
class Item < ActiveRecord::Base
  has_and_belongs_to_many :categories
end

Next, let’s create the join table by generating a new migration. From the command line:

script/generate migration AddCategoriesItemsJoinTable

Now we’ll edit the migration file it creates:

class AddCategoriesItemsJoinTable < ActiveRecord::Migration
  def self.up
    create_table :categories_items, :id => false do |t|
      t.integer :category_id
      t.integer :item_id
    end
  end

  def self.down
    drop_table :categories_items
  end
end

Notice the :id => false, which keeps the migration from generating a primary key. The name of the table is a combination of the two table names we’re joining, in alphabetical order. This is how Rails knows how to find the join table automatically.
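The naming convention is simple enough to sketch in plain Ruby. This is just an illustration of the convention, not the actual Rails internals, and the helper name is made up:

```ruby
# Rails expects the join table name to be the two table names,
# sorted alphabetically and joined with an underscore.
def default_join_table_name(table_a, table_b)
  [table_a, table_b].sort.join('_')
end

default_join_table_name('items', 'categories')  # => "categories_items"
```

Because the order is alphabetical, both models arrive at the same table name without any extra configuration.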

has_many :through

The other way to set up a many-to-many relationship between objects is used if you do, or think you will, need to track info about the relationship itself. When was item X added to category Y? That’s info you can’t store in the category or item tables, because it’s info about the relationship. In Rails, this is called a has_many :through association, and it’s really just as easy as the first way.

First, we’re going to create a new model that defines the relationship between items and categories. For lack of a better name, let’s call it a Categorization. Set up your models like this:

# app/models/category.rb
class Category < ActiveRecord::Base
  has_many :categorizations
  has_many :items, :through => :categorizations
end

# app/models/item.rb
class Item < ActiveRecord::Base
  has_many :categorizations
  has_many :categories, :through => :categorizations
end

# app/models/categorization.rb
class Categorization < ActiveRecord::Base
  belongs_to :category
  belongs_to :item
end

We’re connecting both original models to :categorizations, and then connecting them to each other via the intermediary Categorization model. Now, instead of a join table whose only function is connecting the others, we add a full-fledged table to manage our new model:

class CreateCategorizations < ActiveRecord::Migration
  def self.up
    create_table :categorizations do |t|
      t.integer :category_id
      t.integer :item_id

      t.timestamps
    end
  end

  def self.down
    drop_table :categorizations
  end
end

We still have the two foreign key integer columns, but we’ve removed :id => false so this table will have an id column of its own. We also added timestamps, so we’ll be able to tell when an item was added to a specific category. I also created a migration that removes the old categories_items table, but it’s not shown here.
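Conceptually, the :through association just walks the join rows. Here’s a plain-Ruby sketch (no ActiveRecord; the IDs are made up) of how an item’s categories get resolved:

```ruby
# Each categorization links one item to one category and records
# when the link was made -- the data a plain join table can't hold.
Categorization = Struct.new(:category_id, :item_id, :created_at)

categorizations = [
  Categorization.new(1, 10, Time.now),
  Categorization.new(2, 10, Time.now),
  Categorization.new(1, 11, Time.now)
]

# item.categories is conceptually: follow the join rows for that item.
category_ids = categorizations.select { |c| c.item_id == 10 }
                              .map { |c| c.category_id }
# => [1, 2]
```

ActiveRecord does this with a SQL join rather than in-memory filtering, but the shape of the lookup is the same.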

Which is Better?

The simpler has_and_belongs_to_many approach has a small advantage when you *know* you’re not going to need to track info about the relationship itself. If this is the case, there’s a very slight performance gain because you’re not loading an extra model class at runtime.

More often than not, however, you’re going to eventually want to track relationship-specific data. We used the example of tracking when a relationship was created. Another would be if you want to track, over time, how many times a visitor clicks on an item under each category. That counter needs to be stored in the Categorization model, and that’s a reason not to use the simpler has_and_belongs_to_many approach.

I’ve created an example application (get it here) with tags for each version – has_and_belongs_to_many, and has_many :through.

Adding Columns and Default Data to Existing Models

January 25, 2010

If you have an existing model in your Ruby on Rails application and you’ve already run the migration to create the table in the database, you may want to add columns later on. This is easy. Let’s say we have an ActiveRecord user model, and we want to add a rating for each user, to allow others to vote a user up or down. First, create the migration:

script/generate migration AddRatingToUsers

Or, if you want to be fancy, you can use a little magic and specify the new column along with it:

script/generate migration AddRatingToUsers rating:integer

Rails is smart enough to figure out the table name from the migration name, and you’ll see this in the migration file that is created:

class AddRatingToUsers < ActiveRecord::Migration
  def self.up
    add_column :users, :rating, :integer
  end

  def self.down
    remove_column :users, :rating
  end
end
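That “smart enough” bit comes down to string parsing on the migration name. The real generator is more involved, but the convention can be sketched like this (the helper is hypothetical, plain Ruby):

```ruby
# Underscore "AddRatingToUsers" into "add_rating_to_users", then take
# everything after the final "to_" as the table name.
def table_from_migration_name(name)
  underscored = name.gsub(/([a-z])([A-Z])/, '\1_\2').downcase
  underscored[/to_(\w+)\z/, 1]
end

table_from_migration_name('AddRatingToUsers')  # => "users"
```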

If you run the migration right away, you’ll run into problems because all the users that exist will have a rating of nil, and this will probably break any calculations you have in your code. Let’s update it with a default value:

    add_column :users, :rating, :integer, :default => 0

Now the database will default all existing and new users to a rating of zero.
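To see why the default matters, compare what happens when you sum ratings with and without it (plain Ruby, illustrative values):

```ruby
# With the default, every user has an integer rating and math just works.
with_default = [0, 3, 5]
average = with_default.inject(:+) / with_default.size  # => 2 (integer division)

# Without it, pre-existing users have nil ratings and arithmetic raises.
without_default = [3, nil, 5]
begin
  without_default.inject(:+)
rescue TypeError
  # nil can't be coerced into an integer
end
```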

More Complex Default Values

But what if you need a more complex default? Let’s say you want to add a unique API key for each user. Maybe we have a user instance method called “generate_api_key”, and it’s called whenever a new user is created:

class User < ActiveRecord::Base
  after_create :generate_api_key

  def generate_api_key
    self.update_attribute(:api_key, self.username + '123')
  end
end
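As an aside, username + '123' is only a placeholder; a real API key should be unguessable. A minimal sketch of the method body using Ruby’s standard securerandom library (available since Ruby 1.8.7):

```ruby
require 'securerandom'

# A random 32-character hex string is effectively unguessable,
# unlike a key derived from the username.
def generate_api_key
  SecureRandom.hex(16)
end

generate_api_key  # e.g. "9f86d081884c7d659a2feaa0c55ad015"
```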

But what about the users that were already in the database when we added the column? Should we put that in the migration, too? Absolutely not. It can be done, but migrations are not meant for data loading – only creating and changing the structure of your underlying database.

A better way is a rake task. These belong in the lib/tasks folder of your application, and they’re easy to create:

# lib/tasks/api_keys.rake
namespace :seed do
  desc "generate api keys for users that don't have one already"
  task :api_keys => :environment do
    User.all(:conditions => {:api_key => nil}).each do |user|
      user.generate_api_key
    end
  end
end

Then, you can call this rake task from the command line like this:

rake seed:api_keys

This has a couple of benefits. First, if you set up a new instance of your Rails app on a different server (or even somewhere else on the same server), you can call this rake task to jumpstart your users. Even better, since you’re relying on a model method, you can unit test that method to make sure it performs as expected.

Running your Rails Test Database in Memory (RAM)

August 4, 2009

I recently read a blog post by Amr Mostafa that benchmarked running MySQL databases in memory. I’ve been trying to figure out how to do this, and he had the answer: use the tmpfs filesystem, which runs in memory, to store your database.  I’ll have to figure out just how difficult that is later, since I’m not a super DBA…or even really a DBA at all.

Amr is not a Rails developer, and the purpose of his benchmark was to simulate regular web traffic. His results seemed ambiguous, but I noticed something missing in his trial: writes to the database. His benchmark tests only used select statements, which read from the database. While this is the majority of most database usage, I think the performance gain during writes would tip the scales decidedly in favor of running MySQL in memory, if you can afford the RAM.

This has an added benefit for us Ruby on Rails developers: we could potentially use it to dramatically increase the speed of our Test Driven Development, especially for those of us (should be all of us) using autotest! TDD requires running tests every few minutes, or even seconds. And writing to the database is a much bigger piece of the puzzle in tests, since the database is recreated from scratch before every test.

For a lot of us, test suites are manageable.  Autotest only runs tests for changed files during normal development, occasionally running the entire test suite.  This, coupled with judicious mocking, stubbing and unit testing techniques can keep most test suites under control.  But larger apps use increasingly more tests, and higher-level tools like RSpec can be especially resource-intensive.

I tried to contact Mostafa about running some benchmarks for write speed, but my comment was considered spam! I did get a smaller message through, so hopefully I’ll hear back. If so, I’ll post a link to his thoughts/results. Until then, I may dabble with this myself and see what amateurish benchmarks I can run.