By Nathan Donaldson
Tags: Development
When writing a Rails application, how do you decide on the best indexes to add to your database? It might seem obvious, especially if you work on a project from scratch. The problem is a little harder when you come to optimize an existing codebase.
Recently I’ve been using two methods to work out where to put indexes. Firstly I’d strongly recommend using New Relic RPM in development mode. When running your application you can visit /newrelic to get all kinds of useful information. Here you can see the most recent rails calls:
And you can drill down to see the database calls:
And then you can drill down further to see a statement analysis. This will give you information like so:
Query | SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position |
Select type | SIMPLE |
Possible Keys | |
Key | |
Extra | Using where; Using filesort |
So this will tell you that there are no keys used, and no possible keys to use. If you now create a migration as follows:
class CreateIndexes < ActiveRecord::Migration
def self.up
add_index :projects, :account_id
end
end
Now run the query again after migrating:
Query | SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position |
Select type | SIMPLE |
Possible Keys | index_projects_on_account_id |
Key | index_projects_on_account_id |
Extra | Using where; Using filesort |
But we could do better. Notice that in the extra field it says “Using filesort”. This means that MySQL is doing a normal sort using the row data. If projects are usually selected by account_id and ordered by position then we have a good case for sorting by index. We can’t just add an index on the position column as this won’t be used (try it and you’ll see exactly the same results). Instead we have to add an index that covers both account_id and position:
add_index :projects, [:account_id, :position]
Now looking in RPM again you’ll see things change to:
Query | SELECT * FROM `projects` WHERE (`projects`.account_id = 3) ORDER BY position |
Select type | SIMPLE |
Possible Keys | index_projects_on_account_id_and_position |
Key | index_projects_on_account_id_and_position |
Extra | Using where |
I talked about using two tools to work out where to put indexes. New Relic is great and gives lots of useful information, but I wanted to see what queries were running without indexes as the application was used, in order decide if they required optimizing. I placed the following (ugly) snippet of code into the development or staging environment file:
config.after_initialize do
unless ActiveRecord::Base.connection.respond_to?(:old_execute)
ActiveRecord::Base.connection.class.class_eval do
alias_method :old_execute, :execute
def log_explain(str)
if str =~ /^SELECT.*WHERE/i
results = old_execute('EXPLAIN %s' % str)
while row = results.fetch_row
if row[5].blank?
File.open(RAILS_ROOT + '/log/sql.log', 'a') do |f|
f.puts str
f.puts row.inspect
end
end
end
end
end
def execute(*args)
log_explain(args.first)
old_execute(*args)
end
end
end
end
This performs a similar task to RPM in that it runs MySQL’s EXPLAIN statement on each SELECT statement that runs. However, I’ve set it up to filter out any statements that are already using keys (if row[5].blank?). Now by tailing the file log/sql.log I’ll be able to see any unindexed queries that are running on the database. I can use this information to quickly run through the whole application to find any problem areas.
Most of the database calls from a rails application are going to be quite generic, the same thing over and over, and indexing is good in this situation. Some database calls are going to be a little different – think search, statistical count, etc. You’ll need to think careful if these require indexes or not, considering how often they’re run, and how fast they need to be. Caching can be a good replacement in these areas. Also take into account that indexes slow down database writes, and indexes on string columns are going to be much more intensive that indexes on integer columns.
Of course you might not find all problem areas, or you might not be optimizing the correct queries. These methods should give you a good idea, but they’re no replacement for doing good load testing on your application. We’ll be posting more about load testing a rails application in the future.