MySQL InnoDB Clustered Indexes and Rails

Recently we experienced a high level of IO waits on the primary RescueTime.com database server, which made our MySQL DB the major bottleneck to good performance.

RescueTime essentially serves as a large data warehouse for thousands of users who send us application and website usage information 24 hours a day. Most of our database activity is made up of inserts and updates coming in from our API web service mongrels. At the moment, we are running 7 dedicated mongrels that do nothing but handle incoming data streams from our users.

One of the major functions of the API mongrels is to build real-time summaries of the incoming second-by-second attention data. We summarize this information by hour and by day, and this currently happens in real time as the API mongrels receive the data streams. This summary data is used by the www.RescueTime.com dashboard to display usage information to our users, primarily through real-time graphs and analytics.
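One common MySQL pattern for this kind of real-time rollup is an upsert into the summary table; here is a minimal sketch of the idea, assuming a hypothetical duration column and a unique key on (user_id, summary_date, durationable_id) – not necessarily the exact statements we run:

    -- Hypothetical rollup upsert; the values, the duration column, and the
    -- unique key on (user_id, summary_date, durationable_id) are assumptions.
    INSERT INTO daily_summaries (user_id, summary_date, durationable_id, duration)
    VALUES (42, '2008-03-01', 7, 30)
    ON DUPLICATE KEY UPDATE duration = duration + VALUES(duration);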

As we monitored the average round trip for a data stream update, we noticed a significant upward trend as we added users. It quickly became obvious that we did not have a scalable solution with our user numbers growing 7% week over week. During our peak load times (6am – 3pm PST), we started seeing timeouts from the user client applications, which resulted in a less than optimal user experience.

Using SeattleRB’s excellent Production Log Analyzer, we were able to see that we were averaging 3.5 seconds per incoming data stream, and with thousands of users (and growing) we were quickly approaching the limit of how many users we could handle on our current infrastructure.

One of the options was to bring on more hardware resources, but we are a bootstrapped startup and we really felt we could squeeze more performance out of our existing hardware.

I dug into the specifics of where that 3.5 seconds was coming from, and it turned out that the insert and update statements were taking far longer than we had thought. The reason? Indexes. We had 8 indexes on our daily summary table and 7 on our hourly summaries. We had been so focused on pushing new features that I had neglected basic DBA duties, lazily adding indexes in response to each performance issue.

So, I set out to remove as many of the indexes as possible from our summary tables. As I researched how InnoDB stores data, I realized that what InnoDB does by default is create a clustered index based on the primary key. As I read further, someone equated the InnoDB clustered index to Oracle’s Index-Organized Tables (IOTs). KA-CHING! A light bulb went off in my head about what was happening.

Rails and ActiveRecord make things really easy for developers, but ActiveRecord does a poor job of structuring database tables for performance. By default, ActiveRecord creates a database table with a single-column primary key, “id”, which is an auto-incrementing integer. The problem with this is that instead of taking advantage of InnoDB clustered indexes, it essentially dumbs the table back down to being heap-organized, where rows are stored in the order they are inserted.
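To make that concrete, here is roughly the kind of DDL a default ActiveRecord migration ends up producing (a sketch; the non-id columns are illustrative):

    -- Sketch of a default ActiveRecord-style table; only the surrogate id
    -- primary key is the point here, the other columns are illustrative.
    CREATE TABLE daily_summaries (
      id INT(11) NOT NULL AUTO_INCREMENT,
      user_id INT(11),
      summary_date DATE,
      durationable_id INT(11),
      duration INT(11),
      PRIMARY KEY (id)
    ) ENGINE=InnoDB;
    -- InnoDB clusters rows on the primary key, so with a surrogate auto-increment
    -- id the rows end up physically ordered by insertion time, not by how we read them.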

This isn’t an issue with smaller data sets, but RescueTime currently adds over 10 million records each week and that number is increasing 7% week over week.

I analyzed one of our primary summary tables (daily_summaries) to find a natural primary key and found that I was able to create a primary key three columns wide: user_id (int), summary_date (date), and durationable_id (int). After identifying the proposed new primary key, I started analyzing the queries that used daily_summaries and found that all of the queries used at LEAST user_id, and 90% of them used user_id, summary_date, and durationable_id together.
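To illustrate the access pattern (these are representative queries, not necessarily our exact SQL, and the duration column is just for the example), both of the shapes below can be served by a leftmost prefix of a (user_id, summary_date, durationable_id) primary key:

    -- Per-user lookup: uses only the leading user_id column of the composite key.
    SELECT summary_date, durationable_id, duration
      FROM daily_summaries
     WHERE user_id = 42;

    -- Exact summary row: uses the full composite key.
    SELECT duration
      FROM daily_summaries
     WHERE user_id = 42
       AND summary_date = '2008-03-01'
       AND durationable_id = 7;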

I thought to myself: what would happen if I removed ALL of the indexes from daily_summaries, rebuilt the table to drop the primary key on the auto-incremented “id” column, and added a primary key of (user_id, summary_date, durationable_id)? So I took one of our test instances and did just that. Right away the overall table size dropped to less than half its original size, since all of those indexes took up more space than the data did.

So, how about performance? Well, of course, the first thing that happened was that the Rails application puked all over itself, but that was to be expected – I had removed the ID column Rails was expecting and replaced it with a multi-column primary key that it knew nothing about. I worked with our resident Rails expert, Brian Fioca, and he was able to locate a great Rails gem called composite_primary_keys that allowed him to modify the existing Rails code with fairly minimal effort and still allow the use of ActiveRecord in most places.

Once the Rails code was working again, we were astonished to see that RescueTime read performance was significantly better, even with NO indexes other than the primary key on the daily_summaries table. The reason was that the InnoDB clustered index had reorganized the daily_summaries data on disk in the order that we retrieved it, meaning a single read operation now returned multiple records for a single user_id, so fewer read operations were needed to satisfy our queries.

So, how about insert and update speed? Again there were significant improvements in performance, since each insert and update no longer carried the overhead of maintaining the additional indexes.

Getting the production database rebuilt with the new clustered indexes took a couple of hours due to the over 300 million records’ worth of data we had collected to date, but the effort was well more than worth it. The CPU IO wait stats of our database server dropped to almost nothing compared to what they were previously. Our database was less than half its former size, overall performance on the site improved significantly, and incoming data streams were being handled in less than a second.

Had we had the money, it would have been easy to throw hardware at this issue instead of spending a couple of days to iteratively tune our existing infrastructure. The benefit that we saw from the InnoDB clustered index was well worth the man-hours spent researching and implementing the solution.


10 Comments so far

  1. Dave March 10th, 2008 9:56 am

    Joe! You started a blog! That’s awesome. You’re always very helpful and informative at the YC dinners. Now I get your good advice in blog form too :).

  2. Suleman Ali March 10th, 2008 11:47 am

    Amazing that rebuilding the table only took a couple hours for what sounds like a 5GB+ table.

    In my limited experience, it takes a lot longer. I must be doing something wrong.

    Can you share precisely how you rebuilt the innodb table?

  3. Joe March 10th, 2008 11:59 am

    @Suleman,

    Actually I “cheated” a bit on the table rebuilds. The first thing I did was to set up a replicated slave instance on an EC2 xlarge instance (15GB RAM / 4 processors). When I brought the system down for the re-index, I did all of the work on the EC2 instance, which had a lot more power than our primary DB server.

    I then dropped all of the indexes on the summary tables – which decreased the overall size of the table to less than half. Once the indexes were gone, I dropped the ID column and added the new composite primary key – this actually ran quite quickly. After the new primary key was there, I ran an optimize on the table.

    The SQL I used was (the select now() calls are just timing markers between the steps):

    select now();

    alter table duration_summaries
    drop index user_id,
    drop index durationable_id,
    drop index summary_date,
    drop index week,
    drop index month,
    drop index year,
    drop index user_week_year,
    drop index user_month_year,
    drop index date_user_dur_idx,
    modify column user_id int(11) first,
    modify column durationable_id int(11) after summary_date;

    select now();

    alter table duration_summaries
    drop column id,
    drop primary key,
    add primary key (user_id, summary_date, durationable_id);

    select now();

    optimize table duration_summaries;

    select now();

  4. software.simian March 11th, 2008 10:06 am

    I found the experience interesting, but also quite questionable, because

    1. You are starting from a bad setup; only one index at a time can be used, so there was no need to keep all the indexes that were there.

    2. You optimized the table, which can make quite a difference, especially if there is fragmentation.
    I’m not sure, but it’s possible this also reduces the size of the table by reclaiming the unused space in the blocks.

    It seems like the table you are talking about is quite static; have you thought about partitioning? It can improve the performance even more.

    Cheers, saverio

  5. Joe March 11th, 2008 10:25 am

    @software.simian

    1. Actually the ability to merge indexes came out with MySQL 5.0, so it is now possible to use more than one index at a time within a single query (a quick sketch of that follows at the end of this comment).

    2. It is possible that running the optimize table alone could have had a positive effect on performance, but it was not the major factor here. InnoDB tables are a lot less prone to row fragmentation due to the implied clustered index. Deletes from this particular table that I use as an example are very rare.

    3. Partitioning, oh yeah – that is one of the near-term plans for our overall database structure. With MySQL 5.1’s introduction of partitions, this is something I want to take advantage of soon (once I get around to upgrading to 5.1).
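    Here is that quick sketch of index merge from point 1 – purely hypothetical, assuming separate single-column indexes on user_id and summary_date rather than the composite primary key described in the post:

    -- MySQL 5.0+ can intersect two single-column indexes for a query like this;
    -- EXPLAIN shows type = index_merge when the optimizer chooses that plan.
    SELECT * FROM daily_summaries
     WHERE user_id = 42
       AND summary_date = '2008-03-01';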

  6. Eduardo Rocha March 11th, 2008 8:33 pm

    Joe, the reindex process seems to be mostly IO bound. How could your powerful EC2 instance help the rebuild?

  7. Joe March 11th, 2008 10:16 pm

    @Eduardo

    The primary difference on the EC2 instance was that I had 15GB of InnoDB buffer to play with, vs the 2GB InnoDB buffer on our normal DB server. I’m not quite sure how much time was saved overall, since I was never able to complete the table restructuring on our test server before MySQL crashed due to lack of memory.

  8. Ilya Grigorik March 12th, 2008 4:32 am

    How did you guys decrease your tablespace size? In all of my tests (InnoDB specifically), the only way to get the size down is via complete mysqldump and reimport.

    I’ll also second the partitioning comment – it sounds like you guys can get a lot of benefit from it. Likewise, I’d look at the ARCHIVE engine.

    Great post!

  9. Joe March 12th, 2008 6:59 am

    @Ilya

    For InnoDB table maintenance, one of the really important my.cnf parameters is innodb_file_per_table. This will create a [tablename].ibd file for each of your InnoDB tables instead of putting it all in one really large ibdataX file. This way, you can optimize the table to recover space without dumping and reimporting. Hope that helps!
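    A minimal sketch of that flow (the table name is the one from this post; note that innodb_file_per_table only takes effect for tables created or rebuilt after it is enabled):

    -- Confirm the setting, then rebuild the table to shrink its own .ibd file.
    SHOW VARIABLES LIKE 'innodb_file_per_table';
    OPTIMIZE TABLE duration_summaries;  -- for InnoDB this maps to a table rebuild + analyze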
