JoeHruska.com

What do I want to be when I grow up?

GoDaddy disables RescueTime.com with NO Warning!

We are not too pleased with GoDaddy right now.

We had the RescueTime.com domain set to auto-renew every 2 years.  The credit card they had on file was no longer valid (one of the frequent instances where banks randomly send out new credit cards and invalidate old ones to stay ahead of fraud).

The amazing thing is that GoDaddy sent the notice that the billing failed at midnight last night.  And then promptly parked the domain!  No grace period.

GoDaddy bombards us with notices about all of the domains that we DON’T have set to auto-renew.  RescueTime.com?  No courtesy note saying, “Hey, even though you have 15 domain names that you’ve paid for successfully, this one isn’t auto-renewing successfully – maybe use another card?”

They just parked the domain.

Given this, I think it is MUCH safer to NOT auto-renew.

We’ve renewed the domain – and are waiting for GoDaddy to re-enable RescueTime.com.  Apologies to folks who are being inconvenienced.  All of your data is still there, and while RescueTime.com is down your tracking data is being stored locally on your computer and will be sent to our servers as soon as we are back up.

Thank you


What is this Y Combinator thing all about?

I’ve had a few of my friends ask me what Y Combinator was all about and what I’ve been up to.

Y Combinator (YC) provides mentorship and seed funding for early-stage tech startups.  Twice a year, they accept applications from thousands of prospective startups.  Out of those thousands of applications, they interview a smaller group of candidates, and from those interviews they accept 20 to 27 companies to take part in the YC “experience”.

Tony Wright, Brian Fioca and I applied to YC in Oct. 2007 and received notice on October 19th, 2007 that we were being invited to introduce RescueTime to the YC partners.  We were all really excited about the opportunity to even interview and we spent a large amount of time researching and preparing.

Long story short (I’ll do another post on the details of my YC experience) – we were accepted into YC and Tony, Brian and I moved down to Silicon Valley on January 4th to get RescueTime, Inc. kicked off.

We spent 3 months in Silicon Valley going through the Y Combinator program.  This essentially entailed spending an insane amount of time being hugely productive, building and refining RescueTime.com.  YC holds weekly dinners for the group where they bring in guest speakers to present and do Q&A sessions.  Most of the speakers were themselves founders of startup companies.  We got the opportunity to sit down and talk with people like Marc Andreessen, Chris Sacca, Evan Williams, Paul Buchheit, Ron Conway and a ton of other really cool people.

On April 1, 2008, the three of us moved back to Seattle to continue building RescueTime, Inc. as a business.  In September 2008, we closed a round of financing for $875,000 from True Ventures in Silicon Valley and a number of really cool angel investors, including former Googlers and Microsofties, and Tim Ferriss, the author of “The 4-Hour Workweek”.

Since we raised our funding, we’ve added two awesome guys to our team (Montana Low and Mark Wolgemuth), and we’ve been pretty much focused on trying to improve the RescueTime.com offering.


Example MySQL configuration tuned for InnoDB engine

Someone asked for a sample InnoDB configuration, and since nearly all of the MySQL tables in RescueTime are InnoDB, I thought I would post ours.

################# /etc/my.cnf ###################
[client]
port                                 = 3306
socket                               = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket                               = /var/run/mysqld/mysqld.sock
nice                                 = 0

[mysqld]
user                                 = mysql
pid-file                             = /var/run/mysqld/mysqld.pid
datadir                              = /db/data/mysql
log-bin                              = /db/log/mysql/mysqld-bin
log-bin-index                        = /db/log/mysql/mysqld-bin.index
expire-logs-days                     = 7
log-error                            = /db/log/mysql/mysqld.err
log-slow-queries
long-query-time                      = 1
relay-log                            = /db/log/mysql/mysql-relay
relay-log-index                      = /db/log/mysql/mysql-relay.index
#default-storage-engine              = innobase
innodb_data_home_dir                 = /db/data/mysql
innodb_file_per_table
innodb_autoextend_increment          = 50
innodb_log_group_home_dir            = /db/log/mysql
innodb_log_files_in_group            = 2
innodb_log_file_size                 = 100M

# Start Replication Parameters
#server-id                           = 1
innodb_flush_log_at_trx_commit       = 1
#sync-binlog                         = 1
# End Replication Parameters

# Start Performance Parameters
#tmpdir                              = /tmp-ram
max_connections                      = 50
max_heap_table_size                  = 512M
tmp_table_size                       = 512M
table_cache                          = 128
sort_buffer_size                     = 4M
query_cache_min_res_unit             = 1K
query_cache_limit                    = 1M
query_cache_size                     = 100M
max_allowed_packet                   = 16M
thread_stack                         = 128K
thread_cache_size                    = 8
innodb_buffer_pool_size              = 2000M
innodb_additional_mem_pool_size      = 4M
innodb_lock_wait_timeout             = 2
innodb_file_io_threads               = 4
innodb_thread_concurrency            = 8
innodb_flush_method                  = O_DIRECT
transaction-isolation                = READ-COMMITTED
skip-external-locking
# End Performance Parameters

################# /etc/my.cnf ###################
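
As a quick sanity check after restarting mysqld with a config like this (assuming you can connect as an administrative user), the statements below confirm the InnoDB settings took effect and flag any tables that are not actually on InnoDB.  This is just a sketch, not part of the config itself:

-- Verify the buffer pool and log file sizes configured above
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';

-- List any tables in the current database that are NOT using InnoDB
SHOW TABLE STATUS WHERE Engine <> 'InnoDB';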

MySQL InnoDB Clustered Indexes and Rails

Recently we experienced an issue with a high level of IO waits on the primary RescueTime.com database server, which resulted in our MySQL DB being the major bottleneck to good performance.

RescueTime essentially serves as a large data warehouse for thousands of users who send us application and website usage information 24 hours a day. Most of our database activity is made up of inserts and updates coming in from our API web service mongrels. At the moment, we are running 7 dedicated mongrels that do nothing but handle incoming data streams from our users.

One of the major functions of the API mongrel is to build real-time summaries of the incoming second by second attention data. We summarize this information by hour and by day, and this is currently happening in real-time as the API mongrels receive the data streams. This summary data is used by the www.RescueTime.com dashboard to display the usage information to our users, primarily through real-time graphs and analytics.

As we monitored the average round trip for a data stream update, we noticed a significant trend up as we added users. It quickly became obvious that we did not have a scalable solution as our user numbers grew 7% week over week. During our peak load times (6am – 3pm PST), we started seeing timeouts from the user client applications, which resulted in a less than optimal user experience.

Using SeattleRB’s excellent Production Log Analyzer, we were able to see that we were averaging 3.5 seconds per incoming data stream, and with thousands of users (and growing) we were quickly approaching the limit of how many users we could handle with our current infrastructure.

One of the options was to bring on more hardware resources, but we are a bootstrapped startup and we really felt we could squeeze more performance out of our existing hardware.

I dug into the specifics of where that 3.5 seconds was coming from and it turned out that the insert and update statements were taking far longer than we had thought. The reason? Indexes. We had 8 indexes on our daily summary table and 7 on our hourly summaries. We had been so focused on pushing new features that I had really neglected basic DBA duties by being lazy and adding indexes in response to performance issues.

So, I set out to remove as many of the indexes as possible from our summary tables. As I researched how InnoDB stores data, I realized that what InnoDB does by default is to create a clustered index based on the primary key. As I read further, someone equated the InnoDB clustered index to Oracle’s Index Organized Tables (IOTs). KAACHING! A light bulb went off in my head about what was happening.

Rails and ActiveRecord make it really easy for developers, but ActiveRecord does a poor job of structuring database tables for performance. By default, ActiveRecord will create a database table with a single-column primary key of “id”, which is an autonumber. The problem with this is that instead of taking advantage of InnoDB clustered indexes, it essentially dumbs the table back down to being heap-organized, where rows are stored in the order that they are inserted.

This isn’t an issue with smaller data sets, but RescueTime currently adds over 10 million records each week and that number is increasing 7% week over week.

I analyzed one of our primary summary tables (daily_summaries) to find a natural primary key and found that I was able to create a primary key that was 3 columns wide: user_id (int), summary_date (date), and durationable_id (int). After identifying the proposed new primary key, I started analyzing the queries that used daily_summaries and found that all of them used at LEAST user_id, and 90% of them used user_id, summary_date and durationable_id together.
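
For reference, the table as ActiveRecord had created it looked roughly like the sketch below. The column list is trimmed down, and the duration column is a hypothetical stand-in for the measure columns we actually store:

CREATE TABLE daily_summaries (
  id              INT NOT NULL AUTO_INCREMENT,  -- Rails' default surrogate key
  user_id         INT NOT NULL,
  summary_date    DATE NOT NULL,
  durationable_id INT NOT NULL,
  duration        INT NOT NULL,                 -- hypothetical: seconds of attention
  PRIMARY KEY (id)
  -- ...plus the 8 secondary indexes mentioned earlier
) ENGINE=InnoDB;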

I thought to myself: what would happen if I removed ALL of the indexes from daily_summaries and rebuilt the table to drop the primary key on the autonumbered “id” column and add a primary key of (user_id, summary_date, durationable_id)? So I took one of our test instances and did just that. Right away the overall table size dropped to less than half its original size, since all of those indexes took up more space than the data did.
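
In MySQL terms, the rebuild amounted to something like the sketch below. On a table this size you would want to try it against a copy first, and depending on the MySQL version you may need a separate step to MODIFY the AUTO_INCREMENT id column down to a plain INT before its key can be dropped:

ALTER TABLE daily_summaries
  DROP PRIMARY KEY,                -- drop the autonumbered id key
  DROP COLUMN id,
  ADD PRIMARY KEY (user_id, summary_date, durationable_id);

-- The old secondary indexes go away too (the index name here is made up):
-- ALTER TABLE daily_summaries DROP INDEX index_daily_summaries_on_user_id;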

So, how about performance? Well, of course, the first thing that happened was the Rails application puked all over itself, but that was to be expected – I had removed the “id” column Rails was expecting and replaced it with a multiple-column primary key that it knew nothing about. I worked with our resident Rails expert, Brian Fioca, and he was able to locate a great Rails gem called composite_primary_keys that allowed him to modify the existing Rails code with fairly minimal effort and still allow the use of ActiveRecord in most places.

Once the Rails code was working again, we were astonished to see that RescueTime read performance was significantly better, even with NO indexes other than the primary key on the daily_summaries table. The reason was that the InnoDB clustered index had reorganized the daily_summaries data on disk in the order that we retrieve it, meaning a single read operation now returned multiple records for a single user_id, so fewer read operations were needed to satisfy each query.
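
As a concrete illustration, a typical dashboard-style query now walks one contiguous range of the clustered index instead of hopping all over the table. The query shape and the duration column are illustrative rather than our exact SQL:

-- All of this user's rows for the week sit next to each other on disk,
-- because the clustered primary key leads with (user_id, summary_date)
SELECT durationable_id, SUM(duration) AS total_seconds
FROM daily_summaries
WHERE user_id = 42
  AND summary_date BETWEEN '2008-11-01' AND '2008-11-07'
GROUP BY durationable_id;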

So, how about insert and update speed? Again, there were significant improvements in performance, since each insert and update no longer carried the overhead of maintaining the additional indexes.

Getting the production database rebuilt with the new clustered indexes took a couple of hours, thanks to the over 300 million records worth of data that we had collected to date, but the effort was more than worth it. The CPU IO WAIT stats of our database server dropped to almost nothing compared to what they were previously. Our database was less than half the size, overall performance on the site improved significantly, and incoming data streams were being handled in less than a second.

Had we had the money, it would have been easy to throw hardware at this issue instead of spending a couple of days to iteratively tune our existing infrastructure. The benefit that we saw from the InnoDB clustered index was well worth the man-hours spent researching and implementing the solution.


TipJoy Testing

Testing a fellow YCombinator.com startup – TipJoy.com

Trying to add Tip button…


Hello, world!

Not much to say for my first blog post - so I'll just say...


#include <iostream>

int main()
{
  std::cout << "Hello, world!" << std::endl;
  return 0;
}

------------------------------------------
SELECT 'Hello, world!' FROM dual;





      