A Few Hiccups and A Lot of Improvements

Earlier this morning, we finished making some long-overdue, behind-the-scenes improvements to Sifter. We’ve been working on these changes since our unscheduled downtime a couple of weeks ago, and making sure that doesn’t happen again has been our top priority. Unfortunately, making these changes meant accepting some minor downtime now in order to lay a better foundation for the future.

The Hiccups

In preparing for and making the switch, we experienced a few additional windows of downtime over the last couple of days. We’re really sorry for that, but long-term, we think the improvements will far outweigh the temporary inconvenience. We really appreciate everyone’s patience during those times.

The Improvements

For the most part, we wanted to make things more stable and reliable, faster, safer, and easier for us to work with so that going forward we can spend our time focusing on the application and not the platform. So we’ve spent the last couple of weeks making dramatic improvements to our processes and platforms.

Server Stack

We’ve switched from Mongrels to Passenger, and we’re using Ruby Enterprise Edition. We’re seeing dramatic improvements in memory usage and response times, and hopefully the application feels a little snappier and more responsive. Passenger is also making life easier on us in several ways.

Source Control

We made the transition from Subversion to Git, and consequently to GitHub. We also ironed out a few other source control things along the way. The bottom line is that source control is going to be much less of a headache going forward.

Backups

It’s something you always hope that you never have to think about, but backups are important. We’ve gone from daily snapshots to daily and weekly snapshots, keeping the last 7 days and the last 4 weeks, and we’re keeping copies of those backups on S3. This is all on top of the RAID configuration. Long story short, our backups are now redundant and distributed. (Naturally, the backups are encrypted as well, since they’re being pushed to S3.)
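
For the curious, here is a rough sketch of what a "last 7 days plus last 4 weeks" retention rule can look like. It is written in Python purely for illustration and is not our actual backup script. The function name, its parameters, and the choice to treat the newest snapshot in each ISO calendar week as that week's "weekly" snapshot are all assumptions made for the example.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=4):
    """Return the set of snapshot dates a 7-day / 4-week policy would retain.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    keep = set()

    # Daily tier: keep every snapshot from the last `daily` days, today included.
    daily_cutoff = today - timedelta(days=daily - 1)
    keep.update(d for d in snapshot_dates if daily_cutoff <= d <= today)

    # Weekly tier (assumption): the newest snapshot in each ISO week stands in
    # as that week's weekly snapshot.
    newest_per_week = {}
    for d in sorted(snapshot_dates):
        if d <= today:
            year, week = d.isocalendar()[:2]
            newest_per_week[(year, week)] = d

    # Keep the weekly snapshot for the most recent `weekly` weeks that have one.
    for week_key in sorted(newest_per_week)[-weekly:]:
        keep.add(newest_per_week[week_key])

    return keep
```

Anything not in the returned set would be a candidate for deletion. Note that the most recent weekly snapshot usually overlaps with the daily tier, so the total kept is typically a little under 11.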

Monitoring

We’ve been using Monit since we launched, but our outage two weeks ago exposed the fact that a local monitoring service can’t help if the entire machine goes down. So, while we’re still using Monit, we’ve also begun using Pingdom as an external monitoring service dedicated to keeping an eye on everything and letting us know if anything is offline.
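
To make the local-versus-external distinction concrete, here is a minimal sketch of the kind of check an outside service performs: request a page from a different machine and treat any failure to get a healthy response as "down." This is a Python illustration only. It is not Pingdom's API or implementation, and the function name, the `opener` parameter, and the URL in the usage note are hypothetical.

```python
import urllib.request


def is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx or 3xx status within `timeout` seconds.

    Hypothetical helper: `opener` exists only so the check can be exercised
    without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # urllib's URLError and HTTPError, refused connections, and timeouts
        # are all OSError subclasses; any of them counts as "down" here.
        return False
```

Run from a separate machine on a schedule, something like `is_up("https://example.com/")` keeps working even when the server being watched is completely offline, which is exactly the case a local monitor can't cover.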

Summary

That’s really just a quick overview of the more significant changes that we’ve made, but rest assured that we’ve done much more than that. We just don’t want to bore you with details. We also owe some thanks to Ryan Schwartz for his help with some of the fiddlier bits of server administration. He definitely helped make life easier. Now that all of these improvements are behind us, we’re free to focus on the application itself. Thanks for your patience during the transition. We’ve got some big plans for the coming months.