We can normally perform most maintenance work while keeping everything accessible, but last weekend's maintenance was different. We performed a major infrastructure upgrade that will help us overcome some of the problems we've been having for the past couple of weeks due to higher-than-anticipated growth. Some email notifications and reminders were being randomly skipped, and some pages were not loading reliably, or were doing so at a snail's pace.
The infrastructure that powered the site did not allow us to react to load changes quickly enough, and the site was simply overloading at times. To remedy the situation, we decided to switch to a new, more flexible infrastructure and to time the switch to coincide with some of the behind-the-scenes enhancements we had been working on for some time to speed things up.
For the past two weeks, we had been setting up and testing this new infrastructure, and we decided to pull the switch on Sunday. Almost everything had been switched over by Saturday night with no downtime for our users, and because we had rehearsed things several times, we were quite confident the whole process would result in only about an hour of downtime during the night from Saturday to Sunday. Unfortunately, shortly after we had begun migrating the live data to the new infrastructure, our main ISP suffered some severe bandwidth issues that affected the process drastically, making it roughly 8 times slower than normal. To top it off, some things could not be fully tested without switching to the new system, and we experienced some configuration issues early on when we brought everything back online. All in all, it took the better part of Sunday to make the site almost fully operational, but some issues remained until late afternoon on Monday.
We are now confident everything has been fully resolved and is operating normally.
But what have you gained in all this?
- Well, things will be consistently faster, for one. We have much more power available, and more room and flexibility to grow too.
- We can now focus on adding new features and tweaking things.
- Some features that would have exacerbated our capacity problems can now be considered and included, such as automatic reminder calls.
The next post will be about a feature enhancement... we swear :)