Story

One evening, when I was working on a large customer data store, there was an emergency team meeting to figure out how to handle a bug that had incorrectly hashed the Passport authentication identifiers in our SQL databases. About 20 of us were called into a 10-person conference room to discuss options.

A dev on my team reported he had already patched the code and had a SQL script to repair the existing data. However, some random guy from another team had been talking to my boss and followed us into the meeting room. He decided to take over the meeting and loudly declared that we first had to estimate how long running the script would take. My boss, normally outspoken and decisive, deferred to him. Everyone assumed he must be important and knew what he was doing.

He grabbed a whiteboard pen and started doing a calculation on the board. He stated that updating a row in SQL probably takes at best 1 millisecond when within a transaction. He also wrote out a bunch of hash algorithm statements that he said added up to 2.5 milliseconds per row, for a total of 3.5 milliseconds per row. We had 400 million rows that needed updating. He then calculated this would take 400 million x 3.5 = 1.4 billion milliseconds => 1.4 million seconds => 23 thousand minutes => 388 hours => 16 days.

No one could use the system until all the rows were updated. The room was in a panic. I thought that we should get the script started ASAP. I pulled the dev and a service engineer into the hallway to confirm they had confidence in the script and to ask them to get it running.

In the meantime, random guy started drawing up a shift rotation schedule for each person to be on site watching the script run over the next 16 days. After 15 minutes, as random guy was finalizing the shift schedule, the dev and service engineer returned. The dev whispered the results in my ear.

Random guy was literally giving a pep talk to the room about how hard it was going to be working 24 hours a day, but that it would be a worthy and valiant effort. I tried to interrupt him, but he didn't give me any opportunity.

Finally, he finished. I let the room know that we had run the script and it completed in under 15 minutes. The script updated over 400 rows per millisecond. He muttered that was impossible as his chest caved and he hung his head.

While this guy was a bit of a blowhard, he followed how developers are taught to estimate algorithms. We mentally walk through the steps, estimate the cost of each instruction, account for loops, and multiply by the number of items to get the result.
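To make that concrete, here is the whiteboard recipe written out as a small Python sketch. The figures are the ones from the meeting; the variable names are mine, and the 15 minutes is an upper bound on the real run time, so the throughput is a lower bound.

```python
# The whiteboard recipe: per-row cost multiplied by the number of rows.
ROWS = 400_000_000          # rows that needed repair
UPDATE_MS_PER_ROW = 1.0     # his guess for a SQL update inside a transaction
HASH_MS_PER_ROW = 2.5       # his guess for the hash computation

estimated_ms = ROWS * (UPDATE_MS_PER_ROW + HASH_MS_PER_ROW)
estimated_days = estimated_ms / 1000 / 60 / 60 / 24

# What actually happened: the run finished in under 15 minutes.
ACTUAL_MINUTES = 15         # upper bound on the measured run time
actual_ms = ACTUAL_MINUTES * 60 * 1000
rows_per_ms = ROWS / actual_ms
overestimate_factor = estimated_ms / actual_ms

print(f"estimated: {estimated_days:.1f} days")
print(f"actual throughput: at least {rows_per_ms:.0f} rows per millisecond")
print(f"estimate was off by a factor of about {overestimate_factor:.0f}")
```

Run as written, this prints an estimate of 16.2 days, a throughput of at least 444 rows per millisecond, and an overestimate factor of about 1556. Every step of the arithmetic is correct; only the per-row guesses were wrong.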

The problem was that computation speed had exceeded human imagination. It is hard for humans to understand how fast 1/400th of a millisecond goes by. We like to think we can understand that kind of processing power, but we can't be precise enough to avoid errors of several orders of magnitude. This was back in 2005. Today, that script would likely complete in seconds.

We shouldn't work out estimates in our heads. Instead, we have to run the code in a production or test environment and measure the real performance. Don't be that (random) guy.
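As a rough illustration of measuring instead of guessing, here is a minimal Python sketch that times a repair on a small sample and projects the total. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the real SQL server, the `users` table and its columns are hypothetical, and SHA-256 stands in for whatever the corrected hash actually was. It is not the script from the story.

```python
import hashlib
import sqlite3
import time

SAMPLE_ROWS = 20_000        # hypothetical sample size
TOTAL_ROWS = 400_000_000    # row count from the story


def rehash(identifier: str) -> str:
    """Stand-in for the corrected hash; SHA-256 is an assumption."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def measure_repair(sample_rows: int) -> float:
    """Build a sample table, repair every row, and return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, ident TEXT, ident_hash TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, ident, ident_hash) VALUES (?, ?, ?)",
        ((i, f"user-{i}", "bad") for i in range(sample_rows)),
    )
    conn.commit()

    # Time only the repair itself, not the setup above.
    start = time.perf_counter()
    rows = conn.execute("SELECT id, ident FROM users").fetchall()
    conn.executemany(
        "UPDATE users SET ident_hash = ? WHERE id = ?",
        ((rehash(ident), row_id) for row_id, ident in rows),
    )
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


elapsed_s = measure_repair(SAMPLE_ROWS)
projected_hours = elapsed_s / SAMPLE_ROWS * TOTAL_ROWS / 3600
print(
    f"{SAMPLE_ROWS} rows in {elapsed_s:.3f} s; "
    f"projected {projected_hours:.1f} h for {TOTAL_ROWS} rows"
)
```

The projection assumes the cost scales linearly with row count, which is itself a guess, and the numbers will differ on every machine. The point is only that a few seconds of measurement on a sample beats a whiteboard; the run on a real production or test copy is the measurement that counts.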