The Cost of Incident Response
Ever since personal computers began populating businesses, security issues have skyrocketed, and infosec pros have been asking the same questions in various forms: “What is the return on investment from all this security software?” and “Can we do without it, or do it cheaper?” To date, no one has come up with an appropriate ROI calculation, but we can answer the second question by developing a formula for how much an incident costs and comparing that figure to the cost of our security investments.
Calculating costs of a malware incident
Before I get into the formula and discussion, a disclaimer: I am not an economist or an accountant, so these formulas are very basic, but they are effective enough that CFOs have accepted their results. Since I couldn’t prove a negative, I developed a simple formula showing how much a malware outbreak may cost, based on employee payroll. It accounts for both the users’ downtime and the technicians’ cleanup time after a malware outbreak:
(((U*hup)*1.3)*DT) + (((A*hap)*1.3)*RT) = incident cost
- U = the number of infected users
- hup = average hourly user pay
- 1.3 = overhead rate (use your organization’s rate)
- DT = hours of downtime
- A = the number of computer technicians and administrators
- hap = average hourly technician/administrator pay
- RT = hours of repair time
The formula is constructed as follows. The number of infected users (U) is multiplied by the average hourly user pay (hup), then by 1.3 (this overhead rate is flexible and represents indirect costs and benefits on top of hourly wages), and then by the downtime (DT) in hours. This portion of the formula calculates the cost of the non-productive time spent by the user community. To that, add the number of computer technicians and administrators (A) involved in the clean-up, multiplied by the average hourly administrator pay (hap), by the same 1.3 overhead rate, and by the repair time (RT) in hours. This portion of the formula gives the infosec employee costs incurred for each outbreak.
For a more detailed calculation, we could drill down to variables such as the number of infected machines, broken down by user machines and servers. This would refine our numbers, but for the points that I’m making here, this formula provides a good starting point.
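As a minimal sketch, the formula can be written as a small Python function. The function and parameter names here are my own illustration, not part of the original formula, and the sample inputs are made up purely to show the arithmetic:

```python
def incident_cost(users, user_rate, admins, admin_rate,
                  downtime_hours, repair_hours, overhead=1.3):
    """Return the direct payroll cost of an incident, in dollars.

    overhead defaults to the 1.3 rate used in the formula; substitute
    your organization's own rate.
    """
    # Non-productive time for the user community: U * hup * overhead * DT
    user_cost = users * user_rate * overhead * downtime_hours
    # Clean-up time for technicians/administrators: A * hap * overhead * RT
    admin_cost = admins * admin_rate * overhead * repair_hours
    return user_cost + admin_cost


# Made-up inputs for illustration only: 10 users at $20/hour down for
# 2 hours, and 1 technician at $30/hour repairing for 3 hours.
example = incident_cost(users=10, user_rate=20, admins=1, admin_rate=30,
                        downtime_hours=2, repair_hours=3)
print(f"${example:,.2f}")  # prints $637.00
```

Keeping the overhead rate as a parameter rather than a constant makes it easy to rerun the numbers for organizations with different benefit structures.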
Small business example
Here’s a hypothetical example. Let’s say an outbreak hits a facility of 100 host machines, one per user. This facility has one administrator/technician for every 20 machines. The average user pay is $23 an hour, and the average computer technician pay is $25 per hour. In this case, for every hour of downtime, the company loses $2,990 in non-productive employee time. The company must also pay the five technicians a combined $162.50 per hour, based on the simple assumption that the technicians will reimage every machine to restore trust in it. Each reimage takes four hours, and each technician can reimage four machines at a time, so each technician’s 20 machines take five batches, or 20 hours of repair time. This amounts to another $3,250 on top of the users’ $11,960 (four hours of downtime per user, although in the real world, the lucky ones would be up in four hours while the last machines to be fixed would be down for the full 20 hours), so this event costs the company roughly $15,210 in direct employee costs:
(((100*$23)*1.3)*4) + (((5*$25)*1.3)*20) = $15,210
Now let’s assume that this same outbreak in the same organization was detected early on. Using the same assumptions and parameters, let’s say the outbreak was caught on three machines rather than all 100. Our four hours of user downtime now costs only $358.80, and one technician working for four hours costs just $130. That’s a total employee cost of only $488.80. Add in just a half day of downtime as opposed to half a week, and everyone is in a much better place:
(((3*$23)*1.3)*4) + (((1*$25)*1.3)*4) = $488.80
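The two scenarios can be compared side by side with a short Python sketch. It also derives the repair time from the reimaging assumptions in the example (four hours per reimage, four machines at a time per technician). The helper names are hypothetical, and the sketch assumes one user per machine and that machines are split evenly among technicians:

```python
import math

OVERHEAD = 1.3       # overhead rate used in the small business example
USER_RATE = 23.0     # average hourly user pay
TECH_RATE = 25.0     # average hourly technician pay
REIMAGE_HOURS = 4    # hours per reimage
BATCH_SIZE = 4       # machines one technician can reimage at a time


def repair_hours(machines, technicians):
    """Hours each technician works, assuming an even split of machines."""
    per_technician = math.ceil(machines / technicians)
    batches = math.ceil(per_technician / BATCH_SIZE)
    return batches * REIMAGE_HOURS


def incident_cost(users, technicians, downtime_hours):
    """Direct payroll cost, assuming one user per infected machine."""
    user_cost = users * USER_RATE * OVERHEAD * downtime_hours
    tech_cost = (technicians * TECH_RATE * OVERHEAD
                 * repair_hours(users, technicians))
    return user_cost + tech_cost


full = incident_cost(users=100, technicians=5, downtime_hours=4)
early = incident_cost(users=3, technicians=1, downtime_hours=4)

print(f"full outbreak:   ${full:,.2f}")          # $15,210.00
print(f"early detection: ${early:,.2f}")         # $488.80
print(f"difference:      ${full - early:,.2f}")  # $14,721.20
```

Under these assumptions, early detection avoids about $14,700 in direct employee costs for this single incident.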
Large organization example
Now take a look at the formula in a larger organization that has higher pay and better benefits for its employees. In this case, we will presume the average technician earns $90,000 per year and the average office employee earns $85,000, which works out to hourly rates of about $43.27 and $40.86 respectively (based on a 2,080-hour work year). In this company, benefits are 60% of base pay, so we have to increase the 1.3 multiplier to 1.6. We will assume the same number of computers and employees involved in the incident, along with the same downtime and repair time, so our formula is now:
(((100*$40.86)*1.6)*4) + (((5*$43.27)*1.6)*20) = $33,073.60
In this case study, the technicians’ time costs the company $6,923.20, and the other employees’ time is worth $26,150.40. Compared to the same-size incident in the small business, the larger organization incurs more than double the total incident cost purely because of its higher overhead rate and salaries, even though the incident involves the same number of machines.
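For readers who want to plug in annual salaries directly, here is a rough Python sketch of the conversion and the comparison. It assumes a 2,080-hour work year (52 weeks of 40 hours), which is my inference from the hourly figures above rather than something stated outright, and the function names are illustrative only. Because it does not round the hourly rates to the cent, its total lands a few dollars away from the $33,073.60 figure:

```python
HOURS_PER_YEAR = 52 * 40  # assumption: 2,080 paid hours per year


def hourly_rate(annual_salary, hours_per_year=HOURS_PER_YEAR):
    """Convert an annual salary to an hourly rate."""
    return annual_salary / hours_per_year


def incident_cost(users, user_rate, admins, admin_rate,
                  downtime_hours, repair_hours, overhead):
    """Direct payroll cost: user downtime plus technician repair time."""
    user_cost = users * user_rate * overhead * downtime_hours
    admin_cost = admins * admin_rate * overhead * repair_hours
    return user_cost + admin_cost


# Large organization: salaries converted to hourly rates, 1.6 overhead.
large = incident_cost(users=100, user_rate=hourly_rate(85_000),
                      admins=5, admin_rate=hourly_rate(90_000),
                      downtime_hours=4, repair_hours=20, overhead=1.6)

# Small business from the earlier example: hourly pay given, 1.3 overhead.
small = incident_cost(users=100, user_rate=23, admins=5, admin_rate=25,
                      downtime_hours=4, repair_hours=20, overhead=1.3)

print(f"large organization: ${large:,.2f}")  # $33,076.92
print(f"small business:     ${small:,.2f}")  # $15,210.00
print(f"ratio: {large / small:.2f}x")        # 2.17x
```

The ratio shows how strongly pay and overhead drive the result even when the number of machines, the downtime, and the repair time are held constant.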
This little formula does not take into account things such as lost revenue during downtime, or the cost of a tarnished corporate reputation. Rather, it’s a starting point for discussions with those in financial circles.
In the earlier days of malware outbreaks, when I was working as a counter-virus specialist, I recall working 36 out of 48 hours to contain some very fast-spreading worms. That didn’t include the 8 hours we were disconnected from the Internet, and at that time, some companies generated $3 million in revenue per hour from the Internet. These days, malware authors have figured out that a lower volume of traffic makes malware more difficult to detect. By making their creations stealthier, they keep the malware undetected for longer and can raise the cost to an organization by stealing its intellectual property. The longer the malware is active, the more information it can export to its controllers.
An ounce of prevention
I have often stated that it costs three times more to clean up an incident than to prevent one. Simple formulas like this explain why we’re investing in our security software and why earlier detection is so important. When you use utilities such as Tenable Network Security’s SecurityCenter Continuous View™ for continuous network monitoring, you can detect abnormalities in your traffic, leading to earlier detection of suspected outbreaks, lower overall response costs, and less stress and fewer headaches for those of us involved with incident response.