Security Metrics - Counting Security and Compliance Incidents
Many IT security managers I speak with want to produce some sort of graph or statistical data that records the number of security incidents occurring on the network. This data is used not only to inform management of business risk, but also to justify budget for ongoing security and compliance activities. In this blog, we will consider several high-level sources of "incident data" and discuss their relevance for tracking in the enterprise.
What is an Incident?
Every organization will have a different level of comfort for what constitutes an incident. I think some useful parameters when tracking incidents include:
- Was the incident self-inflicted?
- Was the organization a specific target, or was this a public issue?
- Was any data lost or a server compromised?
- Was there a policy violation?
In the 90s, I felt that many security professionals focused on external threats. In the past few years, however, I've seen the focus shift toward employee activity: insider threats and client-side compromises.
When classifying an incident, attempting to track the root cause is very important. For example, for a virus-infected computer it would be very interesting to know if the computer had anti-virus software, if its configuration had been hardened to resist infection, or if there was some vulnerability on the system that allowed the infection. Simply tracking virus infections may seem useful, but being able to determine the root cause of these incidents is more useful.
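As a minimal sketch of how the parameters above and a root cause could be attached to each incident record (the field names and cause labels are hypothetical, not a standard schema):

```python
from dataclasses import dataclass
from collections import Counter

# Hypothetical incident record; field names are illustrative only.
@dataclass
class Incident:
    category: str         # e.g. "virus", "data_loss", "policy_violation"
    self_inflicted: bool  # was the incident self-inflicted?
    targeted: bool        # specific target vs. opportunistic/public issue
    root_cause: str       # e.g. "no_antivirus", "unhardened_config", "unpatched_vuln"

def root_cause_breakdown(incidents, category):
    """Count root causes for one incident category, e.g. virus infections."""
    return Counter(i.root_cause for i in incidents if i.category == category)

incidents = [
    Incident("virus", True, False, "no_antivirus"),
    Incident("virus", True, False, "unpatched_vuln"),
    Incident("virus", True, False, "no_antivirus"),
    Incident("data_loss", True, True, "unhardened_config"),
]
print(root_cause_breakdown(incidents, "virus"))
```

A breakdown like this turns "we had three virus infections" into "two of three infections hit machines with no anti-virus coverage," which is the kind of statement a manager can act on.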
With the increased use of automation by bad actors on the Internet, it can sometimes seem like an attack is directed at your organization when in fact the attack is pure chance. Consider credit card fraud. If you run an e-commerce site within your infrastructure, it may be used by someone with a bank of stolen credit card data to test if the cards are valid, or perhaps even to brute force the CVV. In this case an incident is occurring, but more than likely your organization will not suffer any great losses.
Tracking Manually Reported Incidents
Relying on humans to volunteer that they have been compromised, lost data or suffered some sort of denial of service is difficult. The individuals involved may not even know that they have been a victim. They could also misdiagnose an issue, such as mistaking a slow Windows XP laptop for a virus infection.
Having said that, if you are a large organization, tracking this data through help desk tickets or a corporate incident response system can provide useful statistics for managers. Classifying this data up front and sharing this classification with managers can help explain fluctuations in the trends.
Tracking Automated Incidents
The "holy grail" of this sort of ongoing tracking is to automate as much as possible. If you have a NIDS, UTM, DLP or SIM, there is certainly no shortage of data; however, presenting raw log and event counts to managers over time can be very misleading.
Presenting raw counts and trends justifies the existence of the device producing the logs, but not much more. For example, a SPAM-blocking appliance that can graph the number of stopped SPAM emails is interesting, but what a business manager wants to know is how much real SPAM got through and whether there was an impact to the business. Similarly, an IPS will gladly tell you about all of the events it stopped, but it is the events that got through undetected which are the most concerning.
Tracking internal abuse is also difficult to audit, as tracking an individual can be a political issue as well as a technical one. Many organizations I've dealt with have DLP monitoring teams that report directly to the HR group. I think this is an interesting trend, as any data leakage event could really be a violation of some sort of company policy.
I have yet to see a SIM, NIDS or DLP technology in production that has been configured so well, and has such a low false positive rate, that its results are sent in real time to any consumer outside of the security monitoring or technical staff. More often, I see this technology leveraged by human operators who make the final, informed decision about whether an event or alert constitutes an incident.
Tracking Policy Violations
Regardless of whether policy violations are detected through an automated system or through human observation, they deserve tracking separate from security incidents. Their uniqueness comes from the fact that most policies are black and white.
This is not a tongue-in-cheek comment. Compared to a network IDS signature for a zero-day attack or some sort of behavioral anomaly system, detecting deviations from acceptable configuration settings, network activity, lists of authorized users and so on is much simpler. These policy violations should also carry instantly recognizable penalties, such as HR action, budget reductions and so on.
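Because policy is black and white, a violation check reduces to comparing observed settings against a baseline. A minimal sketch, with made-up setting names and values:

```python
# Hypothetical policy baseline: every setting has exactly one acceptable value.
REQUIRED = {
    "password_min_length": 12,
    "ssh_root_login": "no",
    "antivirus_enabled": True,
}

def policy_violations(actual):
    """Return (setting, expected, observed) for every deviation from the baseline."""
    return [
        (key, expected, actual.get(key))
        for key, expected in REQUIRED.items()
        if actual.get(key) != expected
    ]

host = {"password_min_length": 8, "ssh_root_login": "no", "antivirus_enabled": True}
print(policy_violations(host))
```

There is no signature tuning or anomaly threshold involved: each setting either matches the baseline or it doesn't, which is why policy violations lend themselves to clean counting and unambiguous penalties.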
If your organization is governed by PCI, FISMA, NERC or other types of regulations, then tracking violations of these policies is something that your management will be looking for. In more sophisticated environments, one type of monitoring activity could be applicable to multiple regulations.
Proving a Negative
One last thought I'd like to leave with readers is to structure what they consider an event around the "proof" that they are secure. Proving that you are secure is something like proving a negative, because there may always be something that was missed. However, if you read through Richard Bejtlich's list of ways one can "prove" that things are OK, it could give your organization a list of items that need to be reported on in an automated fashion.
I've suggested this sort of approach to some of our customers and I've gotten the complaint that this type of monitoring is too invasive and has an impact on the operational business. I could not disagree more, and feel that most organizations that claim this either don't have enough resources to run their network or don't have the proper priorities from senior management.
I tell Tenable customers to follow these guidelines:
- Produce graphs and charts that show your monitoring and log collection systems are indeed working and trending over time. You have to do this to justify current solutions, and perhaps to obtain coverage for places in your network where you don't have visibility.
- Produce dedicated reports on the distinct types of policy violations that are driven by regulatory compliance or internal corporate IT governance.
- Manually track historic and open "incidents". Pull in data from your monitoring solution, but don't let your monitoring solution automatically conclude what is an incident that management needs to be aware of.
- Consider whether your organization and operations have situations where offering continued "proof" that your network is secure is worthwhile.
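The first guideline above can be sketched as a simple count of events per sensor per day; the sensor names and dates here are made up. The point of the trend is to answer "are my monitoring systems still reporting?", not "what is an incident?":

```python
from collections import Counter
from datetime import date

# Hypothetical log-collection records: (sensor, day the event was received).
events = [
    ("firewall", date(2009, 5, 1)), ("firewall", date(2009, 5, 1)),
    ("nids",     date(2009, 5, 1)), ("firewall", date(2009, 5, 2)),
]

def daily_counts(events):
    """Events per (sensor, day); these counts feed the trend graph."""
    return Counter((sensor, day) for sensor, day in events)

counts = daily_counts(events)
# A sensor missing from a given day signals lost coverage, not a quiet network.
missing = [s for s in {"firewall", "nids"} if (s, date(2009, 5, 2)) not in counts]
print(missing)
```

A day with zero events from a sensor that normally reports is itself worth graphing: it distinguishes "nothing happened" from "we stopped watching," which is exactly the visibility gap the first guideline is meant to expose.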