Example Network Behavior Analysis Detection (NBAD) with the Log Correlation Engine
All Log Correlation Engine licenses include the stats daemon. This daemon reads any log source, including netflow or sniffed TCP sessions, builds a baseline of normal activity, and then creates alerts when statistically significant activity occurs. This blog entry explains in greater detail how the stats daemon accomplishes this and discusses several example "anomaly" detections.
Tenable's Correlation Model in General
The stats engine for log and event analysis is part of our overall strategy for detecting significant events on your network. The correlation technology includes:
- Real time correlation of vulnerabilities with intrusion events from most leading NIDS solutions
- More than 50 TASL scripts for sophisticated real time discovery of worms, specific types of anomalies, compliance issues and compromises
- The stats engine to find statistically significant events on your network
Each of these events is viewed under the Security Center. This means any event can be viewed for specific asset groups by asset owners as a means of filtering and reporting.
The Statistical Model
The stats daemon keeps track of the following statistics for each IP address on your network:
- number of client and server connections
- number of internal, external and inbound connections
- number of unique events or normalized logs which occur
These statistics are maintained for each hour of each day. The statistics consider both the rolling averages of any particular occurrence, as well as the grouping of the values which build the rolling averages. For more information on how the mathematics works, please read the "Statistics Daemon Guide".
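To make the idea of a per-host, per-hour baseline concrete, here is a minimal sketch in Python. It is not the stats daemon's actual algorithm (the "Statistics Daemon Guide" covers that). It assumes a baseline keyed by IP address, event name and hour of day, and it swaps in a simple cumulative running mean and standard deviation (Welford's method) in place of the daemon's rolling averages. The names RunningStats, baselines and record_hourly_count, and the address 192.0.2.10, are hypothetical.

```python
import math
from collections import defaultdict


class RunningStats:
    """Running mean and sample standard deviation (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        # Sample standard deviation; 0.0 until at least two samples exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


# Assumed layout: one baseline per (ip, event name, hour of day).
baselines = defaultdict(RunningStats)


def record_hourly_count(ip, event, hour, count):
    """Fold one hour's event count into that host's baseline for that hour."""
    baselines[(ip, event, hour)].update(count)


# Four days of counts for the 4:00 AM hour on one hypothetical host.
for count in (10, 12, 8, 10):
    record_hourly_count("192.0.2.10", "TNM-TCP_Session_Completed", 4, count)

stats = baselines[("192.0.2.10", "TNM-TCP_Session_Completed", 4)]
print(stats.mean, stats.stddev())
```

Keeping a separate baseline for each hour means a busy 2:00 PM does not mask an unusual 4:00 AM.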
None of the items tracked by the stats daemon is "critical" in nature. For example, a Windows IIS server may normally have "server" connections and not act like a client. It may make outbound connections for logging, to obtain patch updates and perhaps to synchronize with other servers.
In the classic example of "NBAD", if the machine were compromised and used for a botnet it may begin to connect to IRC or some other form of proprietary command and control. It may even start to scan other devices. In those cases, perhaps the client/server statistics as well as the balance of internal/external and inbound connection rates would generate an alert.
We'll consider a few examples of statistical events with screen shots from live networks. These screen shots came from live systems, and have had their IP addresses obscured.
First Example Analysis - Spike of Web traffic on an email server
Below is a summary of all stats anomalies for the last 24 hours at a major university. There are a lot because the stats daemon is very sensitive to small changes. These particular events have been selected for systems which accept SMTP traffic. There are a variety of events, including changes in connections and a single "Large_Anomaly" event. Since mail operates 24x7, detecting slight changes in the balance of email coming in from the outside world as compared to being delivered internally isn't that interesting. However, the Statistics-Large_Anomaly event is certainly worth investigating.
Below is the SYSLOG content for the Statistics-Large_Anomaly event. It tells us a few things. First, it is a "network spike" event. The TNM-TCP_Session_Completed event is generated by the Tenable Network Monitor, which logs when specific TCP sessions start and stop. Normally for this IP address between 4:00 AM and 5:00 AM, we see 207 hits of this event with a standard deviation of 403 hits. However, today we got 45178 hits.
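To get a feel for how far outside the norm that is, we can express the spike as a number of standard deviations above the mean. This is plain arithmetic on the three numbers in the log, not necessarily how the stats daemon scores an event, and the function name deviations_from_mean is hypothetical.

```python
def deviations_from_mean(observed, mean, stddev):
    """How many standard deviations the observed count sits above the mean."""
    return (observed - mean) / stddev


# Values taken from the Statistics-Large_Anomaly log described above.
score = deviations_from_mean(observed=45178, mean=207, stddev=403)
print(round(score, 1))  # 111.6
```

Even with a standard deviation larger than the mean itself, which suggests this hour is normally quite variable, the count is more than 100 standard deviations above the baseline.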
Investigating further, using the IP address in question, for the last 24 hours, we can look at a summary of all events as well as a time-based summary. Although interesting, the specific events (specifically the Snort events) are fairly normal for this host. If they weren't we'd get statistical events for the new events as well. The time based summary is more interesting. Clearly between 4:00 AM and 5:00 AM there was a spike.
Investigating even further, we consider two more queries involving both a port summary and a list of involved IP addresses. The conclusion is that a majority of the traffic occurred on port 80 and to a single IP address. The second IP address (the one ending with .129.242) carried the bulk of the events.
So what was this? This turned out to be a proxy device which incorrectly went out to the Internet to keep an updated cache of a certain web site. The proxy was accidentally placed on the list of SMTP servers because it acted like an SMTP server when being scanned. We've actually been able to detect these sorts of things with the Passive Vulnerability Scanner, but in this case, only active scanning was used to identify SMTP devices.
You may be reading this entry and asking yourself, why lead off with a "false positive" as an example? The reason is that Tenable feels that statistically profiling a network's behaviors is very enlightening, but when an anomaly occurs it is not necessarily an evil hacker. More likely it is a change in a server or an issue with assumptions made before monitoring started. Let's look at a second example.
Second Example Analysis - Firewall Traffic Spikes
Below is a screen shot showing event spikes in raw NetScreen firewall logs alongside TNM activity for UDP and ICMP traffic. They all occurred between 11:00 PM and midnight (23:00 for us ex-military types). These particular firewall events are all "accept" events, which means the firewall passed this traffic.
In the case of the network which was being monitored here, the TNM agent was sniffing "inside" the firewall, so we were getting "double" hits. The purpose of this example is to show that NBAD need not be limited to netflow or direct network monitoring. Many types of anomalies can be discovered with logs from firewalls, proxies and even applications.
In the particular example shown, during the time period, event rates of around 10 were the norm, but in some cases, more than 20,000 had occurred during that time period.
Third Example Analysis - Multiple Hours of Misbehaving and Follow-on Activity after IDS events
The last example combines the LCE's TASL time-based correlation with events from the stats daemon. The events generated by the stats daemon also follow a "bell curve" of results. If you refer to the very first image of this blog, you can see that besides the one Statistics-Large_Anomaly event, there were several dozen events of lesser statistical significance.
Given that the stats daemon keeps a separate "model" of normalcy for each host and for each hour of the day, the LCE's TASL scripting can be used to see if a host is behaving oddly for more than one hour at a time. The standard_deviation_long_term.tasl script looks for stats events occurring more than two hours in a row.
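The "more than two hours in a row" idea can be sketched in a few lines. This is Python rather than TASL, and it is not the logic of standard_deviation_long_term.tasl itself. It assumes stats alerts arrive as (ip, timestamp) pairs and reads "more than two hours in a row" as three or more consecutive hourly buckets. The function name and sample addresses are hypothetical.

```python
from datetime import datetime, timedelta


def hosts_with_long_anomalies(events, min_hours=3):
    """Map each IP to its longest run of consecutive hours containing a
    stats alert, keeping only runs of at least min_hours."""
    hours_by_ip = {}
    for ip, ts in events:
        # Truncate each alert to the hour it occurred in.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        hours_by_ip.setdefault(ip, set()).add(hour)

    flagged = {}
    for ip, hours in hours_by_ip.items():
        longest = run = 0
        previous = None
        for hour in sorted(hours):
            if previous is not None and hour - previous == timedelta(hours=1):
                run += 1  # this hour directly follows the previous one
            else:
                run = 1   # gap found, start a new run
            longest = max(longest, run)
            previous = hour
        if longest >= min_hours:
            flagged[ip] = longest
    return flagged


# Hypothetical alerts: the first host misbehaves for three straight hours.
sample = [
    ("192.0.2.10", datetime(2024, 1, 1, 1, 10)),
    ("192.0.2.10", datetime(2024, 1, 1, 2, 20)),
    ("192.0.2.10", datetime(2024, 1, 1, 3, 5)),
    ("192.0.2.20", datetime(2024, 1, 1, 1, 0)),
    ("192.0.2.20", datetime(2024, 1, 1, 2, 0)),
    ("192.0.2.20", datetime(2024, 1, 1, 5, 0)),
]
long_runs = hosts_with_long_anomalies(sample)
print(long_runs)
```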
Similarly, the ids_event_followed_by_change.tasl script looks for critical IDS events inbound to your network followed by some sort of statistical activity. The idea is to correlate an attack detected by a NIDS with a "behavior" change.
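As a rough illustration of this kind of time-based correlation, the sketch below pairs each IDS event with the first stats event seen for the same target host within a time window. It is Python, not TASL, and does not reproduce ids_event_followed_by_change.tasl. The two-hour window, the (ip, timestamp) input format and the function name are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def ids_followed_by_change(ids_events, stats_events, window=timedelta(hours=2)):
    """Return (ip, ids_time, stats_time) for each IDS event that is followed
    by a stats event on the same host within the window."""
    stats_by_ip = {}
    for ip, ts in stats_events:
        stats_by_ip.setdefault(ip, []).append(ts)
    for times in stats_by_ip.values():
        times.sort()

    matches = []
    for ip, ids_time in ids_events:
        for stats_time in stats_by_ip.get(ip, []):
            if ids_time <= stats_time <= ids_time + window:
                matches.append((ip, ids_time, stats_time))
                break  # the first follow-on change is enough
    return matches


# Hypothetical data: only the first host changes behavior soon after the attack.
ids_events = [
    ("192.0.2.10", datetime(2024, 1, 1, 4, 0)),
    ("192.0.2.20", datetime(2024, 1, 1, 4, 0)),
]
stats_events = [
    ("192.0.2.10", datetime(2024, 1, 1, 4, 45)),
    ("192.0.2.20", datetime(2024, 1, 1, 9, 0)),
]
correlated = ids_followed_by_change(ids_events, stats_events)
print(correlated)
```

A stats event that occurs before the IDS event, or long after it, is ignored. That ordering is what separates "attack followed by behavior change" from two unrelated alerts on the same host.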
For more information
Tenable has several white papers on correlation that are free for download in our Security Event Management section. These papers are:
- "Security Event Management" - summarizes Tenable correlation technology, as well as log storage and acting on the events.
- "Correlating IDS Alerts with Vulnerability Information" - shows how NIDS alerts can be correlated with vulnerabilities found through scanning, passive analysis and host analysis
- "Advanced Event Correlation Scripting" - several case studies of TASL scripts
Tenable also has a video of the Log Correlation Engine's log analysis and correlation functions being viewed through the Security Center.