Crawling Is the Wrong Way To Do Attack Surface Mapping
When analyzing methods to identify assets, crawling should be one tool in the toolbox, but not the only one. If you use crawling exclusively, you’ll likely miss a lot of assets.
Crawling is a common method of DNS enumeration. Like the old scripts that crawled a single site looking for related subdomains, modern crawlers are set loose on the entire internet, looking for anything they can find. This would seem to be a far better way to identify assets than brute-force DNS enumeration, and it usually is. But it is far from ideal. Here’s why:
- Crawling cannot go to 100% depth. Even Google, which spends millions of dollars a year on infrastructure, gets nowhere near 100% crawl depth of a company’s web presence. Think about how many pages are hidden behind login pages or registration forms. For that reason alone, crawling could never get anywhere close to identifying all the linked assets. Additionally, some types of application code suffer from the “calendaring problem”: a calendar, for example, can generate links forever and cannot be perfectly enumerated. The crawler must therefore have either a hard-coded maximum depth or some software to detect that it has hit something with infinite depth. In either case, the crawler will miss anything that sits beyond the depth it is willing to go.
- Application code acts differently for crawlers. There are companies that make a living preventing robotic code from reaching parts of application logic that are costly or sensitive to being hit repeatedly. There is a ton of code out there that attempts to identify and thwart application crawlers, and anything a crawler is prevented from reaching will not be identified.
- A crawler cannot crawl things it does not know about. This is a chicken-and-egg problem. How can a crawler crawl something it does not know about? Typically, crawlers start with a seed list of domains, subdomains, or both. For instance, Google started with something called DMOZ (an open directory of web links organized by humans). For a crawler to be effective, it needs an extremely accurate seed list, and anything not in the seed list must be reachable through a chain of links that starts from something in the seed list. Unfortunately, there are countless cases where this does not happen. For instance, who links to a printer, an IP telephone, or a firewall web interface? If it does happen, it is rare; therefore, anything that is neither linked to nor in the seed list will effectively be invisible.
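The depth and seed-list limits above can be illustrated with a small sketch. It is not how any particular crawler works: the link graph `LINKS` is a hypothetical in-memory stand-in for real page fetching, and every hostname is invented for illustration. A breadth-first crawl from a seed list finds only what is reachable within its depth cap.

```python
from collections import deque

# Hypothetical link graph: host -> hosts it links to (stands in for real fetching).
LINKS = {
    "www.example.com": ["blog.example.com", "shop.example.com"],
    "blog.example.com": ["archive.example.com"],
    "archive.example.com": ["legacy.example.com"],
    "shop.example.com": [],
    "legacy.example.com": [],
    "printer.example.com": [],   # nothing links here
    "staging.example.com": [],   # nothing links here
}


def crawl(seeds, links, max_depth):
    """Breadth-first crawl from the seeds, following at most max_depth hops."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        host, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth cap reached: do not follow links from this host
        for target in links.get(host, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


found = crawl(["www.example.com"], LINKS, max_depth=2)
missed = sorted(set(LINKS) - found)
print(missed)
```

In this toy graph, the crawl misses `legacy.example.com` because it sits one hop past the depth cap. It misses `printer.example.com` and `staging.example.com` because nothing links to them. Raising `max_depth` recovers only the first of the three.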
Google runs arguably the best crawler in the world, but just because Google does not know about an asset does not mean an attacker cannot find it in other ways. Target was compromised through a third-party HVAC (heating, ventilation and air conditioning) company that retained a backdoor into the HVAC system for maintenance. Because the HVAC company was compromised, Target was compromised.
In terms of mitigating the risk, Target should have known that the HVAC system was publicly accessible. At minimum, it should have been locked down so that maintenance was allowed from only certain IP addresses, and ideally only during agreed-upon maintenance windows. Do you think Google would have shown the HVAC system’s web interface had you typed “target.com” into the search bar? Absolutely not.
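As a rough illustration of that kind of lockdown, the sketch below checks a connection against both an IP allowlist and a maintenance window. The network range, the window, and the function name `maintenance_access_allowed` are all hypothetical values chosen for the example, not details of any real deployment.

```python
from datetime import datetime, time
from ipaddress import ip_address, ip_network

# Hypothetical policy values, for illustration only.
ALLOWED_NETWORKS = [ip_network("203.0.113.0/28")]  # assumed vendor range (documentation block)
MAINTENANCE_WINDOWS = {                             # weekday (0 = Monday) -> (start, end)
    1: (time(1, 0), time(3, 0)),                    # Tuesdays, 01:00 to 03:00
}


def maintenance_access_allowed(source_ip, when):
    """Allow access only from an allowlisted network during an agreed window."""
    addr = ip_address(source_ip)
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return False
    window = MAINTENANCE_WINDOWS.get(when.weekday())
    if window is None:
        return False
    start, end = window
    return start <= when.time() < end


# 2 January 2024 was a Tuesday.
print(maintenance_access_allowed("203.0.113.5", datetime(2024, 1, 2, 1, 30)))
```

In practice this policy would more likely live in a firewall or VPN gateway than in application code. The point is only that both conditions must hold before the interface is reachable.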
Crawlers are useful tools, but they are not the right tool if your goal is to know everything about your web presence. One simple way to think about an asset inventory is to say, “Find everything that Google doesn’t know about,” and then add everything Google does know about. The problem is that Google finds only the “crawlable web,” and an asset inventory contains many assets that are not crawlable because no one links to them.
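That framing can be pictured as set arithmetic over discovery sources. In the sketch below, the source names (`crawler`, `dns_records`, `certificate_logs`, `network_scan`) and all hostnames are assumptions made up for illustration; the article does not prescribe a particular set of sources.

```python
# Hypothetical discovery results, keyed by source; hostnames are invented.
sources = {
    "crawler": {"www.example.com", "blog.example.com"},
    "dns_records": {"www.example.com", "vpn.example.com", "staging.example.com"},
    "certificate_logs": {"blog.example.com", "qa.example.com"},
    "network_scan": {"hvac.example.com", "vpn.example.com"},
}

# The inventory is the union of every source, not just what was crawled.
inventory = set().union(*sources.values())

# Assets the crawler never reported: the non-crawlable part of the inventory.
not_crawlable = sorted(inventory - sources["crawler"])
print(not_crawlable)
```

Here four of the six assets never show up in the crawl, so they would be absent from an inventory built on crawling alone.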
Think about how many items in the typical office environment have web interfaces but nothing linking directly to them:
- Routers and switches
- IDSs and firewalls
- HVAC systems
- Elevator controls
- And more
But beyond specialized systems, an even worse subclass of systems is typically ignored: test systems, staging systems, QA systems and administration consoles. These systems often have no links to them anywhere on the internet because they are not meant to be publicly accessible. They are explicitly designed to be hidden from the public internet, whether by network controls or by obfuscation.
If the asset inventory does not have these test/staging/QA/admin type servers in it, the inventory is almost useless, because an enormous number of vulnerabilities live in these systems. Let us dig into a real-world case.
Sands Hotel & Casino
The 2014 Sands Hotel & Casino hack is a complex story that spans the seemingly unrelated arenas of geopolitical anti-Semitism and computer security. At the time, the Sands was owned by one of the wealthiest men in the world, Sheldon Adelson, a Jewish billionaire who passed away in January 2021. According to Dell SecureWorks, hacking teams from Iran were targeting him due to his religious beliefs and his place in the Jewish community.
Leading up to the attack, at least two distinct hacking teams were probing the Virtual Private Network (VPN) servers belonging to the Sands. They were attempting brute-force attacks to log in and take control of user accounts. The Sands’ security teams realized what was happening due to the vast number of failed login attempts. As a result, the Sands hardened the application and applied additional levels of security.
However, when the Sands doubled down on its VPN security, the hackers redirected their efforts and looked at other assets that the Sands ran. Presumably, the adversaries kept probing, and five days after the VPN brute-force attempt they found a test server that the casino ran. The purpose of the site was to test and review code before it went live, and it therefore had very few protections compared to the main web application used by the casino.
It was not that the Sands did not know the attackers were attempting to breach it; the company simply had no idea where the next attack would come from. Because it did not treat its test site the same way it treated its production site, the Sands was compromised. That site then allowed the attackers to pivot, using a tool called Mimikatz to reveal usernames and passwords. The attackers gained access to virtually every digital file within the Sands corporate intranet.
Eventually, they obtained the login credentials of a senior engineer whose privileges allowed the attackers to reach the gaming company’s main servers. Ultimately, on February 10, 2014, the attackers released a small piece of code that permanently wiped all the computers they had accessed. This was not a matter of extracting money; it was a vendetta-inspired attack aimed directly at Adelson’s bottom line.
The attackers won that round with an attack that should have been very easy to defend against, through a server that should not have been publicly accessible in the first place. With millions of dollars at risk, and the normally strong security posture associated with a casino, there is no doubt that had the security teams known the risk associated with the site, they would have taken proactive measures. But how could they know the risks if they were not testing it?
If the asset inventory misses these critical assets, then you are likely missing a substantial percentage of the interesting issues. Therefore, as you analyze the different methods of identifying assets, crawling should be one tool in the toolbox. However, if you or your vendor are using crawling exclusively, then you are likely missing a lot of assets. Crawling should not be used in a vacuum.