A Pentester’s Guide – Part 5 (Unmasking WAFs and Finding the Source)

August 1, 2020
NaviSec

In this article I am going to give a non-exhaustive overview of bypassing WAFs by identifying a misconfigured underlying server. I will cover a few techniques, including CloudFlare unmasking and identifying an AWS WAF typically deployed alongside EC2 instances. I will not cover the development of custom payloads to bypass the WAF through obfuscation.

WAFs

Before we can talk about how to bypass WAFs, we need to be clear on what a WAF is. A WAF, short for Web Application Firewall, typically acts as a proxy between the client (you, the user) and the server itself. The firewall can analyze and scan all HTTP requests to and from the server, which means it can scan the contents of the requests for payloads such as SQLi, XSS or XXE and block them in real time. If you’ve ever tried to exploit a standard reflected XSS on a CloudFlare-protected host, you’ll have noticed you get a “This page has been blocked” page. That is the firewall filtering part of CloudFlare taking over.

Some WAFs might not even filter payloads. Sometimes they are deployed in “monitor-only mode” so that the operator can see what traffic would get blocked and tweak the rules accordingly, so as not to block something important that is a false positive.

Other times a WAF solution might be deployed incidentally as a “nice to have” but not relied on. Personally, I use CloudFlare on 0x00sec.org for web caching and automatic hands-free SSL, and the bandwidth savings add up a lot. As such, I have not made attempts to secure the source IP. If you can identify the 0x00sec source after reading this article, I’ve done my job 😉

Identifying a WAF

There are many different ways to identify whether you’re being blocked by a WAF or whether a site is using one. Initial tell-tale signs include the IP address having an organization label of “CloudFlare”, or being faced explicitly with a CloudFlare captcha or error page. To check the former, we just need to resolve the domain name and then make a request to ipinfo.io, a fantastic service I use every day, to identify the organization name. We can parse the data right out using jq too.

dig +short 0x00sec.org
curl -s https://ipinfo.io/<ip address> | jq -r '.org'
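When you are checking a lot of IPs, it can help to flag the org strings that belong to well-known WAF/CDN providers. Here is a minimal sketch, assuming the org string comes from the ipinfo.io lookup above. The function name is_waf_org and the provider list are illustrative assumptions and nowhere near complete.

```shell
# Succeed if an ipinfo.io "org" string looks like a well-known WAF/CDN provider.
# The provider list is an illustrative assumption, not an exhaustive one.
is_waf_org() {
  local org
  # Lowercase the input so the match is case-insensitive
  org=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$org" in
    *cloudflare*|*akamai*|*fastly*|*incapsula*|*imperva*|*sucuri*) return 0 ;;
    *) return 1 ;;
  esac
}
```

You could then run something like `is_waf_org "$org" && echo "$ip is probably fronted by a WAF/CDN"` inside a loop over your IPs.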

WAFs such as the AWS WAF sitting on a load balancer are harder to detect, as the address can look just like the IP of an EC2 instance, and they silently block malicious requests. These can be nasty, and it means you can miss vulnerabilities if you’re not whitelisted for that particular assessment. With AWS, you can often identify a load balancer by the presence of “AWSALB” and “AWSALBCORS” cookies (or “AWSELB” on a Classic Load Balancer). These will be silently dropped into your browser and can be found with curl -v.
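To make that cookie check repeatable, here is a small sketch that reads HTTP response headers on stdin and prints any AWS load balancer cookie names it sees. The function name detect_aws_lb_cookies is an assumption of mine. Keep in mind that these cookies only show up when the load balancer has stickiness enabled, so a miss does not rule out AWS.

```shell
# Print the names of AWS load balancer stickiness cookies found in
# HTTP response headers supplied on stdin (one name per line).
# AWSALB / AWSALBCORS come from Application Load Balancers, AWSELB from Classic ones.
detect_aws_lb_cookies() {
  grep -i -o -E '^set-cookie:[[:space:]]*AWS(ALB|ALBCORS|ELB)=' \
    | sed -E 's/^[^:]*:[[:space:]]*//; s/=$//' \
    | sort -u
}
```

Feed it headers only, for example: `curl -s -k -o /dev/null -D - https://target.example/ | detect_aws_lb_cookies` (target.example is a placeholder).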

Identifying the source

The Theory

It is very common for IT administrators to leave web servers completely open on the internet, without restricting inbound traffic to the upstream WAF itself. This configuration relies on the assumption that nobody is going to know your source server IP address. Anyone who does know it can just make requests to the server directly and bypass your WAF solution entirely.

Well, there are billions of IP addresses. If my domain name isn’t pointing to the server, how would you possibly find it? Good question. Luckily for us attackers, we have a deep toolbox of OSINT and scanning solutions that we can use to index and identify websites by their IP.

This could be used in a penetration testing engagement or in a threat hunting scenario where you’re trying to unmask an onion host or an otherwise obscured asset (for C2 or other nefarious purposes etc).

Censys & Shodan (OSINT)

The first and probably easiest method is to use internet search engines such as Censys and Shodan. These search engines index things such as HTML pages, and therefore title tags. Assuming you’ve already done prior reconnaissance against your target, it should be clear which organizations (hosting providers) they commonly use for public-facing web assets. Some organizations, for example, will host everything on AWS but front it with CloudFlare. As you do more recon, the layout of the organization’s surface area should come naturally. To get a good 0-100 overview of a company, I recommend using dnsdumpster.com to generate a map.

Next, make a search using Censys and save the IPs that look like they match your target in a text file.

If they are using AWS, there may be a lot! Keep these IPs safe. We have a bit more to do with them later, but we’ll get to that. You may find that just visiting the IP in your browser works, in which case, congrats!

Security Trails (OSINT)

Another way you can find IPs tied to a domain is by viewing its historical DNS records. You can do this with SecurityTrails DNS trails. https://securitytrails.com/domain/0x00sec.org/dns

Here we can see what A records existed and for how long. It is so common for an administrator to switch to a WAF solution after years of running the server directly exposed, and do you think they configure whitelisting? No, of course not, it works fine!

I originally wrote a tool to scrape these IPs, but it stopped working about a month ago. As a temporary replacement, you can just copy the entire table body into a file and use grep to filter the IPs out. It’s cheap and dirty but it works in a pinch.

grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}' tails.txt | sort -u | tee -a ips.txt

DNS Enumeration

If you enumerate your target’s DNS, you may find that they have something resembling a dev.example.com or staging.example.com subdomain, and it may be pointing to the source host with no WAF. This again is security through obscurity. We can use subfinder for a quick demo, but I recommend performing an exhaustive DNS enumeration as part of your reconnaissance. Generate a list of IPs to check with the following command:

subfinder -silent -d 0x00sec.org | dnsprobe -silent | awk  '{ print $2 }'  | sort -u | tee -a ips.txt

Checking IPs for hosts

Now that you’ve got a list of potential candidates, go through your IP addresses manually and try to remove anything that is just the public-facing front end, such as CloudFlare IPs, and look for VPS hosts, Microsoft Azure, Vultr, DigitalOcean, GCP etc. It is most likely that the organization is hosting on one of these platforms (or perhaps they use a static site host like GoDaddy; it requires some attention and there is nuance). If you want, you can just spray your requests against all of them, but it will produce more noise. Use your judgement.
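Weeding out CloudFlare IPs by hand gets old quickly, so here is a rough sketch of doing it with plain shell arithmetic. It assumes you have saved CloudFlare’s published IPv4 ranges (https://www.cloudflare.com/ips-v4) to a file. The function names and the cf-ranges.txt file name are my own, and the code only handles well-formed IPv4 addresses and CIDR ranges.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1            # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if IPv4 address $1 falls inside CIDR range $2 (e.g. 104.16.0.0/13).
ip_in_cidr() {
  local net bits mask
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Read IPs on stdin and print only those outside every CIDR listed in file $1.
drop_ips_in_ranges() {
  local ranges_file ip cidr skip
  ranges_file=$1
  while read -r ip || [ -n "$ip" ]; do
    [ -n "$ip" ] || continue
    skip=0
    while read -r cidr || [ -n "$cidr" ]; do
      [ -n "$cidr" ] || continue
      if ip_in_cidr "$ip" "$cidr"; then skip=1; break; fi
    done < "$ranges_file"
    [ "$skip" -eq 1 ] || echo "$ip"
  done
}
```

Typical use would be `curl -s https://www.cloudflare.com/ips-v4 -o cf-ranges.txt` followed by `drop_ips_in_ranges cf-ranges.txt < ips.txt > candidates.txt`.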

With a little bit of bash magic, we can quickly whip up a one liner to go through each IP and make a request with our desired host header.

for ip in $(cat ips.txt); do org=$(curl -s "https://ipinfo.io/$ip" | jq -r '.org'); title=$(timeout 2 curl --tlsv1.1 -s -k -H "Host: 0x00sec.org" "https://$ip/" | pup 'title text{}'); echo "IP: $ip Title: $title Org: $org"; done

for ip in $(cat ips.txt) # iterate through each line in the file
do
	org=$(curl -s "https://ipinfo.io/$ip" | jq -r '.org') # Get Org from IPInfo
	title=$(timeout 2 curl -s -k -H "Host: 0x00sec.org" "https://$ip/" | pup 'title text{}') # Get title
	echo "IP: $ip Title: $title Org: $org" # Print results
done

Let’s break down this command.

for ip in $(cat ips.txt)

For each line in this file, assign it to a variable called “$ip” and run the following do block.

org=$(curl -s "https://ipinfo.io/$ip" | jq -r '.org')

Set the org variable to the organization that owns the IP.

title=$(timeout 2 curl -s -k -H "Host: 0x00sec.org" "https://$ip/" | pup 'title text{}')

This one requires a bit more explanation, so let’s break this command down again into the following pieces:

title=$(...) # store the output of the enclosed pipeline in $title
timeout 2 # if the command takes longer than 2 seconds, kill it
curl -s -k -H "Host: 0x00sec.org" "https://$ip/" # request the IP with the 0x00sec.org Host header
| pup 'title text{}' # parse the HTML from curl and print the title

I love using pup because it lets me do some insanely quick HTML parsing tasks. You can find it on GitHub (ericchiang/pup).

echo "IP: $ip Title: $title Org: $org"

And the last line is fairly self-explanatory: print the details to the screen.

What we have now is a quick overview of which IPs respond to our Host header, and we can view the titles. In the output, “0x00sec – The Home Of The Hacker” sticks out immediately. Do you see the ASN? We’ve found the source! We went through each host, requested the IP directly with the Host header, and boom (yes that was a sarcastic boom), we have our source IP!

Interacting with the host

Hosts file

Now that you’ve found your source host, you probably want to interact with it. The easiest way to do this is to update your hosts file: edit /etc/hosts on any Linux or Unix-like machine (this needs superuser privileges) and add a line containing the source IP followed by the domain name.

Once your hosts file is updated, you should be able to interact with the host as you normally would, and your traffic will go straight to the source. If you ever need to undo this, just delete the line you added and save the file.
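If you would rather script the hosts file change than edit it by hand, something along these lines should do. It is only a sketch: hosts_line and without_host are hypothetical helper names, and 203.0.113.10 in the usage note is a documentation placeholder rather than a real source IP.

```shell
# Print a hosts-file line that pins a hostname to an IP.
hosts_line() {
  printf '%s\t%s\n' "$1" "$2"
}

# Filter a hosts file on stdin, dropping entries that map the given hostname.
without_host() {
  awk -v host="$1" '
    /^[[:space:]]*#/ { print; next }   # keep comment lines untouched
    {
      keep = 1
      # field 1 is the IP, every later field is a hostname or alias
      for (i = 2; i <= NF; i++) if ($i == host) keep = 0
      if (keep) print
    }'
}
```

To pin the domain: `hosts_line 203.0.113.10 0x00sec.org | sudo tee -a /etc/hosts`. To undo it: `without_host 0x00sec.org < /etc/hosts > hosts.new && sudo cp hosts.new /etc/hosts`.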

Setting the Host Header manually

If you don’t want your changes to be global, you can use curl or Burp Suite and set your Host header manually.

curl -s -k -H "Host: 0x00sec.org" https://<ip address>/

This sets the “Host” header to “0x00sec.org” on the request, which signals to the web server that you’re requesting the site for that domain. When you visit a domain normally, your browser includes the Host header in the request automatically. This may be obvious to most of you, but when I realized this it made a lot of sense and uncovered some of the magic behind domains.
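One caveat over HTTPS: when you request the bare IP, the TLS handshake does not carry the domain name (SNI), so a server that picks its virtual host from SNI may hand you a default site even though the Host header is set. curl’s --resolve option pins a hostname to an IP for a single request, so both SNI and the Host header carry the domain. A minimal sketch, where resolve_entry and fetch_origin are names I made up:

```shell
# Build the host:port:address triple that curl's --resolve option expects.
resolve_entry() {
  printf '%s:%s:%s' "$1" "$2" "$3"
}

# Fetch https://<host>/ from a specific IP, sending the real hostname
# in both the TLS SNI and the Host header. Port defaults to 443.
fetch_origin() {
  local host ip port
  host=$1
  ip=$2
  port=${3:-443}
  curl -s -k --max-time 5 \
    --resolve "$(resolve_entry "$host" "$port" "$ip")" \
    "https://$host:$port/"
}
```

For example, `fetch_origin 0x00sec.org 203.0.113.10 | pup 'title text{}'`, where 203.0.113.10 stands in for a candidate IP.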

Mass SSL Scanning (Active)

Another method you can use to identify hosts is SSL scanning. This is an active method and I will not be demonstrating it in this article. I can however tell you how to do it and how you might go about making your own scanning setup. A good buddy of mine has a fantastic tool written for this, but I will let them release that as they feel comfortable. SSL certificates typically contain fields (the Common Name and the Subject Alternative Names) listing the domain names they were issued for. In theory, if you scan the entire internet and find every open port 443, then pull each certificate and parse the domain names, you can find all hosts that pertain to the domain you’re looking for. Once you have these IPs, repeat the above steps until you find something (or not).
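This is not a mass scanner, but the per-host step can be sketched with openssl: pull the certificate an IP presents and list the DNS names in it. Assumptions: OpenSSL 1.1.1 or newer (for the -ext option), GNU timeout, and the function names parse_sans and cert_names, which are mine.

```shell
# Turn the subjectAltName text printed by openssl into one hostname per line.
parse_sans() {
  tr ',' '\n' \
    | sed -n -E 's/^[[:space:]]*DNS:([^[:space:]]+).*$/\1/p' \
    | sort -u
}

# Print the DNS names in the certificate served by an IP (port defaults to 443).
cert_names() {
  local ip port
  ip=$1
  port=${2:-443}
  # </dev/null makes s_client exit once the handshake is done
  timeout 5 openssl s_client -connect "$ip:$port" </dev/null 2>/dev/null \
    | openssl x509 -noout -ext subjectAltName 2>/dev/null \
    | parse_sans
}
```

Looping `cert_names` over a list of IPs and grepping for your target domain gives a small-scale version of the idea. Only do this against ranges you are authorized to scan.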

I’ve done this a handful of times and found many many many undisclosed assets for clients – that has been a really fun asset discovery exercise. (Shadow IT is scary).

Get the server to make a request

One more nuanced method is to abuse remote URL include functionality in the site to make it reach out to your server. Many applications have an “include image from URL” feature. This method will often work for simple server setups (for example a single webapp) as opposed to more complex multi-networked systems (anybody who’s dealt with OOB DNS will know this very well).

To abuse functionality like this, generate a Burp Collaborator URL and paste it into the “include from URL” field. Poll for your client interactions and you’ll see the web traffic, including the address it came from.
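If you don’t have Collaborator to hand, the same trick works with any web server you control: give the target a URL containing a unique token, then look for that token in your access log. The sketch below assumes the common “combined” log format used by nginx and Apache (client IP in field 1, request path in field 7). The function name callers_for_token is an assumption.

```shell
# Read an access log in combined format on stdin and print the unique
# client IPs that requested a path containing the given token.
callers_for_token() {
  awk -v token="$1" 'index($7, token) > 0 { print $1 }' | sort -u
}
```

For example, submit `https://your-server.example/img-7f3a9c.png` to the include feature, then run `callers_for_token 7f3a9c < /var/log/nginx/access.log` (the hostname, token and log path are placeholders).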

Crobat Reverse Lookups

This technique blew my mind when I dreamt it up with my flatmate & talented pentester friend @calumboal. Using Cal’s Crobat, you can do reverse DNS lookups across IP ranges with its -r flag.

Now, as it turns out, you can link this with ASN ranges pulled from ipinfo.io to make for a very nice reverse lookup tool! If you have access to their premium plans, you can pull ranges straight from their API – which is really cool.

curl -s "https://ipinfo.io/AS16276/json?token=yourtokenifyouhaveit" | jq -r '.prefixes[].netblock' | tee -a ranges.txt

But if you’re ok to rough it because you don’t have that budget – I got you.

Just take those ASNs from our previous steps, and go to https://ipinfo.io/<ASNNUMBER>. So for example: AS16276

Now expand the bottom slider, and copy and paste all this into a file. It may take a little while to paste – that’s normal.

awk '{ print $1 }' ranges.txt | grep '\.' | tee -a source.txt

Now we have a list of ranges for our target ASN, and we can repeat this for each ASN where we’ve previously seen one of their assets. You can take this further and go after entire providers with my cloud ranges project https://easyasn.xyz/.

curl -s https://easyasn.xyz/companies/amazon/ranges.txt | grep -v ":"

Now that we have our ranges, we can loop through them and do a reverse lookup.

for range in $(cat source.txt); do crobat -r "$range" | jq -c '.' | grep -F "0x00sec.org"; done | tee -a results.txt

This command will loop through each range and pull the DNS records assigned to it. You can find extra assets this way that may be hiding (and that you may have missed during DNS enumeration). Once you’ve found all these new assets, feed them back through the process above until you think you’ve got good coverage.

CloudFail (Automagic Python)

The last method, which is super easy to use, is the CloudFail Python tool. All you have to do with this tool is point and shoot! Sometimes it works, sometimes it doesn’t. If anything is for sure, it definitely makes for a good pentest money shot.

git clone https://github.com/m0rtem/CloudFail.git
cd CloudFail
pip install -r requirements.txt
python3 cloudfail.py -t 0x00sec.org

Mitigation

For all you blue teamers out there, I’m sure the mitigation / remediation is clear: configure the source server to accept web traffic only from your WAF’s upstream addresses, plus an optional management address or bastion host for maintenance purposes. Also, don’t rely on WAFs to mitigate really bad vulnerabilities. They should be treated as a last line of defense, and a WAF bypass should not be the difference between a successful intrusion and an unsuccessful one. If you have the ability to fix the code, that should be a priority.
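As a starting point for that whitelisting, here is a hedged sketch that turns a list of WAF ranges into iptables rules. It only prints the rules so you can review them before applying anything. The function name waf_only_rules is made up, and the rules assume a plain iptables INPUT chain with no other rules handling that port.

```shell
# Read CIDR ranges on stdin and print iptables rules that accept traffic to
# the given TCP port (default 443) only from those ranges, then drop the rest.
waf_only_rules() {
  local port cidr
  port=${1:-443}
  while read -r cidr || [ -n "$cidr" ]; do
    [ -n "$cidr" ] || continue
    printf 'iptables -A INPUT -p tcp -s %s --dport %s -j ACCEPT\n' "$cidr" "$port"
  done
  # Anything not matched by an ACCEPT rule above is dropped
  printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
}
```

For a CloudFlare-fronted site, `curl -s https://www.cloudflare.com/ips-v4 | waf_only_rules 443` prints a rule set to review. Remember to add your management or bastion address before the final DROP.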

IPInfo.io

I want to take this time to make a quick shoutout to our sponsor for this article, ipinfo.io! I’ve been using this API for so many different things in my day-to-day and I genuinely didn’t realize how much I relied on it until I wrote this article. Please do yourself a favor and make an account over there for a free higher request limit and say hi! They’re all really friendly and just all-around good vibes. Really excited to try out host.io very soon too!

Conclusion

Overall, in this article hopefully you’ve learned a few tricks to unmask servers behind a web proxy or WAF! Now that I’m writing it, it actually feels like some of these tricks would apply to unmasking onion domains too. It’s all just pulling OSINT and chucking it at a wall to see what sticks.