Create a Robots.txt Honeypot Cyber Counterintelligence Tool

Video Overview of this Blog Post

What is the Robots.txt File?

The Robots.txt is just a simple text file meant to be consumed by search engines and web crawlers containing structured text that explains rules for crawling your website.

In theory, the Search Engines are supposed to honor the Robot.txt rules and not scan any URLs in the Robots.txt file if told not to.

Robots.txt was supposed to help avoid overloading websites with requests. According to Google, it is not a mechanism for keeping a webpage out of Google Search results.

If you really want to keep a web page out of Google you should try adding a noindex tag reference or password-protect the page.

With a Robots.txt file, you can create rules for user agents specifying what directories they can access or disallow them all. It all sounds OK in principal but on the internet, nobody really plays by the rules.

In fact, the Robots.txt file is one of the first places a bad guy might look for information on how your website is structured.

Too many websites make the mistake of using the Robot.txt file without giving thought to the fact they might be rewarding possible OSINT or hacking reconnaissance efforts at the same time.

A Look at Amazon.com’s Robots.txt File

If we take a quick look a big website like Amamzon.com to see what their Robots.txt file looks like all we have to do is load up this URL.

https://www.amazon.com/robots.txt

What files or directories does Amazon tell the Google search engine not to crawl or index?

It looks like account access login and email a friend features are off limits so these are the first places a hacker will be looking.

More Sample Robots.txt files from Google

# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all but AdsBot crawlers
User-agent: *,
Disallow: /

You can find more detailed information on how to make more complex robots.txt files over on the Google Search Central area for developers.

What is Counterintelligence?

Counterintelligence typically describes an activity aimed at protecting an agency’s intelligence program from an opposition’s intelligence service.

More specifically Information collection activities related to preventing espionagesabotageassassinations or other intelligence activities conducted by, for, or on behalf of foreign powers, organizations or persons.

In terms of this blog post, we’ll use a Honeypot based approach to see who is using looking at the Robots.txt file and scanning folders we’ve asked them not to and record information about the HTTP call for later review and analysis.

A Real World Robots.txt Based HoneyPot Example

Using the Robots.txt file as part of a honeypot system, we will broadcast a list of honeypot folders we don’t want search engines to index, but in this case, it will be a list of folders pointing to honeypot pages.

Having a honeypot / data collection service running in these folders allows you to see who is using the Robots.txt file to scan your web server thus tipping you off that OSINT footprinting activity on your webserver or domain names may be taking place.

These folders have a Disallow rule but contain honeypot code to collect information about the HTTP calls made against them and in some cases to redirect the user-agent somewhere else.

A Sample Honeypot Robots.txt File

A Sample Robots.txt file where we are telling all user-agents to stay away from our admin, wordpress and api folders.

# All other directories on the site are allowed by default
User-agent: *
Disallow: /admin/
Disallow: /wordpress/
Disallow: /api/

The Robots.txt file I’m discussing in this post was collected from the online classifieds website, FinditClassifieds.com.

If you try to hit any of the URLs found in the Robots.txt file, you’ll be redirected to a Rick Roll video on YouTube.com. IP address data is collected in a log for a more detailed review.

Each one of these honeypot URLs do a little something different.

The first honeypot URL replies with something naughty, the second logs it as a scan and the third URL is the Rick Roll redirect.

Depending on the honeypot page, you can collect data from the user-agent and log it before you redirect them off to the Land of Oz.

Using a tool that was originally created to be helpful that ended up becoming dangerous can now be a double agent if you set it up correctly.

Hoping this helps someone on their InfoSec journey.

~Cyber Abyss

How to Hide Executable Code in a Text File using Cloaking and Alternative Data Streams

Hacker Basics: How to Hide an Executable File Inside and Text File

Did you know that hackers can hide an executable file inside of a text file using a technique that uses something called data streams to trick a computer system from seeing text and or executable written in an alternate data stream inside a common text file.

I was pretty impressed the first time I watched someone demonstrate this. I was like, NO WAY! I really thought that this was some wizard level hacker stuff.

I’m no wizard level hacker, although I aspire to be, but I should be good enough to show you how to embed a simple calculator app inside a text file using an alternate data stream.

A big thank you to Cyber Security Expert, Malcolm Shore who presented a similar example in his Cyber Security Foundation online course I recently completed.

How Do Alternate Data Streams Work?

Way back in the old Wild West days when we had the DOS operating system, files used to be simple strings of data. Files are read btye by byte.

Later, in the NTFS file system, files are complex structures. NTFS files at a minimum contain a section called $Data where data is read by an application. $Data is the Data Stream.

Files may have many other sections or streams other than just the $Data section. This is what we call “Alternate Streams”.

THIS IS IMPORTANT: Windows only recognizes data in the $Data section so any data we put in an alternate data stream is not read by the Windows Operating System. We cloak data we want to hide in an alternate data stream. That’s the basics of how this works.

The data we are hiding could be a malicious malware payload or encrypted espionage message for our spy ring but in this example, it is just the simple calc.exe file you can find on any Windows PC for the last 20+ years.

Creating an Alternate Data Stream in a Text File

The screenshot below shows the three (3) files we’ll be using in this demonstration.

  • Simple text file with some string data.
  • calc.exe application or executable binary file
  • Secret text file with some string data

We can see the size of the text file is just 1 KB and the calc.exe file is 897 KB.

If we open the text-data.txt file with Notepad we’ll see just a simple line of text and the same with the secret-data.txt file.

To hide our secret message inside the the text data file, we’ll use this command line command.

C:\text\>type secret-data.txt > text-data.text:hidden.text

Screenshot of Alternate Data Stream: Insert Hidden Text

Below is a screenshot of the command line command “type” that we used in this example to insert our secret-data.txt file into an Alternate Data Stream inside of another text file.

If we type the command “more” we can look for the secret message.

The screenshot below shows the text file that contains our hidden text being opened in Notepad where we can’t see the hidden text we saved to the file. If we type the command line command below, we can read the hidden text we wrote to our Alternate Data Stream by keying in on the specific data stream.

c:\test>more < text-data.txt:hidden:text

Hiding an Executable Inside a Text File

Hiding an executable file inside a text file using the exact same Alternate Data Stream technique we just used in the the Secret text file example above but this time we’ll simply replace the Secret text file with the Windows Calculator application executable file.

The screenshot below shows the command line command to save the calc.exe file in an Alternate Data Stream in side our target text file.

Notice this time, the Alternate Data Stream is named “mycalc.exe”. Don’t get to hung up on this, it is just a name that is basically meta data that is saved with the data that we can use to filter the data we get out of the file. I hope that makes sense.

Important to note at this point that the file sizes didn’t change when we inserted the calc.exe file. It is still showing 52KB.

How to Execute a File Saved in an Alternate Data Stream

To execute a file you’ve stored in an Alternate Data Stream, we’ll need to use the wmic command as is done in the following example.

c:\test>wmic process call "c:\test\text-data.txt:mycalc.exe"

As you can see from the working example above, I was able to embed the calc.exe file inside as well as text file and a secret message.

If the data is text we just need to indicate which stream we saved the data in to retrieve it.

If the data we hid was an executable file, we’ll need to use the Windows “wmic” command line command to call the executable from inside the text file by keying in on the Alternate Data Stream name.

In summary, the technique is crazy easy to pull off without any 3rd party hacking tools. It just requires a little Windows Operating System inside knowledge but is something every good hacker should know.

I hope this helped somebody!
~Cyber Abyss