Wednesday, 29 July 2020

Rotating Proxies: Keep the Data Flowing

Scraping and various forms of data collection are happening all the time.  An exceptional amount of internet traffic is not generated directly by humans; instead it is automated through bots.  I wanted to share one method to assist with data collection.

I was working on a data collection project the other day and I found a perfect website that I could send data to, have it conduct work on the data, and then scrape the results.  However, I ran into an issue that all scrapers will see at some point: reaching an access cap on the site.  You will know you have hit the wall because you will get a screen in your browser saying "Hey, you have reached your limit for today.  We only allow blah blah blah attempts..."

For my project I have over 1300 unique pieces of data I want to conduct work on, and my cap on the website was 10 connections a day.  If I stick to this limit it will take forever.

A method I use, and one that is quite popular among scrapers, is to connect through proxies.  I have built a proxy collection script that I import into new scraping scripts so that I can utilize the proxies grabbed by the imported script.

Below I will show you my method for importing a Python script into another Python script and using the transferred data in the new script.  Using this method makes rotating proxies so much easier.

(If you do some research you will see that there are many methods to achieve this goal [rotating proxies]; however, I have some specific reasons for this method and I will highlight them as we go ahead.)

*I am going to walk through the above script line by line
*For reference, the above script is referred to as
*The above script is not my project; it is built to demonstrate rotating proxies

Lines 1-2: importing the requests module, which is critical for sending GET requests to websites.

You will see "from proxy import a".  In the same folder I have a script called proxy.py; inside this script I scrape a free proxy list website and append all the proxies to a list called a.

In proxy.py I am retrieving only the https:// proxies.

There are many methods for doing this, and you will even see other scripts that grab the free proxies all in one script.  The reason I like doing it this way is that I have the entire proxy list loaded into memory, so I am not sending out hundreds of requests to retrieve the next proxy on the list.  Also, Python lists are very simple to work with.
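To make the import concrete, here is a minimal sketch of what a proxy.py like the one described might look like, using only the standard library.  The URL and the row layout of the proxy-list page are assumptions for illustration, not the actual site I scrape.

```python
import re
import urllib.request

# Hypothetical URL of a free proxy list page (the real site is not named here).
PROXY_LIST_URL = "https://example.com/free-proxy-list"

def scrape_proxies(html):
    """Pull ip:port pairs out of a page, keeping only rows marked https."""
    a = []
    # Assumed row shape: <td>IP</td><td>PORT</td> ... <td>yes</td> (https column)
    rows = re.findall(
        r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>.*?<td>(yes|no)</td>",
        html,
    )
    for ip, port, https in rows:
        if https == "yes":
            a.append(f"{ip}:{port}")
    return a

def collect(url=PROXY_LIST_URL):
    html = urllib.request.urlopen(url).read().decode()
    return scrape_proxies(html)

# In the setup described above, proxy.py would end with something like
#   a = collect()
# so that the main script can simply do `from proxy import a`.
```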

Lines 3-8: setting the variables num and url.

Lines 9-16 (the above image covers lines 9-16): a very simplistic script to demonstrate the method.  As you can see, we are sending requests to

I set an arbitrary for loop to cycle through 10 times.

I utilize try/except, which is extremely useful for automatically attempting an operation and handling the exception if the attempt didn't work.

Line 11: we set the proxies dictionary, giving http and https the IP and port that were retrieved from the proxy script.

Line 12: here we actually send the GET request; notice that we attach the selected proxy.

Lines 14-16: if the request does not work with the retrieved proxy, I want the script to just keep going and try the next proxy IP/port.  I achieve this by using "pass."  I have added "failed" for visuals in this demonstration; in a real scraping project I may or may not include it.
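For reference, here is a sketch of the rotation loop walked through above.  The blog's script uses the requests module; this version uses only the standard library, and the open_url hook is my addition so the rotation logic can be demonstrated without a live proxy list.

```python
import urllib.request

def fetch_via_proxies(url, proxy_list, open_url=None):
    """Try each ip:port in turn; return the first successful response body.

    `open_url` is injectable purely so this sketch can be exercised without
    a network; it is not part of the blog's original script.
    """
    for ip_port in proxy_list:
        # Mirror of line 11: the same proxy is set for http and https.
        proxies = {"http": ip_port, "https": ip_port}
        try:
            if open_url is not None:
                return open_url(url, proxies)
            opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
            return opener.open(url, timeout=5).read()
        except Exception:
            # Lines 14-16: on any failure, print "failed" and move to the next proxy.
            print("failed", ip_port)
            continue
    return None
```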

Additional notes:
You may have noticed that there were a bunch of failed attempts.  I was OK with this, because 4/10 attempts worked.  My retrieved https:// proxies numbered over 150.  If I had included http:// I would have had over 600.

An interesting piece of scraping you will notice is that the website you are scraping will often tell you it only allows "10" attempts a day; when I conducted my tests on this site I was able to successfully send at least 100 requests.  So if we think about my 4 successful proxies out of 10, that would give me 400+ requests, plus another 100 from my original IP.  My total list is 1300 unique datasets (1300 - 500 = 800), so I only need 8 or 9 more working proxies to complete all my work!!

Be careful using free proxies.  I was OK with it in this project because I don't care if the owner of the free proxy observes my work.  The retrieved data individually means nothing when viewed at a granular level; only when I correlate all the data does it become information.


Monday, 20 July 2020

Route out IP Origin With Free Tools

There is so much information available to us over the internet.  If I want to research something I just open a browser, type in the subject, and I am presented with a ton of data.  The same goes for finding a geolocation.  You have likely all used Google Maps, etc.; however, what if I want to find the location of a device on the internet?

Today we are going to discuss how to find the geolocation of an IP address.

I wanted to collect various tools that would show me where in the world a public IP address originates from.  Not all these tools are built equally, and some return data of questionable accuracy.

For the research today I have chosen a popular IP address that millions of people access on a daily basis.  I'm not reviewing the tools per se; I want to see what they return as a geolocation and what we can surmise from the findings.

Target: (Find out who this is at the end of the blog)

Head to the end of the blog to see all the coordinates mapped out!  You will see my assessment of the findings, and I also reveal my assumptions about the information gathered.

This is a clean website that returns the lat/long of a public IP address.  It also returns port information about the target system.  The simplicity of the website's structure lends itself very well to scraping, so I built a script to do just that.

Above you can see the script source code.  Scrapes like this are nice because at line 10 we only need to go through two levels of HTML tags to find the data we want.  It's almost like they designed this site to be scraped.
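As a sketch of what such a scrape can look like, the snippet below pulls the coordinates and ASN out of a made-up fragment of HTML; the tags and ids here are assumptions for illustration, not the real site's markup.

```python
import re

# Hypothetical snippet shaped like the simple page described above:
# only a couple of tag levels between us and the coordinates.
html = """
<div class="result">
  <span id="loc">53.3338, -6.2488</span>
  <span id="asn">AS41564</span>
</div>
"""

def extract(html):
    """Grab the lat/long pair and ASN number from the page."""
    lat_lon = re.search(r'id="loc">([^<]+)<', html).group(1)
    asn = re.search(r'id="asn">AS(\d+)<', html).group(1)
    lat, lon = (float(x) for x in lat_lon.split(","))
    return lat, lon, int(asn)

lat, lon, asn = extract(html)
```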

The script produces the following

Returned value: 53.3338, -6.2488
ASN: 41564 (this is important, if this number changes something is screwy)

Another tidy little site that puts all of your data in to json format.  With json you can access the data like a dictionary and pull out exactly what you want.  You could integrate this json into a script and produce all the information you need.
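Accessing the JSON really is as simple as dictionary lookups.  The payload below is a made-up example in roughly the shape such an API returns; the field names are assumptions, not the real site's schema.

```python
import json

# Hypothetical API response; the IP is a TEST-NET placeholder.
payload = ('{"ip": "203.0.113.7", "latitude": 53.35388946533203, '
           '"longitude": -6.243330001831055, "asn": 41564}')

data = json.loads(payload)            # now it behaves like a dictionary
coords = (data["latitude"], data["longitude"])  # pull out exactly what you want
```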

Returned: 53.35388946533203,-6.243330001831055
ASN: 41564


Site with lots of information.  Comes with an API you have to subscribe to.
Interestingly, it also has a Twitter bot and a Slack bot you can access.  I'm not entirely sure why the bots are necessary, but hey.

Result: 53.343990, -6.267190
ASN 41564 (this time it is labeled Proxy ASN)
Website states that the IP is located in a data centre

4. Keycdn tools
Nothing special about the site

Returned value: 53.3338, -6.2488 (curiously, exactly the same as the first site.  Coincidence?)
ASN: 41564


The site layout and service is almost exactly like ipstack.  Eerily similar.

Returned Value: 53.34980,-6.26031


Exactly like ipstack and ipgeolocation
Returned Value: 53.333800, -6.248800
ASN: 41564

Returned Value: 53.3331,-6.2489
ASN: 41564


I took the coordinates from these seven IP geolocation tools and plotted them on a map.  As you can see they are reasonably close; however, they are still scattered around Dublin.
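As a quick sanity check on the clustering, here are the seven returned coordinates averaged into a centre of mass; if the tools roughly agree, the mean should land squarely in Dublin.

```python
# The seven returned coordinates, copied from the results above.
coords = [
    (53.3338, -6.2488),
    (53.35388946533203, -6.243330001831055),
    (53.343990, -6.267190),
    (53.3338, -6.2488),
    (53.34980, -6.26031),
    (53.333800, -6.248800),
    (53.3331, -6.2489),
]

# Simple centre of mass of the reported points.
lat = sum(c[0] for c in coords) / len(coords)
lon = sum(c[1] for c in coords) / len(coords)
```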

1. Through my research I suspect that some "free" IP geolocators are piggybacking off of one another.

It would be extremely easy, as they are providing an API and literally coaching you on how to use it in a script.

2. If the coordinates are not 100% accurate, why utilize these APIs?  Well, if all you want is a general idea of where people are visiting from, then they do the job well.  Imagine you have a website and you want to track how many people visited from outside your country; this method would be great.

3. Probably not ideal for a detailed APT (Advanced Persistent Threat) investigation.  However, with the compiled data we can get a slightly better picture of what is going on and potentially make some educated guesses.
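Point 2 above can be sketched in a few lines; the visitor log and home country here are made up for illustration.

```python
from collections import Counter

# Hypothetical visitor log: country codes resolved from visitor IPs
# via one of the geolocation APIs discussed above.
visits = ["IE", "IE", "CA", "US", "IE", "DE", "CA"]

HOME = "CA"  # assumed home country of the site owner
by_country = Counter(visits)
foreign = sum(n for country, n in by_country.items() if country != HOME)
```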

Let's make some educated guesses based on the information gathered.  (Keep in mind the sampling is small, only seven tools.)  Four of the seven land relatively close to one another; that means 57% are in relatively close proximity.  Had I gone further with the research, there is a good chance this number would increase, even if only marginally.  Let's take a look at the map again at these specific locations.

These four locations land right in between two big online players: LinkedIn and Amazon.  This is where I take some leaps; I don't have 100% conclusive evidence, keep that in mind.

-Amazon is a huge player in the web services hosting business.
-One of our free tools provided evidence that the IP was from a data centre.
-It is very close to the LinkedIn head office, and LinkedIn spends $13 million monthly on AWS services. [8]
-Amazon has 3 major availability zones located in Ireland [9] [10] which act as fault-tolerant hubs for their services (one being in Dublin).
-Here is the big assumption I am making: that the owner of this IP is utilizing AWS (Amazon Web Services).  This may not seem like a big deal until you learn that the company that owns this IP is also one of the biggest private VPN providers in the world. *look it up ;)

Andrew Campbell


Monday, 13 July 2020

Weird Traffic from Google?

The other day I was monitoring a port (tcpdump -i xxxxx 'port 80') on a machine in my network.  There is very little activity on this machine, as it was recently set up as a NIDS machine utilizing Snort.  The only browser in use was Firefox.

Some strange traffic showed up on the port.  I wanted to investigate the cause.  The picture below shows the traffic.

For no apparent reason this machine received traffic from:


After some light googling I came across this site [1], which proposed a couple of options.  The one that piqued my interest, and was most feasible, was web crawlers.  But why?  Why this seemingly random machine?  I have it connected to the internet, but then again I have a lot of devices connected to the internet.  Are all devices on my network receiving this kind of traffic?  I am intrigued.

A quick "dig" and "whois" and I found that the traffic originated from St. Paul, Minnesota.

This is getting interesting now.

I continue looking for a source of this strange traffic.  I bring up another favourite tool

I found the solution!  Sadly, my initial thoughts were wrong; there was no nefarious plot by Google to analyze my network and spy on me (too much Mr. Robot for me).  Turns out it was all my fault.

Here is what I found. is a PTR record [2] (a DNS record that resolves an IP address to a domain or host name).  The address it points to revealed what was going on.
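For the curious, the shape of the name a PTR query uses can be reproduced with Python's standard library; the address below is a TEST-NET example, not the IP from my capture.

```python
import ipaddress

# A PTR lookup asks the DNS for the name attached to the reversed address.
ip = ipaddress.ip_address("192.0.2.1")
ptr_name = ip.reverse_pointer  # the name a `dig -x 192.0.2.1` would query
```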

It turns out that my machine was doing a connectivity check back to Canonical.  I had completely forgotten that when I initially set up this OS I had (for some unknown reason) selected/left this setting on.  My machine was calling back to the mother ship!

I actually do not want this setting on.  So I turned it off.

If you want to turn this feature off in Ubuntu follow these instructions:

Navigate to this folder as a user with appropriate permissions.

and change




This activity was fun.  You can learn a ton just from observing what is happening on your network.  I had a machine that was receiving packets from somewhere else.  I wasn't sure what it was or where it was from, but with some careful digging the solution was found!

Hope you enjoyed.

Andrew Campbell


Monday, 6 July 2020

Testing Firewalls Hping3 ONLY!!

Testing Firewall Challenge -- Only Hping3

Testing your firewalls is a critical step in ensuring you are protecting your assets appropriately.  This article is called "Testing Firewalls Hping3 ONLY" because I foresee many more posts from me on this very topic.  In time I may collect my methods into a single post.

In my home lab environment I have a machine set up that is dedicated to NIDS.  The only thing this machine does is monitor the network (an excellent side project if you have a spare computer kicking around).  On my home lab network there is very little risk of intrusion; the only activity will be from inside.

I have set up a ton of firewalls, so when I built the target I honestly just flew through the FW setup (not ideal).  I set it up so that I can SSH over port 22 (internal only), the machine can connect to the internet only through ports 80 and 443, DNS on 53, and port 4000 so I can transfer files from a different machine somewhere else in the house.  I have also set a default policy of "DROP" if the above criteria are not specifically met.

Now I wanted to challenge myself.  I preach to my students the glories of hping3, so for this test I am going to forgo my favourite tool (nmap) and use hping3 for all of the testing.  Specifically, I will use hping3 for host discovery, port scanning, and testing the firewall to see what gets through.

1. Get IP class:
-Very easy and quick: "ifconfig" and you will know what class you are working with.
-My network is class "C".

2. Host Discovery:
-There are many better tools than hping3 for this task.  For host discovery I always turn to nmap.  However, that is not part of this challenge!

This method is slow and noisy, but it works.

hping3 -1 192.168.1.x --rand-dest -I wlan0 --fast

In layman's terms, the above command reads as follows from left to right:
using hping3 we send an ICMP echo request to random IPs in a class C private network, we send the packets through the interface wlan0, and because I am impatient I want it done "fast."
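For comparison, here is how the candidate targets of such a class C sweep could be enumerated with Python's standard library.  This is just an illustration of the address space being walked; hping3's --rand-dest picks targets randomly rather than in order.

```python
import ipaddress

# All usable host addresses in the class C network the scan covers.
network = ipaddress.ip_network("192.168.1.0/24")
hosts = [str(h) for h in network.hosts()]  # .0 (network) and .255 (broadcast) excluded
```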

So I let the command sit a few minutes and it comes back with:

I stopped it before it could reach my router at .254.  But we do not care about the router.

3. Scanning
Before I can do an OS fingerprint I need to know which ports will accept a connection.  Thankfully there are only three machines to scan.

There are a couple ways of doing this
One way (and it is tedious) is to send a single SYN packet to every port.

hping3 -S -p ++1

As you can see in this picture, I got a lot of nothing until I got to port 22, where I got a packet back with the SA flags set.  An open port!!

I adapted my command to get a file output.

hping3 -S -p ++1 >> ports1.txt

With three terminals running I can scan all three machines and ship the info to an output file.  I go and get a coffee because I want all 65535 ports scanned.

Also if I wanted to be more stealthy I could set the interval on when these packets are sent out.

hping3 -S -p ++1 --interval 10
This would send a new packet every 10 seconds.  In case you are curious, it would take about 7.58 days to complete this scan.  YIKES!!
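The 7.58-day figure checks out with a bit of arithmetic:

```python
# One SYN every 10 seconds across every TCP port.
ports = 65535
interval = 10                    # seconds between packets
total_seconds = ports * interval
days = total_seconds / 86400     # 86400 seconds in a day
```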

So the results for machine are:
port 22: response SA
port 53: response RA (closed)
port 80: response RA (closed)
port 443: response RA (closed)
port 4000: response RA(closed)

----Closed is good; it means the port is reachable through the firewall, just not listening.

We have found our target.

4. Testing Firewall:
-I can tell there is a firewall because very few ports were detected.  That tells me that someone intentionally set up this configuration.  The nature of the ports tells me that the firewall's intention is to only allow the user access to the internet.  Also, most of the ports responded with nothing, which tells me that the firewall DROPs packets by default, other than on the specified ports.  (Very important information.)
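The interpretation of responses described above can be captured in a tiny classifier.  The port numbers below are taken from the scan results earlier; the unanswered port (31337) is a hypothetical stand-in for the silently DROPped probes.

```python
# Per the reasoning above: SA (SYN/ACK) = open, RA (RST/ACK) = reachable but
# closed, and no reply at all = a firewall silently DROPping the probe.
def interpret(flags):
    if flags == "SA":
        return "open"
    if flags == "RA":
        return "closed (but the firewall let the probe through)"
    return "filtered (dropped by default policy)"

# Scan results from above, plus one hypothetical unanswered port.
results = {22: "SA", 53: "RA", 80: "RA", 443: "RA", 4000: "RA", 31337: None}
summary = {port: interpret(flags) for port, flags in results.items()}
```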

-Here we are dealing with an interesting part of the exercise.  When you are testing your firewall you should be clear on what your intent is.  When I am probing a professional environment (pentest) I am not sure what I may find; I want to see if I can send a packet through the firewall with the goal of learning about the environment behind the wall.

In the scenario we are dealing with, I want to see if the firewall I set up will do what I told it to.  Steps 1-3 demonstrate that we can identify a network, discover hosts, and scan those hosts.  In step 4 we have access to the target and the sender simultaneously; monitoring the target, we will be able to identify packets that potentially make it through the firewall.

Below I will list the conducted tests and the result:

Keep in mind that a response of "none" is what was anticipated.
port 80/443
 hping3 -SA -c 1 -p 80
 hping3 -c 1 -F -p 80 none
 hping3 -c 1 -FPU -p 80
(xmas packet)
 hping3 -c 1 -SF -p 80 none
 hping3 -2 -c 1 -p 80
 (UDP packet.  It shouldn't respond because it is UDP; UDP scans work better with nmap.)
 hping3 -c 100 -d 80 -S -p 80 -s 80 -a 192.168.73
(land attack, spoofing the source as the destination IP)

port 4000
 Terminal     Response
 hping3 -S -c 1 -p 4000
 hping3 -SA -c 1 -p 4000

 hping3 -SF -c 1 -p 4000

 hping3 -2 -c 1 -p 4000


Port 53
 hping3 -2 -c 1 -p 53
 none (however, on the target side, using tcpdump I saw a UDP blast, which means the UDP packet arrived)

Lessons Learned:
So it turns out my firewall on the target is not that bad, though it could use a little tightening.  I learned that I would like to block some ICMP packet types from internal resources (I forgot to add that... oops).  Having a default policy of DROP was a really good move.  Through hping3 I learned that I had a few specific ports open that allowed the machine to do what I wanted it to, and nothing else was allowed.  It also blocked some of my sneakier packets, like SF (SYN-FIN).  This is the kind of thing you want to get out of a firewall test: an answer to the question "did I miss anything?"  Even something seemingly small is significant.

During this practice I was reminded that we shouldn't limit ourselves to only one tool.  Through the use of many tools we can get different pieces of the picture.  Hping3 is incredibly useful; however, for a true firewall test I would also have liked to utilize nmap.  Nmap can easily conduct protocol scans, OS fingerprinting, and versioning (absolutely critical).  Some of these things (like OS fingerprinting) are doable with hping3 but are just too cumbersome to be that useful.

I liked this challenge because it made me dig deeper into a tool that I already love and gave me a deeper understanding of it.  Hping3 should be used for firewall tests.  It is a simple and efficient little packet generator.

Andrew Campbell