Putting the Pieces Together: Making Assumptions Based on Scraped Personal Data
Is it possible to scrape personal public data from different sources and with that data build a reasonably accurate picture of someone?
I say yes.
But I also think it can force you to walk an ethical line.
This post deals primarily with the ethics of web scraping.
Before we dive too deep into this topic, let's first clear up some definitions.
Web Scraping: The automation of data acquisition from websites, or other repositories, through use of scripts.
Personal Data: Data that is specific to an individual.
Private Data: Data that is meant only for the individual's use or the approved service provider.
So when does private stop being private?
In short, when it is made public. Data becomes public when the individual chooses to release their personal information, or when their privacy has been breached and the data is released.
To further clarify my point, my blog is written by me and my name is attached to it. My blog is a part of my online identity. Technically it is personal data, and it is also public because I have chosen to make it that way. On the flip side, the AdSense data connected to my blog is personal but also private. I don't need that data floating around in the wild.
We sacrifice so much personal data ALL THE TIME.
I'm not going to go into detail on how this is accomplished; just understand that it is rampant.
Real World Hypothetical
In Canada we have laws and policies in place that protect the identities of people who have been convicted of heinous crimes. The NSOR (National Sex Offender Registry) database contains critical private information on convicted individuals.
Now obviously this is only available to the appropriate people and it is not made public. This is how it has worked since the inception of the NSOR in 2004, and it's a process that works.
Why is this information not made public?
Well honestly, if it were, we would have literal lynch mobs roaming the streets. So we have laws to "balance" things.
As with most processes, I was curious whether this privacy could somehow be circumvented.
I hypothesize that yes, it is in fact possible for a non-police person to circumvent that privacy and identify sex offenders.
Now to be clear, this is not a "HOW TO." I will not be teaching you how to find sex offenders in your neighbourhood.
Like I have said before, web scraping can be used to collect information about anything. If you collect enough information from enough sources you begin to build a picture of someone.
Billy Joe Bob lives in Texas, has a wife and 3 kids, likes a particular football team, works at blah blah company A, and goes fishing every spring with his high school buddies, with whom he graduated in 1997. He thinks vaccines are stupid and proudly votes for whatever party thinks vaccines are stupid.
See what I am saying? All this information can easily be scraped from social media.
It doesn't stop there.
The information that is saved and available on the public internet about each of us is staggering.
To that point, did you know that in Canada all legal cases are publicly posted? That means if you have ever been convicted of something, it is actually searchable. All the information is available to the public, right back to 1905 in Alberta, when there was a land dispute and someone did something they shouldn't have.
I'm intentionally not referencing these sources.
In order to be placed on the NSOR you need to be convicted, meaning you have had your day in court. These court cases don't say "you are a sex offender," but if you read the graphic details of some cases it's not a leap to say "Dear goodness, you are likely a sex offender."
Let's reel things back in here.
I have used a dramatic example to drive home a point: information on the internet that is public is available for everyone to see. Someone with even a basic knowledge of scraping could collect loads of data on any individual. Put the pieces together and you have a relatively accurate picture.
This is a hypothetical that can be played out in many different scenarios.
The moral of the story: be aware of what is online about you, and know that someone is likely building a profile of you and that the data is being stored and sold all over the world.
Again this post was not a "how-to."
Be safe. Make good choices.