Rule Of Thumb for determining whether Data is Personal

Vojtech Tuma
Mar 4, 2021

The arrival of GDPR and similar regulations has led many software engineers and data scientists to ask whether some particular data is personal, and thus subject to GDPR measures. I have a Rule of Thumb for this. It is not fully reliable, but it gives a good starting point:

Could some Sales team run a campaign or promotion based on this data?

Let’s have a few examples:

  • Model name of the user’s device where your software runs. This is very useful information: users with expensive models likely have more purchasing power, so offering them more expensive product lines makes sense for the sales team. Similarly, knowing whether the user is on a mobile device or in a browser, or which operating system they run, informs which products to show.
  • Other programs or apps that the user uses. Again useful — for cross-promotions, for guessing the gender or the age of the customer, …
  • Geographic location, language settings — in some locations, certain campaigns or products will perform better — even if we don’t know a priori how.
  • The time of day when the user uses the application. Early birds and night owls respond differently, prefer different products, … Furthermore, depending on your application, you can perhaps use this to guess whether the person is employed, whether they are a parent, or whether they are socially active or a loner.
  • Search queries: oh boy. Users looking for medication? Guess the sickness, exploit, profit. Users looking for VPNs, for Tor, or browsing porn? Do scareware and profit. This is a wild area.

Some folks tend to think that only data like a personal name, phone number, email, et cetera, constitutes personal data. The examples I’m giving run very much against that. And why am I fairly confident those are personal data? One of the motivations behind GDPR was to increase the control users have over “how precise profile of their personality can be created, e.g., with respect to advertising”. So this Rule of Thumb fits that reasoning well.

Let’s address two more interesting cases:

  • GUID: by that I mean a globally unique identifier you artificially generate upon installation. It is truly random, bearing no information provided by the user. Running a campaign with “show this toaster to users with guid starting with xyz” will likely bring no advantage. However, that does not mean it is out of scope of GDPR. It is a so-called indirect personal identifier: it holds no direct value, but it allows you to connect other information which may be personal.
  • Email: does it make sense to run a campaign to all users with emails starting with “a”? Probably not. Does it make sense to create a model that extracts which cultural franchise the user likes and to use the appropriate messaging (as in, all drthvdr@gmail, darthvad@gmail, palpatinereborn@gmail would get a may-the-force-be-with-your-license-renewals message, and all the mrfrodo@gmail would get to-the-mtdoom-with-our-premium)? Totally. So while an email is a personal identifier, as is a phone number, it also bears some personal content in itself!
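To make the GUID case more concrete, here is a minimal sketch of how a random identifier can tie together attributes that each look harmless on their own. The event records, field names, and the build_profiles helper are all hypothetical and made up for illustration; they do not come from any real system.

```python
import uuid
from collections import defaultdict

# Two randomly generated install identifiers. They carry no user-provided
# information by themselves.
guid = str(uuid.uuid4())
other = str(uuid.uuid4())

# Hypothetical event log: each record holds only the GUID plus one attribute.
events = [
    {"guid": guid, "device_model": "Phone X Pro"},
    {"guid": guid, "language": "cs-CZ"},
    {"guid": guid, "active_hour": 23},
    {"guid": other, "device_model": "Budget Tab 7"},
]


def build_profiles(events):
    """Merge all attributes that share a GUID into one profile per GUID."""
    profiles = defaultdict(dict)
    for event in events:
        key = event["guid"]
        for name, value in event.items():
            if name != "guid":
                profiles[key][name] = value
    return dict(profiles)


profiles = build_profiles(events)
# The first GUID now maps to device model, language, and usage time together.
print(profiles[guid])
```

The GUID itself is useless for a campaign, but the merged profile behind it is exactly the kind of data the Rule of Thumb flags.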

The point of this article is to demonstrate a certain epistemic technique, a method of obtaining indirect knowledge. Assuming you are not a lawyer and you have a hard time working through the GDPR text or the court decisions citing it, you have to find an approximation of it. In this case, that approximation is a Sales mentality, which is arguably easier for many IT folks to adopt.
