
Pervasive and Invisible: Online Behavioural Tracking And the Collection and Use of Sensitive Personal Data

November 14, 2021

Written as a white paper for PUBPOL707 – Architectures of Digital Ecosystems (Professors Clifton van der Linden and Mark Surman) for the McMaster University Master of Public Policy in Digital Society

(Header image from Electronic Frontier Foundation)

In January 2020, it was revealed that a number of apps, including OKCupid and, most alarmingly, the LGBTQ+ dating app Grindr, were selling and sharing users’ personal data, including age and location (Christl & Edwards, 2020; Cyphers, 2020). This practice is particularly harmful because Grindr users may not want their sexual and gender identities to be public. Users may decline to use their real names to protect their safety. Queer and trans individuals may fear revealing or “outing” their identity to their families, friends, and society, risking ostracization or even physical harm. In some countries, being queer or trans is forbidden, and individuals can be persecuted for their identity. Secrecy and privacy are important to the LGBTQ+ community, and Grindr’s actions were a violation of users’ trust. What makes the issue worse is that it does not matter whether Grindr shared “anonymous” data without “names” or only a “device ID.” The tracking ecosystem, with its vast stores of personal data, can likely identify you by correlating your Grindr data with other personally identifiable information (PII) collected from your other digital activities on your browser or phone, and from there predict your real-life identity. While the most prevalent use of this practice is to display relevant ads, it can also be abused by parties who want to manipulate behaviour or even persecute or physically harm an LGBTQ+ individual in real life. This case presents an extremely dangerous scenario in which a user’s sensitive personal data – in this case, sexual and gender identity, as well as age and location – is collected, shared, and used without the user’s knowledge or consent. The unfortunate reality is that this practice is everywhere and is the basis of the digital economy.
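
To make the risk concrete, the sketch below shows how an “anonymous” advertising ID can be re-identified once a second party holds the same ID alongside real contact details. Everything in it, from the field names to the data, is hypothetical and simplified; it is only meant to illustrate the linkage problem described above.

```typescript
// Illustrative sketch only: how an "anonymous" advertising ID can be
// re-identified by joining it with another dataset. All fields and data
// here are hypothetical.

interface AppEvent {
  adId: string; // device advertising ID, shared with ad partners
  app: string;  // e.g. a dating app
  lat: number;
  lon: number;
}

interface BrokerRecord {
  adId: string;  // the same advertising ID, seen elsewhere by a data broker
  email: string; // linked when the user logged into another app
  name: string;
}

// A broker that has seen the same advertising ID elsewhere can join the
// two datasets on that ID and attach a real identity to the app usage.
function reidentify(events: AppEvent[], broker: BrokerRecord[]): Array<AppEvent & BrokerRecord> {
  const byId = new Map(broker.map((r): [string, BrokerRecord] => [r.adId, r]));
  return events
    .filter((e) => byId.has(e.adId))
    .map((e) => ({ ...e, ...byId.get(e.adId)! }));
}

// One "anonymous" event becomes personally identifying after the join.
console.log(
  reidentify(
    [{ adId: "ad-123", app: "dating-app", lat: 43.26, lon: -79.92 }],
    [{ adId: "ad-123", email: "user@example.com", name: "Jane Doe" }],
  ),
);
```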


The foundation of the modern Internet is an ecosystem that tracks you, your identity, and your behaviours. It surveils your activities, both online and in the physical world, and sells that information to advertisers who want to grab your attention and manipulate your behaviour – either to sell you products and services or to influence your political beliefs. When users access online platforms and social media apps like Google, Facebook, and Twitter, their free use of these platforms is exchanged for displaying ads on their screens and collecting their socio-demographic and behavioural data to better target specific ads (Lau, 2020). Consumers do not pay online platforms for access to their products and services, but they do pay in different currencies: their attention and their data. As the popular saying goes, “If the product is free, you are the product.” Digital surveillance and programmatic advertising are the foundation of the online digital economy, allowing users to participate in the Internet and access the digital public sphere while selling their attention and data.

Figure 1: A Graphic Overview of the Programmatic Advertising Ecosystem (Lau, 2020)
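
To make Figure 1 more concrete, the sketch below shows the kind of information that can travel in a single programmatic ad auction. The shape is deliberately simplified and the field names are my own assumptions; it is loosely inspired by, but is not, the real-time bidding schemas actually used by ad exchanges.

```typescript
// Hypothetical, simplified shape of a programmatic ad auction request.
// Not the OpenRTB schema used by real exchanges.

interface SimplifiedBidRequest {
  auctionId: string;
  page: string; // the page the user is currently viewing
  device: {
    advertisingId: string; // persistent device identifier
    os: string;
    geo: { lat: number; lon: number };
  };
  user: {
    segments: string[]; // interests/demographics inferred by trackers
  };
  floorPriceUsd: number; // minimum acceptable bid
}

// Advertisers' bidders receive something like this and decide, in
// milliseconds, how much this user's attention is worth to them.
const example: SimplifiedBidRequest = {
  auctionId: "auction-42",
  page: "https://news.example/article",
  device: {
    advertisingId: "ad-123",
    os: "Android",
    geo: { lat: 43.26, lon: -79.92 },
  },
  user: { segments: ["interested-in-travel", "age-25-34"] },
  floorPriceUsd: 0.02,
};
console.log(JSON.stringify(example, null, 2));
```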

It may sound like science fiction – an Orwellian, 1984-like “Big Brother” surveillance state. However, with modern technology and our ubiquitous use of computers and smartphones, it is primarily corporations – not the state – that track your every move and create digital profiles of users – what Shoshana Zuboff calls “surveillance capitalism” (Zuboff, 2019). Tracking technologies like cookies, browser fingerprinting, geofencing, and SDKs are constantly collecting data on who you are, where you are, what you are interested in, what your political beliefs are, where you have visited – even your sexual identity and the gender you identify with (Cyphers & Gebhart, 2019). The collection of this data is problematic in itself and raises concerns around privacy. But the usage – and abuse – of that data goes beyond recommending relevant content and ads: it can lead to your data being sold to other companies, real-life discrimination, manipulation of your behaviour, and even persecution.

The problems – and the questions we should ask ourselves as a society – are two-fold:

  1. Collection: Tracking technologies collect our data from various online sources and build online profiles of us in order to profit from our digital lives and attention. Should we allow online platforms to collect our personally identifiable information (PII) to create unique digital identities based on our interests, behaviours, locations, and socio-demographic identities? Where should the limit be drawn? What data should be prohibited from being collected?
  2. Usage and sharing: Our PII and personal data are used in many ways, primarily through content recommendation on social media and through programmatic advertising. However, that data is also shared with third parties such as data brokers, other advertising companies, and even governments. It is used to micro-target ads, which may involve discriminatory practices. What are acceptable uses of our personal data? With whom may our data be shared? And how can users control the flow of their personal data?

Data collection and its usage by online platforms are pervasive, invisible, and the cornerstone of our modern digital economy. Solutions to these problems are difficult to conceive and implement, as they are mostly untested policies addressing emergent technologies. Policies that address one problem may produce side effects that create new problems, or even fundamentally change the business model of online platforms. Innovative technologies with unique problems require unique solutions.

Definitions

For a user’s data to be useful in building a digital profile, organizations require two types of data: “personally identifiable information” and “personal data.” Information about a user’s socio-demographics, geolocation, Internet activities, and other behaviours is not useful unless an organization can connect it to a real human being. Tracking technologies – or simply “trackers” – collect PII: data that identifies only you or your device and that remains consistent and persistent over time (Cyphers & Gebhart, 2019). There are many types of PII that trackers can look for, and they can combine weaker identifiers with other accessible PII to more easily identify you across devices and activities. The most obvious PII are names, email addresses, and phone numbers; they rarely change and can easily identify the real individual. Real-life PII also includes license plates, face prints (and other biometric data), and credit card numbers. PII can also include codes or random strings of numbers and/or letters that act as a “name” on different sites (Cyphers & Gebhart, 2019, p. 6). Beyond these, trackers have other digital techniques to identify users, including, but not limited to: cookies, IP addresses, TLS state, local storage “supercookies,” browser fingerprints, IMSI/IMEI numbers, advertising IDs, and MAC addresses.
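
As an illustration of how weak identifiers can be combined, the sketch below hashes a handful of browser attributes into a single stable identifier. The attribute list is an assumption chosen for illustration; real fingerprinting scripts draw on many more signals.

```typescript
// Minimal sketch of combining weak identifiers into a browser fingerprint.
// Uses Node's crypto module for hashing; the attribute list is illustrative,
// not a real fingerprinting library.
import { createHash } from "node:crypto";

interface BrowserAttributes {
  userAgent: string;
  screenResolution: string;
  timezone: string;
  language: string;
  installedFonts: string[];
  canvasHash: string; // hash of how this device renders a hidden canvas element
}

// No single attribute identifies the user, but hashed together they often
// form an identifier that is stable across sites and sessions.
function fingerprint(attrs: BrowserAttributes): string {
  const material = [
    attrs.userAgent,
    attrs.screenResolution,
    attrs.timezone,
    attrs.language,
    attrs.installedFonts.join(","),
    attrs.canvasHash,
  ].join("|");
  return createHash("sha256").update(material).digest("hex");
}

console.log(
  fingerprint({
    userAgent: "Mozilla/5.0 (X11; Linux x86_64) ...",
    screenResolution: "2560x1440",
    timezone: "America/Toronto",
    language: "en-CA",
    installedFonts: ["Arial", "Fira Sans", "Noto Serif"],
    canvasHash: "9f2c...",
  }),
);
```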

“Personal data” is then connected to these identifiers to create a digital profile of an individual. Article 4(1) of the GDPR defines “personal data” as:

“Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” (EU General Data Protection Regulation (GDPR), 2016)

Issues

Trackers are pervasively and invisibly collecting your personal data every time you use your computer or your phone – even when you are not looking at your screen. Social media platforms collect data on what content you share, which posts you look at and for how long, whether you like or react to a post, what you comment, which friends you interact with most, and every other activity on their sites and apps. Third-party trackers collect data on which websites you have visited, what you search for, what items you put in your Amazon cart, what you click on a website, how long your cursor hovers over an element, your app usage, which physical brick-and-mortar stores or restaurants you have visited, your current geolocation, and much more.

By combining PII that identifies a unique individual with vast amounts of personal data on your socio-demographics, interests, and activities, trackers create behavioural profiles that “can reveal political affiliation, religious belief, sexual identity and activity, race and ethnicity, education level, income bracket, purchasing habits, and physical and mental health.” This is all so that advertisers can “[use] data about a user’s behavior to predict what they like, how they think, and what they are likely to buy” (Cyphers & Gebhart, 2019, p. 5).

Figure 2: Different types of data that trackers collect from users. (Christl, 2017, p. 1)
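
The following sketch illustrates, in a deliberately crude way, how a stream of tracked events can be turned into a behavioural profile. The categories, keywords, and scoring rule are invented for illustration; production profiling systems are far more sophisticated.

```typescript
// Hypothetical sketch of aggregating tracked events into a behavioural
// profile. Category names and inference rules are invented for illustration.

interface TrackedEvent {
  url: string;
  dwellSeconds: number;
}

type Profile = Record<string, number>; // interest category -> weight

const CATEGORY_KEYWORDS: Record<string, string[]> = {
  health: ["clinic", "symptom", "pharmacy"],
  finance: ["loan", "mortgage", "credit"],
  lgbtq: ["pride", "grindr"],
};

function buildProfile(events: TrackedEvent[]): Profile {
  const profile: Profile = {};
  for (const event of events) {
    for (const [category, keywords] of Object.entries(CATEGORY_KEYWORDS)) {
      if (keywords.some((k) => event.url.includes(k))) {
        // Longer dwell time -> stronger inferred interest.
        profile[category] = (profile[category] ?? 0) + event.dwellSeconds;
      }
    }
  }
  return profile;
}

console.log(
  buildProfile([
    { url: "https://news.example/mortgage-rates", dwellSeconds: 120 },
    { url: "https://health.example/symptom-checker", dwellSeconds: 45 },
  ]),
);
```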

The profiles that trackers create can be abused to micro-target certain demographic groups. This can lead to discriminatory practices, because advertisers want to reach their key demographics to better sell their products and services, or to manipulate them into adopting certain beliefs and voting for a particular candidate.

Discrimination is a central tenet of digital advertising, and it can be appropriate and beneficial for both advertisers and users who might want to see better personalized ads that are relevant to their interests (Bérubé et al., 2021). However, the unfair, harmful, and potentially illegal discriminatory practices found in other industries and in access to public services can also occur through digital advertising (Maréchal & Biddle, 2020). A 2016 investigation by ProPublica found that the Facebook advertising interface allowed companies to target users looking to purchase houses while excluding users belonging to particular ethnic identities, such as Black, Asian, or Hispanic people (Angwin & Parris Jr., 2016). This practice would violate the U.S. federal Fair Housing Act and the Civil Rights Act. Similar techniques applied to other sensitive areas (such as employment, banking, and immigration) and to other socio-demographic factors (such as gender, age, and religion) could violate other legislation, human rights norms, and ethical standards.

Figure 3: A screenshot of the Facebook advertising portal from a ProPublica article demonstrating a discriminatory practice of targeting likely home buyers while excluding “African American[s],” “Asian American[s],” and “Hispanic[s]” (Angwin & Parris Jr., 2016)
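
As a rough illustration of the kind of safeguard that could catch the practice ProPublica documented, the sketch below flags housing, employment, or credit campaigns whose targeting touches protected characteristics. The segment names and the rule are hypothetical; this is not Facebook’s actual advertising API.

```typescript
// Hedged sketch of a safeguard a platform could run against ad targeting
// parameters in sensitive categories. Attribute names and the rule are
// illustrative assumptions.

interface TargetingSpec {
  adCategory: "housing" | "employment" | "credit" | "other";
  includedSegments: string[];
  excludedSegments: string[];
}

const PROTECTED_SEGMENTS = ["ethnic-affinity", "religion", "age-band", "gender"];

// Flag housing/employment/credit ads that include or exclude audiences by
// protected characteristics, the practice documented by ProPublica.
function violatesProtectedClassRules(spec: TargetingSpec): boolean {
  if (spec.adCategory === "other") return false;
  const touched = [...spec.includedSegments, ...spec.excludedSegments];
  return touched.some((segment) =>
    PROTECTED_SEGMENTS.some((p) => segment.startsWith(p)),
  );
}

console.log(
  violatesProtectedClassRules({
    adCategory: "housing",
    includedSegments: ["likely-to-move"],
    excludedSegments: ["ethnic-affinity:african-american"],
  }),
); // true -> reject or review the campaign
```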

Policy Options

Transparency and Opt-In

A key aspect of the online behavioural tracking system is that it is pervasive and invisible. Most users do not know what happens with their data behind the scenes. Explanations of the ecosystem require technical information, such as what third-party cookies are, which trackers are present on a user’s current site, how a URL request works, what “SDKs” are, and other complex processes. An important principle that has emerged in response to the collection, sharing, and processing of data is “meaningful consent.” However, because the tracking ecosystem is invisible, complex, and pervasive throughout the Internet, “it is unreasonable to assume that consumers can give informed consent to the excessive tracking, sharing, and profiling that pervades in the adtech industry” (Christl & Edwards, 2020, p. 179).

Nonetheless, if the current business model is to continue, policy approaches requiring clearer transparency and “opt-in” meaningful consent are necessary. The EU’s Digital Services Act (DSA) is a step in that direction. Articles 24, 30, and 36 mandate that information about each advertisement be communicated in a “clear and unambiguous manner and in real time” (Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on a Single Market For Digital Services (Digital Services Act) and Amending Directive 2000/31/EC, 2020). Of particular importance is that the DSA forces online platforms to reveal on whose behalf an advertisement is displayed and why the ad is being shown to that specific individual, such as the parameters or data being used to target that user. Additionally, browser extensions such as the Electronic Frontier Foundation’s Privacy Badger and Ghostery reveal which trackers are in use on the site the user is currently visiting and automatically block them from collecting the user’s data. Governments should mandate that these functions be provided by default.
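
A minimal sketch of the blocking logic such an extension might apply to a third-party request is shown below. The blocklist is invented for illustration, and Privacy Badger in particular learns trackers heuristically rather than from a fixed list.

```typescript
// Simplified, assumed logic for deciding whether to block a third-party
// request. Real extensions use richer heuristics and much larger lists.

const TRACKER_DOMAINS = new Set(["tracker.example", "ads.example"]);

function isThirdParty(requestUrl: string, pageUrl: string): boolean {
  return new URL(requestUrl).hostname !== new URL(pageUrl).hostname;
}

function shouldBlock(requestUrl: string, pageUrl: string): boolean {
  const host = new URL(requestUrl).hostname;
  return isThirdParty(requestUrl, pageUrl) && TRACKER_DOMAINS.has(host);
}

console.log(shouldBlock("https://tracker.example/pixel.gif", "https://news.example/article")); // true
console.log(shouldBlock("https://news.example/logo.png", "https://news.example/article"));     // false
```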

What has not been pursued is transparency on how a user’s data is shared with third parties, whether it is held by external data brokers, and whether it is being sold to other companies. Policies should be implemented that allow users to track where their data goes and with whom it is shared. Users should not merely have the option to decline sharing their data with third parties; rather, their data should not be shared by default.

With greater transparency regarding trackers and the collection and sharing of data, users can more meaningfully give consent to the processing of their data. Many reports advocate for clearer, easily understandable consent requests that do not hide behind legal jargon. A more important policy, however, would be to make these data requests “opt-in” by default, meaning that data should not be collected, shared, or processed without prior, informed, and meaningful consent from the user. Most of the Internet currently operates on an “opt-out” model, where by default, platforms can use users’ data unless the user declines. This may seem like a small tweak, but it changes the fundamental approach of trackers and prevents them from compiling a complete digital profile of the user (Ricks & Surman, 2020, p. 39; Ghosh et al., 2020, pp. 6–7).
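
The difference between the two defaults can be expressed very simply, as in the sketch below; the purpose names and record shapes are assumptions made for illustration.

```typescript
// Illustrative contrast between opt-out and opt-in defaults for data use.

type Purpose = "ads-personalization" | "analytics" | "third-party-sharing";

interface OptOutRecord { objected: Set<Purpose>; } // user must actively object
interface OptInRecord { granted: Set<Purpose>; }   // user must actively agree

// Opt-out model: collection is allowed unless the user has objected.
function allowedUnderOptOut(record: OptOutRecord | undefined, purpose: Purpose): boolean {
  return !(record?.objected.has(purpose) ?? false);
}

// Opt-in model: collection is allowed only with prior, explicit consent.
function allowedUnderOptIn(record: OptInRecord | undefined, purpose: Purpose): boolean {
  return record?.granted.has(purpose) ?? false;
}

// A brand-new user who has never been asked anything:
console.log(allowedUnderOptOut(undefined, "third-party-sharing")); // true  -> data flows by default
console.log(allowedUnderOptIn(undefined, "third-party-sharing"));  // false -> nothing collected by default
```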

There are two flaws in a policy mandating opt-in by default: 1) users no longer see content and ads tailored to them, making their Internet experience less personalized, and, more importantly, 2) users may be denied access to online platforms that require the sharing of their personal information. Because platforms make revenue off the collection of personal data, an opt-in default setting might lead to fewer users choosing to participate. This could push platforms to change their business model from one of “free” access to a paid subscription or “freemium” model – as on other platforms that offer a “free” ad-supported tier alongside premium paid tiers with more features and no advertisements (Lau, 2020).

Restrictions on Certain Sensitive Data

Reflecting on the case of Grindr and the sharing of its users’ data, as well as the potential for discrimination in micro-targeting, it is clear that certain types of data are more sensitive than others. This data should be protected: its collection, sharing, and processing should be restricted, and trackers and platforms should face rules on how they use and transmit it.

GDPR Article 9(1) outlines a few key types of data that could be considered for “protected status,” including:

  1. racial or ethnic origin,
  2. political opinions,
  3. religious or philosophical beliefs,
  4. trade union membership,
  5. genetic data,
  6. biometric data for the purpose of uniquely identifying a natural person,
  7. data concerning health, and
  8. data concerning a natural person’s sex life or sexual orientation. (Christl & Edwards, 2020, p. 166; EU General Data Protection Regulation (GDPR), 2016)

The most coercive option is to prohibit the collection of this data completely. Under this framework, trackers would not be allowed to collect such data or use it to create profiles of individuals. This would push advertising toward a model focused more on context – such as a user’s current activities on the Internet, the websites they visit, and the advertisements and posts they interact with.

A more moderate option would be to place restrictions and responsibilities on platforms regarding how they handle protected personal data. They may collect these types of data, but they could be restricted in how they micro-target the individuals concerned. Moreover, this protected personal data could be required to be encrypted and prohibited from being sold or shared with third parties. This option would, in theory, ensure that the various trackers across the ecosystem cannot construct a complete profile of a user based on this sensitive data. Additionally, it might force platforms to limit the use of that data to their own legitimate interests, as discussed below.
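
A minimal sketch of this moderate option might look like the following: protected categories are encrypted before storage and excluded from any third-party sharing. The category names and policy functions are assumptions for illustration, not a compliance implementation.

```typescript
// Illustrative sketch of the "moderate option": protected categories of data
// (in the spirit of GDPR Art. 9(1)) are stored only encrypted and are never
// shared with third parties. Names and policy are assumptions.
import { createCipheriv, randomBytes } from "node:crypto";

type Category =
  | "sexual-orientation" | "health" | "religion" | "ethnicity" // protected
  | "page-view" | "purchase";                                  // non-protected

const PROTECTED: Set<Category> = new Set(["sexual-orientation", "health", "religion", "ethnicity"]);

interface DataPoint { category: Category; value: string; }

// AES-256-GCM encryption of a single value (IV prepended, auth tag appended).
function encrypt(value: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  return Buffer.concat([iv, cipher.update(value, "utf8"), cipher.final(), cipher.getAuthTag()]).toString("base64");
}

// Protected data is encrypted at rest; everything protected is excluded from
// third-party sharing entirely.
function prepareForStorage(point: DataPoint, key: Buffer): DataPoint {
  return PROTECTED.has(point.category) ? { ...point, value: encrypt(point.value, key) } : point;
}

function shareableWithThirdParties(points: DataPoint[]): DataPoint[] {
  return points.filter((p) => !PROTECTED.has(p.category));
}

const key = randomBytes(32);
console.log(prepareForStorage({ category: "health", value: "visited clinic" }, key));
console.log(shareableWithThirdParties([
  { category: "health", value: "visited clinic" },
  { category: "page-view", value: "https://news.example" },
]));
```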

Information Fiduciary Model

A popular proposal for regulating online platforms and digital companies that collect and use users’ data is the “information fiduciary model.” Conceived by legal scholar Jack M. Balkin, it adapts the concept of “fiduciaries,” defined as “a person or business with an obligation to act in a trustworthy manner in the interest of another” (Balkin & Zittrain, 2016). Professionals like doctors, lawyers, and accountants are examples of information fiduciaries – professionals who manage information, not just money. They have legal duties and responsibilities to care for our best interests and to act in good faith, under threat of decertification or lawsuits should they fail in those duties.

In “Information Fiduciaries and the First Amendment,” Balkin re-conceptualizes the information fiduciary for digital companies. Under this proposal, online platforms like Google, Facebook, and Twitter – and any digital company that collects and uses users’ data – would be bound by a relationship of trust: users’ data must be protected, and these information fiduciaries cannot “use the data in unexpected ways to the disadvantage of people who use their services or in ways that violate some other important social norm” (Balkin, 2015, p. 1227). As with other professional information fiduciaries, users are still expected to pay for the platform’s services; in line with the majority of current business models, access to the digital public sphere is paid for with users’ data and attention. Under this legal framework, however, information fiduciaries have three basic duties: “a duty of care, a duty of confidentiality, and a duty of loyalty” (Balkin, 2020, pp. 22–23).

In practice, this would mean that digital companies agree to guarantee that users’ data is secure and private, that data breaches are reported, that personal data is not leveraged to unfairly discriminate against users, and that data is not sold or shared with other companies that do not follow the information fiduciary model (Balkin & Zittrain, 2016). The GDPR offers additional principles that could be incorporated into this policy framework. The EU legislation mandates that the use and “processing of personal data may be based on legitimate interests” (Christl & Edwards, 2020, p. 173). Information fiduciaries would have to conduct a three-part test asking the following questions: 1) “Are you pursuing a legitimate interest?” 2) “Is the processing necessary for that purpose?” and 3) “Do the individual’s interests override the legitimate interest?” (Christl & Edwards, 2020, p. 173). An information fiduciary model could also prohibit the collection and processing of certain types of protected personal data, such as those outlined in GDPR Article 9(1), discussed above.
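
The three-part test can be read almost directly as a checklist, as in the hedged sketch below; the structure and field names are assumptions for illustration, not legal advice.

```typescript
// Minimal sketch of the GDPR-style "legitimate interests" test described
// above, expressed as three explicit questions.

interface ProcessingProposal {
  purpose: string;
  pursuesLegitimateInterest: boolean;   // 1) Are you pursuing a legitimate interest?
  processingIsNecessary: boolean;       // 2) Is the processing necessary for that purpose?
  individualInterestsOverride: boolean; // 3) Do the individual's interests override it?
}

function mayProcess(p: ProcessingProposal): boolean {
  return p.pursuesLegitimateInterest && p.processingIsNecessary && !p.individualInterestsOverride;
}

console.log(mayProcess({
  purpose: "fraud prevention on the platform",
  pursuesLegitimateInterest: true,
  processingIsNecessary: true,
  individualInterestsOverride: false,
})); // true

console.log(mayProcess({
  purpose: "selling browsing history to data brokers",
  pursuesLegitimateInterest: true,
  processingIsNecessary: false,
  individualInterestsOverride: true,
})); // false
```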

A major concern with the collection of data is the selling and sharing of that data among third parties, especially between different trackers and data brokers. This sharing is what allows trackers to compile a complete profile of a user from the entirety of their Internet activity. An information fiduciary would have a duty not to sell or share that data outside the company’s own activities and “legitimate interests” without the explicit consent of the user.

This policy is not without its critics. Lina Khan – current Chair of the Federal Trade Commission – and David Pozen write that the model “could cure at most a small fraction of the problems associated with online platforms—and to the extent it does, only by undercutting directors’ duties to shareholders, undermining foundational principles of fiduciary law, or both” (Tuch, 2020, p. 1899). Instead, Khan has advocated for other regulatory approaches, most notably antitrust regulation that would mandate the “breakup” or “structural separation” of online platforms. I have argued in the past for the antitrust approach to address the programmatic advertising ecosystem by breaking up the vast control of companies with advertising business models, like Google, which would fundamentally transform the modern online economy. However, the antitrust approach does not resolve the flaws of the tracking ecosystem and the collection and sharing of personal data across third-party companies. Regardless, these two policies do not conflict with each other; they can be complementary and should be pursued simultaneously, as they address different issues.

A primary concern of Khan and Pozen with the foundational concept of information fiduciaries, particularly as applied to digital companies, is “that the model is incompatible with social media companies’ powerful self-interests” (Tuch, 2020, p. 1897). The fundamental business models of online platforms rely on the collection and sharing of personal data to create revenue streams, and mandating that digital companies act as “information fiduciaries” would require a drastic transformation of those business models, disrupting the entire modern Internet economy. However, “constraints on self-interested conduct” are exactly what is needed, because those “self-serving incentives” to exploit users’ data for profit are the basis of societal harms (Tuch, 2020, p. 1926). Imposing an information fiduciary model does not require a wholesale transformation of companies’ business models; rather, it requires that they change their practices of selling and sharing data and that they be constrained from collecting particularly sensitive information.

Recommendations and Conclusion

There is no “one-size-fits-all” policy that will fix all of the problems with this ecosystem. Given the many dimensions of these problems, policymakers must take multiple approaches, and the policy options listed above are not and should not be mutually exclusive. Because the problem is pervasive and invisible, it is necessary for users to understand how these technologies affect them. Users should know what data is being collected, and where, when, and how, in order to make informed decisions about consenting to its collection and use. The selling and sharing of that data is particularly problematic, as it often happens behind the curtains, with undisclosed sums of money changing hands and data being used for purposes the user never agreed to – sometimes with malicious intent.

With greater transparency, users should be offered an “opt-in” setting by default, placing the burden on platforms to obtain consent rather than assuming it is acceptable to collect data automatically. Because micro-targeting can lead to discriminatory practices, restrictions on the use of protected personal data are necessary to protect users who are at particular risk of discrimination, manipulation, racism, loss of employment, and physical harm, among other harms. Lastly, an information fiduciary model would create responsibilities and duties for platforms to act in users’ best interests and protect their data. While these policies have flaws and may lead to unintended consequences, they are necessary and worth the effort. Most policies involve trade-offs, and policymakers must always balance the interests of many stakeholders. Given the asymmetric relationship between users and the pervasive, invisible apparatus of trackers and online platforms, it is time for policymakers to empower users and protect their data.


References and Works Cited

Angwin, J., & Parris Jr., T. (2016, October 28). Facebook Lets Advertisers Exclude Users by Race. ProPublica. https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race?token=fjgbvA30OM1kEN1IZuWQ75FLj86Yrxg4

Balkin, J. M. (2015). Information Fiduciaries and the First Amendment Lecture. U.C. Davis Law Review, 49(4), 1183–1234.

Balkin, J. M. (2020). How to Regulate (and Not Regulate) Social Media (Occasional Papers). Knight First Amendment Institute. https://knightcolumbia.org/content/how-to-regulate-and-not-regulate-social-media

Balkin, J. M., & Zittrain, J. (2016, October 3). A Grand Bargain to Make Tech Companies Trustworthy. The Atlantic. https://www.theatlantic.com/technology/archive/2016/10/information-fiduciary/502346/

Ricks, B., & Surman, M. (2020). Creating Trustworthy AI: A Mozilla white paper on challenges and opportunities in the AI era. Mozilla Foundation.

Bérubé, J., Liang, D., & Mateo, A. (2021). Innovative Efficiency or Problematic Ethics? A Report on the Digital Ecosystem of PA. McMaster University. (Previous work by this author.)

Christl, W. (2017). Corporate Surveillance in Everyday Life: How Companies Collect, Combine, Analyze, Trade, and Use Personal Data on Billions. Cracked Labs. https://crackedlabs.org/en/corporate-surveillance

Christl, W., & Edwards, Z. (2020). Out of Control: How consumers are exploited by the online advertising industry. Norwegian Consumer Council.

Cyphers, B. (2020, January 27). Grindr and OKCupid Sell Your Data, but Twitter’s MoPub Is the Real Problem. Electronic Frontier Foundation. https://www.eff.org/deeplinks/2020/01/grindr-and-okcupid-sell-your-data-twitters-mopub-real-problem

Cyphers, B., & Gebhart, G. (2019). Behind the One-Way Mirror: A Deep Dive Into the Technology of Corporate Surveillance. Electronic Frontier Foundation.

EU General Data Protection Regulation (GDPR), 2016/679 (2016).

Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on a Single Market For Digital Services (Digital Services Act) and amending Directive 2000/31/EC, COM/2020/825 (2020).

Ghosh, D., Gorman, L., Schafer, B., & Tsao, C. (2020). Levers in the Digital Advertising Ecosystem (No. 1; The Weaponized Web: Tech Policy Through the Lens of National Security). GMF Alliance for Securing Democracy. https://securingdemocracy.gmfus.org/levers-in-the-digital-advertising-ecosystem/

Lau, Y. (2020). A Brief Primer on the Economics of Targeted Advertising. Bureau of Economics, Federal Trade Commission. https://www.ftc.gov/reports/brief-primer-economics-targeted-advertising

Maréchal, N., & Biddle, E. R. (2020). It’s Not Just the Content, It’s the Business Model: Democracy’s Online Speech Challenge. New America Open Technology Institute. https://www.newamerica.org/oti/reports/its-not-just-content-its-business-model/

Tuch, A. F. (2020). A General Defense of Information Fiduciaries (SSRN Scholarly Paper ID 3696946). Social Science Research Network. https://doi.org/10.2139/ssrn.3696946

Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.
