Teghan O’Connell
On November 12, 2024, tech and game producer Niantic revealed that it had been using data collected from users to train its geospatial AI model. Niantic is best known for its mobile game Pokémon Go, which gained widespread popularity in 2016 and, at its peak, hosted 232 million active users. [1] The company says the data came from an optional in-app feature for scanning real-world locations: when users opened their cameras for this purpose, the company collected the resulting visual data, which it says helped train the model in geospatial orientation. [2] Niantic claims it received consent for the collection through the app’s privacy policy, which never explains exactly how the location data would be used. [3]
This announcement reveals how users are largely unaware of data collection for training AI. Most websites you visit or apps you use collect some data from you, burying the consent for this collection within their terms and conditions. Data privacy is fading away in our new age of large language models, which require vast amounts of data to train.
In an interview with Stanford University’s Institute for Human-Centered Artificial Intelligence, Jennifer King, a privacy and data policy fellow at the institute, discusses the issue. She tells interviewer Katharine Miller that AI systems pose the same risks as most other technologies, just at a much larger scale. King notes that AI systems are trained on deeply personal data, such as our faces for facial recognition or our résumés, and explains that using our data to train AI models is commonplace because the US has no comprehensive federal data privacy protections. [4]
King says many of the issues we face with data privacy could be fixed by moving from an opt-out system to an opt-in system, in which users affirmatively choose to share their data rather than having to choose not to. Most people aren’t aware that their data is being collected, or simply don’t care, and never opt out. [5] Companies also make opting out difficult, burying the consent-withdrawal page deep within the app. For example, a Facebook user who wants to stop the tracking of their data must navigate through five web pages and then withdraw consent from each individual tracking site. [6]
The privacy issues surrounding AI trace back to how AI is trained: AI systems must process vast amounts of data to function the way their creators intend. That data comes from many different sources, whether from within the creator’s own platforms or purchased from large data producers. [7] Reddit, for example, plans to sell its users’ data to Google to train Google’s Gemini model. [8]
Why does AI need your data to train?
AI models are trained by feeding data through their algorithms and iteratively adjusting the model’s output. [9] Once a model has been trained on your personal data, it is virtually impossible to remove that data from it. [10] Because of the sheer amount of data AI models hold, they are at heightened risk of cyber-attacks, and even prompt-based models can unintentionally reveal sensitive information to a party that knows which prompts to ask. [11] Users aren’t always aware of when or why their data is being collected, making it more likely they won’t know their data was part of a large breach. [12] This lack of awareness leaves many users exposed to spear-phishing scams and identity theft, none the wiser.
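To see why trained-in data is so hard to remove, consider a deliberately tiny sketch (not any specific company’s system): a linear model fit by gradient descent. The training records are never stored as records; their influence is absorbed into the model’s parameters, so deleting a record afterward does nothing to the already-adjusted weights.

```python
def train(data, steps=5000, lr=0.01):
    """Fit y = w*x + b to (x, y) pairs by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical "personal" data points shape the final parameters.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
w, b = train(points)
# Dropping a point now does not undo its effect on w and b;
# the only way to "forget" it is to retrain from scratch without it.
w2, b2 = train(points[:2])
```

The same principle holds for large models: each training example nudges billions of parameters, and those nudges cannot be individually reversed, which is why removal effectively requires retraining.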
[1] Phillip Williams, Pokemon GO Statistics 2024: Active Players, Downloads, Revenue, and Popularity Trends, LocaChange (Dec. 20, 2012), https://www.locachange.com/pokemon-go/pokemon-go-statistics/#1
[2] Eric Brachmann and Victor Adrian Prisacariu, Building a Large Geospatial Model to Achieve Spatial Intelligence, Niantic (Nov. 12, 2024), https://nianticlabs.com/news/largegeospatialmodel
[3] Felicia Wellington Radel, It’s not just a game. Your Pokémon Go player data is training AI map models., USA Today (Nov. 25, 2024, 9:04 AM), https://www.usatoday.com/story/tech/2024/11/23/niantic-pokemon-go-data-ai-map/76488340007/
[4] Katharine Miller, Privacy in an AI Era: How Do We Protect Our Personal Information?, HAI Stanford University (Mar. 18, 2024), https://hai.stanford.edu/news/privacy-ai-era-how-do-we-protect-our-personal-information
[5] Katharine Miller, Privacy in an AI Era: How Do We Protect Our Personal Information?, HAI Stanford University (Mar. 18, 2024), https://hai.stanford.edu/news/privacy-ai-era-how-do-we-protect-our-personal-information
[6] Petar Todorovski, How Facebook Collects and Uses Your Personal Data and How to Stop It, Privacy Affairs (Jun. 26, 2024), https://www.privacyaffairs.com/facebook-data-collection/#14769-10
[7] Potter Clarkson, What data is used to train an AI, where does it come from, and who owns it?, Potter Clarkson (Accessed Feb. 13, 2025, 11:45 AM), https://www.potterclarkson.com/insights/what-data-is-used-to-train-an-ai-where-does-it-come-from-and-who-owns-it/
[8] Paresh Dave, Reddit’s Sale of User Data for AI Training Draws FTC Inquiry, Wired (Mar. 15, 2024, 6:34 PM), https://www.wired.com/story/reddits-sale-user-data-ai-training-draws-ftc-investigation/
[9] Nayna Jaen, How AI is trained: the critical role of AI training data, RWS (Mar. 26, 2024), https://www.rws.com/artificial-intelligence/train-ai-data-services/blog/how-ai-is-trained-the-critical-role-of-ai-training-data/#:~:text=AI%20training%20data%20is%20a%20set%20of%20information%2C,pictures%20containing%20dogs%2C%20with%20each%20dog%20labelled%20%27dog%27.
[10] VeraSafe, What Are the Privacy Concerns with AI?, VeraSafe (Feb. 5, 2025), https://verasafe.com/blog/what-are-the-privacy-concerns-with-ai/
[11] CEPS Task Force, Artificial Intelligence and cybersecurity, CEPS (Apr. 21, 2021), https://www.ceps.eu/artificial-intelligence-and-cybersecurity/
[12] Alice Gomstyn and Alexandra Jonker, Exploring privacy issues in the age of AI, IBM (Sept. 30, 2024), https://www.ibm.com/think/insights/ai-privacy