Data Lifecycle in Generative AI Development: Is Personal Data Protection Futile?
Data lineage in generative AI development may be a myth. Most organizations that develop or use generative AI may not even be at the table on this issue, and as a consequence they may be out of compliance with privacy law requirements!
Robust data lineage is the practice of tracking data through its entire lifecycle, at the field level, as it moves through an organization or system from origin to destination. It documents the data's flow, every transformation it undergoes, and its dependencies across systems. Lineage serves as a map for understanding the data's history, verifying its accuracy, troubleshooting issues, and demonstrating regulatory compliance, and it gives an organization transparency and traceability across its data ecosystem. Without a data lineage regimen in place for AI development or deployment, an organization faces several privacy risks in its use of publicly available, private, and sensitive personal data.
The linked article highlights the tension between legislatively enacted rights, such as those under U.S. state privacy laws and the GDPR, and the technical impossibility of tracing model outputs back to the data used to train the model. Current scholarship suggests that once data is tokenized and diffused across billions of parameters, its lineage is effectively lost. The disconnect between legal frameworks and technical realities is widening, and organizations will need to reconcile it through proper audits, governance, lobbying, or technical innovation.
“Privacy of Personal Data in the Generative AI Data Lifecycle” – NYU JIPEL
Citation: Mindy Nunez Duffourc, Sara Gerke & Konrad Kollnig, Privacy of Personal Data in the Generative AI Data Lifecycle, 13 N.Y.U. J. Intell. Prop. & Ent. L. 220 (2024) (Vol. 13, No. 2, pp. 220–268), https://jipel.law.nyu.edu/privacy-of-personal-data-in-the-generative-ai-data-lifecycle/
_______________________________
Disclaimer: This blog post is provided for informational purposes only and does not constitute legal advice. The linked article is the work of its respective author(s) and publication, with full attribution provided. BAYPOINT LAW is not affiliated with the author(s) or publication; it is shared solely as a matter of professional interest.