The Death of Data: Retention, Rot, and Risk

Proposal
Short Form
Beginner

Excerpt

I want to problematize keeping deprecated codebases around, and to emphasize that mindless retention of data and code only enlarges our attack surface and our exposure to data corruption. Future attackers may be motivated by ideology as well as money, and we are responsible for the data we leave within their reach.

Description

Hoarding only really hurts when we run out of space. With data storage this cheap, it rarely occurs to us that we should be getting rid of data instead of piling it into giant silos. I’ll explain why we are storing increasingly dangerous poison in our databases, and why we ought to care about automated de-acquisition and deletion.

Data gets senile. It forgets its links; it only wants to tell you about the old days. Wikis overgrow like the thorns around Sleeping Beauty’s castle, and reference manuals accrete into sedimentary layers. APIs gather and leak data like municipal water systems. Even with search, we can’t find what we’re looking for. Too much data is as bad as, or worse than, no data.

We are building giant predictive structures on big data, but we are not evaluating the age or value of that data. Have you ever been boggled by your credit report? We are building thousands of reports like that with no consideration for the quality of the data we are using.

This talk is not about data in the abstract; it’s about ergot poisoning, hoarding, KonMari, and bitrot. It covers when and why to kill your precious data, and why data is a double-edged sword.


I think no one tells developers and project managers to throw things away. We assume that because keeping things around is cheap, the emotional comfort is worth the tradeoff. But we’re not thinking about how vulnerable we make ourselves by lacking an automated, tested way of getting rid of things we no longer need.

I have given this talk twice now, and both times the audiences have walked away stunned, scared, and questioning. In a political climate of rising repression, this talk is vital to get in front of technical people, because we’re the ones storing the data. Educating the whole public is too big a task, but maybe we can educate the data minders.

Tags

data, big data, machine learning, political

Speaking experience

Experienced speaker with over 20 conference talks, including several at OSBridge.

This talk has been given twice before, both times to East Coast audiences.

Speaker