A constant hum drones out of a former church in San Francisco. It is the sound, from hundreds of fans cooling hundreds of computer servers, of the digital past being kept alive. This is the Internet Archive, the largest collection of archived web pages in the world and a constant reminder of the fragility of our digital past. It is also, thanks to a March ruling in a federal court, which found that the archive’s lending practices violate publishers’ rights, just one battlefield in a growing struggle that will define how humanity’s collective digital memory is owned, shared and preserved — or lost forever.
As a scholar of digital data, I know that not all data loss — the corrosion and destruction of our digital past — is tragic. But much data loss today occurs in ways that are deeply unjust and that have monumental implications for both culture and politics. Few nonprofit organizations or publicly backed digital libraries are able to operate at the scale needed to truly democratize control of digital knowledge. Which means important decisions about how these issues play out are left to powerful, profit-driven corporations or political leaders with agendas. Understanding these forces is a critical step toward managing, mitigating and ultimately controlling data loss and, with it, the conditions under which our societies remember and forget.
From streaming platforms removing digital-only shows from their libraries to governments defunding their national library systems to the effects of tech centralization, data is disappearing at alarming rates. Brewster Kahle, Internet Archive’s founder, told me that thanks to government pressure or simply error, data is often subject to large-scale erasure. For web pages that have been wiped clean, the Internet Archive is often the only place to look.
Traditional publishers brought the suit against the archive because of its practice of lending, for short periods, scans of their books (including, to authors’ dislike, recently published titles). The court ruled that the archive must stop lending copyrighted books. An appeal is in the works, but if the court ruling is upheld, it could seriously undermine the ability of the archive and similar bodies to defend public access to information against the encroachment of privately held platforms, according to Mr. Kahle.
Every technological revolution entails a loss. Socrates warned in Plato’s “Phaedrus” that the invention of writing destroyed memory, making people “hearers of many things” who “will have learned nothing.” More recently, the typewriter enabled the production of far more paperwork, raising profound anxieties about the number of lost, displaced and missing documents. Today’s digital societies echo these historic patterns of loss, neglect and entropy. But new actors and dynamics have also entered the stage. Public spheres now exist precariously at the mercy of social media companies. And each day, corporations like Amazon, Alphabet and Meta extract and assetize our data, stockpiling it and monetizing it under dubious consent structures.
The fact that crucial decisions about whether to keep or destroy data rest in the hands of actors with profit motives, autocratic aspirations or other self-serving ends has huge implications not only for individuals but also for the culture at large.
Many instances of data loss have ramifications for cultural production, the writing of history and, ultimately, the practice of democracy. Some politicians, including ones who oversee funding for digital archives, have dubious relationships with record-keeping best practices. British officials have been accused in court of “government by WhatsApp,” relying on self-deleting messaging applications to avoid oversight and accountability. A similar scandal engulfed Denmark’s prime minister in 2021.
Tech companies, too, have a record of questionable policies around data, content moderation and censorship. They have their own motives — including a business model based on generating different data enclosures and on hardware and software obsolescence — and exist in a complex political and regulatory ecosystem. That ecosystem often offers perverse incentives to both maximize profit by selectively storing some data and reduce regulatory burdens by removing access to other data. Marginalized communities may be particularly vulnerable. During the 2020 Black Lives Matter protests, some activists accused social media sites like Facebook of censoring their posts. Platform removal of adult content disproportionately affects queer communities. And in conflict zones, regimes and content moderation systems frequently remove material that could be crucial evidence in war crimes investigations.
Many cultural forms are now almost entirely dependent on digital formats. While films and music were once available for purchase in a physical form, many are now digital only. Even books are sometimes released only for e-readers. That places enormous power in the hands of mostly for-profit companies, including streaming services and music platforms like Spotify, to control the dissemination of art. Platforms like Max (formerly HBO Max) have removed films and TV shows from their streaming services en masse; though some are available elsewhere, there was suddenly no legal way to watch many other programs. Even the creators of some shows could not view their own work.
Of course, not everything is worth preserving. Archivists and librarians learn how to weed and appraise files to be kept or deleted. In internet spaces, these practices, which can help create meaning, are known as digital housekeeping. And some nongovernmental organizations and governments, notably that of the European Union, promote data policies that are premised on destruction, such as the “right to erasure.”
While the history of data loss spans millenniums, digital data storage has produced fundamentally new challenges and preservation crises. The decentralized nature of the internet generates link rot and content drift; the inherently dynamic and unstable nature of digital information — premised on constant information migration — is another risk. Natural disasters and fires threaten digital as well as physical archives.
These challenges require new and innovative solutions. Some organizations have embraced radical methods to confront them. There now exists an Arctic Code Vault alongside the Global Seed Vault on Svalbard in the Norwegian high Arctic. But protecting code in a frozen, disused mine shaft does not address the broader need to rethink the power structures that govern data’s ownership and control.
For the sake of democratic accountability, governments should stop relying on privately owned communication platforms for the day-to-day operations of public administration and should place a higher priority on public archiving.
Alongside the need to maintain public trust in democratic institutions, we must consider how we ought to preserve our collective cultural memory. Institutions like museums, libraries and archives must play a more proactive role while creating stronger institutional safeguards — including rules mandating secure transport of public sector data and professional management of archives, in addition to requirements for public accessibility — on their own conduct. These organizations, whether they are upstart archival initiatives or established public institutions, require stable financial and institutional support to flourish.
Beyond the everyday functions of government and the preservation of cultural memory, digital societies must also ensure that critical data on human rights abuses are protected from erasure, both intentional and accidental.
But as we consider fundamental regulatory changes, we must also recognize that letting some data go might be just as ethical as preserving it.
The history of knowledge is not one of simple progress or accumulation. Knowledge production in the digital era, like the creation and storage of knowledge across the centuries, is unfolding as a continual oscillation between gains and losses.
Data loss on a small scale (missing phone contacts, digital files lost to a glitch) is an occupational hazard of existing in a digitally reliant world. But data erasure at scale is always political. Responses to erasure and loss must exceed technical fixes and knee-jerk reactions; instead, governments and organizations must constantly reassess the ethical and regulatory frameworks that govern our relationship with data. The mainstream narrative that we are living through an era of exponential, near-infinite knowledge accumulation no longer fits a society in which we lose our collective record of ourselves day in and day out.
Nanna Bonde Thylstrup (@NThylstrup) is a professor at the University of Copenhagen, the author of “The Politics of Mass Digitization” and the principal investigator of the Data Loss project, funded by the European Research Council.