ABSTRACT
The overall aim of this article is to push for access to born-digital archives, including email archives. It argues that the digital revolution has led to huge changes, but it also brought us back to an earlier situation. The world of big (digital) data is not so different from the world of big (paper) data. There is a danger of repeating the mistakes that were made in the twentieth century with large paper archives, which have often remained uncataloged, hidden, and inaccessible to users. The first section looks at the impact of the More Product, Less Process (MPLP) movement on archival repositories over the past fifteen years. Originally conceived as a response to the huge increase in paper records and uncataloged collections, MPLP has been increasingly applied to digitized collections to increase access. However, few institutions have applied MPLP to born-digital collections, and accessibility remains a huge problem. In the next section, this article presents the kind of research that can be done once access to these born-digital collections is achieved. The final section examines the MPLP approach in relation to artificial intelligence/machine learning.
Archivists value high professional standards of preservation, cataloging, and presentation of their collections. This is normally a good thing; nobody would complain about a meticulous surgeon who performs operations to the highest standards of perfection and seeks to decrease medical risks. But, in the case of archives, this ideal of perfection—combined with a low tolerance for risk—generates huge problems for researchers. Perfectionism is difficult to combine with budget constraints. Few institutions can afford to employ the full complement of archivists necessary to catalog large paper collections, let alone digital collections with petabytes of materials.1 This results in cataloging backlogs and inaccessible archives. Too often, only neatly cataloged records are made available to users, and uncataloged collections with potentially sensitive/private materials languish in storage. Despite the influence of the More Product, Less Process movement, this problem of access remains acute for paper and digital archives. The problem is particularly acute for born-digital collections (i.e., content produced in digital form, rather than having been digitized from physical form). Most born-digital archives are currently treated as “dark” archives closed to users for three main reasons: sensitivity, copyright, and technical issues.
The overall aim of this article is to push for access to born-digital archives, including email archives. It argues that the digital revolution has led to huge changes, but it also brought us back to an earlier situation. The world of big (digital) data is not so different from the world of big (paper) data. There is a danger of repeating the mistakes that were made in the twentieth century with large paper archives, which have often remained uncataloged, hidden, and inaccessible to users. The first section looks at the impact of the More Product, Less Process (MPLP) movement on archival repositories over the past fifteen years. Originally conceived as a response to the huge increase in paper records and uncataloged collections, MPLP has been increasingly applied to digitized collections to increase access. However, few institutions have applied MPLP to born-digital collections, and accessibility remains a huge problem. In the next section, this article presents the kind of research we can do once we have access to these born-digital collections. The last section examines the MPLP approach in relation to artificial intelligence/machine learning (AI/ML). Applying these techniques to archives is still at an experimental stage, but AI/ML could become an integral part of archival processes in the near future.
My own expertise is in literary studies and digital humanities. In summer 2017, I was the first researcher to consult the emails of the British writer Ian McEwan at the Harry Ransom Center in Texas. Two years later, a major grant allowed me to hire a project archivist at the John Rylands Library, University of Manchester. She prepared a selection of emails generated by the leading poetry publisher, Carcanet. This gave me unique access to a collection that is currently closed to users. In Britain and elsewhere, closing entire collections is often seen as a way to manage risks and avoid any problems with record creators/copyright holders. Many collections are hidden, and users are not even informed of the existence of digital records. For example, the archival emails of the writer Will Self at the British Library are not listed in the finding aid describing the collection, and they are not available to users either on-site or off-site. In a 2013 survey of special collections and archives in the United Kingdom and Ireland, Jackie M. Dooley et al. estimated that only 37% of born-digital materials are in online catalogs.2 And they added: “Management of born-digital archival materials remains in its infancy.”3 In Europe and the United Kingdom, the General Data Protection Regulation (GDPR) and the Data Protection Act 2018 have increased individuals' rights to control and even erase their data4—while also making clear that archives are here for the public good and that the rights of individuals have to be balanced against the collective interest. In other words, closing access to entire collections is a deeply conservative policy, which protects the archival repository and the creators of records—rather than enabling users.
In the past thirty years, the focus has largely been on the preservation of born-digital materials, rather than on access and usability. In the mid-1990s, the archival community started devising strategies to preserve endangered materials. In 2002, the Digital Preservation Coalition (DPC) was established as a partnership between several agencies operating in the United Kingdom and Ireland. In the late 2000s, collaborations between archivists and scholars resulted in the creation of open-source digital library tools for content curation, such as BitCurator. Following the work of the Email Task Force, the DPC report Preserving Email (2019) points out that preservation is no longer the challenge it once was.5 Email retention policies are necessary to comply with legislative requirements. In the United States, for example, the Sarbanes-Oxley Act of 2002 outlined requirements for information technology departments regarding electronic records. The focus has now moved from preservation to appraisal and selection of materials of lasting value. The RATOM project (Review, Appraisal and Triage of Mail) is developing natural language processing methods for the appraisal and processing of email archives.6 The UK National Archives is also in the process of using artificial intelligence to appraise and select government records for permanent preservation as the historical record.
What we now need is a user-centered approach to born-digital archives: more data, less process, and a more liberal attitude to risk. This article situates the issue of born-digital archives within a broader discussion on traditional archival processing, which often fails to take the needs of users into account. It is essential to find solutions to the problem of access, but of course, access is not an end in itself. Researchers need access to produce new knowledge.7 In the case of email archives, we need to design research methods that fully use the potential of big data—a process that remains unusual for many humanities scholars accustomed to close reading of texts.
From Big Paper Data to Big Digital Data: Toward a User-Centered Approach to Archival Practice
In 2005, Mark A. Greene and Dennis Meissner pushed for a revamping of traditional archival processing. Their influential article, “More Product, Less Process,” published in American Archivist, criticized “the heavy legacy of a profession rooted more in service to ‘the stuff' than in service to patrons, a profession that exalts the value of the physical item.”8 Archivists preferred to leave collections closed rather than give access to messy, uncataloged collections that could contain confidential materials. Yet, it was simply not possible to review large collections and remove all problematic records. “The idea of having to review collections (or even parts of collections) item by item to identify ‘sensitive' material is impractical, both because of the time it takes and because there is no agreement about whose sensitivity we measure against.”9 References to homosexuality, adultery, mental illness, and suicide might be sensitive for some donors and their families, but other donors would find some of these topics unproblematic. Since no universal definition of “sensitive” materials exists, access policies had to change, and “unprocessed collections should be presumed open to researchers.”10
The issues of sensitivity apply equally to paper archives and digital archives. Yet, Greene and Meissner make few references to the digital revolution. They discuss digitization in relation to large paper archives made accessible in digital form—but born-digital archives are not mentioned. This is hardly surprising. In 2005, archivists were still struggling with the problem of preserving electronic records. Making them more open was not a priority.
The consensus was that paper backlogs had to be sorted out first before turning to digital records. In 2010, Meissner and Greene wrote, “archivists should long since have been devoting considerable resources to administering born-digital records but have been stymied in part because of the still looming paper backlogs.” And, they added, “The sooner we dispatch those backlogs the sooner we can begin the essential task of wrestling with digital collections.”11 In 2009, a survey of reference archivists and processors in the United States revealed that 72% cited born-digital materials among the records held in their repository.12 However, when asked, “With which types of materials have you implemented MPLP?,” only 8% mentioned born-digital materials—well below “personal papers,” “corporate or business records,” and “institutional records.”13 Since the aim of MPLP is to make archives more open, the current closure of most born-digital archives shows that the situation has not changed much in the past ten years.
When will born-digital archives be made more accessible? Following the digital revolution, there is a danger of repeating the mistakes that were made in the twentieth century with large paper archives. The word “revolution” has two meanings: a revolution is a radical change, with huge consequences; but a revolution is also the movement of one celestial body around another—at the end of its circuit, the body comes back to its original position. I want to suggest that the digital revolution brings radical changes, but it also brings us back to a previous position.
On June 19, 1934, President Franklin Delano Roosevelt signed legislation creating the National Archives. Roosevelt believed that the archives should hold not only materials of lasting historical value, such as the Declaration of Independence, but also the operational records of the federal government. There was a big problem: the federal government was producing more and more paper records every year. Where could all those documents be kept? Roosevelt was aware that they would need space, a lot of space. Figure 1 shows the National Archives building under construction in downtown Washington, DC, in 1933. Three years later, in 1936, the Society of American Archivists was created—it was the first national professional association dedicated to the needs and interests of archives and archivists in North America.
At the start of the Second World War, the leadership of the Society of American Archivists started to worry: the government was producing a tremendous output of records related to the war. They decided that one of their most important tasks was to control this huge amount of records. Without control, information would be buried and impossible to find.14 The problem of big data is not a new problem. The digital revolution is a radical change, but it is also a return to an earlier situation: the moment when archivists were confronted with a world of big data.
In 1956, T. R. Schellenberg argued in Modern Archives: Principles and Techniques that all decisions on archives have to be made with users in mind. The task of the modern archivist is to preserve records “useful for research.”15 Since an avalanche of written evidence would not be useful to the historian of the future, archivists must pay particular attention to appraisal and selection.16 Following appraisal, the next step is to open up archives to users: “Since the purpose of an archival agency . . . is to make records available for use, an archivist normally favors a policy of free access.”17
Yet, it was not Schellenberg's model of laissez-faire that dominated archival practice in the second half of the twentieth century, but a much more restrictive approach. Schellenberg was well aware that cataloging at item level would be impractical for large contemporary collections. In Management of Archives, he writes that the archivist “should definitely forego the detailed description of individual record items until he has provided a comprehensive description of his holdings.”18 Ruth Bordin and Robert Warner support the same approach in The Modern Manuscript Library.19 But this user-centered approach competed against other forces. Archival repositories had become large bureaucratic organizations that relied on risk management to survive in the long term. Making “unsavory matters” public posed reputational and legal risks to an institution. It was far safer to champion a model of archival practice focused on item-level cataloging and the closure of problematic records. Backlogs and “dark” archives were the price to pay for this perfectionist model, a model that continues to influence archival practice to this day despite the pushback of prominent archivists such as F. Gerald Ham.
On October 3, 1974, Ham addressed the problem of big (paper) data in his presidential address at the thirty-eighth annual meeting of the Society of American Archivists. In this talk, entitled “The Archival Edge,” Ham noted that “with records increasing at an exponential rate, it is utopian to believe that society could ever afford the resources for us to preserve everything of possible value; for to do so would be irresponsible.”20 Instead of passively collecting large collections, the archivist should play a more active role in selecting materials of lasting value and make them available to researchers.
Six years later, Ham developed this idea further in his essay, “Archival Strategies for the Post-Custodial Era.” Published in 1981 in the American Archivist, the article predicts the impact of technological changes on the archival profession.21 Vast paper archives would soon be replaced with more manageable microfilms, microfiches, and reels of magnetic tape. In this postcustodial era, technology would facilitate access—including access from remote locations. He gave the example of the networked computer terminal: “By linking the computer to long-distance telephone to form an on-line, interactive telecommunications network, the archivist can deliver computerized records to any researcher with access to a terminal.”22 To make the most of these technological changes, archivists and their institutions had to address the issue of appraisal, selective preservation, and use.23
Ham offered an optimistic view of technology and archives in what he later called the “age of abundance.”24 But he also predicted that other obstacles could get in the way of easier access to archival records. “We must participate in resolving the conflict between the freedom of information and the right to privacy as they affect the quality and content of the archival record and access to that record,” he wrote.25 Four decades after his talk, it is striking to note that the tension between open access and privacy continues unresolved. Too many archives are uncataloged and hidden, despite the influence of the More Product, Less Process method.
Seventeen years after the publication of Greene and Meissner's article, what has been the real impact of MPLP? Early adopters included Texas Christian University, which applied minimum standards processing guidelines to the Jim Wright Papers, a large collection of congressional papers.26 The University of Montana at Missoula also applied MPLP to its backlog of unprocessed collections.27 Minimal processing was not a new thing, as Tom Hyry pointed out. At Yale University, folder-level arrangement, description, and preservation techniques were already employed before 2005. But Greene and Meissner gave new impetus to a set of isolated practices, and Christine Weideman wrote about the application of MPLP to a large collection of family papers in Yale's Manuscripts and Archives Department.28
By the end of the 2000s, MPLP had become a recognizable brand in a wide range of institutions—from small regional archives to Ivy League universities. Articles published from 2005 to 2009 focused mostly on physical/analog archives rather than on digitized or born-digital collections. MPLP supporters often commented on the democratic argument at the core of the method. In “Archives of the People, by the People, for the People,” Max J. Evans urged archival institutions to tackle the backlog problem and “to organize archival work in concert with a curious and interested public.”29 Matt Gorzalski made a similar argument in support of minimal processing. “Providing access for users” is the archivist's most important responsibility, a responsibility that has too often been neglected.30 In his article on moving image archives, Rick Prelinger noted the “divergence between our theoretical acceptance of access as a goal and the poor state of access that actually reigns.”31 For many users, it was much easier and more convenient to access archival films on YouTube and other websites than to book an appointment in an archival repository. The growing gap between users and archivists could lead to traditional archives becoming increasingly irrelevant in the digital age.
Not everyone agreed that MPLP was the solution to the backlog and access problems. Even early advocates of the method recognized the downsides of minimal processing. “We consciously put a heavier burden of discovery on the researcher, who must now plow through more materials to find documents,” Tom Hyry noted.32 Since well-cataloged archives make discoveries easier, it is advisable to continue item-level cataloging—at least for heavily used collections, such as the Kenneth Burke Papers at Pennsylvania State University, argued Jeannette Mercer Sabre and Susan Hamburger in a 2008 article.33 Likewise, Robert S. Cox pointed out that discoverability relies on “maximal processing.”34 Without detailed finding aids and their key words, collections risk becoming invisible to Google and other search engines.
More recently, critics of MPLP have noted that the method relies on a false opposition between preservation and access. While Greene and Meissner insist that too much time is spent on preservation-related tasks (such as refoldering and removing paper clips), others point out that well-preserved materials make long-term access possible. Laura McCann notes that preventive conservation should not be viewed as an obstacle, but as an opportunity for continued and sustainable access to archival collections.35 Jessica Phillips made a similar point in her 2015 article: “Our collections must be accessible, but they must also be sustainable.”36 Drawing on these criticisms, Kimberly Christen and Jane Anderson argue that the MPLP movement should be read in a context of growing managerial pressures to increase productivity. “Coupled with neoliberal paradigms emphasizing scale and disaggregation, the unquestioned value of ‘more product' was afforded more recognition.”37 To push back against neoliberal pressures, Christen and Anderson recommend a “slow archives” model that values collaboration and sustainability.38
Applying MPLP to Digital Archives
In the case of paper-to-digital projects, the use of MPLP accelerated in the 2010s, “whether conducted by Google Books or research libraries.”39 This has in turn raised criticisms on three main grounds: privacy, quality, and environmental concerns. Let's start with the privacy of confidential and sensitive information. In a 2018 article, Ellen LeClere wrote, “Digitisation work often relies on ‘More Product, Less Process' approaches, which limits archivists' ability to adequately meet their professional responsibility to maintain individual privacy, or contemplate how to maintain and protect ‘sensitive' information.” She argues that archivists often fail to identify and close sensitive materials because of the democratic pressure to provide open access. She gives the example of civil rights movement materials to show that large-scale digitization projects could lead to “privacy infringement and (re)traumatisation of victims.”40
A paper presented at the International Conference on Digital Preservation (IPRES 2019) echoes these privacy concerns in the Australian context. Drawing on the example of sensitive Aboriginal content, the authors argue that archivists should complement the existing legal framework with their own guidelines and protocols to avoid data breaches.41 A similar emphasis on individuals' right to privacy can be found in Michelle Moravec's article on feminist research practices and digital archives, which focuses more particularly on the British Library's digitization of the feminist magazine Spare Rib in 2013. “Have the individuals whose work appears in these materials consented to this?,” asks Moravec.42 These renewed concerns about privacy should be read in a larger context dominated by scandals involving illegal or unethical use of digital data (including the 2018 Cambridge Analytica scandal).43
In addition to privacy concerns, critics of MPLP have pointed out that poor quality mass digitization can lead to criticisms and even conspiracy theories. Consider the John F. Kennedy Assassination Records Collection at the National Archives and Records Administration (NARA). The collection includes more than five million pages of assassination-related records, photographs, motion pictures, sound recordings, and artifacts. Following the release of digitized documents in 2017, poor scan quality and lack of adequate searchability fueled conspiracy theories about tampering by the FBI, the CIA, NARA, or the government in general. For Yvonne Eadon, this “suspicion of mediated information” could have been avoided if archivists had been slower and more careful.44 In other words, mass digitization is not necessarily synonymous with openness and user friendliness. On the contrary, it can lead to accusations of secrecy and user dissatisfaction.45
The third category of criticisms relates to the environment. In a context of global warming, mass digitization inspired by the MPLP movement has been seen as wasteful and damaging for the planet. In a 2019 article, Keith L. Pendergrass et al. point out that digital preservation relies on technological infrastructure that consumes huge amounts of energy. To reduce their environmental impact, cultural organizations should consider “critically examining the justifications for mass digitization, implementing on-demand access strategies, adjusting storage technologies for access, and ensuring timely—but not necessarily immediate—delivery.”46 Although users demand rapid access to records, their needs should be balanced against other priorities, including long-term environmental sustainability.
Is MPLP on its way out, to be replaced with a slower archives movement more focused on privacy, quality, and environmental protection? Greene and Meissner have responded to some of these criticisms. For example, they argue that closing access to large archives containing potentially problematic materials is not a viable solution:
Our greater ethical vulnerability may come from withholding access. Any vulnerabilities resulting from exposing materials that are “sensitive,” but not contractually proscribed, pale in comparison.47
Archival institutions face growing pressures to show that they are relevant to users. In the postcustodial era described by F. Gerald Ham, the emphasis is on openness rather than secrecy. As Adrian Brown puts it in his review of Trevor Owens's The Theory and Craft of Digital Preservation, “enabling imperfect access now is better than waiting to deliver imagined perfection in the future” and “allowing users to do more for themselves will pay dividends.”48
Whatever the concerns, MPLP seems here to stay—in part because it fits well with a results-oriented, customer-satisfaction ideology of contemporary capitalist economies. Jeff Bezos, the founder of Amazon, explains that the success of his company is due to customer obsession. He divides companies into two categories, missionaries and mercenaries: “The missionary is building the product and building the service because they love the customer, because they love the product, because they love the service.”49 The mercenary is obsessed with competitors rather than customers. Likewise, archival repositories often fall on one side or the other of the “missionary” and “mercenary” models. Missionaries love the end users, they love the service, they want to make archives more open to researchers and members of the public. Mercenaries are obsessed with other competitors, with other institutions and what they are doing, rather than with end users.50
Perhaps it is time for the digital archiving community to shift the balance toward end users and to fully embrace MPLP. As Doug Reside puts it:
Over the last decade or so we've come to understand that “more product, less process” is a better approach for paper collections, but I still hear a lot of fretting about how we will process and serve born-digital collections if we, as library staff, don't know how to access or emulate the files ourselves. My feeling is that our role is simply to give the researchers what they need and get out of the way.51
This transition is already underway. Increasingly, MPLP is used not only for paper archives and digitized archives but also for research data and other born-digital archives.52 In a 2014 article, Cyndi Shein points out that the literature on born-digital stewardship has often come from well-funded institutions with high-profile humanities collections (such as the Salman Rushdie archive at Emory University). These privileged organizations can afford to implement expensive access models, such as emulation or item-level description. At Emory, for example, emulation provides access to Rushdie's records within the replicated original environment, and these records retain original appearance and full functionality. This hand-crafting approach is not suitable for smaller, underfunded institutions, Shein argues. “The staggering volume of born-digital materials on our doorsteps demands that we take measures to accelerate and automate processes related to their stewardship.”53
For a huge amount of data, MPLP offers the promise of putting the materials in the hands of users quickly, at a lesser cost than an emulated environment or granular cataloging. In her article, “Expedited Digital Appraisal for Regular Archivists,” Susanne Belovari mentions the case of one colleague who had processed 57,061 digital files (18 GB) at the individual file level, a task that had taken her three to four hours a day for eight and a half months. “Such in-depth, item-level appraisal was clearly not an option for expedited processing,” Belovari notes.54 But, as a precondition for expedited processing, archivists must be willing to accept the risk of accidentally disclosing sensitive or confidential materials to researchers. In the US context, “this might require reference archivists to talk to researchers about PII [Personally Identifiable Information] and instructing them to inform the archives should they find such material,” as Ben Goldman and Timothy D. Pyatt point out. Researcher registration forms could also include language “that define the researcher's responsibility in protecting PII.”55 It is better to open up archives and to treat researchers as responsible adults, rather than as children or potential criminals who cannot be trusted with confidential and sensitive data.
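The kind of expedited triage this implies can be partially automated. As a minimal illustrative sketch (not any repository's actual workflow, and with deliberately simple, hypothetical patterns), a script could flag files containing strings that resemble common categories of US PII, leaving the final judgment to the archivist:

```python
import re
from pathlib import Path

# Illustrative patterns for two common categories of US PII.
# Real screening tools use far more robust detection than these regexes.
PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_pii(text):
    """Return the sorted names of PII categories detected in a text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))

def triage(folder):
    """Map each .txt file in a folder to the PII categories it may contain."""
    return {path.name: flag_pii(path.read_text(errors="ignore"))
            for path in Path(folder).glob("*.txt")}
```

A triage report like this does not replace review; it only prioritizes which files deserve a closer look before release.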
Doing Research in Born-Digital Collections
As a literary scholar specializing in publishing history, I am very familiar with “difficult” archival collections. For my second book, I spent a lot of time trying to get access to publishers' archives at the University of Reading. Many of these collections are on deposit, and researchers need to obtain permissions from Random House UK before even consulting documents. This seemed surprising to me. For my doctoral thesis (which became my first book), I had spent many months working in the Random House archives at Columbia University in New York City. Random House US does not require any permissions to look at archival documents, stored on-site or off-site. When researchers are ready to publish, the same relaxed attitude applies. For short quotations from unpublished archival materials, US archival repositories will often tell you that the fair use provision of copyright law makes it unnecessary to obtain permissions from copyright holders.
On both sides of the Atlantic, getting access to emails is much more complicated than getting access to letters. As we have seen, the United Kingdom and the United States have completely different approaches to privacy. American archivists will bring you a bunch of records and assume that if you find anything sensitive, you will refrain from publishing it without permission. And yet, getting access to email archives in the United States is still not easy. Even when an institution wants to share digital files, it cannot put everything online for copyright reasons. Researchers still need to travel to the archival repository to consult documents. And few institutions have solved all the technical issues specific to digital archives, including by designing an appropriate interface to make these documents available to researchers.
Ian McEwan Email Archives
In 2017, I traveled to Austin, Texas, to do some archival work in the collection of the British writer Ian McEwan. When McEwan sold his archives to the Harry Ransom Center, he included seventeen years of emails, from 1997 to 2014. At the time of my visit, not many people knew about these emails; they were not listed in the finding aid that describes the collection. Since then, the finding aid has been updated with a brief mention of McEwan's email correspondence, which “has not been processed and is not available to researchers at this time.”56 What the finding aid tells us is that “email printouts” are available to researchers. These printouts represent a small selection of McEwan's 80,000 emails. Ironically, literary archives still rely on print at a time when most records are born digital.
Although McEwan's email archive is normally closed, I was able to get access to selected messages. I sent a list of key words to the archivist to try to find materials relevant to my research, a history of creative writing programs (McEwan did a master's degree at the University of East Anglia in 1970 and is often presented as the first student of creative writing in the United Kingdom). The archivist used a digital forensics tool called Autopsy to conduct key-word searches and tag the relevant messages. She then exported a copy to the laptop used to provide access to electronic files in the reading room.
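This kind of key-word filtering can be approximated with standard tools. As a hedged sketch (Autopsy itself is a graphical forensics suite, and this is not the Ransom Center's actual workflow), a small script using Python's standard mailbox module could pull out the messages matching a researcher's key words from an exported mailbox:

```python
import mailbox

def search_mbox(path, keywords):
    """Return (subject, date) pairs for messages whose subject or
    plain-text body contains any of the given key words (case-insensitive)."""
    hits = []
    for msg in mailbox.mbox(path):
        # Gather the subject line and all text/plain parts for searching.
        text = msg.get("Subject") or ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                text += "\n" + payload.decode("utf-8", errors="replace")
        lowered = text.lower()
        if any(kw.lower() in lowered for kw in keywords):
            hits.append((msg.get("Subject"), msg.get("Date")))
    return hits
```

The resulting list of subjects and dates could then guide which messages are copied to a reading-room machine, much as the tagged export described above.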
After spending some time reading McEwan's emails, I sent some feedback to the Harry Ransom Center. On the plus side, few archival repositories have made emails available to researchers, so I was grateful for having been granted access. It was useful to view the email attachments. And the collection is really fascinating, particularly when McEwan discusses negotiations to sell his archives. As Amy Hildreth Chen notes in Placing Papers: The American Literary Archives Market, “scholars write literary history by consulting primary sources found in literary archives, but they rarely consider how these papers became accessible.”57
The fact that McEwan's collection is now in Texas was not inevitable. In a 1998 letter, Jon Cook, a professor at the University of East Anglia and a long-time friend of McEwan, wrote about plans to establish an archives of contemporary writing at UEA.58 Cook hoped that McEwan would be interested in giving his papers to the university. McEwan's email correspondence shows that until at least 2009, he toyed with the idea of transferring the archive to a British repository: either UEA, the Bodleian Library, or the British Library. Another, more lucrative option was to sell it to a US cultural organization. The archive had been valued at around $2 million, well above what UK institutions could afford. In a 2013 email, Professor Christopher Bigsby informed McEwan that UEA lacked the funds for this kind of purchase: "I'm afraid the obvious place is America and the Harry Ransom Center at the University of Texas which already has Winnie-the-Pooh."59 Fortunately for UEA, Doris Lessing had agreed to give it her papers, including diaries. UEA also decided to develop a new model that enables writers to store their emerging archives on a temporary basis. The British Archive for Contemporary Writing, as it is called, is a direct response to the loss of writers' archives to US institutions.60
Exploring McEwan's email correspondence was a fascinating task, but not an easy one. There were many duplicates, and the organization was not clear to me: emails were sometimes listed in chronological order, but sometimes not. Moreover, it was difficult to understand the context. I would find McEwan's response to a query, and then, after clicking through dozens of other emails, I would discover the original question. I was not sure if the emails I was reading on the screen were originally in the Inbox, in the Sent folder, or in another folder. Some context/metadata would have been helpful.
I made three suggestions to the Harry Ransom Center. First, it would be useful to have a finding aid for the email collection and to link it to the paper part of the collection. For example, John Webb's emails to McEwan are often very long and resemble letters; these could be linked to his paper correspondence. Second, it would be a good idea to provide the option to rank emails by the number of "hits" for the key words. For example, an email that contains several of my key words (such as "UEA," "Bradbury," or "creative writing") would appear at the top of the results. Third, I suggested the deletion of some commercial emails, since they do not present obvious scholarly interest.
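The second suggestion, ranking emails by keyword hits, is straightforward to implement. A minimal sketch in Python, using invented sample messages (a real system would also weight hits in subject lines, deduplicate threads, and so on):

```python
# Rank messages by the total number of keyword hits.
# The key words and sample messages are invented for illustration.
def keyword_hits(text, keywords):
    """Count how many times any key word occurs in the text."""
    text = text.lower()
    return sum(text.count(kw) for kw in keywords)

def rank_by_hits(messages, keywords):
    """Sort messages so those with the most keyword hits come first."""
    return sorted(
        messages,
        key=lambda m: keyword_hits(m["subject"] + " " + m["body"], keywords),
        reverse=True,
    )

keywords = ["uea", "bradbury", "creative writing"]
messages = [
    {"subject": "Dinner", "body": "See you Friday."},
    {"subject": "UEA visit",
     "body": "Bradbury founded the creative writing MA at UEA."},
]
ranked = rank_by_hits(messages, keywords)  # the UEA email ranks first
```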
These suggestions are not incompatible with an MPLP approach, since many of these tasks could be automated thanks to artificial intelligence/machine learning.61 For example, Gmail has long used AI and rule-based filters to identify spam and other unwanted commercial emails. While rule-based filters can block the most obvious spam, AI looks for new patterns that suggest an email is not to be trusted. Algorithms examine a huge number of metrics, such as the formatting of an email or the time of day it was sent. The technology is well developed and could be deployed to identify commercial emails of little value in email archival collections. But AI can also help us distinguish between sensitive and nonsensitive data to make digital archives more accessible. In January 2020, I organized a conference on "Archives, Access and AI" in London. Rebecca Oliva presented her research on automating sensitivity review at the National Library of Scotland. AI can help archivists spot patterns and identify context-dependent sensitive data. The technology has the potential to unlock archives that are currently hidden. But to make full use of the technology in this "age of abundance," archival repositories will need to embrace a mindset that values experimentation and agility—far from the risk-averse attitude of the "custodial era."
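The rule-based layer of such filtering is simple enough to sketch. The markers and sample messages below are invented, and a production system (like Gmail's, which is proprietary) would combine many more signals with machine learning rather than rely on fixed rules:

```python
# Minimal rule-based sketch of commercial-email detection.
# The markers and sample messages are invented for illustration;
# real filters combine many more signals with ML models.
COMMERCIAL_MARKERS = ("unsubscribe", "special offer", "% off", "no-reply@")

def looks_commercial(msg):
    """Flag a message if any commercial marker appears in its text."""
    text = (msg["sender"] + " " + msg["subject"] + " " + msg["body"]).lower()
    return any(marker in text for marker in COMMERCIAL_MARKERS)

inbox = [
    {"sender": "no-reply@shop.example", "subject": "50% off everything",
     "body": "Click to unsubscribe."},
    {"sender": "jon.cook@uea.example", "subject": "Your archive",
     "body": "Some thoughts on the UEA proposal."},
]
keep = [m for m in inbox if not looks_commercial(m)]  # personal mail only
```

In an archival setting, flagged emails would be reviewed or deprioritized rather than deleted outright, to avoid losing material of unexpected scholarly value.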
Carcanet Press Archives
The Carcanet Press archives at the John Rylands Library (JRL), Manchester, is a perfect example of traditional archival processing, valuing item-level cataloging despite the enormous size of its collection. The JRL purchases the archives from Carcanet Press on an ongoing basis. The first accession came in 1978, and accessions of new materials now come to the library on an annual basis. The largest part of the Carcanet paper collection (Accessions 3 to 24) is currently uncataloged and closed to researchers. Exceptions are sometimes made, but the process is long and complicated. I had to jump through several hoops to get access to this dark archives, including obtaining approval from the Ethics Committee at my institution. Access to the email part of the collection is even more difficult. Without external funding that allowed me to employ an archivist based at the JRL, I would not have been able to consult email records that are essential to my research. In short, the whole process is designed to discourage the consultation of materials that are potentially sensitive and embarrassing for the founder of Carcanet Press and his correspondents. With sensitivity comes risk, and it is easier to restrict access to materials than to risk annoying individuals named in the collection.
The archivist employed as part of the project was based at the John Rylands Library, and she prepared a selection of two hundred emails generated by Carcanet Press during a single year (2010). Following the economic crisis of 2008, funding cuts had an impact on many cultural organizations funded by Arts Council England. This created uncertainties for Carcanet Press and other independent publishers, including Bloodaxe Books and Peepal Tree Press. The selection of emails shows frequent correspondence between Michael Schmidt (the founder and managing director of Carcanet) and these independent presses. I used the open-source software Gephi to create network visualizations (see Figure 2). The color of each node represents gender, and its size is proportionate to email exchanges. Unsurprisingly, Schmidt is at the center of the network. Frequent correspondents include a network of older male publishers: Schmidt (born 1947); Jeremy Poynting (born 1946), the founder of Peepal Tree Press; and Neil Astley (born 1953), who founded Bloodaxe in 1982 with Simon Thirsk.
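Gephi is a graphical tool, but the data behind such a visualization is just a weighted, undirected network. A plain-Python sketch of its structure, with invented email counts (the real figures come from the 2010 selection and are not reproduced here):

```python
from collections import defaultdict

# Sketch of the data behind a correspondence-network visualization.
# Edges are (correspondent A, correspondent B, number of emails);
# the counts below are invented for illustration.
edges = [
    ("Schmidt", "Poynting", 12),
    ("Schmidt", "Astley", 9),
    ("Schmidt", "Willson", 7),
    ("Schmidt", "Feinstein", 5),
    ("Poynting", "Astley", 3),
]
# Node attribute used for coloring in the visualization.
gender = {"Schmidt": "m", "Poynting": "m", "Astley": "m",
          "Willson": "f", "Feinstein": "f"}

# Node size is proportionate to total email exchanges (weighted degree).
size = defaultdict(int)
for a, b, n in edges:
    size[a] += n
    size[b] += n

central = max(size, key=size.get)  # the node at the center of the network
```

A tool like Gephi reads this kind of edge list (e.g., as CSV or GEXF) and computes layout, color, and size from the same attributes.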
Leadership and management responsibilities are often in the hands of men, while women occupy less prominent roles in the publishing industry. Judith Willson, an editor and poet, frequently corresponded with Schmidt. Poets such as Elaine Feinstein are other frequent correspondents. But neither Willson nor Feinstein owned the means of production, and they relied on Schmidt's press to bring their poetry to market. In her memoirs published in 2008, Feinstein points out that the increasing number of women poets is not correlated with growing power and influence: “What worries me a little is that society tends to downgrade the importance of areas where women predominate.”62
It would be simplistic to present women as marginalized and men as literary insiders. Carcanet, Bloodaxe Books, and Peepal Tree Press are based in Manchester, Newcastle, and Leeds respectively—far from the traditional literary centers of the South East. The founders of these presses often insist on their marginalized geographical position and on their independence from power brokers in London. However, an analysis of 161 tweets with the hashtag #Carcanet50 (celebrating the fiftieth anniversary of Carcanet in 2019) shows close links between London-based and Manchester-based literary figures and institutions. Drawing on Carcanet's Twitter profile and other publicly available data, I added a location associated with each of the eighty-three accounts that used #Carcanet50, up to December 2019. I then created network visualizations using Gephi to identify possible geographical clusters (see Figure 3). The resulting graph shows that tweets came mostly from accounts based in London (28%; pink) and Manchester (27%; green), followed by Dublin (5%; blue), Norwich (4%; black), and Nottingham (4%; orange). While the Irish accounts form a separate cluster, the accounts based in London and Manchester are closely interconnected. The #Carcanet50 data set shows no separate Manchester literary scene, but rather an interconnection between center and periphery.
The comparison between the 2010 email selection and the 2019 #Carcanet50 data set shows the potential of linked open data for research. Archival emails could be enriched with additional sources, including Twitter accounts. For example, Alison Brackenbury and Helen Tookey tweeted using the #Carcanet50 hashtag, and they also corresponded with Michael Schmidt via email. Adding linked data to archival sources can be automated via machine learning (ML). In his report on ML applied to libraries, Ryan Cordell notes, “the potential for ML to help identify linked data across collections or even institutions, in particular for automatically mapping metadata. . . . Organizations such as the Digital Public Library of America and Europeana have undertaken enormous projects linking metadata across collections, but doing so requires significant handwork, and consortial efforts are limited.”63 In contrast to these large organizations, small, local repositories find it difficult to link data.
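At its simplest, linking an email data set to a social media data set means matching correspondents to accounts. A naive sketch, with invented handles and a deliberately crude matching rule (real entity linking, of the kind DPLA and Europeana undertake at scale, must handle variant spellings, pseudonyms, and ambiguity):

```python
# Naive sketch of linking email correspondents to Twitter accounts
# by normalized name matching. The handles below are invented;
# real record linkage needs far more robust disambiguation.
def normalize(name):
    """Lowercase a name and strip everything but letters."""
    return "".join(ch for ch in name.lower() if ch.isalpha())

email_senders = ["Alison Brackenbury", "Helen Tookey", "Jane Doe"]
twitter_accounts = {
    "@ABrackenbury": "Alison Brackenbury",
    "@helentookey": "Helen Tookey",
}

links = {}
for handle, display_name in twitter_accounts.items():
    for sender in email_senders:
        if normalize(sender) == normalize(display_name):
            links[sender] = handle
# "Jane Doe" remains unlinked: no matching account.
```

Machine learning enters when exact matching fails: models can score candidate matches using context (shared correspondents, dates, locations) rather than names alone.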
MPLP with Artificial Intelligence/Machine Learning
Large data sets are necessary to apply artificial intelligence/machine learning to collections. Yet, when researchers have access to born-digital data, this usually takes the form of small data sets—up to two hundred emails in my experience with the Ian McEwan and Carcanet collections. Traditional archiving processes rely on the unrealistic expectation that records should be carefully scrutinized before being made accessible. The reliance on item-level cataloging was already problematic in the paper era, since it led to cataloging backlogs and hidden collections. It is even more problematic in the digital era. Reviewing each archival email before making it available to researchers is not sustainable. An MPLP approach is urgently needed to make digital archives more accessible to researchers and to produce new knowledge. How can archivists apply MPLP to digital collections? What is the best way to produce more with less process? AI has a major role to play in unlocking these collections, but AI can also be used as a research method to analyze huge amounts of data.64
However, the ethical implications deriving from the use of AI, especially concerning bias and fairness, should not be dismissed. While AI-based solutions work well with the majority of cases in a number of popular tasks, from text classification to image recognition, they often fail for cases not sufficiently supported by statistical evidence (such as ethnic minorities and underrepresented groups). Whereas these cases may be numerically marginal, their impact on people's lives (discrimination, marginalization, or misrepresentation of groups) is notable. Implementing ethical AI is therefore essential, as Thomas Padilla points out.65
Like archivists, researchers, particularly humanities scholars like myself, need to rethink their methodologies in the digital age. Collaborative work with data scientists is one of the recommendations of a recent white paper from the Alan Turing Institute.66 Research at the intersection of humanities and data science can take different forms, including computational humanities research, which aims at creating and/or analyzing digitized and born-digital data sets to answer humanities research questions. But to apply computational methods to data sets, an infrastructure needs to be in place. “Infrastructure for cultural heritage” aims at “creating, storing and providing access to repositories of complex and nuanced digital (structured and unstructured) data from GLAM organizations for their use in research, as well as investigating the question of availability of digital resources as data, and accounting for biases and uncertainty in them.”67
Current work on infrastructure for cultural heritage is exciting but also limited in scope. Consider Living with Machines, a major research project at the British Library and the Alan Turing Institute, which reexamines the well-known history of the Industrial Revolution using data-driven approaches. Data sets from the nineteenth century (for example, digitized newspaper articles) do not present any issues with copyright and privacy, which are at the center of the problem of locked born-digital data. The Turing white paper also mentions the HathiTrust Research Center among the initiatives to answer humanities-related research questions through computational methods. The HathiTrust is a not-for-profit collaborative of academic and research libraries preserving seventeen-plus million digitized items (including about 61% not in the public domain). Copyright-protected texts are not available for download, which is inevitable given current copyright laws, but severely restricts access. The current infrastructure for cultural heritage runs into huge obstacles and often fails to make cultural assets accessible.
In his Library of Congress report, Ryan Cordell emphasizes the centrality of access to data to implement ML in libraries: “Data is more important than a bespoke tool because the latter constitutes a walled garden—potentially interesting, but limited—while a single machine-actionable dataset can spark many experiments, visualizations, interpretations, and arguments, both within the library and from outside researchers.”68 In other words, libraries and other cultural institutions should focus on providing access to data rather than on building expensive tools with limited potentialities. Cordell also suggests the “development of ‘data consortia' for ML research, following the models of shared resources such as OCLC or HathiTrust in the library community.”69
While I agree with many of Cordell's suggestions, his focus is mostly on digitized rather than born-digital materials. As a scholar of nineteenth-century literature and print culture, he encourages libraries to provide access to more digitized data. He thus writes, “our current digitized collections, while certainly large, comprise only a small subset of the analog collections held by libraries and other cultural heritage institutions.”70 Email archives are not mentioned in the ninety-seven-page report, despite their centrality for historians, literary scholars, and other humanists. Cordell seems aware of this limitation when he writes, “most current examples rely on out-of-copyright source materials, meaning the unique challenges raised by twentieth- and twenty-first-century materials remain unsolved.”71 But he gives few details about these challenges, which include copyright but also privacy. Indeed, the report uses the term “privacy” alongside “ethics,” as positive values that libraries should preserve in the wake of the tech giants' lack of concern for data protection. “By centering ethics, transparency, diversity, privacy and inclusion, libraries can take a leadership role in one of the central cultural debates of the twenty-first century,” he notes.72
This line of argument that equates privacy and ethics is not entirely satisfying. Of course, nobody would argue that libraries should disregard legitimate concerns for privacy and data protection (let alone ignore legal frameworks such as the GDPR in Europe). But privacy concerns are often used as excuses not to give access to archival materials. Is it ethical to withhold access on the grounds that materials may be confidential? Or is it more ethical to give access to data that may be sensitive but not contractually proscribed?
Another problem is the contradiction between Cordell's support for the slow movement on the one hand and his appeal for libraries to quickly provide data to users on the other hand. Indeed, Cordell strongly criticizes the Silicon Valley ideology epitomized by Facebook's former motto, Move fast and break things.73 He insists that libraries and scholars should focus on building, not breaking. They should move slowly and deliberately, turning their backs on the unethical practices of tech giants. At the same time, libraries should “not wait for the data to be perfect, but instead present it as a pilot or prototype, learn from users, and refine from there.”74 This approach is inspired by Design Thinking, a human-centered method of solving business and social problems.75 To visualize and evaluate ideas, design thinkers create prototypes. These sketches and models make ideas tangible. The objective is not to produce something perfect; prototypes are “quick and dirty.” They allow designers to share their ideas with others and to obtain rapid feedback. The next phase is to test the model with users and refine it.
Conclusion
How can libraries and archives value slowness while also quickly providing access to data? How can they act “slowly and deliberately” while also responding to urgent needs for accessible cultural assets? I doubt that there are easy answers to these questions. Like Mark Greene and Dennis Meissner, I am convinced that the lack of access is a huge ethical vulnerability for libraries and archives. Applying the More Product, Less Process approach to born-digital materials is the best way to rapidly open up cultural data. Like Design Thinking, MPLP has been accused of being a neoliberal tool that breaks things. Rapid destruction is at the heart of capitalism, as Karl Marx and Friedrich Engels showed in the manifesto of the Communist Party: “All fixed, fast-frozen relations, with their train of ancient and venerable prejudices and opinions, are swept away; all new-formed ones become antiquated before they can ossify.”76 At first sight, slowness might be an attractive option for cultural institutions, leaving enough time for technical debates to play out and standards to emerge. But in practice, a slow speed leaves researchers and other users waiting in frustration, crying out for access to data held by gatekeepers.
Archives are meant to be used, not locked away. To unlock cultural assets, we need to work across disciplines and harness the latest technology. Access to digital archives is essential, but we also need to anticipate the moment when born-digital records will be more accessible. To make sense of this mass of data, new methodologies are urgently needed, combining traditional methods in the humanities with data-rich approaches. Collaborations between humanities scholars, computer scientists, archivists, and other stakeholders are therefore essential to make archives more accessible but also to design new methodologies to analyze huge amounts of data. AEOLIAN—a US/UK network on AI applied to cultural organizations—is part of a new wave of initiatives designed to foster such collaborations across disciplinary divides.77 The concept of “computational archival science” is also renewing the field of archival studies by bringing insights from computer science.78 While AI-powered tools and processes, such as sensitivity review, offer the hope to make more email and born-digital archives accessible,79 human oversight will remain essential to provide more data in an ethical way. Combining technology and human intervention to foster ethical accessibility to digital archives provides directions for future research.
Notes
For example, the Library of Congress has more than twenty petabytes of digital materials across its varied collections. (Digital Preservation Coalition discussion list, December 10, 2019.)
Jackie M. Dooley et al., Survey of Special Collections and Archives in the United Kingdom and Ireland (Dublin, Ohio: OCLC Research, 2013), 58, http://www.oclc.org/resources/research/publications/library/2013/2013-01.pdf, captured at https://perma.cc/6AB3-LJ3P.
Dooley et al., Survey of Special Collections and Archives in the United Kingdom and Ireland, 15.
For more on the application of the GDPR to European archives, see European Archives Group, “Guidance on Data Protection for Archive Services. EAG Guidelines on the Implementation of the General Data Protection Regulation in the Archive Sector” (October 2018), https://ec.europa.eu/info/files/guidance-data-protection-archive-services_en, captured at https://perma.cc/F9TQ-B44S. For a legal perspective, see Rónán Kennedy, “Data Protection and Archives,” Aura Network (November 2020), https://www.aura-network.net/2020/12/21/workshop-1-ronan-kennedy-data-protection-and-archives.
Christopher J. Prom, “Preserving Email,” 2nd ed. (Digital Preservation Coalition, 2019), http://doi.org/10.7207/twr19-01.
“Review, Appraisal, and Triage of Mail,” Ratom, https://ratom.web.unc.edu.
For a digital humanities perspective on born-digital archives, see Matthew Kirschenbaum, “The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary,” Digital Humanities Quarterly 7, no. 1 (2013), http://www.digitalhumanities.org/dhq/vol/7/1/000151/000151.html; Adam Nix and Stephanie Decker, “Using Digital Sources: The Future of Business History?,” Business History (April 22, 2021), https://doi.org/10.1080/00076791.2021.1909572; Lise Jaillant, “How Can We Make Born-digital and Digitised Archives More Accessible? Identifying Obstacles and Solutions,” Archival Science 22 (2022), https://doi.org/10.1007/s10502-022-09390-7. See also Kirschenbaum's collaborative work with archives professionals: Matthew Kirschenbaum, Richard Ovenden, and Gabriela Redwine, “Digital Forensics and Born-Digital Content in Cultural Heritage Collections” (CLIR, 2010), https://www.clir.org/pubs/reports/pub149, captured at https://perma.cc/Q6CJ-VLZ3; Matthew Kirschenbaum et al., “Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use” (National Endowment for the Humanities Office of Digital Humanities, May 2009), http://drum.lib.umd.edu/handle/1903/9787.
Mark Greene and Dennis Meissner, “More Product, Less Process: Revamping Traditional Archival Processing,” American Archivist 68, no. 2 (2005): 234, https://doi.org/10.17723/aarc.68.2.c741823776k65863.
Greene and Meissner, “More Product, Less Process,” 252, n124.
Greene and Meissner, “More Product, Less Process,” 252.
Dennis Meissner and Mark A. Greene, “More Application While Less Appreciation: The Adopters and Antagonists of MPLP,” Journal of Archival Organization 8, nos. 3–4 (2010): 218, https://doi.org/10.1080/15332748.2010.554069.
Stephanie H. Crowe and Karen Spilman, “MPLP @ 5: More Access, Less Backlog?,” Journal of Archival Organization 8, no. 2 (2010): 125, https://doi.org/10.1080/15332748.2010.518079.
Crowe and Spilman, “MPLP @ 5,” 117.
James Worsham, “Our Story,” Prologue Magazine 41, no. 2 (2009), https://www.archives.gov/publications/prologue/2009/summer/history.html, captured at https://perma.cc/7JNR-B33K.
Theodore R. Schellenberg, Modern Archives: Principles and Techniques (Chicago: University of Chicago Press, 1956), 31.
Schellenberg, Modern Archives, 152.
Schellenberg, Modern Archives, 226.
Theodore R. Schellenberg, The Management of Archives (New York: Columbia University Press, 1965), 111–12.
Ruth B. Bordin and Robert M. Warner, The Modern Manuscript Library (New York: Scarecrow Press, 1966).
F. Gerald Ham, “The Archival Edge,” American Archivist 38, no. 1 (1975): 9, https://doi.org/10.17723/aarc.38.1.7400r86481128424.
F. Gerald Ham, “Archival Strategies for the Post-Custodial Era,” American Archivist 44, no. 3 (1981): 207, https://doi.org/10.17723/aarc.44.3.6228121p01m8k376.
Ham, “Archival Strategies for the Post-Custodial Era,” 208.
Ham, “Archival Strategies for the Post-Custodial Era,” 211.
F. Gerald Ham, “Archival Choices: Managing the Historical Record in an Age of Abundance,” American Archivist 47, no. 1 (1984): 11–22, https://doi.org/10.17723/aarc.47.1.v382727652114521.
Ham, “Archival Strategies for the Post-Custodial Era,” 211.
Michael Strom, “Texas-Sized Progress: Applying Minimum-Standards Processing Guidelines to the Jim Wright Papers,” Archival Issues 29, no. 2 (2005): 105–12, https://www.jstor.org/stable/41102105.
Donna McCrea, “Getting More for Less: Testing a New Processing Model at the University of Montana,” American Archivist 69, no. 2 (2006): 284–90, https://doi.org/10.17723/aarc.69.2.f26251l316w02841.
Christine Weideman, “Accessioning as Processing,” American Archivist 69, no. 2 (2006): 274–83, https://doi.org/10.17723/aarc.69.2.g270566u745j3815.
Max J. Evans, “Archives of the People, by the People, for the People,” American Archivist 70, no. 2 (2007): 387, https://doi.org/10.17723/aarc.70.2.d157t6667g54536g.
Matt Gorzalski, “Minimal Processing: Its Context and Influence in the Archival Community,” Journal of Archival Organization 6, no. 3 (2008): 193, https://doi.org/10.1080/15332740802421915.
Rick Prelinger, “Points of Origin: Discovering Ourselves through Access,” The Moving Image: The Journal of the Association of Moving Image Archivists 9, no. 2 (2009): 164, https://www.jstor.org/stable/41164594. For an example of MPLP applied to audiovisual records and other documents, see Jeremy Mohr, “An Evaluation of More Product Less Process (MPLP) Processing Methods at the Provincial Archives of Saskatchewan” (master's thesis, University of Victoria, 2016), https://dspace.library.uvic.ca/bitstream/handle/1828/7715/Mohr_Jeremy_MPA_2016.pdf?sequence=1, captured at https://perma.cc/ZS43-FMML.
Tom Hyry, “Reassessing Backlogs,” Library Journal 132 (2007): 8–9.
Jeannette Mercer Sabre and Susan Hamburger, “A Case for Item-Level Indexing: The Kenneth Burke Papers at The Pennsylvania State University,” Journal of Archival Organization 6, nos. 1–2 (2008): 24–46, https://doi.org/10.1080/15332740802234771.
Robert S. Cox, “Maximal Processing, or, Archivist on a Pale Horse,” Journal of Archival Organization 8, no. 2 (2010): 134, https://doi.org/10.1080/15332748.2010.526086.
Laura McCann, “Preservation as Obstacle or Opportunity? Rethinking the Preservation-Access Model in the Age of MPLP,” Journal of Archival Organization 11, nos. 1–2 (2013): 23–48, https://doi.org/10.1080/15332748.2013.871972.
Jessica Phillips, “A Defense of Preservation in the Age of MPLP,” American Archivist 78, no. 2 (2015): 473, https://doi.org/10.17723/0360-9081.78.2.470.
Kimberly Christen and Jane Anderson, “Toward Slow Archives,” Archival Science 19, no. 2 (2019): 110, https://doi.org/10.1007/s10502-019-09307-x.
Christen and Anderson, “Toward Slow Archives,” 87.
Stefania Forlini, Uta Hinrichs, and John Brosz, “Mining the Material Archive: Balancing Sensate Experience and Sense-Making in Digitized Print Collections,” Open Library of Humanities Journal 4, no. 2 (2018): 35, https://doi.org/10.16995/olh.282.
Ellen LeClere, “Breaking Rules for Good? How Archivists Manage Privacy in Large-Scale Digitisation Projects,” Archives and Manuscripts 46, no. 3 (2018): 303, https://doi.org/10.1080/01576895.2018.1547653.
Timothy Robert Hart, Denise de Vries, and Carl Mooney, “Australian Law Implications on Digital Preservation,” IPRES 2019, 16th International Conference on Digital Preservation, Amsterdam, https://doi.org/10.17605/OSF.IO/EZ6FQ.
Michelle Moravec, “Feminist Research Practices and Digital Archives,” Australian Feminist Studies 32, nos. 91–92 (2017): 186, https://doi.org/10.1080/08164649.2017.1357006.
See Allison J. Brown, “‘Should I Stay or Should I Leave?': Exploring (Dis)Continued Facebook Use After the Cambridge Analytica Scandal,” Social Media + Society 6, no. 1 (2020), https://doi.org/10.1177/2056305120913884.
Yvonne Eadon, “‘Useful Information Turned into Something Useless': Archival Silences, Imagined Records, and Suspicion of Mediated Information in the JFK Assassination Collection,” InterActions: UCLA Journal of Education and Information Studies 15, no. 2 (2019), https://escholarship.org/uc/item/7pv1s9p7.
For more on the issue of discoverability, see Nadia Nasr, “More Product, Less Process: Adequate Metadata?,” 2018, https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1194&context=library; Grace Therrell, “More Product, More Process: Metadata in Digital Image Collections,” Digital Library Perspectives 35, no. 1 (2018): 2–14, https://doi.org/10.1108/DLP-06-2018-0018.
Keith Pendergrass et al., “Toward Environmentally Sustainable Digital Preservation,” American Archivist 82, no. 1 (2019): 192, https://doi.org/10.17723/0360-9081-82.1.165.
Meissner and Greene, “More Application While Less Appreciation,” 204.
Adrian Brown, review of The Theory and Craft of Digital Preservation, by Trevor Owens, Archives and Records 40, no. 3 (2019): 314, https://doi.org/10.1080/23257962.2019.1664434.
Richard Feloni, “Jeff Bezos Shares His Best Advice to Entrepreneurs,” Business Insider, February 11, 2015, https://www.businessinsider.com/jeff-bezos-best-advice-to-entrepreneurs-2015-2, captured at https://perma.cc/T9P5-M3HN.
Of course, this is a simplified model. No institution is a perfect “mercenary” uninterested in end users. Research needs drive competitive collecting, and the acquisition of sought-after archival collections benefits users if these records are made rapidly available. However, institutions often rely on perceived research needs rather than on actual data from users.
Cited in Thomas Padilla, “Humanities Data in the Library: Integrity, Form, Access,” D-Lib Magazine 22, nos. 3–4 (2016), https://doi.org/10.1045/march2016-padilla.
See Sophia Lafferty-Hess and Thu-Mai Christian, “More Data, Less Process? The Applicability of MPLP to Research Data,” IASSIST Quarterly 40, no. 4 (2017): 6–13, https://doi.org/10.29173/iq907.
Cyndi Shein, “From Accession to Access: A Born-Digital Materials Case Study,” Journal of Western Archives 5, no. 1 (2014), https://doi.org/10.26077/b3e2-d205.
Susanne Belovari, “Expedited Digital Appraisal for Regular Archivists: An MPLP-Type Approach,” Journal of Archival Organization 14, nos. 1–2 (2017): 58, https://doi.org/10.1080/15332748.2018.1503014; see also Susanne Belovari, “Expedited Digital Appraisal for Regular Archivists: An MPLP-Type Appraisal Workflow for Hybrid Collections,” Journal of Archival Organization 16, no. 4 (2019): 197–219, https://doi.org/10.1080/15332748.2019.1682793.
Ben Goldman and Timothy D. Pyatt, “Security Without Obscurity: Managing Personally Identifiable Information in Born-Digital Archives,” Library & Archival Security 26, nos. 1–2 (2013): 42, https://doi.org/10.1080/01960075.2014.913966.
Although McEwan's email correspondence is closed to most researchers, other born-digital records are available on-site. See finding aid: “Many of McEwan's writings (including novels, screenplays, essays, and lectures, as well as unidentified documents), outgoing correspondence, photographs, and personal and professional documents exist as electronic files and are available to researchers. Two access copies are available,” https://norman.hrc.utexas.edu/fasearch/findingAid.cfm?eadid=01073, captured at https://perma.cc/55T5-XGE2.
Amy Hildreth Chen, Placing Papers: The American Literary Archives Market (Amherst: University of Massachusetts Press, 2020), 4.
Jon Cook to Ian McEwan, November 27, 1998, Box 31, Ian McEwan papers, Harry Ransom Center, Austin, Texas (hereafter referred to as HRC).
Christopher Bigsby to Ian McEwan, August 8, 2013, Ian McEwan emails, HRC.
Paul Gooding, Jos Smith, and Justine Mann, “The Forensic Imagination: Interdisciplinary Approaches to Tracing Creativity in Writers' Born-Digital Archives,” Archives and Manuscripts 47, no. 3 (2019): 374–90, https://doi.org/10.1080/01576895.2019.1608837.
Artificial intelligence (AI) is a broad concept designating the creation of intelligent machines that can simulate human thinking and behavior. Machine learning (ML) is an application or subset of AI that allows machines to learn from data without being explicitly programmed. In practice, the terms “AI” and “ML” are often used interchangeably.
Elaine Feinstein, It Goes with the Territory (London: Alma Books, 2018), 228.
Ryan Cordell, “Machine Learning + Libraries” (Washington, DC: Library of Congress, 2020), 30, https://labs.loc.gov/static/labs/work/reports/Cordell-LOC-ML-report.pdf, captured at https://perma.cc/G3FL-8WXY.
See Lise Jaillant, ed., Archives, Access and AI (Bielefeld: Transcript Verlag, 2022), https://www.transcript-publishing.com/978-3-8376-5584-1/archives-access-and-artificial-intelligence/?number=978-3-8394-5584-5.
Thomas Padilla, “Responsible Operations: Data Science, Machine Learning, and AI in Libraries” (Dublin, OH: OCLC, 2019), https://www.oclc.org/content/dam/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.pdf, captured at https://perma.cc/G4GF-WV6A.
Barbara McGillivray et al., “The Challenges and Prospects of the Intersection of Humanities and Data Science: A White Paper from The Alan Turing Institute” (2020), https://doi.org/10.6084/M9.FIGSHARE.12732164.
McGillivray et al., “The Challenges and Prospects of the Intersection of Humanities and Data Science,” 12.
Cordell, “Machine Learning + Libraries,” 35.
Cordell, “Machine Learning + Libraries,” 53.
Cordell, “Machine Learning + Libraries,” 1.
Cordell, “Machine Learning + Libraries,” 38.
Cordell, “Machine Learning + Libraries,” 1.
Cordell, “Machine Learning + Libraries,” 1.
Cordell's interview with Benjamin Lee, “Machine Learning + Libraries,” 60.
Karl Marx and Frederick Engels, Manifesto of the Communist Party, 2nd ed. (New York: National Executive Committee of the Socialist Labor Party, 1898), 27, https://archive.org/details/manifestoofcommu00marx_1.
AEOLIAN Network, www.aeolian-network.net. See also AEOLIAN's sister project, AURA (Archives in the UK/Republic of Ireland and AI) Network, www.aura-network.net.
See Richard Marciano et al., “Establishing an International Computational Network for Librarians and Archivists,” in IConference 2019 Proceedings (iSchools, 2019), https://doi.org/10.21900/iconf.2019.103139; Nathaniel Payne, “Stirring the Cauldron: Redefining Computational Archival Science (CAS) for the Big Data Domain,” in 2018 IEEE International Conference on Big Data (Big Data) (2018), 2743–52, https://doi.org/10.1109/BigData.2018.8622594; Lyneise Williams, “What Computational Archival Science Can Learn from Art History and Material Culture Studies,” in 2019 IEEE International Conference on Big Data (Big Data) (2019), 3153–55, https://ai-collaboratory.net/wp-content/uploads/2020/02/Williams.pdf, captured at https://perma.cc/2UNS-LBXX.
For more on these technological advances (including sensitivity review), see Graham Mcdonald et al., “How the Accuracy and Confidence of Sensitivity Classification Affects Digital Sensitivity Review,” ACM Transactions on Information Systems 39, no. 1 (2020): 1–34, https://doi.org/10.1145/3417334; Jaillant, ed., Archives, Access and AI.