Early collections-based digital projects and their infrastructure, including website platforms and software, digital asset management systems, information systems, metadata, and preservation protocols, serve as the foundation for many library, archives, and museum (LAM) repositories' ongoing efforts to organize, describe, and make digital assets widely accessible online. Completed projects, despite well-intended planning and execution, can become time-intensive to maintain and migrate forward as new projects that meet fresh programmatic goals and current professional standards become today's focus. Assessment of past projects, with the goal of making better decisions in the future (i.e., “lessons learned”), can be framed through an understanding of the term “technical debt,” a metaphor used within the software development community. The authors define and explore the concept of technical debt, relating it specifically to the archival field, and suggest a preliminary conceptual framework of technical debt to inform decision-making.

Technological advances over the last two decades have driven and enabled an explosion of digital collections work within libraries, archives, and museums (LAMs). The total output and impact of this work is impossible to quantify, but it includes websites that facilitate access to and discovery of collections; consortia and platforms for aggregating and sharing digital objects and their metadata; internal databases and collection management systems used to manage intellectual and physical control of materials; digital preservation systems and tools; and born-digital processing systems and workflows. Collecting digital assets and deploying systems to manage them brings unprecedented opportunities for discovery, but also challenges archivists, who must balance the ongoing tasks of preserving and providing access to records and unique materials regardless of their original format. While archivists continue to perform their traditional duties around collection development, appraisal, processing, description, reference, and records management, many of us must also incorporate vocabulary and skills from the technological and business sectors to meet reasonable—yet ever increasing—user expectations for transparent digital access and services.

The competing demands on archivists' time and expertise are apparent when we take a hard look at the growing pile of digital collections, their linked assets, and the systems in which they reside, all of which periodically require fresh attention. Responsible digital stewardship over decades presents challenges, particularly related to changing user expectations. Former projects, once dubbed “complete,” can weigh on repositories' execution of today's priorities, which must meet evolving tastes, standards, and preferences. As such, these iterative maintenance tasks go beyond the already rigorous demands of digital preservation and into a maze of more subjective choices about prioritization, repository values, and resource management. As archivists, we asked ourselves if we were actively and adequately engaging with digital project management decisions made in the past. Can archivists approach digital collections work with a strategy that helps manage future costs by anticipating them in the planning stages? We reflected on resource-intensive remediation projects at the University of North Carolina at Charlotte, Carnegie Mellon University, and Yale University's Beinecke Rare Book and Manuscript Library and documented our experiences in case studies to share at the Society of American Archivists' Annual Meeting. By examining commonalities in our shared experience, and through facilitated discussion at a subsequent symposium on digital preservation hosted by the Northeast Document Conservation Center in 2018, we developed the idea that adapting a metaphor of “technical debt” for archivists could begin to provide an effective conceptual framework for improved digital project planning and assessment.

The concept of technical debt, initially introduced in the commercial software development community, has potential relevance to archivists as they assess older digital projects and plan new ones. In software development, the term works as a metaphor describing compromises in system code implementation and design, whether made inadvertently or purposefully, that reduce the efficiency of future code development. First articulated by computer programmer Ward Cunningham in 1992, the debt metaphor originally characterized the prioritization of a short-term need (such as design of an urgently desired feature) over necessary maintenance of the overall project code base as an assumption of debt on the part of the coding team.1 The new feature may accelerate the appearance of the product's development to nontechnical stakeholders. However, the coding team assumes a debt that must be “repaid” by a future investment of time and resources into more comprehensive work that addresses internal quality aspects and sustainability of the base code. These improvements are often invisible to end-users, so they are easily neglected, but they are critical to the overall effectiveness of the system. The metaphor has since expanded in the software development field's professional literature and blog spaces to define different genres, or types, of technical debt, which can be understood, broadly speaking, as typical vulnerabilities in code design and project management. The term has also been incorporated into assessment techniques that seek to quantify technical debt's impact on business processes and workflows. In each evolution of the metaphor, it remains a way to think and communicate about choices and system design by relating the buildup of technical debt to the accumulation of interest.2

Substantial differences exist between the way archivists might confront technical debt and how the software development community currently defines it. First, while archivists may work closely with technologists and have some coding skills, a technical debt metaphor for archivists is not about software development, per se; it's about stewardship of their digital collections. Second, archivists manage collections with enduring or long-term value with technologies and tools that are comparatively ephemeral. The notion of perpetuity emphasizes the long-term consequences of decisions made in managing archives. Indeed, a “[technical debt] situation is exacerbated in projects that must balance short deadlines with long-term sustainability,” an apt descriptor of the tensions that may exist in an archival repository with both a preservation mandate and short-term grant-funded initiatives focused on access.3 In archives, the best purpose of a modified technical debt metaphor may be to develop a framework that helps archivists examine the strategies and work cultures driving decisions at their repositories and thereby prevent unmanageable technical debt from becoming an inevitable result of digital collections work. Remedying and mitigating technical debt need to be visible, transparent parts of archivists' workflows and decision-making processes to ensure sustainable access to digital collections. All digital projects will accrue debt, and not all debt can be absorbed or “paid off” with future resources. Archivists need to be able to anticipate and assume reasonable, defined types of debt that are acceptable to their organizations.

In 2003, software developer and consultant Martin Fowler expounded on Cunningham's original metaphor and created the “technical debt quadrant,” which moved toward a conceptual framework for thinking not only about the consequences of debt, but also the manner in which it is acquired. Fowler argued that all base code inevitably incurs cruft, or quality deficiencies of various types, that ultimately make it harder (and more resource intensive) to steer projects toward on-time and satisfactory delivery.4 Fowler's quadrant (see Figure 1) encourages readers to consider if the cruft, or debt, in question is acquired deliberately, and if the decision to acquire it was prudent or reckless. Prudent decisions may be understood as thoughtful or informed; reckless decisions may be understood as hasty and inherently more risky. Both exist on a spectrum that does not necessarily assign positive or negative values to prudence or recklessness but is instead focused on guiding the user toward thinking critically about what type of decision occurred and the circumstances that led to it.

FIGURE 1.

Fowler's Technical Debt Quadrant5


The juxtaposed values of deliberate/inadvertent are aligned on the y-axis; the values of reckless/prudent are arranged on the x-axis, creating a full spectrum for the intent and strategy behind decisions. The visualization thus forms a quadrant grid in which decision types can be categorized and oriented along a scale. Fowler also extends the metaphor to concepts familiar in personal finance. Technical debt can be expensive: it causes “interest” to compound in the form of future remediation work. That work, when eventually completed, is a type of payment that reduces the debt load. To actively manage existing debt, it is necessary to develop a payback strategy that prioritizes sustainable work that improves internal system quality over flashy new development. However, it is often more cost effective to make decisions that prevent debt from occurring in the first place.
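Read as a decision aid, the quadrant reduces to a simple two-attribute classification. The sketch below is our own minimal illustration of that reading; the function name and the example scenarios are hypothetical, not drawn from Fowler's materials:

```python
def classify_decision(deliberate: bool, prudent: bool) -> str:
    """Place a decision on Fowler's technical debt quadrant.

    deliberate -- was the debt taken on knowingly?
    prudent    -- was the trade-off weighed against long-term aims?
    """
    intent = "deliberate" if deliberate else "inadvertent"
    care = "prudent" if prudent else "reckless"
    return f"{intent}/{care}"

# A grant-funded project that knowingly defers metadata cleanup to meet a
# deadline, with a documented payback plan:
print(classify_decision(deliberate=True, prudent=True))    # deliberate/prudent

# Skipping project documentation because no one thought about it:
print(classify_decision(deliberate=False, prudent=False))  # inadvertent/reckless
```

The point of the exercise is not the code but the habit it encodes: naming both the intent and the care behind each debt-incurring decision before judging it.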

Software consultant Steve McConnell offered a practical exploration of the decision-making elements of debt incursion in a 2008 white paper. McConnell demonstrates that, in addition to the intentional and unintentional debt described by Fowler, the notions of long-term (investment) and short-term (rotating) debt can significantly impact repayment plans and product schedules. McConnell argues that when teams can identify how technical debt decisions are made, they are better able to measure and manage their debt tolerance and make the case to business managers for debt repayment projects. He then offers the reader practical solutions for tracking technical debt items, such as using bug tracking or a scrum backlog manager, and communicating technical debt issues to senior managers by using monetary language and incorporating debt discussion into regular product updates. McConnell concludes by offering readers a decision-making framework that recognizes a middle ground between the “good” solution that incurs no debt and the “quick and dirty” solution that accrues interest in a less-controlled way, finally noting that “[o]ften a hybrid approach ends up being the best option.”6
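McConnell's backlog-tracking advice can be pictured as a small register of debt items. The sketch below is purely illustrative (the field names and hour figures are our hypothetical assumptions, not McConnell's): each item carries an estimated one-time remediation cost (“principal”) and a recurring carrying cost (“interest”), and items whose interest outpaces their principal rise to the top of the repayment queue.

```python
from dataclasses import dataclass

@dataclass
class DebtItem:
    description: str
    principal_hours: float          # one-time cost to remediate now
    interest_hours_per_year: float  # recurring cost of leaving it unaddressed

    @property
    def interest_ratio(self) -> float:
        # A higher ratio means the debt "compounds" faster relative to its
        # payoff cost.
        return self.interest_hours_per_year / self.principal_hours

# Hypothetical backlog entries for a digital collections program:
backlog = [
    DebtItem("Inconsistent subject headings", 120, 30),
    DebtItem("Undocumented ingest workflow", 20, 40),
    DebtItem("Low-resolution master files", 300, 15),
]

repayment_order = sorted(backlog, key=lambda d: d.interest_ratio, reverse=True)
for item in repayment_order:
    print(f"{item.description}: ratio {item.interest_ratio:.2f}")
```

Under these assumed figures, the undocumented workflow (cheap to fix, expensive to live with) would be repaid first, even though the low-resolution masters represent the largest total remediation effort.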

The first annual workshop on technical debt in academic theory and practice was held in 2010 at the Software Engineering Institute at Carnegie Mellon University in Pittsburgh. From the workshop proceedings, several leading researchers produced a formative report and outlined a robust research agenda. The report identifies seven potential areas of research to help practitioners more usefully manage technical debt and to push it beyond a communication tool between engineers and managers to a more practical application. The first two areas identify refactoring and architectural opportunities (refactoring is “the restructuring of an existing body of code, altering its internal structure without changing its external behavior”).7 Three areas deal with measurement and metrics: identifying the source of debt, pinpointing how much debt is too much, and monitoring accumulation through automated tools. The final two address the environment: noncode artifacts (such as documentation) and identifying process issues to evaluate costs. While refactoring and system architecture are quite specific to software engineering, the other categories of technical debt are highly relevant for archivists. Areas that translate particularly well to digital collections in archives include monitoring and measuring products for broader debt factors, especially after initial decisions have been made; non-code artifacts such as project documentation; and process issues, or determining the management or human factor in questions surrounding technical debt. Expressed another way, these themes emphasize multilayered assessment and sound project management fundamentals.

Subsequent workshops took up these research questions and extended the inquiry to visualizing debt and creating so-called payback strategies,8 as well as efforts toward more concrete, actionable definitions that could aid and inform strategic decisions.9 The ninth workshop extended technical debt into project management by concentrating on its role within an agile development environment.10 By 2018, the workshop had evolved into a full-fledged conference. What started as a fairly simple metaphor to facilitate communication between software engineers and product managers had become a thriving area of research that explores the myriad, parallel definitions of technical debt and seeks to define it in a variety of contexts.11

Technical debt has also been examined as a consequence of decision-making styles or institutional culture, especially when the implications of sequential, related decisions build over time. Rios et al. conducted a systematic review that identifies planning deficiencies and unskillful project management as the most commonly cited causes of debt, shifting the metaphor from decisions about prioritizing code features to the broader ecosystem in which project-related decisions are made. Some of the other factors include a lack of knowledge or expertise among staff members, personality conflicts, weak documentation, and organizational issues such as resource allocation and business processes.12 Most often, technical debt is identified as a consequence of several of these factors in combination, all of which are relevant to archivists. Once causality is considered, the metaphor of technical debt expands and can be applied to any product or project-based teamwork environment.

Recognition of technical debt as a useful concept in libraries and archives has grown informally in recent years as a by-product of related discussions on systems and project management at conferences and events. Kevin Clair extended the technical debt metaphor explicitly to libraries and archives in 2016, writing:

Metadata can be thought of as the “codebase” necessary for properly functioning library discovery systems, institutional repositories, and collection management systems; the labor, or lack thereof, required to ensure sufficient metadata for a properly functioning system can be thought of as a down-payment toward the relief of that technical debt.13

After a thorough literature review identifying relevant types of technical debt, Clair drew a parallel between weak metadata (with metadata envisioned as a core intellectual output of library work) and weak base code in software development. Clair extended the technical debt metaphor by arguing that up-front library investment in good metadata protects against the future cost of extensive editing or improving metadata in library systems and further presented on this topic at Code4Lib in 2017.14 At Code4Lib in 2018, Kenneth Rose and Whitni Watkins discussed the notion of technical debt in the context of project management, arguing that clear project scope, articulated work roles and responsibilities, and documentation are key to managing and minimizing technical debt and have a place in any conceptual framework.15 Finally, at the Code4Lib 2019 meeting, Andreas Orphanides attempted to broaden the discussion of technical debt in libraries by suggesting a conceptual model that considers the impact of metaphorical “shear” on library systems. In this construct, technical debt is “the future cost of maintaining an existing system as it ages,” and Orphanides offered an approach to thinking about technical debt in libraries as engineers would consider the effects of shear on a structure.16 This approach conceptually strays from the original technical debt metaphor, which emphasizes trade-off costs of decisions in system development and design. However, the potential for the metaphor of technical debt to be both practical and useful within library, archival, and museum digital collections management is apparent.

We contend that all debt types, regardless of the working environment in which they occur, can be examined within three broad classifications of decision-making styles: strategic, tactical, or incremental.17 Within each of these styles, technical debt may accumulate, according to a team's or repository's vulnerabilities. More important, acknowledging decision-making styles introduces a degree of self-determination into assessing debt risk. Are decisions strategic (deliberate and in harmony with a project's or repository's long-term aims), tactical (carefully planned for a specific goal), or incremental (incidentally accruing over the course of the team's project timeline in often unforeseen ways)? This two-layered examination of debt, first into recognizable types (e.g., identifying weak documentation as a form of debt) and second, within a classification of decision-making styles (e.g., noting that documentation debt grew incrementally because the work of project documentation was piecemeal and never formalized as an end in itself), makes assessment more fruitful.

As we adapt the technical debt metaphor as a useful lens for assessing digital projects in archives, we begin with models that build on the work of the software community. The technical debt quadrant developed by Martin Fowler, for example, can be applied in a scenario that helps us better understand decision-making processes. An adaptation of Fowler's model is seen in Figure 2. This grid allows us to consider the nature of decisions triggering debt symptoms that may be familiar to those working in archives and demonstrates how such decisions appear on a spectrum of normal organizational thinking and behavior.

FIGURE 2.

Technical Debt Quadrant modified for archives


Even so, because the quadrant primarily focuses on the decision-making categories of deliberate/inadvertent and prudent/risky,18 it does not address debt types and impacts. Debt types are important because they recur predictably within fields and, with documentation and understanding, can be anticipated. Debt impacts are important because, without labeling or measuring them, archivists cannot effectively leverage the technical debt metaphor as a means for communicating the nature of the problem and advocating for the time and resources needed to fix it.

The software engineering community began to unpack this relational problem at Dagstuhl Seminar 16162 in April 2016. By this time, researchers had surfaced a variety of contexts for thinking about technical debt and had come together to develop a working model upon which future research could be tested and refined. Noting the importance of context and precision, they defined technical debt for the software community as follows:

. . . Technical debt is a collection of design or implementation constructs that are expedient in the short term, but set up a technical context that can make future changes more costly or impossible. Technical debt presents an actual or contingent liability whose impact is limited to internal system qualities, primarily maintainability and evolvability.19

Drawing on this definition, participants developed a complex conceptual model that centers on identifying specific technical debt items serving as primary indicators of technical debt within a given system.20 In libraries and archives, these items might include metadata quality, image quality, project documentation, object modeling, system design, search interface issues, and others. The Dagstuhl model notes that when such debt is incurred, both its causes and its consequences can have a direct impact on business goals. In libraries and archives, we can point to significant impacts on the user experience, costs in staff time, and perceptions of value on completed projects, all of which influence archivists' and librarians' choices to expend scarce resources to maintain existing digital projects or to initiate new ones.

We propose a conceptual framework for understanding technical debt in archives.21 Like the Dagstuhl seminar and resultant model for technical debt in software development, this framework enters our discussion of technical debt as an iteration, building on previous work and providing a basis for further exploration. It is positioned firmly within the sphere of collections-oriented work (while many other important foci exist in archives, such as community engagement, education, and public services, we limit the scope of this proposed framework to collections and digital asset management). The framework is modeled in Figure 3 and begins with a Collection Management module recognizing the diverse scope of tasks for which archives professionals are responsible: this includes metadata creation, repurposing, and maintenance; all aspects of asset management, both for reformatted and born-digital assets; maintaining and exhibiting context; an array of preservation activities; ensuring access; and, increasingly, system design, selection, and enhancement (including interoperability between systems).

FIGURE 3.

Model of the Conceptual Framework for Technical Debt in Archives


Archivists share a common interest in preservation and facilitating access to their collections, generally described as archival functions in our model, and exercise decision-making styles in their execution of related work responsibilities. These archival functions therefore flow into the next section, Decision Styles, influenced by Fowler's quadrant spectrum of “deliberate/inadvertent.” Decisions are made, either actively (through collaboration or deliberation) or passively (through a failure to act),22 during the process of creating and maintaining collections in an attempt to enhance access, gain intellectual or physical control, or achieve other key functionality. In doing so, archivists may be vulnerable to generating something akin to the aforementioned cruft in systems, documentation, and workflows. The third module, Technical Debt, highlights some debt items common to archives. The final section, Consequences, suggests three main negative outcomes resulting from the acquisition of technical debt. We provide three categories of consequences to prompt archivists to think deeply about impacts of debt: resource costs, which are impacts to business functions or workflows generating an expense (or a loss); value, which reveals an intolerable mismatch between resources put into a project and the perceived results; and quality, which is perhaps the consequence most visible and obvious to patrons. Internal stakeholders may have a more sophisticated appreciation for the impact of quality on features such as search and retrieval; for example, users may not know what they are missing due to poor-quality metadata, while an archivist with intimate knowledge of a collection of assets may.

Archivists responsible for developing and stewarding digital collections may benefit from applying the metaphor of technical debt within a framework that speaks to their shared collections-oriented focus that emphasizes preservation, access, and/or discovery. The following cases were originally presented at the 2018 Society of American Archivists Annual Meeting in Washington, D.C., to shine a spotlight on the challenges that repositories face assessing past digital projects.23 We include them here because our experiences with these projects shaped our thinking about technical debt and led to the development of the framework previously introduced. The framework was developed in light of our experiences with legacy digital collections, and we hope by providing these examples and explaining how and where our workflows changed as we grappled with technical debt, we can further illustrate how this framework can be useful to archivists elsewhere.

University of North Carolina at Charlotte

The Special Collections and University Archives (SCUA) at J. Murrey Atkins Library began managing digital projects and collections at the University of North Carolina at Charlotte in the mid-2000s. The library started to use CONTENTdm to publish small collections of digitized materials in 2010 and received a multiyear digitization grant in 2013, which funded a digitization librarian coordinator for three years. As a result of the grant, SCUA quickly filled its CONTENTdm repository to capacity. At that point, the library decided to develop a locally hosted Islandora repository, which could support preservation and access, and accommodate more media types. The new digital collections repository, dubbed Goldmine, went live in 2015. SCUA's metadata librarian and digital production librarian began publishing new content in Goldmine and slowly migrated the CONTENTdm collections created after November 2013 (about 80 percent of the total content) into the new system. While cross-walking metadata and ingesting files into a new repository system required a considerable amount of effort, compounded by the added challenge of working with a new repository system, little technical debt had been accrued in these newer collections. Generally, project staff digitized these collections according to cultural heritage standards, selected appropriate subject terms, and documented their work.

These successes stand in stark contrast to the challenges that came next. After harvesting the low-hanging fruit of the digital collections described here, the digital production librarian took on migration of the sixteen remaining pre-2013 digital collections created in CONTENTdm. These collections were smaller but carried a number of the technical debt signifiers noted in our conceptual framework. An initial assessment of these collections, which included about 2,800 images, identified inconsistencies in metadata, poor quality of digitized images, and a lack of documentation. This documentation debt proved especially problematic because staff turnover meant all of the staff members who had led Atkins Library's first digital projects in CONTENTdm had left. As a result, this loss of institutional memory had a multiplier effect on the technical debt that had already accrued.

The poor quality of the metadata was a key indicator of technical debt in these sixteen legacy collections and greatly impacted their usability. Digital objects were often described at the item level and frequently included detailed custom fields. While some of the detailed metadata was helpful, other lengthy descriptive notes were inaccurate—for example, misidentifying sports teams and individuals. Subject terms in one photographic collection regularly told the viewer if men, women, or children were present, but neglected broader terms that could let the viewer know what the photograph was about, such as farming or a nature park. Library of Congress Subject Headings were applied across many of the larger collections, but their interpretation, application, and entry into fields were inconsistent. Some collections had personal and corporate name, topical, and geographic subject headings inserted into a single field, while others were separated by type. Some photographic collections lacked source information, making it difficult, if not impossible, to associate digital surrogates and metadata with a specific print in the physical collection. Finally, file naming conventions were inconsistent and problematic: one collection gave digital assets descriptive file names as identifiers, such as “Car in fog,” while another contained typos in the CONTENTdm identifier field for almost every image.
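Inconsistencies like these lend themselves to simple scripted audits during assessment. The sketch below is a hypothetical illustration only; the field names, identifier pattern, and mixed-subject heuristic are our assumptions, not the actual Atkins Library schema:

```python
import csv
import io
import re

# Hypothetical CONTENTdm-style export showing two of the problems described
# above: a personal name mixed into a topical subject field, and a typo in
# an identifier. Field names and the ID pattern are illustrative only.
sample_export = io.StringIO(
    "identifier,subject\n"
    'unc_char_001,"Farming; Smith, John"\n'
    "unc_chra_002,Parks\n"
)

ID_PATTERN = re.compile(r"^unc_char_\d{3}$")

def audit(export):
    problems = []
    for row in csv.DictReader(export):
        if not ID_PATTERN.match(row["identifier"]):
            problems.append((row["identifier"], "malformed identifier"))
        # A comma inside a semicolon-delimited subject list often signals a
        # personal name dropped into a topical field.
        if any("," in term for term in row["subject"].split(";")):
            problems.append((row["identifier"], "mixed subject field"))
    return problems

findings = audit(sample_export)
print(findings)
```

An audit pass of this kind cannot judge descriptive accuracy (misidentified sports teams still require human review), but it can surface structural debt, such as malformed identifiers and conflated fields, across thousands of records at once.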

Beyond the technical debt found in the metadata, interest was also accruing on the hundreds of digitized images that did not meet current cultural heritage technical image quality standards. Low-resolution master files failed to meet digital preservation standards, making them inappropriate or unusable in a digital preservation environment. Other images had lost detail due to editing practices such as severe sharpening and cropping, neither of which reflects best practices for archival image capture. For some digital collections, only low-resolution JPEGs were available, and for one photograph collection, the digital surrogates were lost and could not be found on any library servers. Moreover, because CONTENTdm is a platform primarily focused on access (and not life-cycle digital asset management), this kind of mistake was easy to make. Prior to the new Goldmine digital repository, asset management had consisted of storing files on servers and keeping metadata and documentation (when they existed) on shared drives.
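Where capture resolution is recorded, triage of substandard masters can also be scripted. In this sketch, the file names, recorded ppi values, and the 400 ppi threshold are illustrative assumptions; the appropriate target depends on format and quality tier in the current FADGI guidelines:

```python
# Hypothetical inventory of master files with their recorded capture
# resolution. The threshold is a representative print-digitization target,
# not a universal rule.
MIN_PPI = 400

masters = [
    {"file": "coll01_0001.tif", "ppi": 600},
    {"file": "coll01_0002.tif", "ppi": 150},  # legacy low-resolution scan
    {"file": "coll02_0001.jpg", "ppi": 72},   # access derivative kept as a "master"
]

def needs_rescan(assets, min_ppi=MIN_PPI):
    """Return files whose recorded resolution falls below the threshold."""
    return [a["file"] for a in assets if a["ppi"] < min_ppi]

print(needs_rescan(masters))  # ['coll01_0002.tif', 'coll02_0001.jpg']
```

A list like this becomes the raw material for the prioritization decisions described below: which files to rescan, which to accept as-is, and which to retire.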

Previous digital collection managers left behind very little documentation. They did not document their rationale for choosing one digital project over another, nor why some metadata schemas were adopted and then abandoned. We have some understanding of the decision to begin with CONTENTdm (a ubiquitous product in 2010 with few competitors), but very little understanding of how and why certain digital production decisions around image quality were made. Because of the lack of documentation, much of the technical debt accrued by Atkins Library's early digital collections appears to have been incremental and unintentional; it was possibly caused by project staff making expeditious access decisions on a project-by-project basis, rather than considering these decisions within a broader, programmatic context.

While previous intentions cannot be fully known, the decision-making that drove remediation can be described using the technical debt framework. The move to Goldmine was a critical first step. Adopting a full-scale digital asset management system to deal with the entire digital curation life cycle immediately broadened our field of vision about how we treat our digital objects. We moved from incremental, interest-accruing decisions that diminished the overall value of our digital offerings, to a more strategic and tactical approach within a more programmatic (rather than project-by-project) process. As a result, most of the metadata issues described were remediated. We paid off this debt as we moved forward in the new system. The same was true with some of the poor-quality images, which were rescanned to the Federal Agencies Digital Guidelines Initiative (FADGI) standards. We decided to both redigitize and expand one photographic collection that documents one of SCUA's collecting strengths and is also an area of interest among UNC Charlotte faculty. This decision reflects our shift toward developing strategic digitization projects that fit within the unit's collection development policy and include materials that are frequently used by researchers. For the weakest collections, which included several collections of images that could not be connected back to their source material and one poorly cropped book, we decided to declare bankruptcy and dispose of the digital assets, while diligently documenting our decisions. This work has resulted in higher quality collections for our users, and a more rigorously defined and transparent process for project staff.

Carnegie Mellon University Libraries

Carnegie Mellon University (CMU) Libraries have been creating and managing digital collections since 1994 when the libraries digitized Sen. H. John Heinz III's papers, consisting of over 110 cubic feet of materials. This digitization work began prior to the introduction of Dublin Core, EAD, or other community standards. At the time, the project team acknowledged the risk of developing local descriptive practices and made plans to revise the metadata at a later date. Unfortunately, that work never occurred due to resource constraints, and all future digitization projects followed the descriptive pattern set by Heinz, with poor metadata practices compounding our technical debt. Over the years, funding for digitization projects has decreased, further eroding support for creation of descriptive metadata and substantially sharpening the choice between remediating metadata and embarking on new projects to meet current needs, or—in terms of the debt metaphor—presenting archivists at CMU with the choice of paying down existing debt or spending that effort on newer, shinier digital initiatives.

With this context in mind, and in preparation for a system migration, CMU Libraries embarked on an assessment of our digital collections that included a review of current (and past) system functionality, interviews with internal and external stakeholders, and a review of metadata and object relationships for approximately 750,000 documents. This analysis, as expected, uncovered major gaps in system functionality and substantial deficiencies in our metadata records, which resulted from three core technical debt artifacts. The first was our failure to adopt a standardized metadata schema or use controlled vocabularies at any point during the last twenty-five years, which resulted in substandard metadata quality. The second was the lack of descriptive metadata for archival documents and a reliance on collection structure and item relationships for access. The original Heinz project focused heavily on representing the physical arrangement of the collection in an online space. That functionality was lost in subsequent system migrations, and, because of the dearth of descriptive metadata, current access to these documents relies largely on OCR. This effectively makes it impossible for users to find handwritten documents or graphical materials within archival collections. The third, incorrectly and inconsistently mapped metadata fields, was also, in part, the result of multiple system migrations: a twenty-five-year game of telephone in which our metadata became increasingly distorted. Over the years, the archive completed four migrations of the digital library without metadata remediation. As a result, fields were lost or transposed, and errors were introduced. For example, earlier versions of the access system used a field to denote whether a document was handwritten, which could have flagged the problems with relying on OCR for search. However, that field, and its data, is no longer present in the current data model.
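A metadata review at this scale is feasible only when scripted. As a rough illustration (not the CMU workflow; the field names and sample rows below are invented), a field-completeness audit over an export of item records might look like this:

```python
from collections import Counter

def audit_completeness(records, fields):
    """Return the share of records that populate each metadata field."""
    present = Counter()
    total = 0
    for rec in records:
        total += 1
        for field in fields:
            # Count a field only if it holds a non-blank value.
            if str(rec.get(field, "")).strip():
                present[field] += 1
    return {field: present[field] / total for field in fields} if total else {}

# Invented sample rows standing in for an export of item records.
sample = [
    {"title": "Letter to constituent", "subject": "", "date": "1977"},
    {"title": "Committee photograph", "subject": "", "date": ""},
]
coverage = audit_completeness(sample, ["title", "subject", "date"])
print(coverage)  # {'title': 1.0, 'subject': 0.0, 'date': 0.5}
```

Coverage figures of this kind make it easy to see which fields (here, subjects) are systematically absent and therefore candidates for remediation.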

While most of these issues were known, at some level, prior to our assessment, reflecting on how those debt artifacts were introduced helped us identify and argue for the need to invest in a metadata improvement plan as part of our migration process. The lack of standardized metadata was the easiest to understand through the technical debt quadrant, as it was deliberate and arguably prudent, given that there were no other options at the time. Our other two core debt artifacts, a lack of descriptive metadata and poorly mapped fields, were harder to analyze, as they were not documented at the time. The lack of documentation suggests that this debt was largely inadvertent and somewhat riskier in nature, as it grew unnoticed by archivists and staff until it reached a critical juncture.

Based on the scale of the identified debt, it was clear that we would not be able to address all our metadata issues at once. In determining which issues we would (and could) address, we looked at several factors, including visibility to users, whether an issue would be substantially more difficult to address after migration, and the likelihood of having resources to address it at a future date. First and foremost, we prioritized work that could be automated or done at scale, such as correcting fields, normalizing dates, and updating subjects to comply with authorities. Using tools like OpenRefine, we were able to make substantial improvements to our metadata very quickly, without the need to review anything at the item level, effectively paying down large portions of technical debt with relatively minimal effort. We also chose to improve metadata that is most visible to users, such as overly complex and hierarchical file titles. Previous file titles included entire collection hierarchies and redundant information; for example, “Administrative Record—977 (bundled) (Economy/Business/Finance—Antitrust—Antitrust Enforcement Act (S. 300)—Illinois Brick Bill—1977–1979)” includes item date, folder date, folder title, series, and sub-series information. Parsing this information into individual, standardized fields not only makes titles easier for users to read, it also provides some future-proofing of the collection, as we will be able to more easily create new system functionality as time and technology allow.
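A parsing step of this kind can often be scripted. The sketch below is our own illustration, not the CMU code, and the output field names are invented; it splits a legacy title like the example above into an item segment, hierarchy segments, and a trailing folder date:

```python
import re

EM_DASH = "\u2014"  # the delimiter used inside the legacy titles

def split_trailing_parenthetical(title):
    """Return (head, contents of the outermost trailing parenthetical)."""
    s = title.rstrip()
    if not s.endswith(")"):
        return s, ""
    depth = 0
    # Scan backward so nested parentheses like "(S. 300)" stay intact.
    for i in range(len(s) - 1, -1, -1):
        if s[i] == ")":
            depth += 1
        elif s[i] == "(":
            depth -= 1
            if depth == 0:
                return s[:i].rstrip(), s[i + 1:-1]
    return s, ""

def parse_legacy_title(title):
    """Break a hierarchy-laden title into separate, standardized fields."""
    head, hierarchy = split_trailing_parenthetical(title)
    segments = [seg.strip() for seg in hierarchy.split(EM_DASH) if seg.strip()]
    folder_date = None
    # Treat a final segment containing a four-digit year as the folder date.
    if segments and re.search(r"\d{4}", segments[-1]):
        folder_date = segments.pop()
    return {"item_title": head, "hierarchy": segments, "folder_date": folder_date}
```

Run on the example title, this yields an item title of “Administrative Record—977 (bundled),” four hierarchy segments from series down to folder, and a folder date of “1977–1979,” each ready to be loaded into its own field.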

The technical debt metaphor not only helped us create a pragmatic framework for addressing existing debt, it also encouraged us to be more thoughtful about our choices moving forward. We now articulate the goal of limiting inadvertent debt and document choices that may lead to new debt, using tools such as GitHub (for issue tracking) and Google Sheets (for tracking feature development and prioritization). We make deliberate decisions about which debt we are electing to pay down, and when best to make that investment, simply by discussing the technical debt consequences of doing things now versus later and by maintaining a comprehensive, prioritized list of postlaunch system and metadata work. We have undertaken some noncritical work, such as mapping URIs to our taxonomy terms, because the current cost of doing so is low and it is an area where we see potential debt issues in the future. Adopting the proposed technical debt framework has not eliminated debt from our digital projects. We have postponed other highly desired work, such as adding subject terms and improving the quality of titles, because, at the moment, the cost is too high and the work would substantially affect our deadline, a choice that we hope is not only deliberate, but prudent as well.
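The URI-mapping work can be as simple as a lookup against a locally maintained authority table. A minimal sketch, with placeholder terms and URIs rather than real authority records (in practice the URIs would come from a reconciliation source such as a national authority file):

```python
# Placeholder authority table mapping local taxonomy terms to URIs.
# These URIs are illustrative only, not real authority identifiers.
AUTHORITY_URIS = {
    "Antitrust law": "https://example.org/authorities/antitrust-law",
    "Universities and colleges": "https://example.org/authorities/universities",
}

def attach_uris(terms, authority):
    """Pair each taxonomy term with its URI; report terms needing review."""
    matched = {t: authority[t] for t in terms if t in authority}
    unmatched = sorted(t for t in terms if t not in authority)
    return matched, unmatched

matched, todo = attach_uris(["Antitrust law", "Steel industry"], AUTHORITY_URIS)
# "Steel industry" lands in the review list rather than being guessed at.
```

Keeping unmatched terms in an explicit review queue, rather than guessing, is itself a debt-limiting choice: no silent, inadvertent mappings accrue.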

The proposed framework encourages us to move past short-term digital projects, into long-term digital programs. While we have not eliminated the existence of technical debt at our repository, we can now understand it as an integral component and by-product of digital work that can be proactively managed. The technical debt metaphor has also encouraged us to think long-term about our choices, for example: to consider sustainability for projects before they launch, schedule regular assessments, and explore metrics that can help us document the impact of our investments.

Yale University, Beinecke Rare Book and Manuscript Library

The Beinecke Rare Book and Manuscript Library at Yale University is the university's largest special collections repository, and its digital library began in the late 1990s with scans from the Photonegative Collection. This artificial collection originally consisted of about 17,000 film negatives created over four decades in response to researcher photoduplication requests. In 2017, after more than a decade of reshooting images on demand and deleting the old ones, the digital Photonegative Collection consisted of about 11,000 images, representing about 7 percent of the Beinecke's total digital content. As staff prepared for a digital repository migration, they faced hurdles illustrative of the way technical debts accrue in libraries and archives: poor image quality, incomplete or inaccurate metadata, and a lack of contextual object relationships, all of which complicated the pathway forward. These issues had different origin stories grounded in past decisions, but they profoundly impacted the day-to-day work of planning and executing a migration.

Photonegative Collection images were typically black and white, digitized at a time when capture resolution was quite low and storage costs were high. Image quality (including resolution and contrast) was often below modern publication standards. Additionally, the original negatives were often cropped to suit the specific needs of the researcher requesting the image, creating a library of images that had little practical use for a wider audience.

This debt was not created intentionally. In the 1990s, it seemed obvious that it would be helpful to digitize this collection, as the negatives represented a valuable and expensive ongoing service the library provided: creating, issuing, and managing copy negatives of select materials for users. However, rapid advances in imaging technology and an associated heightening of end-user expectations regarding image quality and usability quickly rendered the ongoing management of the Photonegative Collection a questionable enterprise. As often happens in archives, an unspoken long-term commitment to preserving assets and collections, despite the obvious need for reassessment, shielded the collection until the migration shone a spotlight on the problem of image quality that the Beinecke could no longer ignore. Users now expect high-resolution, full color, well-lit images cropped just to the edge of the page. For years, the Beinecke reshot images on request and deleted the old surrogates, making an uneasy peace with the backlog of unusable, but ingested, scans. However, an examination of the issue through a framing of technical debt prompted the Beinecke to be more intentional about current practice to prevent ongoing debt of this type in the future.

The early digital library accrued two other types of debt, both of which were more deliberate. The first relates to the quality of image metadata. Photonegative description consisted only of the minimal data written on the original negative sleeves. In many instances, images were described simply as “painting” or “photo,” with the original call number of the source text transcribed. When images were scanned, the sleeve data were not enhanced or checked for accuracy. In the decades since the original film capture, call numbers were assigned and changed as the Beinecke processed and reprocessed collections. It is now often very time consuming, and sometimes impossible, to connect scanned photonegative surrogates to their analog counterparts. Under the “reshoot on request” protocol described above, this debt caused significant headaches for processors, metadata specialists, and public services staff as they attempted to locate originals to reshoot for researchers. Learning from these headaches, the Beinecke put procedures in place for processing archivists to notify digital library metadata staff about changed call numbers and locations for reprocessed materials. In addition, when planning future digital library infrastructure, one key requirement became less duplication of data in disparate systems and more syncing between the catalogs of record and our digital collections platform.
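Even before full system integration, that syncing goal can be approximated by periodically diffing exports from the two systems. A hedged sketch (the identifiers, call numbers, and export shape below are hypothetical, not Beinecke data):

```python
def find_call_number_drift(catalog_of_record, digital_platform):
    """Flag objects whose call number in the digital platform no longer
    matches the catalog of record. Both arguments map an object ID to
    its recorded call number."""
    return {
        oid: {"platform": digital_platform[oid], "catalog": catalog_of_record[oid]}
        for oid in digital_platform
        if oid in catalog_of_record and digital_platform[oid] != catalog_of_record[oid]
    }

# Hypothetical exports from the two systems.
catalog = {"obj-1": "MSS 101", "obj-2": "MSS 202"}
platform = {"obj-1": "MSS 101", "obj-2": "Uncat MSS 7"}
drift = find_call_number_drift(catalog, platform)
# drift flags obj-2 for review by metadata staff
```

A routine report of this kind turns the "notify metadata staff" procedure from a memory-dependent courtesy into a checkable safety net.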

Finally, most of the Photonegative Collection objects lack appropriate contextual parent-child relationships. Nearly everything else in the digital library, and all current ingests, collates images from one work (whether that work is a folder of archival materials, a bound item, or some other discrete unit) under one parent record. When the Photonegative Collection was originally digitized and put online, it was not possible for the Beinecke to easily model such relationships. Instead, each image was, and remains, alone, disconnected from other images from the same work. This made attempts at online browsing for specific works time-consuming and tedious. Re-creating the relationships is equally so: the staff time required to ferret out relationships from minimal and incomplete metadata is time that could otherwise be spent enhancing other areas of the digital collections.
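Where work-level identifiers can be recovered, rebuilding the relationships is mechanically simple; the costly part is the metadata sleuthing needed to recover the identifiers at all. A sketch of the mechanical half, with invented record shapes and field names:

```python
from collections import defaultdict

def group_by_work(image_records):
    """Collate item-level images under parent work identifiers.
    Images without a recoverable work ID remain orphaned, as in the
    legacy data."""
    works = defaultdict(list)
    orphans = []
    for rec in image_records:
        work_id = rec.get("work_id")
        if work_id:
            works[work_id].append(rec["image_id"])
        else:
            orphans.append(rec["image_id"])
    return dict(works), orphans

# Invented sample: two images from one folder, one unattributable image.
records = [
    {"image_id": "img-01", "work_id": "folder-12"},
    {"image_id": "img-02", "work_id": "folder-12"},
    {"image_id": "img-03", "work_id": None},
]
works, orphans = group_by_work(records)
# works == {'folder-12': ['img-01', 'img-02']}; orphans == ['img-03']
```

The orphan list makes the residual debt visible and countable, which is exactly the information needed to decide whether repayment is worth the investment.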

Because a decision was made not to verify or update metadata upon ingest, this debt was incurred intentionally. Current workflows make it unlikely that this exact form of debt will accrue again, but it lingered, as the library primarily invested staff time in creating new records instead of endlessly fixing old ones. Some categories of debt can be identified but then left alone; resolving them may not be worth the effort required, as long as the repository can make intentional and better decisions going forward about standards for objects. Not all debt begs full repayment or resolution; recognizing the debt can itself create sound intentionality about current investments of resources and staff time.

Each of the three types of technical debt illustrated by the Photonegative Collection (all of which appear in the framework we propose) is problematic when encountered alone, but frequently they occur together. Here, the technical debt metaphor becomes particularly salient. As with all debt, after identifying the source(s), we had two choices: pay back the debt by reinvesting work in fixing the issue or write it off as a loss. After assessing the work necessary to fix all three issues and articulating the opportunity cost of devoting resources to solving those problems as opposed to creating new content, we decided to purge old scans from the Photonegative Collection when we migrated to a new system. This decision was not made lightly. However, staff were already replacing these images slowly over time on request, so the decision to delete the problematic portion of the collection liberates the Beinecke from the hunt-and-search work for originals and related surrogates that cascaded from those requests. By examining the origin story of these problems, identifying the context of the original decisions, looking at impacts passively or intentionally asserted over time, and reviewing current practices that supplant the old issues, the Beinecke was able to make better resource investment decisions and has informally incorporated a framework like the one we propose when considering new digital projects.

The three case studies presented here mark an initial effort to explore the commonalities in the types of technical debt identified in the conceptual framework. Our model illustrates the place where debt occurs across a variety of projects: after a team or archivist makes a decision regarding any standard archival function. Even so, three case studies alone cannot possibly demonstrate all the types and implications of potential technical debt, nor could we offer a case study that illustrates the mitigation of technical debt from the beginning of a project, which might require a forward-thinking, designed experiment rather than the backward-looking lens applied here. Nevertheless, viewed through the conceptual framework, the three case studies can help us begin to understand some of our shared challenges and assumptions.

The UNC–Charlotte case study, for example, introduced us to the ubiquitous nature of documentation debt. In fact, all the case studies show how weak documentation can occur at the project, program, and/or organizational level, often exacerbating the other types of technical debt found in the framework. Documentation debt can also indicate the areas where incremental, passive decision-making operates as an interest-creating mechanism on our work. Additional study by the archival community to refine and understand the causes of pervasive documentation debt could prompt some useful reflection, including on the role of unseen labor and the scarcity of staffing bandwidth in digital collections work.

Likewise, looking at the Carnegie Mellon case through the technical debt framework reveals a deliberate effort to mitigate its own documentation debt, or, at the very least, to make it more transparent. More important, it demonstrates a learned level of planning that merits further exploration of more debt-conscious and programmatically sustainable digital workflows and best practices, ones that pay closer attention to both strategic and tactical decision-making as we embark on specific projects.

Finally, viewing the Beinecke case through the framework emphasizes how declaring bankruptcy on debt-laden projects can release organizations from a perceived obligation to sustain the unsustainable. Too often, the act of sunsetting a project is viewed as a failure, rather than a reasonable response to changing circumstances. This extension of the metaphor is critical to both the “decisions” and “consequences” modules of the framework, and the model could benefit from additional investigations into how and when a close examination of the origins of technical debt should lead to decisions about retirement or repayment.

All this notwithstanding, we maintain that—when carefully applied—the framework helps prevent oversimplification of the technical debt metaphor, such as an assumption that “shortcuts now equal more work later.” After all, shortcuts can represent sound decision-making (think: “more product, less process”). Moreover, inefficiencies or complexity in systems or processes are not necessarily bad; they are only problematic when staff and users are repeatedly slowed down by interacting or working around them. To that end, additional study that centers agency and the human role in technical debt management could potentially expand the framework in meaningful ways.

Because collections work is continuous and part of the overall life cycle of the archival enterprise, the consequences of technical debt diminish our ability to do core collection management. Debt-laden decisions of the past form the foundation for future collection management work and decisions, forcing the kind of work that pays interest on the debt rather than delivering new, desirable features or functionality. When our work of managing and developing collections and digital projects is blocked by metadata or system infrastructure from past projects, inhibiting innovation, we can be mindful of technical debt's impact and assess where to invest resources now.

The technical debt metaphor offers archivists a practical way to assess and approach digital project and collection management work and understand common pitfalls in project design and execution. It can help us understand why working with older assets, legacy project infrastructure, and incomplete documentation can be so time-consuming and challenging; it also helps us approach and prioritize the work to overcome it and discuss the problem with stakeholders and administrators. If we understand our problems as “cruft in the code,” we can pinpoint the places where the so-called base code needs to be rewritten so our organizations and collections can move forward to the next system, platform, or innovation. Technical debt for archives recognizes the perpetual life cycle of digital projects in the context of collection management and calls out weak documentation as a debt item itself, not a consequence or afterthought. We encourage archivists to employ the framework to examine how their institutional culture and decision-making processes have impacted debt accumulation. What perpetual interest payments have held your organization back from new and innovative work, and what is essential to help move you forward? Applying a technical debt framework for archives will result in better understanding of organizational debt loads, decision-making styles, and tolerances, which will lead to better outcomes for our diverse audiences and communities.

1. Ward Cunningham, “OOPSLA '92 Experience Report: The WyCash Portfolio Management System,” March 26, 1992, http://c2.com/doc/oopsla92.html, captured at https://perma.cc/7WF6-UP5H; and Ward Cunningham, “Debt Metaphor,” YouTube video, 4:43, February 14, 2009, https://www.youtube.com/watch?v=pqeJFYwnkjE.

2. Monetary debt is not a literal outcome, though system deficiencies may lead to financial implications.

3. Nanette Brown et al., “Managing Technical Debt in Software-Reliant Systems,” in FoSER '10: Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research (New York: ACM, 2010), 47, https://doi.org/10.1145/1882362.1882373.

4. Martin Fowler, “Technical Debt,” MartinFowler.com, May 21, 2019, https://martinfowler.com/bliki/TechnicalDebt.html, captured at https://perma.cc/8RTA-4HYJ.

5. Martin Fowler, “Technical Debt Quadrant,” MartinFowler.com, October 14, 2009, https://martinfowler.com/bliki/TechnicalDebtQuadrant.html, captured at https://perma.cc/V83U-727K.

6. Steve McConnell, “Managing Technical Debt” (white paper, Construx Software, June 2008), 12.

7. Brown et al., “Managing Technical Debt in Software-Reliant Systems,” 49.

8. Philippe Kruchten et al., “Technical Debt in Software Development: From Metaphor to Theory. Report on the Third Workshop on Managing Technical Debt,” ACM SIGSOFT Software Engineering Notes 37, no. 5 (2012): 36–38, https://doi.org/10.1145/2347696.2347698; Paris Avgeriou et al., “Technical Debt: Broadening Perspectives. Report on the Seventh International Workshop on the Management of Technical Debt (MTD 2015),” ACM SIGSOFT Software Engineering Notes 41, no. 2 (2017): 38–41, https://doi.org/10.1145/3041765.3041774; and Clemente Izurieta et al., “Technical Debt: A Research Roadmap. Report on the Eighth International Workshop on the Management of Technical Debt (MTD 2016),” ACM SIGSOFT Software Engineering Notes 42, no. 1 (2017): 28–31, https://doi.org/10.1145/3041765.3041774.

9. Philippe Kruchten et al., “Technical Debt: Towards a Crisper Definition. Report on the 4th International Workshop on the Management of Technical Debt,” ACM SIGSOFT Software Engineering Notes 38, no. 5 (2013): 51–54, https://doi.org/10.1145/2507288.2507326; Clauirton A. Siebra et al., “Theoretical Conceptualization of TD: A Practical Perspective,” The Journal of Systems and Software 120 (October 2016): 219–37, https://doi.org/10.1016/j.jss.2016.05.043; and Carolyn Seaman et al., “Using Technical Debt in Decision Making: Potential Decision Approaches,” in 2012 Third International Workshop on Managing Technical Debt (MTD) (Zurich: IEEE, 2012), 45–48, https://doi.org/10.1109/MTD.2012.6225999.

10. Francesca Arcelli Fontana et al., “Technical Debt in Agile Development: Report on the Ninth Workshop on Managing Technical Debt (MTD 2017),” ACM SIGSOFT Software Engineering Notes 42, no. 3 (2017): 18–21, https://doi.org/10.1145/3127360.3127372.

11. For example, in Zengyang Li et al., “A Systematic Mapping Study on Technical Debt and Its Management,” Journal of Systems and Software 101 (March 2015): 193–220, https://doi.org/10.1016/j.jss.2014.12.027, researchers identified up to ten types of technical debt, including requirements debt (the distance between optimal requirements specs and actual implementation); architectural debt (decisions that compromise quality for functionality); design debt (shortcuts taken by system architects); code debt (poorly written code, violating best practices or coding rules); test debt (insufficient or total lack of testing); build debt; documentation debt; infrastructural debt (a suboptimal configuration of development-related processes, technologies, and supporting tools); versioning debt (incorrect or sloppy version control); and defect debt.

12. Nicolli Rios et al., “A Tertiary Study on Technical Debt: Types, Management Strategies, Research Trends, and Base Information for Practitioners,” Information and Software Technology 102 (October 2018): 117–45, https://doi.org/10.1016/j.infsof.2018.05.010.

13. Kevin Clair, “Technical Debt as an Indicator of Library Metadata Quality,” D-Lib Magazine 22, nos. 11–12 (2016), https://doi.org/10.1045/november2016-clair.

14. Kevin Clair, “A Technical Debt Approach to Metadata Management” (presented at Code4Lib, Los Angeles, March 8, 2017).

15. Hillel Arnold, “Managing Technical Debt: Code4Lib 2018 Report,” Bits & Bytes: The Rockefeller Archive Center Blog (February 22, 2018), https://blog.rockarch.org/managing-technical-debt-code4lib-2018-report, captured at https://perma.cc/77RK-6H4K; and Whitni Watkins and Kenneth Rose, “Dealing with Technical Debt, a Point of View: DevOps and Managerial” (presented at Code4Lib, Washington, D.C., February 14, 2018), https://osf.io/7qf9c.

16. Andreas K. Orphanides, “Shear Forces: A Conceptual Model for Understanding (and Coping with) Risk, Change, and Technical Debt” (presented at Code4Lib, San Jose, February 21, 2019).

17. Explicit and implicit discussion of these three broad types of technical debt is ubiquitous in the literature. In his 2008 white paper (see fn 6), McConnell discusses iterations of strategic and tactical debt, focusing on the former. Incremental debt is, likewise, explicitly discussed in Rick Brenner's “Controlling Incremental Debt,” Technical Debt for Policymakers: Resources for Policymakers Concerned with Managing Technical Debt (December 25, 2018), https://techdebtpolicy.com/controlling-incremental-technical-debt, captured at https://perma.cc/A5SY-G8WZ.

18. We substitute “risky” for “reckless” in our adaptation of Fowler's original model to emphasize that relative risk is a more appropriate and helpful benchmark on the spectrum, without the negative value judgment that may be associated with “recklessness” in a professional setting.

19. Paris Avgeriou et al., “Managing Technical Debt in Software Engineering (Dagstuhl Seminar 16162),” Dagstuhl Reports 6, no. 4 (2016): 110–38, https://doi.org/10.4230/DagRep.6.4.110.

20. See Figure 1 in Avgeriou et al., “Managing Technical Debt,” 113.

21. We believe this framework is extensible beyond the archives and into other LAM environments, even as this article focuses on archival practice and related digital activity.

22. N.b.: a failure to act is not always—or necessarily often—the product of neglect or inattention. Sometimes archivists will fail to take action because they are not able to do so. This can be due to system inadequacies or broader organizational challenges.

23. Déirdre Joyce et al., “True Confessions: Paying Off the Technical Debt of Early Digital Projects” (conference presentation, Society of American Archivists Annual Meeting, August 16, 2018).

Author notes

Déirdre Joyce is head of Digital Stewardship and the Digital Library at Syracuse University Libraries, where she started as the metadata services librarian in 2017. Her previous digital collections and archival experience includes serving as project coordinator for New York Heritage Digital Collections and as founding project manager for the Empire Archival Discovery Cooperative for the Empire State Library Network in New York State. Prior to this, she worked as the university archivist at the University of Texas at Tyler. She received both her master's degrees in history and library and information studies from the University of Wisconsin–Madison.

Laurel McPhee is the supervisory archivist for UC San Diego Library's Special Collections & Archives Program, where she collaborates with a team of librarians, manuscript processors, project archivists, and students who work to preserve and provide access to collections of personal papers, organizational records, unique visual resources, and digital objects. McPhee earned her AB from Harvard College in Cambridge, Massachusetts, and her MLIS from the University of California, Los Angeles.

Rita Johnson is head of Digital Initiatives at the University of Miami Libraries. She previously worked as the digital production librarian at J. Murrey Atkins Library Special Collections and University Archives, University of North Carolina at Charlotte. She received her MSLS from the University of North Carolina at Chapel Hill and a BA in history from the University of North Carolina at Asheville. Prior to her time at UNC Charlotte, she worked at the David M. Rubenstein Rare Book & Manuscript Library at Duke University in the Digital Production Center.

Julia Corrin has served as the university archivist at Carnegie Mellon University since 2012, where she works with a small team of archivists to document the university. In addition to leading the University Archives, she currently heads a cross-disciplinary team to migrate the libraries' digital collections. Corrin received her MLIS from the University of Michigan. She previously worked as the political collections and access archivist at Arkansas State University.

Rebecca Hirsch is the head of Digital Library at the University of Edinburgh. Previously, she was the head of the Digital Services Unit at the Beinecke Rare Book & Manuscript Library at Yale University from 2016 to 2021. As such, she was responsible for the Beinecke's digitization program and its digital collections platform. She also has worked at Yale University Library in a number of other digital-focused positions, and at the University of Southern California's Special Collections and the National Archives and Records Administration. She received her MA in history with a concentration in archives from New York University and her MLIS from Long Island University.