What Difference Has Digitization Made?

Tom Elliott, New York University


It’s customary in talks about the digital classics to start the era with David Packard’s concordance to Livy, published in four printed volumes in 1968. This work — although deliberately unoriginal in its conception and delivery mechanism — nonetheless demonstrated conclusively that computational methods were, even then, both ready and appropriate for application to philological research and reference work.


Packard’s Livy started after, but appeared before, another watershed achievement in philological computing: Roberto Busa’s monumental Index Thomisticus. First conceptualized in the 1940s, and supported for years by IBM, this comprehensive lemmatization and concordance of the complete works of Thomas Aquinas was prepared using punch cards and published in 56 print volumes in the 1970s. A web-based version was published in 2005, and work continues to build on its foundations, employing treebanking and other modern methods of natural language processing.


       Classics has struggled over the subsequent half century to embrace computing in a way that fully recognizes its practitioners as equal partners in scholarly endeavor. But despite this difficulty, there can be no denying that computing has changed the field irrevocably. Indeed, Marianne McDonald’s proposal for “the creation of a computerized databank of Greek Literature” closely followed Packard’s concordance in 1971, eventually bringing forth the earth-shaking Thesaurus Linguae Graecae (a project in which Packard and McDonald both played significant roles). The TLG is a uniquely pivotal work, resetting both expectations and methods field-wide, even though its full potential has been blocked by the aggressive enforcement of its restrictive license.


       But other flowers — increasingly varied in both beauty and function — have spread more freely across the intellectual landscape, products of both radical experimentation and patient gardening. Digital publications and research tools in epigraphy, numismatics, papyrology, philology, history, archaeology, and geography have proliferated (for a sampling, see the Digital Classicist Wiki and its list of Projects). And new ones — and new ways of using them — continue to emerge. The program for our present meeting illustrates this fact. Some naive word-searching turns up 35 unique abstracts that include terms like “digital,” “computer,” and “on line.” I’ll assert that the number of papers that depend on computation for data collection, record management and analysis is even higher, though the use of same has become so ubiquitous as not to merit mention in many an abstract.


       But if you want to come to close quarters with where classics has got to by way of digital methods in philology, and the places it might go from there, you must make plans to attend this afternoon’s session on “reconnecting the classics,” organized by the Digital Classics Association. The session features eight important voices in the digital classics, considering everything from the latest tools in computational philology to the challenges and opportunities presented by current crises in the undergraduate curriculum and classics praxis.


       Another session worthy of your attention has been organized by the Publications and Research Committee and scheduled for Sunday morning. It focuses on the Digital Latin Library, a joint initiative of the SCS, the Medieval Academy of America, and the Renaissance Society of America, funded by the Andrew W. Mellon Foundation. The five papers in this session will highlight the ways in which the DLL blends traditional Latin philology with new approaches from the field of digital humanities.


       Early digital philology and its progeny constitute but one of the paths that leads to digital geography in the classics. For a fuller picture, we have to reach back earlier, to the antecedents of computing itself. Ada Lovelace’s 1843 diagram for the computation of Bernoulli numbers is considered by many to be the first published algorithm intended to run on a computer. The computer in question was the so-called analytical engine envisioned by Charles Babbage, Lovelace, and their collaborators. Although the machine was never built, and so Lovelace’s algorithm was never implemented, it is one of the earliest antecedents of everything that happens today on your phone, tablet, laptop, or other device that includes a microprocessor.


       It then took a hundred years or so, but by the mid 20th century, an explosion of computing innovation was imminent. The explosive had been compounded of such volatile ingredients as the enlightenment science that Babbage, Lovelace, and their contemporaries inherited; and the wealth imbalances, state actors, and corporate powers that had emerged from colonialism, the Atlantic slave trade, the new imperialism, capitalism, and industrialization. To these were added other ingredients like the mechanized armies and industries that had been spawned by World War II and subsequently turned their attention and resources to a global Cold War of unprecedented technological complexity.


       Mathematicians like Dorothy Vaughan, Grace Hopper, and Margaret Hamilton were writing the first machine-language programs to calculate rocket and artillery trajectories; developing the first symbolic programming languages that would make software more versatile and maintainable; and coding the software that made it possible — 50 years ago this July — for Neil Armstrong and Buzz Aldrin to land on the moon and return safely to earth, thereby demonstrating (among other things) that the US had the technological and industrial capacity to deliver nuclear warheads anywhere on the earth with speed and precision. It was the space age, and the information age was swelling in its head, ready to leap forth into the fields of war and commerce.


       And into mapping for the classics, as it turns out.


       Let’s chart our return to the subject of this panel by way of our sibling discipline: geography. More specifically, let’s consider the gift that our metaphorical information-age Athena has bestowed upon that field, namely: Geographic Information Systems. The story starts in 1962 with Roger Tomlinson and the Canada Land Inventory. This project, overseen first by Canada’s Department of Forestry and Rural Development, and later by the Department of Energy, Mines and Resources, compiled maps and other information indicating the capability of land to sustain agriculture, forestry, recreation and wildlife. Given the way I’ve characterized the development of the information age, you’re probably not surprised that I’d identify the modern nation state as the primary actor in GIS and that its focus would be land management and inventory of exploitable resources. This theme dominates the early decades of GIS and remote sensing, which saw the introduction of a series of open-source and commercial software packages built to serve the interests of the nation state and of private commerce. Key domains of interest included not only land use and natural resource management, but also census, navigation, military planning, utilities and infrastructure, emergency response, and city and regional planning. The passage of the Archaeological and Historic Preservation Act of 1974 in the US added cultural resource management and archaeological impact assessment to the GIS menu, quickening its adoption as a standard tool in archaeological work.


       I’ve included on the slide you see the infamous “GIS Layer Cake” illustration. It will be familiar to those of you who have had occasion to learn GIS in a formal environment or to consult standard textbooks or help documentation. The illustration is here to help me point out one of the key legacies that GIS received from conventional cartography: the idea of layers. Richard Talbert has already mentioned the process whereby maps were composed photographically from layers of film. Most of these layers were thematic, dividing the map’s content into manageable groupings that required common visual treatments such as shaded relief, elevation tint, drainage line work, cultural symbology, labeling, and so on. Thematic layers made for intuitive data structures and associated interaction artifacts on-screen, and so they made their way into GIS. But GIS transcends its cartographic forebears in part by putting these layers to work for more than cartographic representation. The constitution of a map — its bounds, extent, projection, final scale, color scheme, labeling — can be deferred, revised, and iterated upon throughout a production or analysis project. Layers can be created by computation against the content of other layers and used solely as a data source or interstitial calculation result that never itself appears in the final map.
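To make the idea of computed, never-mapped layers concrete, here is a minimal sketch in Python. The two input "layers" are toy rasters with invented values; the derived layer is produced by map algebra over them, in the spirit of what a GIS does at much larger scale:

```python
# Two toy raster layers as 3x3 grids (all values invented for illustration):
# an elevation layer (meters) and a forest-cover layer (1 = forested).
elevation = [[120, 300, 450],
             [ 80, 210, 390],
             [ 60, 150, 280]]
forest    = [[0, 1, 1],
             [0, 1, 0],
             [1, 0, 0]]

# Derived layer: forested cells above 200 m. This is an interstitial
# result that might feed further analysis without ever being drawn.
high_forest = [[1 if e > 200 and f == 1 else 0
                for e, f in zip(erow, frow)]
               for erow, frow in zip(elevation, forest)]
```

The derived grid exists only as data: it can drive a later calculation, or be symbolized as a layer in a final map, or never appear at all.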


       But a final map is often not really the goal anymore. Most of the features provided by GI systems, and most of the time spent working with them, involve geospatial information management and spatial analysis, not cartographic production. That is, GIS is used to test hypotheses and answer research questions that are geographic in nature. Maps are incidental to the process: data visualizations well suited to human supervision of the analysis and illustration of its results. To be sure, the rhetorical power of an authoritative map remains strong when the goal is to convince others to adopt a course of action or accept an argument, but we should think of GIS as much more than a tool for map-making. Many other tasks that, in an earlier age, would have preceded map production – or that would have occurred afterwards in one or another use context – have been bundled up with the strictly cartographic functionality into a single software environment augmented with additional capabilities and complexities that would have been beyond the reach of pre-computational geographers.


What are we doing with all this? Geographic information systems have put unprecedented analytical capabilities in the hands of classicists. Geospatial information structures and their attendant computational methods let us analyze visibility of features in a landscape, interrogate cost surfaces to estimate travel times, construct networks of places mentioned in literary works, calculate Thiessen polygons to approximate the extents of linguistic or administrative regions, and more.
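The Thiessen-polygon idea, for instance, reduces to a simple rule: every location belongs to the cell of its nearest site. Here is a toy sketch in Python; the site names and coordinates are illustrative, and computing distance directly in degrees of longitude and latitude is a simplification (real analysis would first project the coordinates):

```python
import math

# Illustrative site coordinates as (longitude, latitude) pairs; a real
# study would draw these from a gazetteer or spatial dataset.
sites = {
    "Athens": (23.72, 37.98),
    "Sparta": (22.43, 37.07),
    "Argos":  (22.72, 37.63),
}

def nearest_site(lon, lat):
    """Assign a location to its Thiessen (Voronoi) cell: the nearest site."""
    return min(sites, key=lambda name: math.dist((lon, lat), sites[name]))

# Rasterized Thiessen assignment over a coarse grid of the region:
# each cell is labeled with the site whose territory it approximates.
cells = {
    (round(lon, 1), round(lat, 1)): nearest_site(lon, lat)
    for lon in [22.0 + 0.2 * i for i in range(12)]
    for lat in [36.8 + 0.2 * j for j in range(10)]
}
```

A GIS performs the same partitioning with exact polygon geometry rather than a grid, but the underlying question — "which center is closest?" — is identical.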


       We can also use GIS to create spatial datasets for use by others. Among the many achievements of the Ancient World Mapping Center in the 18 years since its founding is the creation of spatial datasets suitable for a variety of analytic and cartographic purposes. These datasets are being used by others not only because of their uniqueness and quality, but also because they are distributed free of charge under open license. We may get to hear a bit more about this shortly in Lindsay Holman’s paper.


       I would be remiss if I did not mention, in this of all sessions, that the AWMC owes this heritage of successful open-access publishing to the Society for Classical Studies. For, although we identified openness as a core value for the new Center, it was the SCS that granted permission for the reuse of the Classical Atlas Project’s compilation materials without charge for the benefit of the field.


       And so the SCS helped bring to life another of the AWMC’s data publications: the Pleiades gazetteer of ancient places. I think that Lindsay will probably speak about Pleiades as well, but I hope she’ll forgive me if I say a few words about its design and about the current state of geospatial data work in ancient studies, which depends heavily on gazetteers like Pleiades. For a gazetteer is a thing that any mapmaker wants, whether working in analog or digital media.


       What is a gazetteer? Basically, it’s a geographic reference work: a big bucket full of reasonably consistent information about places. In traditional print, a gazetteer takes the form of a list of placenames, together with their known variants, associated spatial coordinates, and other distinctive attributes such as feature type (for example, city, temple, or forest). Gazetteers have sometimes contained additional information, or even longer descriptive or historical narrative, blurring the lines with things we sometimes call almanacs or geographic dictionaries. At the simple end of the spectrum, they can serve as indexes into a map or group of maps. Such a bare-bones gazetteer appears at the back of the Barrington Atlas.
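The attributes just listed map naturally onto a simple record structure. Here is a minimal sketch of one entry in Python; the field names are illustrative and do not reproduce the schema of Pleiades or any other actual gazetteer:

```python
from dataclasses import dataclass, field
from typing import Optional, List, Tuple

# A minimal gazetteer entry carrying the attributes named above:
# a placename, its known variants, coordinates, and a feature type.
@dataclass
class GazetteerEntry:
    name: str
    variants: List[str] = field(default_factory=list)
    # (longitude, latitude); None models an unlocated place.
    coordinates: Optional[Tuple[float, float]] = None
    feature_type: str = "unknown"  # e.g. "city", "temple", "forest"

entry = GazetteerEntry(
    name="Korinthos",
    variants=["Corinthus", "Corinth"],
    coordinates=(22.88, 37.91),
    feature_type="city",
)
```

Note that allowing `coordinates` to be empty is a deliberate choice: as we’ll see below, a gazetteer, unlike a GIS layer, can accommodate places with no known location at all.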


       The notion of a digital gazetteer solidified in the information science community of the 1980s and 90s. As the idea of digital libraries began to take shape, gazetteers were re-conceived as controlled vocabularies of placenames that could be used to facilitate activities such as automatic document linking, geoparsing of texts for named place references, and regularization during data entry. The geospatial data included in digital gazetteers could be used to enable spatially aware search functions and to map arbitrary collections of content produced, for example, by a search engine. Influential pioneers in this area included Linda Hill and Ruth Mostern for the Alexandria Digital Library project at UC Santa Barbara, and David Smith and Greg Crane for the Perseus Digital Library project at Tufts University. Crucially, the complex spatial analysis and cartographic visualization capabilities of GI systems were not seen as components of a digital gazetteer. One might use GIS in the development and maintenance of a gazetteer. One might also load the contents of a gazetteer into one or more layers in a GIS to combine with other data for map-making or analysis. But the gazetteer itself was thought of as a database or dataset, together with a few computational services meant to support its use in another digital system.
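The "controlled vocabulary" use is easy to illustrate. In this toy sketch (the variant spellings and identifiers are invented), a lookup table maps attested name forms to a stable place ID, which supports both geoparsing of running text and regularization at data entry:

```python
import re

# Toy controlled vocabulary: attested name variants mapped to a stable
# identifier (a made-up ID here, standing in for a gazetteer URI).
vocabulary = {
    "athens": "place/001",
    "athenae": "place/001",
    "sparta": "place/002",
    "lacedaemon": "place/002",
}

def geoparse(text):
    """Return the gazetteer IDs of known placenames found in the text,
    in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        pid = vocabulary.get(token.lower())
        if pid and pid not in found:
            found.append(pid)
    return found
```

Real geoparsers must also disambiguate (which of several Alexandrias?), usually by weighing context and the gazetteer’s spatial data, but the vocabulary lookup is the foundation.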


       In deciding to take the Atlas Project’s information digital and on-line, we eventually concluded that this idea of a digital gazetteer better fit our data and our mental model than a GIS.


       One particular constraint of most GI systems was inimical to our data: the expectation that every feature of interest should have a well-defined spatial geometry (a point, line, or polygon). The limits of horizontal accuracy and the numerical precision of spatial coordinates can be codified in GIS (usually at the layer level), and even topological relationships with other features can be modeled, but features whose locations are wildly uncertain must nevertheless be represented with bounding boxes or representative points if they are to make an appearance in the GIS at all. The compilers and editors of the Classical Atlas Project reflected both the ancient sources and modern scholarship in giving us tons of such places. In the Barrington Atlas you’ll find them grouped in special sections labeled “Unlocated” in the Map-by-Map Directory volumes.


       Further considerations multiplied quickly. Many types and degrees of uncertainty obtained in our data. To take one example: how confident are we that a toponym appearing in an ancient witness is to be associated with a particular archaeological site or presumed locality? Richard and I enumerated such concerns in a paper that first appeared in 2002. It has now been reprinted in Richard’s “Challenges” book.


       We also envisioned a body of spatial information that might be under periodic revision at the individual place level. New archaeology, new texts, new analyses might all necessitate updates or corrections. Volunteer fieldwork with hand-held Global Positioning System receivers might contribute coordinates with more precision than the limited scale of the atlas maps had provided. New work with better satellite imagery might someday do the same. In this regard, our information-age Athena played more tricks on us. The chief agent of disruption this time was Google, whose relentless pursuit of on-line ad revenue led them to invest in the spatial aspects of digitally mediated commerce, giving away masses of map data and aerial and satellite imagery together with information about the locations of commercial entities and the way-finding services necessary to reach them. We’ve all paid for this apparent largesse, of course, by submitting our related behaviors to Google’s scrutiny, analysis, and monetization schemes. Another, less creepy, source of surprise was the OpenStreetMap community, which has dedicated itself to producing a user-created map of the world whose content is free and open by design. It has made a lot of progress, and Pleiades relies increasingly on their work.


       The gazetteer model has proved flexible enough to accommodate all these eventualities and more. Our commitment to openness complemented this flexibility by freeing us from the responsibility to reinvent and maintain GIS functionality on line for our users. If you want to do spatial analysis that involves Pleiades data, download the data and pull it into your GIS of choice. That’s what it’s there for.


       Identification as an open digital gazetteer also positioned Pleiades to play a leading role in reconfiguring the notion of digital gazetteers themselves. Pleiades serves not just one particular, curated digital library, but instead multiple arbitrary tools and publications distributed across the world-wide web and maintained by a variety of third parties. Providing the catalyst this time was Tim Berners-Lee, the inventor of the web, whose thoughts on stable identifiers and Linked Data influenced us significantly. I’ve spoken recently about Linked Open Data, scholarly digital publication, and the value of the computationally actionable citation that results. The text of that talk is on my website, so I won’t go into much detail here. Suffice it to say that scores of digital scholarly publications around the world are aligning the geographic aspects of their content to a growing number of general and specialized digital gazetteers that are also aligning themselves to each other. As a result, the probability is growing that the data a classical scholar assembles to begin or augment a particular study will already be collated along geographic lines. You won’t have to sort out whether the inscriptions cataloged in one dataset you downloaded are from the same Alexandria as those in another dataset. Both datasets will cite a relevant gazetteer entry, leaving the ambiguity of textual reference behind. This method is already expanding beyond geography to other facets of relevance like prosopographical identity, named time periods, calendrical dates, and cross-language technical terminology.
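The mechanics of that Alexandria example are almost trivially simple, which is the point. In this sketch the record structures and the numeric ID are invented for illustration; only the URI pattern follows Pleiades practice:

```python
# Two hypothetical inscription records from two different datasets.
# Each cites a gazetteer URI for its findspot; the numeric ID below is
# illustrative, not a real Pleiades identifier.
record_a = {"id": "inscr-001",
            "findspot": "https://pleiades.stoa.org/places/123456"}
record_b = {"id": "inscr-042",
            "findspot": "https://pleiades.stoa.org/places/123456"}

# Collation along geographic lines is now a simple key comparison:
# no need to decide whether two strings reading "Alexandria" agree.
same_place = record_a["findspot"] == record_b["findspot"]
```

Because both datasets cite the same stable identifier, the comparison is computationally actionable; string-matching placenames, by contrast, would inherit all the ambiguity of textual reference.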


       If I’ve timed my talk right, I should be coming to the end of my 20 minutes. I’ll conclude with one more advertisement for another session. If you’d like to learn more about the difference that digital practice is making in classics scholarship, then on Sunday morning you should attend the session on “geospatial classics” organized by two other Mapping Center veterans: Gabriel Moss and Ryan Horne. Papers in this session will explore specific applications of geospatial techniques to the research and pedagogical challenges of the Classics. Every person presenting in this session is an innovative and thoughtful scholar whose paper should not be missed.

