Meeting researchers’ needs in mining web archives: the experience of the National Library of France

Title: Meeting researchers’ needs in mining web archives: the experience of the National Library of France
Publication type: Conference paper
Year of publication: 2018
Authors: Peter Stirling, Sara Aubry
Conference name: LIBER 2018: 47th LIBER Annual Conference: Research Libraries as an Open Science Hub: from Strategy to Action
Meeting date: 2018/07/05
Conference location: Villeneuve d'Ascq, Université de Lille, LILLIAD Learning center Innovation

The digital legal deposit collections of the National Library of France (Bibliothèque nationale de France, BnF) cover a period of over twenty years and now represent almost a petabyte of data. Access to the web archives, opened in 2008, is provided in the research reading rooms of the BnF and in a number of regional libraries via the application Archives de l’internet, which allows researchers to search and view websites as they were at the moment of capture and to navigate within these temporal collections. In recent years, an increasing number of researchers have sought to use these collections for analyses that employ innovative methods often grouped under the term ‘digital humanities’, such as text and data mining (TDM) or link analysis.

This paper will describe how the BnF has sought to respond to the needs of these researchers, based on three recent case studies: the creation of a web cartography of sites and the analysis of a discussion forum related to WWI; a study of the early French web from the 1990s; and a study of the use of neologisms in French based on news sites. The paper will concentrate not on the results of the projects but rather on the issues raised in allowing researchers to use such methods on the BnF web archives. This subject will be studied from three different angles:

● Legal context and framework: intellectual property law and the specific context of legal deposit legislation set limits on the use of these collections, which are still protected under French copyright provisions. The use of text and data mining for research is also an area currently under discussion at the European level. The BnF uses research agreements to set the conditions of use of its collections for this kind of analysis while respecting the relevant legislation.
● Organisational questions: it is necessary to find means of supporting the research teams, in terms of physical reception and equipment, providing information on the available collections and facilitating exchanges on the needs of the study.
● Technical aspects: each project has specific needs in terms of data and metadata, which, together with the legal context, require specific kinds of IT infrastructure and software. The BnF is experimenting with different technical solutions, including working in cooperation with the researchers to install, integrate or develop new tools.

Finally, the paper will draw initial lessons from these three projects, which are carried out in the context of an internal four-year research programme called CORPUS, aimed at shaping a service to provide tools, corpora, and guidance to researchers who wish to apply TDM to analyse the Library’s various digital collections.

