Image Retrieval in DLs - A Large Scale Multicollection Experimentation

Title: Image Retrieval in DLs - A Large Scale Multicollection Experimentation
Publication type: Conference paper
Year of publication: 2017
Authors: Jean-Philippe Moreux, Guillaume Chiron
Conference: IFLA News Media Section, Dresden
Meeting date: 2017/08/15
Keywords: automatic image classification; CBIR; data mining; deep learning; digital libraries; heritage documents; image retrieval; metadata; OCR
Abstract

While digital heritage libraries were historically first powered in image mode, they quickly took advantage of OCR technology to index printed collections, thereby extending the scope and improving the performance of the information retrieval services offered to users. Access to iconographic resources, however, has not progressed in the same way, and these resources remain in the shadows: manual indexing is incomplete and heterogeneous, and data are siloed by iconographic genre. Today it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades, and thus to showcase these engravings, drawings, photographs, maps, etc. both for their own value and as an attractive entry point into the collections, supporting discovery and serendipity from document to document and from collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in print (dailies, magazines, monographs); transform, harmonize and enrich the descriptive metadata (in particular with automatic classification tools); and load it all into a web portal dedicated to iconographic research. The approach is doubly pragmatic, since it leverages both existing digital resources and (virtually) off-the-shelf technologies.
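To make the extract-transform-load chain more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that OCR output is stored as ALTO XML, where illustrations appear as `<Illustration>` blocks carrying `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` attributes. The `classify` callable stands in for an automatic classification tool. The `min_area` noise threshold and the plain dictionary used as the "portal" index are hypothetical choices made for illustration only.

```python
import json
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List, Tuple


@dataclass
class IllustrationRecord:
    doc_id: str
    page: int
    bbox: Tuple[int, int, int, int]  # (HPOS, VPOS, WIDTH, HEIGHT) in ALTO units
    title: str
    date: str
    genre: str = "unknown"


def local_name(tag: str) -> str:
    """Drop the XML namespace so the same code reads several ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract(alto_xml: str, doc_meta: Dict[str, str], page: int) -> List[IllustrationRecord]:
    """Extract: collect every <Illustration> block found on one ALTO page."""
    root = ET.fromstring(alto_xml)
    records = []
    for elem in root.iter():
        if local_name(elem.tag) == "Illustration":
            bbox = tuple(int(float(elem.get(k, "0")))
                         for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            records.append(IllustrationRecord(doc_meta["id"], page, bbox,
                                              doc_meta.get("title", ""),
                                              doc_meta.get("date", "")))
    return records


def transform(records: List[IllustrationRecord],
              classify: Callable[[IllustrationRecord], str],
              min_area: int = 10_000) -> List[IllustrationRecord]:
    """Transform: drop tiny blocks, harmonize metadata, enrich with a genre label."""
    kept = []
    for rec in records:
        _, _, width, height = rec.bbox
        if width * height < min_area:  # hypothetical threshold for ornaments/OCR noise
            continue
        rec.title = " ".join(rec.title.split())  # collapse stray whitespace
        year = re.search(r"\d{4}", rec.date)
        rec.date = year.group(0) if year else ""  # harmonize dates to a year
        rec.genre = classify(rec)  # stand-in for an automatic classifier
        kept.append(rec)
    return kept


def load(records: List[IllustrationRecord], index: Dict[str, List[dict]]) -> None:
    """Load: file each record under its genre in an in-memory stand-in for the portal."""
    for rec in records:
        index.setdefault(rec.genre, []).append(asdict(rec))


if __name__ == "__main__":
    page_xml = (
        '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout><Page><PrintSpace>'
        '<Illustration ID="IL1" HPOS="120" VPOS="300" WIDTH="800" HEIGHT="600"/>'
        '<Illustration ID="IL2" HPOS="10" VPOS="10" WIDTH="20" HEIGHT="20"/>'
        '</PrintSpace></Page></Layout></alto>'
    )
    meta = {"id": "doc-001", "title": "  Le Petit   Journal ", "date": "1898-05-12"}
    portal: Dict[str, List[dict]] = {}
    load(transform(extract(page_xml, meta, page=1), classify=lambda r: "engraving"), portal)
    print(json.dumps(portal, indent=2))
```

Keeping each stage as a separate function mirrors the three ETL steps. Each stage can be swapped on its own: a different OCR format in `extract`, a real image classifier passed as `classify`, or a search engine in place of the dictionary in `load`.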