GATE and Digital Libraries
As digital libraries grow in size and coverage, so does the need for automatic content annotation and indexing. GATE's robust and customisable Named Entity recognition and Information Extraction technology has already been used successfully for metadata creation, automatic name and event annotation, indexing, and access. We and our collaborators have developed, or are currently developing, a range of applications, each of which poses its own challenge:
- The PrestoSpace project aims to provide technical solutions and integrated systems for the complete digital preservation of all kinds of audio-visual collections. Our role is to develop language technology methods for (semi-)automatic creation of metadata from multimedia content, building on our previous work in MUMIS (see below).
- The GATE/ETCSL project is building generic tools for linguistic annotation and Web-based analysis of digital libraries, using literary Sumerian as a testbed.
- The Perseus Digital Library at Tufts is using GATE for enriching hypertextual models of cultural heritage corpora.
- The European Heritage On-Line (ECHO) project is developing a model for European culture on the web. The GATE team is represented on the technical board of ECHO and is working towards the transfer of advanced text processing tools to help produce a new model of richly interlinked shared cultural materials.
- OldBaileyIE required adapting the language processing components to the non-standard written conventions of the Early Modern English used in Old Bailey court reports from the 17th century.
- In MUMIS (Multimedia Indexing and Search) we dealt with annotating material in multiple modalities to build a conceptual index of football videos.
- EMILLE focuses on collection and annotation of large text collections in non-indigenous minority languages in the UK (including Urdu, Bengali, Sylheti and others).
We are currently working on using GATE as the basis for the creation of computational tools for the study of digital collections in cultural heritage languages, such as Ancient Greek and Latin.