Marie Laperdrix

Agnès Saal appointed head of the Ina | Archimag


Pauline Moirez

Crowdsourcing the World Service Radio Archive: an experiment from BBC R&D

To improve the description of its sound archives, the BBC digitised them, experimented with automatic indexing, and is now submitting that indexing to web users for validation, correction and additions.

  • experiment on how to put a large media archive online using a combination of algorithms and people
  • R&D is pioneering ways of generating metadata for programmes automatically using innovative algorithms that can "listen to" and tag programmes with topics and speakers
  • and then to improve the results using crowdsourcing
  • We want to learn about how good the algorithms are, whether and how people tag, and how to combine algorithms with people
  • we used robust algorithms that we had developed for this purpose to extract key topics from each programme, using Linked Data to ensure each topic is unambiguous and linked to the web.
  • we created around 1 million topics, about 20 per programme
  • we thought that these automatically generated tags, together with the original metadata, were good enough to design and build a browsable and searchable website for the archive
  • Listeners could use this online prototype and help improve it by validating the automatically generated data and adding their own - "crowdsourcing" the final part of the problem
  • users of the prototype have listened to around 12,000 of the 36,000 programmes
  • generated over 70,000 individual metadata "edits"
  • tagged or edited about 7,000 of these
  • analysing the data so far to see how good the tags are by comparing professional archivists, listeners and our algorithms
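
One simple way to picture the comparison in the last highlight is to score each tag source against the archivists' tags using precision and recall. The sketch below is only an illustration under that assumption: the highlights do not say which measure BBC R&D uses, and the function name `precision_recall` and all the example tags are hypothetical.

```python
def precision_recall(candidate, reference):
    """Return (precision, recall) of candidate tags against reference tags."""
    candidate, reference = set(candidate), set(reference)
    if not candidate or not reference:
        # Nothing to compare: report zero rather than divide by zero.
        return 0.0, 0.0
    hits = len(candidate & reference)  # tags present in both sets
    return hits / len(candidate), hits / len(reference)


# Hypothetical tags for a single programme (illustrative values only,
# not taken from the World Service archive).
archivist = {"Nelson Mandela", "South Africa", "Apartheid"}
algorithm = {"Nelson Mandela", "South Africa", "Cricket", "London"}
listeners = {"Nelson Mandela", "Apartheid"}

for name, tags in [("algorithm", algorithm), ("listeners", listeners)]:
    p, r = precision_recall(tags, archivist)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Treating the archivists' tags as the reference is itself a design choice. Swapping the arguments would instead measure how much of what listeners tag the archivists also captured.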