Context

Large generative language models (LLMs) are becoming increasingly popular and are being integrated into ever more complex processing pipelines, serving a wide variety of use cases.
Evaluating these LLMs, however, raises serious problems: existing benchmarks are largely Anglo-centric (in both language and culture), are sometimes derived from other (Anglo-centric) LLMs, and do not necessarily cover the full range of uses. The question of evaluation is therefore especially pressing for French and, more generally, for languages other than English.

Presentation of the workshop

This workshop aims to bring together researchers from industry and academia interested in the many facets of LLM evaluation in languages other than English. A call for papers will invite submissions on any work within this scope.


This includes work on:

  • the evaluation of foundation models, fine-tuned models, or complete systems (e.g., RAG)
  • the creation or adaptation of benchmarks in French or other languages of interest, in general or specialized domains
  • the evaluation of NLP tasks (translation, summarization, information extraction, etc.) and the adaptation of existing evaluation methodologies to generative systems
  • ethical dimensions, bias, privacy, cultural or legislative alignment
  • evaluation with users, ergonomics, cognitive aspects
  • the evaluation of multimodal models (e.g., text-image, text-speech...)
  • ...

Challenge

As part of the workshop, we propose a task-based LLM evaluation challenge. It involves information extraction in French in a few-shot setting, where only the annotation guide describing the entity classes and their annotation conventions and a few annotated documents are provided. Participants are invited to evaluate LLM-based implementations, but participation with other approaches is welcome and will help put the results of LLM-based systems into perspective.
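For illustration only, the Python sketch below shows one way such a few-shot setup could be assembled: an excerpt of the annotation guide and a handful of annotated examples are concatenated into a prompt for an arbitrary LLM. The guide excerpt, the examples, the inline-tag output format, and the complete() stub are all hypothetical placeholders; the actual conventions are those of the annotation guide distributed by the organizers.

    # Illustrative few-shot prompt assembly; all content strings are
    # hypothetical placeholders, not the official guide or format.

    GUIDE_EXCERPT = (
        "Annotate person names, locations, organizations, functions and "
        "equipment. Entities may be nested."
    )

    # A few annotated documents, here with hypothetical inline tags.
    FEW_SHOT_EXAMPLES = [
        ("Le ministre s'est rendu à Rennes.",
         "Le <function>ministre</function> s'est rendu à "
         "<location>Rennes</location>."),
    ]

    def build_prompt(document: str) -> str:
        """Assemble guide + examples + target document into one prompt."""
        parts = ["Annotation guide:", GUIDE_EXCERPT, "", "Examples:"]
        for text, annotated in FEW_SHOT_EXAMPLES:
            parts += [f"Input: {text}", f"Output: {annotated}", ""]
        parts += ["Now annotate the following document.",
                  f"Input: {document}", "Output:"]
        return "\n".join(parts)

    def complete(prompt: str) -> str:
        """Stub: plug in any LLM client (API call or local model) here."""
        raise NotImplementedError

    if __name__ == "__main__":
        print(build_prompt("L'armée a déployé un drone près de Brest."))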
The data for this challenge, developed within the DGA, will be provided to participants upon signature of a data use agreement. They consist of newsletters and blog posts in French, annotated with entities of interest to the intelligence field (name, location, organization, but also function, equipment...) that can be nested and can trigger events. Expected output formats and evaluation metrics will also be communicated. At the end of the training phase, test data will be made available to participants for 3 days (to be chosen within a one-week window), and their results will be communicated to them. Participants will be invited to present their work in a short paper and on the day of the workshop.
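As a rough illustration of what span-level scoring could look like (the official output formats and metrics are those that will be communicated by the organizers), the following Python sketch computes exact-match precision, recall, and F1 over typed (start, end, type) spans; nesting is handled naturally since each entity is scored as a separate span.

    # Illustrative exact-match scoring over typed spans; the official
    # metrics and formats are those communicated by the organizers.
    from typing import Set, Tuple

    Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)

    def prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
        """Exact-match precision, recall and F1 over typed spans."""
        tp = len(gold & pred)
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Toy example: "ministre des Armées" (function) contains the nested
    # organization "Armées"; offsets are character positions.
    gold = {(0, 19, "function"), (13, 19, "organization")}
    pred = {(0, 19, "function")}
    print(prf(gold, pred))  # -> (1.0, 0.5, 0.666...)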

Contact and scientific committee

Organizers:

  • Nihel Kooli, DGA-MI, Rennes
  • Vincent Claveau, DGA-MI, Rennes
  • Camille Dutrey, DGA-MI, Rennes

Email: dga-mi-event-tal.contact.fct@intradef.gouv.fr

Scientific committee

  • Rachel Bawden, Inria
  • Lucie Chasseur, Inria Défense et Sécurité
  • Caio Corro, ISIR
  • Olivier Ferret, CEA-List
  • Vincent Guigue, AgroParisTech
  • Damien Nouvel, INALCO
  • Didier Schwab, LIG
  • Gilles Sérasset, LIG
  • Karen Fort, LORIA
  • Fabian Suchanek, Télécom Paris
  • François Yvon, ISIR