[Asis-l] CFP: SIGIR 2003: Workshop on the Evaluation of Music Information Retrieval (MIR) Systems
J. Stephen Downie
jdownie at uiuc.edu
Tue Apr 15 17:26:30 EDT 2003
This CFP is intended to encourage all who have an interest in Music Information
Retrieval and Music Digital Library research to consider submitting to, and/or
participating in, the upcoming SIGIR 2003 Workshop on the Evaluation of Music
Information Retrieval (MIR) Systems, August 1, 2003, Toronto, Canada.
Please see the workshop information page for detailed submission and
expression-of-interest information.
This workshop is designed to enhance the important work being done by the Music
Information Retrieval (MIR) research community by providing an
opportunity for the community to move forward on the establishment of
sorely-needed evaluation tools. This proposal builds upon the ongoing efforts
being made to establish TREC-like and other comprehensive evaluation paradigms
within the MIR research community.
The principal workshop themes are based upon expert opinion garnered from
members of the Information Retrieval (IR), Music Digital Library (MDL) and MIR
communities with regard to the construction and implementation of scientifically
valid evaluation frameworks. As part of the "MIR/MDL Evaluation Frameworks
Project" (http://music-ir.org/evaluation), two recently held meetings form the
foundation upon which this workshop is grounded:
"The Workshop on the Creation of Standardized Test Collections, Tasks, and
Metrics for Music Information Retrieval (MIR) and Music Digital Library (MDL)
Evaluation" was held at the Second Joint Conference on Digital Libraries (JCDL
2002) in July of 2002 (http://www.ohsu.edu/jcdl). "The Panel on Music Information
Retrieval Evaluation Frameworks" was held in Paris, France, on 17 October 2002, as
part of the 3rd International Conference on Music Information Retrieval (ISMIR 2002)
(http://ismir2002.ircam.fr). The findings made at each of these prior meetings have
been collected in successive editions of "The MIR/MDL Evaluation White Paper
Collection." See http://music-ir.org/evaluation for the most recent edition.
Information about SIGIR 2003: http://www.sigir2003.org/
Special thanks to the Andrew W. Mellon Foundation for its support of the
"MIR/MDL Evaluation Frameworks Project".
If you have any comments, suggestions or questions please contact me, J. Stephen
Downie, at jdownie at uiuc.edu.
Two classes of participants are envisioned: 1) presenters; and 2) audience members.
Presenters will submit written briefing documents (i.e., White Papers) prior to
the workshop. I plan on including these briefing documents in the growing
collection at http://music-ir.org/evaluation. Based upon prior experience, there
will be 8 to 12 formal presenters. Audience members, while not acting as formal
presenters, will be encouraged to respond to the presentations, as active debate
on the recommendations being put forward is a key goal of the workshop.
Together, the presenters and the audience members will be asked at the
conclusion of the workshop to highlight central recommendations for the
advancement of MIR evaluation with regard to TREC-like and other evaluation
paradigms for MIR research. These recommendations will also be presented to the
MIR and IR communities via http://music-ir.org/evaluation .
Major Workshop Themes:
Major general themes to be addressed in the workshop include:
--How do we adequately comprehend the complex nature of music information so
that we can properly construct our evaluation recommendations?
--How do we adequately capture the complex nature of music queries so proposed
experiments and protocols are well-grounded in reality?
--How do we deal with the “relevance” problem in the MIR context (i.e., What
does “relevance” really mean in the MIR context?)?
--How do we continue the expansion of a comprehensive collection of music
materials to be used in evaluation experiments?
--How do we manage the interplay between TREC-like and other potential
evaluation paradigms?
--How do we integrate the evaluation of MIR systems with the larger framework of
IR evaluation (i.e., What aspects are held in common and what are unique to MIR?)?
--Further prompting questions/themes can be found on the workshop information page.
To address these major themes, participants will be prompted to provide
recommendations and commentary on specific sub-components of the themes. For
example, a non-exclusive list of possible presentation topics includes:
--How best to ground evaluation methods in real-world requirements.
--How to facilitate the creation of data-rich query records that are both
grounded in real-world requirements and neutral with respect to retrieval
technique(s) being examined.
--How the possible adoption, and subsequent validation, of a “reasonable person”
approach to “relevance” assessment might address the MIR “relevance” problem.
--How to develop new models and theories of “relevance” in the MIR context.
--How to evaluate the utility, within the MIR context, of already-established
evaluation metrics (e.g., precision and recall, etc.).
--How to support the ongoing acquisition of music information (audio, symbolic
and metadata) to enhance the development of a secure, yet accessible, research
environment that allows researchers to remotely participate in the use of the
large-scale testbed collection.
Open Workshop Questions and Topics:
The following, non-exclusive (nor all-encompassing) list of open
questions should help you understand just a few of the many possible
paper and discussion topics to be tackled at the Workshop:
--As a music librarian, are there issues that evaluation standards must
address for their results to be credible? Do you know of possible
collections that might form the basis of a test collection? What prior
research should we be considering?
--As a musicologist, what aspects of music need examination that are possibly
being overlooked?
--As a digital library (DL) developer, what standards for evaluation
should we borrow from the traditional DL community? Any perils or pitfalls that
we should consider?
--As an audio engineer, what do you need to test your approaches? What
methods have worked in other contexts that might or might not work in
the MIR/MDL contexts?
--As an information retrieval specialist, what lessons have you learned
about other traditional IR evaluation frameworks? Any suggestions about
what to avoid or consider as we build our MIR/MDL evaluation frameworks?
--As an intellectual property expert, what rights and responsibilities
will we have as we strive to build and distribute our test collections?
--As an interface/human computer interaction (HCI) expert, what tests
should we consider to validate our many different types of interfaces?
--As a business person, what format of results will help you make
selection decisions? Are there business research models and methods that
should be considered?
--As a computer scientist, what are the strengths and weaknesses of the
CS approach to validation in the MIR/MDL context?
These are just a few of the possible questions/topics that will be
addressed. The underlying questions are:
1. How do we determine, and then appropriately classify, the tasks
that should make up the legitimate purviews of the MIR/MDL domains?
2. What do we mean by "success"? What do we mean by "failure"?
3. How will we decide that one MIR/MDL approach works better than another?
4. How do we best decide which MIR/MDL approach is best suited for a given task?
Please forward this to anyone you think might be interested.
Cheers, and thanks.
J. Stephen Downie
"Research funding makes the world a better place"
J. Stephen Downie, PhD
Graduate School of Library and Information Science; and,
Fellow, National Center for Supercomputing Applications (2000-01)
University of Illinois at Urbana-Champaign