Title
Audio Issues In MIR Evaluation
Abstract
Several projects are underway to create music testbeds that suit the needs of the music analysis and music information retrieval (MIR) communities, and there are plans to unify these testbeds into a distributed grid. The issue of audio file formats has therefore come to the forefront. The creators of a music library or MIR testbed are confronted with many questions about file formats, their quality, metadata, and copyright. We discuss the various formats and their advantages and disadvantages, and give a set of guidelines and recommendations. This document is a position paper: it is intended to foster discussion rather than to serve as a definitive statement. Nevertheless, it is hoped that the proposals put forth here may serve as guidelines for the construction of an MIR evaluation testbed.
1. OVERVIEW OF AUDIO FORMATS
To describe the available audio formats, we distinguish between raw audio formats, compressed formats, and multimedia interchange wrappers. This classification is a generalization and has many exceptions. For instance, the WAV format supports compression, and AES31 describes file formats, compression schemes and wrappers. Nevertheless, grouping formats by their complexity, usage and purpose, as the following sections do, provides a good guide to the options available for the audio formats of a testbed.
1.1. Raw Audio Formats
The most common audio formats for end users are based on simple, open standards that were designed and promoted by companies for particular platforms. These include Apple's AIFF format for the Mac, Sun's au format for UNIX, and the WAV format for Windows, developed by Microsoft and IBM. Despite these platform-specific origins, these formats can be played on almost any computer by many audio applications, and they contain no features specific to their original platforms. They are intended for storing uncompressed, PCM-encoded raw audio in a single binary file.
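As a minimal sketch of what such a file holds, the Python below uses the standard-library wave module to pack a short test tone into an in-memory WAV file. It then reads back the format parameters stored in the file's header. The tone, the 44.1 kHz/16-bit mono settings, and the helper names make_wav_bytes and read_wav_params are illustrative assumptions, not taken from the paper.

```python
import io
import math
import struct
import wave

def make_wav_bytes(samples, sample_rate=44100, channels=1, sample_width=2):
    """Pack signed 16-bit PCM samples into an in-memory WAV file (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample: 2 -> 16-bit PCM
        w.setframerate(sample_rate)
        # wave expects native byte order and writes WAV's little-endian order.
        w.writeframes(struct.pack("%dh" % len(samples), *samples))
    return buf.getvalue()

def read_wav_params(data):
    """Return (channels, bits per sample, sample rate, frames) from the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnchannels(), 8 * r.getsampwidth(), r.getframerate(), r.getnframes()

# Illustrative input: 10 ms of a 440 Hz tone, 44.1 kHz, mono, 16-bit.
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(441)]
wav = make_wav_bytes(tone)

print(wav[:4], wav[8:12])      # RIFF container tag and WAVE form type
print(read_wav_params(wav))    # parameters recovered from the header
# In this canonical 44-byte header the sample rate sits at byte offset 24,
# stored little-endian; files with extra chunks must be parsed chunk by chunk.
print(struct.unpack("<I", wav[24:28])[0])
print(len(wav))                # 44-byte header + 441 samples * 2 bytes
```

WAV stores its header fields and samples little-endian, whereas AIFF uses big-endian byte order. A reader handling both formats therefore has to know which one it is parsing.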
These formats support a variety of bit depths, sample rates and channel counts, and store these parameters in a header. Of these formats, WAV and AIFF are by far the most common, and almost all audio workstations support both.
1.2. Broadcast WAV
Broadcast WAV was introduced to allow file exchange between the digital audio workstations (DAWs) used in radio and television production(1), and it is now the standard for file storage and exchange in the audio production industry. Almost all master recordings, including those from small studios, live recordings and remasterings, are created as Broadcast WAV files. Even workstations using proprietary software can import and export the Broadcast WAV format, and all WAV file players should recognize and play it. A Broadcast WAV file is a WAV file with an additional header containing information about the originator, a time stamp, and metadata describing the sound sequence. The basic audio format is 16-bit linear PCM sampled at 48 kHz, but other sample rates and bit depths may also be used, and MPEG-encoded audio is supported. Broadcast WAV files are often stored as multiple mono files. A multi-track recording may thus consist of a large number of high-quality files, and an edit decision list is needed to describe how they are combined in the final mix.
1.3. Compressed Audio Formats
The choice of compressed audio formats is almost endless. Many standards bodies and companies have released their own compressed formats, and each has found a niche in which it has become popular and entrenched. For the purposes of a testbed, however, only the most relevant formats are considered here, where relevance is defined in terms of quality, popularity and ease of use.
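Part of the motivation for compression is the size of raw audio. The bit rate of uncompressed linear PCM is simply sample rate × bit depth × number of channels. The sketch below applies this formula to two cases:

- CD-quality stereo (44.1 kHz, 16-bit, two channels; a standard figure, not one from the paper).
- One mono track in the basic 48 kHz, 16-bit Broadcast WAV format described above.

The 128 kbit/s compressed rate used for comparison is a hypothetical example, not a recommendation.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed linear PCM in bits per second."""
    return sample_rate_hz * bit_depth * channels

def bytes_per_minute(bit_rate_bps):
    """Storage needed for one minute of audio at the given bit rate."""
    return bit_rate_bps * 60 // 8

cd_stereo = pcm_bit_rate(44100, 16, 2)   # CD-quality stereo
bwf_mono = pcm_bit_rate(48000, 16, 1)    # one mono track, basic Broadcast WAV format
compressed = 128_000                     # hypothetical compressed stream, bits per second

for name, rate in [("CD stereo", cd_stereo), ("BWF mono track", bwf_mono)]:
    print(f"{name}: {rate / 1000:.1f} kbit/s, "
          f"{bytes_per_minute(rate) / 1e6:.2f} MB per minute")
print(f"CD stereo vs. 128 kbit/s: {cd_stereo / compressed:.1f}:1")
```

A multi-track session stored as several such mono files multiplies the per-track figure by the number of tracks.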
Year
2004
Venue
ISMIR 2013
Keywords
ease of use, music analysis, open standard
Field
File format, Metadata, Byte, Music information retrieval, Computer science, Unix, Workstation, Raw audio format, Speech recognition, Endianness, Multimedia
DocType
Conference
Citations
1
PageRank
0.41
References
1
Authors
2
Name
Order
Citations
PageRank
Josh Reiss1294.58
Mark Sandler21029.81