A corpus of normal and abnormal sound events


Emanuele Principi, Stefano Squartini, Roberto Bonfigli, Giacomo Ferroni, Francesco Piazza, "An integrated system for voice command recognition and emergency detection based on audio signals" in Expert Systems with Applications, Pergamon, 2015, pages 5668-5683.

Roberto Bonfigli, Giacomo Ferroni, Emanuele Principi, Stefano Squartini, Francesco Piazza, "A real-time implementation of an acoustic novelty detector on the BeagleBoard-xM" in 6th European Embedded Design in Education and Research Conference (EDERC), 2014, pages 307-31.


Roberto Bonfigli, Giacomo Ferroni, Emanuele Principi, Stefano Squartini, and Francesco Piazza
The dataset presented here consists of more than 56 hours of recordings of everyday sounds in a university student laboratory. The recordings were made during both day and night, so typical sounds such as conversations, noise from laboratory tools and environmental sounds form the normal background. Several novel occurrences were generated artificially by a speaker that randomly emitted different types of novel sounds (i.e., scream, fall, alarm or breakage of objects) during the recordings.


The recordings were made by arranging eight microphones inside the laboratory, as shown in Figure 1.

Figure 1: Laboratory layout. Microphones are numbered and indicated by green circles while the speaker is depicted in red.

The green circles 1, 2, 7 and 8 (cf. Figures 2a-2d) indicate studio condenser microphones with gold-sputtered diaphragms, used with the cardioid pickup pattern. These microphones are manufactured by Behringer, model B-5. The remaining green circles (i.e., 3, 4, 5 and 6; cf. Figures 3a-3b) represent AKG hypercardioid condenser boundary microphones. These four microphones are placed 4 cm apart, forming an array as shown in Figure 3b.

(a) Mic 1.

(b) Mic 2.

(c) Mic 7.

(d) Mic 8.
Figure 2: Photos of the Behringer microphone placement.


(a) Mic array.

(b) Array details with ruler.
Figure 3: Photos of the AKG microphone array placement.

To generate the novelty sounds, the M-Audio Studiophile AV 20 speaker was used (cf. Figure 4), whilst the MOTU 8pre professional sound card was employed to record the microphone signals at 48 kHz (cf. Figure 5) using the NU-Tech software.

Figure 4: M-Audio speaker.

Figure 5: MOTU 8pre sound card.

Novelty types

The novelty sounds randomly generated by the speaker are grouped as follows; an overall description of the novelty sounds is provided in Table 1:

  • Alarms: three different siren sounds.
  • Falls: two audio files of a person or an object falling to the ground.
  • Breakage of objects: noise produced by an object breaking on impact with the ground.
  • Screams: four different human screams, from either a single person or a group.

Novelty group         Number of files   Mean length (s)
Alarms                3                 6.03
Falls                 2                 2.12
Breakage of objects   1                 2.17
Screams               4                 1.91
Table 1: Novelty sounds details.

Type of recording

The A3Novelty corpus is composed of two types of recording: background and background with novelty. The former contains only background sounds such as human speech, typical laboratory sounds, noise from technical tools and environmental sounds. The latter was recorded while artificially generated novelty sounds were added. Both types of recording were acquired during the day and night. Further information is available in Table 2. A day/night division of the whole dataset is shown in Table 3.

Table 2: Recordings details.
Recording type            Recording start time (hh:mm)   Recording end time (hh:mm)   Length (hh:mm)
Background                19:30                          9:30                         14:00
Background                13:30                          11:30                        21:30
Background with novelty   16:00                          9:30                         17:30
Background with novelty   10:00                          13:30                        3:30
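
As a quick arithmetic check, the lengths listed in Table 2 can be summed to confirm the "more than 56 hours" figure quoted in the introduction. The sketch below is illustrative only; the helper names `parse_hhmm` and `total_hours` are hypothetical and not part of the corpus tooling.

```python
from datetime import timedelta

# Values copied from the "Length (hh:mm)" column of Table 2.
TABLE2_LENGTHS = ["14:00", "21:30", "17:30", "3:30"]


def parse_hhmm(text):
    """Convert an 'hh:mm' string into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def total_hours(lengths):
    """Sum a list of 'hh:mm' strings and return the total in hours."""
    total = sum((parse_hhmm(item) for item in lengths), timedelta())
    return total.total_seconds() / 3600


print(total_hours(TABLE2_LENGTHS))  # 56.5
```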

Table 3: Recordings day/night details.
Recording type            Day/Night   Length (hh:mm)   Novelty events
Background                Day         12:00            0
Background                Night       24:00            0
Background with novelty   Day         9:00             16
Background with novelty   Night       12:00            30

Dataset description

Focusing on the second group of recordings, those containing the novelty sounds, the first recording contains a total of 47 novelty sounds while the second contains 24 artificial novelties. The novelty sounds were artificially generated during the acquisitions by means of the M-Audio speaker. The files constituting the corpus are 16-bit PCM mono WAVE files sampled at 48 kHz.
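
A downloaded file can be checked against the stated format (mono, 16-bit PCM, 48 kHz) with Python's standard `wave` module. This is a minimal sketch, not part of the corpus distribution; the function names `describe_wav` and `matches_corpus_format` are hypothetical.

```python
import wave

# Format stated in the corpus description: mono, 16-bit (2 bytes), 48 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 48000}


def describe_wav(path):
    """Read the header of a WAVE file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_corpus_format(path):
    """Return True if the file has the channel count, width and rate above."""
    info = describe_wav(path)
    return all(info[key] == value for key, value in EXPECTED.items())
```

For example, `matches_corpus_format("novel2_h10_m7.wav")` should return `True` for a file that follows the description, and `describe_wav` should report a duration of roughly 3600 seconds for a one-hour file.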

The A3Novelty corpus is released as multiple WAVE files, each 1 hour long. A naming convention is used to identify the files quickly. As shown below, a filename is composed of three parts separated by the underscore character.

  • The first part indicates the type of recordings.
    [back1 | back2 | novel1 | novel2]
  • The second part indicates the hour.
    [h01 | h02 | ...]
  • The third part indicates the microphone ID.
    [m1 | m2 | ...| m8]
audio filename: <rec_type>_<hour>_<mic_id>.wav
annotation filename: <rec_type>_<hour>_<mic_id>.novel
annotation filename: <rec_type>_<hour>_<mic_id>.csv
annotation filename: <rec_type>_<hour>_<mic_id>.svl (Sonic Visualiser XML format)
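
The naming convention above lends itself to a small parser. The following is a sketch under the assumption that the hour is written as `h` followed by digits and that microphone IDs run from 1 to 8; `CorpusFile` and `parse_filename` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple


class CorpusFile(NamedTuple):
    rec_type: str   # back1, back2, novel1 or novel2
    hour: int       # hour index taken from the "hNN" part
    mic_id: int     # microphone ID, 1 to 8
    extension: str  # wav, novel, csv or svl


# <rec_type>_<hour>_<mic_id>.<extension>
_PATTERN = re.compile(
    r"^(back1|back2|novel1|novel2)_h(\d+)_m([1-8])\.(wav|novel|csv|svl)$"
)


def parse_filename(name):
    """Split a corpus filename into its three parts plus the extension."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an A3Novelty filename: {name!r}")
    rec_type, hour, mic_id, extension = match.groups()
    return CorpusFile(rec_type, int(hour), int(mic_id), extension)


print(parse_filename("novel2_h10_m7.wav"))
# CorpusFile(rec_type='novel2', hour=10, mic_id=7, extension='wav')
```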

For each audio file, there is an annotation file with the same name containing the labels. Three different formats are provided: .novel, .csv and .svl. A typical .novel file contains one label per line.

1500.0001	scream_1
2402.2363	fall_2

The first field of each line indicates the start time of the novelty, whilst the second field provides a description of the corresponding novelty.
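
Reading such an annotation file takes only a few lines. The sketch below assumes the two fields are separated by whitespace (a tab in the sample above) and that the start time is a decimal number; the unit is not stated here, although values such as 1500.0001 are consistent with seconds within a one-hour file. The name `parse_novel_lines` is hypothetical.

```python
def parse_novel_lines(lines):
    """Parse .novel annotation lines into (start_time, label) tuples."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on whitespace: start time first, then the label.
        start_text, label = line.split(None, 1)
        events.append((float(start_text), label))
    return events


sample = ["1500.0001\tscream_1", "2402.2363\tfall_2"]
print(parse_novel_lines(sample))
# [(1500.0001, 'scream_1'), (2402.2363, 'fall_2')]
```

To read a file on disk, the open file object can be passed directly, for example `parse_novel_lines(open("novel2_h10_m7.novel"))`.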


audio filename: back1_h04_m3.wav
annotation filename: back1_h04_m3.novel
annotation filename: back1_h04_m3.csv
annotation filename: back1_h04_m3.svl


audio filename: novel2_h10_m7.wav
annotation filename: novel2_h10_m7.novel
annotation filename: novel2_h10_m7.csv
annotation filename: novel2_h10_m7.svl

Sample files