A corpus of normal and abnormal sound events
Emanuele Principi, Stefano Squartini, Roberto Bonfigli, Giacomo Ferroni, Francesco Piazza, "An integrated system for voice command recognition and emergency detection based on audio signals" in Expert Systems with Applications, Pergamon, 2015, pages 5668-5683.
Roberto Bonfigli, Giacomo Ferroni, Emanuele Principi, Stefano Squartini, Francesco Piazza, "A real-time implementation of an acoustic novelty detector on the BeagleBoard-xM" in 6th European Embedded Design in Education and Research Conference (EDERC), 2014, pages 307-31.
Authors
Roberto Bonfigli, Giacomo Ferroni, Emanuele Principi, Stefano Squartini and Francesco Piazza
The dataset presented here consists of more than 56 hours of recordings of everyday sounds in a university student laboratory, acquired both during the day and at night. Typical sounds such as conversations, noise from laboratory tools and environmental sounds form the normal background. Several novel events are generated artificially by a speaker that randomly emits different types of novel sounds (i.e., screams, falls, alarms or breakage of objects) during the recordings.
Details
The recordings were made with eight microphones placed inside the laboratory as shown in Figure 1.
Figure 1: Laboratory layout. Microphones are numbered and indicated by green circles while the speaker is depicted in red.
Microphones 1, 2, 7 and 8 (cf. Figures 2a-2d) are Behringer B-5 studio condenser microphones with gold-sputtered diaphragms, used with the cardioid pickup pattern. The remaining microphones (3, 4, 5 and 6; cf. Figures 3a-3b) are AKG hypercardioid condenser boundary microphones. These four microphones are placed 4 cm apart, forming an array as shown in Figure 3b.
Figure 2: (a) Mic 1. (b) Mic 2. (c) Mic 7. (d) Mic 8.
Figure 3: (a) Mic array. (b) Array details with ruler.
To generate novelty sounds the M-Audio Studiophile AV 20 speaker was used (cf. Figure 4) whilst the MOTU 8pre professional sound card was employed to record the microphone signals at kHz (cf. Figure 5) using NU-Tech software.
Figure 5: MOTU 8pre sound card.
The novelty sounds randomly emitted by the speaker are grouped as follows, and an overview of them is provided in Table 1:
- Alarms: three different siren sounds.
- Falls: two audio files of a person or an object falling to the ground.
- Breakage of objects: the noise produced by an object breaking on impact with the ground.
- Screams: four different human screams, from either a single person or a group.
Table 1: Overview of the novelty sounds.
| Novelty group | Number of files | Mean length (s) |
|---|---|---|
| Breakage of objects | 1 | 2.17 |
Type of recording
The A3Novelty corpus is composed of two types of recording: background and background with novelty. The former contains only background sounds such as human speech, typical laboratory sounds, noise from technical tools and environmental sounds. The latter contains the same kind of background with artificially generated novelty sounds added. Both types of recording were acquired during the day and at night. Further information is available in Table 2, and a day/night division of the whole dataset is shown in Table 3.
Table 2: Recording schedule.
| Recording type | Recording start time (hh:mm) | Recording end time (hh:mm) | Length (hh:mm) |
|---|---|---|---|
| Background with novelty | 16:00 | 9:30 | 17:30 |
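The "Background with novelty" row of Table 2 starts at 16:00 and ends at 9:30, yet its length is 17:30: the recording runs past midnight. The following is a minimal sketch of that arithmetic, assuming each recording lasts less than 24 hours; the function names are hypothetical helpers, not part of the corpus tools.

```python
from datetime import datetime, timedelta


def recording_length(start: str, end: str) -> timedelta:
    """Length of a recording given its hh:mm start and end clock times.

    Assumption: a recording lasts less than 24 hours, so an end time earlier
    than the start time is taken to fall on the following day.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)  # the recording ran past midnight
    return t1 - t0


def format_hhmm(delta: timedelta) -> str:
    """Format a duration as h:mm, the notation used in the tables."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


print(format_hhmm(recording_length("16:00", "9:30")))  # 17:30
```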
Table 3: Day/night division of the dataset.
| Recording type | Day/Night | Length (hh:mm) | Novelty events |
|---|---|---|---|
| Background with novelty | Day | 9:00 | 16 |
Focusing on the recordings that contain novelty sounds, the first recording contains a total of 47 novelty sounds, while the second contains 24. The novelty sounds were generated artificially during the acquisitions by means of the M-Audio speaker. All the files in the corpus are 16-bit PCM mono WAVE files sampled at 48 kHz.
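Before processing, it can be useful to confirm that a downloaded file matches the stated format (16-bit PCM, mono, 48 kHz). Below is a minimal sketch using Python's standard-library `wave` module; `check_wav_format` is a hypothetical helper name. It also works out the size of the sample data in one hour of audio in this format.

```python
import wave

# Format stated for the corpus: 16-bit PCM, mono, 48 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample (16 bit)
EXPECTED_RATE = 48_000  # Hz

# Sample data in one hour of audio in this format:
# 48 000 samples/s * 3600 s * 2 bytes = 345 600 000 bytes (about 330 MiB).
ONE_HOUR_DATA_BYTES = EXPECTED_RATE * 3600 * EXPECTED_SAMPWIDTH * EXPECTED_CHANNELS


def check_wav_format(path: str) -> float:
    """Raise ValueError if the file deviates from the stated format.

    Returns the duration of the file in seconds.
    """
    # The wave module only decodes integer PCM data; other encodings
    # typically make wave.open() raise wave.Error.
    with wave.open(path, "rb") as wav:
        found = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        expected = (EXPECTED_CHANNELS, EXPECTED_SAMPWIDTH, EXPECTED_RATE)
        if found != expected:
            raise ValueError(
                f"{path}: (channels, sample width, rate) = {found}, expected {expected}"
            )
        return wav.getnframes() / wav.getframerate()
```

For a complete one-hour file, `check_wav_format` should return 3600.0.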
The A3Novelty corpus is released as multiple WAVE files, each one hour long. A naming convention makes it easy to identify the files: as shown below, each filename is composed of three parts separated by the underscore character.
- The first part indicates the recording type.
[back1 | back2 | novel1 | novel2]
- The second part indicates the hour.
[h01 | h02 | ...]
- The third part indicates the microphone ID.
[m1 | m2 | ...| m8]
audio filename: <rec_type>_<hour>_<mic_id>.wav
annotation filename: <rec_type>_<hour>_<mic_id>.novel
annotation filename: <rec_type>_<hour>_<mic_id>.csv
annotation filename: <rec_type>_<hour>_<mic_id>.svl (Sonic Visualiser XML format)
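The naming convention can be decoded with a regular expression. This is a minimal sketch, assuming the hour part is "h" followed by digits and the microphone part is m1 to m8. `parse_filename`, `FileInfo` and `annotation_paths` are hypothetical names. Because the annotations share the audio file's base name, their paths follow from a suffix swap.

```python
import re
from pathlib import Path
from typing import NamedTuple

# <rec_type>_<hour>_<mic_id>.<ext>, following the naming convention above.
FILENAME_RE = re.compile(
    r"^(?P<rec_type>back[12]|novel[12])"
    r"_h(?P<hour>\d+)"
    r"_m(?P<mic>[1-8])"
    r"\.(?P<ext>wav|novel|csv|svl)$"
)


class FileInfo(NamedTuple):
    rec_type: str  # back1, back2, novel1 or novel2
    hour: int      # hour index, e.g. 4 for "h04"
    mic: int       # microphone ID, 1..8
    ext: str       # wav, novel, csv or svl


def parse_filename(name: str) -> FileInfo:
    """Split a corpus filename into its three parts plus the extension."""
    m = FILENAME_RE.match(Path(name).name)
    if m is None:
        raise ValueError(f"not an A3Novelty filename: {name!r}")
    return FileInfo(m["rec_type"], int(m["hour"]), int(m["mic"]), m["ext"])


def annotation_paths(audio_path: str) -> dict[str, Path]:
    """Paths of the three annotation files that share the audio file's name."""
    p = Path(audio_path)
    parse_filename(p.name)  # validate the name before deriving paths
    return {ext: p.with_suffix("." + ext) for ext in ("novel", "csv", "svl")}
```

For instance, `parse_filename("back1_h04_m3.wav")` yields `FileInfo(rec_type='back1', hour=4, mic=3, ext='wav')`.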
Each audio file has annotation files with the same name containing the labels, provided in three different formats: .novel, .csv and .svl. A typical .novel file contains one label per line:
1500.0001 scream_1
2402.2363 fall_2
The first part of each line indicates the start time of the novelty, whilst the second part provides a label describing the corresponding novelty.
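A .novel file can be read with a few lines of code. This is a minimal sketch under two stated assumptions: the start time is in seconds (the text above gives no unit), and the novelty group is the part of the label before the last underscore (as in `scream_1`, `fall_2`). `NoveltyLabel`, `parse_novel` and `to_sample_index` are hypothetical names. The sample-index conversion uses the corpus rate of 48 kHz.

```python
from typing import NamedTuple


class NoveltyLabel(NamedTuple):
    start: float  # start time as written in the file (assumed to be seconds)
    label: str    # e.g. "scream_1"

    @property
    def group(self) -> str:
        # Assumption: the group name precedes the last underscore ("scream_1" -> "scream").
        return self.label.rsplit("_", 1)[0]


def parse_novel(text: str) -> list[NoveltyLabel]:
    """Parse the contents of a .novel file: one '<start> <label>' pair per line."""
    labels = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected '<start> <label>', got {line!r}")
        labels.append(NoveltyLabel(float(fields[0]), " ".join(fields[1:])))
    return labels


def to_sample_index(start_s: float, rate: int = 48_000) -> int:
    """Nearest sample index for a start time in seconds (48 kHz corpus rate)."""
    return round(start_s * rate)
```

Applied to the two example lines above, `parse_novel` returns a `scream_1` label at 1500.0001 and a `fall_2` label at 2402.2363.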
Examples:
audio filename: back1_h04_m3.wav
annotation filename: back1_h04_m3.novel
annotation filename: back1_h04_m3.csv
annotation filename: back1_h04_m3.svl

audio filename: novel2_h10_m7.wav
annotation filename: novel2_h10_m7.novel
annotation filename: novel2_h10_m7.csv
annotation filename: novel2_h10_m7.svl
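Because the microphone ID is the last part of the filename, the eight signals recorded during the same hour can be gathered by varying only that part. The following is a minimal sketch, assuming all files sit in one directory and the hour is zero-padded to two digits, as in `back1_h04_m3.wav` and `novel2_h10_m7.wav`. `channel_files` is a hypothetical helper.

```python
from pathlib import Path

REC_TYPES = ("back1", "back2", "novel1", "novel2")
MIC_IDS = range(1, 9)  # microphones m1..m8


def channel_files(root: str, rec_type: str, hour: int, ext: str = "wav") -> list[Path]:
    """Paths of the eight per-microphone files recorded in the same hour.

    Assumptions: all files live directly under `root`, and the hour is
    zero-padded to two digits (h04, h10, ...).
    """
    if rec_type not in REC_TYPES:
        raise ValueError(f"unknown recording type: {rec_type!r}")
    return [Path(root) / f"{rec_type}_h{hour:02d}_m{mic}.{ext}" for mic in MIC_IDS]
```

For example, the seventh entry of `channel_files("data", "novel2", 10)` is `data/novel2_h10_m7.wav`. Passing `ext="novel"` lists the matching annotation files instead.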