Pilot study prosodic annotation
 

In the spring of 2002, a pilot study was conducted that aimed to test the protocol and the procedure that had been developed for the prosodic annotation of part of the data in the Spoken Dutch Corpus.


The overview below lists the samples for which a prosodic annotation was made in the pilot study (for part of the sample). Table 1a covers the Dutch data, Table 1b the Flemish data.* For the pilot, the samples were divided over 12 lots (column 1 in the tables). In addition, a training corpus (LC) and a test corpus were defined.

* Please note: the components referred to here are the components as they were distinguished in the original design. See also the design of the corpus.

Table 1a: Dutch data (i.e. data originating from the Netherlands)
 
lot  component  # speakers  duration (# sec)  sample(s)
1    1          2           526               fn000303 (LC)
2    1          2           577               fn000346
3    1          2           616               fn000415
4    2          2           890               fn000081 (LC)
5    2          2           1084              fn000140
6    5          2           343               fn000043
7    6          3           600               fn000115
8    6          -           -
9    9          -           -                 -
10   11         1           300               fn001601, fn001602, fn001603, fn001609, fn001624, fn001629,
                                              fn001631, fn001638, fn001639, fn001642, fn001645
11   12         1           214               fn000039 (LC)
12   13         1           510               fn000061 (0...120 as LC)

Note: Lot 10 comprises several samples that should remain together.
 

Table 1b: Flemish data
 
lot  component  # speakers  duration (# sec)  sample(s)
1    1          2           1045              fv400073
2    1          2           457               fv400089 (LC)
3    2          2           1275              fv400109 (LC)
4    2          2           1002              fv400118
5    2          2           1485              fv400165
6    5          2           634               fv600228
7    6          3           231               fv600005
8    9          1           333               fv600388, fv600764
9    10         2           324               fv600091, fv600551
10   11         4           129               fv600053
                3           69                fv600121
                2           173               fv600268
11   12         2           306               fv6000253
12   13         1           839               fv400011 (LC)

Note: Lots 8, 9 and 10 comprise several samples that should remain together.

Each sample was transcribed four times, each time by a different student. The students were working in two different locations in the Netherlands and two locations in Flanders, two students per location. The students were instructed and supervised by local coordinators.

For a discussion of how this pilot study was set up and of the results that were obtained, see Buhmann et al. (2002), available in .ps and .pdf format.

The results of this pilot study are also included in this release of the corpus. The files with the prosodic annotations can be identified by the extension .pro; those that result from the pilot can be found in the directories data_NL/ (Dutch data) and data_VL/ (Flemish data). Each of these directories contains a subdirectory /aangelev/ holding the files as they were made available to the students.
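
As an illustration of the directory layout described above, the following Python sketch collects the pilot annotation files. It is a minimal sketch and not part of the corpus tooling: the function name find_pilot_annotations and the argument pilot_root (the directory that holds data_NL/ and data_VL/) are hypothetical, and it assumes that the files under aangelev/ also carry the .pro extension, which the text above does not state.

```python
from pathlib import Path

# Directory names as given in the text above.
PILOT_DIRS = ("data_NL", "data_VL")
# Subdirectory with the files as they were made available to the students.
SOURCE_SUBDIR = "aangelev"


def find_pilot_annotations(pilot_root):
    """Group the .pro files under each pilot directory.

    pilot_root is a hypothetical argument: the directory that contains
    data_NL/ and data_VL/. Returns a dict keyed by directory name, each
    value holding two sorted lists of paths: "annotated" (files outside
    aangelev/) and "as_delivered" (files inside aangelev/).
    """
    result = {}
    for name in PILOT_DIRS:
        base = Path(pilot_root) / name
        annotated, delivered = [], []
        if base.is_dir():
            # Search the whole tree, since no deeper layout is assumed.
            for path in sorted(base.rglob("*.pro")):
                parent_parts = path.relative_to(base).parts[:-1]
                if SOURCE_SUBDIR in parent_parts:
                    delivered.append(path)
                else:
                    annotated.append(path)
        result[name] = {"annotated": annotated, "as_delivered": delivered}
    return result
```

Calling find_pilot_annotations with the location of the pilot data returns, for each of the two directories, the annotation files and the files found under aangelev/. A directory that is missing yields two empty lists rather than an error.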