Pilot study prosodic annotation

In the spring of 2002 a pilot study was conducted to test the protocol and procedure that had been developed for the prosodic annotation of part of the data in the Spoken Dutch Corpus.

Below is an overview of the samples that were prosodically annotated in the pilot study (in some cases only part of a sample was annotated). Table 1a covers the Dutch data, Table 1b the Flemish data.* For the pilot, the samples were divided into 12 lots (column 1 in the tables). In addition, a training corpus (LC) and a test corpus were defined; samples marked "(LC)" in the tables belong to the training corpus.

* Note: the components referred to here are those distinguished in the original design of the corpus. See also the design of the corpus.

Table 1a: Dutch data (i.e. data originating from the Netherlands)

lot	component	# speakers	duration (s)	sample(s)
1	1	2	526	fn000303 (LC)
2	1	2	577	fn000346
3	1	2	616	fn000415
4	2	2	890	fn000081 (LC)
5	2	2	1084	fn000140
6	5	2	343	fn000043
7	6	3	600	fn000115
8	6	-	-	-
9	9	-	-	-
10	11	1	300	fn001601, fn001602, fn001603, fn001609, fn001624, fn001629, fn001631, fn001638, fn001639, fn001642, fn001645
11	12	1	214	fn000039 (LC)
12	13	1	510	fn000061 (0...120 as LC)

Note: Lot 10 comprises several samples that should remain together.

Table 1b: Flemish data

lot	component	# speakers	duration (s)	sample(s)
1	1	2	1045	fv400073
2	1	2	457	fv400089 (LC)
3	2	2	1275	fv400109 (LC)
4	2	2	1002	fv400118
5	2	2	1485	fv400165
6	5	2	634	fv600228
7	6	3	231	fv600005
8	9	1	333	fv600388, fv600764
9	10	2	324	fv600091, fv600551
10	11	4	129	fv600053
10	11	3	69	fv600121
10	11	2	173	fv600268
11	12	2	306	fv6000253
12	13	1	839	fv400011 (LC)

Note: Lots 8, 9 and 10 comprise several samples that should remain together.
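
As an illustration of how the lot structure can be handled, the rows of Table 1b can be held as records, with one record per sample in lot 10 so that its three samples share a lot number and stay together when grouped by lot. The following is a minimal Python sketch, not part of the corpus distribution: the record type Row, its field names and the helper functions are hypothetical, and the pairing of speakers, durations and samples in lot 10 follows the row order of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of Table 1b (hypothetical record layout)."""
    lot: int
    component: int
    speakers: int
    duration_s: int
    samples: tuple[str, ...]
    training: bool = False  # True where the table marks the sample "(LC)"


# Table 1b (Flemish data); lot 10 is split into one row per sample.
FLEMISH = [
    Row(1, 1, 2, 1045, ("fv400073",)),
    Row(2, 1, 2, 457, ("fv400089",), training=True),
    Row(3, 2, 2, 1275, ("fv400109",), training=True),
    Row(4, 2, 2, 1002, ("fv400118",)),
    Row(5, 2, 2, 1485, ("fv400165",)),
    Row(6, 5, 2, 634, ("fv600228",)),
    Row(7, 6, 3, 231, ("fv600005",)),
    Row(8, 9, 1, 333, ("fv600388", "fv600764")),
    Row(9, 10, 2, 324, ("fv600091", "fv600551")),
    Row(10, 11, 4, 129, ("fv600053",)),
    Row(10, 11, 3, 69, ("fv600121",)),
    Row(10, 11, 2, 173, ("fv600268",)),
    Row(11, 12, 2, 306, ("fv6000253",)),
    Row(12, 13, 1, 839, ("fv400011",), training=True),
]


def total_duration(rows, training_only=False):
    """Sum the durations in seconds, optionally only for training (LC) rows."""
    return sum(r.duration_s for r in rows if r.training or not training_only)


def duration_per_lot(rows):
    """Group rows by lot so that multi-sample lots are counted as one unit."""
    totals: dict[int, int] = {}
    for r in rows:
        totals[r.lot] = totals.get(r.lot, 0) + r.duration_s
    return totals


if __name__ == "__main__":
    print(total_duration(FLEMISH), total_duration(FLEMISH, training_only=True))
    print(duration_per_lot(FLEMISH)[10])
```

Grouping by lot rather than by sample reflects the requirement that the samples of lots 8, 9 and 10 remain together.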

Each sample was transcribed four times, each time by a different student. The students worked at two locations in the Netherlands and two in Flanders, two students per location, and were instructed and supervised by local coordinators.

For a discussion of how this pilot study was set up and of the results obtained, see Buhmann et al. (2002), available in .ps and .pdf format.

The results of this pilot study are also included in this release of the corpus. Files with prosodic annotations have the extension .pro; those resulting from the pilot are in the directories data_NL/ (Dutch data) and data_VL/ (Flemish data). Each of these directories contains a subdirectory aangelev/ holding the files as they were made available to the students.
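
The directory layout just described can be walked with a few lines of code, for example to separate the pilot annotations from the material delivered to the students. This is a minimal Python sketch under stated assumptions: the function name find_pilot_files is hypothetical, pilot_dir is assumed to be the directory that directly contains data_NL/ and data_VL/, and every file below an aangelev/ directory is treated as delivered material rather than a finished annotation.

```python
from pathlib import Path


def find_pilot_files(pilot_dir):
    """Return (annotations, delivered) lists of paths.

    Hypothetical helper. Assumes pilot_dir directly contains data_NL/ and
    data_VL/, and that anything below an aangelev/ directory is material as
    delivered to the students.
    """
    root = Path(pilot_dir)
    annotations, delivered = [], []
    for region in ("data_NL", "data_VL"):  # Dutch and Flemish data
        region_dir = root / region
        if not region_dir.is_dir():
            continue  # tolerate an installation containing only one region
        for path in sorted(region_dir.rglob("*")):
            if not path.is_file():
                continue
            # Check only the part of the path below the region directory.
            if "aangelev" in path.relative_to(region_dir).parts:
                delivered.append(path)
            elif path.suffix == ".pro":
                annotations.append(path)
    return annotations, delivered
```

Because the search is recursive, the sketch does not depend on how deeply the .pro files are nested within data_NL/ and data_VL/.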
