In the spring of 2002, a pilot study was conducted to test the protocol and the procedure that had been developed for the prosodic annotation of part of the data in the Spoken Dutch Corpus.
The tables below list the samples for which a prosodic annotation was made in the pilot study (for part of the sample). Table 1a gives an overview of the Dutch data, Table 1b of the Flemish data.* For the pilot, the samples were divided into 12 lots (column 1 in the tables). In addition, a training corpus (LC) and a test corpus were defined.
* Please note: the components referred to here are the components as they were distinguished in the original design. See also the design of the corpus.
Table 1a: Dutch data (i.e. data originating from the Netherlands)
| lot | component | # speakers | duration (# sec) | sample(s) |
| 1 | | | 526 | fn000303 (LC) |
| 2 | | | 577 | fn000346 |
| 3 | | | 616 | fn000415 |
| 4 | | | 890 | fn000081 (LC) |
| 5 | | | 1084 | fn000140 |
| 6 | | | 343 | fn000043 |
| 7 | | | 600 | fn000115 |
| 8 | | | - | - |
| 9 | | | - | - |
| 10 | | | 300 | fn001601, fn001602, fn001603, fn001609, fn001624, fn001629, fn001631, fn001638, fn001639, fn001642, fn001645 |
| 11 | | | 214 | fn000039 (LC) |
| 12 | | | 510 | fn000061 (0...120 as LC) |
Note: Lot 10 comprises several samples that should remain together.
Table 1b: Flemish data
| lot | component | # speakers | duration (# sec) | sample(s) |
| 1 | | | 1045 | fv400073 |
| 2 | | | 457 | fv400089 (LC) |
| 3 | | | 1275 | fv400109 (LC) |
| 4 | | | 1002 | fv400118 |
| 5 | | | 1485 | fv400165 |
| 6 | | | 634 | fv600228 |
| 7 | | | 231 | fv600005 |
| 8 | | | 333 | fv600388, fv600764 |
| 9 | | | 324 | fv600091, fv600551 |
| 10 | | 3, 2 | 129, 69, 173 | fv600053, fv600121, fv600268 |
| 11 | | | 306 | fv6000253 |
| 12 | | | 839 | fv400011 (LC) |
Note: Lots 8, 9 and 10 comprise several samples that should remain together.
Each sample was transcribed four times, each time by a different student. The students worked at two locations in the Netherlands and two locations in Flanders, with two students per location. They were instructed and supervised by local coordinators.
For a discussion of how this pilot study was set up and of the results that were obtained, see Buhmann et al. (2002), available in .ps and .pdf format.
The results of this pilot study are also included in this release of the corpus. The files with the prosodic annotations can be identified by the extension .pro; in so far as they result from the pilot, they can be found in the directories data_NL/ (Dutch data) and data_VL/ (Flemish data). Each of these directories has a subdirectory /aangelev/, which contains the files as they were made available to the students.
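
The file layout described above can be explored with a short script. What follows is a minimal sketch in Python, not part of the corpus distribution. The names find_pilot_annotations and split_delivered are hypothetical, as is the assumption that data_NL/ and data_VL/ sit directly under one common directory (corpus_root). It is also an assumption that the files in aangelev/ carry the .pro extension; the text only states that this subdirectory holds the files as they were given to the students.

```python
from pathlib import Path

# Directory names and the .pro extension come from the text above.
# Their location directly under a common root is an assumption.
PILOT_DIRS = {"Dutch": "data_NL", "Flemish": "data_VL"}


def find_pilot_annotations(corpus_root):
    """Return {region: sorted list of .pro paths} found under corpus_root."""
    root = Path(corpus_root)
    found = {}
    for region, dirname in PILOT_DIRS.items():
        base = root / dirname
        # rglob searches recursively, so files in nested subdirectories
        # (including aangelev/) are picked up as well.
        found[region] = sorted(base.rglob("*.pro")) if base.is_dir() else []
    return found


def split_delivered(paths):
    """Separate files lying under an 'aangelev' directory from the rest.

    Assumes the delivered files use the same extension as the annotations.
    """
    delivered = [p for p in paths if "aangelev" in p.parts]
    annotated = [p for p in paths if "aangelev" not in p.parts]
    return delivered, annotated
```

Calling find_pilot_annotations on the directory that holds data_NL/ and data_VL/ returns one list of paths per region. Passing either list to split_delivered separates the files under aangelev/ from the other .pro files.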