This corpus contains 57 subtitled lectures (as of 2020-07-16) from the Universiteit van Nederland (UVN). The subtitles were added to existing video recordings of UVN lectures.
Unlike conventional subtitles, the subtitles produced in this project are a nearly 100% literal representation of the speech in the recordings. They are exact orthographic transcriptions of the successive words as spoken, and thus show the peculiarities of spoken language, which lacks the grammatical coherence typical of written text.
On the other hand, the transcriptions contain neither speaker noises (such as lip smacks or coughs) nor hesitation sounds such as "ehm". Punctuation was added for readability.
The purpose of the subtitles is to support learners of Dutch.
The videos were selected to reflect the variety of spoken Dutch in an educational setting, covering a wide range of lecture topics at a popular level, such as linguistics, physics and history. They include speakers of Northern Dutch as spoken in the Netherlands and of Southern Dutch as spoken in Flanders (Belgium). In addition, some speakers have an audibly different language background, such as English or Moroccan.
|Hours of speech||more than 14 hours|
|Data format||Video: mp4, audio: wav, transcriptions: txt|
|Citation||Corpus Ondertitelde UVN-Colleges - COUC (Version 1.0) (2020) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-s3|
|Application||Research, testing of speech recognizers|
- Number of files 1
- Number of downloads 54
- File size 21,935.82 MB
- Date posted 04/12/2020
- Last updated 12/03/2021
- Version 1.0