BasiScript-corpus - INT Taalmaterialen

The BasiScript Corpus consists of 9 million words of written text produced by Dutch elementary school students.

The corpus contains longitudinal data collected over three consecutive years (fall 2012 to spring 2015). It was designed to allow comparison across both types of school (educational diversity) and geographical regions of the Netherlands.

The data consists mainly of handwritten texts, plus a small number of texts produced with a word processor (with automatic spelling and grammar checking disabled).
The data has been anonymized.

For commercial use, see the commercial product page.

This product is free, but signing a license agreement is required. The download contains the license and further instructions for placing an order.

Product details

Data format	XML (FoLiA)
Target audience	Mainly teachers, developers of teaching materials and tests, authors of children's literature, publishers and researchers.
Owner	Radboud University
Funder	NWO
Year	2015
Project	BasiScript: a corpus of written language output as produced by elementary school children in the Netherlands, annotated for spelling, word frequencies and word properties, and a 20,000-word lexicon annotated for word senses.
Cite as	Tellings, A. E. J. M. (2015), BasiScript Corpus (Version 1.0) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-p2
Languages	Dutch
Version	1.0
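Because the corpus is distributed as FoLiA XML, here is a minimal sketch of pulling word tokens out of a FoLiA-style file with Python's standard library. It assumes the generic FoLiA namespace (`http://ilk.uvt.nl/folia`) and the usual `<w>` word elements with `<t>` text children. The sample document and the `folia_tokens` helper are hypothetical illustrations, not taken from BasiScript, and real corpus files (with spelling annotations, corrections and metadata) are likely to be considerably richer.

```python
import xml.etree.ElementTree as ET

# Generic FoLiA namespace; assumed, not confirmed for BasiScript specifically.
FOLIA_NS = "http://ilk.uvt.nl/folia"
NS = {"folia": FOLIA_NS}

# Hypothetical, pared-down FoLiA-style fragment (not necessarily schema-valid).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example" version="2.0">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Ik</t></w>
      <w xml:id="example.s.1.w.2"><t>schrijf</t></w>
      <w xml:id="example.s.1.w.3"><t>een</t></w>
      <w xml:id="example.s.1.w.4"><t>verhaal</t></w>
    </s>
  </text>
</FoLiA>"""


def folia_tokens(xml_text: str) -> list[str]:
    """Hypothetical helper: return the text of each <w> token in document order."""
    root = ET.fromstring(xml_text)
    tokens = []
    # Walk every namespaced <w> element anywhere in the tree.
    for w in root.iter(f"{{{FOLIA_NS}}}w"):
        # Only the first direct <t> child is used; <t> nested inside
        # e.g. correction elements is ignored by this sketch.
        t = w.find("folia:t", NS)
        if t is not None and t.text:
            tokens.append(t.text)
    return tokens


if __name__ == "__main__":
    print(folia_tokens(SAMPLE))
```

Running the script prints `['Ik', 'schrijf', 'een', 'verhaal']` for the sample fragment. For full use of FoLiA annotations, a dedicated FoLiA library is likely a better fit than hand-written XPath queries.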

Download details

File
BP_BasiScript-corpus_NC.zip

Number of files	1
Number of downloads	46
File size	53.13 KB
Date posted	17/07/2020
Last updated	15/12/2025
Version	1.0