SoNaR Groot-corpus Commercieel - INT Taalmaterialen

The SoNaR Groot-corpus Commercieel (SoNaR Large Corpus Commercial) is a text corpus of about 271 million words, drawn from standard Dutch texts dating from after 1954 and covering a wide range of domains and genres. All texts were tokenized, tagged for part of speech, and lemmatized. The named entities were also labelled. All annotations were produced automatically; no manual verification took place.

By default, this language material is offered as a download, free of charge. Because of the large amount of data, the SoNaR corpus can also be requested on an external hard disk, for which the INT charges a shipping and handling fee of €100.00.

This product is free, but signing a license agreement is required. The download contains the license and further instructions for placing an order.

Product details

Documentation	Documentation; Different SoNaR corpora
Owner	Taalunie
Funder	NTU|STEVIN
Year	2015
Commissioned by	NTU|STEVIN
Project	SoNaR: STEVIN Nederlandstalig Referentiecorpus
Project website	https://lands.cls.ru.nl/projects/SoNaR/description.html
Citation	SoNaR Groot-corpus Commercieel (Version 1.2.1) (2015) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-f4
Languages	Dutch
Version	1.2.1

Download details

File
BP_SoNaR_Groot_C.zip

Number of files 1
Number of downloads 64
File size 53.55 KB
Date posted 04/09/2020
Last updated 15/12/2025
Version 1.2.1