SoNaR New Media Corpus - INT Taalmaterialen

The SoNaR New Media Corpus 1.0 contains new media texts that were collected within the STEVIN project SoNaR. The corpus contains text messages (SMS), tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.

This product contains texts that originate from correspondence: tweets collected via Twitter, chats collected via public internet forums, and text messages that individuals provided to the licensor for the purpose of this product. The applicant must therefore handle the data with extra care.

The SoNaR project partners and the Taalunie (Dutch Language Union) have done their utmost to trace the sources and rights holders of all SoNaR texts. If texts have nevertheless been included for which you are a (co-)rights holder, and you are not named as source or rights holder and/or have not granted permission for their use, you can contact us at servicedesk@ivdnt.org.

The SoNaR New Media Corpus is not part of the SoNaR corpus but is available as a separate product.

Product details

Documentation	Documentation; Different SoNaR corpora
Owner	Taalunie
Funder	NTU|STEVIN
Year	2013
Commissioned by	NTU|STEVIN
Project	SoNaR
Project website	http://lands.cls.ru.nl/projects/SoNaR/description.html
Citation	SoNaR Nieuwe Media Corpus (Version 1.0) (2013) [Data set]. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a2-k3
Languages	Dutch
Size	36 million words
Version	1.0

Download details

File
20150730_SoNaRNewMediaCorpus_1.0.1.zip

Number of files 1
Number of downloads 249
File size 3,484.27 MB
Date posted 04/09/2020
Last updated 23/01/2026
Version 1.0