All language materials - INT Taalmaterialen

Name	Description
4-Language Finance, Economy & Business Terminology — NL-EN-FR-DE (version 2.0) (Online)	The 4-Language Finance, Economy & Business Terminology database - NL-EN-FR-DE (version 2.0) contains terms, abbreviations and names of organisations from the world of finance and economics.
Afrikaans Custom Dictionary for Government Domain	This language resource contains an alphabetic list of words which are exclusive to the government domain or which are not part of the official orthography of the language.
Afrikaans Genre Classification Corpus	This language resource contains training and testing data for genre classification for Afrikaans.
AI-Trainingset - Tag de Tekst voor Named Entity Recognition (NER)	Manually tagged historical texts that can be used to train systems for named entity recognition.
Algemeen Nederlands Woordenboek (ANW)	A corpus-based electronic dictionary describing contemporary Dutch as used in the Netherlands, Flanders, Suriname and the Caribbean.
Algemene Nederlandse Spraakkunst - e-ANS (Online)	The ANS (Algemene Nederlandse Spraakkunst, General Grammar of Dutch) aims to describe the grammar of contemporary Standard Dutch as comprehensively as possible, taking into account geographical and stylistic variation. It is intended for a broad audience of readers interested in grammar, and therefore not exclusively, or even primarily, for specialist linguists.
Annotated Corpora for Term Extraction Research (ACTER)	ACTER is a manually annotated dataset for term extraction, covering three languages (English, French and Dutch) and four domains (corruption, dressage, heart failure and wind energy).
Attestation Tool	A multifunctional, downloadable user interface for the production of computational lexica, including a gold standard for named-entity tagging. This tool is distributed through GitHub.
AUTONOMATA-namencorpus	A database of about 5000 read-aloud first names, surnames, street names, place names and control words in total.
AUTONOMATA-namencorpus Commercieel	A database of about 5000 read-aloud first names, surnames, street names, place names and control words in total.
AUTONOMATA-POI-corpus	A database of 800 read-aloud points of interest (POIs) from the Netherlands and Belgium, consisting of names of restaurants, hotels, camping sites, cafés, etc.
AUTONOMATA-POI-demo	A demo of a speech recogniser for points of interest (POIs). The demo recognises hotels and restaurants in a number of large cities (including Amsterdam, Antwerp, Ghent and Rotterdam).
AUTONOMATA-transcriptietoolset	The AUTONOMATA transcription toolset consists of a transcription tool and a learning tool, which can be used to enrich word lists with accurate pronunciation information.
AutoSearch	A tool for uploading texts annotated with part of speech, lemma and word form (in FoLiA or TEI format), defining one or more corpora, and searching them. Only accessible with a CLARIN account.
Autshumato Afrikaans-English Translation Memory	Translation memory from Afrikaans to English (EN-GB), in the government domain for use in the Autshumato ITE application.
Autshumato English-Afrikaans Parallel Corpora	English and Afrikaans parallel corpora aligned on sentence level.
Autshumato English-Afrikaans Translation Memory	Translation memory from English (EN-GB) to Afrikaans, in the government domain for use in the Autshumato ITE application.
Autshumato English-isiZulu Parallel Corpora	English and isiZulu parallel corpora aligned on sentence level.
Autshumato English-isiZulu Translation Memory	Translation memory from English (EN-GB) to isiZulu, in the government domain for use in the Autshumato ITE application.
Autshumato English-Sesotho sa Leboa Parallel Corpora	English and Sesotho sa Leboa (Sepedi) parallel corpora aligned on sentence level.
Autshumato isiZulu-English Translation Memory	Translation memory from isiZulu to English (EN-GB), in the government domain for use in the Autshumato ITE application.
Autshumato Sesotho sa Leboa-English Translation Memory	Translation memory from Sesotho sa Leboa to English (EN-GB), in the government domain for use in the Autshumato ITE application.
BasiLex-corpus	The BasiLex Corpus is an annotated collection of texts written for children of primary-school age (four to twelve years).
BasiLex-corpus Commercieel	The BasiLex Corpus is an annotated collection of texts written for children of primary-school age (four to twelve years).
BasiLex-lexicon	The BasiLex Lexicon contains all lemmas from the BasiLex Corpus, with additional information.
BasiLex-lexicon Commercieel	The BasiLex Lexicon contains all lemmas from the BasiLex Corpus, with additional information.
BasiScript-corpus	The BasiScript Corpus is an annotated collection of texts written by children of primary-school age (four to twelve years).
BasiScript-corpus Commercieel	The BasiScript Corpus is an annotated collection of texts written by children of primary-school age (four to twelve years).
BasiScript-lexicon	The BasiScript Corpus is an annotated collection of texts written by children of primary-school age. The BasiScript Lexicon is derived from this corpus.
BasiScript-lexicon Commercieel	The BasiScript Corpus is an annotated collection of texts written by children of primary-school age. The BasiScript Lexicon is derived from this corpus.
Belgian Covid Sign Language Corpus - BeCoS Corpus	An annotated parallel corpus of spoken language (Dutch, French, German) and sign language (VGT, LSFB), based on news broadcasts from the Belgian federal government concerning COVID-19.
BlackLab	A corpus retrieval engine based on Apache Lucene. This tool is distributed through GitHub.
BlackLab Frontend	A feature-rich search interface for the BlackLab corpus search engine.
BLISS Dialogue Summaries	This dataset consists of 557 Dutch human-computer conversations that were manually annotated with turn labels and condensed into abstract summaries of the user’s answers.
BLISS Spoken Dialogue Dataset	Dutch speech recordings of participants talking with the BLISS dialogue system (v1) about everyday pursuits and activities they enjoy. The corpus contains 55 recordings with an average duration of 2 minutes and 34 seconds.
Boarnsterhim Corpus 2.0 (BHC 2.0) (Download)	The Boarnsterhim Corpus consists of 250 hours of speech in both West Frisian and Dutch by the same group of bilingual speakers.
Brieven als Buit - Gouden Standaard	Letters as Loot - Gold Standard contains the roughly 1000 source files from the Brieven als Buit (Letters as Loot) programme, directed by Prof. Dr. M.J. van der Wal, each enriched with main part of speech and modern lemma.
Brieven als Buit (Online)	A selection of about 1,000 Dutch private letters from the late seventeenth and late eighteenth centuries was digitised, tokenised, lemmatised and tagged with part of speech. The linguistic annotations were verified manually.
Brieven als Buit-2 (Online)	This corpus is a supplement to the Brieven als Buit (Letters as Loot) corpus.
Cd-rom Middelnederlands	The Cd-rom Middelnederlands (CD-ROM Middle Dutch, 1998) contains the Middle Dutch Dictionary (Middelnederlands Woordenboek), the texts of the Corpus Gysseling, and a collection of over 300 rhymed and prose texts.
CELEX-2 Dutch	CELEX-2 Dutch is a Dutch lexicon with extensive orthographic, phonological, morphological and syntactic information, plus frequency data.
CGN-annotaties	The CGN annotations contain the fully annotated Corpus Gesproken Nederlands (CGN, Corpus of Spoken Dutch) in transcribed form, that is, without the sound data.
CGN-annotaties Commercieel	The CGN annotations contain the fully annotated Corpus Gesproken Nederlands (CGN, Corpus of Spoken Dutch) in transcribed form, that is, without the sound data.
Children's Oral Reading Corpus (CHOREC)	A collection of 130 hours of children's speech, read aloud.
CHN N-grams	N-grams (of lengths one, two and three) with their frequencies from the Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch).
CHN N-grams Commercieel	N-grams (of lengths one, two and three) with their frequencies from the Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch).
CoBaLT	Application for importing and linguistically annotating a collection of text files. This application is distributed through GitHub.
CombiLex	CombiLex is a list of Dutch lemmas and word forms without any added linguistic information.
CombiLex Commercieel	CombiLex is a list of Dutch lemmas and word forms without any added linguistic information.
COREA-coreferentiecorpus	A corpus of Dutch texts in which coreference relations have been systematically annotated.
COREA-coreferentiecorpus Commercieel	A corpus of Dutch texts in which coreference relations have been systematically annotated.
Cornetto-LMF	Lexical database for Dutch with semantic relations and combinatorial information.
Corpus Gesproken Nederlands (CGN)	A collection of about 900 hours of spoken Standard Dutch by speakers from Flanders and the Netherlands.
Corpus Gesproken Nederlands (CGN) Commercieel	A collection of about 900 hours of spoken Standard Dutch by speakers from Flanders and the Netherlands.
Corpus Gysseling (Data)	A collection of all thirteenth-century texts that served as source material for the Early Middle Dutch Dictionary (Vroegmiddelnederlands Woordenboek).
Corpus Gysseling (Online)	Corpus of Middle Dutch texts (up to and including the year 1300), published in the period 1977-1987 by the linguist Maurits Gysseling.
Corpus Hedendaags Nederlands - CHN (Online)	The Corpus Hedendaags Nederlands (CHN, Corpus of Contemporary Dutch) is a collection of approximately 10.1 million texts from newspapers, books, blogs, magazines, etc. from the Netherlands, Flanders, Suriname and the Netherlands Antilles. Together these texts amount to more than 3 billion words.
Corpus Juridisch Nederlands (Online)	The Corpus Juridisch Nederlands (Corpus of Legal Dutch) comprises a collection of 5,856 legal texts from the period 1814 to 1989, merged per year.
Corpus Middelnederlands (Data)	A collection of about 350 Middle Dutch literary texts from the period 1250-1500, encoded in TEI (originally published on the Cd-rom Middelnederlands).
Corpus Middelnederlands (Online)	The Corpus Middelnederlands (Corpus of Middle Dutch) is a collection of 336 Middle Dutch literary texts from the period 1250-1500.
Corpus Nederlandse Gebarentaal (CNGT)	This product is not yet available. Cite as: Corpus Nederlandse Gebarentaal (Version 1.0) (202?) [Data set]. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a2-u5. Language: Vlaamse Gebarentaal. Version: 1.0.
Corpus Ondertitelde UvN-colleges (COUC)	This corpus contains 57 subtitled lectures from the Universiteit van Nederland (UVN, University of the Netherlands). The subtitles are an almost 100% verbatim rendering of the speech of the people in the recordings.
Corpus Oudfries (Online)	The Corpus Oudfries (Corpus of Old Frisian) contains a large sample of the Old Frisian language from ca. 1200-1550.
Corpus Oudnederlands (Online)	The Corpus Oudnederlands (Corpus of Old Dutch) is the collection of all surviving Dutch word material from the period 475-1200.
Corpus Pathologische en Normale Spraak (COPAS)	A collection of recordings of almost 200 speakers with an audible speech impairment and of 122 control speakers.
Corpus Vlaamse Gebarentaal (Corpus VGT)	The Corpus VGT is a collection of annotated videos in Flemish Sign Language (Vlaamse Gebarentaal, VGT). Informants (native VGT signers) discuss a series of topics in pairs.
Corpus Vlaamse Gebarentaal (CVGT)	This product is not yet available. Cite as: Corpus Vlaamse Gebarentaal (Version 1.0) (202?) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-u4. Language: Vlaamse Gebarentaal. Version: 1.0.
Couranten Corpus (Online)	The Couranten Corpus contains thirteen seventeenth-century Dutch newspapers from the period 1619-1700 that are currently available on Delpher.nl.
D-TUNA-corpus	The D-TUNA Corpus consists of 2400 written and (transcribed) spoken referential expressions.
DAESO-corpus: parallelle Nederlandstalige monolinguale treebank	A parallel monolingual treebank for Dutch.
DAESO-corpus: parallelle Nederlandstalige monolinguale treebank Commercieel	A parallel monolingual treebank for Dutch.
Database van de Zuidelijk-Nederlandse Dialecten - DSDD (Online)	The Database of the Southern Dutch Dialects (DSDD) combines three regional dialect dictionaries: the Dictionary of the Flemish Dialects (Woordenboek van de Vlaamse Dialecten, WVD), the Dictionary of the Brabantian Dialects (Woordenboek van de Brabantse Dialecten, WBD) and the Dictionary of the Limburgian Dialects (Woordenboek van de Limburgse Dialecten, WLD).
Dataset containing hypothetical manner clauses in English and Dutch	This dataset contains information about the use of clauses introduced by the conjunction 'as if' in contemporary British speech and of clauses introduced by the conjunction 'alsof' ('as if') in contemporary Dutch speech.
Dataset Synthetische Simplificatie	The dataset consists of three parts: 6,986 sentences from the SoNaR corpus, a synthetic simplification of those sentences created by GPT-4, and sentence pairs each consisting of one SoNaR sentence and its simplified version.
Diachroon seMantisch lexicon van de Nederlandse Taal - DiaMaNT (Online)	An interface for querying the Diachronic Semantic Lexicon of the Dutch Language (DiaMaNT), a computational semantic lexicon in which (historical) word forms and concepts are linked.
DuELME	A lexicon with over 5,000 Dutch multiword expressions.
DuELME Commercieel	A lexicon with over 5,000 Dutch multiword expressions.
DuOMAn Subjectivity Lexicon	A collection of about 9000 words, each marked as having a negative, neutral or positive sentiment value.
DuOMAn Subjectivity Lexicon Commercieel	A collection of about 9000 words, each marked as having a negative, neutral or positive sentiment value.
Dupira	Parser for Dutch for applications in information retrieval.
Dutch C-CLAMP (Download)	The Dutch Corpus of Contemporary and late Modern Periodicals (C-CLAMP) is a historical corpus consisting of a collection of articles from 13 cultural or literary periodicals published in Flanders and the Netherlands.
Dutch Idiom Database: Native Speakers (DID-NS)	A database of ratings of Dutch expressions by native speakers.
Dutch Idiom Database: Native Speakers (DID-NS) Commercieel	A database of ratings of Dutch expressions by native speakers.
Dutch Parallel Corpus (DPC)	A parallel corpus of 10 million words for the language pairs Dutch-English and Dutch-French.
Dutch Parallel Corpus (DPC) Commercieel	A parallel corpus of 8.77 million words for the language pairs Dutch-English and Dutch-French.
Dutch Renaissance Poetry Corpus	The Dutch Renaissance Poetry Corpus contains alexandrines and iambic pentameters written by a selection of Dutch Renaissance poets (late 16th and 17th centuries).
e-Lex	A lexicon of over 200,000 lemmas and over 640,000 word forms, enriched with, among other things, part-of-speech tag, complementation pattern, semantic type and pronunciation information.
e-Lex Commercieel	A lexicon of over 200,000 lemmas and over 640,000 word forms, enriched with, among other things, part-of-speech tag, complementation pattern, semantic type and pronunciation information.
Eindhoven Corpus	A corpus of Dutch written texts and transcribed spoken texts from the period 1960 to 1976.
elektronische Woordenbank van de Nederlandse Dialecten - eWND (Online)	The elektronische Woordenbank van de Nederlandse Dialecten (eWND, electronic Dictionary of Dutch Dialects) contains old and more modern Dutch dialect dictionaries.
Etymologiebank	The Etymologiebank brings together all important etymological publications on Dutch words in one central place (data supplied by the Dutch Language Institute, INT).
Etymologisch Woordenboek van het Nederlands (EWN)	A scholarly etymological dictionary of modern supraregional Dutch.
Federated Search Lexica (Online)	An interface for querying a number of lexica simultaneously.
Frequentielijsten corpora	The 5000 most frequent words from the Millions Corpora, the PAROLE 2004 Corpus, the Spoken Dutch Corpus (CGN), the ANW Corpus, the Eindhoven Corpus, the D-Coi Corpus and the SoNaR Corpus.
Frequentielijsten corpora Commercieel	The 5000 most frequent words from the Millions Corpora, the PAROLE 2004 Corpus, the Spoken Dutch Corpus (CGN), the ANW Corpus, the Eindhoven Corpus, the D-Coi Corpus and the SoNaR Corpus.
Frog	A tokenizer, tagger, lemmatizer, morphological segmenter, shallow parser, named entity recognizer and dependency parser in one.
GaLAHaD (Online)	GaLAHaD provides a flexible environment in which you can linguistically annotate diachronic corpus material and evaluate the linguistic annotations.
GCND GrETEL (Online)	The GCND-GrETEL application makes it possible to search the manually verified syntactic annotations of the Gesproken Corpus van de zuidelijk-Nederlandse Dialecten (Spoken Corpus of the Southern Dutch Dialects).
Gekaapte Brieven (Online)	Transcriptions of 5862 letters and other documents to and from sailors and others from the 17th and 18th centuries, with metadata.
Gesproken Corpus van de zuidelijk-Nederlandse Dialecten - GCND (Online)	The Gesproken Corpus van de zuidelijk-Nederlandse Dialecten (GCND) is the first grammatically enriched corpus of spontaneously spoken dialects in the European Dutch language area.
GiGaNT-Molex	The GiGaNT-Molex lexicon contains Dutch language material from the Netherlands, Flanders, the Netherlands Antilles and Suriname, drawn from contemporary corpus texts of the Dutch Language Institute (Instituut voor de Nederlandse Taal, INT). All lemmas and paradigms have been manually verified and comply with the official Dutch spelling.
GiGaNT-Molex Commercieel	The GiGaNT-Molex lexicon contains Dutch language material from the Netherlands, Flanders, the Netherlands Antilles and Suriname, drawn from contemporary corpus texts of the Dutch Language Institute (Instituut voor de Nederlandse Taal, INT). All lemmas and paradigms have been manually verified and comply with the official Dutch spelling.
Global Anglicism Database - GLAD (Online)	The Global Anglicism Database contains English loanwords found in seventeen different languages.
Gold Standard Parallel Corpus of Sign and spoken Language - GoSt-ParC-Sign	GoSt-ParC-Sign is a multimodal corpus with VGT as the source language and a translation into written Dutch as the target language. All VGT material in this corpus consists of existing videos that were produced by authentic VGT signers for a signing audience.
Greedy Extraction of Trees for Empirical Linguistics - GrETEL 4 (Online)	A user-friendly interface for querying syntactically annotated corpora or treebanks.
Hieronymus Catalogue of Translations in the Burgundian and Spanish Netherlands (1470-1700) (online)	De Hieronymus-database verzamelt informatie over personen die betrokken waren bij vertalingen en boeken die verband houden met vertalingen in de Bourgondische en Spaanse Nederlanden in de vroegmoderne periode. The Hieronymus database gathers information on people involved with and books related to translation in the Burgundian and Spanish Netherlands in the early modern period.
Historical Corpus of Dutch - HCD (Online)	Het Historisch Corpus van het Nederlands (HCD) is een diachronisch, regionaal gebalanceerd corpus van verschillende genres geschreven Nederlands. The Historical Corpus of Dutch (HCD) is a diachronic, regionally balanced, multigenre corpus of written Dutch.
Hoger Onderwijs Terminologie in Nederland en Vlaanderen - HOTNeV	Een terminologische database met Nederlandse en Vlaamse onderwijstermen. A terminology database with Dutch and Flemish terms from the educational domain.
Hotel Review Corpus in Nederlandse Gebarentaal - NGT_HoReCo	Een multimodaal parallel corpus met de talen Nederlands en Nederlandse Gebarentaal (NGT). 297 geschreven hotelbeoordelingen werden vertaald uit het Nederlands in NGT door 6 professionele, dove vertalers. A multimodal parallel corpus of Dutch and Sign Language of the Netherlands (NGT). 297 hotel reviews in written Dutch were translated into NGT videos by 6 professional, deaf translators.
Hotel Review Corpus in Spanish Sign Language - LSE_HoReCo	Een multimodaal parallel corpus met de talen Spaans en Spaanse Gebarentaal (Lengua de Signos Española - LSE). 283 geschreven hotelbeoordelingen, oorspronkelijk in het Nederlands, werden vertaald in het Spaans en vervolgens door 6 professionele, dove vertalers in het LSE. A multimodal parallel corpus of Spanish and Spanish Sign Language (Lengua de Signos Española - LSE). 283 hotel reviews, originally written in Dutch, were translated into Spanish and subsequently into LSE videos by 6 professional, deaf translators.
Hotel Review Corpus in Vlaamse Gebarentaal - VGT_HoReCo	Een multimodaal parallel corpus met de talen Nederlands en Vlaamse Gebarentaal (VGT). 297 geschreven hotelbeoordelingen werden vertaald uit het Nederlands in VGT door 6 professionele, dove vertalers. Elke beoordeling is vertaald door slechts 1 vertaler. Het aantal woorden in de beoordelingen varieerde tussen 15 en 400. De duur van de VGT-video's varieerde van 10 seconden tot ongeveer 4 minuten. Het resulterende corpus bevat 21.825 woorden in het Nederlands en ongeveer 4 uur aan VGT-videomateriaal. A multimodal parallel corpus of Dutch and Flemish Sign Language (VGT). 297 hotel reviews in written Dutch were translated into VGT videos by 6 professional, deaf translators. Each review was translated by only one translator. The length of the Dutch reviews ranges from around 15 to 400 words; the duration of the VGT videos ranges from around 10 seconds to around 4 minutes. The corpus contains 21,825 Dutch words in total; the VGT translations consist of about 4 hours of video.
Hulk / Keurmerk Spelling	HulK / Keurmerk Spelling: keurmerk voor producten die de regels en principes van de officiële spelling van de Taalunie volgen. Certification mark for products written in compliance with the official spelling rules and principles formulated by the Dutch Language Union.
IFA Corpus	Een database voor fonetisch onderzoek die bestaat uit Nederlandse spraakdata van 8 personen; 4 mannelijk en 4 vrouwelijk. A corpus for phonetic research consisting of Dutch speech data from 8 speakers: 4 male and 4 female.
IFA Dialogue Video corpus	Video- en geluidsopnamen van spontane dialogen tussen proefpersonen. Video and sound recordings of spontaneous dialogues between subjects.
INT Historische Woordenlijst	Twee lijsten met elk ca. 500.000 historische woordvormen ten behoeve van OCR en OCR-postcorrectie, voor de periode ca. 1550 - ca. 1970. Two lists, each consisting of approx. 500,000 historical word forms, to be used for OCR and OCR post-correction, covering the period from approx. 1550 to approx. 1970.
INT IMPACT NE-lexicon	Lexicon voor het Nederlands, met historische namen en varianten uit de periode 1750-1945. Lexicon for Dutch, featuring historical names and variants from the period between 1750 and 1945.
isiNdebele Custom Dictionary for Government Domain	This language resource contains an alphabetic list of words which are exclusive to the government domain or which are not part of the official orthography of isiNdebele.
isiNdebele Genre Classification Corpus	Contains training and testing data for genre classification for isiNdebele.
isiXhosa Custom Dictionary for Government Domain	This language resource contains an alphabetic list of words which are exclusive to the government domain or which are not part of the official orthography of isiXhosa.
isiXhosa Genre Classification Corpus	Contains training and testing data for genre classification for isiXhosa.
isiZulu Custom Dictionary for Government Domain	This language resource contains an alphabetic list of words which are exclusive to the government domain or which are not part of the official orthography of isiZulu.
isiZulu Genre Classification Corpus	Contains training and testing data for genre classification for isiZulu.
JASMIN-spraakcorpus	Een verzameling van circa 115 uur Nederlandse spraak van jongeren, anderstaligen en senioren, bestaande uit voorgelezen tekst en mens-machinedialogen. A corpus of about 115 hours of Dutch speech from juveniles, non-native speakers and seniors, consisting of read text and man-machine dialogues.
JASMIN-spraakcorpus Commercieel	Een verzameling van circa 115 uur Nederlandse spraak van jongeren, anderstaligen en senioren, bestaande uit voorgelezen tekst en mens-machinedialogen. A corpus of about 115 hours of Dutch speech from juveniles, non-native speakers and seniors, consisting of read text and man-machine dialogues.
LAnCeLoT (Online)	LAnCeLoT stelt onderzoekers in staat om taalkundige verrijkingen handmatig te corrigeren en te verfijnen. LAnCeLoT (Linguistic Annotation Corpus Laundry Tool) allows researchers to manually correct and refine linguistic enrichments.
Lassy Groot-corpus	Een corpus bestaande uit circa 700 miljoen woorden dat automatisch voorzien werd van syntactische annotaties. Lassy Groot-corpus: A corpus of about 700 million words that has been annotated syntactically by machine.
Lassy Groot-corpus Commercieel	Een corpus bestaande uit circa 476 miljoen woorden dat automatisch voorzien werd van syntactische annotaties. The corpus contains about 476 million words with automatically generated syntactic annotations.
Lassy Klein-corpus	Het Lassy Klein-corpus is een corpus van ongeveer 1 miljoen woorden met manueel geverifieerde syntactische annotaties. The Lassy Small Corpus contains about a million words with manually verified syntactical annotations.
Lassy Klein-corpus Commercieel	Een syntactisch geannoteerd corpus bestaande uit 772.000 woorden. A syntactically annotated corpus consisting of 772,000 words.
LeTTuce-PoS Dataset (Download)	De LeTTuce-PoS-dataset is een meertalig benchmarkcorpus voor part-of-speech tagging in verschillende gegevensgenres en domeinen. The LeTTuce-PoS dataset is a multilingual benchmark corpus for part-of-speech tagging across different data genres and domains.
Lexicon Frisicum (Online)	Dit is een online en toegankelijke versie van (een deel van) het Lexicon Frisicum van Johann Halbertsma. This is the online and accessible version of (part of) the Lexicon Frisicum of Johann Halbertsma.
Lwazi Afrikaans ASR Corpus	Audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Afrikaans Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Afrikaans.
Lwazi English ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi English Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in English.
Lwazi isiNdebele ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi isiNdebele Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in isiNdebele.
Lwazi isiXhosa ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi isiXhosa Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in isiXhosa.
Lwazi isiZulu ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi isiZulu Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in isiZulu.
Lwazi Sepedi ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Sepedi Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Sepedi.
Lwazi Sesotho ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Sesotho Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Sesotho.
Lwazi Setswana ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Setswana Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Setswana.
Lwazi Siswati ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Siswati Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Siswati.
Lwazi Tshivenda ASR corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Tshivenda Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Tshivenda.
Lwazi Xitsonga ASR Corpus	Complete audio recordings and orthographic transcriptions used for Lwazi speech recognition systems.
Lwazi Xitsonga Pronunciation Dictionary	General phonemic pronunciations for frequently occurring words in Xitsonga.
MAchine Translation Evaluation Online	MATEO is een online platform voor het evalueren van machinevertalingen. MATEO is an online platform for evaluating machine translations.
Medische Pilot (MedPilot)	De Medische Pilot is een bij wijze van experiment ingerichte database waarin een klein deel van de medische woordschat beschreven wordt op verschillende niveaus, van wetenschappelijk tot toegankelijk voor laaggeletterden, en waarin ook verschillen tussen Vlaamse en Nederlandse termen worden getoond. The Medical Pilot is an experimental database in which a small selection of medical terms is described at several levels, from scientific to accessible to readers with low literacy, and which also highlights the differences between Flemish terms and terms used in the Netherlands.
Medische Termen Belgisch-Nederlands (MedTermBN)	Een lijst met medische begrippen waarvoor in België en Nederland afwijkende termen worden gebruikt. A list of medical concepts for which different terms are used in Belgium and the Netherlands.
Medische Termen Belgisch-Nederlands (MedTermBN) Commercieel	Een lijst met medische begrippen waarvoor in België en Nederland afwijkende termen worden gebruikt. A list of medical concepts for which different terms are used in Belgium and the Netherlands.
Meertalige Ondertiteldata 2BDutch	De ondertiteldata behorend bij de Nederlandstalige video’s op de website www.2BDutch.nl, vormen het product Meertalige Ondertiteldata 2BDutch. A corpus of subtitles belonging to the Dutch-language videos on the website www.2BDutch.nl.
Memory-Based Morphological Parser (MBMP)	Een geheugengebaseerde morfologische parser voor de programmeertaal Python. Deze tool wordt gedistribueerd via GitHub. A memory-based morphological parser for the programming language Python. This tool is distributed through GitHub.
Menselijke evaluatie van geautomatiseerde tekstvereenvoudiging: resultaten van crowdsourcing	Dit taalmateriaal bestaat uit zinnen uit het SoNaR-corpus, een door GPT-4 vereenvoudigde versie daarvan en de menselijke beoordelingen van die vereenvoudigingen. This dataset consists of sentences from the SoNaR corpus, a version simplified by GPT-4 and the human evaluations of those simplifications with respect to simplicity, accuracy and fluency.
META-Covid Ontology 1.0	De META-COVID Ontology verbindt 30 interdisciplinaire COVID-onderwerpen met 203 specifieke concepten vanuit wetenschappelijke ontologieën. The META-COVID Ontology links 30 high-level interdisciplinary COVID topics to 203 specific concepts from scientific ontologies.
Middelnederlandsch Woordenboek - MNW (Online)	Beschrijft de Nederlandse woordenschat uit de periode ca. 1250 tot ca. 1550. Describes the vocabulary of the Dutch spoken from the thirteenth to the sixteenth century.
Moroccorp	Moroccorp is een corpus van chats tussen Marokkaans-Nederlandse taalgebruikers, bestaande uit tien miljoen woorden. Moroccorp is a corpus of chats between Moroccan-Dutch language users consisting of about ten million words.
MuST-Cinema-PE: post-editing in automatic subtitling	MuST-Cinema-PE is een corpus met post-editingdata van automatisch gegenereerde ondertitels. MuST-Cinema-PE is a corpus containing post-editing data of automatically-generated subtitles.
NAMES Corpus	Een corpus van Nederlandse voor- en achternamen zoals gevonden in 19de-eeuwse geboorte-, huwelijks- en overlijdensakten. De naamvarianten zijn gekoppeld aan een standaardvorm. A corpus of Dutch given names and surnames as found in 19th-century birth, marriage and death certificates. The name variants have been linked to a standard form.
NAMES Corpus Commercieel	Een corpus van Nederlandse voor- en achternamen zoals gevonden in 19de-eeuwse geboorte-, huwelijks- en overlijdensakten. De naamvarianten zijn gekoppeld aan een standaardvorm. A corpus of Dutch given names and surnames as found in 19th-century birth, marriage and death certificates. The name variants have been linked to a standard form.
Nederlands als Wetenschapstaal: Natuurkunde (Data)	Als onderdeel van het project "Nederlands als Wetenschapstaal" is een lijst met natuurkundige termen samengesteld. As part of the project "Dutch as language of Science", a list of physics terms has been compiled.
Nederlands als Wetenschapstaal: Scheikunde (Data)	Als onderdeel van het project "Nederlands als Wetenschapstaal" is een lijst met scheikundige termen samengesteld. As part of the project "Dutch as language of Science", a list has been compiled of terms used in chemistry.
Nederlands als Wetenschapstaal: Scheikunde (Online)	Als onderdeel van het project "Nederlands als Wetenschapstaal" is een lijst met scheikundige termen samengesteld. Die lijst is beschikbaar in een zoekinterface. As part of the project "Dutch as language of Science", a list has been compiled of terms used in chemistry. That list is available in a search interface.
Nederlands als Wetenschapstaal: Wiskunde (Data)	Als onderdeel van het project "Nederlands als Wetenschapstaal" is een lijst met wiskundige termen samengesteld. As part of the project "Dutch as language of Science", a list has been compiled of terms used in mathematics.
OMBI Arabisch-Nederlands	Bilinguaal lexicaal bestand met als brontaal Arabisch en als doeltaal Nederlands. Bilingual lexicon with Arabic as source language and Dutch as target language.
OMBI Arabisch-Nederlands Commercieel	Bilinguaal lexicaal bestand met als brontaal Arabisch en als doeltaal Nederlands. Bilingual lexicon with Arabic as source language and Dutch as target language.
OMBI Nederlands-Arabisch	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Arabisch. Bilingual lexicon with Dutch as source language and Arabic as target language.
OMBI Nederlands-Arabisch Commercieel	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Arabisch. Bilingual lexicon with Dutch as source language and Arabic as target language.
OMBI Nederlands-Deens	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Deens. Bilingual lexicon with Dutch as source language and Danish as target language.
OMBI Nederlands-Deens Commercieel	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Deens. Bilingual lexicon with Dutch as source language and Danish as target language.
OMBI Nederlands-Indonesisch	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Indonesisch. Bilingual lexicon with Dutch as source language and Indonesian as target language.
OMBI Nederlands-Indonesisch Commercieel	Bilinguaal lexicaal bestand met als brontaal Nederlands en als doeltaal Indonesisch. Bilingual lexicon with Dutch as source language and Indonesian as target language.
Oosterveld & Vuyk Juridisch Woordenboek Nederlands – Spaans II	Oosterveld & Vuyk Juridisch Woordenboek Nederlands – Spaans II is een digitaal, corpusgebaseerd woordenboek in wording. Oosterveld & Vuyk staat onder redactie van Consuelo Oosterveld-Egas Repáraz en mr. Theresa Munneke-Lourens, met medewerking van drs. Margriet Muris. Oosterveld & Vuyk Legal Dictionary Dutch - Spanish II is a digital, corpus-based dictionary in progress. Oosterveld & Vuyk is edited by Consuelo Oosterveld-Egas Repáraz and mr. Theresa Munneke-Lourens, in cooperation with drs. Margriet Muris.
Open Dutch Wordnet	Open Dutch Wordnet is een lexicale database voor het Nederlands, die 116.992 synsets bevat. Open Dutch Wordnet is a lexical database of Dutch containing 116,992 synsets.
OpenSoNaR (Online)	Online zoeksysteem voor het SoNaR-corpus, een tekstverzameling van hedendaags geschreven Nederlands dat uit meer dan 500 miljoen woorden bestaat. Het SoNaR-corpus is ook als download beschikbaar. Online search engine for the SoNaR Corpus, a text collection of contemporary written Dutch containing over 500 million words. The SoNaR corpus is also available as a download.
Oudnederlands Woordenboek - ONW (Online)	Het Oudnederlands Woordenboek is een wetenschappelijk woordenboek van het oudste Nederlands. The Oudnederlands Woordenboek is a scientific dictionary of the oldest Dutch.
PaCo-MT Parallelle Corpora	Twee (bestaande) parallelle corpora voorzien van automatisch gegenereerde syntactische annotaties en node alignments. PaCo-MT Parallel Corpora: two (existing) parallel corpora provided with automatically generated syntactic annotations and node alignments.
PAROLE-lexicon	Het PAROLE-lexicon bevat ruim 20.000 entry's, die voorzien werden van woordsoort, getal, naamval en syntactische complementatiepatronen. The PAROLE Lexicon contains over 20,000 entries, enriched with word class, number, case, and syntactic complementation patterns.
Philosophical Integrator of Computational and Corpus Libraries (PICCL)	PICCL biedt een workflow aan voor het samenstellen van corpora waarbij een aantal bestaande tools zijn samengevoegd. PICCL offers a workflow for corpus building and builds on a variety of tools.
Pinkhof Geneeskundig Woordenboek (Online)	Het Pinkhof Geneeskundig Woordenboek (Online) bevat ruim 52.000 medische termen met hun betekenissen en/of verwijzingen. The Pinkhof Medical Dictionary contains more than 52,000 medical terms with their meanings and/or references.
Referentiebestand Belgisch-Nederlands (RBBN)	Een verzameling van 4000 woorden en uitdrukkingen die typisch zijn voor het Nederlands in België. A collection of 4000 words and expressions that are typical for Belgian Dutch.
Referentiebestand Belgisch-Nederlands (RBBN) Commercieel	Een verzameling van 4000 woorden en uitdrukkingen die typisch zijn voor het Nederlands in België. A collection of 4000 words and expressions that are typical for Belgian Dutch.
Referentiebestand Nederlands (RBN)	Een verzameling van ongeveer 50.000 frequente Nederlandse woorden aangevuld met taalkundige informatie. A collection of ca. 50,000 frequently used Dutch words, enriched with linguistic information.
Referentiebestand Nederlands (RBN) Commercieel	Een verzameling van ongeveer 50.000 frequente Nederlandse woorden aangevuld met taalkundige informatie. A collection of ca. 50,000 frequently used Dutch words, enriched with linguistic information.
RND Woordenlijsten	Fonetische transcripties van dialectwoorden verzameld in Nederland en België. Oorspronkelijk gepubliceerd in de "Reeks Nederlandse Dialectatlassen". Phonetic transcriptions of dialect words collected in the Netherlands and Belgium. Originally published in the "Reeks Nederlandse Dialectatlassen".
Sepedi Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Sesotho Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Sesotho Genre Classification Corpus	Contains training and testing data for genre classification for Sesotho.
Sesotho sa Leboa Genre Classification Corpus	Contains training and testing data for genre classification for Sesotho sa Leboa.
Setswana Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Setswana Genre Classification Corpus	Contains training and testing data for genre classification for Setswana.
Siswati Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Siswati Genre Classification Corpus	Contains training and testing data for genre classification for Siswati.
SoNaR Character N-grams	Uit de bestanden van het SoNaR-corpus versie 1.2 (SONAR500) zijn n-grammen met de lengtes 1, 2 en 3 afgeleid. N-grams of lengths 1, 2 and 3 have been derived from the files of the SoNaR Corpus version 1.2 (SONAR500).
SoNaR Groot-corpus Commercieel	Het SoNaR Groot-corpus Commercieel bevat ruim 271 miljoen woorden afkomstig uit (standaard) Nederlandstalige teksten van na 1954. The SoNaR Large Corpus Commercial contains more than 271 million words from texts in standard Dutch dating from after 1954.
SoNaR Klein-corpus Commercieel	Het SoNaR Klein-corpus Commercieel bevat ongeveer 825.000 woorden tekst die semantisch geannoteerd werden. The SoNaR Small Corpus Commercial contains approx. 825,000 words of semantically annotated text.
SoNaR Nieuwe Media Corpus	Het SoNaR Nieuwe Media Corpus 1.0 bevat nieuwemediateksten (sms'en, tweets en chatberichten) die verzameld werden binnen het STEVIN-project SoNaR. The SoNaR New Media Corpus 1.0 contains texts from new media (sms, tweets and chat messages) that were collected within the STEVIN-project SoNaR.
SoNaR-corpus	Het SoNaR-corpus bevat ruim 500 miljoen woorden afkomstig uit (standaard) Nederlandstalige teksten van na 1954. The SoNaR Corpus contains more than 500 million words from texts in standard Dutch dating from after 1954.
Spoken Academic Belgian Dutch Corpus - SABeD	Het Spoken Academic Belgian Dutch Corpus bestaat uit gedeeltes van 200 colleges gegeven op Vlaamse hogescholen en universiteiten. The Spoken Academic Belgian Dutch Corpus consists of parts of 200 lectures given in higher education institutions in Flanders.
SUBCAT-lexicon (Download)	Het SUBCAT-lexicon is een uitgebreid subcategorisatielexicon voor werkwoorden (in TeX-formaat). Het bestaat uit 12.000 werkwoordsingangen met abstracte subcategorisatieschemata. The SUBCAT Lexicon is an extensive subcategorization lexicon for verbs (in TeX format). It consists of 12,000 verb entries with abstract subcategorization frames.
SumNL-samenvattingencorpus	Het SumNL-samenvattingencorpus is gebaseerd op 30 clusters. Ieder cluster bestaat uit een onderwerp en 5-25 krantenartikelen die relevant zijn voor het onderwerp. The SumNL Corpus of Abstracts is based on 30 clusters. Each cluster consists of a topic and 5-25 newspaper articles that are relevant for that topic.
Taalportaal (Online)	Taalportaal is een uitgebreide grammatica van het Nederlands, Fries en Afrikaans beschreven in het Engels. Het portaal bevat een lijst van taalkundige termen en een taalkundige bibliografie. Taalportaal wordt regelmatig geüpdatet. Taalportaal is a comprehensive grammar of Dutch, Frisian and Afrikaans written in English. The portal contains a list of linguistic terms and a linguistic bibliography. Taalportaal is updated regularly.
TermWerk (Online)	TermWerk is een webapplicatie voor termextractie, termbeschrijving en termbeheer in het Nederlands. TermWerk is a web application for term extraction, term description, and term management in Dutch.
Text2Picto (Online)	Text2Picto is een vertaaltool die bedoeld is om de communicatie voor mensen met een leesbeperking te verbeteren. Text2Picto is a translation tool aimed at enhancing communication for people with reading disabilities.
Textlens (Online)	Textlens is een online tekstverwerkingsdashboard voor taken als automatische tokenisatie, lemmatisering, tagging, named entity recognition en afhankelijkheidsanalyse voor Nederlands, Engels, Frans en Duits. Textlens is an online text processing dashboard for tasks such as automatic tokenization, lemmatization, part of speech tagging, named entity recognition and dependency analysis for Dutch, English, French and German.
The Digital Pallas (Online)	De Digitale Pallas is de digitale versie van de Comparative Dictionary of All Languages and Dialects (1790-1791), een Russisch woordenboek van Peter Simon Pallas. The Digital Pallas is the digital version of the Comparative Dictionary of All Languages and Dialects (1790-1791), a Russian dictionary by Peter Simon Pallas.
The LiLaH Emotion Lexicon of Greek, Kurdish, Turkish, Spanish, Farsi and Chinese	Een lijst met woorden in het Grieks, Koerdisch, Turks, Spaans, Farsi en Chinees (traditioneel en vereenvoudigd) en hun associaties met acht basisemoties en twee sentimenten. A list of words in Greek, Kurdish, Turkish, Spanish, Farsi and Chinese (traditional and simplified) and their associations with eight basic emotions and two sentiments.
Tshivenda Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Tshivenda Genre Classification Corpus	Contains training and testing data for genre classification for Tshivenda.
Vertaalwoordenschat (Online)	Applicatie voor tweetalige woordenboeken met Nederlands als bron- of doeltaal. Momenteel zijn de taalparen Nederlands-Nieuwgrieks, Nederlands-Portugees, Nederlands-Estisch en Nederlands-Fins gratis beschikbaar. Application for bilingual dictionaries with Dutch as a source language or target language. Dutch - Modern Greek, Dutch-Portuguese, Dutch-Estonian and Dutch-Finnish are the first language combinations available for free.
Vroegmiddelnederlands Woordenboek - VMNW (Online)	Het VMNW is een wetenschappelijk woordenboek gebaseerd op ambtelijke bescheiden en literaire teksten uit de dertiende eeuw. The VMNW is a scientific dictionary based on official documents and literary texts from the thirteenth century.
VU-DNC-corpus (Online)	Een diachroon Nederlands krantencorpus dat bestaat uit data van vijf kranten. Voor elk van de kranten is data uit twee jaren beschikbaar (1950/1951 en 2002). A diachronic Dutch newspaper corpus, consisting of data from five newspapers, covering 2 separate years (1950/1951 and 2002).
Wablieft-corpus	Het Wablieft-corpus bevat het digitaal archief van de Wablieft-krant (periode 2011-2017). The Wablieft Corpus contains the digital archive of the Wablieft paper (from 2011-2017).
WAI-NOT Corpus	Het WAI-NOT-corpus bestaat uit 874 krantenartikels, afkomstig uit de WAI-NOT-krant. De artikels zijn opgesteld in eenvoudig te lezen Nederlands en zijn afkomstig uit de periode 2009-2021. Het corpus bevat ongeveer 75.000 woorden. The WAI-NOT Corpus contains the digital archive of the WAI-NOT paper (from 2009-2021) and it contains easy-to-read articles written in Dutch. The corpus contains approximately 75,000 words.
WebCelex (Online)	Interface waarmee de lexicale CELEX-databases van het Duits, Engels en Nederlands kunnen worden geraadpleegd. Voor iedere taal zijn de lemma's aangevuld met orthografische, fonologische, morfologische en syntactische informatie en frequentiegegevens. Interface through which the CELEX lexical databases of German, English and Dutch can be consulted. For each language, the lemmas have been enriched with orthographical, phonological, morphological and syntactic information, as well as frequency data.
Woordcombinaties (Online)	Woordcombinaties toont hoe woorden gebruikt worden in voorbeeldzinnen, welke woorden typisch en/of vaak met elkaar gecombineerd worden en hoe (valentie)patronen samen met collocaties gebruikt worden voor het bouwen van zinnen. Woordcombinaties shows how words are used in example sentences, which words are typically and/or frequently combined and how (valency) patterns together with collocations are used in building sentences.
Woordenboek der Friese Taal - WFT (Online)	Het "Wurdboek fan de Fryske taal" is een wetenschappelijk, descriptief woordenboek en bevat ongeveer 120.000 lemma's. The Dictionary of the Frisian Language is a scientific, descriptive dictionary containing ca. 120,000 entries.
Woordenboek der Nederlandsche Taal - WNT (Online)	Een historisch, wetenschappelijk, beschrijvend woordenboek van het Nederlands van 1500-1976. A scientific, historical, descriptive dictionary of the Dutch language as it was written between 1500 and 1976.
Woordenboek van Nieuwe Woorden - WNW (Online)	Het Woordenboek van Nieuwe Woorden (WNW) is een online woordenboek waarin woorden die vanaf het jaar 2000 zijn ontstaan, worden beschreven. The Dictionary of New Words (WNW) is an online dictionary describing words created from the year 2000 onwards.
Woordenboek Vlaamse Gebarentaal (Woordenboek VGT)	Dit product bevat het videomateriaal uit het online Woordenboek Vlaamse Gebarentaal. In de 10.025 video's is per video een gebaar vastgelegd. This product contains the video material of the online Dictionary of Flemish Sign Language. Each of the 10,025 videos records a single sign.
Woordenlijst Nederlandse Taal (Online)	Op Woordenlijst.org vind je de Woordenlijst Nederlandse Taal: de lijst met de officiële spelling van het Nederlands. On Woordenlijst.org, you will find the Woordenlijst Nederlandse Taal: the list containing the official spelling of Dutch.
Woordpeiler (Online)	Woordpeiler toont hoe vaak woorden door de tijd heen voorkomen in teksten uit Nederlandstalige kranten vanaf 2000. Woordpeiler shows how often words appear over time in texts from Dutch-language newspapers from 2000 onwards.
Xitsonga Custom Dictionary for Government Domain	Custom dictionary developed in a spelling checker project for the Department of Arts and Culture. Contains words exclusive to the government domain or words that are not part of the official orthography of the language.
Xitsonga Genre Classification Corpus	Contains training and testing data for genre classification for Xitsonga.