Word frequency

JASMIN Speech Corpus

A corpus of about 115 hours of Dutch speech from young people, non-native speakers, and seniors, consisting of read text and human-machine dialogues.

JASMIN Speech Corpus (Commercial)

IFA Corpus

A database for phonetic research consisting of Dutch speech data from 8 speakers: 4 male and 4 female.

Corpus frequency lists

The 5,000 most frequent words from the Millions Corpora, the PAROLE 2004 Corpus, the Spoken Dutch Corpus (CGN), the ANW Corpus, the Eindhoven Corpus, the D-Coi Corpus, and the SoNaR Corpus.

e-Lex

A lexicon of over 200,000 lemmas and over 640,000 word forms, annotated with, among other things, part-of-speech tag, complementation pattern, semantic type, and pronunciation information.
