New! Croatian National Corpus v3.0

Croatian National Corpus v3.0 is now online and accessible through the new Bonito2 interface at this address.

What is the Croatian National Corpus?

The Croatian National Corpus (HNK) is a systematized collection of selected texts, mainly written in contemporary Croatian, covering different media, genres, styles, fields and topics. The Corpus is accompanied by additional linguistic and non-linguistic data and is stored in a database on our server, which can be accessed with the search client program Bonito.

Who can use HNK?

HNK is publicly available for research, education and other non-commercial purposes. Commercial users should register and subscribe in order to obtain a user account and password.

The Corpus is published "as is" in searchable form, but it may be subject to changes without prior notice to users. The list of (new) sources will always be available.

Provisional access

While the Corpus is still being collected, temporary provisional access is granted without registration. This access uses a guest account with no password. However, guest access offers only limited options for generating ad hoc subcorpora according to selected criteria and for running complex queries.

How big is HNK?

At the moment, HNK has 216.8 million tokens.

How should HNK be cited?

Tadić, Marko (2009). New version of the Croatian National Corpus. In: Hlaváčková, Dana; Horák, Aleš; Osolsobě, Klara; Rychlý, Pavel (eds.) After Half a Century of Slavonic Natural Language Processing. Masaryk University, Brno, pp. 199-205.