Structure and sources

Ideal structure

The ideal structure of the text samples with regard to medium, text type, genre, field and topic is presented in the table below:

Text type %
Informative texts 74
  Newspapers 37
    daily 22
    weekly  9
    bi-weekly  6
  Magazines, journals 16
    weekly  9
    monthly  4
    bi-, tri-monthly  3
  Books, brochures, correspondence... 21
    publicistics  4
    popular texts 3.5
    correspondence, ephemera 0.5
    arts and sciences 13
Imaginative texts (fiction): prose 23
    novels 13
    stories  5
    essays  4
    diaries, (auto)biographies...  1
Mixed texts  3

Since the compilation of the Corpus is still in progress, the desired balance has not yet been reached. Once enough source texts have been collected to fill each of the constituents outlined above, the final selection of text samples will be made, with an overall size of at least 100 million tokens (100 Mw). In the meantime, almost everything that has been collected will be open to public access and querying. We judged it important to offer public access to HNK even in this unbalanced form, because doing so would establish it more quickly as the source of primary linguistic data for Croatian, make that data more accessible to domestic and foreign universities and their departments of Croatian and/or Slavic studies, and, finally, raise the general level of Croatian linguistic culture.
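
To make the proportions in the table concrete, the following Python sketch converts the ideal percentages into token targets. It assumes the final corpus has exactly the stated minimum size of 100 million tokens; the names IDEAL_SHARES, share and token_targets are illustrative and are not part of HNK or its tools.

```python
# Ideal shares (percent of the whole corpus), copied from the table above.
IDEAL_SHARES = {
    "Informative texts": {
        "Newspapers": {"daily": 22, "weekly": 9, "bi-weekly": 6},
        "Magazines, journals": {"weekly": 9, "monthly": 4, "bi-, tri-monthly": 3},
        "Books, brochures, correspondence...": {
            "publicistics": 4,
            "popular texts": 3.5,
            "correspondence, ephemera": 0.5,
            "arts and sciences": 13,
        },
    },
    "Imaginative texts (fiction): prose": {
        "novels": 13,
        "stories": 5,
        "essays": 4,
        "diaries, (auto)biographies...": 1,
    },
    "Mixed texts": 3,
}

# Assumption: the final size equals the stated minimum of 100 Mw.
TARGET_TOKENS = 100_000_000


def share(node):
    """Return the percentage of a leaf, or the summed percentage of a subtree."""
    if isinstance(node, dict):
        return sum(share(child) for child in node.values())
    return node


def token_targets(node, total=TARGET_TOKENS, prefix=""):
    """Yield (path, percent, tokens) for every row of the table."""
    for name, child in node.items():
        path = f"{prefix}{name}"
        pct = share(child)
        yield path, pct, round(total * pct / 100)
        if isinstance(child, dict):
            yield from token_targets(child, total, prefix=path + " / ")


if __name__ == "__main__":
    for path, pct, tokens in token_targets(IDEAL_SHARES):
        print(f"{path}: {pct}% = {tokens:,} tokens")
```

Summing the leaves also serves as a consistency check: the shares add up to 100, and each group total in the table (74, 37, 16, 21, 23) equals the sum of its parts.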

Provisional structure (HNK v 2.0)

At present, HNK encompasses the following subcorpora:

Text typology

HNK.M         medium
HNK.M.G        spoken
HNK.M.E        electronic...
HNK.M.P        written
HNK.M.P.O       published
HNK.M.P.O.B      brochure
HNK.M.P.O.K      book
HNK.M.P.O.N      newspapers
HNK.M.P.O.N.D     daily
HNK.M.P.O.N.T     weekly
HNK.M.P.O.N.2     bi-weekly
HNK.M.P.O.R      magazines, journals
HNK.M.P.O.R.T     weekly
HNK.M.P.O.R.M     monthly
HNK.M.P.O.R.V     bi-, tri-monthly
HNK.M.P.N       not published
HNK.M.P.N.J      public
HNK.M.P.N.I      internal
HNK.M.P.N.O      personal

HNK.V       type, genre
HNK.V.N      non-fiction
HNK.V.N.Z     sciences
HNK.V.N.Z.P    natural sciences...
HNK.V.N.Z.T    technical sciences...
HNK.V.N.Z.M    biomedical sciences...
HNK.V.N.Z.B    biotechnical sciences...
HNK.V.N.Z.D    social sciences...
HNK.V.N.Z.H    humanities...
HNK.V.N.S     expert texts
HNK.V.N.S.Z    travels
HNK.V.N.S.K    criticism
HNK.V.N.S.M    media
HNK.V.N.S.C    criminalistics
HNK.V.N.S.S    sports
HNK.V.N.S.P    politics
HNK.V.N.S.E    ecology, bioethics etc.
HNK.V.N.N     non-expert texts
HNK.V.N.N.P    publicistics
HNK.V.N.N.O    popular texts
HNK.V.N.N.E    ephemera

HNK.V.U      fiction
HNK.V.U.P     prose...
HNK.V.U.D     drama...
HNK.V.U.S     poetry...

HNK.V.M      mixed texts
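
The dotted codes above form a hierarchy in which each additional segment narrows the subcorpus. As a minimal sketch of how such codes could be handled programmatically, the Python fragment below finds the parent of a code, lists the codes beneath it, and spells a code out as its chain of labels. The SUBCORPORA mapping holds only a handful of codes copied from the listing, and the helper names parent, descendants and label_path are hypothetical; this does not describe the actual HNK query interface.

```python
# A few codes copied from the listing above; the rest follow the same pattern.
SUBCORPORA = {
    "HNK.M": "medium",
    "HNK.M.P": "written",
    "HNK.M.P.O": "published",
    "HNK.M.P.O.N": "newspapers",
    "HNK.M.P.O.N.D": "daily",
    "HNK.M.P.O.N.T": "weekly",
    "HNK.M.P.O.N.2": "bi-weekly",
    "HNK.M.P.O.R": "magazines, journals",
    "HNK.M.P.O.R.T": "weekly",
    "HNK.V": "type, genre",
    "HNK.V.U": "fiction",
    "HNK.V.M": "mixed texts",
}


def parent(code):
    """Return the code one level up, or None for the root 'HNK'."""
    head, _, _ = code.rpartition(".")
    return head or None


def descendants(code, table=SUBCORPORA):
    """Return all listed codes that lie strictly below `code`."""
    prefix = code + "."
    return sorted(c for c in table if c.startswith(prefix))


def label_path(code, table=SUBCORPORA):
    """Spell out a code as the chain of labels from the top level down."""
    parts = code.split(".")
    # Skip the bare root "HNK", which has no label of its own in the listing.
    codes = [".".join(parts[:i]) for i in range(2, len(parts) + 1)]
    return " > ".join(table.get(c, "?") for c in codes)
```

For example, label_path("HNK.M.P.O.N.D") gives "medium > written > published > newspapers > daily". Note that a trailing segment such as T (weekly) recurs under both newspapers and magazines, so a code is only unambiguous when read in full.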