HNK 3: text collection
HNK text collection started: November 1998
the cheapest and quickest way: WWW
- to date: more than 80 mW (million words) from the .hr domain
- daily gain from the WWW: approx. 180,000 tokens
DTP sources
- more than 100 publications (now approx. 9 mW)
- domains:
- fiction, medicine, agronomy, law, literary theory and criticism, economics, philosophy, philology…
- lack of texts from the natural sciences
- strong imbalance towards the humanities
typing or scanning of texts is not planned