Marko Tadić


After attending the Classical High School in Zagreb, I graduated in 1987 from the Faculty of Humanities and Social Sciences (formerly the Faculty of Philosophy), University of Zagreb, where I studied general linguistics, phonetics, Latin and information sciences for the humanities.

In 1988 I started working at the Institute of Linguistics at the Faculty of Humanities and Social Sciences, University of Zagreb, where I currently work as principal researcher on project No. 0130418 of the Ministry of Science, Education and Sports of the Republic of Croatia.

In 1989 (Pisa) and 1991 (Birmingham) I specialised in corpus linguistics.

M.A. in 1992 and Ph.D. in 1994 at the Faculty of Humanities and Social Sciences, University of Zagreb, on the topic of computational processing of Croatian morphology.

In 1998 appointed assistant professor at the Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb.

From 2001 head of the Chair of Algebraic and Informatical Linguistics.

From 2003 to 2007 head of the Department of Linguistics.

In 2004 appointed associate professor at the same department.

Current lectures at the Department of Linguistics:

Current lectures in the postgraduate programme of the Faculty of Humanities and Social Sciences, University of Zagreb:

At the Jadertina Summer School in Empirical and Computational Linguistics (JSSECL), jointly organised by the University of Zadar, the Institute of Croatian Language and Linguistics and the Croatian Language Technologies Society (CLTS), I lectured:

From 1998 to 2002 taught Introduction to Computational Linguistics at the Zagreb Polytechnic.

Member of organising committees:

  • symposium Languages in Contact at the 12th World Anthropological Congress, Zagreb 1988.

  • symposium Computational Linguistics and Language Technology: Needs and Perspectives (in cooperation with the European Council), Dubrovnik 1989.

  • 7th TELRI seminar (in cooperation with the University of Birmingham and the TELRI association), Dubrovnik, 26-29 September 2002 (chair of the organising committee).

Member of programme committees of workshops:

  • Shallow Processing of Large Corpora, Corpus Linguistics 2003 conference, Lancaster (United Kingdom), 2003-03-27.

  • Morphological Processing of Slavic Languages, EACL2003 conference, Budapest (Hungary), 2003-04-13.

  • Adaptation of Automatic Learning Methods for Analytical and Flective Languages, ESSLLI2003, Vienna (Austria), 2003-08-18.

  • Information Extraction for Slavonic and other Central and Eastern European Languages, RANLP2003 conference, Borovets (Bulgaria), 2003-09-08.

  • Natural Language Understanding and Cognitive Science, ICEIS 2004 (Sixth International Conference on Enterprise Information Systems), (Portugal), 2004-04-13.

  • Language Resources and Evaluation Conference (LREC2006), Genoa (Italy), 22-28 May 2006.

  • European Semantic Web Conference (ESWC2006), Budva (Montenegro), 11-14 June 2006.

  • Language Resources and Evaluation Conference (LREC2008), Marrakech (Morocco), 26-31 May 2008.

In 1995/96 chairperson of the Zagreb Linguistic Circle.

Since 1997 member of the editorial board of Suvremena lingvistika (Contemporary Linguistics), a journal indexed in MLA, BL and LLBA.

From 1999 to 2000 member of the Board of CARNet (Croatian Academic Research Network).

From 2000 member of the Lexicographic committee of the Croatian Academy of Sciences and Arts.

Since 2005 member of the Field Council for Humanities of the Ministry of Science, Education and Sports of the Republic of Croatia.

Since 2005 member of the Council for Humanities and Social Sciences of the University of Zagreb.

President and one of the founders of the Croatian Language Technologies Society. Member of the Croatian Philological Society, the Croatian Society for Applied Linguistics, the Slovenian Language Technologies Society, the TELRI association, ACL, EAMT, IAMT and the Global WordNet Association.

Fields of interest:
computational linguistics, corpus linguistics, computational lexicography, computational morphology, SGML/XML, M(A)T

Programmes (leader):

Projects (leader):

I-projects (leader):


  • (with Milan Moguš and Maja Bratanić) Hrvatski čestotni rječnik (Croatian Frequency Dictionary), Školska knjiga–Institute of linguistics at the Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb 1999. (ISBN 953-0-40012-8)

  • Jezične tehnologije i hrvatski jezik (Language Technologies and Croatian Language), Exlibris, Zagreb 2003. (ISBN 953-6310-19-8)

Selected publications (for the complete list, see CROSBI):

  • »Zašto nam treba višemilijunski referentni korpus?« (Why Do We Need Multimillion Reference Corpus?), Informatička tehnologija u primijenjenoj lingvistici, Društvo za primijenjenu lingvistiku Hrvatske, Zagreb, 1990, pp. 95-98.

  • »Od korpusa do čestotnoga rječnika hrvatskoga književnog jezika« (From the Corpus to the Croatian Frequency Dictionary), Radovi Zavoda za slavensku filologiju 27, (1991), pp. 169-178. (ISSN 0514-5090)

  • »Problemi računalne obrade imeničnih oblika u hrvatskome« (Problems of Noun Word-form Processing in Croatian), Suvremena lingvistika 34, (1992), pp. 301-308. (ISSN 0586-0296) (in Croatian)

  • Računalna obrada morfologije hrvatskoga književnog jezika (Computational processing of the morphology of the Croatian literary language), Ph.D. thesis, Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb 1994, 160 p. (in Croatian)

  • »Natural Language Processing of Croatian and the Croatian National Corpus«, Suvremena lingvistika 41-42, (1996), pp. 603-612. (ISSN 0586-0296) (translated from Croatian)

  • »Computational Processing of Croatian Corpora: history, state-of-art and perspectives«, Suvremena lingvistika 43-44, (1997), pp. 387-394. (ISSN 0586-0296) (translated from Croatian)

  • »Računalni pogled na Šulekov rječnik znanstvenoga nazivlja« (Computational processing of Šulek's Dictionary of Scientific Terms), Zbornik o Bogoslavu Šuleku, HAZU, Zagreb 1998, pp. 149-159. (ISBN 953-154-367-4)

  • »Croatian electronic edition of the Plato's Republic« in: Erjavec, Tomaž–Lawson, Ann–Romary, Laurent: East meets West — A Compendium of Multilingual Resources, TELRI-IDS, Mannheim, 1998. (ISBN 3-922641-46-6)

  • »Raspon, opseg i sastav korpusa suvremenoga hrvatskoga jezika« (Time-span, Size and Composition of the Corpus of Croatian Contemporary Language), Filologija 30-31, (1998), pp. 337-347. (ISSN 0449-363X)

  • »Building the Croatian-English Parallel Corpus«, LREC2000 Proceedings, Athens, 31st May-2nd June 2000, ELRA, Paris-Athens 2000, Vol. I, pp. 523-530.

  • »Information Retrieval Meets Human Language Technology«, CUC2000 Proceedings, CD-ROM, Zagreb, 24-26th September 2000, CARNet, Zagreb 2000. (ISBN 953-6802-01-5)

  • »Uporaba XML-a u hrvatskim korpusima« (The Use of XML in Croatian Corpora), CroInfo2000 – Upravljanje informacijama u gospodarstvu i znanosti/Information Management in Economy and Science, Proceedings, Dubrovnik, 16-18th October 2000, Nacionalna i sveučilišna knjižnica-Pliva, Zagreb 2000, pp. 132-137. (ISBN 953-6000-89-X)

  • (with Vesna Požgaj-Hadži) »Hrvatsko-slovenski paralelni korpus« (Croatian-Slovene Parallel Corpus), in: Erjavec, Tomaž–Gros, Jerneja (eds.) Jezikovne tehnologije / Language Technologies 2000, proceedings, Ljubljana, 17-18th October 2000, Institut Jožef Stefan, Ljubljana 2000, pp. 70-74. (ISBN 961-6303-25-2)

  • »Procedures in Building the Croatian-English Parallel Corpus«, International Journal of Corpus Linguistics, special issue, Vol. 0(0), 2001, pp. 107-123.

  • (with Ivana Simeon) »Building Croatian Language Technologies Portal«, CUC2001 Proceedings, CD-ROM, Zagreb, 24-26 September 2001, CARNet, Zagreb 2001. (ISBN 953-6802-XX-X)

  • (with Krešimir Šojat) »Identifikacija prijevodnih ekvivalenata u hrvatsko-engleskom paralelnom korpusu« (Identification of Translational Equivalents in Croatian-English Parallel Corpus), Filologija 38-39, (2002), pp. 247-262. (ISSN 0449-363X)

  • »Building the Croatian National Corpus«, LREC2002 Proceedings, Las Palmas, 27th May-2nd June 2002, ELRA, Paris-Las Palmas 2002, Vol. II, pp. 441-446.

  • (with Tomaž Erjavec, Cvetana Krstev, Vladimir Petkevič, Kiril Simov and Duško Vitas) »The MULTEXT-East Morphosyntactic Specifications for Slavic Languages« in: Proceedings of the EACL2003 Workshop on Morphological Processing of Slavic Languages (Budapest 2003), ACL, pp. 25-32 (ISBN 1-932432-02-7).

  • »Building the Croatian Morphological Lexicon«, in: Proceedings of the EACL2003 Workshop on Morphological Processing of Slavic Languages (Budapest 2003), ACL, pp. 41-46.

  • (with Božo Bekavac) »Preparation of POS tagging of Croatian using CLaRK System« in: Proceedings of RANLP2003 Conference (Borovets 2003), Bulgarian Academy of Sciences, pp. 455-459 (ISBN 954-90906-6-3)

  • (with Krešimir Šojat) »Finding Multiword Term Candidates in Croatian« in: Proceedings of IESL2003 Workshop (Borovets 2003), Bulgarian Academy of Sciences, pp. 102-107 (ISBN 954-8361-06-X)

  • (with Sanja Fulgosi and Krešimir Šojat) »The Applicability of Lemmatization in Translational Equivalents Detection« in: Barnbrook, Geoff; Danielsson, Pernilla; Mahlberg, Michaela (eds.) Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora, Continuum Books, London-New York 2004, pp. 195-206 (ISBN 082647490X)

  • (with Antoni Oliver) »Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora«, in: LREC2004 Proceedings, Lisbon, 24-30 May 2004, ELRA, Paris-Lisbon 2004, Vol. IV, pp. 1259-1262 (ISBN 2-9517408-1-6).

  • (with Božo Bekavac, Petya Osenova and Kiril Simov) »Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian«, in: LREC2004 Proceedings, Lisbon, 24-30 May 2004, ELRA, Paris-Lisbon 2004, Vol. IV, pp. 1187-1190 (ISBN 2-9517408-1-6).

  • (with Krešimir Šojat and Božo Bekavac) »Zašto nam treba hrvatski WordNet?« (Why do we need Croatian Wordnet?) in: Granić, Jagoda (ed.) Semantika prirodnog jezika i metajezik semantike, HDPL 2004 Proceedings, Zagreb-Split 2005, pp. 733-743 (ISBN 953-96391-6-6).

  • »Developing the Croatian National Corpus and Beyond« in: Grzybek, Peter (ed.) Contributions to the Science of Text and Language. Word Length Studies and Related Issues, Kluwer, Dordrecht 2006, pp. 295-300. (ISBN 1-4020-4067-9)

  • (with Željko Agić) »Evaluating Morphosyntactic Tagging of Croatian Texts« in: LREC2006 Proceedings, Genoa, 23-25 May 2006, ELRA, Genoa-Paris 2006. (ISBN 2-9517408-2-4)

  • (with Božo Bekavac) »Inflectionally Sensitive Web Search in Croatian using Croatian Lemmatization Server« in: Lužar-Stiffler, Vesna & Hljuz Dobrić, Vesna (eds.) Proceedings of ITI2006 Conference, SRCE, Zagreb 2006, pp. 481-486. (ISBN 953-7138-05-4, ISSN 1330-1012)

  • »Croatian Lemmatization Server« in: Koeva, Svetla & Dimitrova-Voulchanova, Mila (eds.) Proceedings of the 5th Formal approaches to South Slavic and Balkan languages Conference (FASSBL2006), Bulgarian Academy of Sciences, Sofia 2006, pp. 140-146.

  • »Croatian Dependency Treebank in Multilingual Context« in: Slavcheva, Milena; Angelova, Galia; Simov, Kiril (eds.). Readings in Multilinguality. Selected papers for young researchers, Bulgarian Academy of Sciences, Sofia 2006, pp. 125-128. (ISBN 954-91743-6-0)

  • (with Zdravko Dovedan, Ida Raffaelli, Sanja Seljan and Bojana Dalbelo Bašić) »Computational Linguistic Models and Language Technologies for Croatian« in: Lužar-Stiffler, Vesna & Hljuz Dobrić, Vesna (eds.) Proceedings of ITI2007 Conference, SRCE, Zagreb 2007, pp. 521-528. (ISBN 953-7138-05-4, ISSN 1330-1012)

  • (with Božo Bekavac) »Implementation of Croatian NERC system« in: Piškorski, Jakub; Tanev, Hristo; Pouliquen, Bruno; Steinberger, Ralf (eds.) Proceedings of the Workshop on Balto-Slavonic Natural Language Processing 2007, ACL, Prague 2007, pp. 11-18. (ISBN 978-1-932432-88-6)

  • »Building the Croatian Dependency Treebank: the initial stages«, Suvremena lingvistika 63, (2007) pp. 85-92. (ISSN 0586-0296)

Invited lectures and conferences:

  • »Napredak u radu na Hrvatskom nacionalnom korpusu« (The Progress in Developing Croatian National Corpus), Drugi hrvatski slavistički kongres/2nd Croatian Slavistic Congress, Osijek, 14th-18th September 1999.

  • »Hrvatski čestotni rječnik« (Croatian Frequency Dictionary), invited lecture, Faculty of Philosophy, University of Ljubljana, 26th October 1999.

  • »Procedures in building Croatian-English parallel corpus«, 4th TELRI seminar, Bratislava, 4th-7th November 1999.

  • »Corpus-building projects in Institute of Linguistics, Faculty of Philosophy, Univ. of Zagreb«, invited lecture, University of Tübingen, 1st December 1999.

  • »Korpusni projekti u Zavodu za lingvistiku Filozofskoga fakulteta Sveučilišta u Zagrebu« (Corpus-building projects in Institute of Linguistics, Faculty of Philosophy, Univ. of Zagreb), Hrvatsko društvo za primijenjenu lingvistiku/Croatian Association for Applied Linguistics, annual conference, Opatija 19-20th May 2000.

  • (with Vesna Požgaj-Hadži) »Slovensko-hrvatski paralelni korpus« (Slovene-Croatian Parallel Corpus), Hrvatsko društvo za primijenjenu lingvistiku/Croatian Association for Applied Linguistics, annual conference, Opatija, 19-20th May 2000.

  • (with Krešimir Šojat) »Possibilities of Identification of Translation Equivalents in a Parallel Corpus«, 5th TELRI seminar, Ljubljana, 20-24th September 2000.

  • »Encoding Croatian Corpora«, invited lecture, University of Tübingen, 22nd February 2001.

  • »Developing Croatian National Corpus and beyond«, Wortlängen in Texten, Karl-Franzens University, Graz, 2002-06-23.

  • »Croatian National Corpus and its Tagging«, Slovak National Corpus, Ľudovít Štúr Linguistic Institute, Slovak Academy of Sciences, Bratislava, 2003-12-01.

  • (with Bojana Dalbelo Bašić) »Computer Aided Document Indexing System (CADIS) with Eurovoc«, EUROVOC Conference 2006, European Parliament, Brussels, Belgium, 2006-03-10.

  • (with Bojana Dalbelo Bašić and Marie-Francine Moens) »Computer Aided Document Indexing System (CADIS) with Eurovoc«, Workshop "Toegang Tot De Wet" (Access to the Law), Katholieke Universiteit Leuven, Belgium, 2007-05-22.

  • »Language Technologies: a Happy Marriage between Linguistics and Informatics«, European Computer Science Summit - ECSS 2009, Paris, France, 2009-10-09.

Did you know that the French word 'cravate' ('tie') comes from 'Croatian'?
