TELRI

TRACTOR

TELRI Research Archive of Computational Tools and Resources
TRACTOR Archive
www.tractor.bham.ac.uk

To follow the links below and access the resources, you need to be a registered user. Find out how to register.

See the latest acquisitions.

Here are some shortcuts to the resources listed by language below:
Bulgarian Croatian Czech Dutch English Estonian French
Finnish German Greek Hungarian Italian Latvian Lithuanian
Polish Romanian Russian Serbian Slovak Slovene Swedish
Turkish Ukrainian Uzbek
Multilingual resources


Bulgarian

  • POS tagged corpus

    2460 Bulgarian sentences marked up with part-of-speech information (BTB-POS Corpus I). The corpus is in an XML format that does not follow TEI or CES; a DTD is included. Available in three different encodings of Cyrillic letters: ISO 8879:1986, MS Windows, and Unicode.

    Resource provider: Kiril Iv. Simov, Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia.

    Browse the files

  • Corpus of Bulgarian Texts

    Corpus of texts in Bulgarian, representing text genres such as news, legal, and poetry. Encoded with SGML according to the Corpus Encoding Standard (CES). Approximately 275 000 words.

    The corpus includes the following texts:

    1. A selection of newspaper articles from "24 Hours", 1996
    2. A selection of newspaper articles from 'Zemedelsko zname' ("Agrarian Flag"), 1996
    3. A collection of 42 newspaper articles on Soros' "Open Society" Fund
    4. A selection of literary texts: a part of the novel "Love at the Age of Sclerosis" by Natasha Manolova; a part of the novel "The Big Fraud" by Vesela Lyutskanova; 11 short stories and a novella by Asen Sirakov; 12 short stories from the book "Cyclops' Eye" by Todor Velchev; a part of Snezhana Snegovana's novella "The Fiery Violin"
    5. A selection of poems from "We Are a Hopeless Case" by Miryana Basheva; a collection of modern Bulgarian love poetry "Love - a Reality of Magic" (many authors)
    6. Zhelyu Zhelev "Fascism" (2 chapters)
    7. Polya Goleva "Bulgarian Insurance Law"
    8. An unpublished sociological study about Bulgaria
    9. Bulgarian Fiction - 2 novels: Emilia Dvoryanova 'PASSION ili smy1rtta na Alisa' ("Passion or the Death of Alice"), Julia Berberyan 'Iskam, vyarvam, moga' ("I want, I believe, I can")
    10. Newspapers: a few issues of 'Capital' and 'Continent' (1996)

    Restrictions: not available to industrial users. Please contact the resource provider to negotiate licensing.

    Resource provider: Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria. Contact details

    Browse or Download everything (gzipped tar archive). 

  • Bulgarian, English and French parallel translation texts

    MS Word files containing source and target text on alternate lines. There are 20 files in different language pairs.

    Restrictions: Not available to industrial users. Please contact the resource provider to negotiate licensing.

    Resource provider: Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria. Contact details

    Browse or Download everything (gzipped tar archive). 
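
    The alternate-line layout described above can be turned into sentence pairs with a few lines of code. The following is only a minimal sketch: it assumes the MS Word files have first been exported to plain text (a step not covered here), that the first non-blank line is source text and the next its translation, and that blank lines carry no meaning. The function name `pair_alternate_lines` is hypothetical and not part of the archive.

```python
from typing import Iterable, List, Tuple


def pair_alternate_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair source and target text stored on alternating lines.

    Assumption (not confirmed by the archive): the first non-blank line is
    source text, the next one is its translation, and so on.
    """
    # Drop blank lines and surrounding whitespace before pairing.
    kept = [line.strip() for line in lines if line.strip()]
    if len(kept) % 2 != 0:
        raise ValueError("odd number of non-blank lines; cannot pair them up")
    # Even positions hold source lines, odd positions hold target lines.
    return list(zip(kept[0::2], kept[1::2]))
```

    An odd number of lines raises an error rather than silently dropping text, since a missing line would shift every later pair.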

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Texts and table of alignments.

Croatian

  • ZAG-ELAN Croatian Corpus

    1.87 million word corpus of written Croatian comprising texts from the leading Croatian daily newspaper Večernji list, encoded with TEI-conformant SGML.

    Resource provider: Marko Tadic, Institute of Linguistics, Philosophical Faculty, University of Zagreb. Contact details

    Browse or download everything (gzipped tar file).

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Marko Tadic, Institute of Linguistics, Philosophical Faculty, University of Zagreb. Contact details

    Texts and table of alignments.

Czech

  • Czech ELAN Corpus

    Texts from Lidové noviny newspaper from 1994 and various ephemera.

    Resource provider: Computational Fund of the Czech Language, Charles University, Prague, Czech Republic. Contact details

    Browse or Download everything (gzipped tar archive).

  • Newspaper corpus of Czech

    5 million word newspaper corpus of Czech, and other miscellaneous Czech corpus files.

    Resource provider: Computational Fund of the Czech Language, Charles University, Prague, Czech Republic. Contact details

    Browse or Download everything (gzipped tar archive).

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Computational Fund of the Czech Language, Charles University, Prague, Czech Republic. Contact details

    Texts and table of alignments.

Dutch

  • Materiaalverzameling Noord corpus data

    300 000 words with TEI conformant markup and POS tagging.

    Resource provider: Institute for Dutch Lexicology, Leiden, Netherlands. Contact details

    Browse or download everything (gzipped tar archive). 

  • Jeugdjournaal corpus ("Youth Journal")

    September issues of 1992, 1993, 1994 and 1995. Parole TEI conformant markup, c. 93000 words.

    Resource provider: Institute for Dutch Lexicology, Leiden, Netherlands. Contact details

    Browse or download everything (gzipped tar archive). 

English

  • The East African Component of The International Corpus of English (ICE-EA).

    Corpus of written and spoken English of Tanzania and Kenya. The files are available as plain ASCII, as prepared for use with Wordsmith Tools or as RTF files. See the website and manual listed below for more information.

    Resource provider: Josef Schmied, REAL Centre, Department of English, Chemnitz University of Technology.

    Visit the ICE-East Africa website for more information and online searches.

  • Lampeter Corpus of Early Modern English Tracts

    See the online manual for further information.

    Resource provider: Josef Schmied, REAL Centre, Department of English, Chemnitz University of Technology.

    Browse the available files.

  • EU enlargement corpus

    Journalism articles about EU enlargement, c. 600,000 words.

    Resource provider: Martin Wynne, Centre for Corpus Linguistics, Department of English, University of Birmingham.

    Contact details.

    Access the resources

  • Free Britain Corpus

    A corpus of recent texts written by Eurosceptics about Britain and the European Union, containing approx. 2 million words.

    Resource provider: Wolfgang Teubert, Institut für Deutsche Sprache, Mannheim, Germany (now University of Birmingham: email teubertw@hhs.bham.ac.uk).

    Browse the corpus files or download everything

  • Speech, Thought and Writing Presentation Corpus

    This is a corpus of modern British English narrative texts. There are approximately 250,000 words, and the texts are 2,000-word samples from printed works, representing news, fiction and biography (including autobiography). Forms of speech, thought and writing presentation have been manually annotated in the corpus. The annotation scheme is documented in the handbook.

    Resource provider: Elena Semino, Mick Short and Martin Wynne at the Department of Linguistics and Modern English Language, Lancaster University, Lancaster LA1 4YT. Contact details (see also the corpus header for further contact details).

    Read the handbook or browse the corpus files

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Corpus Research, University of Birmingham, Birmingham, B15 2TT. Contact details

    Texts and table of alignments.

  • Texts from US Army Center of Military History

    Texts about the Gulf War, in HTML format, approx. 2.2 million words.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse or download everything (gzipped tar archive). 

  • Texts from US Army Foreign Military Studies Office

    HTML format.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from North Atlantic Treaty Organization (also in French and German)

    HTML format.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from European Free Trade Association (also in German)

    MS Word and HTML files.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from the US Government

    Various texts on the subject of defense, in HTML.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • English word list

    English word list, split into 4 files, 109,582 words long, plain text.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from Deutsche Bundesregierung (also in French and German)

    Texts from Deutsche Bundesregierung (German Federal Government), Bonn and Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French and English as Word documents. 

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from World Intellectual Property Organization (also in French)

    Intellectual Property and Copyright magazine in French and English versions, in MS Word files. 

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Universal Copyright Convention

    HTML file, 8000 words in English, 1971.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Articles in English from Le Monde Diplomatique

    HTML files.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

Estonian

  • Corpus of Estonian

    1 million word corpus of Estonian, with TEI conformant markup.

    Contains 340 files from Estonian journals and books of 1983-1987 (mostly 1985) covering all classes of the Universal Decimal Classification with the exception of fiction. The variety of themes is reflected in the ranked list of source journals: 'Sotsialistlik Põllumajandus' ("Socialist Agriculture"), 'Teater. Muusika. Kino', 'Eesti Kommunist', 'Tehnika ja Tootmine' ("Engineering and Industry"), 'Nõukogude Naine' ("Soviet Woman"), 'Eesti Loodus' ("Estonian Nature"), 'Horisont', 'Looming' ("Creativity"), 'Kunst', 'Kultuur ja Elu' ("Culture and Life"), 'Nõukogude Õigus' ("Soviet Justice"), 'Noorus' ("Youth"). Among excerpts from books the most popular themes are geography of Estonia, Estonian Encyclopedia, legal documents, medicine, agriculture, biology, sports, economics, religion and linguistics.

    Resource provider: Department of Computer Science and Department of General Linguistics, University of Tartu, Tartu, Estonia. Contact details

    Browse or download everything (gzipped tar archive).

Finnish

  • Finnish poetry

    A complete collection of poems by Aleksis Kivi in one ASCII file, its KWIC concordance, and frequency list.

    Resource provider: Kimmo Kettunen. Contact details

    Browse or download everything (gzipped tar archive).

  • Databases of words

    A database of Finnish word forms with category information (17,000 words) and a list of basic verb forms (2,000 words) in ASCII files.

    Resource provider: Kimmo Kettunen. Contact details

    Download 17000.zip or 2000verbs.zip

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Anna Mauranen, Savonlinna University, Finland and Laurent Romary, LORIA, France.

    Texts and table of alignments

French

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Centre of Computational Linguistics, University Vytauti Magni, Kaunas, Lithuania. Contact details

    Texts and table of alignments

  • Texts from the German Embassy in Paris (also in German)

    Texts from Centre d'Information et de Documentation de l'Ambassade de la République Fédérale d'Allemagne, Paris, France in German and French, in HTML.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from Deutsche Bundesregierung (also in English and German)

    Texts from Deutsche Bundesregierung (German Federal Government), Bonn and Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French and English as Word documents. 

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from Swiss Government (also in German and Italian)

    Documents relating to the reform of the federal constitution (all HTML).

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse online

  • Texts from North Atlantic Treaty Organization (also in English and German)

    HTML format.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from World Intellectual Property Organization (also in English)

    Intellectual Property and Copyright magazine in French and English versions, in MS Word files.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse.

German

  • PAROLE Corpus

    Approx. 20 million words, TEI conformant markup, some tagged text.

    Browse or download (gzipped file). 

  • Texts from Deutsche Bundesregierung (also in English and French)

    Texts from Deutsche Bundesregierung (German Federal Government), Bonn and Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French and English as Word documents.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Proceedings from the Deutscher Bundestag

    Proceedings of debates in the Deutscher Bundestag, Bonn, Germany (file encoding not known).

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from North Atlantic Treaty Organization (also in English and French)

    HTML format.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from Rheinischer Merkur

    Texts from Rheinischer Merkur (German Weekly Newspaper). Sorry, not yet documented.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse

  • Texts from European Free Trade Association (also in English)

    Texts from European Free Trade Association (EFTA), Geneva, Switzerland in English and German. Mixture of MS Word and HTML files.

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details.

    Browse

  • Texts from Swiss Government (also in French and Italian)

    Documents relating to the reform of the federal constitution (all HTML).

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse online

  • German Texts of the Gutenberg Archive

    Web pages in German.

    Resource provider: Projekt Gutenberg-DE. Contact details.

    Browse online.

Greek

  • Greek government press releases

    586 text files, encoded in Windows Greek code page.

    Resource provider: Philip King, English For International Students Unit, Department of English, University of Birmingham. Contact details.

    Browse the files or download everything (gzipped tar file, 3.6 Mb).

Hungarian

  • Online Corpus of spoken Hungarian

    More than 250 files from interviews undertaken for sociolinguistic research. Transcriptions and digitised sound files.

    Resource provider: Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary. Contact details

    Browse in Hungarian or in English

  • Early 19th century Hungarian poetry

    Including the works of János Arany, Sándor Petőfi, Ferenc Kölcsey and Mihály Vörösmarty.

    Resource provider: Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary. Contact details

    Browse or download (gzipped tar file). 

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary. Contact details

    Texts and table of alignments

Italian

  • Italian newspapers

    Newspaper texts of La Repubblica (26m words) and La Stampa (11m words) in ASCII files.

    Resource provider: Francois Esvan.

    Browse or download.

  • Italian Reverse lexicon

    Italian Reverse lexicon of 101,655 words in RTF and ASCII files.

    Resource provider: Rodolfo Delmonte. Contact details

    Browse the files.

  • Italian literary texts

    Italian literary texts, some originals, some translations, in 23 ASCII text files, approximately 685,000 words.

    The following texts are available:

    1. Dino Buzzati (1906-1972) 'Il deserto dei tartari' ("The Desert of Tartars")
    2. Two plays by Luigi Pirandello (1867-1936)
    3. Lewis Carroll (1832-1898) 'Alice nel paese delle meraviglie' ("Alice's Adventures in Wonderland"). Translation by Elda Bossi.
    4. F. Scott Fitzgerald (1896-1940) "The Great Gatsby"
    5. Six popular science texts (translated from English)
    6. 225 articles in 'La Stampa' newspaper, translated mostly from British ('The Guardian', 'The Observer'), French ('Le Monde', 'Liberation'), American ('The New York Times', 'Los Angeles Times') newspapers. Files stampa1 (for 1991), stampa2 (for 1992), stampa3 (for 1993-94)
    7. Five files of translations of French 'bandes dessinées'.

    Resource provider: LORIA, Nancy, France. Contact details

    Browse or download (gzipped tar file). 

  • Texts from Swiss Government (also in French and German)

    Documents relating to the reform of the federal constitution (all HTML).

    Resource provider: Institut für Deutsche Sprache, Mannheim, Germany. Contact details

    Browse online

Latvian

Lithuanian

  • Samples from a Lithuanian Corpus

    Texts of Lithuanian magazines and newspapers. ASCII text files. For information on the full 56 million word corpus, contact the resource provider.

    Resource provider: Centre of Computational Linguistics, University Vytauti Magni, Kaunas, Lithuania. Contact details

    Browse or download everything (gzipped tar archive). 

  • Corpus of Lithuanian Philosophical Texts

    Texts of 15 philosophical works. 1.5 million words, Parole-conformant SGML.

    The texts available here are the following:

    1. Arvydas Sliogeris 'Konservatoriaus ispazintys' ("Confessions of a conservator")
    2. Arvydas Sliogeris 'Niekio vardai: Septyni antropotopijos etiudai'
    3. Arvydas Sliogeris 'Pamatiniai filosofijos klausimai' ("Fundamentals of philosophy")
    4. R. Ozolas 'Issivadavimas' ("Liberation")
    5. Aristotelis 'Politika'
    6. Viljamas Dzeimsas 'Pragmatizmas' (W.James, 1842-1910)
    7. David Hume (1711-1776) 'Zmogaus proto tyrinejimas' ('An Enquiry Concerning Human Understanding') 
    8. George H. Sabine & Thomas L. Thorson 'Politiniu teoriju istorija' ('History of political theories')
    9. Lawrence A. Scaff 'Verziantis is gelezinio narvo'
    10. Simone de Beauvoir (1908-1986) 'Antroji lytis' ("Le deuxième sexe / The Second Sex")
    11. Emanuelis Munje 'Personalizmas' (E.Mounier, 1905-1950)
    12. Friedrich Wilhelm Joseph Schelling (1775-1854) 'Laisves filosofija' ("Philosophy of Freedom")
    13. Georg Wilhelm Friedrich Hegel (1770-1831) 'Dvasios fenomenologija' ("Phenomenology of Spirit")
    14. Friedrich Nietzsche 'Linksmasis mokslas' ("Die Fröhliche Wissenschaft / The Gay Science")
    15. Janos Kis 'Siuolaikine politine filosofija: antologija' ("Contemporary political philosophy").

    Resource provider: Centre of Computational Linguistics, University Vytauti Magni, Kaunas, Lithuania. Contact details

    Browse or download everything (gzipped tar archive). 

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Texts and table of alignments

Polish

  • Polish Newspaper Corpus

    From the Gazeta Wyborcza newspaper in 1998, 2 million words, Parole-conformant SGML.

    Resource provider: PELCRA, Department of English, Lodz University, Poland. Contact details

    Browse or download (gzipped tar archive). 

  • The works of Adam Mickiewicz

    The poetical works of Adam Mickiewicz. Full texts of Polish national poet Adam Mickiewicz (1798-1855), including verses, longer poems, dramatic pieces and 'Pan Tadeusz'.

    Resource provider: Computer Fund of the Russian Language, Institute of Russian Language, Russian Academy of Sciences, Moscow, Russia. Contact details.

    Browse

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Texts and table of alignments

Romanian

  • Orwell's 1984

    English and Romanian versions of Orwell's 1984 in an aligned parallel text, in HTML format. The alignment has been manually checked. This resource was created for the MULTEXT-EAST project.

    Resource provider: Center for Advanced Research in Machine Learning, NLP and Cognitive Modelling, Academy of Sciences, Bucharest, Romania. Contact details

    Browse

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Dan Tufiş, Center for Artificial Intelligence, NLP division, Romanian Academy. Contact details

    Texts and table of alignments

  • Plato's Republic

    Vertical (one word per line) in Romanian with part of speech (POS) annotation (More documentation is on the way!).

    Resource provider: Center for Advanced Research in Machine Learning, NLP and Cognitive Modelling, Academy of Sciences, Bucharest, Romania. Contact details

    Browse.

Russian

  • Database of Russian Proper names

    The database consists of geographic names, personal names, diminutives, patronymics and last names: 11501 normalized entry words (lemmas) plus inflexion paradigms that generate 163462 word forms (inflections).

    Resource provider: Serge A. Yablonsky, Russicon Company. Contact details

    Browse.

  • Computer Fund of the Russian Language

    Mirror site for the CFRL collection of Russian Texts, with the following works:
    1. Nikolai V. Gogol (1809-1852). Complete prose. Three collections of tales and stories: 'Vechera na hutore bliz Dikan'ki' ("Evenings on a Farm near Dikanka") (1831-32), including "Sorochintcy Fair", "Christmas Night", "A May Night", "Terrible Revenge", etc; "Mirgorod" (1835) - two variants of "Taras Bul'ba", "Old Style Landlords", "Viy", "How Ivan Ivanovich Quarreled with Ivan Nikiforovich"; "Peterburgskie povesti" ("Petersburg Tales") (1835-42), including two variants of "The Portrait", "Nevski Prospect", "The Nose", "The Greatcoat"; and "Mertvye dushi" ("Dead Souls"), a comic epic.
    2. Ivan A. Goncharov (1812-1891) Two novels: 'Oblomov' (1859) and 'Obryv' ("Precipice") (1869).
    3. Mihail Yu. Lermontov (1814-1841). Prose works: 'Vadim' (1832), a novel; 'Knyaginya Ligovskaya' ("Princess Ligovskaya") (1836); 'Geroy nashego vremeni' ("A Hero of Our Times") (1840), a novel; and two short stories.
    4. Ivan S. Turgenev (1818-83). Prose works - 7 novels and some shorter pieces: 'Rudin' (1856); 'Dvoryanskoe gnezdo' ("A Nest of Gentlefolk") (1859); 'Nakanune' ("On the Eve") (1860); 'Otcy i deti' ("Fathers and Sons") (1862); 'Dym' ("Smoke") (1867); 'Veshnie Vody' ("Torrents of Spring") (1870); 'Nov' ("Virgin Soil") (1877). 'Povesti', including "Faust", "Asya", "First Love", "Brigadier", "A King Lear of the Steppe", "Three meetings", "Clara Milich".
    5. Fyodor M. Dostoevsky (1821-1881). Complete prose (33 texts), including: 'Bednye lyudi' ("Poor Folk") (1846); 'Dvoinik' ("The Double") (1846); 'Belye nochi' ("White Nights") (1848); 'Netochka Nezvanova' (1848); 'Dyadyushkin son' ("Uncle's Dream") (1858); 'Selo Stepanchikovo i ego obitateli' ("The Village of Stepanchikovo and its Inhabitants") (1858); 'Zapiski iz mertvogo doma' ("Notes from the House of the Dead") (1860); 'Unizhennye i oskorblennye' ("The Insulted and the Injured") (1861); 'Zapiski iz podpolja' ("Notes from Underground") (1864); 'Prestuplenie i nakazanie' ("Crime and Punishment") (1866); 'Igrok' ("Gambler") (1866); 'Idiot' (1868); 'Besy' ("The Possessed") (1872); 'Podrostok' ("A Raw Youth"); 'Bratja Karamazovy' ("The Brothers Karamazov") (1880).
    6. Aleksei F. Pisemski (1821-1881), a novel: 'Vzbalamuchennoe more' ("Troubled Sea") [115] (1863).
    7. Mihail E. Saltykov-Shchedrin (1826-1889), a novel: 'Gospoda Golovlevy' ("The Golovlevs") [87] (1880).
    8. Lev N. Tolstoy (1828-1910), the autobiographic trilogy and two greatest novels: 'Detstvo' ("Childhood") (1852), 'Otrochestvo' ("Boyhood") (1854), 'Yunost' ("Youth") (1857); 'Voina i mir' ("War and Peace") (1863-69); 'Anna Karenina' (1873-77).
    9. Nikolai G. Chernyshevski (1828-1889), a novel: 'Chto delat' ("What is to be done") (1863).
    10. Nikolai S. Leskov (1831-1895), novels: 'Nekuda' ("Nowhere") [173] (1864); 'Na nozhah' ("At Daggers Drawn") [229] (1872); 'Soboryane' ("Church Folk") [93] (1872).
    11. Ivan A. Bunin (1870-1953), prose works: 'Antonovskie yabloki' ("Antonov apples") (1900); 'Suchodol' (1912); 'Chasha zhizni' ("The Cup of Life") (1914); 'Istok' ("Source")

    Availability: The CFRL is also available to all researchers in the former Soviet Union, including non-members of the TUC. Please contact Anatole Shaikevich (see CFRL contact details below) or the TRACTOR Helpdesk for a password.

    Resource provider: Computer Fund of the Russian Language, Institute of Russian Language, Russian Academy of Sciences, Moscow, Russia. Contact details

    Browse the archive. 

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages. Encoded with Cyrillic ISO-8859-5 character set. 

    Texts and table of alignments

  • Russian texts on Linguistics

    200 000 words.

    Resource provider: Department of Computer Science and Applied Linguistics, Minsk State Linguistic University, Minsk, Belarus. Contact details

    Download self-unpacking DOS file. 

  • German-Russian dictionary of computers

    German-Russian bilingual dictionary of "computers, informatics and robot technology", 43500 entries.

    Resource provider: Department of Computer Science and Applied Linguistics, Minsk State Linguistic University, Minsk, Belarus. Contact details

    Download self-unpacking DOS files. 

  • English-Russian dictionary of computers

    English-Russian bilingual dictionary of terms in "computers, numeric control, data processing in computer networks, flexible production systems". 43 500 words.

    Resource provider: Department of Computer Science and Applied Linguistics, Minsk State Linguistic University, Minsk, Belarus. Contact details

    Download self-unpacking DOS file. 

Serbian

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Texts and table of alignments

  • News Texts from TANJUG Agency

    1.2 million words, from agency reports in the periods September-November 1995 and May-June 1996.

    Browse

  • Newspaper Texts

    9 000 words of short news items, and 90 000 words of cultural news from Vukova Danica.

    Browse

  • Proverbs

    More than 6 thousand Serbian proverbs, gathered and published by Vuk Karadjic, the main founder of the Serbian literary language.

    Browse or download everything (gzipped tar archive, 204 Kb). 

  • Literature texts

    140 000 words, texts from 13 authors: Andric, Josic, Kostic, Momcyilo, Nikol, Pavic, Pekic, Petrovic, Popa, Popov, Savic, Selen and Velma.

    Browse or download everything (gzipped tar archive, 241 Kb). 

  • Translated Texts

    Texts translated into Serbian. 322 000 words.

    Browse or download everything (gzipped tar archive, 885 Kb). 

  • Textbook texts

    Various subjects and levels, 16 texts, 263 000 words.

    Browse or download everything (gzipped tar archive, 541 Kb). 

  • Legal texts

    One text, 6 000 words.

    Browse or download (gzipped file, 12 Kb). 

  • Electronic morphological dictionary

    Browse or download (gzipped tar archive, 575 Kb).

Resource provider for all Serbian resources: Faculty of Mathematics, Belgrade University, Yugoslavia. Contact details

Read the documentation

Slovak

  • Text files in Slovak

    30 Raw Text Files in Slovak, one per letter of the Slovak Alphabet. Encoded in PC Latin 2 (Code Page 852). 

    Resource provider: Computational Linguistics Laboratory, Comenius University, Bratislava, Slovakia. Contact details

    Browse or download everything (gzipped tar archive). 
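
    Files in PC Latin 2 need to be decoded with the right code page before Slovak diacritics display correctly. The sketch below uses Python's built-in `cp852` codec, which corresponds to Code Page 852; the function names and the file paths passed to them are illustrative assumptions, not part of the archive.

```python
def decode_cp852(raw: bytes) -> str:
    """Decode bytes stored in PC Latin 2 (code page 852) into a string."""
    return raw.decode("cp852")


def convert_file_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a CP852 text file as UTF-8.

    Both paths are supplied by the caller; line endings are left untouched.
    """
    with open(src_path, "r", encoding="cp852", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

    Converting once to UTF-8 and working from the converted copies avoids having to specify the code page in every later tool.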

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Resource provider: Alexandra Jarošov, Slovak Academy of Sciences, Bratislava (sasaj@juls.savba.sk); editorship and corrections: Vladimir Benko, Comenius University, Bratislava (jazybenk@savba.savba.sk). Contact details

    Texts and table of alignments.

Slovene

  • Slovene fiction

    Fiction texts and poetry in HTML format.

    Resource provider: Miran Hladnik, Faculty of Philosophy, University of Ljubljana, Slovenia. Contact details

    Browse or download (gzipped tar archive).

  • Parallel corpus

    Slovene-English and English-Slovene, covering various domains, 500 000 words, TEI encoding.

    Resource provider: the Language and Speech Group, Intelligent Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia. Contact details

    Browse or download (gzipped tar archive).

  • Multext-East corpus

    Fiction (100 000 words), newspapers (100 000 words), speech (2 000 words) and Orwell's 1984 (100 000 words), all with CES encoding.

    Resource provider: the Language and Speech Group, Intelligent Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia. Contact details

    Browse or download (gzipped tar archive).

  • Newspaper corpus

    270 000 words, encoded in TEI-lite.

    Resource provider: the Language and Speech Group, Intelligent Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia. Contact details

    Browse or download (gzipped file, 700Kb).

  • Kosmac corpus

    18 works from the period 1952-1972 by the Slovenian writer Ciril Kosmac (1910-1981). The file covers the second half of his opus and comes from Appendix II of the PhD thesis by Primoz Jakopin.

    Resource provider: Institute for Slovene Language "Fran Ramovs", Slovene Academy for Sciences and Arts, Ljubljana, Slovenia. Contact details

    Read online or download (gzipped file, 430Kb). 

  • Newspaper Texts from 'DELO'

    Extracts from the Slovenian daily, DELO, 6th May to 17th June 1997, part of speech (POS) tagged, 111 000 words, 923kb. 

    The file is based on excerpts from the leading Slovenian daily newspaper DELO, which is available on the homepage http://www.delo.si (Delofax). The file has been prepared by Primoz Jakopin and Aleksandra Bizjak. Individual issues of the newspaper are separated by title-lines, which start with a line of asterisks (*).

    Resource provider: Institute for Slovene Language "Fran Ramovs", Slovene Academy for Sciences and Arts, Ljubljana, Slovenia. Contact details

    Read online or download (gzipped file, 281Kb). 
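
    Because the issues in the DELO file are delimited by asterisk lines, the file can be split back into issues programmatically. The following is a rough sketch under stated assumptions: each issue is taken to begin with a single line made up only of asterisks, and everything up to the next such line belongs to that issue. The exact layout of the title-lines is not documented here, so the pattern and the function name `split_issues` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: a separator is a line containing nothing but asterisks.
ASTERISK_LINE = re.compile(r"^\*+\s*$")


def split_issues(lines: Iterable[str]) -> List[List[str]]:
    """Group lines into issues; a new issue begins at each all-asterisk line.

    Any text before the first asterisk line is returned as its own group.
    """
    issues: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        if ASTERISK_LINE.match(text):
            # Close the previous group (if any) and start a new one.
            if current:
                issues.append(current)
            current = []
            continue
        current.append(text)
    if current:
        issues.append(current)
    return issues
```

    If the title-lines turn out to be framed by asterisks above and below, the separator rule would need adjusting so that the title and body stay in one group.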

  • Translation of Plato's Republic

    Available in SGML, plain text and HTML formats, plus alignments with parallel texts in many languages.

    Texts and table of alignments

Swedish

  • Corpus of Swedish Newspaper Texts

    CES encoding.

    Resource provider: Lexilogik AB. Contact them via their website.

    Download the corpus

  • Newspaper Corpus

    1 million words, from the Swedish press in 1965, encoded to Eagles Corpus Encoding Standard (CES).

    Resource provider: Department of Swedish, Gothenburg University, Sweden. Contact details

    Read online or download (gzipped file, 8.7 Mb).

Turkish

  • Academic, technical and conference papers

    Papers on spelling correction, corpus tagger, ATN grammar, lexical functional grammar, spelling checker, morphological specification, PhD thesis proposal and PhD theses, project plan, etc. All files are compressed postscript.

    Resource provider: Bilkent University, Ankara, Turkey. Contact details

    Browse or download everything (tar archive). 

  • Miscellaneous wordlists

    List of Turkish words whose reverses are also valid words in Turkish, and list of words which are palindromes.

    Resource provider: Bilkent University, Ankara, Turkey. Contact details

    Browse

  • Turkish texts

    Plain text, approx. 69 000 words.

    Resource provider: Samarkand State Institute for Foreign Languages, Samarkand, Uzbekistan. Contact details

    Browse or download everything (gzipped tar archive, 208 Mb).

Ukrainian

  • Corpus

    500,000 words, representing a variety of genres.

    Texts include:

    1. Constitution of Ukraine
    2. Ukrainian Fairy Tales
    3. Ukrainian national writer Ivan Franko (1856-1916) 'Zahar Berkut' (an historic tale)
    4. Radiy Radutny - a contemporary writer: stories "People and dawns" and 6 short stories
    5. Oksana Zabuzhko "Explorations in Ukrainian Sex" (short novel)
    6. Andriy Okara - a short story
    7. James Fenimore Cooper (1789-1851) "The Prairie" (translation from English)
    8. Isaac Asimov (1920-1992) 'Ya, Robot' (translation from English)
    9. Stanislaw Lem (b. 1921) "Eden" (translation from Polish)
    10. Stanislaw Lem (b. 1921) 'Solaris' (translation from Polish)
    11. UNO Declaration on human rights (translation)
    12. Copyright Convention of 1952 (translation)

    Resource provider: Macbride Trading Corporation, Severodonetsk, Ukraine. Contact details coming soon.

    Browse reports and documentation (NB lexicon not available) or download everything (gzipped tar archive).

Uzbek

  • Uzbek texts

    Several chapters of the Constitution of the Tamerlan State.

    Resource provider: Samarkand State Institute for Foreign Languages, Samarkand, Uzbekistan. Contact details

    Browse or download everything (gzipped tar archive, 23 Kb).

Multilingual

The following are links to multilingual resources also listed above under the individual languages:

Email the TRACTOR helpdesk with queries about accessing and depositing resources.