In order to follow the links below to access the resources, you need
to be a registered user. Find out about how to register.
See the latest acquisitions.
Bulgarian
POS tagged corpus
2460 Bulgarian sentences marked up with part-of-speech information (BTB-POS Corpus I). The corpus
is in an XML format that does not follow TEI or CES; the DTD is included. Available in three
different encodings of Cyrillic letters: ISO 8879:1986, MS Windows, and Unicode.
Resource provider: Kiril Iv. Simov, Linguistic Modelling Laboratory, Bulgarian
Academy of Sciences, Sofia.
Browse the files
Corpus of Bulgarian Texts
Corpus of texts in Bulgarian, representing text genres such as news, legal,
and poetry. Encoded with SGML according to the Corpus Encoding Standard
(CES). Approximately 275 000 words.
The corpus includes the following texts:
-
A selection of newspaper articles from "24 Hours", 1996
-
A selection of newspaper articles from 'Zemedelsko zname' ("Agrarian Flag"),
1996
-
A collection of 42 newspaper articles on Soros' "Open Society" Fund
-
A selection of literary texts: a part of the novel "Love at the Age of
Sclerosis" by Natasha Manolova; a part of the novel "The Big Fraud" by
Vesela Lyutskanova; 11 short stories and a novella by Asen Sirakov; 12
short stories from the book "Cyclops' Eye" by Todor Velchev; a part of
Snezhana Snegovana's novella "The Fiery Violin"
-
A selection of poems from "We Are a Hopeless Case" by Miryana Basheva;
a collection of modern Bulgarian love poetry "Love - a Reality of Magic"
(many authors)
-
Zhelyu Zhelev "Fascism" (2 chapters)
-
Polya Goleva "Bulgarian Insurance Law"
-
An unpublished sociological study about Bulgaria
-
Bulgarian Fiction - 2 novels: Emilia Dvoryanova 'PASSION ili smy1rtta na
Alisa' ("Passion or the Death of Alice"), Julia Berberyan 'Iskam, vyarvam,
moga' ("I want, I believe, I can")
-
Newspapers: a few issues of 'Capital' and 'Continent' (1996)
Restrictions: not available to industrial users. Please contact
the resource provider to negotiate licensing.
Resource provider: Linguistic Modelling Laboratory, Bulgarian
Academy of Sciences, Sofia, Bulgaria.
Contact
details
Browse or Download
everything (gzipped tar archive).
Bulgarian, English and French parallel translation
texts
MS Word files containing source and target text on alternate lines. There
are 20 files in different language pairs.
Restrictions: Not available to industrial users. Please contact
the resource provider to negotiate licensing.
Resource provider: Linguistic Modelling Laboratory, Bulgarian
Academy of Sciences, Sofia, Bulgaria. Contact
details
Browse or Download
everything (gzipped tar archive).
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Texts and table
of
alignments.
Croatian
ZAG-ELAN Croatian Corpus
1.87 million word corpus of written Croatian comprising texts from the leading
Croatian daily newspaper Večernji list, encoded with
TEI-conformant SGML.
Resource provider: Marko Tadic, Institute of Linguistics,
Philosophical Faculty, University of Zagreb.
Contact details
Browse or
download everything (gzipped tar file).
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Marko Tadic, Institute of Linguistics,
Philosophical Faculty, University of Zagreb.
Contact
details
Texts and table
of
alignments.
Czech
Czech ELAN Corpus
Texts from Lidové noviny newspaper from 1994 and various ephemera.
Resource provider: Computational Fund of the Czech Language,
Charles University, Prague, Czech Republic.
Contact
details
Browse or
Download
everything (gzipped tar archive).
Newspaper corpus of Czech
5 million word newspaper corpus of Czech, and other miscellaneous Czech
corpus files.
Resource provider: Computational Fund of the Czech Language,
Charles University, Prague, Czech Republic.
Contact
details
Browse or
Download
everything (gzipped tar archive).
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Computational Fund of the Czech Language,
Charles University, Prague, Czech Republic. Contact
details
Texts and table
of
alignments.
Dutch
Materiaalverzameling Noord corpus data
300 000 words with TEI conformant markup and POS tagging.
Resource provider: Institute for Dutch Lexicology, Leiden, Netherlands.
Contact
details
Browse or
download
everything (gzipped tar archive).
Jeugdjournaal corpus ("Youth Journal")
September issues of 1992, 1993, 1994 and 1995. Parole TEI conformant markup,
c. 93000 words.
Resource provider: Institute for Dutch Lexicology, Leiden, Netherlands.
Contact
details
Browse or
download
everything (gzipped tar archive).
English
The East African Component of The International Corpus of English (ICE-EA).
Corpus of written and spoken English of Tanzania and Kenya. The files are
available as plain ASCII, as prepared for use with Wordsmith Tools or as
RTF files. See the website and manual listed below for more information.
Resource provider: Josef Schmied, REAL Centre, Department of English,
Chemnitz University of Technology.
Visit the ICE-East Africa website for more information and online searches.
Lampeter Corpus of Early Modern English Tracts
See the online manual for further information.
Resource provider: Josef Schmied, REAL Centre, Department of English,
Chemnitz University of Technology.
Browse the available files.
EU enlargement corpus
Journalism articles about EU enlargement, c. 600,000 words.
Resource provider: Martin Wynne, Centre for Corpus
Linguistics, Department of English, University of Birmingham.
Contact details.
Access the resources
Free Britain Corpus
A corpus of recent texts written by Eurosceptics about Britain and the
European Union, containing approx. 2 million words.
Resource provider: Wolfgang Teubert, Institut für Deutsche Sprache,
Mannheim, Germany (now University of Birmingham: email teubertw@hhs.bham.ac.uk).
Browse the
corpus files or
download
everything.
Speech, Thought and Writing Presentation Corpus
This is a corpus of modern British English narrative texts. There are approximately
250,000 words, and the texts are 2000 word samples from printed works,
representing news, fiction and biography (including autobiography). Forms
of speech, thought and writing presentation have been manually annotated
in the corpus. The annotation scheme is documented in the handbook.
Resource provider: Elena Semino, Mick Short and Martin Wynne
at the Department of Linguistics and Modern English Language, Lancaster
University, Lancaster LA1 4YT.
Contact
details (see also the corpus header for further contact details).
Read the handbook
or
browse the corpus files.
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Corpus Research, University of Birmingham,
Birmingham, B15 2TT.
Contact
details
Texts and table
of
alignments.
Texts from US Army Center of Military History
Texts about the Gulf War, in HTML format, approx. 2.2 million words.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse or
download
everything (gzipped tar archive).
Texts from US Army Foreign Military Studies Office
HTML format.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from North Atlantic Treaty Organization (also
in French and German)
HTML format.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from European Free Trade Association (also in
German)
MS Word and HTML files.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from the US Government
Various texts on the subject of defense, in HTML.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
English word list
English word list, split into 4 files, 109,582 words long, plain text.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from Deutsche Bundesregierung (also in French and German)
Texts from Deutsche Bundesregierung (German Federal Government), Bonn and
Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French
and English as Word documents.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from World Intellectual Property Organization
(also in French)
Intellectual Property and Copyright magazine in French and English
versions, in MS Word files.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Universal Copyright Convention
HTML file, 8000 words in English, 1971.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Articles in English from Le Monde Diplomatique
HTML files.
Resource provider: Institut für Deutsche Sprache, Mannheim,
Germany.
Contact
details
Browse.
Estonian
Corpus of Estonian
1 million word corpus of Estonian, with TEI conformant markup.
Contains 340 files from Estonian journals and books of 1983-1987 (mostly
1985) covering all classes of Universal Decimal Classification with the
exception of fiction. The variability of themes is reflected in the ranked
list of source-journals: 'Sotsialistlik Põllumajandus' ("Socialist Agriculture"),
'Teater. Muusika. Kino', 'Eesti Kommunist', 'Tehnika ja Tootmine' ("Engineering
and Industry"), 'Nõukogude Naine' ("Soviet Woman"), 'Eesti Loodus' ("Estonian
Nature"), 'Horisont', 'Looming' ("Creativity"), 'Kunst', 'Kultuur ja Elu'
("Culture and Life"), 'Nõukogude Õigus' ("Soviet Justice"), 'Noorus'
("Youth"). Among excerpts from books the most popular themes are geography
of Estonia, Estonian Encyclopedia, legal documents, medicine, agriculture,
biology, sports, economics, religion and linguistics.
Resource provider: Department of Computer Science and Department
of General Linguistics, University of Tartu, Tartu, Estonia.
Contact
details
Browse or
download
everything (gzipped tar archive).
Finnish
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Anna Mauranen, Savonlinna University, Finland
and Laurent Romary, LORIA, France.
Texts and table
of
alignments.
French
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Centre of Computational Linguistics, University
Vytauti Magni, Kaunas, Lithuania.
Contact
details
Texts and table
of
alignments.
Texts from the German Embassy in Paris (also in German)
Texts from Centre d'Information et de Documentation de l'Ambassade de
la République Fédérale d'Allemagne, Paris, France in German and French,
in HTML.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse
Texts from Deutsche Bundesregierung (also in English and German)
Texts from Deutsche Bundesregierung (German Federal Government), Bonn and
Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French
and English as Word documents.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from Swiss Government (also in German and Italian)
Documents relating to the reform of the federal constitution (all HTML).
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse
online.
Texts from North Atlantic Treaty Organization (also
in English and German)
HTML format.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from World Intellectual Property Organization
(also in English)
Intellectual Property and Copyright magazine in French and English
versions, in MS Word files.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
German
PAROLE Corpus
Approx. 20 million words, TEI conformant markup, some tagged text.
Browse or
download
(gzipped file).
Texts from Deutsche Bundesregierung (also in English and French)
Texts from Deutsche Bundesregierung (German Federal Government), Bonn and
Berlin, Germany, in HTML, plus the Grundgesetz (Constitution) in French and English
as Word documents.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Proceedings from the Deutscher Bundestag
Proceedings of debates in the Deutscher Bundestag, Bonn, Germany (file
encoding not known).
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from North Atlantic Treaty Organization (also
in English and French)
HTML format.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from Rheinischer Merkur
Texts from Rheinischer Merkur (German Weekly Newspaper). Sorry, not yet
documented.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse.
Texts from European Free Trade Association (also in English)
Texts from European Free Trade Association (EFTA), Geneva, Switzerland
in English and German. Mixture of MS Word and HTML files.
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details.
Browse.
Texts from Swiss Government (also in French and Italian)
Documents relating to the reform of the federal constitution (all HTML).
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse
online.
German Texts of the Gutenberg Archive
Web pages in German.
Resource provider: Projekt Gutenberg - DE
Contact
details.
Browse online.
Greek
Greek government press releases
586 text files, encoded in Windows Greek code page.
Resource provider: Philip King, English For International Students
Unit, Department of English, University of Birmingham.
Contact
details.
Browse the files or
download
everything (gzipped tar file, 3.6 Mb).
Hungarian
Online Corpus of spoken Hungarian
More than 250 files from interviews undertaken for sociolinguistic research.
Transcriptions and digitised sound files.
Resource provider: Research Institute for Linguistics, Hungarian
Academy of Sciences, Budapest, Hungary.
Contact
details
Browse in Hungarian
or in English.
Early 19th century Hungarian poetry
Including the works of János Arany, Sándor Petőfi, Ferenc Kölcsey and
Mihály Vörösmarty.
Resource provider: Research Institute for Linguistics, Hungarian
Academy of Sciences, Budapest, Hungary. Contact
details
Browse or
download
(gzipped tar file).
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Research Institute for Linguistics, Hungarian
Academy of Sciences, Budapest, Hungary.
Contact
details
Texts and table
of
alignments.
Italian
Italian literary texts
Italian literary texts, some originals, some translations, in 23 ASCII
text files, approximately 685,000 words.
The following texts are available:
-
Dino Buzzati (1906-1972) 'Il deserto dei tartari' ("The Desert of the Tartars")
-
Two plays by Luigi Pirandello (1867-1936)
-
Lewis Carroll (1832-1898) 'Alice nel paese delle meraviglie' ("Alice's
Adventures in Wonderland"). Translation by Elda Bossi.
-
F. Scott Fitzgerald (1896-1940) "The Great Gatsby"
-
Six popular science texts (translated from English)
-
225 articles in 'La Stampa' newspaper, translated mostly from British ('The
Guardian', 'The Observer'), French ('Le Monde', 'Liberation'), American
('The New York Times', 'Los Angeles Times') newspapers. Files stampa1 (for
1991), stampa2 (for 1992), stampa3 (for 1993-94)
-
Five files of translations of French 'bandes dessinées'.
Resource provider: LORIA, Nancy,
France.
Contact
details
Browse or
download
(gzipped tar file).
Texts from Swiss Government (also in French and German)
Documents relating to the reform of the federal constitution (all HTML).
Resource provider: Institut für Deutsche Sprache, Mannheim, Germany.
Contact
details
Browse
online.
Latvian
Lithuanian
Samples from a Lithuanian Corpus
Texts of Lithuanian magazines and newspapers. ASCII text files. For information
on the full 56 million word corpus, contact the resource provider.
Resource provider: Centre of Computational Linguistics, University
Vytauti Magni, Kaunas, Lithuania
Contact
details
Browse or
download
everything (gzipped tar archive).
Corpus of Lithuanian Philosophical Texts
Texts of 15 philosophical works. 1.5 million words, Parole-conformant SGML.
The texts available here are the following:
-
Arvydas Sliogeris 'Konservatoriaus ispazintys' ("Confessions of a conservator")
-
Arvydas Sliogeris 'Niekio vardai: Septyni antropotopijos etiudai'
-
Arvydas Sliogeris 'Pamatiniai filosofijos klausimai' ("Fundamentals of philosophy")
-
R. Ozolas 'Issivadavimas' ("Liberation")
-
Aristotelis 'Politika'
-
Viljamas Dzeimsas 'Pragmatizmas' (W.James, 1842-1910)
-
David Hume (1711-1776) 'Zmogaus proto tyrinejimas' ('An Enquiry Concerning
Human Understanding')
-
George H. Sabine & Thomas L. Thorson 'Politiniu teoriju istorija' ('History
of political theories')
-
Lawrence A. Scaff 'Verziantis is gelezinio narvo'
-
Simone de Beauvoir (1908-1986) 'Antroji lytis' ("Le Deuxième Sexe / The
Second Sex")
-
Emanuelis Munje 'Personalizmas' (E.Mounier, 1905-1950)
-
Friedrich Wilhelm Joseph Schelling (1775-1854) 'Laisves filosofija' ("Philosophy
of Freedom")
-
Georg Wilhelm Friedrich Hegel (1770-1831) 'Dvasios fenomenologija' ("Phenomenology
of Spirit")
-
Friedrich Nietzsche 'Linksmasis mokslas' ("Die Fröhliche Wissenschaft /
The Gay Science")
-
Janos Kis 'Siuolaikine politine filosofija: antologija' ("Contemporary
political philosophy").
Resource provider: Centre of Computational Linguistics, University
Vytauti Magni, Kaunas, Lithuania
Contact
details
Browse or
download
everything (gzipped tar archive).
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Texts and table
of
alignments.
Polish
Polish Newspaper Corpus
From the Gazeta Wyborcza newspaper in 1998, 2 million words, Parole-conformant
SGML.
Resource provider: PELCRA, Department of English, Lodz University,
Poland.
Contact
details
Browse or
download
(gzipped tar archive).
The works of Adam Mickiewicz
The poetical works of Adam Mickiewicz. Full texts of Polish national poet
Adam Mickiewicz (1798-1855), including verses, longer poems, dramatic pieces
and 'Pan Tadeusz'.
Resource provider: Computer Fund of the Russian Language, Institute
of Russian Language, Russian Academy of Sciences, Moscow, Russia.
Contact
details.
Browse
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Texts and table
of
alignments.
Romanian
Orwell's 1984
English and Romanian versions of Orwell's 1984 in an aligned parallel text,
in HTML format. The alignment has been manually checked. This resource
was created for the MULTEXT-EAST project.
Resource provider: Center for Advanced Research in Machine Learning,
NLP and Cognitive Modelling, Academy of Sciences, Bucharest, Romania.
Contact
details
Browse.
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Dan Tufiş, Center for Artificial
Intelligence, NLP Division, Romanian Academy.
Contact
details
Texts and table
of
alignments.
Plato's Republic
Vertical (one word per line) in Romanian with part of speech (POS) annotation
(More documentation is on the way!).
Resource provider: Center for Advanced Research in Machine Learning,
NLP and Cognitive Modelling, Academy of Sciences, Bucharest, Romania.
Contact
details
Browse.
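Since documentation for the vertical file is still pending, the following is only a minimal sketch of how a one-word-per-line file with POS annotation might be read in Python. It assumes each line holds a token and its tag separated by a tab; the separator, the sample tokens, the tag names and the function name `read_vertical` are all assumptions for illustration and are not taken from the resource itself.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_vertical(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, tag) pairs from a one-word-per-line text.

    Assumption: token and tag are separated by a tab character.
    Blank lines and lines without a tab are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        yield parts[0], parts[1]


# Hypothetical sample lines; the real file may use different tags and layout.
sample = ["Republica\tNOUN\n", "lui\tDET\n", "\n", "Platon\tNOUN\n"]
pairs = list(read_vertical(sample))
tag_counts = Counter(tag for _, tag in pairs)
print(tag_counts.most_common())
```

If the actual file uses a different separator or extra columns (for example a lemma), only the `split` call would need adjusting.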
Russian
Computer Fund of the Russian Language
Mirror site for the CFRL collection of Russian Texts, with the following
works:
-
Nikolai V. Gogol (1809-1852). Complete prose. Three collections of tales
and stories: 'Vechera na hutore bliz Dikan'ki' ("Evenings on a Farm near
Dikanka") (1831-32), including "Sorochintcy Fair", "Christmas Night", "A
May Night", "Terrible Revenge", etc; "Mirgorod" (1835) - two variants of
"Taras Bul'ba", "Old Style Landlords", "Viy", "How Ivan Ivanovich Quarreled
with Ivan Nikiforovich"; "Peterburgskie povesti" ("Petersburg Tales") (1835-42),
including two variants of "The Portrait", "Nevski Prospect", "The Nose",
"The Greatcoat"; and "Mertvye dushi" ("Dead Souls"), a comic epic.
-
Ivan A. Goncharov (1812-1891) Two novels: 'Oblomov' (1859) and 'Obryv'
("Precipice") (1869).
-
Mihail Yu. Lermontov (1814-1841). Prose works: 'Vadim' (1832), a novel;
'Knyaginya Ligovskaya' ("Princess Ligovskaya") (1836); 'Geroy nashego vremeni'
("A Hero of Our Times") (1840), a novel; and two short stories.
-
Ivan S. Turgenev (1818-83). Prose works - 7 novels and some shorter pieces:
'Rudin' (1856); 'Dvoryanskoe gnezdo' ("A Nest of Gentlefolk") (1859); 'Nakanune'
("On the Eve") (1860); 'Otcy i deti' ("Fathers and Sons") (1862); 'Dym'
("Smoke") (1867); 'Veshnie Vody' ("Torrents of Spring") (1870); 'Nov' ("Virgin
Soil") (1877). 'Povesti', including "Faust", "Asya", "First Love", "Brigadier",
"A King Lear of the Steppe", "Three meetings", "Clara Milich".
-
Fyodor M. Dostoevsky (1821-1881). Complete prose (33 texts), including:
'Bednye lyudi' ("Poor Folk") (1846); 'Dvoinik' ("The Double") (1846); 'Belye
nochi' ("White Nights") (1848), 'Netochka Nezvanova' (1848); 'Dyadyushkin
son' ("Uncle's Dream") (1858); 'Selo Stepanchikovo i ego obitateli' ("The
Village of Stepanchikovo and its Inhabitants") (1858); 'Zapiski iz mertvogo
doma' ("Notes from the House of the Dead") (1860); 'Unizhennye i oskorblennye'
("The Insulted and the Injured") (1861); 'Zapiski iz podpolja' ("Notes
from Underground") (1864); 'Prestuplenie i nakazanie' ("Crime and Punishment")
(1866); 'Igrok' ("The Gambler") (1866); 'Idiot' (1868); 'Besy' ("The Possessed")
(1872); 'Podrostok' ("A Raw Youth"), 'Bratja Karamazovy' ("The Brothers
Karamazov") (1880).
-
Aleksei F. Pisemski (1821-1881), a novel: 'Vzbalamuchennoe more' ("Troubled
Sea") [115] (1863).
-
Mihail E. Saltykov-Shchedrin (1826-1889), a novel: 'Gospoda Golovlevy'
("The Golovlevs") [87] (1880).
-
Lev N. Tolstoy (1828-1910), the autobiographic trilogy and two greatest
novels: 'Detstvo' ("Childhood") (1852), 'Otrochestvo' ("Boyhood") (1854),
'Yunost' ("Youth") (1857); 'Voina i mir' ("War and Peace") (1863-69); 'Anna
Karenina' (1873-77).
-
Nikolai G. Chernyshevski (1828-1889), a novel: 'Chto delat' ("What is to
be done") (1863).
-
Nikolai S. Leskov (1831-1895), novels: 'Nekuda' ("Nowhere") [173] (1864);
'Na nozhah' ("At Daggers Drawn") [229] (1872); 'Soboryane' ("Church Folk")
[93] (1872).
-
Ivan A. Bunin (1870-1953), prose works: 'Antonovskie yabloki' ("Antonov
apples") (1900); 'Suchodol' (1912); 'Chasha zhizni' ("The Cup of Life")
(1914); 'Istok' ("Source")
Availability: The CFRL is also available to all researchers in the
former Soviet Union, including non-members of the TUC. Please contact Anatole
Shaikevich (see CFRL contact details below) or the TRACTOR Helpdesk for
a password.
Resource provider: Computer Fund of the Russian Language, Institute
of Russian Language, Russian Academy of Sciences, Moscow, Russia.
Contact
details
Browse the archive.
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages. Encoded with Cyrillic ISO-8859-5 character set.
Texts and table
of
alignments.
Russian texts on Linguistics
200 000 words.
Resource provider: Department of Computer Science and Applied
Linguistics, Minsk State Linguistic University, Minsk, Belarus.
Contact
details
Download
self-unpacking DOS file.
German-Russian dictionary of computers
German-Russian bilingual dictionary of "computers, informatics and robot
technology", 43500 entries.
Resource provider: Department of Computer Science and Applied
Linguistics, Minsk State Linguistic University, Minsk, Belarus.
Contact
details
Download self-unpacking
DOS files.
English-Russian dictionary of computers
English-Russian bilingual dictionary of terms in "computers, numeric control,
data processing in computer networks, flexible production systems". 43
500 words.
Resource provider: Department of Computer Science and Applied
Linguistics, Minsk State Linguistic University, Minsk, Belarus.
Contact
details
Download self-unpacking
DOS file.
Serbian
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Texts and table
of
alignments.
News Texts from TANJUG Agency
1.2 million words, from agency reports in the periods September-November
1995 and May-June 1996.
Browse
Newspaper Texts
9 000 words of short news items, and 90 000 words of cultural news from
Vukova Danica.
Browse
Proverbs
More than 6 thousand Serbian proverbs, gathered and published by Vuk Karadjic,
the principal founder of the Serbian literary language.
Browse or
download
everything (gzipped tar archive, 204 Kb).
Literature texts
140 000 words, texts from 13 authors: Andric, Josic, Kostic, Momcyilo,
Nikol, Pavic, Pekic, Petrovic, Popa, Popov, Savic, Selen and Velma.
Browse or
download
everything (gzipped tar archive, 241 Kb).
Translated Texts
Texts translated into Serbian. 322 000 words.
Browse or
download
everything (gzipped tar archive, 885 Kb).
Textbook texts
Various subjects and levels, 16 texts, 263 000 words.
Browse or
download
everything (gzipped tar archive, 541 Kb).
Legal texts
One text, 6 000 words.
Browse or
download
(gzipped file, 12 Kb).
Electronic morphological dictionary
Browse or
download
(gzipped tar archive, 575 Kb).
Resource provider for all Serbian resources: Faculty of Mathematics,
Belgrade University, Yugoslavia.
Contact
details
Read the documentation
Slovak
Text files in Slovak
30 Raw Text Files in Slovak, one per letter of the Slovak Alphabet. Encoded
in PC Latin 2 (Code Page 852).
Resource provider: Computational Linguistics Laboratory, Comenius
University, Bratislava, Slovakia. Contact
details
Browse or
download
everything (gzipped tar archive).
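Because the Slovak files are encoded in PC Latin 2 (Code Page 852), they will usually need converting before use with Unicode-based tools. A minimal sketch in Python, assuming the files are plain byte streams in CP852; the function name `cp852_to_utf8` and the sample words are illustrative only and are not taken from the corpus.

```python
def cp852_to_utf8(raw: bytes) -> bytes:
    """Re-encode CP852 (PC Latin 2) bytes as UTF-8."""
    return raw.decode("cp852").encode("utf-8")


# Illustrative round trip with a made-up sample, not corpus data.
sample = "žaba a človek".encode("cp852")
converted = cp852_to_utf8(sample)
print(len(sample), len(converted))
```

Python's standard `cp852` codec does the mapping here. Reading a real file would follow the same pattern: open it in binary mode, pass the bytes through the function, and write the result out.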
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Resource provider: Alexandra Jarošová, Slovak Academy
of Sciences, Bratislava (sasaj@juls.savba.sk); editorship and corrections: Vladimir
Benko, Comenius University, Bratislava (jazybenk@savba.savba.sk).
Contact
details
Texts and table
of
alignments.
Slovene
Slovene fiction
Fiction texts and poetry in HTML format.
Resource provider: Miran Hladnik, Faculty of Philosophy, University of
Ljubljana, Slovenia
Contact details
Browse or
download
(gzipped tar archive).
Parallel corpus
Slovene-English and English-Slovene, covering various domains, 500 000
words, TEI encoding.
Resource provider: the Language and Speech Group, Intelligent
Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia
Contact
details
Browse
or
download
(gzipped tar archive).
Multext-East corpus
Fiction (100 000 words), newspapers (100 000 words), speech (2 000 words)
and Orwell's 1984 (100 000 words), all with CES encoding.
Resource provider: the Language and Speech Group, Intelligent
Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia
Contact
details
Browse or
download
(gzipped tar archive).
Newspaper corpus
270 000 words, encoded in TEI-lite.
Resource provider: the Language and Speech Group, Intelligent
Systems Dept, Jozef Stefan Institute, Ljubljana, Slovenia
Contact
details
Browse or
download
(gzipped file, 700Kb).
Kosmac corpus
18 works from the period 1952-1972 by the late Slovenian writer Ciril Kosmac
(1910-1981), in a single file. They make up the second half of his opus and
are taken from Appendix II of the PhD thesis by
Primoz Jakopin.
Resource provider: Institute for Slovene Language "Fran Ramovs",
Slovene Academy for Sciences and Arts, Ljubljana, Slovenia.
Contact
details
Read online or
download
(gzipped file, 430Kb).
Newspaper Texts from 'DELO'
Extracts from the Slovenian daily, DELO, 6th May to 17th June 1997, part
of speech (POS) tagged, 111 000 words, 923kb.
The file is based on excerpts from the leading Slovenian daily newspaper
DELO, which is available on the homepage http://www.delo.si
(Delofax). The file has been prepared by Primoz Jakopin and Aleksandra
Bizjak. Individual issues of the newspaper are separated by title-lines,
which start with a line of asterisks (*).
Resource provider: Institute for Slovene Language "Fran Ramovs",
Slovene Academy for Sciences and Arts, Ljubljana, Slovenia.
Contact
details
Read online or
download
(gzipped file, 281Kb).
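The DELO file described above separates newspaper issues with title-lines introduced by a line of asterisks. A minimal Python sketch of splitting such a file into issues follows. It assumes a separator is a line made up only of three or more asterisks; the exact separator length, the sample text and the function name `split_issues` are assumptions for illustration, not details confirmed by the resource.

```python
import re
from typing import List

# Assumption: a separator line consists solely of asterisks (three or more).
SEPARATOR = re.compile(r"^\*{3,}\s*$")


def split_issues(text: str) -> List[List[str]]:
    """Split corpus text into issues, each returned as a list of lines.

    The first line of each issue is the title-line that follows the
    asterisk line. Any text before the first asterisk line is returned
    as its own leading chunk.
    """
    issues: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current:
                issues.append(current)
            current = []
        else:
            current.append(line)
    if current:
        issues.append(current)
    return issues


# Made-up sample in the assumed layout; not actual corpus content.
sample = "\n".join([
    "*" * 20,
    "Issue title A",
    "first line of text",
    "*" * 20,
    "Issue title B",
    "another line of text",
])
issues = split_issues(sample)
print(len(issues))
```

Keeping each issue as a list of lines leaves the POS-tagged content untouched, so any later parsing of the tags can be done per issue.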
Translation of Plato's Republic
Available in SGML, plain text and HTML formats, plus alignments with parallel
texts in many languages.
Texts and table
of
alignments.
Swedish
Corpus of Swedish Newspaper Texts
CES encoding.
Resource provider: Lexilogik AB. Contact them via their website.
Download the corpus
Newspaper Corpus
1 million words, from the Swedish press in 1965, encoded to Eagles Corpus
Encoding Standard (CES).
Resource provider: Department of Swedish, Gothenburg University,
Sweden.
Contact
details
Read online or
download
(gzipped file, 8.7 Mb).
Turkish
Academic, technical and conference papers
Papers on spelling correction, corpus tagger, ATN grammar, lexical functional
grammar, spelling checker, morphological specification, PhD thesis proposal
and PhD theses, project plan, etc. All files are compressed postscript.
Resource provider: Bilkent University, Ankara, Turkey.
Contact
details
Browse or
download
everything (tar archive).
Miscellaneous wordlists
List of Turkish words whose reverses are also valid words in Turkish, and
list of words which are palindromes.
Resource provider: Bilkent University, Ankara, Turkey.
Contact
details
Browse.
Turkish texts
Plain text, approx. 69 000 words.
Resource provider: Samarkand State Institute for Foreign Languages,
Samarkand, Uzbekistan.
Contact
details
Browse or
download
everything (gzipped tar archive, 208 Mb).
Ukrainian
Uzbek
Uzbek texts
Several chapters of the Constitution of the Tamerlan State.
Resource provider: Samarkand State Institute for Foreign Languages,
Samarkand, Uzbekistan.
Contact
details
Browse or
download
everything (gzipped tar archive, 23 Kb).
Multilingual
The following are links to multilingual resources also listed above under
the individual languages:
Email the Tractor helpdesk
for queries regarding accessing and depositing resources.