Bibliography and Useful Links for Data-Driven Language Learning:

The Uses of Concordancing in Advanced Language Learning and Teaching

compiled by Betsy Kerr


I. Print Sources

II. Online sources (current as of 3/1/03)

III. Online text and concordancer sites for French

IV. Online text and concordancer sites for Spanish

V. Online text and concordancer sites for German

VI. English

VII. Other languages


I. Print Sources

***Johns, T. "From printout to handout: Grammar and vocabulary teaching in the context of Data-driven Learning." In T. Odlin, ed., Perspectives on Pedagogical Grammar. New York: Cambridge University Press, 1994. A brief introduction to the subject.

***Tribble, C. and G. Jones. Concordances in the Classroom: A Resource Guide for Teachers. Houston, TX: Athelstan, 1997. A hands-on, example-filled guide for the novice concordancer. (Can be ordered from the Athelstan website, see below.)

Aston, G., ed. Learning with corpora. Houston, TX: Athelstan, 2001. A collection of nine scholarly articles about the pedagogical applications of concordancing and corpora.

Wichmann, A., S. Fligelstone, T. McEnery and G. Knowles, eds. Teaching and Language Corpora. New York: Longman, 1997. A collection of practical articles about various projects involving the use of corpora in the teaching of various languages and of linguistics.

McEnery, T. and A. Wilson. Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press, 2001. An introduction to the use of corpora in linguistics proper (not applied linguistics or language teaching).

Biber, D., S. Conrad and R. Reppen. Corpus linguistics: Investigating Language Structure and Use. New York: Cambridge University Press, 1998. Another good introduction to the use of corpora in linguistic research.

Partington, A. Patterns and Meanings: Using Corpora for English Language Research and Teaching. Philadelphia: John Benjamins, 1998. Contains a number of case studies making use of corpora and concordance technology, particularly with reference to collocations and other lexical studies.


II. Online sources (current as of 3/1/03)

Pedagogical Uses of Concordancing

***Hadley, G. "Concordancing in Japanese TEFL: Unlocking the power of data-driven learning." A short article about the author's experience using concordancing for a beginning-level course in technological English.

***"Grammar Safari" LinguaCenter Homepage Grammar Safari. Jointly created by Doug Mills and Ann Salzmann of the Intensive English Institute, University of Illinois. A great example of the use of Web-based concordancing for ESL instruction, this clever introduction gives step-by-step instructions for two simple concordancing methods: (1) using the 'Find' function of your Web browser with any online text (for common words or structures), and (2) using any Web search engine (e.g. Google, Yahoo) to look for less common items in a series of online texts.

Barlow, M. "Corpus Linguistics." Michael Barlow is a pioneer and the leading American proponent of concordancing. There are many useful links here, including sources of online texts for many different languages.

"Athelstan Online" homepage. Athelstan is a company founded by Michael Barlow. You can order various publications here (see Part I above).

"Athelstan" Concordancing programs for purchase.

Johns, T. "Tim Johns Data-driven Learning Page." More links, from the British guru of concordancing for language learning.

Concordancing in linguistics

"Bookmarks for Corpus-based Linguists" by David Lee. A very complete and current site, useful for language teachers as well as linguists.

"Tutorial: Concordances and Corpora", by Catherine N. Ball, Department of Linguistics, Georgetown University. A nice online introduction to the subject, for the serious student of concordancing.

"Corpus Linguistics" Website to supplement the book on corpus linguistics by McEnery & Wilson.

"Conc: A concordance generator for the Macintosh" A basic, easy-to-use concordancer, downloadable from SIL (Summer Institute of Linguistics).


III. Online text and concordancer sites for French

Written French texts only:

***'The Compleat Lexical Tutor': UQAM (Université du Québec à Montréal) Web Concordancer. Easy-to-use and fairly versatile concordancer, with a choice of several different texts, including all of Le Monde from 1998.

Two easy-to-use concordancers for literary texts:

Balzac, La comédie humaine.

Rabelais (les oeuvres complètes?)

ARTFL. Project for American and French Research on the Treasury of the French Language, University of Chicago. A huge searchable database of texts from 15th-20th century French literature, philosophy, arts, sciences. (U of M students can access this controlled-access site through the U of M Libraries' site.) Can do simple or sophisticated searches, but requires a little learning.

"Québétext" Good concordancer for Quebec literary texts and texts about anglicisms.

Corpus Lexicaux Québécois A more recent compilation of corpora from various Québécois universities, intended primarily for study of Québécois vocabulary.

Spoken French corpora only:

***Minnesota Corpus. This is a corpus of conversation that I recorded and transcribed for research purposes, with funding from the University of Minnesota Graduate School. I have also used the corpus for pedagogical purposes. It is available by e-mail as a Word document by request from Betsy Kerr.

***ELICOP. Etude LInguistique de la COmmunication Parlée, Département de Linguistique, Université Catholique de Louvain (Belgique). On this introductory page, you'll find a description of the ELICOP project, which includes an online concordancer and extensive transcripts of several oral corpora, notably 80 hours of the Orléans corpus (1968-71). Also smaller samples of role-plays by Flemish-speaking learners of French (27 hrs.), and a very small sample (3-4 hrs.) of the same by Belgian Francophones and French Francophones. There are several different versions of the concordancer, and several different entry points: gives you (in the righthand frame) instructions for using the various concordancers. or both give you access to the basic lexical concordancer. (Be sure to enlarge the upper frame to see it all.) The difference is that the second one displays all results on one page, while the first one gives a list of links on which you must click to see the text of each occurrence. Also, the second one gives you some hints, right on the search page, about using 'regular expressions' ('wild cards'). For more help on using these, go to

The concordancer at allows one to specify the syntactic category of each word in a string of words to be searched for. Easy to use.

Corpus OTG. 315 dialogues, 26,000 words, 2 hours of recordings: 315 tourists + 5 receptionists, tourist information inquiries (downloadable but requires ZipIt).

Beeching Corpus Downloadable pdf file of 17-hour corpus of interviews. Would have to use 'Find' to do searches, or load into a concordancing software program.
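
For readers curious about what "loading into a concordancing program" amounts to, here is a minimal sketch in Python of a keyword-in-context (KWIC) search over plain text. It is only an illustration: the function name kwic, the width parameter, and the sample sentence are assumptions made for this example and do not come from any of the tools or corpora listed on this page. The search pattern is a regular expression, so 'wild card' searches work as well.

```python
import re

def kwic(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match in text."""
    # Collapse line breaks and runs of spaces so each context prints on one line.
    flat = " ".join(text.split())
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical sample text; in practice, paste or read in text saved from a corpus file.
sample = "Alors on est partis en vacances. On a parlé avec des gens, et on parlait beaucoup."
for line in kwic(sample, r"\bparl\w*", width=20):
    print(line)
```

The pattern \bparl\w* finds any word beginning with "parl" (here "parlé" and "parlait"), which is roughly the kind of search a wild card allows in the online concordancers.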

Written texts and oral corpora (French):

Serveur SILFIDE. Server at Université de Rennes with a large collection (currently 62) of diverse texts, predominately written but some oral, with a concordancer. The keyword appears in a whole paragraph (sometimes long!) of text, so it is often not practical to print it directly. Written texts include administrative and legal documents, bandes dessinées, and literary, scientific, historical, and travel texts. Oral texts appear to be predominately transactional telephone conversations (e.g. SNCF).

Tips for using the SILFIDE site: This one is a little tricky to learn (at least it was for me!), but well worth it, given the diversity of texts. Go first to 'Ressources' then 'Langues' to see the list of texts in French. You must first click on the panier (shopping cart) next to each title you wish to search, then go to the bottom frame, scroll to the extreme righthand corner, and click on the larger panier in that corner. This should bring up a page entitled 'Gestion du panier SILFIDE'. Ignore the request for your name and code, and just click on 'Outils' in the menu at the bottom. Then click on either one of the 'Concordances' icons. (The second is more sophisticated--try them both.) At the Concordance, type in your word or phrase, then click on 'Envoyer' or 'Accepter', scrolling down if necessary. You should shortly see a message appear (on the same screen) indicating that 'Le résultat a été affiché dans le navigateur'. To see the results, go to the window behind the current one. Voilà!


IV. Online text and concordancer sites for Spanish

***Corpus del Español This 100 million word corpus of Spanish texts has been funded by the NEH and has been created by Prof. Mark Davies of Illinois State University. In addition to being very fast, the search engine allows a wider range of searches than almost any other large corpus in existence. Texts from 1200s to 1900s, oral and written.

See also Prof. Davies' course site 'Variation in Spanish Syntax'.

Real Academia Española - Corpus de Referencia del Español Actual (CREA) Prof. Davies calls this a 'very nice textual corpus, but very unsearchable'. There are many texts, but the concordancer lacks the versatility of his own site in terms of what you can search for (only exact words and phrases).

Real Academia Española - Corpus Diacrónico del Español (CORDE) It's not clear on inspection what the difference is between this and the preceding site, except that they appear to include different texts.

Base de Datos Sintácticos del español actual (Syntactic Database for modern Spanish). A syntactically annotated corpus, including mostly written plus some oral texts.


V. Online text and concordancer sites for German

COSMAS/Mannheimer Corpus Collection This site provides a description (in German) of the copyright-free part of this larger corpus, which is accessible from

European Corpus Initiative Multilingual Corpus I (ECI/MCI) Ordering information for a CD-ROM containing newspaper texts in German, French, and Dutch.

NEGRA corpus version 2. 20,602 sentences of German newspaper text, taken from the Frankfurter Rundschau as contained in the preceding (ECI) corpus. Tagged with part-of-speech and completely annotated with syntactic structures.


VI. English

British National Corpus The 'grandmother' of all corpuses!

UQAM (University of Quebec at Montreal) Online Concordancer, English version. This is the English equivalent of 'The Compleat Lexical Tutor' site (see French above).


VII. Other languages

See David Lee's Bookmarks. Lists links to concordancing sites for many languages.

Advanced Google Search. Allows the user to choose one of many languages and search the entire Web. Also allows the choice of searching only in page titles or only in page contents. (You must click on the 'Google Search' button to initiate the search. If you don't see it, scroll to the right.)

Google Advanced Groups Search A similar search engine that searches in postings of Google Groups (groups of people with similar interests who carry on online discussions about a given topic). These can be useful for finding more colloquial language, even if it is written.


Document updated 03/07/03

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.