The joys of analyzing aggregate linguistic data


Benedikt Szmrecsanyi, University of Freiburg
1. Introduction
2. Methodology
3. Analytic vs synthetic coding of grammatical information: text-type variability
4. Corpus-based dialectometry
5. Probabilistic grammars in aggregate comparison
6. Conclusion