The Helsinki Corpus of British English Dialects

The Helsinki Corpus of British English Dialects (HD) is a corpus project initiated in the early 1970s by Professor Tauno F. Mustanoja of the English Department, Helsinki, and Professor Harold Orton of the University of Leeds. The corpus contains over a million words of transcribed dialect speech. It covers the rural regions of Cambridgeshire, Devon, the Isle of Ely, Somerset and Suffolk, all recorded in the 1970s and early 1980s, as well as the urban regions of Essex and Lancashire, recorded in the late 1980s. The recordings were made by Finnish postgraduates. What makes the corpus special is that it contains free, continuous speech, in contrast to the questionnaire method used earlier in projects such as the Survey of English Dialects. The questionnaire format gave the interviewees (also known as informants) a list of questions, each of which they would answer with a dialectal form. This method of interviewing provides material useful only for lexical and phonological analysis, which was not the focus of the HD group. The Helsinki Dialect project was mainly concerned with morphosyntax, i.e. the study of sentence structure and morphology, for which one-word questionnaire responses would not provide sufficient research material.

The fieldworkers, all Finnish postgraduates, usually lived in the region for a summer or two and went from village to village interviewing older people. The interviews were free in form, giving the informants a chance to choose their own topics of discussion. In the early 1980s, under the leadership of Professor Ossi Ihalainen, himself one of the fieldworkers in the HD project, the group began to transcribe the recorded material and transfer it into computer format. The use of computers is nowadays an obvious choice, but one must remember that in the early 1980s computers were usually as large as the rooms they were in, and the work involved far more than merely typing at a keyboard.

The dialect corpus is unique in many ways, primarily because it contains free, continuous speech, but also because it covers regions that were very poorly documented in earlier studies. In the Survey of English Dialects, for example, the Cambridgeshire dialect was represented by just one locality, whereas in the HD the localities number almost thirty.

Numerous studies have been conducted on the basis of the HD data. Most of them, as stated before, are morphosyntactic in nature, but some phonological studies have been made too. Today the material is more valuable than ever. Dialectology is taking new steps with the advent of better software and technology for speech analysis and transcription. New theoretical approaches have surfaced too, as dialectological research can no longer be based on the linguistic theories that dominated the 1960s and 1970s.

The most rewarding approach is studying the grammar of spoken English and how it conflicts with the grammar of Standard English used in almost all textbooks and in linguistic studies not concerned with spoken language. It is obvious that the restrictions of written grammar cannot be used as a theoretical basis for studying the free form of spoken language. This is even more true of dialects, which are generally considered "non-standard", a pejorative term that doesn't even begin to describe the diversity and variety of the vernacular. It is with great anticipation that dialectologists and those interested in regional variation wait for new articles and theories to be published on the dichotomy between standard and non-standard forms of language. One particularly interesting approach developed in recent years is Optimality Theory. Its claim is that the grammatical forms of Standard English underlie all the spoken forms too. Whenever these rules are violated in the output of spoken language, the forms aren't immediately labelled "non-grammatical" or "non-standard"; instead, the theory focuses on how and to what extent the rules are violated. This, in my opinion, is a far sounder approach to describing spoken language than the prescriptivism that has dominated syntactic and grammatical research over the past decades. A study using Optimality Theory to describe morphosyntactic variation in dialects is yet to be done, as far as I know, and I'm sure it will be a welcome change of pace in dialect research.

As students of the English Department at the University of Helsinki, we can hold our heads up with a certain pride whenever the Helsinki Corpus is discussed, since it is the flagship of the department and is now coordinated by the VARIENG research unit.

For anyone interested in studying regional variation and dialectology, I would strongly suggest visiting VARIENG's website and learning about their various corpora, whose uniqueness is recognised all over the world.

