Description

COSER is a dialect corpus, but it is restricted to the speech of the informants who were the focus of traditional dialectology: rural speakers, preferably older, with little schooling and native to the place where they are interviewed. COSER thus draws on the same type of informants as the linguistic atlases. As of December 2022, 2,961 informants are registered in our database, although only slightly more than half of them have been interviewed in depth:


| Informants | Number | Average age |
|---|---|---|
| Men | 1,415 (47.8%) | 74.8 years |
| Women | 1,546 (52.2%) | 73.6 years |
| Total | 2,961 | 74.2 years |



The overall average age of the informants is 74 years, slightly higher for men (74.8 years) than for women (73.6 years). These informants were born in the first third of the 20th century and have received a limited amount of education: in general, they attended, with varying degrees of success, a few years of primary school, learning, according to their own statements, "to read and write, and the four basic arithmetic operations", although there is no shortage of illiterate speakers.
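The percentages and the overall average age follow directly from the group counts and group averages. The short sketch below reproduces that arithmetic, using the counts from the table and the group means quoted in the paragraph above; the variable names are illustrative only and do not correspond to anything in the COSER database.

```python
# Counts and mean ages (in years) as quoted in the text above.
informants = {
    "men": {"count": 1415, "mean_age": 74.8},
    "women": {"count": 1546, "mean_age": 73.6},
}

total = sum(group["count"] for group in informants.values())

# Share of each group as a percentage, rounded to one decimal place.
shares = {
    name: round(100 * group["count"] / total, 1)
    for name, group in informants.items()
}

# Overall mean age: the group means weighted by group size.
overall_mean_age = round(
    sum(g["count"] * g["mean_age"] for g in informants.values()) / total, 1
)

print(total, shares, overall_mean_age)
```

Weighting by group size matters here because the two groups are not the same size; a plain average of the two group means would slightly overstate the men's contribution.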

The recordings that make up COSER have been collected regularly from 1990 to the present (December 2022) in a series of survey campaigns. This fieldwork has been organised with the support of several research projects and as part of the fieldwork practicals of the courses "Dialectología hispánica" (1988-1996), "El español hablado. Variantes peninsulares" (1996-2004) and "Curso monográfico de variedades del español" (2005-2011), optional subjects in the bachelor's degree in Hispanic Philology at the Autonomous University of Madrid (Universidad Autónoma de Madrid, UAM). From 2011 to the present, the campaigns have been an optional activity within the subject "Lengua española. Variedades de la lengua" (3rd year) of the Degree in Hispanic Studies at the same university.


| Surveyed localities | Provinces or islands | Total recording time | Average recording length per interview | Number of interviews | Interviews available in text and audio (May 2022) |
|---|---|---|---|---|---|
| 1,415 | 55 | 1,910 hours | 1 hour, 4 min. | 1,772 | 218 |



Up to 2022, interviews were carried out in 1,415 rural localities in the Iberian Peninsula and the two archipelagos, belonging to 55 provinces or islands (islands are counted separately even when they belong to a single province). Their geographical location is shown on the map, where they can be identified by means of a numerical code that combines the province and the locality, each in alphabetical order (for example, Berganzo, in the province of Álava, has the code 0101). The sound materials cover a large part of the Iberian Peninsula, and the network of survey points is as dense as that of the regional atlases, or even denser.
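As an illustration of how such a code might be handled when processing the data, the sketch below splits a code into its province and locality parts. It assumes a four-digit code whose first two digits identify the province and whose last two identify the locality within it; this layout is inferred from the single example 0101 and is not confirmed by the text, and the function name `split_locality_code` is hypothetical.

```python
def split_locality_code(code: str) -> tuple[str, str]:
    """Split a locality code into (province, locality) parts.

    Assumption: four ASCII digits, two for the province followed by
    two for the locality. This layout is inferred from the example
    code 0101, not taken from a documented specification.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return code[:2], code[2:]


# Berganzo (Álava) is the example given in the text.
province, locality = split_locality_code("0101")
print(province, locality)
```

The parts are kept as strings rather than converted to integers so that leading zeros are preserved.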

In total, COSER currently has 1,910 hours of recordings. Although most of them were recorded in analog format, the digitisation of all the materials was completed in 2010, and a sample of them is presented here as sound files. Half of the materials have transcriptions, of varying nature and accuracy, produced thanks to the support of various research projects and the participation of several generations of UAM undergraduate students, who transcribed, as part of their course work, recordings they had collected themselves. In 2015, 147 transcriptions corresponding to 141 localities (approximately 183 hours), revised and standardised with the BConcord editor, were published on this website (available files) and made searchable through a search engine. Between then and May 2022 that number increased to 218 transcriptions, corresponding to 295 hours, 48 minutes of recording and making up a searchable corpus of 3,596,205 words. Since 2017, this corpus has been accessible in both the Simple Search mode and the Advanced Search mode, the latter of which allows querying by lemmas and morphosyntactic tags. In 2019, the Advanced Search was revised and, among other improvements, it became possible to download the search results in Excel format. In 2020, the geographic coordinates and postcodes of the localities were added to the query results, so that the data can be analysed in Geographic Information Systems, and the synchronisation of text with audio was completed. In 2021 and 2022, synchronisation, spelling and labelling errors were corrected throughout the available corpus.

 


| Localities whose transcript is available and searchable in the Search (May 2022) | Provinces or islands | Number of hours transcribed | Total words transcribed | Total units (tokens) |
|---|---|---|---|---|
| 218 | 55 | 295 hours, 48 minutes | 3,596,205 words | 4,591,828 units |