COSER

COSER is a dialect corpus restricted to the speech of the kind of informants traditionally studied in dialectology: rural speakers, preferably older, with lower levels of education, who are native to the place where they are interviewed. COSER thus draws on the same type of informants as the linguistic atlases. As of December 2023, 3,009 informants are registered in our database, although only slightly more than half of them have been interviewed in depth:


Informants	Number	Average age
Men:	1,431 (47.56%)	74.8 years
Women:	1,578 (52.44%)	73.4 years
Total:	3,009	74.1 years

The overall average age of the informants is 74.1 years, slightly higher for men (74.8 years) than for women (73.4 years). These informants were born in the first third of the twentieth century. In terms of education, they all attended a few years of primary school, with varying degrees of success. According to their own statements, at school they learned "to read and write, and the four basic rules of arithmetic". Even so, a good number of the informants are illiterate.
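The figures in the table can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that recomputes the percentage of men and women and the overall average age as a count-weighted mean; the names `groups`, `shares` and `weighted_mean_age` are illustrative and not part of any COSER tool.

```python
# Counts and mean ages copied from the informants table.
groups = {
    "men": {"count": 1431, "mean_age": 74.8},
    "women": {"count": 1578, "mean_age": 73.4},
}


def shares(groups):
    """Return each group's share of all informants, in percent (2 decimals)."""
    total = sum(g["count"] for g in groups.values())
    return {name: round(100 * g["count"] / total, 2) for name, g in groups.items()}


def weighted_mean_age(groups):
    """Return the overall mean age, weighting each group by its size."""
    total = sum(g["count"] for g in groups.values())
    weighted_sum = sum(g["count"] * g["mean_age"] for g in groups.values())
    return weighted_sum / total


if __name__ == "__main__":
    print(shares(groups))
    print(round(weighted_mean_age(groups), 1))
```

Rounded to the precision used in the table, the recomputed values agree with the published ones (47.56% and 52.44%, and an overall average of 74.1 years).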

The recordings held in COSER have been collected regularly from 1990 to the present (December 2023) in a series of survey campaigns. This fieldwork has been organised with the support of several research projects and as part of the fieldwork included in courses on Spanish dialectology ["Dialectología hispánica" (1988-1996)] and on peninsular variants of spoken Spanish ["El español hablado. Variantes peninsulares" (1996-2004)], as well as in a course on varieties of Spanish ["Curso monográfico de variedades del español" (2005-2011)]. All of these courses were optional modules available to students of Spanish at the Universidad Autónoma de Madrid. Since 2011, the survey campaigns have been an optional activity for third-year undergraduate students taking the module on varieties of Spanish ["Lengua española. Variedades de la lengua"] in the university's Degree in Hispanic Studies.


Surveyed localities	Provinces or islands	Total hours of recordings	Average recording length per interview	Interviews available in text and audio (December 2023)
1,433	55	1,947 hours	1 hour, 5 minutes	229

By 2023, interviews had been carried out in 1,433 rural localities in the Iberian Peninsula and the two archipelagos, spread over 55 provinces or islands (each island is counted separately, even when several islands belong to a single province). Their geographical location is shown on the map, where each locality can be identified by a numerical code that encodes the province and the locality, in alphabetical order (for example, Berganzo, in the province of Álava, has the code 0101). The sound materials cover a large part of the Iberian Peninsula, and the network of survey points is as dense as that of the regional atlases, or even denser.
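The locality codes described above lend themselves to mechanical decoding. The following is a minimal sketch in Python, assuming that every code has exactly four digits, with the first two giving the province's position and the last two the locality's position within that province. Only the example 0101 (Berganzo, Álava) is confirmed by the text; the two-plus-two split and the function name `parse_locality_code` are assumptions made for illustration.

```python
def parse_locality_code(code: str) -> tuple[int, int]:
    """Split a four-digit locality code into (province_index, locality_index).

    ASSUMPTION: two digits for the province followed by two digits for the
    locality. The source only documents the example "0101".
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return int(code[:2]), int(code[2:])


if __name__ == "__main__":
    # Berganzo (Álava) is the example given in the text.
    province_index, locality_index = parse_locality_code("0101")
    print(province_index, locality_index)
```

Keeping the code as a string rather than an integer preserves the leading zero, which would otherwise be lost (0101 would become 101).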

In total, COSER currently has 1,947 hours of recordings. Although most of these were recorded in analogue format, in 2010 we were able to digitise all the materials, of which we present a sample as sound files. Half of the materials have transcriptions of varying nature and accuracy, undertaken thanks to support from various research projects and the participation of several generations of UAM undergraduate students, who have transcribed, as part of their academic coursework, the recordings they had collected.

In 2015, 147 transcriptions corresponding to 141 localities (approximately 183 hours), revised and standardised with the BConcord editor, were published on this website (available files) and made searchable through a search engine. Between 2015 and December 2023, that number increased to 229 transcriptions, amounting to 311 hours, 53 minutes of recording and a searchable corpus of 3,384,041 words. Since 2017, this corpus has been accessible in both the Simple Search and Advanced Search modes (the latter allows searches using lemmas and morphosyntactic tags). In 2019, the Advanced Search option was revised and, among other improvements, it is now possible to download the search data in Excel format. In 2020, the geographic coordinates and postcodes of the localities were added to this search option, so the extracted data can be analysed in Geographic Information Systems. The text has also been synchronised with its corresponding audio. In 2021, synchronisation, spelling and labelling errors were corrected throughout the available corpus.


Localities whose transcript is available and searchable (December 2023)	Provinces or islands	Number of hours transcribed	Total number of words transcribed	Total units (tokens)
229	55	311 hours, 53 minutes	3,384,041 words	4,860,596 units
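The figures above allow a rough estimate of how much of the recorded material is currently searchable and how dense the transcriptions are. The following is a minimal sketch in Python; the constant and function names are illustrative, and the derived ratios are simple arithmetic on the published totals, not statistics reported by COSER.

```python
# Totals copied from the tables (December 2023).
TOTAL_HOURS = 1947                 # all recordings held in COSER
SEARCHABLE_HOURS = 311 + 53 / 60   # 311 hours, 53 minutes
SEARCHABLE_WORDS = 3_384_041


def searchable_share(searchable_hours: float, total_hours: float) -> float:
    """Return the percentage of recorded hours that is transcribed and searchable."""
    return 100 * searchable_hours / total_hours


def words_per_hour(words: int, hours: float) -> float:
    """Return the average number of transcribed words per hour of recording."""
    return words / hours


if __name__ == "__main__":
    print(f"{searchable_share(SEARCHABLE_HOURS, TOTAL_HOURS):.1f}% of hours searchable")
    print(f"{words_per_hour(SEARCHABLE_WORDS, SEARCHABLE_HOURS):.0f} words per hour")
```

On these totals, roughly one sixth of the recorded hours is available as searchable text, with on the order of ten thousand words per transcribed hour.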

Audible Corpus of Spoken Rural Spanish