Intonational Variation in Arabic

Lead Research Organisation: University of York
Department Name: Language and Linguistic Science

Abstract

Twenty-five countries have Arabic as an official language, but the dialects spoken vary greatly, and even within one country different accents are heard. Many features create the impression of 'a different accent', including how particular sounds are pronounced, where stress falls in a word, and what intonation pattern is used. There is extensive prior research on the first two of these for Arabic, but there are few descriptions of the intonation of individual dialects, and what is known is based on different types of data, so direct comparisons cannot be made.



The Intonational Variation in Arabic project is hosted by the Department of Language and Linguistic Science at the University of York, a leading centre for sociophonetic research. Adapting methodology from earlier ESRC-funded work on English (www.phon.ox.ac.uk/IViE/), the project will generate a public-access corpus of Arabic speech, using a parallel set of sentences, stories and conversations, recorded with speakers aged 18-24 in five regions of the Arab world. Additional data from older speakers (50+) and from nearby cities will reveal changes in progress and local variation. Detailed prosodic transcription will yield intonational descriptions of individual dialects and cross-dialectal comparisons, for use by linguists, learners and teachers of Arabic, and other users.

 
Description To date, the most significant achievement of the grant is the collection of a large parallel corpus of speech data, elicited for the purposes of intonational analysis, in eight dialects of Arabic. The rationale of the corpus design is set out in a recent book chapter (Hellmuth 2014).

On completion of the project these speech recordings, and accompanying transcriptions, will be made available via an online searchable database. We expect all of the remaining objectives of the grant to be achieved by its projected completion date.
Exploitation Route The findings of our research will be useful to learners and teachers of Arabic, who will benefit from descriptions of the pronunciation differences between Arabic dialects and from sample sound recordings available to download.

To lay a foundation for this future use, we produced a position paper explaining why, in particular, a description of the intonation patterns of different dialects may be useful for learners and teachers of Arabic (Hellmuth 2014). The paper takes research-led recommendations for teaching of the pronunciation of English as a starting point and explores what the equivalent recommendations would be for Arabic, taking into account the known differences between the two languages.

In addition, recordings from the IVAr database are currently being used in the development of a prototype online training module designed to evaluate the extent to which 'lay listeners' (with no prior knowledge of linguistics or of Arabic dialects) can be trained to identify differences between spoken Arabic dialects more reliably; the technique tests for potential gains both from explicit instruction, which alerts listeners to salient features of each dialect, and from implicit learning/familiarisation over time through repeated listening.

We have also produced papers i) to present the innovative methodology used to collect interactive data in languages such as Arabic, where the written form of the language differs from the spoken form (Gargett et al. 2014), and ii) to explore whether it is possible to detect traces of a person's mother-tongue Arabic dialect when they are speaking English as a foreign language (Almbark et al. 2014).
Sectors Digital/Communication/Information Technologies (including Software); Education; Government, Democracy and Justice; Security and Diplomacy
URL http://ivar.york.ac.uk/
 
Title Implementation of the ProsodyLab forced alignment tool for dialectal Arabic 
Description We adapted the open-source Python scripts distributed by the McGill prosodylab for the ProsodyLab Aligner tool to perform forced alignment of text transcriptions of the IVAr data to the audio recordings, resulting in time-aligned Praat textgrids at the word (and segment) level. An innovation in our lab was the adaptation of the tools to ensure robust alignment of longer sound files (i.e. those containing longer narratives and/or conversations).
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact HMM models for each dialect analysed, and Praat textgrids automatically time-aligned at the word (and segment) level to audio recordings. Textgrids time-aligned at the word level will be made available alongside the audio files via the IVAr database. 
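
As an illustration of how the word-level textgrids described above might be consumed downstream, the following is a minimal sketch that reads labelled intervals from a Praat TextGrid in the long text format. It is not part of the project's tooling: the function name word_intervals is hypothetical, the sketch assumes a file containing a single word-level interval tier, and the sample label in the usage note is a placeholder rather than IVAr data.

```python
import re

# Matches one interval entry in Praat's long TextGrid text format.
# Doubled quotes ("") inside a label are treated as an escaped quote.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([\d.]+)\s*'
    r'xmax = ([\d.]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def word_intervals(textgrid_text):
    """Return (start, end, label) for every non-empty interval.

    Assumption: the TextGrid holds one word-level interval tier,
    so every interval found is a word or a silence.
    """
    words = []
    for match in INTERVAL_RE.finditer(textgrid_text):
        start = float(match.group(1))
        end = float(match.group(2))
        label = match.group(3).replace('""', '"')
        if label.strip():  # skip unlabelled (silent) intervals
            words.append((start, end, label))
    return words
```

Given the contents of a TextGrid file as a string, word_intervals returns a list of start time, end time and label triples, which could then be used to cut the audio at word boundaries or to locate a target word for prosodic annotation.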
 
Title IVAr corpus 
Description The Intonational Variation in Arabic (IVAr) corpus is one of the primary outputs of the IVAr project. It is a parallel corpus of speech data in eight dialects of Arabic (plus one bilingual sub-corpus dataset). Data collection was completed in September 2015. All of the read speech portions of the data are orthographically transcribed, and the transcriptions are time-aligned to the digital audio signal using forced alignment. Transcriptions are also available for at least half of the spontaneous speech portions of the database. An SQL database has been constructed to allow users to search for sound files and accompanying transcriptions, as well as relevant metadata.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact The database will be released on completion of the project (September 2017). 
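
The kind of search described above can be sketched as follows, using an in-memory SQLite database. This is an illustration only: the recordings table, its columns, the dialect codes and the file names are all hypothetical placeholders and do not reflect the actual IVAr database schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical recordings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE recordings (
               id INTEGER PRIMARY KEY,
               dialect TEXT NOT NULL,      -- placeholder dialect code
               task TEXT NOT NULL,         -- sentence, story or conversation
               age_group TEXT NOT NULL,    -- e.g. '18-24' or '50+'
               audio_path TEXT NOT NULL,
               textgrid_path TEXT          -- NULL if not yet transcribed
           )"""
    )
    rows = [
        ("dialect_A", "story", "18-24", "a_story_01.wav", "a_story_01.TextGrid"),
        ("dialect_A", "conversation", "18-24", "a_conv_01.wav", None),
        ("dialect_B", "story", "50+", "b_story_01.wav", "b_story_01.TextGrid"),
    ]
    conn.executemany(
        "INSERT INTO recordings (dialect, task, age_group, audio_path, textgrid_path) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def find_recordings(conn, dialect=None, task=None, age_group=None):
    """Return (audio_path, textgrid_path) pairs matching the given filters."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("dialect", dialect), ("task", task), ("age_group", age_group)):
        if value is not None:
            clauses.append(column + " = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT audio_path, textgrid_path FROM recordings" + where + " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

Calling find_recordings(build_demo_db(), task="story") would return the story recordings for every placeholder dialect together with their textgrid paths. Filter values are passed as query parameters rather than interpolated into the SQL string, which is the usual precaution for a publicly searchable database.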
 
Description BAB-MSA 
Organisation University of Jordan
Country Jordan, Hashemite Kingdom of 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently under revision.
Start Year 2013
 
Description BAB-MSA 
Organisation University of Leeds
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution We have created a corpus of Boundary Annotated Broadcast Modern Standard Arabic (BAB-MSA) for input to computational analysis. The annotations are informed by our work on development of prosodic annotation protocols for regional Arabic dialects.
Collaborator Contribution Our partners, Dr Claire Brierley (Leeds) and Majdi Sawalha (Jordan), then used the corpus to test a model of automated phrase break prediction.
Impact This research is multidisciplinary: linguistics ~ computer science. The resulting journal article is currently under revision.
Start Year 2013
 
Description COS 
Organisation Universite de la Manouba
Country Tunisia, Tunisian Republic 
Sector Academic/University 
PI Contribution We have collected parallel data in (so far) 8 dialects of Arabic, to determine the phonetic correlates of word level stress in each dialect, using an elicitation paradigm devised by Dr Bouchhioua. The resultant data will allow directly parallel comparison of the correlates of word stress across Arabic dialects for the first time. We will analyse the data after completion of the annotation of the main IVAr data for each dialect.
Collaborator Contribution The elicitation paradigm was devised by our partner, Dr Nadia Bouchhioua of the Universite de la Manouba, Tunis, Tunisia.
Impact Acquiring the phonetics and phonology of English word stress : Comparing learners from different L1 backgrounds. / Alhussein Almbark, Rana; Bouchhioua, Nadia; Hellmuth, Sam. In: Concordia Working Papers in Applied Linguistics, Vol. 5, 2014, p. 19-35.
Start Year 2013
 
Description Comparison of Moroccan Arabic and Tamazight prosodic phonology 
Organisation University of Cologne
Department Institute for Linguistics - Phonetics
Country Germany, Federal Republic of 
Sector Academic/University 
PI Contribution Joint PhD supervision with Prof Dr Martine Grice, for Anna Bruggeman who is working on comparison of the realisation of word-/phrase-stress in Moroccan Arabic and Tamazight.
Collaborator Contribution Anna Bruggeman is analysing some of the Moroccan Arabic bilingual sub-corpus for comparison to parallel work previously carried out by Anna and colleagues at the Cologne lab on Tamazight prosody.
Impact Analysis of a) the acoustic correlates of putative word-level stress and b) the scaling and alignment of f0 peaks observed on q-words in wh-questions in Moroccan Arabic, produced by speakers who are/are not also bilingual in Tashlhiyt, using data from the Moroccan Arabic bilingual sub-corpus.
Start Year 2016
 
Description DiVE-Arabic 
Organisation University of Birmingham
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Academic/University 
PI Contribution In one of our fieldwork locations we collected an additional corpus of data elicited using a virtual world game environment developed by Andrew Gargett (University of Birmingham), which yields audio data time-aligned with a log of the actions (movements/orientations) in the virtual world. Dr Gargett is developing methods for annotation and/or analysis of the actions data.
Collaborator Contribution We will provide prosodic annotation of the audio data, using the annotation protocols for the dialect in question, once these are developed (based on the main IVAr corpus data). Once the two levels of analysis are available we will have a rich resource for examining the role of prosody and intonation in situated dialogue in Arabic (for the first time).
Impact DiVE-Arabic: Gulf Arabic Dialogue in a Virtual Environment. / Gargett, Andrew; AlGethami, Ghazi; Hellmuth, Sam. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA), 2014. This collaboration is multi-disciplinary: linguistics ~ computer science.
Start Year 2013
 
Description Radio broadcast (Word of Mouth) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Participation in Radio 4 'Word of Mouth' programme on 'Intonation: the Music of Speech' focussed on variation in the form and function of intonation across languages.
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/programmes/b08dnrqd