Corpus linguistic methods in EMI research: A missed opportunity?

Research output: Chapter in Book/Report/Conference proceeding (Chapter)


Although corpus linguistic methods are widely used in analysing educational and professional discourse (e.g. Biber & Barbieri, 2007; Cheng, 2014), surprisingly few studies apply this approach to discourse in EMI educational contexts (Briggs, 2017; Csomay & Wu, 2020). This chapter argues that, in order to extend our understanding of EMI language use and the linguistic demands on students and teachers in EMI contexts, more studies are needed that exploit the potential of corpora and corpus linguistic methods in EMI research. The chapter highlights three broad areas where EMI research could especially benefit from the use of corpus linguistic methods. First, the strengths of corpus linguistic methods are probably most obvious in investigating vocabulary and grammatical aspects of discourse, for example vocabulary load (Nation, 2016) and the lexical and syntactic complexity of texts (Johnson, 2017; Lu, 2010). Second, EMI studies that aim to explore specific contextual and disciplinary language variation can profit from corpus-based analytical frameworks such as multidimensional analysis, which can provide comprehensive descriptions of such varieties (Biber et al., 2002). Finally, corpus-based studies can be applied to investigate interaction and discourse functions in texts. For such purposes, corpus approaches are often used to identify features that emerge from the corpus, such as frequent words and lexical bundles, for further analysis. Studies focusing on these aspects typically use corpus linguistic methods in combination with qualitative methods such as conversation analysis (Jahwar, 2012) or investigate the recurring patterns qualitatively in concordance lines (Biber & Barbieri, 2007). The chapter reviews the handful of studies that have analysed corpora of classroom discourse and academic disciplinary discourse in EMI contexts and proposes further directions for corpus research in EMI to unlock its hitherto untapped potential.
Keywords: corpus-based methods, vocabulary load, EMI corpus, linguistic complexity, lexical bundles
Original language: English
Title of host publication: Research methods in English Medium Instruction
Editors: Jack K. H. Pun, Samantha M. Curle
Place of Publication: London, U.K.
ISBN (Print): 9780367457556
Publication status: Published - 20 Jul 2021


  • English Medium Instruction
  • lexical bundles
  • corpus-based methods
  • vocabulary load
  • EMI corpus
  • linguistic complexity

