Project publications and presentations

We ask that all publications using the AusTalk corpus include the following acknowledgement:

'The AusTalk corpus was collected as part of the Big ASC project (Burnham et al. 2009; Wagner et al. 2010; Burnham et al. 2011), funded by the Australian Research Council (LE100100211). See: for details.'


Project publications

Cassidy, S., Estival, D., Cox, F. (2017). ‘Case Study: the AusTalk Corpus’. Handbook of Linguistic Annotation, Nancy Ide and James Pustejovsky, eds. Springer.

Sui, Chao, Togneri, Roberto, and Bennamoun, Mohammed (2015). 'Extracting Deep Bottleneck Features For Visual Speech Recognition', in Proceedings of ICASSP 2015, pp. 1518-22.

Sui, Chao, Bennamoun, Mohammed, and Togneri, Roberto (2015). 'Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines', in Proceedings of ICCV 2015, pp. 154-62.

Estival, D. (2015). “AusTalk and Alveo: An Australian Corpus and Human Communication Science Collaboration Down Under”. In Language Production, Cognition, and the Lexicon. Núria Gala, Reinhard Rapp and Gemma Bel-Enguix, eds. Series: Text, Speech and Language Technology, Vol. 48. Springer. pp. 545-560.

Cassidy, S., Estival, D., Cox, F. (2014). "AusTalk Annotation report".

Togneri, R., Bennamoun, M. and Sui, C. (2014). "Multimodal Speech Recognition with the AusTalk 3D Audio-Visual Corpus". Tutorial at Interspeech 2014.

Burnham, D. (2014). Big Data and Resource Sharing: A Speech Corpus and a Virtual Laboratory – Facilitating Research in Human Communication Science. Keynote address at Oriental COCOSDA (International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques) and concurrent meeting of the Conference on Asian Spoken Language Research and Evaluation, September 10-12, Phuket, Thailand.

Estival, D., Cassidy, S., Cox, F., Burnham, D. (2014). “AusTalk: an audio-visual corpus of Australian English”. 9th Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland.

Sui, C., Haque, S., Togneri, R., & Bennamoun, M. (2012). "A 3D Audio-Visual Corpus for Speech Recognition". Paper presented at SST 2012, Sydney, Australia.

Sui, C., Haque, S., Togneri, R., & Bennamoun, M. (2012). "Discrimination Comparison Between Audio and Visual Features". Paper presented at Asilomar 2012, Pacific Grove, USA.

Burnham, Denis, Dominique Estival, Steven Fazio, Felicity Cox, Robert Dale, Jette Viethen, Steve Cassidy, Julien Epps, Roberto Togneri, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Marc Onslow, Trent Lewis, Andy Butcher, John Hajek and Michael Wagner (2011). "Building an Audio-Visual Corpus of Australian English: Large Corpus Collection with an Economical Portable and Replicable Black Box". In Interspeech 2011, Florence, Italy.

Christou, Maria. (2011). "Isn't it Romantic": Discerning the Phonetic Properties of Speech Directed at Lovers and Strangers. Honours Thesis. Dept. of Psychology. University of Western Sydney.

Wagner, M., D. Tran, R. Togneri, P. Rose, D. Powers, M. Onslow, D. Loakes, T. Lewis, T. Kuratate, Y. Kinoshita, N. Kemp, S. Ishihara, J. Ingram, J. Hajek, D.B. Grayden, R. Göcke, J. Fletcher, D. Estival, J. Epps, R. Dale, A. Cutler, F. Cox, G. Chetty, S. Cassidy, A. Butcher, D. Burnham, S. Bird, C. Best, M. Bennamoun, J. Arciuli, and E. Ambikairajah (2010). "The Big Australian Speech Corpus (the Big ASC)". In 13th Australasian International Conference on Speech Science and Technology, edited by M. Tabain, J. Fletcher, D. Grayden, J. Hajek and A. Butcher, pp. 166-70. Melbourne: ASSTA.

Burnham, D., E. Ambikairajah, J. Arciuli, M. Bennamoun, C.T. Best, S. Bird, A.R. Butcher, S. Cassidy, G. Chetty, F.M. Cox, A. Cutler, R. Dale, J.R. Epps, J.M. Fletcher, R. Goecke, D.B. Grayden, J.T. Hajek, J.C. Ingram, S. Ishihara, N. Kemp, Y. Kinoshita, T. Kuratate, T.W. Lewis, D.E. Loakes, M. Onslow, D.M. Powers, P. Rose, R. Togneri, D. Tran, and M. Wagner (2009). "A Blueprint for a Comprehensive Australian English Auditory-Visual Speech Corpus". In Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus, Sydney, pp. 96-107. Somerville, MA, USA: Cascadilla Proceedings Project.


Public presentations

Presentation at the ACSRF Supercomputing Workshop, Hunan University, Changsha, China, June 2013.

Presentation at the HAIL Seminar, CSIRO, Sydney, April 2012.

Presentation at the New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, NZ, November 2011.

Presentation at Interspeech 2011, Florence, Italy, August 2011.

Presentation at the Thirteenth Australasian International Conference on Speech Science and Technology (SST 2010), Melbourne, Australia, December 2010.


Selected references

  1. Anderson, A., Bader, M., Bard, E., Boyle, E., Doherty, G. M., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H. S. and Weinert, R. 1991. "The HCRC Map Task Corpus". Language and Speech 34, 351-366.
  2. Arciuli J & S McLeod 2008. "Production of /st/ clusters in trochaic and iambic contexts by typically developing children". Proceedings of the 8th International Seminar on Speech Production (ISSP). Strasbourg, France. pp. 181-184.
  3. Bavin EL, Grayden DB, Scott K & T Stefanakis. "Testing auditory processing skills and their associations with language in 4-5 year-olds", Language & Speech, 53: 31-47.
  4. Best CT, Tyler MD, Gooding TN, Orlando CB & CA Quann. 2009. "Emergent phonology: Toddlers’ perception of words spoken in non-native vs native dialects". Psychological Science, 539-542.
  5. Burnham, Denis, Eliathamby Ambikairajah, Joanne Arciuli, Mohammed Bennamoun, Catherine T. Best, Steven Bird, Andrew R. Butcher, Steve Cassidy, Girija Chetty, Felicity M. Cox, Anne Cutler, Robert Dale, Julien R. Epps, Janet M. Fletcher, Roland Goecke, David B. Grayden, John T. Hajek, John C. Ingram, Shunichi Ishihara, Nenagh Kemp, Yuko Kinoshita, Takaaki Kuratate, Trent W. Lewis, Debbie E. Loakes, Mark Onslow, David M. Powers, Philip Rose, Roberto Togneri, Dat Tran, and Michael Wagner. 2009. "A Blueprint for a Comprehensive Australian English Auditory-Visual Speech Corpus". Selected Proceedings of the 2008 HCSNet Workshop on Designing the Australian National Corpus, ed. Michael Haugh et al., 96-107. Somerville, MA: Cascadilla Proceedings Project.
  6. Butcher AR 1996 Levels of representation in the acquisition of phonology: evidence from ‘before and after’ speech. In B. Dodd, R. Campbell & L. Worrall (eds): Evaluating Theories of Language: evidence from disordered communication. London: Whurr Publishers, 55-73.
  7. Butcher AR 2002 Forensic Phonetics: Issues in speaker identification evidence. Proceedings of the Inaugural International Conference of the Institute of Forensic Studies: “Forensic Evidence: Proof and Presentation”, Prato, Italy, 3-5 July [CD-ROM, no page numbers].
  8. Butcher A 2006 ‘Formant frequencies of /hVd/ vowels in the speech of South Australian females’. Proceedings of the 11th Australasian International Conference on Speech Science & Technology, Pp 449-453.
  9. Butcher AR 2008 Linguistic aspects of Australian Aboriginal English. Clinical Linguistics & Phonetics, 22, 625-642.
  10. Cox F & S Palethorpe 1998 ‘Regional variation in the vowels of female adolescents from Sydney’. Proceedings of the 5th International Conference on Spoken Language Processing, ICSLP, Sydney, November 30-December 4.
  11. Cox F & S Palethorpe 2001 ‘The Changing Face of Australian English Vowels’. In Blair D & Collins P (eds.), Varieties of English around the World: English in Australia. Amsterdam: John Benjamins Publishing. Pp. 17-44.
  12. Cox F & S Palethorpe 2004 ‘The border effect: Vowel differences across the NSW/Victorian border’ in Moskovsky, C (ed.), Proc. 2003 Conference, Australian Linguistics Society.
  13. Cox F & S Palethorpe 2008 Reversal of short front vowel raising in Australian English, Proceedings of Interspeech 2008, 22nd-26th September 2008, Brisbane, 342-345.
  14. Cutler A & D Carter 1987 ‘The predominance of strong initial syllables in the English vocabulary’ Computer Speech & Language 2: 133-142.
  15. Cutler A 2005 ‘The lexical statistics of word recognition problems caused by L2 phonetic confusion’ Proc. Eurospeech 2005, Lisbon, 413-416.
  16. Dale R & J Viethen 2009 ‘Referring Expression Generation through Attribute-Based Heuristics’ Proceedings of the 12th European Workshop on Natural Language Generation, 30th-31st March 2009, Athens, Greece.
  17. Dawson PW, McKay CM, Busby PA, Grayden DB & GM Clark 2000 Electrode discrimination and speech perception in young children using cochlear implants, Ear and Hearing 21:597-607.
  18. Fletcher J, Grabe E & P Warren 2004 ‘Intonational variation in four dialects of English: The high rising tune’ in Sun-Ah Jun (ed.), Prosodic typology. Oxford: Oxford University Press. Pp. 390–409.
  19. Foulkes P 2005 Sociophonetics. In Brown, K. (ed) Encyclopedia of Language and Linguistics (2nd ed). Amsterdam: Elsevier, pp.495-500.
  20. Goecke R & Millar JB 2003 ‘Statistical Analysis of the Relationship between Audio and Video Speech Parameters for Australian English’. In J.L. Schwartz, F. Berthommier, M.A. Cathiard, and D. Sodoyer (eds.), Proceedings of the ISCA Tutorial and Research Workshop on Auditory-Visual Speech Processing AVSP 2003, pages 133-138, St. Jorioz, France, 4-7 September 2003.
  21. Huang JH & DMW Powers 2008 ‘Suffix-tree-based approach for Chinese information retrieval’. Proceedings of the International Conference on Intelligent Systems Design and Applications (ISDA 2008), Vol. 3, Pp 393-397.
  22. Ingram J 1989 ‘Connected speech processes in Australian English’ in D Bradley, R Sussex, & G Scott (eds.), Studies in Australian English Aust. Linguistic Society. Pp 21-49.
  23. Kemp N 2009 ‘The spelling of vowels is influenced by Australian and British English dialect differences’ Scientific Studies of Reading 13: 53-72.
  24. Kuratate T 2008 ‘Text-to-AV Synthesis System for Thinking Head Project’ Proceedings of International Conference of Auditory-Visual Speech Processing 2008, Pp 191-194.
  25. Labov W 1972 Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
  26. Lewis TW & DMW Powers 2005 ‘Distinctive Feature Fusion for Improved Audio-Visual Phoneme Recognition’ Paper presented at 8th IEEE International Symposium on Signal Processing and Its Applications (ISSPA) 2005, Pp 62-65, Sydney, Australia, 28-31 August 2005. IEEE Press.
  27. Lewis TW & Powers DMW 2008 Distinctive Feature Fusion for Recognition of Australian English Consonants. Proceedings of Interspeech 2008, Brisbane. Pp 2671-2674
  28. Li Y & DMW Powers 2001 ‘Speech Separation Based on Higher Order Statistics Using Recurrent Neural Networks’, pp. 45-56, Proc. International Workshop on Hybrid Intelligent Systems (HIS'01), December 2001; Springer-Verlag "Advances in Soft Computing" Series.
  29. Loakes D & K McDougall (2010) “Individual variation in the frication of voiceless plosives in Australian English: a study of twins’ speech” Australian Journal of Linguistics 30, 155-181.
  30. McIntyre G & R Goecke 2007 Towards affective sensing. Proceedings of the 12th International Conference on Human-Computer Interaction HCII2007, 3, 411-420.
  31. Millar J, Dermody P, Harrington J & Vonwiller J 1990 A national database of spoken language: concept, design, and implementation. Proceedings of the International Conference on Spoken Language Processing (ICSLP-90). Kobe, Japan.
  32. Mok M, Grayden, DB, Dowell, RC & D Lawrence 2006 Speech perception for adults who use hearing aids in conjunction with cochlear implants in opposite ears, Journal of Speech, Language, and Hearing Research, 49:338-351.
  33. Naseem I, Togneri R & M Bennamoun 2009 ‘Sparse Representation for Video-based Face Recognition’. Proceedings of ICB, June 2009, Alghero, Italy.
  34. Pfitzner DM, Leibbrandt RE & Powers DM 2008a ‘Characterization and Evaluation of Similarity Measures for Pairs of Clusterings’, Knowledge and Information Systems: An International Journal, DOI 10.1007/s10115-008-0150-6.
  35. Pfitzner DM, Treharne T & Powers DM 2008b ‘User Keyword Preference: the Nwords and Rwords Experiments’ International Journal of Internet Protocol Technology 9:149-158 DOI 10.1504/IJIPT.2008.020947
  36. Powers DMW & RE Leibbrandt 2009 Rough Diamonds in Natural Language Learning. Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology, Springer Lecture Notes in Computer Science. Pp. 17-26.
  37. Powers, DMW, Leibbrandt, RE, Pfitzner, D, Luerssen, MH, Lewis, TW, Abrahamyan, A & K Stevens 2008 ‘Language Teaching in a Mixed Reality Games Environment’ The 1st International Conference on Pervasive Technologies Related to Assistive Environments (PETRA) DOI 10.1145/1389586.1389668
  38. Rose P 2002 Forensic Speaker Identification. London: Taylor & Francis.
  39. Smits R, Warner N, McQueen J & A Cutler 2003 ‘Unfolding of phonetic information over time: A database of Dutch diphone perception’ Journal of the Acoustical Society of America 113: 563-574.
  40. Talarico M, Abdilla G, Aliferis M, Balazic I, Giaprakis I, Stefanakis T, Foenander K, Grayden DB & AG Paolini 2007 Effect of age and cognition on childhood speech in noise perception abilities, Audiology & Neurotology, 12:13-19.
  41. Tran D & M Wagner 2002 ‘A Fuzzy Approach to Speaker Verification’ International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 16(7): 913-925
  42. Tran D, Wagner M, Lau YW & M Gen 2004 ‘Fuzzy Methods for Voice-Based Person Authentication’ IEEJ (Institute of Electrical Engineers of Japan) Transactions on Electronics, Information and Systems, 124(10): 1958-1963.
  43. Vidhyasaharan S, Ambikairajah E & Epps J 2009 ‘Speaker dependency of spectral features and speech production cues for automatic emotion classification’, in Proc. IEEE Int. Conf. on Acoust., Speech and Sig. Proc. (Taipei, Taiwan).
  44. Viethen J & R Dale 2006 ‘Algorithms for Generating Referring Expressions: Do They Do What People Do?’ Proceedings of the 4th International Conference on Natural Language Generation, 15-16 July, Sydney, Australia. Pp. 63-70.
  45. Yacoub S, Simske S, Lin X & J Burns 2003 ‘Recognition of emotions in interactive voice response systems’ in Proc. Eurospeech, pp. 729-732, September 2003.