Monaural audio source separation using variational autoencoders

Laxmi Pandey, Anurendra Kumar, Vinay Namboodiri

Research output: Contribution to journalConference articlepeer-review

14 Citations (SciVal)


We introduce a monaural audio source separation framework using a latent generative model. Traditionally, discriminative training for source separation is proposed using deep neural networks or non-negative matrix factorization. In this paper, we propose a principled generative approach using variational autoencoders (VAE) for audio source separation. VAE computes efficient Bayesian inference which leads to a continuous latent representation of the input data(spectrogram). It contains a probabilistic encoder which projects an input data to latent space and a probabilistic decoder which projects data from latent space back to input space. This allows us to learn a robust latent representation of sources corrupted with noise and other sources. The latent representation is then fed to the decoder to yield the separated source. Both encoder and decoder are implemented via multilayer perceptron (MLP). In contrast to prevalent techniques, we argue that VAE is a more principled approach to source separation. Experimentally, we find that the proposed framework yields reasonable improvements when compared to baseline methods available in the literature i.e. DNN and RNN with different masking functions and autoencoders. We show that our method performs better than best of the relevant methods with ∼ 2 dB improvement in the source to distortion ratio.

Original languageEnglish
Pages (from-to)3489-3493
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication statusPublished - 1 Jan 2018
Event19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sept 20186 Sept 2018


  • Autoencoder
  • Deep learning
  • Generative models
  • Latent variable
  • Source separation
  • Variational inference

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation


Dive into the research topics of 'Monaural audio source separation using variational autoencoders'. Together they form a unique fingerprint.

Cite this