Music involves different senses and is emotional in nature, and musicians show enhanced detection of audio-visual temporal discrepancies and emotion recognition compared to non-musicians. However, whether musical training produces these enhanced abilities or if they are innate within musicians remains unclear. Thirty-one adult participants were randomly assigned to a music training, music listening, or control group who all completed a one-hour session per week for 11 weeks. The music training group received piano training, the music listening group listened to the same music, and the control group did their homework. Measures of audio-visual temporal discrepancy, facial expression recognition, autistic traits, depression, anxiety, stress and mood were completed and compared from the beginning to end of training. ANOVA results revealed that only the music training group showed a significant improvement in detection of audio-visual temporal discrepancies compared to the other groups for both stimuli (flash-beep and face-voice). However, music training did not improve emotion recognition from facial expressions compared to the control group, while it did reduce the levels of depression, stress and anxiety compared to baseline. This RCT study provides the first evidence of a causal effect of music training on improved audio-visual perception that goes beyond the music domain.