This work presents a framework for Human-Robot Collaboration (HRC) in assembly tasks that integrates multimodal sensing, perception and control methods. First, vision sensing is used to identify the user and thereby determine the collaborative task to be performed. Second, assembly actions and hand gestures are recognised using wearable inertial measurement units (IMUs) and convolutional neural networks (CNNs) to identify when robot collaboration is needed, in which case the robot brings the next object to the user for assembly. If collaboration is not required, the robot performs a solo task. Third, the robot arm uses time-domain features from tactile sensors to detect when an object has been touched and grasped during handover actions in the assembly process. These multimodal sensors and computational modules are integrated in a layered control architecture for collaborative assembly tasks. The proposed framework is validated in real time using a Universal Robots UR3 arm that collaborates with humans to assemble two types of objects, 1) a box and 2) a small chair, and performs a solo task of moving a stack of Lego blocks when collaboration with the user is not needed. The experiments show that the robot can sense and perceive the state of its surrounding environment using multimodal sensors and computational methods, and can act and collaborate with humans to complete assembly tasks successfully.
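As a rough illustration of the third step and of the collaborate-or-solo decision, the following minimal sketch computes a few common time-domain features over a window of tactile samples and maps the perception outputs to a robot action. It is not the authors' implementation: the feature set (mean absolute value, RMS, standard deviation), the threshold rule with its `ratio` parameter, and the function and action names are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def time_domain_features(window):
    """Compute simple time-domain features for one window of tactile samples.

    The choice of features here is an assumption, not taken from the paper.
    """
    n = len(window)
    mav = sum(abs(x) for x in window) / n            # mean absolute value
    rms = math.sqrt(sum(x * x for x in window) / n)  # root mean square
    return {"mav": mav, "rms": rms, "std": pstdev(window)}


def touch_detected(window, baseline_rms, ratio=3.0):
    """Hypothetical rule: report contact when the window RMS exceeds
    `ratio` times the RMS measured while the sensor is untouched."""
    return time_domain_features(window)["rms"] > ratio * baseline_rms


def next_robot_action(collaboration_needed, object_grasped_by_user):
    """Map perception outputs to an action label (names are illustrative)."""
    if not collaboration_needed:
        return "solo_task"          # e.g. moving the stack of Lego blocks
    if object_grasped_by_user:
        return "release_object"     # complete the handover
    return "bring_next_object"      # fetch the next part for assembly


# Example with made-up tactile readings (arbitrary units).
idle = [0.01, -0.02, 0.015, -0.01]
contact = [0.40, 0.55, 0.50, 0.60]
baseline = time_domain_features(idle)["rms"]
action = next_robot_action(True, touch_detected(contact, baseline))
```

In this sketch the idle window calibrates the baseline, so the same rule can be reused across sensors with different noise levels; a learned classifier over the same features would be a natural alternative.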