In this paper, we present preliminary results on the use of deep learning techniques to integrate the user's own body and other participants into a head-mounted video see-through augmented virtuality scenario. It has previously been shown that seeing one's own body in such simulations may improve both self-presence and social presence in the virtual environment, as well as user performance. We propose to use a convolutional neural network for real-time semantic segmentation of users' bodies in the stereoscopic RGB video streams acquired from the perspective of the user. We describe design issues as well as implementation details of the system, and demonstrate the feasibility of using such neural networks for merging users' bodies into an augmented virtuality simulation.
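The paper's own network and compositing pipeline are not reproduced here; as a rough illustration of the per-frame idea under stated assumptions, the sketch below uses an off-the-shelf torchvision DeepLabV3 model (an assumed stand-in, not the authors' architecture) to extract a body mask from one camera frame and alpha-composite it over the rendered virtual scene. In a stereoscopic setup this would run once per eye.

```python
import torch
import numpy as np
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large
from torchvision import transforms

# Assumption: a lightweight pretrained segmentation network stands in for the
# paper's CNN. Class 15 is "person" in the VOC label set used by these models.
PERSON_CLASS = 15
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = deeplabv3_mobilenet_v3_large(weights="DEFAULT").to(device).eval()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

@torch.no_grad()
def person_mask(frame_rgb: np.ndarray) -> np.ndarray:
    """Return a float mask in [0, 1] marking body pixels in an RGB frame."""
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    x = normalize(x).unsqueeze(0).to(device)
    logits = model(x)["out"][0]                # shape: (num_classes, H, W)
    return (logits.argmax(0) == PERSON_CLASS).float().cpu().numpy()

def composite(camera_rgb: np.ndarray, virtual_rgb: np.ndarray) -> np.ndarray:
    """Overlay segmented body pixels from the camera feed onto the
    rendered virtual scene (call once per eye for a stereo stream)."""
    mask = person_mask(camera_rgb)[..., None]  # (H, W, 1) alpha channel
    return (mask * camera_rgb + (1.0 - mask) * virtual_rgb).astype(np.uint8)
```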