We present a robust deep-learning-based six-degrees-of-freedom (6-DoF) localization system for endoscopic capsule robots. The system focuses on localizing endoscopic capsule robots inside the gastrointestinal (GI) tract using only visual information captured by a monocular camera integrated into the robot. The proposed system is a 23-layer deep convolutional neural network (CNN) capable of estimating the pose of the robot in real time on a standard CPU. The evaluation dataset was recorded inside a surgical human stomach model with realistic surface texture, softness, and surface liquid properties, so that the pre-trained CNN architecture can be transferred confidently to a real endoscopic scenario. Average errors of 7.1% in translation and 3.4% in rotation were obtained. The experimental results demonstrate that a CNN pre-trained on raw 2D endoscopic images performs accurately inside the GI tract and is robust to a range of challenges: reflection distortions, lens imperfections, vignetting, noise, motion blur, low resolution, and the lack of unique landmarks to track.