Robotic grasping of unknown objects in cluttered scenes is already well established, mainly owing to advances in Deep Learning methods. A major drawback is the need for a large amount of real-world training data. Furthermore, these networks are not interpretable, in the sense that it is not clear why certain grasp attempts fail. To make the process of robotic grasping traceable and to simplify the overall model, we propose dividing the complex task of finding stable grasp points into three simpler tasks. The first task is to find all grasp points where the gripper can be lowered onto the table without colliding with the object. The second task is to determine, for the grasp points and gripper parameters from the first step, how the object moves while the gripper closes. Finally, the third task is to predict, for all grasp points from the second step, whether the object slips out of the gripper during lifting. This decomposition makes it possible to understand for each grasp point why it is stable and, just as importantly, why others are unstable or infeasible. In this study we focus on the second task, the prediction of the physical interaction between gripper and object while the gripper closes. We investigate different Convolutional Neural Network (CNN) architectures and identify the architecture(s) that best predict the physical interactions in image space. We generate the training data through experiments in the robot and physics simulator V-REP.