The field of sound event detection is a growing area of research that has mainly focused on identifying sound classes from everyday situations. In most cases, sound detection models are trained on publicly available sound databases, which to date do not include acoustic data from manufacturing environments. Within manufacturing industries, acoustic data can be exploited to evaluate the correct execution of assembly processes. As an example, this paper analyzes the correct plugging of connectors on the basis of multimodal contextual process information, namely the connectors' acoustic properties and visual information recorded as video files during connector locking processes. For the first time, optical microphones are used to acquire and analyze connector sound data in order to distinguish connector locking sounds from one another, as well as from background noise and sound events with similar acoustic properties. To this end, different types of feature representations and neural network architectures are investigated for this specific task. The results of the proposed analysis show that multimodal approaches clearly outperform unimodal neural network architectures on the task of connector locking validation, reaching maximum accuracy levels close to 85%. Since in many cases no additional validation methods are applied to detect correctly locked connectors in manufacturing industries, it is concluded that the proposed connector lock event detection framework is a significant improvement for the qualitative validation of plugging operations.