|Abstract. The detection and classification of vehicles by suitable monitoring systems is an integral part of Intelligent Transportation Systems (ITS). We report results of an ongoing research project on fine-grained vehicle classification based on images acquired from roadside and overhead video cameras. In a previous work, a dataset of 100,000 sample images in total from 36 fine-grained vehicle classes was presented. These images were acquired from roadside cameras, and the classification accuracy obtained with state-of-the-art CNNs (convolutional neural networks) was sufficient to fulfil the challenging traffic norm TLS 8+1 A1. Here, as an extension of this work, cameras in overhead perspective were used to avoid the problem of occlusion (i.e., a larger vehicle completely occluding a smaller one), which currently limits the roadside perspective to two-lane roads (with one camera per lane). To this end, the original dataset was expanded with a new set of close to 100,000 images, now taken in overhead perspective and representing the same 36 fine-grained vehicle classes. With all model settings and hyperparameters kept identical (size of training and test sets, resolution, CNN architecture, ...), a considerable drop in classification accuracy was observed in the overhead perspective relative to the roadside perspective. Analysis of the confusion matrix reveals that important details of the vehicles, which are essential for distinguishing among certain classes, are not sufficiently well represented in the CNN in the overhead perspective. These results seem to indicate that standard CNNs reach their limits for the present task of fine-grained vehicle classification and that other, part-based approaches are required to solve this problem.|
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.