Delia Velasco-Montero, Jorge Fernández-Berni, Ricardo Carmona-Galán, Ángel Rodríguez-Vázquez.

Abstract
In this paper, we analyze heterogeneous performance exhibited by some popular deep learning software frameworks for visual inference on a resource-constrained hardware platform. Benchmarking of Caffe, OpenCV, TensorFlow, and Caffe2 is performed on the same set of convolutional neural networks in terms of instantaneous throughput, power consumption, memory footprint, and CPU utilization. To understand the resulting dissimilar behavior, we thoroughly examine how the resources in the processor are differently exploited by these frameworks. We demonstrate that a strong correlation exists between hardware events occurring in the processor and inference performance. The proposed hardware-aware analysis aims to find limitations and bottlenecks emerging from the joint interaction of frameworks and networks on a particular CPU-based platform. This provides insight into introducing suitable modifications in both types of components to enhance their global performance. It also facilitates the selection of frameworks and networks among a large diversity of these components available these days for visual understanding.
Keywords
convolutional neural networks, deep learning, edge inference, embedded vision, hardware performance, software frameworks