Abstract:
IoT edge intelligence requires Convolutional Neural
Network (CNN) inference to take place on the edge devices
themselves. The ARM big.LITTLE architecture is at the heart of prevalent
commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that
enable power and performance trade-offs. To attain maximal
throughput, all cores are expected to be employed simultaneously
in inference. However, the high communication overhead incurred when
the computations of a convolution kernel are parallelized across
clusters is detrimental to throughput. We present an alternative
framework, called Pipe-it, that employs a pipelined design to split
the convolutional layers across clusters while limiting the parallelization
of each layer's kernels to its assigned cluster. We develop a
performance-prediction model that uses only the convolutional-layer
descriptors to predict the execution time of each layer
individually on all permitted core configurations (type and count).
Pipe-it then exploits these predictions to create a balanced pipeline
using an efficient design-space exploration algorithm. On average,
Pipe-it delivers 39% higher throughput than the highest previously
attainable throughput.
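
To make the pipeline-balancing idea concrete, the sketch below enumerates contiguous two-stage splits of the layers between a big and a LITTLE cluster and picks the split whose slowest stage is fastest, since a pipeline's throughput is the reciprocal of its bottleneck stage time. This is a hypothetical brute-force illustration only: the function name, the layout of the predicted-time table, and the restriction to two stages are assumptions, and Pipe-it's actual design-space exploration algorithm is more efficient and general.

    # Hypothetical sketch (not Pipe-it's actual algorithm): exhaustively
    # balance a two-stage layer pipeline across a big and a LITTLE cluster.
    # predicted[i][c] = predicted execution time of layer i on core
    # configuration c (derived, in the paper, from layer descriptors).

    def best_pipeline(predicted, big_configs, little_configs):
        """Return (split, big_cfg, little_cfg) minimizing the bottleneck stage."""
        n = len(predicted)
        best_bottleneck = float("inf")
        best_plan = None
        for split in range(1, n):  # layers [0, split) on big, [split, n) on LITTLE
            for cb in big_configs:
                for cl in little_configs:
                    t_big = sum(predicted[i][cb] for i in range(split))
                    t_little = sum(predicted[i][cl] for i in range(split, n))
                    bottleneck = max(t_big, t_little)  # throughput = 1 / bottleneck
                    if bottleneck < best_bottleneck:
                        best_bottleneck = bottleneck
                        best_plan = (split, cb, cl)
        return best_plan, best_bottleneck

    # Example with invented per-layer predictions for four layers and
    # two core configurations per cluster (counts are illustrative):
    predicted = [
        {"big4": 3.0, "big2": 5.0, "little4": 9.0, "little2": 14.0},
        {"big4": 2.0, "big2": 3.5, "little4": 6.0, "little2": 10.0},
        {"big4": 1.0, "big2": 1.8, "little4": 3.0, "little2": 5.0},
        {"big4": 0.5, "big2": 0.9, "little4": 1.5, "little2": 2.5},
    ]
    plan, bottleneck = best_pipeline(predicted, ["big4", "big2"], ["little4", "little2"])
    print(plan, bottleneck)

Restricting each stage to a single cluster, as in this sketch, is what avoids the cross-cluster communication overhead the abstract identifies: a kernel's threads only ever synchronize within their assigned cluster.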