In previous work, we provided tools for the efficient characterization of biomedical images using Legendre and Zernike moments, and showed their relevance as biomarkers for classifying image tiles from bone tissue regeneration studies (Ujaldón, 2009). In pursuit of further efficiency, we then developed methods for accelerating those computations on GPUs (Martín-Requena and Ujaldón, 2011). This new stage of our work focuses on efficient data partitioning to optimize execution on many-core processors and clusters of GPUs, attaining gains of up to three orders of magnitude on 1 Mpixel images compared with multi-core CPUs of similar age and cost. We apply a chain of successive optimizations that exploit symmetries in trigonometric functions and in the access patterns to image pixels, and combine them with massive data parallelism on GPUs to enable (1) real-time processing of our set of input biomedical images and (2) the use of high-resolution images in clinical practice.
Journal of Parallel and Distributed Computing Vol. 74, Issue 1, p. 1994-2004