eprintid: 3573
rev_number: 5
eprint_status: archive
userid: 69
dir: disk0/00/00/35/73
datestamp: 2016-10-06 14:39:06
lastmod: 2016-10-06 14:39:06
status_changed: 2016-10-06 14:39:06
type: article
metadata_visibility: show
creators_name: Das, Anup
creators_name: Merrett, Geoff V.
creators_name: Tribastone, Mirco
creators_name: Al-Hashimi, Bashir M.
creators_id:
creators_id:
creators_id: mirco.tribastone@imtlucca.it
creators_id:
title: Workload Change Point Detection for Runtime Thermal Management of Embedded Systems
ispublished: pub
subjects: QA76
divisions: CSA
full_text_status: none
keywords: Embedded systems, Hardware, Thermal management, Temperature dependence, Central Processing Unit, Voltage control, Thermal stresses
note: SCOPUS ID: 2-s2.0-84979539453; WOS Accession Number: WOS:000380061700011
abstract: Applications executed on multicore embedded systems interact with system software [such as the operating system (OS)] and hardware, leading to widely varying thermal profiles which accelerate some aging mechanisms, reducing the lifetime reliability. Effectively managing the temperature therefore requires: 1) autonomous detection of changes in application workload and 2) appropriate selection of control levers to manage thermal profiles of these workloads. In this paper, we propose a technique for workload change detection using density ratio-based statistical divergence between overlapping sliding windows of CPU performance statistics. This is integrated in a runtime approach for thermal management, which uses reinforcement learning to select workload-specific thermal control levers by sampling on-board thermal sensors. Identified control levers override the OS's native thread allocation decision and scale hardware voltage-frequency to improve average temperature, peak temperature, and thermal cycling.
The proposed approach is validated through its implementation as a hierarchical runtime manager for Linux, with heuristic-based thread affinity selected from the upper hierarchy to reduce thermal cycling and learning-based voltage-frequency selected from the lower hierarchy to reduce average and peak temperatures. Experiments conducted with mobile, embedded, and high performance applications on ARM-based embedded systems demonstrate that the proposed approach increases workload change detection accuracy by an average of 3.4×, reducing the average temperature by 4 °C-25 °C, peak temperature by 6 °C-24 °C, and thermal cycling by 7%-35% over state-of-the-art approaches.
date: 2016
date_type: published
publication: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
volume: 35
number: 8
publisher: IEEE
pagerange: 1358-1371
id_number: doi:10.1109/TCAD.2015.2504875
refereed: TRUE
issn: 0278-0070
official_url: http://doi.org/10.1109/TCAD.2015.2504875
referencetext: J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers, "The case for lifetime reliability-aware microprocessors", Proc. Int. Symp. Comput. Archit., pp. 276.
S. Sharifi, D. Krishnaswamy and T. S. Rosing, "PROMETHEUS: A proactive method for thermal management of heterogeneous MPSoCs", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 7, pp. 1110-1123, Jul. 2013.
A. Das, "Reinforcement learning-based inter- and intra-application thermal optimization for lifetime improvement of multicore systems", Proc. Design Autom. Conf., pp. 1-6.
"Failure Mechanisms and Models for Semiconductor Devices, JEDEC Standard JEP122G", 2011.
H. Amrouch, V. M. van Santen, T. Ebi, V. Wenzel and J. Henkel, "Towards interdependencies of aging mechanisms", Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), pp. 478-485.
A. Das, A. Kumar and B. Veeravalli, "Reliability and energy-aware mapping and scheduling of multimedia applications on multiprocessor systems", IEEE Trans. Parallel Distrib.
Syst., Mar. 2015.
R. Cochran and S. Reda, "Consistent runtime thermal prediction and control through workload phase detection", Proc. Design Autom. Conf., pp. 62-67.
M. Ghorbani, Y. Wang, Y. Xue, M. Pedram and P. Bogdan, "Prediction and control of bursty cloud workloads: A fractal framework", Proc. Int. Conf. Hardw. Softw. Codesign Syst. Synth.
V. N. Vapnik, Statistical Learning Theory, vol. 2, 1998, Wiley.
X. Nguyen, M. J. Wainwright and M. I. Jordan, "Estimating divergence functionals and the likelihood ratio by convex risk minimization", IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5847-5861, Nov. 2010.
S. Liu, M. Yamada, N. Collier and M. Sugiyama, "Change-point detection in time-series data by relative density-ratio estimation", Neural Netw., vol. 43, pp. 72-83, Jul. 2013.
M. J. Walker, A. K. Das, G. V. Merrett and B. M. Al-Hashimi, "Run-time power estimation for mobile and embedded asymmetric multi-core CPUs", Proc. HiPEAC Workshop Energy Efficien. Heterogeneous Comput.
S. Yang, S. Khursheed, B. M. Al-Hashimi, D. Flynn and G. V. Merrett, "Improved state integrity of flip-flops for voltage scaled retention under PVT variation", IEEE Trans. Circuits Syst. I Reg. Papers, vol. 60, no. 11, pp. 2953-2961, Nov. 2013.
C. Bienia, S. Kumar and K. Li, "PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors", Proc. IEEE Symp. Workload Characterizat., pp. 47-56.
M. R. Guthaus, "MiBench: A free commercially representative embedded benchmark suite", Proc. IEEE Workshop Workload Characterizat., pp. 3-14.
V. Pallipadi and A. Starikovskiy, "The ondemand governor", Proc. Linux Symp., vol. 2, pp. 215-230.
Y. Ge and Q. Qiu, "Dynamic thermal management for multimedia applications using machine learning", Proc. Design Autom. Conf., pp. 95-100.
R. A. Shafik, "Learning transfer-based adaptive energy minimization in embedded systems", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., Oct. 2015.
I. Ukhov, P. Eles and Z.
Peng, "Probabilistic analysis of power and temperature under process variation for electronic system design", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 6, pp. 931-944, Jun. 2014.
B. H. Meyer, A. S. Hartman and D. E. Thomas, "Cost-effective lifetime and yield optimization for NoC-based MPSoCs", ACM Trans. Design Autom. Electron. Syst., vol. 19, no. 2, 2014.
B. Shi, Y. Zhang and A. Srivastava, "Dynamic thermal management under soft thermal constraints", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 11, pp. 2045-2054, Nov. 2013.
M. A. A. Faruque, J. Jahn, T. Ebi and J. Henkel, "Runtime thermal management using software agents for multi- and many-core architectures", IEEE Design Test Comput., vol. 27, no. 6, pp. 58-68, Nov./Dec. 2010.
A. K. Coskun, T. S. Rosing and K. C. Gross, "Temperature management in multiprocessor SOCs using online learning", Proc. Design Autom. Conf., pp. 890-893.
P. Mercati, A. Bartolini, F. Paterna, T. S. Rosing and L. Benini, "A Linux-governor based dynamic reliability manager for Android mobile devices", Proc. Conf. Design Autom. Test Europe, pp. 1-4.
F. Sironi, "ThermOS: System support for dynamic thermal management of chip multi-processors", Proc. Int. Conf. Parallel Archit. Compilation Tech., pp. 41-50.
S. Pagani, "TSP: Thermal safe power: Efficient power budgeting for many-core systems in dark silicon", Proc. Int. Hardw. Softw. Codesign Syst. Synth.
H. Khdr, T. Ebi, M. Shafique, H. Amrouch and J. Henkel, "mDTM: Multi-objective dynamic thermal management for on-chip systems", Proc. Conf. Design Autom. Test Europe, pp. 1-6.
J. Cui and D. L. Maskell, "A fast high-level event-driven thermal estimator for dynamic thermal aware scheduling", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 6, pp. 904-917, Jun. 2012.
G. Dhiman and T. S. Rosing, "System-level power management using online learning", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 5, pp.
676-689, May 2009.
H. Javaid, M. Shafique, J. Henkel and S. Parameswaran, "Energy-efficient adaptive pipelined MPSoCs for multimedia applications", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 5, pp. 663-676, May 2014.
R. Ye and Q. Xu, "Learning-based power management for multicore processors via idle period manipulation", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 7, pp. 1043-1055, Jul. 2014.
U. A. Khan and B. Rinner, "Online learning of timeout policies for dynamic power management", ACM Trans. Embedded Comput. Syst., vol. 13, no. 4, 2014.
citation: Das, Anup and Merrett, Geoff V. and Tribastone, Mirco and Al-Hashimi, Bashir M. Workload Change Point Detection for Runtime Thermal Management of Embedded Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35 (8). pp. 1358-1371. ISSN 0278-0070 (2016)