Multi-year collaboration with NREL

The project is part of a three-year collaboration that introduces monitoring and predictive analytics to the power and cooling systems in NREL's data center.
HPE and NREL are using more than five years of historical data, totaling more than 16 terabytes1, collected from sensors in NREL's supercomputers, Peregrine and Eagle, and from its facility, to train models for anomaly detection that predict and prevent issues before they occur.
The collaboration will also address future water and energy consumption in data centers, which in the
Early results based on models trained with historical data have successfully predicted or identified events that previously occurred in NREL's data center, demonstrating the promise of using predictive analytics in future data centers.
The AI Ops project sprang from HPE's R&D efforts for PathForward, a program backed by the US Department of Energy (DOE) to accelerate the nation's technology roadmap for exascale computing, which represents the next major leap in supercomputing.
'We are passionate about architecting new technologies that are impactful in powering the next era of innovation with exascale computing and its extensive operational needs,' said
HPE and NREL collaborate to improve operational efficiency and resiliency in data centers for the exascale era
'Our research collaboration will span the areas of data management, data analytics, and AI/ML optimization for both manual and autonomous intervention in data center operations,' said
The project will use open source software and libraries such as TensorFlow, NumPy, and scikit-learn to develop machine learning algorithms; a minimal illustrative sketch using these libraries follows the list below. The project will focus on the following key areas:
Monitoring: Collect, process, and analyze vast volumes of IT and facility telemetry from disparate sources before applying algorithms to the data in real time
Analytics: Big data analytics and machine learning will be used to analyze data from the various tools and devices spanning the data center facility
Control: Algorithms will be applied to enable machines to resolve issues autonomously, intelligently automate repetitive tasks, and perform predictive maintenance on both the IT systems and the data center facility
Data center operations: AI Ops will evolve to become a validation tool for continuous integration (CI) and continuous deployment (CD) of core IT functions that span the modern data center facility
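To make the anomaly detection idea concrete, here is a minimal sketch in Python using NumPy and scikit-learn, two of the libraries named above. The sensor channels, values, and model choice (an isolation forest) are illustrative assumptions, not NREL's actual telemetry schema or HPE's published method.

```python
# Minimal sketch: unsupervised anomaly detection on facility telemetry.
# All channel names and values below are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for historical telemetry: rows are time samples, columns are
# sensor channels (e.g., supply water temp in C, power draw in kW, fan %).
historical = rng.normal(loc=[18.0, 450.0, 60.0],
                        scale=[0.5, 25.0, 5.0],
                        size=(10_000, 3))

# Normalize the channels, then fit an isolation forest on normal history.
scaler = StandardScaler().fit(historical)
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(scaler.transform(historical))

# Score a new batch of live readings; predict() returns -1 for anomalies.
live = np.array([[18.2, 460.0, 58.0],    # nominal reading
                 [27.5, 900.0, 95.0]])   # e.g., a cooling-fault signature
for reading, label in zip(live, model.predict(scaler.transform(live))):
    print(reading, "ANOMALY" if label == -1 else "ok")
```

An isolation forest is one common unsupervised choice here because it needs no labeled failure examples; a production system trained on years of telemetry, as described above, could equally use other models.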
HPE plans to demonstrate additional capabilities in the future by enhancing the HPE High Performance Cluster Management (HPCM) system to provide complete provisioning, management, and monitoring for clusters scaling to 100,000 nodes, at faster speeds. Other testing plans include exploring integration with HPE InfoSight, a cloud-based, AI-driven management tool that monitors, collects, and analyzes data on IT infrastructure, and is used to predict and prevent problems in order to maintain overall server health and performance.
The solution will be showcased at HPE booth 1325 at Supercomputing 2019 (SC19) in Denver, Colorado.
1 The combined sensor data from NREL's supercomputers comprises 1.6 billion data points
Editorial contact
Nahren Khizeran, HPE
Nahren.khizeran@hpe.com