Exascale Deep Learning For Climate Analytics
In this paper, variants of the Tiramisu and DeepLabv3+ neural networks are used to extract pixel-level masks of extreme weather patterns. The paper describes the improvements to the software frameworks, input pipeline, and network training algorithms that were necessary to efficiently scale deep learning on the Piz Daint and Summit systems.
The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%.
DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision.
By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak throughput of 1.13 EF/s and a sustained throughput of 999.0 PF/s.
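To make the efficiency figures concrete: parallel efficiency is conventionally the sustained aggregate throughput divided by the GPU count times an ideal per-GPU baseline. The short calculation below back-solves the implied per-GPU rates from the single-precision numbers quoted above; the baseline definition is an assumption, since this summary does not state how the paper measured it.

```python
# Back-solving per-GPU rates from the quoted figures. Assumes the usual
# definition: efficiency = sustained throughput / (n_gpus * baseline rate).
n_gpus = 27360
sustained_fp32_pf = 325.8                       # PF/s, single precision
efficiency_fp32 = 0.907

per_gpu_tf = sustained_fp32_pf * 1e3 / n_gpus   # ~11.9 TF/s sustained per GPU
baseline_tf = per_gpu_tf / efficiency_fp32      # implied ideal ~13.1 TF/s per GPU
print(f"sustained per GPU: {per_gpu_tf:.1f} TF/s; "
      f"implied baseline: {baseline_tf:.1f} TF/s")
```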
A Deeper Insight:
Climate change poses a major challenge to humanity today. Several nations are considering adaptation and mitigation strategies pertaining to global mean quantities.
Both state and local governments are interested in the question of how extreme weather events will take a toll on their local communities.
In order to address these important questions, climate scientists routinely configure and run high-fidelity simulations under a range of different climate change scenarios. Each simulation produces tens of TBs of high-fidelity output that requires automated analysis. Thus far, climate data analysts have relied entirely upon multi-variate threshold conditions for prescribing extreme weather patterns.
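For illustration, such a multi-variate threshold condition amounts to AND-ing per-pixel cutoffs over several physical fields. The sketch below is hypothetical: the variable names, cutoff values, and grid size (chosen to resemble a CAM5-like grid) are assumptions, not the criteria used in practice.

```python
import numpy as np

def threshold_mask(iwv, wind_speed, iwv_cut=20.0, wind_cut=10.0):
    """Per-pixel event mask: flag pixels where every field exceeds its cutoff.

    iwv        : integrated water vapor (kg/m^2), 2-D array (hypothetical field)
    wind_speed : near-surface wind speed (m/s), 2-D array (hypothetical field)
    """
    return (iwv > iwv_cut) & (wind_speed > wind_cut)

# Random stand-ins on a 768 x 1152 grid, similar to CAM5 output resolution.
iwv = np.random.uniform(0.0, 60.0, size=(768, 1152))
wind = np.random.uniform(0.0, 30.0, size=(768, 1152))
mask = threshold_mask(iwv, wind)
print(f"flagged pixels: {mask.sum()} / {mask.size}")
```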
What Does This Work Aim At?
Recent efforts have shown that deep learning can be successfully applied to the detection, classification, and localization of extreme weather patterns. In this paper, the aim is to push the frontier of deep learning methods in order to extract high-quality, pixel-level segmentation masks of weather patterns.
This work uses TensorFlow, which provides portability through its ability to map a computational graph onto multi-core and many-core CPUs as well as GPUs. Owing to their heavy use of linear-algebra-based primitives, most networks perform very well on GPUs.
The graph also captures the parallelism available in the computation, and TensorFlow uses a dynamic scheduler to select which operation to compute based on the availability of inputs.
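As a minimal illustration of this portability, the traced graph below can be pinned to a GPU and falls back to the CPU when none is available. The tile shape loosely mirrors a multi-variate climate input; this is an illustrative sketch, not code from the paper.

```python
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back to CPU if no GPU exists

@tf.function  # traces a graph that TensorFlow can schedule on CPUs or GPUs
def conv_block(x, w):
    # Convolution + ReLU: linear-algebra-heavy primitives that map well to GPUs.
    return tf.nn.relu(tf.nn.conv2d(x, w, strides=1, padding="SAME"))

x = tf.random.normal([1, 768, 1152, 16])   # one multi-channel climate tile (assumed shape)
w = tf.random.normal([3, 3, 16, 32])       # 3x3 kernel, 16 -> 32 channels
with tf.device("/GPU:0"):
    y = conv_block(x, w)
print(y.shape)  # (1, 768, 1152, 32)
```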
Contributions Made in the Paper:
Motivated by the problem of finding extreme weather patterns in climate data, this paper makes the following contributions:
- The adaptation of state-of-the-art Tiramisu and DeepLabv3+ architectures to solve segmentation problems on high-resolution, multi-variate scientific datasets.
- A number of system-level innovations in data staging, efficient parallel I/O, and optimized networking collectives that enable the DL application to scale to the largest GPU-based HPC systems in the world.
- A number of algorithmic innovations that enable DL networks to converge at scale.
- Demonstration of good scaling on up to 27360 GPUs, obtaining 999.0 PF/s sustained performance and a parallel efficiency of 90.7% in half precision. The peak performance obtained at that concurrency and precision was 1.13 EF/s.
- An implementation in TensorFlow and Horovod whose performance optimizations are broadly applicable to the general deep learning + HPC community (a minimal data-parallel sketch follows this list).
- While this work is conducted in the context of a specific science driver, most of the proposed innovations are applicable to generic deep learning workloads at scale.
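As referenced in the list above, a minimal data-parallel sketch with TensorFlow and Horovod follows. The toy two-layer model, shapes, and learning-rate scaling are placeholders; the paper's actual pipeline adds data staging, tuned collectives, and the full segmentation networks.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# One rank per GPU: pin this process to its local GPU, if present.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Toy stand-in for a per-pixel segmentation network (not the paper's model).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu",
                           input_shape=(64, 64, 16)),
    tf.keras.layers.Conv2D(3, 1, activation="softmax"),  # 3 classes per pixel
])

# Average gradients across all ranks each step; scaling the learning rate by
# hvd.size() is a common large-batch heuristic, not necessarily the paper's.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Broadcast rank 0's initial weights so every rank starts identically.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
x = tf.random.normal([32, 64, 64, 16])                         # synthetic tiles
y = tf.random.uniform([32, 64, 64], maxval=3, dtype=tf.int32)  # per-pixel labels
model.fit(x, y, batch_size=8, epochs=1, callbacks=callbacks, verbose=0)
```

Launched with an MPI-style runner (e.g., `mpirun -np 4 python train.py`), each rank processes its own shard and Horovod all-reduces the gradients.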
Conclusion:
The paper presents the first exascale-class deep learning application. Motivated by the important problem of segmenting extreme weather patterns, the authors successfully applied the Tiramisu and DeepLabv3+ architectures to high-resolution, multi-variate climate datasets.
The authors also developed a number of enhancements to the deep learning algorithm to obtain excellent qualitative and quantitative results. In addition, they built on a number of system-level optimizations to scale the scientific application to an unprecedented level of concurrency (4560 Summit nodes, 27360 Volta GPUs), scaling efficiency (90.7%), and performance (1.13 EF/s peak, 999.0 PF/s sustained).
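Among the algorithmic enhancements commonly used to keep such large-batch training converging is layer-wise adaptive rate control (LARC). The sketch below shows the general clipping form of that idea; the eta value and the toy usage are assumptions, and this is not the authors' exact implementation.

```python
import tensorflow as tf

def larc_gradient(param, grad, lr, eta=0.002, eps=1e-8):
    """Clipping form of layer-wise adaptive rate control (illustrative).

    Caps each layer's step so that ||lr * scaled_grad|| <= eta * ||param||,
    while leaving steps that are already small enough unchanged.
    """
    trust_ratio = eta * tf.norm(param) / (lr * tf.norm(grad) + eps)
    return tf.minimum(trust_ratio, 1.0) * grad

# Toy usage: one manual SGD step with the per-layer rescaled gradient.
w = tf.Variable(tf.random.normal([256, 256]))
g = tf.random.normal([256, 256])
w.assign_sub(0.1 * larc_gradient(w, g, lr=0.1))
```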
The presented work extends the open-source TensorFlow and Horovod tools, thereby benefiting the broader scientific and commercial deep learning communities. The environment that has been developed is already in use by other teams on Summit, and the methodologies will extend to current and future HPC platforms.
For the full innovations, implications, results, and performance details, one can go through the PDF linked below:
Link To The PDF: Click Here