Concepts

Aircraft design is a highly competitive and demanding field. Achieving a highly optimised design, which permits the manufacture of lighter, quieter, safer and better-performing aircraft with lower fuel consumption, is a complex task that requires taking into account a large number of disciplines (aerodynamics, structures, systems, vibrations, acoustics, etc.). The target of the European aircraft industry is an advanced, highly efficient and accurate design/optimisation toolset, acting as a virtual facility, that provides full information about the design status and is suitable for virtual certification. What is needed is the ability to automatically predict flow physics, aircraft forces, radiated acoustics, stresses, the evolution of the design status, and the optimal shape for any specified constraints. Moreover, such tools need to be extremely accurate, and they should be fast enough to run within realistic engineering design timescales.

These goals translate into an increasing demand for accuracy in simulation capabilities, producing an exponential growth in the required computational resources. In particular, the high complexity of some of these processes frequently implies very long computation times. For example, the analysis of a complete aircraft configuration using Reynolds-Averaged Navier-Stokes (RANS) modelling can require more than a day, even on the most modern highly parallel computational platforms. Furthermore, using CFD within a design optimization process, or increasing the target precision through the use of Large Eddy Simulation (LES) models, usually raises the computational requirements to unaffordable levels. This situation calls for an efficient implementation of new CFD codes and a proper handling of new parallel platforms.

In spite of these challenges, numerical simulation is regularly used in the design process; however, contrary to intuition, as the number of simulations increases, their cost grows beyond that of wind tunnel testing. Given the availability of a physical model and a tunnel time slot, wind tunnel tests are currently much cheaper than CFD simulations (see Fig. 1.1).

Figure 1.1: Wind tunnel vs CFD costs

Taking into account that a modern design requires approximately 10^{5} simulations, current state-of-the-art CFD methods are still not competitive for the entire design process. **If unsteady, off-design conditions, unconventional configurations or high-accuracy optimization loops** are considered, the situation is even worse. The resulting excessively long execution times are one of the main bottlenecks to the massive application of CFD simulations within the aircraft design cycle.

Therefore, **it is increasingly urgent to develop new multidisciplinary CFD simulation techniques that are faster, more power-efficient, more reliable, more robust and specifically adapted to new computing platforms**. The appeal of a fully simulation-based design process can be seen in Figure 1.2, where it is shown that simulations allow for a dramatic reduction in the design time.

Figure 1.2: Design lead time by efficient simulation

Thus, the objective of “more simulation, less testing” is becoming increasingly strategic for the aeronautical industry. Adequate progress in these disciplines will enable a **reduction of more than 50% in development costs**, allowing designers to complete the conceptual design phase without having to resort to more expensive methods. **Simulation is therefore a key tool for streamlining decision-making in aeronautical design processes**.

Although the performance of numerical simulations is still far from what we would like to achieve (i.e. complete full-aircraft simulation in only a few hours), it has increased dramatically in recent years. Not only have the computational capabilities of microprocessors improved, but the introduction of highly parallelised systems has also brought massive increases in the number of processing units; it is now common to have many tens of thousands of cores available to the user. This development raises a number of significant challenges for the parallel performance of CFD applications. Recently, new parallelization and optimization techniques have been introduced to address these challenges at several different stages of the calculation.

Parallelism is increasing not only at the system level, but also at the chip level. In fact, the current trend in consumer microprocessors is to double the core count every 18 months, so in 5 to 10 years' time devices with 100+ cores will be widely available. Such devices are no longer called multi-core processors: a new term, many-core processor, has been coined to denote devices holding at least tens of computing cores.
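As a quick sanity check of the doubling claim, the projection can be computed directly. The 8-core starting point below is an assumption chosen purely for illustration, not a figure from the text:

```python
def projected_cores(start_cores, years, doubling_period_years=1.5):
    """Core count after `years`, assuming the count doubles every 18 months."""
    return start_cores * 2 ** (years / doubling_period_years)

# Starting from a hypothetical 8-core chip:
five_years = projected_cores(8, 5)    # roughly 80 cores
ten_years = projected_cores(8, 10)    # roughly 800 cores
```

Both horizons comfortably exceed the 100+ core figure mentioned above, which is why the "many-core" term was needed.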

Many-core processors are not the only alternative for chip-scale parallel computing: stream processors provide even higher degrees of parallelism. These are devices containing hundreds of lightweight cores. Although the operations that a single core can perform are quite limited (for example, it cannot run an operating system), the computational power achieved when all cores collaborate is huge, reaching teraflops. The prime example of such devices is the GPU (Graphics Processing Unit), as video coding and graphics rendering are especially suited to parallel execution. However, in recent years it has been shown that GPUs can also accelerate other parallel problems not related to video or graphics generation, with great success in the acceleration of scientific applications, including CFD solvers. In fact, GPUs can even be integrated into the same chip as a conventional processor, as in the Berlin APU from AMD, offering further opportunities for application acceleration. The successor of TITAN at Oak Ridge National Lab (ORNL), called SUMMIT, will have nodes formed by 2 Power9 CPUs and 6 GPUs, each Power9 CPU having 24 cores. This introduces additional levels of architectural complexity, as parallelism has to be implemented across several levels and granularities already at the node level.

Many-core processors and GPUs exploit core-level parallelism, but there is an even finer grain of parallelism, gate-level parallelism. In practice, the hybrid multicore/multi-GPU node (Figure 1.3) is increasingly used as the building block of HPC systems. To sum up, the computing industry is betting on parallel heterogeneous architectures, merging conventional processors with specialized accelerators, to provide the computing power required by new applications at a reasonable power budget.

Figure 1.3: Backbone of new strategies for breakthrough simulation speed-ups

**Fine-grained activities**

- Task-based load balancing
- System recovery from minor h/w faults
- Energy-aware hardware/algorithms
- Use of accelerator directives/compilers
- Trade-off speed/accuracy/reproducibility

**Coarse-grained activities**

- Advanced solvers, grid/mesh partitioning
- Synchronous to asynchronous communication
- Quantifying uncertainty, simulation “error bars”
- Algorithm recovery from core/thread failures
- Hybrid MPI/OpenMP/PGAS programming

Nowadays, chip-level parallelism alone delivers one teraflops of performance. If we add the advances in system-level parallelism, running CFD simulations on a petaflops system is no longer science fiction. If this trend continues, we will see exaflops performance in 5 to 10 years. **The question is: are the current CFD solvers mature enough to take advantage of these technologies?** The answer is: only to a modest degree. Updating existing industrial solvers is a slow process in which confidence is a key value for aeronautical design, and gaining that confidence has historically been difficult. Nevertheless, **it is necessary to allocate the resources and effort to update the industrial solvers so that they can progress at the pace that HPC technology offers.**

## What is the current status of HPC in the aeronautical industry?

**Both many-core and GP-GPU technologies have been evaluated for aerodynamic simulations, with interesting results in the case of** GP-GPUs, which show promising speedups from 13x to 46x, although the larger figures correspond to relatively small grids. However, there seems to be a ceiling on the speedup that can be obtained with GP-GPUs, due to the fixed architecture of the device and the bottleneck in accessing the board memory. Apart from the speedup ceilings already observed, another potential problem of GP-GPUs and many-core accelerators is their power consumption, which reaches 250 Watts for some models. Although energy efficiency is good thanks to the acceleration obtained, power density might become a problem for nodes containing several accelerator boards.
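The memory-access bottleneck mentioned above is commonly quantified with the roofline model, in which attainable performance is the minimum of the chip's peak compute rate and the product of memory bandwidth and arithmetic intensity. The figures below are illustrative assumptions, not measurements of any particular accelerator:

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the roofline model:
    min(compute peak, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# Hypothetical accelerator: 4000 GFLOP/s peak, 200 GB/s board memory.
# A low-intensity CFD stencil kernel (~0.5 flop/byte) is bandwidth-bound:
low_intensity = roofline(4000, 200, 0.5)    # only 100 GFLOP/s attainable
# A compute-heavy kernel (~40 flop/byte) can reach the compute peak:
high_intensity = roofline(4000, 200, 40.0)  # 4000 GFLOP/s attainable
```

Since most sparse CFD kernels sit on the bandwidth-limited side of the roofline, raising peak FLOPs alone does little, which is exactly the speedup ceiling described above.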

From the point of view of algorithms, CFD solvers used by the aeronautical industry in Europe are mainly based on finite volume or finite element methods. Aerodynamic computations employ various implementations of turbulence models with one or two equations for the turbulent quantities, or even full Reynolds stress models. The resulting nonlinear discretized equations, which model the problem, are strongly stiff and involve millions of unknowns. These classic approaches are slowly being updated to high-order discontinuous Galerkin discretisations and advanced turbulence models (e.g. Large Eddy Simulation, LES), as considered in NextSim.

In general, solution algorithms for the Navier-Stokes equations are based on iterative schemes with explicit (Runge-Kutta) or implicit (pseudo-Newton, LU-SGS) formulas to drive the residual to zero. **Multigrid schemes** have emerged as one of the most powerful tools to accelerate convergence. Textbook multigrid convergence rates (below 0.1, i.e. only about 10 iterations to achieve the desired solution) have been demonstrated for elliptic problems. For RANS, however, even simple two-dimensional problems show reported convergence rates in the range 0.9-0.96. Multigrid techniques for solving large-scale, high-Reynolds-number viscous flows are well established and frequently used, but progress in this field, although sometimes substantial, has been sporadic: the asymptotic convergence rate for multigrid RANS was 0.967 in 1995 and reached 0.965 ten years later. For even more complex problems, with the **strong non-linear effects, shocks, unsteadiness and complex turbulence models** that typically appear in aeronautical applications, it is common to see asymptotic convergence rates of the order of 0.98-0.99. This is discouraging when compared with the theoretical textbook multigrid (TME) convergence rates. It has been shown that the highly nonlinear nature of turbulence source terms leads to their inaccurate representation on coarse grid levels, which may eventually result in stall or divergence due to the appearance of non-physical values of the turbulence quantities, leading to inefficient coarse-grid correction and to the loss of multigrid robustness. The combination of several approaches in the treatment of the turbulence equations (source-term freezing, turbulence viscosity freezing, residual damping or Galerkin projection of the fine grid), together with efficient and robust smoothers (**iterative methods such as CG and GMRES, or the traditional smoothers RK, ADI, Jacobi, GS, SSOR**), is the key to obtaining efficient performance per iteration.
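The practical impact of these asymptotic rates is easy to quantify: if the residual is multiplied by a factor rho each iteration, reducing it by d orders of magnitude takes about d·ln(10)/(-ln(rho)) iterations. A minimal sketch of that arithmetic, using the rates quoted above:

```python
import math

def iterations_for_reduction(rho, orders):
    """Iterations n such that rho**n <= 10**(-orders), i.e. the residual
    drops by `orders` orders of magnitude at asymptotic rate `rho`."""
    n = orders * math.log(10) / -math.log(rho)
    return math.ceil(n - 1e-9)  # small epsilon guards float round-off

textbook = iterations_for_reduction(0.10, 10)   # ~10 iterations
rans_1995 = iterations_for_reduction(0.967, 10)
typical = iterations_for_reduction(0.98, 10)    # over a thousand iterations
```

This is why a rate of 0.98-0.99 is discouraging: the same ten-order residual reduction that textbook multigrid achieves in about ten iterations requires on the order of a thousand iterations or more.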

Moreover, most of the research has focused on the convergence properties of multigrid/iterative algorithms, paying **little attention to parallelization-related aspects and their implementation on extreme-scale hardware architectures**, which can have a strong effect on the global speed-up of the solver.

Finally, an additional difficulty arises when dealing with **hybrid meshes**, which are needed to discretise **complex geometries**: an integrated combination of **structured, unstructured and chimera mesh discretisations** is the future of geometrical discretisation. Little experience is available on the **integration of these data structures into complex multigrid algorithms and their implementation on heterogeneous HPC platforms.**

Figure: Multigrid RANS convergence in 1995 vs in 2005

Further steps to come will have to deal with higher-order and fully mesh/order-adaptive methods. **High-order methods (HOM) and Discontinuous Galerkin (DG) methods** have started to be considered for industrial applications due to their superior performance, compared with equivalent 2^{nd}-order methods, in certain classes of aeronautical problems, e.g. aero-acoustics or high-velocity detached flows. Some efforts are currently being made in the evaluation and implementation of these methods, but additional research is needed on the application of multigrid/iterative methods, on their efficient parallelization, and on low-memory, efficient time integrators for RANS and hybrid RANS/LES approaches with high-order methods.

As a final remark, the evaluation of unsteady solutions poses additional computational difficulties. Although more than 90% of simulations in the design process are performed for steady configurations, **there is an increasing need to solve unsteady problems**. The need to improve current aircraft designs and address safety/certification constraints requires investigating the limits of the flight envelope: buffet, flutter, aeroelasticity, dynamic loads or icing conditions. These configurations are unsteady in nature, and most of them are currently handled using empirical rules and wind tunnel experiments. Moreover, the necessity of obtaining accurate information about these problems is becoming crucial and has raised concerns about how to extract information from the huge amount of **data produced by such large simulations**. To that end, **several feature-detection techniques have emerged, such as (Spectral) Proper Orthogonal Decomposition (POD) or Dynamic Mode Decomposition (DMD), and they have become regular tools in the analysis of fluid flows**. However, when the database becomes large, noisy or irregular, or when the number of scales involved is very large, as in turbulent flows, obtaining valuable information is a challenge. Some algorithms have been proposed that apply parallelization or statistical pre-processing to make the data manageable, showing good results at moderate data sizes. Besides these physics-based feature-extraction techniques, **machine learning algorithms** encompass a wider range of techniques (including non-physics-based ones) and have recently become very popular for discovering important information in large amounts of data (sometimes known as **big data**) related to fluid simulations.
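At its core, POD is a singular value decomposition of a snapshot matrix whose columns are instantaneous flow fields; the singular values rank the modes by energy. A minimal sketch on synthetic data (a hypothetical stand-in for CFD snapshots, built from three coherent modes plus noise):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic snapshot matrix: 200 "spatial points" x 50 "time snapshots",
# assembled from three space-time modes plus small measurement noise.
x = np.linspace(0.0, 2.0 * np.pi, 200)
t = np.linspace(0.0, 1.0, 50)
snapshots = (np.outer(np.sin(x), np.cos(2 * np.pi * t))
             + 0.5 * np.outer(np.sin(2 * x), np.sin(4 * np.pi * t))
             + 0.3 * np.outer(np.cos(3 * x), np.cos(6 * np.pi * t))
             + 0.01 * rng.standard_normal((200, 50)))

# POD = SVD of the snapshot matrix: columns of U are spatial modes,
# squared singular values measure the energy captured by each mode.
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.99)) + 1  # modes holding 99% of the energy

# Rank-r reconstruction: a handful of modes reproduces the field.
approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
rel_err = np.linalg.norm(snapshots - approx) / np.linalg.norm(snapshots)
```

The same snapshot matrix is the starting point for DMD, which additionally fits a linear operator mapping each snapshot to the next; for turbulent data with many active scales, the mode count needed for a given energy fraction grows quickly, which is the challenge noted above.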

The sampling information provided by these physics- and non-physics-based techniques can also be used to obtain **reduced order models (ROM)**, for example by combining the most energetic modes obtained from a proper orthogonal decomposition with a Galerkin projection, or by building more specific metamodels (genetic algorithms, kriging, etc.) combined with evolutionary algorithms. These methods show great potential to reduce the computational cost of full Navier-Stokes simulations. Most of the work has been dedicated to understanding this methodology and to establishing the requirements **necessary to generate an accurate database from which to create the ROM bases**. This step is still expensive, and its integration into the work proposed here can provide further speed-ups in the numerical simulations. All these techniques will be explored in NextSim to post-process the data generated by large simulations. Finally, ROM models can be used to construct surrogate models that help develop **flow control strategies and guide optimisation routines** to find optimal aircraft configurations.
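The surrogate-based optimisation idea can be sketched with a toy one-parameter problem. The "expensive" objective below is a hypothetical stand-in for a full CFD evaluation, and the polynomial metamodel stands in for the kriging or POD-based surrogates mentioned above; none of these names come from the text:

```python
import numpy as np

def expensive_cost(a):
    """Hypothetical drag-like objective standing in for a CFD run."""
    return (a - 0.3) ** 2 + 0.05 * np.cos(8 * a)

# Build a small "database" of expensive evaluations (design of experiments).
samples = np.linspace(0.0, 1.0, 9)
values = expensive_cost(samples)

# Fit a cheap cubic-polynomial surrogate to the database by least squares.
surrogate = np.poly1d(np.polyfit(samples, values, deg=3))

# Optimise on the surrogate: thousands of evaluations now cost almost nothing.
grid = np.linspace(0.0, 1.0, 1001)
a_opt = grid[np.argmin(surrogate(grid))]
```

The design loop then re-evaluates the expensive model only near `a_opt` and enriches the database there, which is exactly where the cost of generating an accurate database, noted above, comes from.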

**This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 956104. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Spain, France and Germany.**