Parallel Computational Fluid Dynamics '99: Towards Teraflops, Optimization and Novel Formulations
Ebook · 980 pages · 8 hours


By D. Keyes, A. Ecer, N. Satofuka, et al.


About this ebook

Contributed presentations were given by over 50 researchers representing the state of the art in parallel CFD and its architecture from Asia, Europe, and North America. Major developments at the 1999 meeting were: (1) the effective use of as many as 2048 processors in implicit CFD computations; (2) the acceptance that parallelism is now the 'easy part' of large-scale CFD compared with the difficulty of getting good per-node performance on the latest fast-clocked commodity processors with cache-based memory systems; (3) favorable prospects for Lattice-Boltzmann computations in CFD (especially for problems that Eulerian and even Lagrangian techniques do not handle well, such as two-phase flows and flows in highly multiply connected domains with many holes, but even for conventional flows already handled well by continuum-based PDE approaches); and (4) the nascent integration of optimization and very large-scale CFD. Further details of Parallel CFD'99, as well as other conferences in this series, are available at http://www.parcfd.org
LanguageEnglish
Release dateOct 18, 2000
ISBN9780080538389


    Book preview

    Parallel Computational Fluid Dynamics '99 - D. Keyes


    LIST OF PARTICIPANTS in Parallel CFD’99

    Ramesh Agarwal, Wichita State University

    Rakhim Aitbayev, University of Colorado, Boulder

    Hasan Akay, IUPUI

    Giorgio Amati, CASPUR

    W. Kyle Anderson, NASA Langley Research Center

    Rustem Aslan, Istanbul Technical University

    Mike Ashworth, CLRC

    Abdelkader Baggag, ICASE

    Satish Balay, Argonne National Laboratory

    Oktay Baysal, Old Dominion University

    Robert Biedron, NASA Langley Research Center

    George Biros, Carnegie Mellon University

    Daryl Bonhaus, NASA Langley Research Center

    Gunther Brenner, University of Erlangen

    Edoardo Bucchignani, CIRA SCPA

    Xing Cai, University of Oslo

    Doru Caraeni, Lund Institute of Technology

    Mark Carpenter, NASA Langley Research Center

    Po-Shu Chen, ICASE

    Jiadong Chen, IUPUI

    Guilhem Chevalier, CERFACS

    Stanley Chien, IUPUI

    Peter Chow, Fujitsu

    Mark Cross, University of Greenwich

    Wenlong Dai, University of Minnesota

    Eduardo D’Azevedo, Oak Ridge National Laboratory

    Anil Deane, University of Maryland

    Ayodeji Demuren, Old Dominion University

    Jean-Antoine Désideri, INRIA

    Boris Diskin, ICASE

    Florin Dobrian, Old Dominion University

    Akin Ecer, IUPUI

    David Emerson, CLRC

    Karl Engel, Daimler Chrysler

    Huiyu Feng, George Washington University

    Paul Fischer, Argonne National Laboratory

    Randy Franklin, North Carolina State University

    Martin Galle, DLR

    Marc Garbey, University of Lyon

    Alfred Geiger, HLRS

    Aytekin Gel, University of West Virginia

    Omar Ghattas, Carnegie Mellon University

    William Gropp, Argonne National Laboratory

    X. J. Gu, CLRC

    Harri Hakula, University of Chicago

    Xin He, Old Dominion University

    Paul Hovland, Argonne National Laboratory

    Weicheng Huang, UIUC

    Xiangyu Huang, College of William & Mary

    David Hysom, Old Dominion University

    Cos Ierotheou, University of Greenwich

    Eleanor Jenkins, North Carolina State University

    Claus Jenssen, SINTEF

    Andreas Kahari, Uppsala University

    Boris Kaludercic, Computational Dynamics Limited

    Dinesh Kaushik, Argonne National Laboratory

    Shinichi Kawai, NSDA

    David Keyes, Old Dominion University

    Matthew Knepley, Purdue University

    Suleyman Kocak, IUPUI

    John Kroll, Old Dominion University

    Stefan Kunze, University of Tubingen

    Chia-Chen Kuo, NCHPC

    Trond Kvamsdal, SINTEF

    Lawrence Leemis, College of William & Mary

    Wu Li, Old Dominion University

    David Lockhard, NASA Langley Research Center

    Josip Loncaric, ICASE

    Lian Peet Loo, IUPUI

    Isaac Lopez, NASA Glenn Research Center

    Li Shi Luo, ICASE

    James Martin, ICASE

    Dimitri Mavriplis, ICASE

    Peter McCorquodale, Lawrence Berkeley National Laboratory

    James McDonough, University of Kentucky

    Lois McInnes, Argonne National Laboratory

    Piyush Mehrotra, ICASE

    N. Duane Melson, NASA Langley Research Center

    Razi Nalim, IUPUI

    Eric Nielsen, NASA Langley Research Center

    Stefan Nilsson, Chalmers University of Technology

    Jacques Périaux, INRIA

    Alex Pothen, Old Dominion University

    Alex Povitsky, ICASE

    Luie Rey, IBM Austin

    Jacques Richard, Illinois State University, Chicago

    Jacqueline Rodrigues, University of Greenwich

    Kevin Roe, ICASE

    Cord Rossow, DLR

    David Rudy, NASA Langley Research Center

    Jubaraj Sahu, Army Research Laboratory

    John Salmon, California Institute of Technology

    Widodo Samyono, Old Dominion University

    Nobuyuki Satofuka, Kyoto Institute of Technology

    Punyam Satya-Narayana, Raytheon

    Erik Schnetter, University of Tübingen

    Kara Schumacher Olson, Old Dominion University

    John Shadid, Sandia National Laboratory

    Kenjiro Shimano, Musashi Institute of Technology

    Manuel Soria, Universitat Politècnica de Catalunya

    Linda Stals, ICASE

    Andreas Stathopoulos, College of William & Mary

    Azzeddine Soulaimani, Ecole Supérieure

    A. (Suga) Sugavanam, IBM Dallas

    Samuel Sundberg, Uppsala University

    Madhava Syamlal, Fluent Incorporated

    James Taft, NASA Ames Research Center

    Danesh Tafti, UIUC

    Aoyama Takashi, National Aerospace Laboratory

    Ilker Tarkan, IUPUI

    Virginia Torczon, College of William & Mary

    Damien Tromeur-Dervout, University of Lyon

    Aris Twerda, Delft University of Technology

    Ali Uzun, IUPUI

    George Vahala, College of William & Mary

    Robert Voigt, College of William & Mary

    Johnson Wang, Aerospace Corporation

    Tadashi Watanabe, JAERI

    Chip Watson, Jefferson Laboratory

    Peter Wilders, Technological University of Delft

    Kwai Wong, University of Tennessee

    Mark Woodgate, University of Glasgow

    Paul Woodward, University of Minnesota

    Yunhai Wu, Old Dominion University

    Shishen Xie, University of Houston

    Jie Zhang, Old Dominion University

    Conference Photograph

    PLENARY PAPERS

    Parallel multigrid solution and optimization in compressible flow simulation and design

    J.-A. Désidéri, L. Fournier, S. Lanteri and N. Marco,     INRIA Projet Sinus, 2004 Route des Lucioles, 06902 Sophia-Antipolis Cedex

    B. Mantel and J. Périaux,     Dassault Aviation, France

    J.F. Wang,     Nanjing Institute of Aeronautics and Astronautics, China

    This paper describes recent achievements regarding the development of parallel multi-grid (PMG) methods and parallel genetic algorithms (PGAs) in the framework of compressible flow simulation and optimization. More precisely, this work is part of a broad research objective aimed at building efficient and robust optimization strategies for complex multi-disciplinary shape-design problems in aerodynamics. Ultimately, such a parallel optimization technique should combine the following ingredients:

    I1. parallel multigrid methods for the acceleration of the underlying flow calculations;

    I2. domain decomposition algorithms for an efficient and mathematically well posed distribution of the global optimization process on a network of processors;

    I3. robust parallel optimization techniques combining non-deterministic algorithms (genetic algorithms in the present context) and efficient local optimization algorithms to accelerate the convergence to the optimal solution;

    I4. distributed shape parametrization techniques.

    In this contribution, we address mainly topics I1 and I3 in the context of optimal airfoil design.

    1 PARALLEL STRATEGIES IN GENETIC ALGORITHMS

    1.1 Introduction

    Genetic algorithms (GAs) are search algorithms based on mechanisms simulating natural selection. They rely on the analogy with Darwin's principle of survival of the fittest. John Holland, in the 1970s, introduced the idea that difficult optimization problems could be solved by such an evolutionary approach. The technique operates on a population of potential solutions represented by strings of binary digits (called chromosomes or individuals) which are submitted to several semi-stochastic operators (selection, crossover and mutation). The population evolves over the generations according to the fitness values of the individuals; when a stationary state is reached, the population has converged to an optimized solution (see [3] for an introduction to the subject). GAs differ from classical optimization procedures, such as the steepest descent or conjugate gradient methods, in many ways:

    - the entire parameter set is coded;

    - the iteration applies to an entire population of potential solutions, in contrast to classical algorithms, in which a single candidate solution is driven to optimality by successive steps;

    - the iteration is an evolution step, or new generation, conducted by semi-stochastic operators;

    - the search space is investigated (more) globally, enhancing robustness;

    - two keywords are linked to GAs: exploration and exploitation. Exploration of the search space is important at the beginning of the GA process, while exploitation is desirable when the GA process is close to the global optimum.
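As an illustration of the mechanics described above (a toy sketch, not the authors' code; the one-max fitness and all parameter values are arbitrary choices), a minimal binary-coded GA with binary tournament selection, one-point crossover and bit-flip mutation can be written as:

```python
import random

def binary_ga(fitness, n_bits=16, pop_size=30, generations=50,
              p_cross=0.8, p_mut=0.02, seed=0):
    """Minimal binary-coded GA: binary tournament selection, one-point
    crossover, bit-flip mutation; `fitness` maps a bit list to a value
    to be maximized."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        a, b = rng.sample(pop, 2)           # binary tournament
        return list(a if fitness(a) >= fitness(b) else b)

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_cross:      # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):          # bit-flip mutation
                new_pop.append([b ^ (rng.random() < p_mut) for b in child])
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Toy "one-max" fitness: maximize the number of 1-bits.
best = binary_ga(fitness=sum)
```

In a real design loop the fitness call would wrap a full flow solve, which is exactly why the parallelization strategies discussed below matter.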

    GAs were introduced into aerodynamic shape design problems about fifteen years ago (see Kuiper et al. [4], Périaux et al. in [9], Quagliarella in [10] and Obayashi in [8], who presents 3D results for a transonic flow around a wing geometry). The main concern with the use of GAs for aerodynamic design is the computational effort needed for the accurate evaluation of a design configuration, which, in a crude application of the technique, might lead to unacceptable computer time compared with more classical algorithms. In addition, hard problems need larger populations, and this translates directly into higher computational costs. It is widely accepted that GAs can be effectively parallelized and can in principle take full advantage of (massively) parallel computer architectures. This point of view is above all motivated by the fact that, within a generation (iteration) of the algorithm, the fitness values associated with the individuals of the population can be evaluated in parallel.

    In this study, we have developed a shape optimum design methodology that combines the following ingredients:

    - the underlying flow solver discretizes the Euler or full Navier-Stokes equations using a mixed finite element/finite volume formulation on triangular meshes. Time integration to steady state is achieved using a linearized Euler implicit scheme which results in the solution of a linear system for advancing the solution to the next time step;

    - a binary-coded genetic algorithm is used as the main optimization kernel. In our context, the population of individuals is represented by airfoil shapes. The shape parametrization strategy is based on Bézier curves.
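The Bézier-curve shape parametrization mentioned above can be illustrated with de Casteljau's algorithm; the control points below are hypothetical, chosen only to mimic an airfoil upper surface, and are not taken from the paper:

```python
def bezier_point(control_points, t):
    """Evaluate a 2D Bezier curve at parameter t in [0, 1] with
    de Casteljau's algorithm (repeated linear interpolation)."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical upper-surface support: abscissas fixed, ordinates are
# the design variables a GA would act on.
upper = [(0.0, 0.0), (0.1, 0.06), (0.5, 0.08), (1.0, 0.0)]
mid_point = bezier_point(upper, 0.5)
```

Fixing the abscissas and letting the GA encode only the ordinates, as in the paper's 7+7 control-point setup, keeps the chromosome short and the resulting shapes smooth.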

    1.2 Parallelization strategy

    Several possible strategies can be considered for the parallelization of the GA-based shape design optimization described above:

    - a first strategy stems from the following remark: within a given generation of a GA, the evaluation of the fitness values associated with the population of individuals defines independent processes. This makes GAs particularly well suited to massively parallel systems; we also note that a parent/child approach is a standard candidate for the implementation of this first level of parallelism, especially when the size of the population is greater than the available number of processors;

    - a second strategy consists of concentrating the parallelization effort on the process underlying a fitness value evaluation, here the flow solver. This approach finds its main motivation in the fact that, when complex field analysers are used in conjunction with a GA, the aggregate cost of fitness evaluations can represent between 80% and 90% of the total optimization time. An SPMD paradigm is particularly well suited for the implementation of this strategy;

    - the third option combines the above two approaches and clearly yields a two-level parallelization strategy, which has been considered here and will be detailed in the sequel. Our choice has been motivated by the following remarks: (1) a parallel version of the two-dimensional flow solver was available and adapted to the present study; (2) we have targeted a distributed memory SPMD implementation and we did not want the resulting optimization tool to be limited by memory capacity constraints, especially since the present study will find its sequel in its adaptation to 3D shape optimization problems based on more complex aerodynamic models (Navier-Stokes equations coupled with a turbulence model); and (3) we believe that the adopted parallelization strategy will provide a good starting point for the construction and evaluation of sub-population-based parallel genetic algorithms (PGAs).

    In our context, the parallelization strategy adopted for the flow solver combines domain partitioning techniques and a message-passing programming model [6]. The underlying mesh is assumed to be partitioned into several submeshes, each one defining a subdomain. Basically, the same base serial code is executed within every subdomain. Applying this parallelization strategy to the flow solver results in modifications occurring in the main time-stepping loop in order to take into account one or several assembly phases of the subdomain results. The coordination of subdomain calculations through information exchange at artificial boundaries is implemented using calls to functions of the MPI library.
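The subdomain pattern described above can be sketched in miniature; the example below is a serial 1D analogue in which explicit array copies stand in for the MPI exchange at artificial boundaries (the function name and the Laplace model problem are illustrative assumptions, not the paper's solver):

```python
import numpy as np

def jacobi_partitioned(u, n_sub, sweeps):
    """Serial 1D analogue of the subdomain pattern: the unknowns are
    split into n_sub blocks; every sweep, an 'assembly phase' refreshes
    each block's ghost values (standing in for the MPI exchange), then
    every block runs the same Jacobi smoother for u'' = 0 independently."""
    u = np.array(u, dtype=float)
    left_bc, right_bc = u[0], u[-1]
    blocks = np.array_split(u[1:-1], n_sub)      # interior unknowns
    for _ in range(sweeps):
        # Assembly phase: gather ghost values from neighbouring blocks.
        ghosts = []
        for i in range(n_sub):
            gl = blocks[i - 1][-1] if i > 0 else left_bc
            gr = blocks[i + 1][0] if i < n_sub - 1 else right_bc
            ghosts.append((gl, gr))
        # Each "subdomain" smooths with its own ghost cells.
        for i, b in enumerate(blocks):
            gl, gr = ghosts[i]
            ext = np.concatenate([[gl], b, [gr]])
            blocks[i] = 0.5 * (ext[:-2] + ext[2:])
    return np.concatenate([[left_bc], *blocks, [right_bc]])

# Laplace problem with boundary values 0 and 1: converges to a ramp.
u = jacobi_partitioned([0.0] * 9 + [1.0], n_sub=2, sweeps=300)
```

The point of the pattern is that each block runs identical serial code; only the ghost refresh, the MPI call in the real solver, couples the subdomains.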

    The parallelization described above aims at reducing the cost of the fitness function evaluation for a given individual. However, another level of parallelism can clearly be exploited here and is directly related to the binary tournament approach and the crossover operator. In practice, during each generation, individuals of the current population are treated pairwise; this applies to the selection, crossover, mutation and fitness function evaluation steps. Here, the main remark is that for this last step, the evaluation of the fitness functions associated with the two selected individuals defines independent operations. We have chosen to exploit this fact using the notion of process groups, which is one of the main features of the MPI environment. Two groups are defined, each of them containing the same number of processes; this number is given by the number of subdomains in the partitioned mesh. Now, each group is responsible for the evaluation of the fitness function for a given individual. We note in passing that such an approach based on process groups will also be interesting in the context of sub-population-based PGAs (see [1] for a review on the subject); this will be considered in a future work.
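The pairwise evaluation idea can be mimicked outside MPI; the sketch below uses a two-worker thread pool as a stand-in for the paper's two MPI process groups (the helper name and toy fitness are assumptions for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_pairwise(pairs, fitness):
    """Evaluate the two individuals of each selected pair concurrently,
    one per worker; the two workers play the role of the paper's two
    MPI process groups."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as groups:
        for a, b in pairs:
            fa = groups.submit(fitness, a)   # "group" 0
            fb = groups.submit(fitness, b)   # "group" 1
            results.append((fa.result(), fb.result()))
    return results

# Toy bit-string individuals scored by their number of 1-bits.
scores = evaluate_pairwise([([1, 0, 1], [1, 1, 1])], fitness=sum)
```

In the MPI setting each "worker" is itself a group of processes holding one partition of the mesh, so both levels of parallelism compose.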

    1.3 An optimum shape design case

    The method has been applied to a direct optimization problem consisting of designing the shape of an airfoil, symbolically denoted γ, to reduce the shock-induced drag, CD, while preserving the lift, CL, of the RAE2822 airfoil, immersed in an Eulerian flow at 2° incidence and a freestream Mach number of 0.73. Thus, the cost functional was given the following form:

    (1)

    The non-linear convergence tolerance has been set to 10⁻⁶. The computational mesh consists of 14747 vertices (160 vertices on the airfoil) and 29054 triangles. Here, each chromosome represents a candidate airfoil defined by a Bézier spline whose support is made of 7+7 control points at prescribed abscissas for the upper and lower surfaces. A population of 30 individuals has been considered. After 50 generations, the shape has evolved and the shock has been notably reduced; the initial and final flows (iso-Mach values) are shown in Figure 1. Additionally, initial and final values of CD and CL are given in Table 1.

    Table 1

    Drag reduction: initial and optimized values of the CD and CL coefficients

    Figure 1 Drag reduction: initial and optimized flows (steady iso-Mach lines)

    The calculations have been performed on the following systems: an SGI Origin 2000 (equipped with 64 MIPS R10000/195 MHz processors) and an experimental Pentium Pro (P6/200 MHz, running the LINUX system) cluster whose interconnection is realized through FastEthernet (100 Mbit/s) switches. The native MPI implementation has been used on the SGI Origin 2000 system, while MPICH 1.1 has been used on the Pentium Pro cluster. Performance results are given for 64-bit arithmetic computations.

    We compare timing measurements for the overall optimization using one and two process groups. Timings are given for a fixed number of generations (generally 5 optimization iterations). In Tables 2 and 3 below, Ng and Np respectively denote the number of process groups and the total number of processes (Ng = 2 and Np = 4 means 2 processes for each of the two groups), CPU is the total CPU time, Flow is the accumulated flow solver time, and Elapsed is the total elapsed time (the distinction between the CPU and the elapsed times is particularly relevant for the Pentium Pro cluster). Finally, S(Np) is the parallel speed-up ratio Elapsed(Ng = 1, Np = 5)/Elapsed(Ng, Np), the case Ng = 1, Np = 5 serving as a reference. For the multiple processes cases, the given timing measures (CPU and Flow) always correspond to the maximum value over the per-process measures.

    Table 2

    Parallel performance results on the SGI Origin 2000

    Table 3

    Parallel performance results on the Pentium Pro cluster

    For both architectures, the optimal speed-up of 2 is close to being achieved. For the Pentium Pro cluster, the communication penalty is larger, and this favors the usage of fewer processors and more groups. For the SGI Origin 2000, the situation is different: communication only involves memory access (shared memory), and parallelization remains efficient as the number of processors increases; moreover, additional gains are achieved due to the larger impact of cache memory when subdomains are small.

    1.4 High-lift multi-element airfoil optimization by GAs and multi-agent strategies

    In this section, we report on numerical experiments conducted to optimize the configuration (shape and position) of a high-lift multi-element airfoil, by both conventional GAs and more novel ones based on multi-agent strategies better suited to parallel computation. These experiments have also been described, in part, in [7].

    The increased performance requirements for high-lift systems as well as the availability of (GA-based) optimization methods tend to renew the emphasis on multi-element aerodynamics. High-lift systems, as depicted on Figure 2, consist of a leading-edge device (slat) whose effect is to delay stall angle, and a trailing-edge device (flap) to increase the lift while maintaining a high L/D ratio.

    Figure 2 Geometry of multi-element airfoil including slat, main body and flap (shape and position definition)

    The lift coefficient CL of such an airfoil is very sensitive to the flow features around each element and to its position relative to the main body; in particular, the location of the separation point can change rapidly due to the wake/boundary-layer interaction. As a result, the functional is non-convex and presents several local optima, making a robust algorithm necessary for successful optimization.

    Here, the 2D flow computation is conducted by the Dassault-Aviation code Damien, which combines an inviscid flow calculation by a panel method with a wake/boundary-layer interaction evaluation. This code, which incorporates data concerning transition criteria, separated zones, and wake/boundary-layer interaction, has been thoroughly calibrated by assessments and global validation through comparisons with ONERA wind-tunnel measurements. As a result, viscous effects can be computed, and this provides, at very low cost, a fairly accurate determination of the aerodynamic coefficients. The trade-off of this simplified numerical model is that the flow solver is non-differentiable and can only be treated as a black box in the optimization process. Evolutionary algorithms are thus a natural choice to conduct the optimization in a situation of this type.

    Figure 3 relates to a first experiment in which only the 6 parameters determining the positions relative to the main body (deflection angle, overlap and gap) of the two high-lift devices (slat and flap) have been optimized by a conventional GA, similar to that of the previous section. Provided reasonable ranges are given for each parameter, an optimum solution is successfully found by the GA, corresponding to an improved lift coefficient of 4.87.

    Figure 3 Initial (top) and final (bottom) configurations in position optimization problem

    The second experiment is the first step in the optimization of the entire configuration, consisting of the shapes of the three elements and the positions of both slat and flap. More precisely, only the inverse problem of reconstructing the pressure field is considered at present. If γS, γB and γF denote the design variables associated with the slat, main body and flap respectively, and γ = (γS, γB, γF) represents a candidate configuration, one minimizes the following functional:

    (2)  J(γ) = JS(γ) + JB(γ) + JF(γ)

    in which, for example:

    (3)  JS(γ) = ∫slat (P(γ) − Pt)² dσ

    is a positive integral extending over one element only (slat) but reflecting in the integrand the interaction of the whole set of design variables. Here, Pt is the given target pressure; similar definitions are made for JB(γ) and JF(γ).

    In order to reduce the number of design variables and enforce smoothness of the geometry, shapes are represented by Bézier curves.

    This inverse problem has been solved successfully by the GA first by a global algorithm in which the chromosomes contain the coded information associated with all design variables indiscriminately. The convergence history of this experiment is indicated on Figure 4 for the first 200 generations.

    Figure 4 Optimization of shape and position design variables by various strategies all involving the same number of cost functional evaluations

    An alternative to this approach is provided by optimization algorithms based on (pseudo) Nash strategies in which the design variables are a priori partitioned into appropriate subsets. The population of a given subset evolves independently at each new generation according to its own GA, with the remaining design variables being held fixed and equal to the best elements in their respective populations found at the previous generation. Evidently, in such an algorithm, the computational task of the different GAs, or players to use a term of game theory, can be performed in parallel [11].

    Many different groupings of the design variables can be considered, two of which are illustrated on Figure 4: a 3-player strategy (slat shape and position; body shape; flap shape and position) and a 5-player strategy (slat, body and flap shapes, slat and flap positions). The two algorithms achieve about the same global optimum, but the parameters of the GAs (population size) have been adjusted so that the endpoints of the two convergence paths correspond to the same number of functional evaluations as 100 generations of the global algorithm. Thus, this experiment simulates a comparison of algorithms at equal serial cost. This demonstrates the effectiveness of the multi-agent approach, which achieves the global optimum by computations that could evidently be performed in parallel.
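A pseudo-Nash partitioned search of the kind discussed above can be sketched on a toy problem; the random-local-search "players" below are a deliberate simplification of the per-player GAs, and all names and parameter values are illustrative assumptions:

```python
import random

def nash_optimize(cost_parts, x0, generations=200, sigma=0.1, seed=1):
    """Pseudo-Nash sketch: one 'player' per design variable. At each
    generation every player proposes a random local move of its own
    variable while the others stay frozen at the previous generation's
    best values, and keeps the move only if the total cost drops."""
    rng = random.Random(seed)
    best = list(x0)
    for _ in range(generations):
        frozen = list(best)                  # shared by all players
        for p in range(len(best)):
            trial = list(frozen)
            trial[p] = frozen[p] + rng.gauss(0.0, sigma)
            if sum(cost_parts(trial)) < sum(cost_parts(frozen)):
                best[p] = trial[p]
    return best

# Inverse-problem-like cost: each positive component is driven to 0,
# so the Nash equilibrium coincides with the global optimum (1, -2).
parts = lambda x: ((x[0] - 1.0) ** 2, (x[1] + 2.0) ** 2)
best = nash_optimize(parts, x0=[0.0, 0.0])
```

Because every player's inner search depends only on the previous generation's frozen values, the per-player loops are independent within a generation and could run in parallel, which is the point of the game-theoretic formulation.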

    We terminate this section by two remarks concerning Nash strategies.

    First, in a preliminary experiment related to the slat/flap position inverse problem, a 2-player game in which the main body (γB) is fixed, an attempt was made to let the population of γS evolve against the best γF found at the previous generation, and symmetrically for γF. Figure 5 indicates that in such a case, the algorithm fails to achieve the desired global optimum.

    Figure 5 Effect of the fitness definition on the convergence of slat/flap position parameters

    Second, observe that in the case of a general cost function, a Nash equilibrium, in which a minimum is found with respect to each subgroup of variables while the other variables are held fixed, does not necessarily realize a global minimum. For example, in the trivial case of a function f(x, y) of two real variables, a standard situation in which the partial functions:

    (4)  x ↦ f(x, y*)  and  y ↦ f(x*, y)

    achieve local minima at x* and y* respectively is typically realized when:

    (5)  ∂f/∂x (x*, y*) = 0  and  ∂f/∂y (x*, y*) = 0

    and this does not imply that the Hessian matrix is positive definite. However, in the case of an inverse problem in which each individual positive component of the cost function is driven to 0, the global optimum is indeed achieved.

    2 PARALLEL MULTIGRID ACCELERATION

    2.1 Introduction

    Clearly, reducing the time spent in flow calculations is crucial to make GAs a viable alternative to other optimization techniques. One possible strategy to achieve this goal consists of using a multigrid method to accelerate the solution of the linear systems resulting from the linearized implicit time integration scheme. As a first step, we have developed parallel linear multigrid algorithms for the acceleration of compressible steady flow calculations, independently of the optimization framework. This is justified by the fact that the flow solver is mainly used as a black box by the GA. The starting point is an existing flow solver based on the averaged compressible Navier-Stokes equations coupled with a k–ε turbulence model [2]. The spatial discretization combines finite element and finite volume concepts and is designed for unstructured triangular meshes.

    Steady state solutions of the resulting semi-discrete equations are obtained by using an Euler implicit time advancing strategy which has the following features: linearization (approximate linearization of the convective fluxes and exact differentiation of the viscous terms); preconditioning (the Jacobian matrix is based on a first-order Godunov scheme); and local time stepping and CFL law (a local time step is computed on each control volume). Each pseudo time step requires the solution of two sparse linear systems (respectively, for the mean flow variables and for the variables associated with the turbulence model).
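The local time stepping rule mentioned above can be stated in one line; the sketch below assumes a 1D CFL-type formula dt_i = CFL · h_i / λ_i with λ_i = |u_i| + c_i, and its function name and arguments are hypothetical, not the solver's API:

```python
def local_time_steps(cell_sizes, max_wave_speeds, cfl):
    """Local time stepping sketch: each control volume i takes its own
    dt_i = CFL * h_i / lambda_i, with h_i a local length scale and
    lambda_i the largest local wave speed |u_i| + c_i; big cells and
    slow waves get big steps."""
    return [cfl * h / lam for h, lam in zip(cell_sizes, max_wave_speeds)]

# Two cells: the second is twice as large and sees half the wave speed,
# so its local step may be four times larger.
dts = local_time_steps([0.1, 0.2], [340.0, 170.0], cfl=0.8)
```

For steady-state computations, letting each control volume march at its own largest stable step accelerates convergence without affecting the converged solution.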

    The multigrid strategy is adopted to gain efficiency in the solution of the two subsystems. In the present method, the coarse-grid approximation is based on the construction of macro-elements, more specifically macro control volumes obtained by volume agglomeration. Starting from the finest mesh, a greedy coarsening algorithm is applied to automatically generate the coarse discretizations (see Lallemand et al. [5]).
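A greedy volume-agglomeration coarsening of the kind cited above can be sketched as follows (an illustrative toy on a cell-adjacency dict, not Lallemand et al.'s algorithm verbatim):

```python
def greedy_agglomerate(adjacency):
    """Greedy volume agglomeration sketch: sweep the fine control
    volumes in order; each unassigned volume seeds a macro control
    volume and absorbs its still-unassigned neighbours."""
    macro_of = {}
    macros = []
    for cell in adjacency:
        if cell in macro_of:
            continue
        group = [cell]
        macro_of[cell] = len(macros)
        for nb in adjacency[cell]:
            if nb not in macro_of:
                macro_of[nb] = len(macros)
                group.append(nb)
        macros.append(group)
    return macros

# 1D chain of six cells 0-1-2-3-4-5: pairs agglomerate left to right.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
coarse = greedy_agglomerate(adj)
```

Applying the same sweep to the macro volumes yields the next coarser level, so the whole grid hierarchy is generated automatically from the finest mesh.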

    Parallelism is introduced in the overall flow solver by using a strategy that combines mesh partitioning techniques and a message-passing programming model. The MPI environment is used for the implementation of the required communication steps. Both the discrete flux calculation and the linear system solution are performed on a submesh basis; in particular, for the basic linear multigrid algorithm, which is multiplicative (i.e. the different levels are treated in sequence with inter-dependencies between the partial results produced on the different levels), this can be viewed as an intra-level parallelization which concentrates on the smoothing steps performed on each member of the grid hierarchy. A necessary and important step in this adaptation was the construction of appropriate data structures for the distribution of coarse-grid calculations. Here, this has been achieved by developing a parallel variant of the original greedy-type coarsening algorithm, which now includes additional communication steps for a coherent construction of the communication data structures on the partitioned coarse grids.
