Report of the U.S. GLOBEC Georges Bank Science Meeting

18-20 November 2003, Rhode Island

 




Parallelization of the FVCOM Code: Application to the Gulf of Maine/Georges Bank

G. Cowles1, C. Chen1, H. Liu1
1B. Rothschild School for Marine Science and Technology, University of Massachusetts-Dartmouth, New Bedford, MA 02744

The parallelization methodology for the FVCOM code is described. The implementation follows a Single Program Multiple Data (SPMD) approach and uses a message-passing model for the necessary interprocessor communication and synchronization. The resulting parallelization is efficient, transparent to the user, and portable to a variety of parallel architectures. The physical domain is decomposed into subdomains using the METIS graph partitioning libraries, and each subdomain is assigned to a processor for integration of the model equations. To compute the flux at the subdomain (interprocessor) boundaries, flow data must be exchanged among processors. The exchange subroutines use non-blocking sends and receives from the MPI (Message Passing Interface) 2.0 library. Other parallel constructs, such as gathers and broadcasts, also rely on the MPI library. In addition to the parallelization, several major modifications have been made to the FVCOM core code: migration to Fortran 95/2K, use of allocatable memory, conversion of I/O to NetCDF format, and streamlining of the flux subroutines. The parallel FVCOM code was tested primarily on the IBM Regatta multiprocessor machine at ARSC in Fairbanks, Alaska. The test case was a prognostic simulation of the Gulf of Maine/Georges Bank (GOM/GB) region using 25,000 elements and 31 sigma layers. The efficiency of a parallel implementation can be measured in terms of its speedup and/or scalability on a multiprocessor computer. For the GOM/GB simulation, the code maintains 13X throughput on 16 processors, indicating the efficiency of the current parallel implementation. Wall-clock time for a one-month integration of GOM/GB has been reduced from 63 hours (desktop, serial) to 3 hours (16 processors, parallel). The parallel implementation extends the range of the FVCOM code to applications with increased spatial resolution or longer integration periods.
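The performance figures quoted in the abstract can be put into the usual speedup and efficiency terms. The sketch below is illustrative only and is not part of FVCOM. It assumes that "13X throughput on 16 processors" means a speedup of 13 relative to a single processor of the same machine, and the function names are hypothetical. The 63-hour and 3-hour wall-clock times come from different hardware (desktop versus multiprocessor), so their ratio is reported separately and should not be read as a same-machine speedup.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_serial / T_parallel (both times in the same units)."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive")
    return serial_time / parallel_time


def parallel_efficiency(speedup_value, n_procs):
    """Efficiency E = S / p; 1.0 would be ideal linear scaling."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup_value / n_procs


# Figures quoted in the abstract.
# Assumption: "13X throughput" is a speedup measured on the same machine.
reported_speedup = 13.0
n_procs = 16
efficiency = parallel_efficiency(reported_speedup, n_procs)

# Wall-clock ratio: 63 h (desktop, serial) vs 3 h (16 processors, parallel).
# The two runs used different hardware, so this ratio is not a same-machine
# speedup and can exceed the processor count.
wall_clock_ratio = speedup(63.0, 3.0)

print(f"parallel efficiency: {efficiency:.2%}")
print(f"wall-clock ratio:    {wall_clock_ratio:.0f}x")
```

Under that assumption the parallel efficiency is 13/16, or about 81%, while the one-month integration runs 21 times faster in wall-clock terms than the serial desktop run.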

A PDF version and an HTML version of this presentation are available online.