Runtime issues with ifort and gfortran/OpenCoarrays. #38

Description

@rouson

@gutmann @scrasmussen

Could you remind me whether coarray_icar works at all with gfortran versions > 6.x? When I build the develop branch of this fork as follows with gfortran 8.2.0 and a recent OpenCoarrays commit, I get the following (a sketch of the kind of conformability check that trips appears after the log):

$ cd src/tests
$ export COMPILER=gnu
$ make USE_ASSERTIONS=.true.
$ cafrun -n 4 ./test-ideal
Number of images =            4
          1 domain%initialize_from_file('input-parameters.txt')
ximgs=           2 yimgs=           2
call master_initialize(this)
call this%variable%initialize(this%get_grid_dimensions(),variable_test_val)
 Layer height       Pressure        Temperature      Water Vapor
     [m]              [hPa]             [K]            [kg/kg]
  9750.00000       271.047180       206.509430       9.17085254E-06
  7750.00000       364.236786       224.725372       7.91714992E-05
  5750.00000       481.825287       243.449936       5.01311326E-04
  3750.00000       628.424316       262.669800       2.46796501E-03
  1750.00000       809.217651       282.372711       9.08217765E-03
ThompMP: read qr_acr_qg.dat instead of computing
qr_acr_qg initialized:  0.229000002            
ThompMP: read qr_acr_qs.dat instead of computing
qr_acr_qs initialized:  0.170000002            
ThompMP: read freezeH2O.dat instead of computing
freezeH2O initialized:   1.02300000            
qi_aut_qs initialized:   1.79999992E-02        

Beginning simulation...
Assertion "put_north: conformable halo_south_in and local " failed on image            1
ERROR STOP 
Assertion "put_south: conformable halo_north_in and local " failed on image            4
ERROR STOP 
Assertion "put_south: conformable halo_north_in and local " failed on image            3
ERROR STOP 
Assertion "put_north: conformable halo_south_in and local " failed on image            2
ERROR STOP 

[proxy:0:0@Sourcery-Institute-VM] HYDU_sock_write (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/utils/sock/sock.c:294): write error (Broken pipe)
[proxy:0:0@Sourcery-Institute-VM] HYD_pmcd_pmip_control_cmd_cb (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmip_cb.c:932): unable to write to downstream stdin
[proxy:0:0@Sourcery-Institute-VM] HYDT_dmxu_poll_wait_for_event (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@Sourcery-Institute-VM] main (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmip.c:202): demux engine error waiting for event
[mpiexec@Sourcery-Institute-VM] control_cb (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:208): assert (!closed) failed
[mpiexec@Sourcery-Institute-VM] HYDT_dmxu_poll_wait_for_event (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@Sourcery-Institute-VM] HYD_pmci_wait_for_completion (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@Sourcery-Institute-VM] main (/home/sourcerer/Desktop/opencoarrays/prerequisites/downloads/mpich-3.2.1/src/pm/hydra/ui/mpich/mpiexec.c:340): process manager error waiting for completion
Error: Command:
  `/opt/mpich/3.2.1/gnu/8.2.0/bin/mpiexec -n 4 --disable-auto-cleanup ./test-ideal`
failed to run.
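
For reference, the failing checks appear to be the halo-exchange conformability assertions that USE_ASSERTIONS=.true. compiles in. Below is a minimal, self-contained sketch of that kind of check; the assert helper, the array names, and the slice bounds are illustrative stand-ins rather than the actual coarray_icar code. Build with caf (or gfortran -fcoarray=single) so that this_image() is available.

program halo_conformability_sketch
  ! Illustrative only: mimics an assertion of the form
  !   "put_north: conformable halo_south_in and local"
  implicit none
  integer, parameter :: nx = 10, nz = 5, ny = 8, halo_size = 1
  real, allocatable :: halo_south_in(:,:,:), local(:,:,:)

  allocate(local(nx, nz, ny))
  allocate(halo_south_in(nx, nz, halo_size))  ! must conform with the slice of local being exchanged

  ! The guard that fails in the log above checks that the halo buffer and the
  ! corresponding slice of the local array have the same shape before the put.
  call assert(all(shape(halo_south_in) == shape(local(:, :, 1:halo_size))), &
              "put_north: conformable halo_south_in and local")

contains

  subroutine assert(assertion, description)
    ! Stand-in for the project's assertion utility: report and stop on failure.
    logical, intent(in) :: assertion
    character(len=*), intent(in) :: description
    if (.not. assertion) then
      print *, 'Assertion "', description, '" failed on image', this_image()
      error stop
    end if
  end subroutine assert

end program halo_conformability_sketch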

I've also attempted to build with the Intel 18 and 19 compilers on pegasus.nic.uoregon.edu and got the following runtime messages, after which execution hangs (a note on the missing cafconfig.txt follows the usage output):

$ mpiexec -np 1 ./test-ideal
[mpiexec@pegasus] HYDU_parse_hostfile (../../utils/args/args.c:553): unable to open host file: ./cafconfig.txt
[mpiexec@pegasus] config_tune_fn (../../ui/mpich/utils.c:2192): error parsing config file
[mpiexec@pegasus] match_arg (../../utils/args/args.c:243): match handler returned error
[mpiexec@pegasus] HYDU_parse_array_single (../../utils/args/args.c:294): argument matching returned error
[mpiexec@pegasus] HYD_uii_mpx_get_parameters (../../ui/mpich/utils.c:4999): error parsing input array

Usage: ./mpiexec [global opts] [exec1 local opts] : [exec2 local opts] : ...

Global options (passed to all executables):

  Global environment options:
    -genv {name} {value}             environment variable name and value
    -genvlist {env1,env2,...}        environment variable list to pass
    -genvnone                        do not pass any environment variables
    -genvall                         pass all environment variables not managed
                                          by the launcher (default)

  Other global options:
    -f {name} | -hostfile {name}     file containing the host names
    -hosts {host list}               comma separated host list
    -configfile {name}               config file containing MPMD launch options
    -machine {name} | -machinefile {name}
                                     file mapping procs to machines
    -pmi-connect {nocache|lazy-cache|cache}
                                     set the PMI connections mode to use
    -pmi-aggregate                   aggregate PMI messages
    -pmi-noaggregate                 do not  aggregate PMI messages
    -trace {<libraryname>}           trace the application using <libraryname>
                                     profiling library; default is libVT.so
    -trace-imbalance {<libraryname>} trace the application using <libraryname>
                                     imbalance profiling library; default is libVTim.so
    -check-mpi {<libraryname>}       check the application using <libraryname>
                                     checking library; default is libVTmc.so
    -ilp64                           Preload ilp64 wrapper library for support default size of
                                     integer 8 bytes
    -mps                             start statistics gathering for MPI Performance Snapshot (MPS)
    -aps                             start statistics gathering for Application Performance Snapshot (APS)
    -trace-pt2pt                     collect information about
                                     Point to Point operations
    -trace-collectives               collect information about
                                     Collective operations
    -tune [<confname>]               apply the tuned data produced by
                                     the MPI Tuner utility
    -use-app-topology <statfile>     perform optimized rank placement based statistics
                                     and cluster topology
    -noconf                          do not use any mpiexec's configuration files
    -branch-count {leaves_num}       set the number of children in tree
    -gwdir {dirname}                 working directory to use
    -gpath {dirname}                 path to executable to use
    -gumask {umask}                  mask to perform umask
    -tmpdir {tmpdir}                 temporary directory for cleanup input file
    -cleanup                         create input file for clean up
    -gtool {options}                 apply a tool over the mpi application
    -gtoolfile {file}                apply a tool over the mpi application. Parameters specified in the file


Local options (passed to individual executables):

  Local environment options:
    -env {name} {value}              environment variable name and value
    -envlist {env1,env2,...}         environment variable list to pass
    -envnone                         do not pass any environment variables
    -envall                          pass all environment variables (default)

  Other local options:
    -host {hostname}                 host on which processes are to be run
    -hostos {OS name}                operating system on particular host
    -wdir {dirname}                  working directory to use
    -path {dirname}                  path to executable to use
    -umask {umask}                   mask to perform umask
    -n/-np {value}                   number of processes
    {exec_name} {args}               executable name and arguments


Hydra specific options (treated as global):

  Bootstrap options:
    -bootstrap                       bootstrap server to use
     (ssh rsh pdsh fork slurm srun ll llspawn.stdio lsf blaunch sge qrsh persist service pbsdsh)
    -bootstrap-exec                  executable to use to bootstrap processes
    -bootstrap-exec-args             additional options to pass to bootstrap server
    -prefork                         use pre-fork processes startup method
    -enable-x/-disable-x             enable or disable X forwarding

  Resource management kernel options:
    -rmk                             resource management kernel to use (user slurm srun ll llspawn.stdio lsf blaunch sge qrsh pbs cobalt)

  Processor topology options:
    -binding                         process-to-core binding mode
  Extended fabric control options:
    -rdma                            select RDMA-capable network fabric (dapl). Fallback list is ofa,tcp,tmi,ofi
    -RDMA                            select RDMA-capable network fabric (dapl). Fallback is ofa
    -dapl                            select DAPL-capable network fabric. Fallback list is tcp,tmi,ofa,ofi
    -DAPL                            select DAPL-capable network fabric. No fallback fabric is used
    -ib                              select OFA-capable network fabric. Fallback list is dapl,tcp,tmi,ofi
    -IB                              select OFA-capable network fabric. No fallback fabric is used
    -tmi                             select TMI-capable network fabric. Fallback list is dapl,tcp,ofa,ofi
    -TMI                             select TMI-capable network fabric. No fallback fabric is used
    -mx                              select Myrinet MX* network fabric. Fallback list is dapl,tcp,ofa,ofi
    -MX                              select Myrinet MX* network fabric. No fallback fabric is used
    -psm                             select PSM-capable network fabric. Fallback list is dapl,tcp,ofa,ofi
    -PSM                             select PSM-capable network fabric. No fallback fabric is used
    -psm2                            select Intel* Omni-Path Fabric. Fallback list is dapl,tcp,ofa,ofi
    -PSM2                            select Intel* Omni-Path Fabric. No fallback fabric is used
    -ofi                             select OFI-capable network fabric. Fallback list is tmi,dapl,tcp,ofa
    -OFI                             select OFI-capable network fabric. No fallback fabric is used

  Checkpoint/Restart options:
    -ckpoint {on|off}                enable/disable checkpoints for this run
    -ckpoint-interval                checkpoint interval
    -ckpoint-prefix                  destination for checkpoint files (stable storage, typically a cluster-wide file system)
    -ckpoint-tmp-prefix              temporary/fast/local storage to speed up checkpoints
    -ckpoint-preserve                number of checkpoints to keep (default: 1, i.e. keep only last checkpoint)
    -ckpointlib                      checkpointing library (blcr)
    -ckpoint-logfile                 checkpoint activity/status log file (appended)
    -restart                         restart previously checkpointed application
    -ckpoint-num                     checkpoint number to restart

  Demux engine options:
    -demux                           demux engine (poll select)

  Debugger support options:
    -tv                              run processes under TotalView
    -tva {pid}                       attach existing mpiexec process to TotalView
    -gdb                             run processes under GDB
    -gdba {pid}                      attach existing mpiexec process to GDB
    -gdb-ia                          run processes under Intel IA specific GDB

  Other Hydra options:
    -v | -verbose                    verbose mode
    -V | -version                    show the version
    -info                            build information
    -print-rank-map                  print rank mapping
    -print-all-exitcodes             print exit codes of all processes
    -iface                           network interface to use
    -help                            show this message
    -perhost <n>                     place consecutive <n> processes on each host
    -ppn <n>                         stand for "process per node"; an alias to -perhost <n>
    -grr <n>                         stand for "group round robin"; an alias to -perhost <n>
    -rr                              involve "round robin" startup scheme
    -s <spec>                        redirect stdin to all or 1,2 or 2-4,6 MPI processes (0 by default)
    -ordered-output                  avoid data output intermingling
    -profile                         turn on internal profiling
    -l | -prepend-rank               prepend rank to output
    -prepend-pattern                 prepend pattern to output
    -outfile-pattern                 direct stdout to file
    -errfile-pattern                 direct stderr to file
    -localhost                       local hostname for the launching node
    -nolocal                         avoid running the application processes on the node where mpiexec.hydra started

Intel(R) MPI Library for Linux* OS, Version 2018 Update 3 Build 20180411 (id: 18329)
Copyright 2003-2018 Intel Corporation.
^C[mpiexec@pegasus] Sending Ctrl-C to processes as requested
[mpiexec@pegasus] Press Ctrl-C again to force abort
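
A note on the Intel runs: the "unable to open host file: ./cafconfig.txt" message suggests the executable was built for distributed-memory coarrays (ifort -coarray=distributed), in which case the runtime hands mpiexec an options file, named at compile time via -coarray-config-file or at run time via the FOR_COARRAY_CONFIG_FILE environment variable. A guess at a minimal ./cafconfig.txt for a 4-image run follows; the exact options depend on the cluster and the Intel MPI setup:

-genvall -n 4 ./test-ideal

With such a file in place the coarray binary is typically launched directly (./test-ideal) rather than under an explicit mpiexec -np 1.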
