Enabling MPI for multi-node

I am attempting to run on multiple nodes in my organization's high-performance computing cluster. I know this requires enabling MPI. I loaded Open MPI version 4.1.4 (ompi/4.1.4) and then ran:

make mpi=1

This seems to stall: it never progresses or errors out, no matter how long I wait. Is it possible this is an issue with the Open MPI version I am using, or with my approach? The output is shown in the attached image.

Note: my organization's HPC only has ompi/4.1.0 - 6 available. I have tried some of the other versions, with no different results.

image

Hello @Ethan_Evans

I seem to recall hearing of a similar issue in the past, but I can’t find a report about it, so I’m afraid we’ll have to do some debugging. The first thing to check is whether make show at least works and, if so, what its output is. If make show works, the next thing to try is make echo=1 plain=1 -j1, to see where the build stalls.

Thanks for the help @giuseppe.bilotta

make show does appear to work and I attached the output below.

Running make echo=1 plain=1 -j1 then gives the following output, at which point it stalls:

Appreciate the help and let me know if I can try anything else!

P.S. I know @AlirezaZarei will be interested in the results as well

OK, one issue I’m seeing is that CXX is being set to the MPI compiler. I suspect this is the reason for the stall. Did you set CXX manually, or was it autodetected that way?

It must be autodetected that way. I have done nothing to set CXX. Should I set it to something else?

You can override the autodetection by adding something like CXX=g++ to Makefile.local, and see if this solves the problem.
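For anyone who would rather script the override than edit the file by hand, here is a minimal sketch. The helper name set_cxx_override is made up for illustration, and it assumes Makefile.local sits in the directory you run make from. It only appends a CXX= line when none is present, so it will not clobber an existing setting.

```shell
# Hypothetical helper (not part of the project's build system):
# append "CXX=<compiler>" to a Makefile.local-style file unless CXX is already set there.
set_cxx_override() {
    file=$1
    compiler=$2
    touch "$file"
    # Matches CXX=, CXX :=, CXX ?= at the start of a line, but not CXXFLAGS=
    if grep -q '^CXX[[:space:]]*[:?]*=' "$file"; then
        echo "CXX is already set in $file; edit it by hand" >&2
        return 1
    fi
    # Make sure the file ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        echo >> "$file"
    fi
    printf 'CXX=%s\n' "$compiler" >> "$file"
}

# Example (run from the source directory):
#   set_cxx_override Makefile.local g++
```

Other lines in the file, such as CXXFLAGS, are left untouched.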

We should try to understand why it’s being detected this way too. What does command -v c++ report on that system?
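A few related checks can be gathered in one place. This is only a sketch: the function name describe_cxx is invented here, and the last check assumes GNU make, whose -p option prints its variable database. It reports whether CXX is exported in the environment, what c++ and g++ resolve to on PATH, and what make itself would use for CXX when no makefile is involved.

```shell
# Hypothetical helper: report how the C++ compiler is resolved in the current shell.
describe_cxx() {
    # A CXX exported in the environment (for example by a module) replaces make's built-in default
    echo "CXX from environment: ${CXX:-<unset>}"
    # What the generic compiler names resolve to on PATH
    echo "c++ resolves to: $(command -v c++ || echo '<not found>')"
    echo "g++ resolves to: $(command -v g++ || echo '<not found>')"
    # Assumes GNU make: dump the variable database with an empty makefile
    if command -v make >/dev/null 2>&1; then
        echo "make sees: $(make -p -f /dev/null 2>/dev/null | grep '^CXX =' || echo '<no CXX line>')"
    fi
}

describe_cxx
```

Running it once before and once after loading the MPI module should show whether the module is what changes the result.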

Here is what command -v c++ gives:

image

If it is helpful, this is what I currently have pertaining to CXX in Makefile.local:

CXXFLAGS=-std=c++14 -march=native
CPPFLAGS += -g

Also: adding CXX=g++ to Makefile.local was successful. It now compiles successfully!

Interesting. So something is setting the default value of CXX for make to the MPI compiler, probably when the mpi module is loaded.
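One way to test that hypothesis is to snapshot the environment before and after loading the module and compare the two. The sketch below assumes an interactive shell where the module command is available. The helper name env_changes and the snapshot file names are made up, while the module name is the one from the first post.

```shell
# Hypothetical helper: print the lines that appear only in the second snapshot,
# i.e. variables that are new or whose value changed.
# Both files must be sorted (as produced by `env | sort`).
env_changes() {
    # comm -13 suppresses lines unique to file 1 and lines common to both
    comm -13 "$1" "$2"
}

# Example session:
#   env | sort > before.txt
#   module load ompi/4.1.4
#   env | sort > after.txt
#   env_changes before.txt after.txt | grep -E '^(CXX|CC|PATH)='
```

If a CXX= line shows up in the output, the module is exporting it. On systems using Environment Modules or Lmod, module show ompi/4.1.4 should also list what the module sets.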