Problem description: On EulerOS release 2.0 (SP8), running the following command directly on the command line works normally:
mpirun --allow-run-as-root -mca pml ucx -mca btl ^vader,tcp,openib,uct -x UCX_TLS=self,sm --bind-to core -np 128 lmp_omp_daily1120_order1_gnuld_script_offset0_text -in equ.in_omp
To automate this, I wanted to put the command above into a shell script, for example run-default.sh. Running sh run-default.sh then fails with the following error:
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: A191240619
Framework: pml
Component: ucx
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

mca_pml_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[A191240619:41208] *** An error occurred in MPI_Init
[A191240619:41208] *** reported by process [3357081601,281470681743360]
[A191240619:41208] *** on a NULL communicator
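
Since the same command works interactively but fails under sh run-default.sh, one thing that may be worth checking is whether the script sees the same environment as the interactive shell. This is only an assumption about the cause; nothing in the output above confirms it. A shell started with sh does not read interactive startup files such as ~/.bashrc, and variables that were set but not exported are not inherited. The sketch below uses a hypothetical helper, report_env (not part of Open MPI or of the original script), to print the settings that most often differ:

```shell
#!/bin/sh
# Hypothetical diagnostic helper; not from the original post.
# Prints environment settings that commonly differ between an
# interactive shell and a script started with `sh run-default.sh`.
report_env() {
    echo "PATH=${PATH}"
    # Falls back to a marker when the variable is unset or empty.
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
    # command -v is POSIX; it prints nothing if mpirun is not on PATH.
    echo "mpirun=$(command -v mpirun || echo '<not found>')"
}

report_env
```

Running this once in the interactive terminal and once from the top of run-default.sh, then comparing the two outputs, would show whether the script resolves a different mpirun or is missing a library path that the UCX component needs.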