Problems and Solutions When Setting Up MPI Parallel Computing
1. [root@localhost ~]# mpdtrace
configuration file /etc/mpd.conf is accessible by others
change permissions to allow read and write access only by you
Solution:
[root@localhost ~]# chmod 600 /etc/mpd.conf
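The fix above can be wrapped in a small check that only changes the mode when needed. This is a minimal sketch, not part of MPICH itself: the function name fix_mpd_conf_perms is hypothetical, and it assumes GNU stat (Linux). Root uses /etc/mpd.conf; regular users normally keep the file at ~/.mpd.conf.

```shell
#!/bin/sh
# Hypothetical helper: restrict an mpd config file to owner read/write (mode 600).
fix_mpd_conf_perms() {
    conf=${1:-/etc/mpd.conf}          # default is the root config file
    [ -f "$conf" ] || { echo "missing: $conf" >&2; return 1; }
    mode=$(stat -c %a "$conf")        # assumption: GNU stat is available
    if [ "$mode" != "600" ]; then
        chmod 600 "$conf" || return 1
        echo "$conf: $mode -> 600"
    fi
}
```

Usage: run fix_mpd_conf_perms with no argument as root, or fix_mpd_conf_perms ~/.mpd.conf as a regular user, then retry mpdtrace.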
2. [root@localhost ~]# mpdboot -n 1 -f mpd.hosts
mpdboot_localhost.localdomain (handle_mpd_output 414): from mpd on localhost.localdomain, invalid port info:no_port
Solution:
This is caused by incorrect permissions on mpd.conf and related files; they must be set to mode 600.
3. [root@localhost ~]# mpdtrace
mpdroot: perror msg: No such file or directory
mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/mpd2.console_root has been removed
mpdtrace (__init__ 1204): forked process failed; status=255
Solution:
The mpd daemon is not running. Start it with: mpdboot -n 1 -f mpd.hosts
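The check-then-start sequence can be sketched as a shell function. The name ensure_mpd_ring is hypothetical, and the sketch assumes that mpdtrace exits with a non-zero status when no daemon is reachable, as the status=255 message above suggests.

```shell
#!/bin/sh
# Hypothetical helper: boot a one-host mpd ring only if none is answering.
ensure_mpd_ring() {
    hosts=${1:-mpd.hosts}             # hosts file, as in the command above
    if mpdtrace >/dev/null 2>&1; then
        echo "mpd ring already running"
        return 0
    fi
    mpdboot -n 1 -f "$hosts" || return 1
    mpdtrace                          # list the hosts now in the ring
}
```

Usage: ensure_mpd_ring mpd.hosts. Calling it again when the ring is already up does not start a second daemon.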
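4. During testing, an mpd process often fails to connect to or communicate with a particular node. When this happens, first check whether mpd starts successfully on that node by itself. If it does, the problem usually lies in the firewall configuration.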
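One way to test whether a firewall is blocking traffic between two nodes is to probe a listening TCP port from the other machine. The sketch below is an illustration under assumptions: can_connect is a hypothetical helper, it relies on bash's /dev/tcp feature and the coreutils timeout command, and the host and port you pass in are whatever the remote listener reports.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within 3 seconds, non-zero otherwise (refused, filtered, or timed out).
can_connect() {
    host=$1
    port=$2
    # assumption: bash and coreutils timeout are installed on the node
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Related manual checks (run by hand, not executed here):
#   iptables -L -n      # list the current firewall rules on the node
#   mpdcheck -s         # MPICH2's mpd checker in server mode; it prints a host and port
```

Usage: start a listener on the suspect node, then run can_connect with that host and port from another node. A failure while the listener is running points at the firewall rather than at mpd.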
5. [root@localhost examples]# mpiexec -n 5 ./cpi
mpiexec_localhost.localdomain (mpiexec 392): no msg recvd from mpd when expecting ack of request
[root@localhost examples]# mpiexec -n 5 ./cpi
Process 3 of 5 is on localhost.localdomain
Process 4 of 5 is on localhost.localdomain
Process 0 of 5 is on localhost.localdomain
Process 1 of 5 is on localhost.localdomain
Process 2 of 5 is on localhost.localdomain
pi is approximately 3.1415926544231230, Error is 0.0000000008333298
wall clock time = 0.005338
[root@localhost examples]#
Solution: The cause is unclear, possibly busy resources. The same command sometimes succeeds and sometimes fails.
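Since the failure is intermittent, a simple workaround is to retry the command a few times. The sketch below uses a hypothetical retry helper. It only works around the symptom and does not identify the underlying cause, which remains unknown here.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Usage: retry 3 2 mpiexec -n 5 ./cpi runs the example up to three times with a two-second pause between attempts.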