If the applications you are using do not work or do not show increased performance, please carefully follow this troubleshooting guide.
If the application is using SuperSockets:
To verify that the preloading works, use the ldd command on any executable, e.g. the netperf binary mentioned above:
$ export LD_PRELOAD=libksupersockets.so
$ ldd netperf
        libksupersockets.so => /opt/DIS/lib64/libksupersockets.so (0x0000002a95577000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00000033ed300000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x00000033ec800000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00000033ecb00000)
        /lib64/ld-linux-x86-64.so.2 (0x00000033ec600000)
The library libksupersockets.so has to be listed at the top position. If this is not the case, make sure the library file actually exists. The default locations are /opt/DIS/lib/libksupersockets.so and, on 64-bit platforms, /opt/DIS/lib64/libksupersockets.so. Note that libksupersockets.so is actually a symbolic link to a library file with the same name and a version suffix:
$ ls -lR /opt/DIS/lib*/*ksupersockets*
-rw-r--r-- 1 root root 29498 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.a
-rw-r--r-- 1 root root   901 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.la
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib64/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 65160 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 19746 Nov 14 12:43 /opt/DIS/lib/libksupersockets.a
-rw-r--r-- 1 root root   899 Nov 14 12:43 /opt/DIS/lib/libksupersockets.la
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so -> libksupersockets.so.3.3.0
lrwxrwxrwx 1 root root    25 Nov 14 12:50 /opt/DIS/lib/libksupersockets.so.3 -> libksupersockets.so.3.3.0
-rw-r--r-- 1 root root 48731 Nov 14 12:43 /opt/DIS/lib/libksupersockets.so.3.3.0
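The ldd check described above can be scripted so that it is easy to repeat on every Cluster Node. The following is a minimal sketch, not part of the Dolphin software: the function names first_ldd_lib and preload_is_first are hypothetical. It also assumes that virtual entries such as linux-vdso.so.1, which newer systems print before any real library, should be skipped when deciding what is at the top position.

```shell
# Print the name of the first real library in ldd output read from stdin.
# Assumption: linux-vdso / linux-gate entries are virtual and are skipped.
first_ldd_lib() {
    awk 'NF && $1 !~ /^linux-(vdso|gate)\./ { print $1; exit }'
}

# Succeed only if libksupersockets.so is the first library in the ldd output.
preload_is_first() {
    [ "$(first_ldd_lib)" = "libksupersockets.so" ]
}
```

With LD_PRELOAD exported as shown above, a call such as "ldd netperf | preload_is_first" returns success only if the preloaded library is at the top of the list.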
Also, make sure that the dynamic linker is configured to find the library in this location. The dynamic linker is configured accordingly when the RPM is installed; if you did not install via RPM, you need to configure the dynamic linker manually. To check whether dynamic linking is the problem, set LD_LIBRARY_PATH to include the path to libksupersockets.so and verify again with ldd:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/DIS/lib:/opt/DIS/lib64
$ echo $LD_PRELOAD
libksupersockets.so
$ ldd netperf
....
A better solution than setting LD_LIBRARY_PATH is to configure the dynamic linker to include these directories in its search path. Use man ldconfig to learn how to achieve this.
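As an illustration of the ldconfig approach, the sketch below writes the two DIS library directories to a linker configuration file. It assumes a distribution whose /etc/ld.so.conf includes the files in /etc/ld.so.conf.d/; the file name dis.conf and the function name write_dis_ld_conf are hypothetical choices and are not created by the Dolphin installer.

```shell
# Write the DIS library directories, one per line, to the given file.
# Assumption: the target file is read by ldconfig, for example a file
# in /etc/ld.so.conf.d/ on distributions that include that directory.
write_dis_ld_conf() {
    conf_file="$1"
    printf '%s\n' /opt/DIS/lib /opt/DIS/lib64 > "$conf_file"
}
```

As root, you would call "write_dis_ld_conf /etc/ld.so.conf.d/dis.conf", run ldconfig to rebuild the linker cache, and then confirm with "ldconfig -p | grep ksupersockets" that the library is found.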
You need to make sure that the preloading of the SuperSockets library described above is effective on both Cluster Nodes, for both applications that should communicate via SuperSockets.
Make sure that the SuperSockets kernel module (and the kernel modules it depends on) is loaded and configured correctly on both Cluster Nodes.
Check the status of all Dolphin kernel modules via the dis_services script (default location /opt/DIS/sbin):
# dis_services status
Dolphin kOSIF 5.5.0 is running
Dolphin PX 5.5.0 is running
Dolphin IRM 5.5.0 ( January 10th 2018 ) is running.
Dolphin Node Manager is running (pid 3172).
Dolphin SISCI 5.5.0 ( January 10th 2018 ) is running.
Dolphin SuperSockets 5.5.0 "Express Train", January 10th 2018 (built January 10th 2018) running.
At least the services dis_irm and dis_supersockets need to be running, and you should not see a message about SuperSockets not being configured.
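The status check above can be automated with a small filter over the dis_services output. This is only a sketch: the function name check_dis_status is hypothetical, and the matching relies on the line format shown in the example above. The wording printed for a stopped service is not documented here, so the filter simply assumes that a healthy line contains "running" but not "not running".

```shell
# Read "dis_services status" output from stdin and verify that the IRM and
# SuperSockets services are reported as running.
# Assumption: line format as in the example output above.
check_dis_status() {
    status_text=$(cat)
    for svc in IRM SuperSockets; do
        if ! printf '%s\n' "$status_text" \
            | grep "Dolphin $svc " \
            | grep -v 'not running' \
            | grep -q 'running'; then
            echo "Dolphin $svc is not reported as running" >&2
            return 1
        fi
    done
    return 0
}
```

For example, "dis_services status | check_dis_status" run as root returns a non-zero exit code if either service is missing from the output.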
Verify that SuperSockets have the correct view of the PCI Express adapters within the cluster. Call dis_ssocks_adm with the option -n:
root@d0 # dis_ssocks_adm -n
Local node  ID list
-------------------------------------
X           4 0
            8 0
            12 0
            16 0
Running this command on all Cluster Nodes should give identical output apart from the marker X which indicates the current Cluster Node.
If this is not the case, the affected Cluster Node has an invalid /etc/dis/dishosts.conf. Make sure that the dis_nodemgr service is up on this Cluster Node, and that the Cluster Node is shown active in the dis_admin GUI.
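Comparing the node lists by eye becomes tedious on larger clusters. As a rough sketch, the hypothetical helper normalize_node_list below strips the X marker and normalizes whitespace so that the output saved from two Cluster Nodes can be compared directly. It assumes the marker is a leading X followed by whitespace, as in the example above; adjust the pattern if your output differs.

```shell
# Normalize "dis_ssocks_adm -n" output read from stdin: drop the leading
# X marker, trim leading/trailing blanks and squeeze runs of whitespace.
# Assumption: the local node is marked by an X at the start of its line.
normalize_node_list() {
    sed -e 's/^[[:space:]]*X[[:space:]]\{1,\}//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]\{1,\}/ /g' \
        -e 's/ $//'
}
```

One way to use it is to save the output of dis_ssocks_adm -n on each Cluster Node to a file, pass each file through normalize_node_list, and compare the results with diff.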
Verify the SuperSockets routing configuration to make sure that all Cluster Nodes will connect and communicate via SuperSockets using the right IP addresses. The active configuration can be retrieved via dis_ssocks_adm -m:
# dis_ssocks_adm -m
IP/net           Adapter  NodeId List
-----------------------------------------------
172.16.5.1/32    0x0000   4 0 0
172.16.5.2/32    0x0000   8 0 0
172.16.5.3/32    0x0000   68 0 0
172.16.5.4/32    0x0000   72 0 0
Depending on the configuration variant you used to set up SuperSockets, this output may look different, but it must never be empty and should be identical on all Cluster Nodes. The example above shows a four-node cluster with a single fabric and a static SuperSockets configuration, which will accelerate one socket interface per Cluster Node.
For more information on the configuration of SuperSockets, please refer to Section 1.1, “dishosts.conf”.
Make sure that the host names/IP addresses used effectively by the application are the ones that are configured for SuperSockets, especially if the Cluster Nodes have multiple Ethernet interfaces configured.
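To check whether an address used by the application is among those configured for SuperSockets, the output of dis_ssocks_adm -m can be searched for it. The sketch below uses a hypothetical function name, ip_is_accelerated, and only handles single-host entries of the form address/32 as in the example above; configurations that list whole networks would need a real subnet match.

```shell
# Succeed if the given IPv4 address has a /32 host entry in the
# "dis_ssocks_adm -m" output read from stdin.
# Assumption: entries look like "172.16.5.1/32 ..." in the first column.
ip_is_accelerated() {
    awk -v want="$1/32" '$1 == want { found = 1 } END { exit !found }'
}
```

For example, "dis_ssocks_adm -m | ip_is_accelerated 172.16.5.1" succeeds on the cluster shown above, while an address on a different Ethernet interface would fail the check.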
SuperSockets provide an internal event log, which can be accessed via dis_ssocks_diag. To attach to the event log and get all events printed to the terminal as they occur, use dis_ssocks_diag -Ev. If you then run the application, you will see all connection attempts and their results.
A successful connection attempt of a client towards a server via the PCI Express interconnect will look like this:
[Jul 14 14:08:36] TRACE: new SuperSocket created local:0.0.0.0:0 peer:0.0.0.0:0 pid:3293 obj:0x0xffff880259440800
[Jul 14 14:08:36] TRACE: SuperSockets connection established local:172.16.6.15:35394 peer:172.16.6.16:5432 pid:3293 obj:0x0xffff880259440800
[Jul 14 14:08:37] TRACE: releasing stream socket local:172.16.6.15:35394 peer:172.16.6.16:5432 pid:3293 obj:0x0xffff880259440800
The server will report the accepted SuperSockets connection like this:
[Jul 14 14:10:35] TRACE: native accept succeeded local:0.0.0.0:5432 peer:172.16.6.15:55215 pid:21472 obj:0x0xffff880257454800
[Jul 14 14:10:35] TRACE: SuperSockets connection accepted local:172.16.6.16:5432 peer:172.16.6.15:55215 pid:21472 obj:0x0xffff880257454c00
[Jul 14 14:10:35] TRACE: releasing stream socket local:0.0.0.0:5432 peer:0.0.0.0:0 pid:21472 obj:0x0xffff880257454800
[Jul 14 14:10:36] TRACE: releasing stream socket local:172.16.6.16:5432 peer:172.16.6.15:55215 pid:21472 obj:0x0xffff880257454c00
A client's connection towards a server that (the client thinks) is not configured to use SuperSockets is performed via Ethernet and reported as follows:
[Jul 14 14:11:16] TRACE: new SuperSocket created local:0.0.0.0:0 peer:0.0.0.0:0 pid:3320 obj:0x0xffff880259440000
[Jul 14 14:11:16] WARN: admin msg SYN_CLIENT failed err:0x6f local:172.16.6.15:35652 peer:172.16.6.16:5432 pid:3320 obj:0x0xffff880259440000
[Jul 14 14:11:16] WARN: fallback connection established local:172.16.6.15:35652 peer:172.16.6.16:5432 pid:3320 obj:0x0xffff880259440000
[Jul 14 14:11:29] TRACE: releasing stream socket local:172.16.6.15:35652 peer:172.16.6.16:5432 pid:3320 obj:0x0xffff880259440000
If a client tries to connect via SuperSockets but fails to do so, it falls back to Ethernet by default. This fallback can be disabled to ensure that only SuperSockets are used for connections that are meant to use them. The event log will look like this:
[Jul 14 14:12:24] TRACE: new SuperSocket created local:0.0.0.0:0 peer:0.0.0.0:0 pid:21491 obj:0x0xffff88025736f800
[Jul 14 14:12:29] TRACE: native accept succeeded local:0.0.0.0:5432 peer:172.16.6.15:47158 pid:21491 obj:0x0xffff88025736f800
[Jul 14 14:12:29] WARN: fallback connection accepted local:172.16.6.16:5432 peer:172.16.6.15:47158 pid:21491 obj:0x0xffff880256f94000
[Jul 14 14:12:29] TRACE: releasing stream socket local:0.0.0.0:5432 peer:0.0.0.0:0 pid:21491 obj:0x0xffff88025736f800
[Jul 14 14:12:42] TRACE: releasing stream socket local:172.16.6.16:5432 peer:172.16.6.15:47158 pid:21491 obj:0x0xffff880256f94000
The server may not report this event at all, as it may not have been notified of it.
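When an application opens many connections, it can help to count how many were accelerated and how many fell back to Ethernet. The following sketch does this for a saved event log; the function name summarize_ssocks_log is hypothetical, and the patterns are taken only from the example messages above, so other message variants are not counted.

```shell
# Read a saved SuperSockets event log from stdin and count accelerated
# versus fallback connections, based on the message texts shown above.
summarize_ssocks_log() {
    log_text=$(cat)
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| :".
    accelerated=$(printf '%s\n' "$log_text" \
        | grep -c -E 'SuperSockets connection (established|accepted)' || :)
    fallback=$(printf '%s\n' "$log_text" \
        | grep -c -E 'fallback connection (established|accepted)' || :)
    echo "supersockets=$accelerated fallback=$fallback"
}
```

Assuming the output of dis_ssocks_diag -Ev was captured in a file such as ssocks.log (an illustrative name), "summarize_ssocks_log < ssocks.log" prints one line with both counts. A non-zero fallback count points to the connections worth inspecting in the log.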
For an explanation of typical error messages, please refer to Section 2, “Software”.
Don't forget to check whether the port numbers used by this application, or the application itself, have been explicitly excluded from using SuperSockets. By default, only the system port numbers below 1024 are excluded from using SuperSockets, but you should verify the current configuration using dis_ssocks_adm -p (see Section 2, “SuperSockets Configuration”).
If you can't solve the problem, please contact Dolphin Support. When doing so, please attach
the output of dis_status
the output of dis_ssocks_diag -Ev for the connection attempts.