PCIe Device Lending 1.0 application note
Device Lending functionality
Device lending enables you to temporarily access a PCIe device located in a remote server over a PCIe network. Devices can be made available to systems on the network and borrowed by any of them for as long as required. When a system has finished using a device, the device can be borrowed by another system on the network or returned to local use on the host where it is physically installed.
The Dolphin Device Lending software enables this process to be controlled using a set of command line tools and options. These tools can be used directly or integrated into any other higher level resource management system. The device lending software is very flexible and does not require any boot order or power on sequencing. PCIe devices borrowed from a remote system can be used as if they were local devices until they are given back. The Device Lending software does not require any changes to transparent devices or to the Linux kernel.
Borrowed devices are inserted into the local device tree, and the transparent device driver for the device receives a hot add event to signal that the new resource is available.
Essential Device Lending commands
The table below lists the essential Device Lending commands needed to make a device available to systems on the PCIe network (Lend out) and use (Borrow) a device on a remote system. The SMART-IO software and eXpressWare software must be installed and configured prior to using these commands.
Command | Side (Lending or Borrow) | Meaning |
---|---|---|
smartio_tool connect <NodeId> | Lending | Allows the specified system (identified by its NodeId) to borrow devices from the local host. This command must be repeated for every system that should be able to borrow a device from the local host. |
smartio_tool add <[domain]:B.D.F> | Lending | Makes the specified device visible to all connected systems. See the command "smartio_tool list". (B.D.F is a notation for Bus.Device.Function. Domain is optional.) |
smartio_tool available <[domain]:B.D.F> | Lending | Enable remote systems to use the device. An unbind will be performed on the local device driver. (Device no longer available for local use.) |
smartio_tool unavailable <[domain]:B.D.F> | Lending | Disable remote systems from using the device. A bind will be performed on the local device to load the default device driver. (Device can be used by the local system.) |
smartio_tool list | Borrow | Shows all devices that are offered for borrowing and whether each device is available or unavailable. The devices can be referenced using the displayed <Id>. |
smartio_tool borrow <Id> | Borrow | Will borrow an available device. |
smartio_tool return <Id> | Borrow | Will return a previously borrowed device. Other systems can now borrow the device. |
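The commands in the table are normally issued in a fixed order on each side. The following is a minimal sketch of that order, not part of the Dolphin software: the helper names (run, lend_device, reclaim_device, borrow_device, return_device) and the DRY_RUN variable are made up for illustration, and all arguments are placeholders. By default the script only prints the commands so the sequence can be reviewed without any hardware.

```shell
#!/bin/sh
# Sketch of a lend/borrow cycle built only from the smartio_tool commands
# listed in the table above. Helper names and DRY_RUN are hypothetical.

# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Lending side: let node $1 borrow from this host, then offer device $2.
lend_device() {
    run smartio_tool connect "$1"
    run smartio_tool add "$2"
    run smartio_tool available "$2"
}

# Lending side: take device $1 back for local use.
reclaim_device() {
    run smartio_tool unavailable "$1"
}

# Borrowing side: show the offered devices, then borrow the one with Id $1.
borrow_device() {
    run smartio_tool list
    run smartio_tool borrow "$1"
}

# Borrowing side: give the device with Id $1 back.
return_device() {
    run smartio_tool return "$1"
}
```

On the lending host this would be used as lend_device <NodeId> <[domain]:B.D.F>, and on the borrowing host as borrow_device <Id> followed later by return_device <Id>. Set DRY_RUN=0 to actually execute the commands.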
System requirements
The lending and borrowing sides have different system requirements depending on the type of device and the required functionality. Both sides must run Dolphin's eXpressWare driver DIS 5.5.0 or newer and have the Dolphin SMART-IO module installed and configured. The functionality is currently available with Dolphin's PXH810, PXH830, PXH840 and MXH830 adapter cards.
All systems connected to the PCIe network can concurrently both offer devices (Lending side) and borrow devices (Borrow side). The Lending and Borrowing functionality may have different trade-offs and requirements. More details below.
All systems must run the latest CentOS 7, 64 bit (updated to the latest bug fix release).
Lending side system requirements
PCIe peer to peer support
The lending side must support PCIe peer to peer transactions between the slot holding the Dolphin PCIe card and the slot holding the device that will be lent. This means that the device must be able to send and receive PCIe transactions directly to and from the installed Dolphin PCI Express adapter. Some systems come with a PEX PCIe switch in the IO system. These systems will normally support PCIe peer to peer communication if the IOMMU is OFF and both devices are behind the PEX switch. Some use cases require the IOMMU to be ON; in that case the CPU/memory controller must support PCIe peer to peer. Please also study the information on IOMMU ON or OFF below.
It is strongly recommended to ask your system vendor to confirm this is supported before ordering a new system. There is currently no known way to determine if a PC supports PCIe peer to peer transactions, except by testing. If this test fails, you have to find another PC.
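Although only a test can confirm peer to peer support, the PCIe topology can at least show whether two devices sit behind the same switch or bridge. The sketch below is an illustration, not a Dolphin tool: the function name common_upstream is made up, and it assumes the standard Linux sysfs layout in which each device path lists its upstream bridges as parent directories. A shared bridge is only a hint and does not prove that peer to peer transactions work.

```shell
#!/bin/sh
# Print the deepest path component shared by two resolved sysfs device paths.
# If the result is a bridge address rather than the root complex
# (for example pci0000:00), both devices sit below the same bridge or switch.
common_upstream() (
    a="${1#/}"
    b="${2#/}"
    common=""
    while [ -n "$a" ] && [ -n "$b" ]; do
        ca="${a%%/*}"
        cb="${b%%/*}"
        # Stop at the first component where the two paths differ.
        [ "$ca" = "$cb" ] || break
        common="$common/$ca"
        # Drop the component just compared from both paths.
        case "$a" in */*) a="${a#*/}" ;; *) a="" ;; esac
        case "$b" in */*) b="${b#*/}" ;; *) b="" ;; esac
    done
    echo "${common##*/}"
)
```

The two arguments would typically come from readlink -f /sys/bus/pci/devices/<address>, once for the Dolphin adapter and once for the device to be lent.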
The CPU and Memory of the computer will only be used for initialization and will not be active / used when the borrowing system is using the PCIe devices. All PCIe transactions and system interrupts will be forwarded to the borrowing side by the PCIe hardware.
IOMMU
The lending side should normally enable the IOMMU for maximum system compatibility with general PCIe devices. The IOMMU is currently required if the BIOS enables devices to use 64 bit addressing and the device does not fully support all physical addresses (e.g. GPUs prior to the Pascal architecture).
On Intel systems the IOMMU is called VT-d. The Intel IOMMU will limit the performance of some high throughput devices. We will provide instructions for performance optimizations in the next version of this guide. It is strongly recommended to start the evaluation with the IOMMU ON!
LARGE PCIE BAR SIZE
On the lending side, the Dolphin adapter must be configured to use a prefetch space large enough to support the DMA window used by your device. A prefetch space size of 32 Gigabytes is enough for most devices. It is strongly recommended to ask your system vendor to confirm that large PCIe BARs (e.g. 32 Gigabytes) are supported before ordering a new system.
Borrowing side requirements
IOMMU
The borrowing side system must have the IOMMU enabled.
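One quick sanity check on a running Linux system is whether the kernel was booted with the intel_iommu=on parameter used in the installation steps of this note. The sketch below only inspects the kernel command line. It does not prove that VT-d is enabled in the BIOS, and the function name cmdline_has_iommu is made up for illustration.

```shell
#!/bin/sh
# Return 0 if the given kernel command line contains intel_iommu=on
# as a separate, space-delimited parameter.
cmdline_has_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel.
if cmdline_has_iommu "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "intel_iommu=on is present on the kernel command line"
else
    echo "intel_iommu=on is NOT present on the kernel command line"
fi
```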
LARGE PCIE BAR SIZE
On the borrowing side, the Dolphin adapter card must be configured to use a prefetch space size large enough to re-map all BARs (including natural alignments) of your PCIe device. A prefetch space size of 32 Gigabytes is enough for most devices. It is strongly recommended to ask your system vendor to confirm that large PCIe BARs (e.g. 32 Gigabytes) are supported before ordering a new system.
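To get a rough idea of how much space the BARs of a device need, the sizes can be read from the device's "resource" file in the Linux sysfs tree on the system where the device is installed. The following is a minimal sketch under stated assumptions: the function name bar_total_bytes is made up, only the first six entries (the standard BARs) are counted, and the extra space needed for natural alignment is not included, so the result is a lower bound.

```shell
#!/bin/sh
# Sum the sizes of the six standard BARs listed in a sysfs "resource" file.
# Each line holds "start end flags" in hexadecimal; unused BARs are all zero.
bar_total_bytes() (
    total=0
    n=0
    while [ "$n" -lt 6 ] && read -r start end flags; do
        n=$((n + 1))
        # An unused BAR has start and end both zero; skip it.
        if [ "$(($end))" -ne 0 ]; then
            total=$((total + $end - $start + 1))
        fi
    done < "$1"
    echo "$total"
)
```

A typical call would be bar_total_bytes /sys/bus/pci/devices/<address>/resource. The result is in bytes and can be compared with the configured prefetch space size (32 Gigabytes is 34359738368 bytes).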
Installation of the Device Lending Software
Please note that the following instructions are valid for the BETA1 release only. The final software product will have an improved and simplified installation. Names of functions and commands may change.
- Buy systems, or make sure the systems that will be used for the test, comply with the above requirements for Borrow and Lending side systems.
- Install a fully updated CentOS 7 64 bit on both systems (kernel 3.10.0-514 or newer). The Device Lending software has also been tested with newer 4.x kernels.
- Enable Intel IOMMU / VT-d in the BIOS
- Verify your BIOS configuration is set up to support 64 bit decoding (support 64 bit PCIe addressing).
- Enable use of the IOMMU in the kernel by adding:
intel_iommu=on
... to boot parameters. For grub2 this can be set in
/etc/default/grub
by appending to GRUB_CMDLINE_LINUX= and running:
grub2-mkconfig > /boot/grub2/grub.cfg
- Install the Dolphin PCIe NTB adapter cards in both systems and install the cables.
- Install the eXpressWare DIS 5.5.0 software on the systems and make sure the PCIe network is fully operational before installing the SmartIO module. It is recommended to follow the Quick installation guide found at https://www.dolphinics.com/download/PX_5_X_X_LIN_DOC/ch02.html
IMPORTANT: In order to create the SmartIO RPM file, the parameter "--enable-smartio" must be passed to the Dolphin eXpressWare installer, e.g.,
./Dolphin-eXpressWare-Linux-x86_64_'VERSION'.sh --enable-smartio
- Verify the installation before proceeding to the next step. Please follow the instructions found at https://www.dolphinics.com/download/PX_5_X_X_LIN_DOC/ch07s01.html
- Update the PCIe prefetch space size to 32 Gigabytes to meet the requirements described above. If your system does not support large PCIe BARs as described above, you need to follow the firmware recovery procedure described below. The prefetch space for the PXH cards can be updated using the /opt/DIS/sbin/dis_config utility. Start the utility and use the command "set-prefetch". We strongly recommend doing this on one system at a time until you are sure your system fully supports a large PCIe BAR.
- The "--enable-smartio" parameter passed to the installer creates a SmartIO RPM file in the "node_RPMS" directory (from where the installer was executed). This RPM must be installed on both nodes, e.g., as root:
# rpm -i node_RPMS/Dolphin-SmartIO-PX-5.5.0.0-1.x86_64.rpm
Then, as root load the kernel module on both machines:
# insmod /opt/DIS/lib/modules/$( uname -r )/dis_sio.ko
You can verify that the module loading was successful by looking in the kernel log. You should see this line:
[ 276.220398] smartio initializing
The final release of the SmartIO software will include automatic startup / load of the SmartIO module.
- After installing the SmartIO RPM and loading the kernel module on both nodes, please follow the instructions and examples found in the /opt/DIS/doc/README_DEVICE_LENDING.txt to test the device lending software.
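Two of the steps above lend themselves to scripting: appending intel_iommu=on to GRUB_CMDLINE_LINUX, and checking the kernel log for the SmartIO start-up line. The following is a minimal sketch, not part of the Dolphin software. The function names add_kernel_param and smartio_initialized are made up, the sed -i option assumes GNU sed as shipped with CentOS, and the GRUB_CMDLINE_LINUX value is assumed to be enclosed in double quotes on a single line. Back up /etc/default/grub before modifying it.

```shell
#!/bin/sh
# Append kernel parameter $2 to the GRUB_CMDLINE_LINUX line of file $1,
# unless that line already contains it.
add_kernel_param() (
    file="$1"
    param="$2"
    if ! grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${param}[\" ]" "$file"; then
        # Insert the parameter just before the closing double quote.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
    fi
)

# Succeed if the kernel log text on standard input contains the line
# printed when the SmartIO module initializes.
smartio_initialized() {
    grep -q "smartio initializing"
}
```

For example, add_kernel_param /etc/default/grub intel_iommu=on followed by the grub2-mkconfig command shown above, and, after loading the module, dmesg | smartio_initialized && echo "SmartIO module loaded".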
Firmware recovery
Some systems will not boot if the PCIe prefetch space is set too high and the system does not support large PCIe BARs as described above. In this case, you have to:
- Power down the system.
- Toggle the OPT2 DIP-Switch to the other position.
- Boot up and run /opt/DIS/sbin/upgrade_eeprom.sh --upgrade to enable the default firmware settings.
- Power down the system
- Toggle the OPT2 DIP-Switch back to the previous setting.
- Once this is completed, the default prefetch space size is re-enabled.
You may try a smaller prefetch space size, but this may prevent device lending from operating.
Tested devices
Dolphin has tested the following devices:
- Nvidia GPU Tesla K40c
- Nvidia GPU GeForce GTX 1050
- Nvidia GPU Quadro P400 (Pascal)
- Intel I350 Gigabit Network Card (supports SR-IOV)
- Intel PCIe Data Center SSD 750
- PMC NVMe
As a part of this beta test program, we are very interested in learning about your experience using these or any other compliant device.
During the development and testing of the Device Lending software, the following motherboards were used:
- ASUS X99-E-10G WS
- ASUS X99-E WS/USB 3.1
- ASUS X99-M WS/SE
Support
If you run into any problem or have successfully tested the new software, we would very much like to know about it. Please contact pci-support@dolphinics.com and report.
If you have problems installing the eXpressWare drivers or verifying the installation, please forward the output of "dmesg" and /opt/DIS/sbin/dis_diag -V 9 from both systems.
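The two outputs requested above can be collected into files before they are sent. The following is a minimal sketch: the function name collect_support_info and the output file names are made up for illustration, and only the dmesg and dis_diag commands named above are used.

```shell
#!/bin/sh
# Save the output of dmesg and dis_diag into directory $1 and print its name.
collect_support_info() (
    outdir="$1"
    mkdir -p "$outdir" || exit 1
    # Kernel log; any error message is captured in the file as well.
    dmesg > "$outdir/dmesg.txt" 2>&1 || true
    # Dolphin diagnostics, if the tool is installed at its usual location.
    if [ -x /opt/DIS/sbin/dis_diag ]; then
        /opt/DIS/sbin/dis_diag -V 9 > "$outdir/dis_diag.txt" 2>&1 || true
    else
        echo "/opt/DIS/sbin/dis_diag not found" > "$outdir/dis_diag.txt"
    fi
    echo "$outdir"
)
```

Run it on both systems, for example collect_support_info "/tmp/dolphin-support-$(hostname)", and attach the resulting files to the report.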