Overview
The SPDK community introduced OCSSD (Open-Channel SSD) ftl_bdev in January 2019. One way to develop host-based FTLs in the SPDK environment is to set up an OCSSD qemu-nvme virtual machine. This article explains how to set up such an environment with SPDK's NVMe-oF TCP and NVMe-oF RDMA features, which allow OCSSD PUs (Parallel Units) to be exposed as separate block devices over the network. Fig. 1 outlines the setup.
OCSSD qemu-nvme
We install OCSSD qemu-nvme as outlined in its github page, entering the following commands on the physical machine pm111. The commands differ slightly from the originals on the github page, but they generally make things a little easier. Also note that a small source code change is needed if you hit the "kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device" error when running qemu; the patch is listed in the appendix at the bottom of this page.
cbuser@pm111:~/github$ git clone https://github.com/OpenChannelSSD/qemu-nvme.git
cbuser@pm111:~/github$ cd qemu-nvme
cbuser@pm111:~/github/qemu-nvme$ ./configure --target-list=x86_64-softmmu --enable-kvm --enable-linux-aio --enable-virtfs \
--enable-trace-backends=log --prefix=/home/cbuser/github/qemu-nvme
cbuser@pm111:~/github/qemu-nvme$ make -j$(nproc); make install
Once the installed qemu-nvme works correctly, install a guest OS. In these experiments, we use Ubuntu 18.10 Desktop for both the physical machine and the virtual machine. The first command creates a 20GB qcow2 VM image. The second command launches a VM on which you can install Ubuntu 18.10. We assume the Ubuntu installation ISO image is under the /tmp directory and that you connect to this VM via vnc display :2.
cbuser@pm111:~/github/qemu-nvme/bin$ ./qemu-img create -f qcow2 ~/work/qemu/u1.qcow2 20G
cbuser@pm111:~/github/qemu-nvme/bin$ sudo ./qemu-system-x86_64 -m 4G \
--enable-kvm -drive if=virtio,file=/home/cbuser/work/qemu/u1.qcow2 \
-cdrom /tmp/ubuntu-18.10-desktop-amd64.iso -vnc :2
Now we create an OCSSD image with 2 groups (channels), 4 PUs per group, and 60 chunks per PU to store the emulated OCSSD data on the host machine, then launch the qemu-nvme VM with Ubuntu 18.10 installed; you can connect to it via vnc display :2. Since we configure a tap network for the VM, you can set up ssh or a network file system to work with the VM more easily. Note that the third line from the bottom, containing the vfio-pci device parameters, is not needed if you are not using PCIe passthrough for NVMe-oF RDMA.
cbuser@pm111:~/github/qemu-nvme/bin$ ./qemu-img create -f ocssd \
-o num_grp=2,num_pu=4,num_chk=60 ~/work/qemu/ocssd.img
cbuser@pm111:~/work/qemu$ sudo /home/cbuser/github/qemu-nvme/bin/qemu-system-x86_64 --enable-kvm -cpu host -smp 1 -m 8G \
-drive file=/home/cbuser/work/qemu/u1.qcow2,if=none,id=disk \
-device ide-hd,drive=disk,bootindex=0 \
-blockdev ocssd,node-name=nvme01,file.driver=file,file.filename=/home/cbuser/work/qemu/ocssd.img \
-device nvme,drive=nvme01,serial=deadbeef,id=lnvm \
-device vfio-pci,host=01:00.0 -device vfio-pci,host=01:00.1 \
-net nic,macaddr=DE:AD:BE:EF:01:42 \
-net tap,ifname=tap0,script=q_br_up.sh,downscript=q_br_down.sh -vnc :2
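Once the VM is up, a quick way to confirm that the emulated OCSSD controller is visible to the guest is to query its PCI slot; as the identify output later in this article shows, it appears at 00:04.0 with vendor/device IDs 1d1d:1f1f.
cbuser@qemu-nvme142:~$ lspci -nn -s 00:04.0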
SPDK inside OCSSD qemu-nvme
Let's install SPDK inside the VM, qemu-nvme142. As of this writing, we are using the SPDK tip version v19.01-642-g130a5f772. Additional information regarding SPDK installation can be found on its github page. We want to make sure all the necessary features are built, including ftl_bdev, RDMA and fio support. As ftl_bdev is one of the latest SPDK features, some of the code is not upstream yet, so we apply a couple of cherry-picks.
cbuser@qemu-nvme142:~/github$ git clone https://github.com/spdk/spdk
cbuser@qemu-nvme142:~/github$ cd spdk
cbuser@qemu-nvme142:~/github/spdk$ git submodule update --init
cbuser@qemu-nvme142:~/github/spdk$ git fetch "https://review.gerrithub.io/spdk/spdk" refs/changes/68/449068/10 && \
git cherry-pick FETCH_HEAD
cbuser@qemu-nvme142:~/github/spdk$ git fetch "https://review.gerrithub.io/spdk/spdk" refs/changes/39/449239/10 && \
git cherry-pick FETCH_HEAD
cbuser@qemu-nvme142:~/github/spdk$ git commit -a
cbuser@qemu-nvme142:~/github/spdk$ sudo ./scripts/pkgdep.sh
cbuser@qemu-nvme142:~/github/spdk$ ./configure --with-ftl --with-rdma \
--with-fio=/home/cbuser/github/fio
cbuser@qemu-nvme142:~/github/spdk$ make -j$(nproc)
Once SPDK is built, we run the scripts/setup.sh script to allocate hugepages and unbind NVMe devices (in our case, the emulated OCSSD drive). Our first SPDK app run is identify, which correctly displays the emulated OCSSD parameters as specified on the qemu command line.
cbuser@qemu-nvme142:~/github/spdk$ sudo NRHUGE=1024 scripts/setup.sh
0000:00:04.0 (1d1d 1f1f): nvme -> uio_pci_generic
cbuser@qemu-nvme142:~/github/spdk$ cat /proc/meminfo
(truncated)...
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
...(truncated)
cbuser@qemu-nvme142:~/github/spdk$ sudo examples/nvme/identify/identify
Starting SPDK v19.04-pre / DPDK 19.02.0 initialization...
[ DPDK EAL parameters: identify --no-shconf -c 0x1 -n 1 -m 0 --log-level=lib.eal:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid30476 ]
EAL: No free hugepages reported in hugepages-1048576kB
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
nvme_qpair.c: 118:nvme_admin_qpair_print_command: *NOTICE*: GET FEATURES (0a) sqid:0 cid:86 nsid:0 cdw10:000000ca cdw11:00000000
nvme_qpair.c: 306:nvme_qpair_print_completion: *NOTICE*: INVALID FIELD (00/02) sqid:0 cid:86 cdw0:0 sqhd:000f p:1 m:0 dnr:1
get_feature(0xCA) failed
=====================================================
NVMe Controller at 0000:00:04.0 [1d1d:1f1f]
=====================================================
Controller Capabilities/Features
================================
Vendor ID: 1d1d
Subsystem Vendor ID: 1af4
Serial Number: deadbeef
Model Number: QEMU NVMe OCSSD Ctrl
Firmware Version: 2.0
...(truncated)...
Namespace OCSSD Geometry
=======================
OC version: maj:2 min:0
LBA format:
Group bits: 1
PU bits: 2
Chunk bits: 6
Logical block bits: 12
Media and Controller Capabilities:
Namespace supports Vector Chunk Copy: Not Supported
Namespace supports multiple resets a free chunk: Not Supported
Wear-level Index Delta Threshold: 0
Groups (channels): 2
PUs (LUNs) per group: 4
Chunks per LUN: 60
Logical blks per chunk: 4096
MIN write size: 4
OPT write size: 8
Cache min write size: 24
Max open chunks: 0
Max open chunks per PU: 0
...(truncated)
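As a sanity check, the raw size behind each two-PU ftl_bdev can be computed from this geometry. This assumes the emulator's default 4 KiB logical block size (an assumption; the sector size is not shown in the truncated output above):
### raw bytes per two-PU ftl_bdev = PUs * chunks/PU * blocks/chunk * block size (4 KiB assumed)
cbuser@qemu-nvme142:~$ echo $(( 2 * 60 * 4096 * 4096 ))
2013265920
That is roughly 1.9 GiB of raw flash per PU pair; the ~1.5G block devices reported by lsblk later are smaller, presumably because the FTL reserves part of the raw space for its own metadata and garbage collection.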
Now, let's construct ftl_bdev logical SPDK block devices. As SPDK has officially transitioned from file-based configurations to RPC-based dynamic configurations, we will use RPC configuration calls. First, we launch the nvmf_tgt app listening for RPC calls on the VM's IP address (10.12.90.142) and port 7777. Second, we send a set of RPC calls starting with start_subsystem_init. We build four ftl_bdevs, each backed by two PUs of the emulated OCSSD.
cbuser@qemu-nvme142:~/github/spdk$ sudo app/nvmf_tgt/nvmf_tgt \
--wait-for-rpc -r 10.12.90.142:7777
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 start_subsystem_init
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 construct_ftl_bdev -b ftl01 -l 0-1 -a 0000:00:04.0
{
"name": "ftl01",
"uuid": "12a2bbfe-11c3-4c00-9d74-455de444c2e4"
}
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 construct_ftl_bdev -b ftl23 -l 2-3 -a 0000:00:04.0
{
"name": "ftl23",
"uuid": "6141d031-0c8e-475b-a56f-58ef52d2f41b"
}
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 construct_ftl_bdev -b ftl45 -l 4-5 -a 0000:00:04.0
{
"name": "ftl45",
"uuid": "7420d063-409a-4945-8c2f-cee4afbb1cd4"
}
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 construct_ftl_bdev -b ftl67 -l 6-7 -a 0000:00:04.0
{
"name": "ftl67",
"uuid": "09fcceb3-0863-4820-8933-2d1700f44c28"
}
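At this point the four logical block devices can be verified over RPC. On the SPDK version used here the call is named get_bdevs (newer releases rename it to bdev_get_bdevs); it returns a JSON array describing ftl01, ftl23, ftl45 and ftl67.
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 get_bdevs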
NVMe-oF TCP
NVMe-oF TCP provides the benefits of NVMe-oF to virtually any Ethernet-based network deployment. The downside is additional latency compared to hardware-accelerated transports like RDMA, but NVMe-oF TCP still performs much better than more traditional approaches like iSCSI. Read the SPDK documentation for more information. We need to update our Ubuntu Linux kernel to version 5.0.5, which provides a dynamically loadable nvme-tcp module without having to compile the kernel ourselves.
cbuser@qemu-nvme142:/tmp$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.5/linux-headers-5.0.5-050005_5.0.5-050005.201903271212_all.deb
cbuser@qemu-nvme142:/tmp$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.5/linux-headers-5.0.5-050005-generic_5.0.5-050005.201903271212_amd64.deb
cbuser@qemu-nvme142:/tmp$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.5/linux-image-unsigned-5.0.5-050005-generic_5.0.5-050005.201903271212_amd64.deb
cbuser@qemu-nvme142:/tmp$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.5/linux-modules-5.0.5-050005-generic_5.0.5-050005.201903271212_amd64.deb
cbuser@qemu-nvme142:/tmp$ sudo dpkg -i *.deb
### Reboot the system ###
cbuser@qemu-nvme142:/tmp$ sudo modprobe -v nvme-tcp
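A quick check that the module actually loaded:
cbuser@qemu-nvme142:/tmp$ lsmod | grep nvme_tcp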
After repeating the RPC commands in the previous section to relaunch nvmf_tgt and set up the ftl_bdevs, send the following RPC commands to nvmf_tgt in order to expose our four ftl_bdevs over NVMe-oF TCP.
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode01 -a \
-s SPDK00000000000001
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode23 -a \
-s SPDK00000000000023
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode45 -a \
-s SPDK00000000000045
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode67 -a \
-s SPDK00000000000067
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode01 ftl01
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode23 ftl23
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode45 ftl45
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode67 ftl67
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode01 -t tcp \
-a 10.12.90.142 -s 4420
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode23 -t tcp \
-a 10.12.90.142 -s 4421
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode45 -t tcp \
-a 10.12.90.142 -s 4422
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode67 -t tcp \
-a 10.12.90.142 -s 4423
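These settings live only in the running nvmf_tgt process. If you would rather not retype them after a restart, the current configuration can be dumped to JSON with the save_config RPC and replayed later with load_config (a sketch; check scripts/rpc.py --help on your SPDK version for the exact call names):
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 save_config > ~/ftl_nvmf_tcp.json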
As a quick check, one can initiate local NVMe-oF TCP connections inside the VM. Install nvme-cli and use the nvme discover and connect commands to instantiate the kernel block devices /dev/nvme[0-3]n1 as the listing shows. Now you can do anything from building file systems to running fio benchmarks for your host FTL development and debugging.
cbuser@qemu-nvme142:~$ sudo apt install nvme-cli
cbuser@qemu-nvme142:~$ sudo nvme discover -t tcp -a 10.12.90.142 -s 4420
cbuser@qemu-nvme142:~$ sudo nvme connect -t tcp -n "nqn.2016-06.io.spdk:cnode01" -a 10.12.90.142 -s 4420
cbuser@qemu-nvme142:~$ sudo nvme connect -t tcp -n "nqn.2016-06.io.spdk:cnode23" -a 10.12.90.142 -s 4421
cbuser@qemu-nvme142:~$ sudo nvme connect -t tcp -n "nqn.2016-06.io.spdk:cnode45" -a 10.12.90.142 -s 4422
cbuser@qemu-nvme142:~$ sudo nvme connect -t tcp -n "nqn.2016-06.io.spdk:cnode67" -a 10.12.90.142 -s 4423
cbuser@qemu-nvme142:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
...(truncated)...
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
nvme0n1 259:1 0 1.5G 0 disk
nvme1n1 259:3 0 1.5G 0 disk
nvme2n1 259:5 0 1.5G 0 disk
nvme3n1 259:7 0 1.5G 0 disk
### run your tasks on top of /dev/nvme[0-3]n1 ###
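### Example: a short fio random-write run against one of the devices
### (a minimal sketch; adjust device name, queue depth and runtime to your needs)
cbuser@qemu-nvme142:~$ sudo fio --name=ftl-randwrite --filename=/dev/nvme0n1 \
--ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=16 \
--runtime=30 --time_based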
cbuser@qemu-nvme142:~$ sudo nvme disconnect -n "nqn.2016-06.io.spdk:cnode01"
cbuser@qemu-nvme142:~$ sudo nvme disconnect -n "nqn.2016-06.io.spdk:cnode23"
cbuser@qemu-nvme142:~$ sudo nvme disconnect -n "nqn.2016-06.io.spdk:cnode45"
cbuser@qemu-nvme142:~$ sudo nvme disconnect -n "nqn.2016-06.io.spdk:cnode67"
NVMe-oF RDMA
If you have an RDMA-capable NIC such as the Mellanox ConnectX series, you can expose SPDK ftl_bdevs over low-latency RDMA connections. As our emulated OCSSD lives only inside the VM, we need to expose the Mellanox ConnectX card as a PCIe device to the VM. This is done with PCIe passthrough, a well-known facility that is well explained in this page. Note that the RDMA network is a physically separate network; in our example a static IP address, 10.0.0.142, is assigned to the RDMA Ethernet port.
cbuser@pm111:~/work/qemu$ sudo vi /etc/default/grub
### add 'intel_iommu=on iommu=pt pci=nocrs' to GRUB_CMDLINE_LINUX_DEFAULT
### GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt pci=nocrs"
cbuser@pm111:~/work/qemu$ sudo update-grub2
### Reboot the physical machine ###
cbuser@pm111:~/work/qemu$ ./iommug.sh
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1918] (rev 07)
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU Group 1 01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
IOMMU Group 1 01:00.1 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
...(truncated)
### Make sure Mellanox cards are in the same IOMMU group ###
cbuser@pm111:~/work/qemu$ sudo ./vfio-bind.sh 0000:01:00.0 0000:01:00.1
### Launch qemu VM with a command line containing the following ###
### -device vfio-pci,host=01:00.0 -device vfio-pci,host=01:00.1 ###
cbuser@qemu-nvme142:~$ ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.12.90.142 netmask 255.255.255.0 broadcast 10.12.90.255
inet6 fe80::40f2:b7db:d970:97a3 prefixlen 64 scopeid 0x20<link>
ether de:ad:be:ef:01:42 txqueuelen 1000 (Ethernet)
RX packets 235 bytes 72815 (72.8 KB)
RX errors 39 dropped 6 overruns 0 frame 39
TX packets 124 bytes 31394 (31.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens5: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.0.0.142 netmask 255.0.0.0 broadcast 10.255.255.255
ether 24:8a:07:b4:33:aa txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
...(truncated)
Once the Mellanox ConnectX card becomes available to the qemu VM through PCIe passthrough, load the following kernel modules before launching the nvmf_tgt app and creating the ftl_bdevs.
cbuser@qemu-nvme142:~$ sudo modprobe -v nvme-rdma
cbuser@qemu-nvme142:~$ sudo modprobe -v rdma_ucm
cbuser@qemu-nvme142:~$ sudo modprobe -v ib_umad
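Before configuring the target, it is worth confirming that the passed-through ConnectX port is visible to the RDMA stack inside the VM. Assuming the ibverbs-utils (rdma-core) user-space package is installed:
cbuser@qemu-nvme142:~$ ibv_devices
cbuser@qemu-nvme142:~$ ls /sys/class/infiniband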
Now, send the following RDMA RPC commands to the nvmf_tgt app. They are almost identical to the NVMe-oF TCP RPC commands; only a few parameters change, namely the transport type (RDMA) and the listener address and ports on the RDMA network (10.0.0.142).
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_create_transport -t RDMA -u 8192 -p 4 -c 0
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode01 -a \
-s SPDK00000000000001
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode23 -a \
-s SPDK00000000000023
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode45 -a \
-s SPDK00000000000045
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_create nqn.2016-06.io.spdk:cnode67 -a \
-s SPDK00000000000067
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode01 ftl01
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode23 ftl23
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode45 ftl45
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode67 ftl67
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode01 -t RDMA \
-a 10.0.0.142 -s 4420
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode23 -t RDMA \
-a 10.0.0.142 -s 4421
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode45 -t RDMA \
-a 10.0.0.142 -s 4422
cbuser@qemu-nvme142:~/github/spdk$ sudo scripts/rpc.py -s 10.12.90.142 \
-p 7777 nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode67 -t RDMA \
-a 10.0.0.142 -s 4423
Just like in the NVMe-oF TCP example above, one can use local NVMe-oF RDMA connections for development. We list some of the commands below; only the transport type and address parameters change.
cbuser@qemu-nvme142:~$ sudo nvme discover -t rdma -a 10.0.0.142 -s 4420
cbuser@qemu-nvme142:~$ sudo nvme connect -t rdma \
-n "nqn.2016-06.io.spdk:cnode01" -a 10.0.0.142 -s 4420
cbuser@qemu-nvme142:~$ sudo nvme disconnect \
-n "nqn.2016-06.io.spdk:cnode01"
Software versions
Sticking to the following software versions will ensure all the commands in this article work in your environment:
Linux OS: Ubuntu 18.10 Desktop for both the physical machine and the qemu virtual machine
Linux kernel: version 5.0.5
SPDK: v19.01-642-g130a5f772 (with two cherry-picks)
Questions?
Contact info@circuitblvd.com for additional information.
Appendix: qemu code patch
If you experience "kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device" error when running qemu, apply the following code patch.
diff --git a/memory.c b/memory.c
index 61d66e4..e49369d 100644
--- a/memory.c
+++ b/memory.c
@@ -932,9 +932,7 @@ static void address_space_update_topology_pass(AddressSpace *as,
         } else if (frold && frnew && flatrange_equal(frold, frnew)) {
             /* In both and unchanged (except logging may have changed) */
 
-            if (!adding) {
-                flat_range_coalesced_io_del(frold, as);
-            } else {
+            if (adding) {
                 MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, region_nop);
                 if (frnew->dirty_log_mask & ~frold->dirty_log_mask) {
                     MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, log_start,
@@ -946,7 +944,6 @@ static void address_space_update_topology_pass(AddressSpace *as,
                                                   frold->dirty_log_mask,
                                                   frnew->dirty_log_mask);
                 }
-                flat_range_coalesced_io_add(frnew, as);
             }
 
             ++iold;
--
Appendix: qemu related shell scripts
The tap network and PCIe passthrough scripts are listed below. Some scripts were modified from the original versions described in the URLs referenced in this article.
(1) Tap network
We assume that br0 is set up as a bridge interface on the physical machine. Create q_br_up.sh and q_br_down.sh before you launch qemu with a tap network defined. You also need a DHCP server configured to hand out the qemu VM's IP address when it requests one with the MAC address above.
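If br0 does not exist yet, a minimal sketch of creating it on the physical machine is shown below (this assumes the physical NIC is named eno1 and the bridge obtains its address via DHCP; adapt the interface name and addressing to your network).
cbuser@pm111:~$ sudo brctl addbr br0
cbuser@pm111:~$ sudo brctl addif br0 eno1
cbuser@pm111:~$ sudo ifconfig eno1 0.0.0.0 up
cbuser@pm111:~$ sudo ifconfig br0 up
cbuser@pm111:~$ sudo dhclient br0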
### Include following parameters in qemu launch
### -net nic,macaddr=DE:AD:BE:EF:01:42
### -net tap,ifname=tap0,script=q_br_up.sh,downscript=q_br_down.sh
cbuser@pm111:~/work/qemu$ cat q_br_up.sh
#!/bin/sh
switch=br0
echo "$0: adding tap interface \"$1\" to bridge \"$switch\""
ifconfig $1 0.0.0.0 up
brctl addif ${switch} $1
exit 0
cbuser@pm111:~/work/qemu$ cat q_br_down.sh
#!/bin/sh
switch=br0
echo "$0: deleting tap interface \"$1\" from bridge \"$switch\""
brctl delif $switch $1
ifconfig $1 0.0.0.0 down
exit 0
(2) PCIe passthrough
Have the following shell scripts ready before launching qemu with the vfio-pci parameters.
cbuser@pm111:~/work/qemu$ cat iommug.sh
#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
n=${d#*/iommu_groups/*}; n=${n%%/*}
printf 'IOMMU Group %s ' "$n"
lspci -nns "${d##*/}"
done
cbuser@pm111:~/work/qemu$ cat vfio-bind.sh
#!/bin/bash
modprobe vfio-pci
for dev in "$@"; do
echo "dev=" $dev
vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
device=$(cat /sys/bus/pci/devices/$dev/device)
if [ -e /sys/bus/pci/devices/$dev/driver ]; then
echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
fi
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done