Direct memory access (DMA) is an important aspect of embedded development because it provides a way to access the embedded system's main memory (typically DDR) without tying up the CPU, leaving it free to perform other operations during the read/write cycle to memory. With DMA, the processor simply kicks off a transfer to read from or write to main system memory, and the DMA engine generates an interrupt when the transfer completes. This leaves the processor free to do other tasks until the interrupt calls its respective service routine.
While the concept of DMA is straightforward enough, getting the full picture of how to implement it on an FPGA with an embedded processor, such as the ARM core in the Xilinx Zynq SoC, is no small feat. Throwing in the extra layer of complexity of accessing it from the userspace of an operating system such as Linux makes it even harder.
So for this project, I'll start by going over the mechanics of setting up the hardware in Vivado, along with the base settings in the PetaLinux project for the kernel drivers and the Linux application in the root filesystem. Then I'll go over how the code in the Linux application actually controls the AXI DMA in the programmable logic of the FPGA to perform a transfer.
Add DMA to the Hardware Design in Vivado
While the Zynq processing system has native DMA channels built in, they are only for memory-to-memory transfers, not stream-to-memory or memory-to-stream transfers. Thus, for peripherals to be able to use the Zynq's native DMA channels, they must already have a DMA-capable interface for memory-to-memory transfers.
The distinction is important because a memory-to-memory transfer reads and writes data between different memory locations, whereas a stream-to-memory transfer takes a serialized representation of data and writes it to a designated memory location, and a memory-to-stream transfer reads data from a location in memory and then converts it to a serialized format.
The AXI DMA IP block allows for any peripheral equipped with an AXI stream interface to access the main memory of the system via stream-to-memory or memory-to-stream transfers. Given that I've come across the need to use DMA for stream-to-memory/memory-to-stream transfers more often as an FPGA developer, I'm going to go with the AXI DMA IP for this tutorial.
I'm using the Arty Z7 FPGA development board and I'm starting with the base hardware design for it that I created in Vivado 2020.2 as outlined in one of my previous posts. All of the modifications I will be making will be in the block design of that project.
In the Zynq Processing System IP block under PS-PL Configuration, enable one of the AXI high performance slave interfaces. This is the interface that the AXI DMA will use to access the DDR (main system memory) of the Arty Z7.
Next, add an instance of the AXI Direct Memory Access IP block to the Vivado block design. Double-click on it to open the configuration window and uncheck the box next to Enable Scatter Gather Engine. This will automatically uncheck the box next to Enable Control / Status Stream as well. This project will be using the AXI DMA in Direct Register mode, as Scatter Gather is a whole different beast.
The rest of the default options can be left for the AXI DMA IP. Click OK to close the configuration window.
Click the option to Run Connection Automation that appears in the green banner across the top of the block design window. Check the box for All Automation and click OK.
With the AXI DMA now added and connected to the Zynq Processing System, the AXI Stream ports M_AXIS_MM2S and S_AXIS_S2MM can either be connected directly to each other, or a peripheral equipped with the AXI Stream interface can be connected between them.
I chose to connect an AXI Stream Data FIFO IP block, connecting its S_AXIS port to M_AXIS_MM2S and its M_AXIS port to S_AXIS_S2MM.
As is probably obvious at this point, S_AXIS_S2MM is the port for the AXI stream data being written to memory and M_AXIS_MM2S is the port for the AXI stream data being read out of memory. Here, S2MM stands for stream-to-memory-map and MM2S stands for memory-map-to-stream.
Validate the block design and save it. Then generate or regenerate an HDL wrapper for it by right-clicking on the block design in the Sources window and selecting the option to Create HDL Wrapper...
In the resulting pop-up window, select the option Let Vivado manage wrapper and auto-update.
Once the Sources window shows that Vivado is finished updating the project file structure (the word Updating... disappears from the upper right corner of the window) run synthesis, implementation, and generate a new bitstream. After a new bitstream has been generated, export the hardware for use in PetaLinux by selecting File > Export > Export Hardware...
Select the option to include the bitstream in the exported hardware platform and specify the desired output file path. I personally like to use the main project folder for the respective Vivado project. The final screen in the pop-up window will summarize your selections for verification. Click Finish to export the hardware as the Xilinx file type, .XSA.
Import Hardware Design into PetaLinux Project
Whether you created a new PetaLinux project or are starting from an existing PetaLinux project targeted for the Arty Z7 board (I'm using the PetaLinux 2020.2 project I created for the Arty Z7 in my previous post here), import the new hardware into the project with the following command:
~$ petalinux-config --get-hw-description /<directory of .XSA file>/
As I've mentioned in the past, this command pulls the hardware design into the PetaLinux project and launches the System configuration GUI for you to specify hardware configurations such as the target UART port to be used, the root filesystem type and where it will be stored, the target ARM core processor within the Zynq chip, and so on.
Since I'm not making any other changes to the hardware in the PetaLinux project outside of importing the new hardware design that now has the AXI DMA in it, I simply exited the System configuration GUI and selected the option to save any changes.
Add DMA Debugging Kernel Drivers in PetaLinux (Optional)
By default, the Xilinx AXI DMA kernel drivers are enabled in PetaLinux projects, located under Device Drivers > DMA Engine support > Xilinx DMA Engines. The drivers for memory mapping I'll be using in the Linux application to view the memory space (/dev/mem from the userspace and CONFIG_DEVMEM in the kernel) are also enabled by default in PetaLinux projects. So you don't need to mess with configuring the kernel for this project, but if you are curious, you can look through the kernel DMA options and select the Help option for each to get an informative little description of each:
There is a test client for the DMA that can be selected to be built into the kernel as a module which allows for a kernel level test of the DMA engine using the modprobe command to load it manually when desired.
If the DMA test client is desired, launch the PetaLinux kernel configuration ASCII GUI:
~$ petalinux-config -c kernel
Navigate to Device Drivers > DMA Engine support where DMA Test client is towards the bottom of the list. Enable it by highlighting it using the arrow keys then pressing Y.
I won't cover how to run the DMA test client in this post, but there is a great how-to for it here.
Create Custom Application in PetaLinux
This project will be demonstrating how to use AXI DMA from the OS/application layer in Linux using memory mapping. Thus a custom application needs to be added to the PetaLinux project with the following command:
~$ petalinux-create -t apps --template c --name dmatest --enable
The --enable flag at the end simply saves you the step of having to manually add the application to be included in the next PetaLinux build using the root filesystem configuration ASCII GUI.
Open dmatest.c from /<PetaLinux project directory>/project-spec/meta-user/recipes-apps/dmatest/files and add the code to it from the dmatest.c file attached below.
How to Use DMA from Linux Userspace
Now that I've gone through the hardware setup and touched on the Linux drivers that I'm using to control the DMA, let's take a look at the overall picture of what's happening here.
The key thing to remember about using the DMA engine is that everything happens by setting the relevant bits in the control registers for the MM2S and S2MM channels and monitoring their current states by reading the bits in their respective status registers (each channel has its own independent control and status registers). The control and status registers in the AXI DMA are accessed via its AXI Lite interface.
The AXI Lite interface exposes the control and status registers at the physical address assigned to the AXI DMA in the Vivado Address Editor. Looking at the screenshot of the Address Editor for this project below, we can see that the control and status registers start at physical address 0x40400000. Each register has its own dedicated offset from that base address. For example, the register holding the address of the source data that the MM2S channel pulls from is located at offset 0x18 (further explained below), meaning the Linux application needs to write the source address to physical address 0x40400018. Note that these registers live inside the AXI DMA IP itself and are memory-mapped into the processor's address space; they are not locations in the DDR.
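To put numbers on the base-plus-offset arithmetic, here's a rough C sketch (it is not the attached dmatest.c). The base address 0x40400000 and the 0x00, 0x18, and 0x30 offsets come from this post; the remaining offsets are the Direct Register mode values as I understand them from the product guide, so double-check them against the register table for your IP version. The macro and function names are my own, purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Base address assigned to the AXI DMA in the Vivado Address Editor. */
#define AXI_DMA_BASE  0x40400000u

/* Register offsets in Direct Register mode (verify against the product guide). */
#define MM2S_DMACR    0x00u  /* MM2S control register            */
#define MM2S_DMASR    0x04u  /* MM2S status register             */
#define MM2S_SA       0x18u  /* MM2S source address              */
#define MM2S_LENGTH   0x28u  /* MM2S transfer length in bytes    */
#define S2MM_DMACR    0x30u  /* S2MM control register            */
#define S2MM_DMASR    0x34u  /* S2MM status register             */
#define S2MM_DA       0x48u  /* S2MM destination address         */
#define S2MM_LENGTH   0x58u  /* S2MM buffer length in bytes      */

/* Physical address of a register: base address plus register offset. */
static inline uint32_t dma_reg_phys(uint32_t base, uint32_t offset)
{
    assert((offset & 0x3u) == 0);  /* registers are 32 bits wide and word aligned */
    return base + offset;
}

/* Index of a register when the register block is viewed as a uint32_t array. */
static inline uint32_t dma_reg_index(uint32_t offset)
{
    assert((offset & 0x3u) == 0);
    return offset >> 2;
}
```

With these definitions, `dma_reg_phys(AXI_DMA_BASE, MM2S_SA)` evaluates to 0x40400018, the address from the example above.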
As I previously mentioned in the hardware setup of the AXI DMA in Vivado, it is configured for Direct Register mode, meaning that each simple transfer moves a given number of bytes from a source address to a destination address in main system memory via the MM2S and S2MM DMA channels.
Table 2-6 in Xilinx's PG021 for the AXI DMA IP lists the relevant control and status registers to read from and write to in Direct Register mode:
The source address registers, destination address registers, and transfer length registers for the MM2S and S2MM channels are straightforward as they simply have a given value written to or read from them. The specific control and status registers however have bits dedicated to certain actions/stats requiring a bit more explanation:
For the control register of the DMA MM2S channel, the main bits to look at for basic function of the AXI DMA in Direct Register mode are the following:
- Bit 0 = Run (set to 1) or Stop (set to 0) the DMA MM2S channel.
- Bit 2 = Set to 1 to soft reset the DMA MM2S channel.
- Bit 12 = Set to 1 to enable the Interrupt on Complete (IOC) flag for the DMA MM2S channel.
Then the following bits are monitored in the status register of the DMA MM2S channel:
- Bit 0 = Is set to 0 when the DMA MM2S channel is running, and is set to 1 when the channel is halted.
- Bit 1 = Is set to 0 when the DMA MM2S channel is not idle, meaning that the transfer has not been completed in direct register mode. It is set to 1 when the channel is idle, meaning the transfer has completed and DMA controller is paused.
- Bit 12 = If enabled (bit 12 is set to 1 in the control register), it will read out as 1 when the transfer has completed and the MM2S interrupt output (mm2s_introut) of the AXI DMA will also go high.
For the control register of the DMA S2MM channel, the main bits to look at for basic function of the AXI DMA in Direct Register mode are the following:
- Bit 0 = Run (set to 1) or Stop (set to 0) the DMA S2MM channel.
- Bit 2 = Set to 1 to soft reset the DMA S2MM channel.
- Bit 12 = Set to 1 to enable the Interrupt on Complete (IOC) flag for the DMA S2MM channel.
Then the following bits are monitored in the status register of the DMA S2MM channel:
- Bit 0 = Is set to 0 when the DMA S2MM channel is running, and is set to 1 when the channel is halted.
- Bit 1 = Is set to 0 when the DMA S2MM channel is not idle, meaning that the transfer has not been completed in direct register mode. It is set to 1 when the channel is idle, meaning the transfer has completed and DMA controller is paused.
- Bit 12 = If enabled (bit 12 is set to 1 in the control register), it will read out as 1 when the transfer has completed and the S2MM interrupt output (s2mm_introut) of the AXI DMA will also go high.
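The bit positions listed above translate naturally into masks. Here's a small C sketch of that, with names that are my own invention rather than anything from Xilinx headers or the attached dmatest.c. Since the MM2S and S2MM registers share the same layout for these bits, one set of masks covers both channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control register bits (same positions for the MM2S and S2MM channels). */
#define DMACR_RUN_STOP    (1u << 0)   /* 1 = run, 0 = stop               */
#define DMACR_RESET       (1u << 2)   /* 1 = soft reset                  */
#define DMACR_IOC_IRQ_EN  (1u << 12)  /* enable Interrupt on Complete    */

/* Status register bits (same positions for the MM2S and S2MM channels). */
#define DMASR_HALTED      (1u << 0)   /* 1 = channel halted              */
#define DMASR_IDLE        (1u << 1)   /* 1 = channel idle                */
#define DMASR_IOC_IRQ     (1u << 12)  /* 1 = Interrupt on Complete fired */

/* Compile-time sanity check that bit 12 really is 0x1000. */
static_assert(DMACR_IOC_IRQ_EN == 0x1000u, "IOC enable must be bit 12");

/* True when the status value reports a halted channel. */
static inline bool dma_halted(uint32_t status)
{
    return (status & DMASR_HALTED) != 0;
}

/* True when the status value reports an idle channel. */
static inline bool dma_idle(uint32_t status)
{
    return (status & DMASR_IDLE) != 0;
}

/* True when both the IOC flag and the idle flag are set. */
static inline bool dma_transfer_done(uint32_t status)
{
    const uint32_t done = DMASR_IOC_IRQ | DMASR_IDLE;
    return (status & done) == done;
}
```

For example, a status value of 0x1002 (bits 12 and 1 set) counts as done, while 0x0002 (idle, but no IOC flag) does not.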
Now that we've seen which registers are relevant in the DMA controller interface, here's the sequence to accomplish a transfer using the AXI DMA (I have also commented the code in the dmatest.c file attached below so that it correlates with these steps):
1. Reset the DMA by writing a 1 to bit 2 of the MM2S (offset 0x00) and S2MM (offset 0x30) control registers.
2. Make sure the DMA is stopped by writing a 0 to bit 0 of the MM2S (offset 0x00) and S2MM (offset 0x30) control registers.
3. Enable the Interrupt on Complete (IOC) flag by writing a 1 to bit 12 of the MM2S (offset 0x00) and S2MM (offset 0x30) control registers.
4. Write the address in the Arty's DDR that the MM2S channel is to read the source data from to the MM2S DMA source address register (offset 0x18).
5. Write the address in the Arty's DDR that the S2MM channel is to write the received data to to the S2MM DMA destination address register.
6. Run the DMA MM2S channel by writing a 1 to bit 0 of the MM2S control register (offset 0x00).
7. Run the DMA S2MM channel by writing a 1 to bit 0 of the S2MM control register (offset 0x30).
8. Write the length of the transfer from the MM2S channel by writing the value for the total number of bytes to send out to the MM2S transfer length register.
9. Write the length of the buffer for the S2MM channel by writing the value for the total number of bytes to read into memory on the S2MM channel to the S2MM buffer length register.
10. Monitor the IOC flag (bit 12) and idle flag (bit 1) in the status registers for the MM2S and S2MM channels. Once both of these bits read out as being set to 1, the transfer is complete.
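The ten steps above can be sketched in C roughly as follows. This is my own illustrative version, not the attached dmatest.c: the function names are made up, the offsets other than 0x00, 0x18, and 0x30 are taken from my reading of the product guide, and the sketch assumes `regs` points at the start of the AXI DMA register block (on hardware, a pointer obtained by memory mapping the DMA base address). It also writes whole register values rather than doing read-modify-write, and it does not wait for the soft reset bit to self-clear, which real code may want to do.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets (Direct Register mode); verify against the product guide. */
#define MM2S_DMACR   0x00u
#define MM2S_DMASR   0x04u
#define MM2S_SA      0x18u
#define MM2S_LENGTH  0x28u
#define S2MM_DMACR   0x30u
#define S2MM_DMASR   0x34u
#define S2MM_DA      0x48u
#define S2MM_LENGTH  0x58u

#define DMACR_RUN         (1u << 0)
#define DMACR_RESET       (1u << 2)
#define DMACR_IOC_IRQ_EN  (1u << 12)
#define DMASR_IDLE        (1u << 1)
#define DMASR_IOC_IRQ     (1u << 12)

static void reg_write(volatile uint32_t *regs, uint32_t off, uint32_t val)
{
    assert((off & 0x3u) == 0);  /* registers are word aligned */
    regs[off >> 2] = val;
}

static uint32_t reg_read(volatile uint32_t *regs, uint32_t off)
{
    assert((off & 0x3u) == 0);
    return regs[off >> 2];
}

/* Steps 1-9: configure both channels and kick off the transfer. */
static void dma_start_transfer(volatile uint32_t *regs, uint32_t src_phys,
                               uint32_t dst_phys, uint32_t len_bytes)
{
    reg_write(regs, MM2S_DMACR, DMACR_RESET);       /* 1. soft reset   */
    reg_write(regs, S2MM_DMACR, DMACR_RESET);
    reg_write(regs, MM2S_DMACR, 0);                 /* 2. stop         */
    reg_write(regs, S2MM_DMACR, 0);
    reg_write(regs, MM2S_DMACR, DMACR_IOC_IRQ_EN);  /* 3. enable IOC   */
    reg_write(regs, S2MM_DMACR, DMACR_IOC_IRQ_EN);
    reg_write(regs, MM2S_SA, src_phys);             /* 4. source       */
    reg_write(regs, S2MM_DA, dst_phys);             /* 5. destination  */
    /* 6-7. run both channels, keeping the IOC enable bit set */
    reg_write(regs, MM2S_DMACR, DMACR_RUN | DMACR_IOC_IRQ_EN);
    reg_write(regs, S2MM_DMACR, DMACR_RUN | DMACR_IOC_IRQ_EN);
    reg_write(regs, MM2S_LENGTH, len_bytes);        /* 8. MM2S length  */
    reg_write(regs, S2MM_LENGTH, len_bytes);        /* 9. S2MM length  */
}

/* Step 10: poll one status register until IOC and idle are both set.
 * Returns 0 when done, -1 if max_polls reads pass without completion. */
static int dma_wait_done(volatile uint32_t *regs, uint32_t status_off,
                         unsigned long max_polls)
{
    const uint32_t done = DMASR_IOC_IRQ | DMASR_IDLE;
    for (unsigned long i = 0; i < max_polls; i++) {
        if ((reg_read(regs, status_off) & done) == done)
            return 0;
    }
    return -1;
}
```

A caller would run `dma_start_transfer()` once, then call `dma_wait_done()` first with `MM2S_DMASR` and then with `S2MM_DMASR`. The poll limit is there so a stuck transfer shows up as an error instead of hanging the application.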
I made this high level block diagram in draw.io of what is happening.
In the Linux userspace, the physical addresses of the source data in the Arty's DDR, the destination for the data in the Arty's DDR, and the AXI DMA's control and status registers are all mapped to virtual addresses via the memory map driver. It is through the character device file /dev/mem that the application, by way of the kernel, is able to reach those physical memory locations.
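As a hedged illustration of that mapping step, here's a minimal C helper built on the standard `open()` and `mmap()` calls. The helper names `map_phys` and `unmap_phys` are hypothetical, and the sketch assumes the requested physical address is page aligned and that the program has permission to open the device file (typically root for /dev/mem).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes starting at phys_addr through the given device file
 * (normally "/dev/mem"). Returns NULL if the open or the mapping fails. */
static volatile uint32_t *map_phys(const char *dev_path, off_t phys_addr,
                                   size_t len)
{
    assert(len > 0);

    int fd = open(dev_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, phys_addr);
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Release a mapping created by map_phys(). */
static void unmap_phys(volatile uint32_t *p, size_t len)
{
    munmap((void *)p, len);
}
```

For this project, the DMA registers could be reached with something like `map_phys("/dev/mem", 0x40400000, 4096)`, and the source and destination buffers in DDR would each get their own mapping at whatever physical addresses you choose for them.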
The AXI DMA engine then accesses the DDR directly with its controller interface, MM2S DMA channel interface, and S2MM DMA channel interface. The AXI DMA engine pulls data out of the DDR from the specified source address and converts the memory-mapped data into a serialized format before outputting it on the AXI Stream MM2S port to the input of the AXI Stream Data FIFO. The AXI DMA then receives the data back from the output of the AXI Stream Data FIFO on its AXI Stream S2MM port, where it converts the serialized data back to memory-mapped format and writes it to the specified destination address in the Arty's DDR.
Build PetaLinux Project
After adding the code to run the AXI DMA to the custom application, first build or rebuild the root filesystem.
~$ petalinux-build -c rootfs
Then build or rebuild the whole PetaLinux project.
~$ petalinux-build
After the PetaLinux project has built, generate the boot binary for the resultant embedded Linux image with the following command. Note that the force flag is needed if you have previously generated a boot binary for the PetaLinux project.
~$ petalinux-package --boot --fsbl ./images/linux/zynq_fsbl.elf --fpga ./images/linux/system.bit --u-boot --force
Prep SD Card
If you haven't done so already, prepare an SD card for loading Linux onto it by partitioning it into two main partitions. The first is a FAT32 partition of at least 500MB, with 4MB of free space preceding it, where all of the files related to the boot process (such as the boot binary image and the kernel) will live. The second is an EXT4 partition of at least 4GB where the root filesystem will live.
As I've outlined in the past, create a directory titled BOOT and a directory titled rootfs on the host PC to mount the FAT32 and EXT4 partitions to, respectively. Copy the kernel and boot binaries to the FAT32 partition and extract the root filesystem package onto the EXT4 partition. Unmount each partition after running the sync command.
~$ sudo mkdir -p /media/BOOT
~$ sudo mkdir -p /media/rootfs
~$ sudo mount /dev/sdc1 /media/BOOT
~$ sudo mount /dev/sdc2 /media/rootfs
~$ sudo cp /<petalinux project dir>/images/linux/BOOT.BIN /media/BOOT/
~$ sudo cp /<petalinux project dir>/images/linux/image.ub /media/BOOT/
~$ sudo cp /<petalinux project dir>/images/linux/boot.scr /media/BOOT/
~$ sudo tar xvf /<petalinux project dir>/images/linux/rootfs.tar.gz -C /media/rootfs/
~$ sync
~$ sudo umount /media/BOOT/
~$ sudo umount /media/rootfs/
Install the SD card in the proper slot on the Arty Z7 and boot into the Linux image.
Test DMA Application
To run the DMA test application:
Looking at the code in dmatest.c, you'll notice that the DMA is tested by writing a recognizable pattern of data to the source address in DDR and then checking whether that same pattern ends up at the destination address. Each step of the DMA transfer process specified above is also outlined with print statements.
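For a feel of what that kind of check can look like, here's a small C sketch. The pattern value and the function names are hypothetical and are not taken from dmatest.c; the only idea borrowed from the post is "write something recognizable to the source, then compare the destination against it." An index-dependent pattern is used so that dropped or reordered words are caught as well as corrupted ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer with a recognizable, index-dependent pattern
 * (hypothetical: 0xA5A5 in the upper half, word index in the lower half). */
static void fill_pattern(volatile uint32_t *buf, size_t words)
{
    assert(buf != NULL);
    for (size_t i = 0; i < words; i++)
        buf[i] = 0xA5A50000u | (uint32_t)(i & 0xFFFFu);
}

/* Compare destination against source word by word.
 * Returns the index of the first mismatch, or -1 if the buffers match. */
static long first_mismatch(const volatile uint32_t *src,
                           const volatile uint32_t *dst, size_t words)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < words; i++) {
        if (src[i] != dst[i])
            return (long)i;
    }
    return -1;
}
```

On the board, `src` and `dst` would be the memory-mapped source and destination buffers, with `fill_pattern()` called before starting the DMA and `first_mismatch()` called once both channels report completion.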
As we can see, the hex pattern of test data is successfully read out from the destination address in the Arty's DDR.
While this post is quite long-winded, I hope the thoroughness helps with the mountain it can be to first figure out how to implement DMA in FPGAs. I know I was pretty stuck when I first started out.