Network Drivers Lab
driver development
Training lab book
Warning
In this lab, we are going to re-implement a driver that already
exists in the Linux kernel tree. Since the driver already exists, you
could just copy the code, compile it, and get it to work in a few
minutes. However, the purpose of this lab is to re-create this driver
from scratch, taking the time to understand all the code and all the
steps. So please play the game, and follow our adventure of
creating a network driver from scratch!
Setup
Go to the /home/<user>/felabs/linux/networking directory. It
contains:
• rootfs.jffs2, the JFFS2 image of a root filesystem,
containing the necessary tools to load and test the network
device driver. Obviously, since we are going to re-develop the
network driver, we cannot use NFS to mount our root
filesystem during development!
• module/, containing a skeleton of a kernel module
The datasheet of the device is available at http://www.free-electrons.com/labs/at91sam9263-manual.pdf.
We'll need a special kernel for this lab because we need to remove
the official network driver of the Calao board. Follow these steps to
configure and compile the kernel:
• Grab the tarball of a recent kernel
• Modify the Makefile to set ARCH=arm and adjust
CROSS_COMPILE to the prefix of your cross-compiler
• Run make usb-a9263_defconfig to load the default
configuration for the Calao board
• Run make xconfig or make menuconfig, and in the
configuration utility, go to «Device Drivers» → «Network
device support» → «10/100 Mbit/s devices» and disable the
«Atmel MACB support».
Now, boot the board in U-Boot, transfer and flash the kernel and
root filesystem to the board, and adjust the bootargs parameter to
mount the root filesystem from flash. In U-Boot:
• nand erase 0 200000
• tftp 21000000 uImage
• nand write 21000000 0 200000
• nand erase 200000 400000
• tftp 21000000 rootfs.jffs2
• nand write 21000000 200000 ${filesize}
• setenv bootcmd nboot 21000000
• setenv autostart yes
• setenv bootargs mtdparts=atmel_nand:2m(kernel)ro,3m(rootfs)rw root=/dev/mtdblock1 rootfstype=jffs2
• saveenv
Reboot your board, and see your kernel booting, mounting your
root filesystem and starting the userspace applications.
© 2009 Free Electrons, http://free-electrons.com Creative Commons License
Four values are possible for the clock divider, so let's add defines
for them:
#define EMAC_CLK_DIV8 0
#define EMAC_CLK_DIV16 1
#define EMAC_CLK_DIV32 2
#define EMAC_CLK_DIV64 3
Now, in the probe() function, we'll use the clock API of the kernel.
Remember, the clock API is just clk_get()/clk_put(),
clk_enable()/clk_disable() and clk_get_rate(). So, in the
probe() function, we'll get and enable the “macb_clk” clock. The
clocks are defined statically in arch/arm/mach-at91/at91sam9263.c.
The struct clk pointer returned by clk_get() will be stored in the
private structure struct netdrv_device.
Once the clock has been obtained and enabled, we need to adjust the
divider of the Ethernet controller, according to the datasheet of the
CPU:
/* MDC = clk_hz / divider, and MDC must stay at or below 2.5 MHz */
clk_hz = clk_get_rate(priv->clk);
if (clk_hz <= 20000000)
	config = (EMAC_CLK_DIV8 << EMAC_NCFG_CLK_DIV_SHIFT);
else if (clk_hz <= 40000000)
	config = (EMAC_CLK_DIV16 << EMAC_NCFG_CLK_DIV_SHIFT);
else if (clk_hz <= 80000000)
	config = (EMAC_CLK_DIV32 << EMAC_NCFG_CLK_DIV_SHIFT);
else
	config = (EMAC_CLK_DIV64 << EMAC_NCFG_CLK_DIV_SHIFT);
__raw_writel(config, priv->regs + EMAC_NCFG);
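Since kernel code cannot be run outside the board, the divider choice can be checked with a small user-space C sketch. It mirrors the thresholds above and also computes the resulting MDC frequency. The helper names emac_clk_div() and emac_mdc_hz() are hypothetical, and the value EMAC_NCFG_CLK_DIV_SHIFT = 10 is an assumption to verify against the NCFGR description in the datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define EMAC_CLK_DIV8  0
#define EMAC_CLK_DIV16 1
#define EMAC_CLK_DIV32 2
#define EMAC_CLK_DIV64 3
/* Assumption: the CLK field of NCFGR sits in bits 11:10 */
#define EMAC_NCFG_CLK_DIV_SHIFT 10

/* Hypothetical helper: smallest divider keeping MDC <= 2.5 MHz */
static uint32_t emac_clk_div(unsigned long clk_hz)
{
	if (clk_hz <= 20000000UL)
		return EMAC_CLK_DIV8;
	else if (clk_hz <= 40000000UL)
		return EMAC_CLK_DIV16;
	else if (clk_hz <= 80000000UL)
		return EMAC_CLK_DIV32;
	return EMAC_CLK_DIV64;
}

/* Resulting MDC frequency: field values 0..3 encode /8, /16, /32, /64 */
static unsigned long emac_mdc_hz(unsigned long clk_hz)
{
	return clk_hz / (8UL << emac_clk_div(clk_hz));
}
```

For example, with a 100 MHz peripheral clock the divider is 64 and MDC runs at 1.5625 MHz. Clocks above 160 MHz would push MDC past 2.5 MHz even with /64, which this four-value field cannot handle.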
The first one contains the lower 4 bytes and the second one contains
the upper 2 bytes, together forming the 6-byte MAC address.
Write a function that:
• reads the MAC address (using __raw_readl())
• initializes a 6-byte array with the MAC address
• tests whether this MAC address is valid using the
is_valid_ether_addr() function provided by the kernel. If
the address is valid, copy it to the dev_addr field of the
net_device structure. If the address is not valid, generate a
random MAC address into the same dev_addr field using
the random_ether_addr() function, also provided by the
kernel.
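The byte unpacking and the validity test can be sketched in user-space C. The register layout assumed here (low register holding bytes 0 to 3 with byte 0 in bits 7:0, high register holding bytes 4 and 5 in bits 15:0) is the one used by the mainline macb driver. mac_from_regs() and mac_is_valid() are hypothetical names, and mac_is_valid() only reproduces the rule applied by is_valid_ether_addr(): not multicast, and not all zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: split the two register values into 6 bytes */
static void mac_from_regs(uint32_t bottom, uint32_t top, uint8_t addr[6])
{
	addr[0] = bottom & 0xff;
	addr[1] = (bottom >> 8) & 0xff;
	addr[2] = (bottom >> 16) & 0xff;
	addr[3] = (bottom >> 24) & 0xff;
	addr[4] = top & 0xff;
	addr[5] = (top >> 8) & 0xff;
}

/* Same rule as is_valid_ether_addr(): the multicast bit (LSB of the
 * first byte) must be clear and the address must not be all zeros. */
static bool mac_is_valid(const uint8_t addr[6])
{
	bool all_zero = true;

	for (int i = 0; i < 6; i++)
		if (addr[i])
			all_zero = false;
	return !(addr[0] & 0x01) && !all_zero;
}
```

The broadcast address ff:ff:ff:ff:ff:ff is rejected too, since it has the multicast bit set.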
Now, in the probe() function, call your MAC address reading
function. After returning from the function, you can add a printk()
message to print the MAC address from the dev_addr field of the
net_device structure. Compile and test your module to see if it
works.
	out = (EMAC_MAN_SOF_VALUE << EMAC_MAN_SOF_SHIFT) |
	      (EMAC_MAN_RW_READ << EMAC_MAN_RW_SHIFT) |
	      (mii_id << EMAC_MAN_PHYA_SHIFT) |
	      (regnum << EMAC_MAN_REGA_SHIFT) |
	      (EMAC_MAN_CODE_VALUE << EMAC_MAN_CODE_SHIFT);
	__raw_writel(out, netdrvdev->regs + EMAC_MAN);
	/* Busy-wait until the management frame has been shifted out */
	while (!(__raw_readl(netdrvdev->regs + EMAC_NSR) &
		 (1 << EMAC_NSR_IDLE_SHIFT)))
		cpu_relax();
	/* The PHY register value is returned in the low 16 bits */
	return __raw_readl(netdrvdev->regs + EMAC_MAN) & 0xFFFF;
}
static int netdrv_mdio_write(struct mii_bus *bus, int mii_id,
			     int regnum, u16 value)
{
	struct netdrv_device *netdrvdev = bus->priv;
	u32 out;

	out = (EMAC_MAN_SOF_VALUE << EMAC_MAN_SOF_SHIFT) |
	      (EMAC_MAN_RW_WRITE << EMAC_MAN_RW_SHIFT) |
	      (mii_id << EMAC_MAN_PHYA_SHIFT) |
	      (regnum << EMAC_MAN_REGA_SHIFT) |
	      (EMAC_MAN_CODE_VALUE << EMAC_MAN_CODE_SHIFT) |
	      (value & 0xFFFF);
	__raw_writel(out, netdrvdev->regs + EMAC_MAN);
	/* Busy-wait until the write frame has been sent to the PHY */
	while (!(__raw_readl(netdrvdev->regs + EMAC_NSR) &
		 (1 << EMAC_NSR_IDLE_SHIFT)))
		cpu_relax();
	return 0;
}
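The MAN word built by both functions can be checked in user space. The field offsets below are the ones used by the mainline macb driver for this controller and should be treated as assumptions to verify against the AT91SAM9263 datasheet; emac_man_frame() is a hypothetical helper. Unlike the lab code, it masks mii_id and regnum to 5 bits so that an out-of-range value cannot corrupt the neighboring fields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed EMAC_MAN layout: DATA 15:0, CODE 17:16, REGA 22:18,
 * PHYA 27:23, RW 29:28, SOF 31:30 */
#define EMAC_MAN_DATA_MASK   0xffffu
#define EMAC_MAN_CODE_SHIFT  16
#define EMAC_MAN_REGA_SHIFT  18
#define EMAC_MAN_PHYA_SHIFT  23
#define EMAC_MAN_RW_SHIFT    28
#define EMAC_MAN_SOF_SHIFT   30
#define EMAC_MAN_SOF_VALUE   1
#define EMAC_MAN_CODE_VALUE  2
#define EMAC_MAN_RW_WRITE    1
#define EMAC_MAN_RW_READ     2

/* Hypothetical helper: the 32-bit word written to EMAC_MAN */
static uint32_t emac_man_frame(uint32_t rw, int mii_id, int regnum,
			       uint16_t value)
{
	return (EMAC_MAN_SOF_VALUE << EMAC_MAN_SOF_SHIFT) |
	       (rw << EMAC_MAN_RW_SHIFT) |
	       (((uint32_t)mii_id & 0x1f) << EMAC_MAN_PHYA_SHIFT) |
	       (((uint32_t)regnum & 0x1f) << EMAC_MAN_REGA_SHIFT) |
	       (EMAC_MAN_CODE_VALUE << EMAC_MAN_CODE_SHIFT) |
	       (value & EMAC_MAN_DATA_MASK);
}
```

For instance, a read of register 2 on the PHY at address 1 gives 0x608A0000.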
Of course, you'll have to create all the definitions for the different
registers, according to the AT91SAM9263 specifications.
Main initialization
Finally, we have to use this new mechanism from the probe()
function of our driver. We'll first enable the clock and configure
whether we're using an RMII or an MII connection with the PHY
(through the EMAC_USRIO register), and then call our
netdrv_mii_init() function.
Now, we'll set the CLKEN bit of the EMAC_USRIO register, and optionally set
the RMII bit if the is_rmii field of the platform data is true. Refer
to the AT91SAM9263 datasheet for the register and bit values,
and use __raw_writel() to write to the EMAC_USRIO register.
Finally, call the netdrv_mii_init() function.
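The value written to EMAC_USRIO can be computed by a tiny helper, shown here as a user-space sketch. The bit positions (RMII = bit 0, CLKEN = bit 1) are assumed from the AT91 definitions of the mainline macb driver, and emac_usrio_value() is a hypothetical name; its result is what would be passed to __raw_writel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed AT91 EMAC_USRIO bits; check them in the datasheet */
#define EMAC_USRIO_RMII_SHIFT  0
#define EMAC_USRIO_CLKEN_SHIFT 1

/* Hypothetical helper: value to write to EMAC_USRIO */
static uint32_t emac_usrio_value(bool is_rmii)
{
	uint32_t usrio = 1u << EMAC_USRIO_CLKEN_SHIFT; /* always enable the clock */

	if (is_rmii)
		usrio |= 1u << EMAC_USRIO_RMII_SHIFT;  /* RMII instead of MII */
	return usrio;
}
```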
This will point to the PHY we're using. Then, we'll implement a
netdrv_mii_probe() function. The first step is to scan the detected
PHYs to get the phy_device of our PHY:
/* phydev must be initialized to NULL before this loop */
for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
	if (netdrvdev->mii_bus->phy_map[phy_addr]) {
		phydev = netdrvdev->mii_bus->phy_map[phy_addr];
		break;
	}
}
if (!phydev)
	return 1;
} else {
	phydev = phy_connect(netdrvdev->dev,
			     dev_name(&phydev->dev),
			     &netdrv_handle_link_change,
			     0, PHY_INTERFACE_MODE_MII);
}
if (!phydev)
	return 1;
		priv->speed = phydev->speed;
		priv->duplex = phydev->duplex;
	}
}
The next case to handle is when the link goes down. Here we
simply reset the speed and duplex fields of our private data
structure, so that the next time the link goes up, they have sane
default values:
else {
	priv->speed = 0;
	priv->duplex = -1;
}
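The speed/duplex bookkeeping of the link change handler can be modeled in user space. struct link_state and link_update() are hypothetical stand-ins for the fields of struct netdrv_device and for the body of netdrv_handle_link_change(); the structure follows the one used by the mainline macb driver, but the lab's version may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the link fields of struct netdrv_device */
struct link_state {
	int link;
	int speed;
	int duplex;
};

/* Returns true when the link status changed and should be reported */
static bool link_update(struct link_state *st, int link, int speed,
			int duplex)
{
	bool changed = false;

	if (link && (st->speed != speed || st->duplex != duplex)) {
		/* link up with new parameters: the driver would also
		 * reprogram the EMAC speed/duplex bits here */
		st->speed = speed;
		st->duplex = duplex;
		changed = true;
	}

	if (link != st->link) {
		if (!link) {
			/* link went down: reset to neutral defaults */
			st->speed = 0;
			st->duplex = -1;
		}
		st->link = link;
		changed = true;
	}
	return changed;
}
```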
We will arbitrarily decide that our reception ring contains 512 DMA
buffers (and descriptors!), so let's define:
#define RX_RING_SIZE 512
	priv->rx_ring[i].ctrl = 0;
	addr += RX_BUFFER_SIZE;
}
priv->rx_ring[RX_RING_SIZE - 1].addr |=
	(1 << EMAC_DMA_RX_WRAP_SHIFT);
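A user-space model of this ring setup makes the sizes concrete: 512 descriptors of 8 bytes take 4096 bytes, and 512 buffers of RX_BUFFER_SIZE (128) bytes take 64 KiB. The two-word descriptor layout and EMAC_DMA_RX_WRAP_SHIFT = 1 are assumptions taken from the mainline macb driver; rx_ring_init() and the base address used in the test are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RX_RING_SIZE   512
#define RX_BUFFER_SIZE 128
/* Assumption: bit 0 of the address word is USED, bit 1 is WRAP, so
 * buffer addresses must be at least 4-byte aligned */
#define EMAC_DMA_RX_WRAP_SHIFT 1

struct dma_desc {
	uint32_t addr;
	uint32_t ctrl;
};

/* Hypothetical model: rx_buffers_dma stands for the bus address of the
 * RX_RING_SIZE * RX_BUFFER_SIZE byte buffer area */
static void rx_ring_init(struct dma_desc *ring, uint32_t rx_buffers_dma)
{
	uint32_t addr = rx_buffers_dma;

	assert((rx_buffers_dma & 0x3) == 0);    /* low bits are flags */
	for (int i = 0; i < RX_RING_SIZE; i++) {
		ring[i].addr = addr;    /* USED clear: owned by the EMAC */
		ring[i].ctrl = 0;
		addr += RX_BUFFER_SIZE;
	}
	/* after the last descriptor, the EMAC goes back to the first */
	ring[RX_RING_SIZE - 1].addr |= (1u << EMAC_DMA_RX_WRAP_SHIFT);
}
```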
Introduce locking
Until now, our driver does not implement proper locking, which
might lead to incorrect concurrent access to shared resources.
Therefore, we must implement locking. In this driver, a single
spinlock will be used, since the concurrent accesses that must be
prevented occur between the interrupt handler and the process
context code.
Therefore, add a spinlock_t field to our private data
structure, and initialize this spinlock with spin_lock_init() in the
probe() method.
Then, we must use this spinlock in:
• netdrv_handle_link_change(), with spin_lock_irqsave()
and spin_unlock_irqrestore() to prevent concurrency
between the execution of this function and the interrupt
handler
• netdrv_close(), again with spin_lock_irqsave() and
spin_unlock_irqrestore() to prevent concurrency between
interrupts and the operation of stopping the network
interface. This must be done after stopping the queue and the
PHY.
Implement transmission
Definitions
Before implementing the transmission functions themselves, let's
start by adding the usual definitions:
• The TSTART bit of the Network Control Register (EMAC_NCR), used
to start the transmission of the packets stored in the
Transmission Queue
#define EMAC_NCR_TSTART_SHIFT 9
• The transmission completion bit of the Transmit Status
Register
#define EMAC_TSR_COMP_SHIFT 5
• The bit of the transmission DMA descriptor that tells if the
current descriptor is the last buffer of the current frame. In
our case, this bit will be set in all transmission DMA
descriptors since we will always send a packet in a single
DMA buffer
#define EMAC_DMA_TX_LAST_SHIFT 15
• A macro that tells how many DMA buffers are currently
available (free) in the queue
#define TX_BUFFS_AVAIL(priv) \
	(((priv)->tx_tail <= (priv)->tx_head) ? \
	 (priv)->tx_tail + TX_RING_SIZE - 1 - (priv)->tx_head : \
	 (priv)->tx_tail - (priv)->tx_head - 1)
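The ring arithmetic can be exercised in user space. TX_RING_SIZE = 128 and the NEXT_TX() definition (which requires a power-of-two ring size) are assumptions, and struct tx_indices is a hypothetical subset of the private structure. Note that the macro always keeps one descriptor unused: when tx_head equals tx_tail the ring is empty and reports TX_RING_SIZE - 1 free entries, so an empty ring can never be confused with a full one.

```c
#include <assert.h>

/* Assumptions: power-of-two ring size, NEXT_TX() wrapping with a mask */
#define TX_RING_SIZE 128
#define NEXT_TX(n) (((n) + 1) & (TX_RING_SIZE - 1))

#define TX_BUFFS_AVAIL(priv) \
	(((priv)->tx_tail <= (priv)->tx_head) ? \
	 (priv)->tx_tail + TX_RING_SIZE - 1 - (priv)->tx_head : \
	 (priv)->tx_tail - (priv)->tx_head - 1)

/* Hypothetical subset of struct netdrv_device */
struct tx_indices {
	unsigned int tx_head;   /* next descriptor start_xmit() fills */
	unsigned int tx_tail;   /* oldest descriptor not yet reclaimed */
};
```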
int ndo_start_xmit(struct sk_buff *, struct net_device *)
operation. So, create an empty netdrv_start_xmit() function and
register it in the net_device_ops structure.
The code of the netdrv_start_xmit() function will manipulate the
queue of DMA buffer descriptors and this queue will also be
modified by the interrupt handler. Therefore, locking must be used.
As the start_xmit() function is guaranteed never to be called from
an interrupt handler, we can directly use spin_lock_irq() and
spin_unlock_irq().
Once the lock is taken, the first thing to check is if we have at least
one remaining DMA buffer descriptor available to send the packet
(using the TX_BUFFS_AVAIL macro). If not, this is really a problem,
since we are supposed to manage this queue and tell the kernel to
stop sending packets when the queue is full. Therefore, if this
happens, stop the queue with netif_stop_queue(), release the
spinlock and return NETDEV_TX_BUSY (1), so that the kernel keeps
the packet and retries later.
If we have at least one DMA buffer descriptor available, the next
available one is pointed to by tx_head in our private data
structure.
The next step is to map the packet so that it can be sent through
DMA (we are using « streaming DMA »). This is done with the
dma_map_single() function, which takes as arguments a struct
device pointer (which can be found from our private data structure),
the memory area to be mapped (the pointer to the packet data is
skb->data), the length (skb->len) and the direction of the DMA
transfer (in our case a transmission to the device, so
DMA_TO_DEVICE). The function returns a DMA address, of type
dma_addr_t. This is the address we must give to our device.
Then, update our internal tx_skb array with the DMA address and
the pointer to the SKB. This will be useful at the completion of the
transmission.
Now, let's compute the value of the ctrl field of the DMA buffer
descriptor:
• It must contain the length of the data to transmit, skb->len
• The EMAC_DMA_TX_LAST_SHIFT bit must be set, as all our
packets are sent through a single buffer
• If the buffer we're using is the last one of the queue (tx_head
is equal to TX_RING_SIZE - 1), then the
EMAC_DMA_TX_WRAP_SHIFT bit must be set
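The ctrl computation above can be written as a small pure function. EMAC_DMA_TX_WRAP_SHIFT = 30, the 11-bit length field, TX_RING_SIZE = 128 and the name tx_desc_ctrl() are assumptions to check against the datasheet. The USED bit (assumed to be bit 31) is left clear, which hands the descriptor over to the controller.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE 128               /* assumption */
#define EMAC_DMA_TX_LAST_SHIFT 15
#define EMAC_DMA_TX_WRAP_SHIFT 30      /* assumption */
#define EMAC_DMA_TX_LEN_MASK   0x7ffu  /* assumption: length in bits 10:0 */

/* Hypothetical helper: ctrl word for the descriptor at index 'entry' */
static uint32_t tx_desc_ctrl(uint32_t len, unsigned int entry)
{
	uint32_t ctrl = len & EMAC_DMA_TX_LEN_MASK;

	ctrl |= 1u << EMAC_DMA_TX_LAST_SHIFT;     /* whole frame in one buffer */
	if (entry == TX_RING_SIZE - 1)
		ctrl |= 1u << EMAC_DMA_TX_WRAP_SHIFT; /* last descriptor of the ring */
	return ctrl;
}
```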
Then, initialize the addr field of the DMA buffer descriptor with the
DMA address, and the ctrl field with the value computed
previously. To prevent the reordering of these writes with the write
that will start the transmission, add a write memory barrier
(wmb()) after the setup of the DMA buffer descriptor.
Then, update the tx_head to the next available transmission buffer
so that further calls to start_xmit() will use another buffer (hint:
use the NEXT_TX macro).
Finally, start the transmission by setting the
EMAC_NCR_TSTART_SHIFT bit of the Network Control
Register. Be careful not to change the value of other bits in this
register!
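Preserving the other bits means doing a read-modify-write. In this user-space sketch, fake_ncr stands in for the memory-mapped register and emac_start_tx() is a hypothetical helper; in the driver, the read and the write would go through __raw_readl() and __raw_writel().

```c
#include <assert.h>
#include <stdint.h>

#define EMAC_NCR_TSTART_SHIFT 9

/* Stand-in for the memory-mapped EMAC_NCR register (hypothetical) */
static volatile uint32_t fake_ncr;

/* Read-modify-write: keep every other bit (e.g. RE/TE), add TSTART */
static void emac_start_tx(volatile uint32_t *ncr)
{
	uint32_t val = *ncr;

	val |= 1u << EMAC_NCR_TSTART_SHIFT;
	*ncr = val;
}
```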
Before the end of the function, we must tell the kernel if we still
have DMA buffer descriptors available to accept new packets. Test
the number of DMA buffer descriptors available, and call
netif_stop_queue() if none is left.
Transmission completion
The completion of the transmission will of course be notified by an
interrupt. So, when an interrupt is raised, we will check if it's due
to a transmission completion, and if so, we will unmap the DMA
buffer, mark it as available, and potentially signal the kernel that
we are ready again to send more packets.
So, the first part takes place in the interrupt handler,
netdrv_interrupt(). First, we need to test if the interrupt really
originates from our device. To do so, read the EMAC_ISR (Interrupt
Status Register), and if it's 0 (no interrupt pending), then simply
return IRQ_NONE to the kernel.
Otherwise, take our spinlock, so that the execution of the code of
our interrupt handler is protected against concurrent access. Using
the spin_lock() and spin_unlock() variants is sufficient, since the
interrupt line we are handling is already disabled while its handler runs.
Then, we have to loop until the EMAC_ISR register is 0. This
register gets reset to 0 when it's read, so there's no need to reset
bits manually in it. However, this also means that you must save and
use the value of the register as it was in the first test at the
beginning of the interrupt handler.
In the loop, test whether the EMAC_IER_TCOMP_SHIFT bit is set, which
notifies a transmission completion. If so, call a new netdrv_tx()
function that will take care of finishing the transmission process.
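The structure of this handler can be simulated in user space, with an array standing in for the clear-on-read EMAC_ISR. The RCOMP (bit 1) and TCOMP (bit 7) positions are assumptions from the mainline macb driver, the reception branch anticipates the netdrv_rx() call added later, and sim_load(), read_isr() and sim_interrupt() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define EMAC_IER_RCOMP_SHIFT 1   /* assumption */
#define EMAC_IER_TCOMP_SHIFT 7   /* assumption */

/* Simulated clear-on-read EMAC_ISR: each read returns the next queued
 * value, then 0 once the queue is exhausted */
static const uint32_t *isr_seq;
static unsigned int isr_len, isr_pos;
static unsigned int tx_calls, rx_calls;

static void sim_load(const uint32_t *seq, unsigned int len)
{
	isr_seq = seq;
	isr_len = len;
	isr_pos = 0;
	tx_calls = rx_calls = 0;
}

static uint32_t read_isr(void)
{
	return isr_pos < isr_len ? isr_seq[isr_pos++] : 0;
}

/* Returns 1 for IRQ_HANDLED, 0 for IRQ_NONE */
static int sim_interrupt(void)
{
	uint32_t status = read_isr();

	if (status == 0)
		return 0;                 /* not our interrupt */

	/* driver: spin_lock(&priv->lock); */
	while (status) {
		if (status & (1u << EMAC_IER_TCOMP_SHIFT))
			tx_calls++;       /* driver: netdrv_tx() */
		if (status & (1u << EMAC_IER_RCOMP_SHIFT))
			rx_calls++;       /* driver: netdrv_rx() */
		status = read_isr();      /* reading also clears it */
	}
	/* driver: spin_unlock(&priv->lock); */
	return 1;
}
```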
Now, let's implement the netdrv_tx() function. This function should:
• Verify in the Transmit Status Register that a transmission
completion occurred. To do so, one must
◦ Read the EMAC_TSR register
◦ Write the read value into the EMAC_TSR register to clear
the bits (according to the controller specification, writing
with a bit set actually clears the bit in the register)
◦ Test if the EMAC_TSR_COMP_SHIFT bit is set, and if not,
return
• Test all DMA buffer descriptors (in a loop), from the tail
(pointed to by tx_tail) to the head (pointed to by tx_head).
Remember to use NEXT_TX() to compute the index of the next
DMA buffer descriptor in the queue. For each descriptor, we
will:
◦ Use a read memory barrier to make sure that what we will
actually read is what has been set by the device into the
DMA descriptor
◦ Test the EMAC_DMA_TX_USED_SHIFT bit. If it isn't set, then
we have to stop the loop over the DMA descriptors, since
it means that we reached a DMA descriptor whose
transmission hasn't been completed by the controller
◦ Unmap the SKB using dma_unmap_single()
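The reclaim loop described above can be sketched in user space: walk from tx_tail towards tx_head and stop at the first descriptor the controller has not marked as used. TX_RING_SIZE, NEXT_TX() and the position of the USED bit (bit 31 of ctrl) are assumptions, and tx_reclaim() is a hypothetical helper that only counts entries instead of unmapping and freeing them.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE 128                          /* assumption */
#define NEXT_TX(n) (((n) + 1) & (TX_RING_SIZE - 1))
#define EMAC_DMA_TX_USED_SHIFT 31                 /* assumption */

struct dma_desc {
	uint32_t addr;
	uint32_t ctrl;
};

/* Hypothetical model of the loop in netdrv_tx(): returns the new tail.
 * The driver would dma_unmap_single() and free each reclaimed SKB. */
static unsigned int tx_reclaim(const struct dma_desc *ring,
			       unsigned int tail, unsigned int head,
			       unsigned int *reclaimed)
{
	*reclaimed = 0;
	while (tail != head) {
		/* driver: rmb() before reading the descriptor */
		if (!(ring[tail].ctrl & (1u << EMAC_DMA_TX_USED_SHIFT)))
			break;            /* not transmitted yet */
		(*reclaimed)++;
		tail = NEXT_TX(tail);
	}
	return tail;
}
```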
Implement reception
The last (but not least!) part of our driver is obviously to implement
the reception support.
The reception of packets is notified through an interrupt, so in the
interrupt handler, we'll add a call to a netdrv_rx() function. This
function will go through the list of DMA descriptors, and find the
ranges of DMA descriptors that correspond to a packet. For each of
these ranges, a netdrv_rx_frame() function will be called to handle
the reception of a packet. Here, we have a difference between
transmission and reception: on the transmission side, each packet
is completely sent through a single DMA buffer and descriptor,
while on the reception side, DMA buffers are limited to 128 bytes,
so multiple reception DMA buffers are usually needed to store the
contents of a network packet.
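Finding the range of descriptors that holds one frame can be sketched in user space. The bit positions are the ones listed in the definitions below; NEXT_RX() and rx_find_frame() are hypothetical, and this simplified version just restarts on every SOF, whereas a real driver would also give skipped descriptors back to the controller.

```c
#include <assert.h>
#include <stdint.h>

#define RX_RING_SIZE 512
#define NEXT_RX(n) (((n) + 1) & (RX_RING_SIZE - 1))  /* hypothetical */
#define EMAC_DMA_RX_USED_SHIFT 0
#define EMAC_DMA_RX_SOF_SHIFT  14
#define EMAC_DMA_RX_EOF_SHIFT  15

struct dma_desc {
	uint32_t addr;
	uint32_t ctrl;
};

/* Returns 1 and the first/last descriptor indices of the first complete
 * frame found from 'tail', or 0 if no complete frame is available yet */
static int rx_find_frame(const struct dma_desc *ring, unsigned int tail,
			 unsigned int *first, unsigned int *last)
{
	int in_frame = 0;

	for (unsigned int n = 0, i = tail; n < RX_RING_SIZE;
	     n++, i = NEXT_RX(i)) {
		if (!(ring[i].addr & (1u << EMAC_DMA_RX_USED_SHIFT)))
			return 0;         /* not filled by the EMAC yet */
		if (ring[i].ctrl & (1u << EMAC_DMA_RX_SOF_SHIFT)) {
			*first = i;       /* restart on every SOF */
			in_frame = 1;
		}
		if (in_frame && (ring[i].ctrl & (1u << EMAC_DMA_RX_EOF_SHIFT))) {
			*last = i;
			return 1;
		}
	}
	return 0;
}
```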
Definitions
As usual, additional definitions are needed:
• Bit definitions for the DMA reception descriptors
◦ #define EMAC_DMA_RX_USED_SHIFT 0
This bit is set to one in the address field of the DMA
descriptor by the device when the DMA buffer has been
filled with data
◦ #define EMAC_DMA_RX_SOF_SHIFT 14
This bit is set to one in the control field of the DMA
descriptor by the device when the data in this DMA buffer
is the beginning of a network packet (SOF stands for Start
of Frame)
◦ #define EMAC_DMA_RX_EOF_SHIFT 15
This bit is set to one in the control field of the DMA
descriptor by the device when the data in this DMA buffer
is the end of a network packet (EOF stands for End of
Frame)
• As the Ethernet header is 14 bytes long, and for performance
reasons it's better to have the IP header word-aligned, many
Ethernet drivers allocate two additional bytes in each packet and
shift the Ethernet header by two bytes. So
• Tell the kernel that our device didn't do any verification of the
packet checksums (some devices do this directly in
hardware). This is done using skb->ip_summed =
CHECKSUM_NONE.
• Tell the kernel how much data we will put in our SKB using
skb_put() with the packet length as argument
Once our SKB is set up, let's go through the different DMA buffers
that contain the data of our packet and handle them in a loop that:
• Computes the length of the data available in the current DMA
buffer. Usually it's the size of the buffer, RX_BUFFER_SIZE,
except for the last one!
• Copies data from the DMA buffer to the SKB, using
skb_copy_to_linear_data_offset(). The arguments of this
function are: the SKB pointer, the offset in the SKB at which
the data should be copied, the location from which the data
should be taken, and the length of the data to copy
• Clears the EMAC_DMA_RX_USED_SHIFT bit in the DMA
descriptor, to mark the corresponding DMA buffer as
available again for future receptions
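The copy loop can be modeled in user space, where memcpy() stands in for skb_copy_to_linear_data_offset() and a flat array stands in for the RX buffer area. rx_copy_frame() is a hypothetical helper; it also handles a frame that wraps from the last buffer of the ring back to the first one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_RING_SIZE   512
#define RX_BUFFER_SIZE 128

/* Hypothetical model of the copy loop: 'buffers' is the RX buffer area,
 * 'first' the SOF descriptor, 'len' the frame length, 'out' the SKB data.
 * Returns the number of DMA buffers consumed. */
static unsigned int rx_copy_frame(const uint8_t *buffers, unsigned int first,
				  unsigned int len, uint8_t *out)
{
	unsigned int offset = 0, entry = first, nbufs = 0;

	while (offset < len) {
		unsigned int frag = len - offset;

		if (frag > RX_BUFFER_SIZE)
			frag = RX_BUFFER_SIZE;  /* full buffer, except the last */
		memcpy(out + offset, buffers + entry * RX_BUFFER_SIZE, frag);
		offset += frag;
		entry = (entry + 1) % RX_RING_SIZE; /* frame may wrap */
		nbufs++;
		/* driver: clear the USED bit of this descriptor here */
	}
	return nbufs;
}
```

With 128-byte buffers, a 300-byte frame occupies three buffers holding 128, 128 and 44 bytes.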
At the end of the function, we must compute the protocol of the
packet that has been received and store it in the SKB:
skb->protocol = eth_type_trans(skb, priv->dev).
And finally, submit the received packet to the kernel using
netif_rx().
Now, your driver should be working, and network traffic should go
back and forth between the target and the rest of the world.
Congratulations!
Improvements
Compared to the official driver for this Ethernet controller as
available in the kernel, our driver lacks a few features:
• No support for NAPI, which limits the interrupt rate
when the network traffic increases significantly
• No support for the ethtool API, which allows userspace
applications to get information about the status of the link
and to configure a few settings
• No support for the statistics (packets received/sent, bytes
received/sent, errors, etc.)
• No support for promiscuous mode and for the multicast filters
• No proper management of errors communicated by the
Ethernet controller