How To Create And Program USB Devices
Henk Muller, XMOS
Fri, 2012-07-27 10:28
The Universal Serial Bus (USB) standard has been with us for many years, but making USB devices is still a
daunting task. The USB specification comprises thousands of pages spread over dozens of documents, and
although good books have been written on the subject, they are rarely shorter. In addition, the application
programming interface (API) offered for programming USB devices is often complex and intricate. This
article describes how to program your own software-based USB devices. It is not limited to standard class
devices, but also presents a way to implement any device, whether it complies with a standard class or not.
Table of Contents
1. The USB Way Of Thinking
2. Specifying And Discovering Device Capabilities
3. What To Do With Your Data
4. Programming USB Devices
5. JTAG Over USB
6. Audio Over USB
7. Summary
8. References
The USB Way Of Thinking
To understand USB, one has to understand a dozen terms that form the foundation of the USB world. USB
separates the host from the device: there is one host, connected to multiple devices. The host initiates all
traffic and schedules it on the USB bus.
A device is a physical box at the end of the USB cable that identifies itself to the host by passing it a device
descriptor and a configuration descriptor. These descriptors are binary data that describe the capabilities of
the USB device. In particular, the configuration descriptor describes one or more interfaces, where each
interface is a specific function of the device. A device may have multiple interfaces. For example, a USB
device that comprises a keyboard with a built-in speaker will offer an interface for playing audio and an
interface for key presses.
Each interface comprises a series of endpoints that are the communication channels between the host and
the device. Endpoints are numbered between 0 and 15 and may be IN endpoints or OUT endpoints. These
terms are relative to the host: OUT endpoints transport data to the device, and IN endpoints transport data
to the host (Fig. 1). There are four types of endpoints:
Bulk endpoints reliably transport data whenever it is required. Bulk data is acknowledged and
therefore fault-tolerant.
Isochronous endpoints are for transporting real-time data. A fixed bandwidth is allocated to them.
The host allocates this bandwidth and will not allow an isochronous endpoint to be created if no
bandwidth is available. In contrast, bulk endpoints have no guaranteed bandwidth.
Interrupt endpoints are polled occasionally by the host and enable a device to report status changes.
The control endpoint (endpoint 0) is used to perform general operations, such as obtaining
descriptors, or performing a control operation such as “change the volume” or “set the baud rate” on
any of the interfaces.
Fig. 1. Traffic over the USB bus is bidirectional.
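The endpoint number, direction, and type described above end up in a small binary endpoint descriptor. The following is a minimal sketch, assuming the standard 7-byte endpoint descriptor layout; the function names are illustrative, and the synchronization bits that isochronous endpoints also carry in bmAttributes are left at zero for brevity.

```python
import struct

# Transfer-type codes stored in the low two bits of bmAttributes.
CONTROL, ISOCHRONOUS, BULK, INTERRUPT = 0, 1, 2, 3


def endpoint_address(number, direction_in):
    """Combine an endpoint number (0-15) and a direction into one byte.

    Bit 7 is set for IN endpoints (device to host) and clear for OUT.
    """
    if not 0 <= number <= 15:
        raise ValueError("endpoint number must be between 0 and 15")
    return (0x80 if direction_in else 0x00) | number


def endpoint_descriptor(number, direction_in, transfer_type, max_packet, interval):
    """Pack a 7-byte endpoint descriptor (little-endian fields)."""
    return struct.pack(
        "<BBBBHB",
        7,                                       # bLength
        5,                                       # bDescriptorType: ENDPOINT
        endpoint_address(number, direction_in),  # bEndpointAddress
        transfer_type,                           # bmAttributes (type bits only)
        max_packet,                              # wMaxPacketSize
        interval,                                # bInterval
    )
```

For instance, `endpoint_address(2, True)` gives `0x82`, the address of IN endpoint 2.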
USB traffic is organized in frames. Frames are marked by the host sending a start of frame (SOF) every 125
µs (for high-speed USB) or every 1 ms (for full-speed USB). Isochronous endpoints are allocated a transfer
in every frame. Interrupt endpoints are polled once every so many frames, and bulk transfers may happen
anytime when the bus is not in use.
As an example, the aforementioned keyboard with built-in speaker has at least two endpoints: an
isochronous OUT endpoint to transfer audio data to the speaker, and an interrupt IN endpoint to poll the
keyboard. Suppose the speaker is a mono speaker with a 48-kHz sample rate. The host then will send six
samples of data every 125 µs (six samples/0.000125 seconds = 48,000 samples/second). If a sample
occupies 16 bits, the host will reserve enough bandwidth to send a 96-bit OUT packet in every 125-µs frame.
This consumes around 0.5% of the USB bandwidth. The remaining 99.5% is free for other interfaces or
other USB devices on the same bus.
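The packet-size arithmetic in this example can be written down as a short calculation. This is only a sketch of the payload computation; the function name and parameters are illustrative, and protocol overhead per packet is not modelled.

```python
def iso_packet(sample_rate_hz, bits_per_sample, channels=1, frame_us=125):
    """Return (samples per frame, payload bits per frame) for a steady stream.

    Integer arithmetic is used throughout, so the result is exact for
    sample rates that divide evenly into the frame period.
    """
    samples = sample_rate_hz * frame_us // 1_000_000
    return samples, samples * channels * bits_per_sample
```

With the figures from the text, `iso_packet(48_000, 16)` gives six samples and a 96-bit payload per 125-µs frame.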
Specifying And Discovering Device Capabilities
The host initiates all USB traffic. When a device is plugged in, the host first requests the device descriptor.
This descriptor comprises two sets of information that inform the host of the basic capabilities of the device:
the device class and the vendor ID/product ID (VID/PID).
The class and subclass can be used to specify a device with generic capabilities. A USB speaker advertises
itself as class Audio 2.0. A keyboard advertises itself as a HID-class (human interface device) device. The
previous example of a device with both a speaker and a keyboard advertises itself as a Composite-class
device.
Devices that comply with a specific USB class are compatible across vendors and platforms. The USB
specification defines hundreds of device classes that enable the generic implementation of, for example,
Ethernet dongles, mixing desks, or flash disks, and enable operating systems to provide generic drivers
for these classes.
There are cases where the USB device does not fit a specific class or where the class specification is too
constrained for a particular device. In that case, the class of the device must be described as vendor-specific.
The operating system (OS) shall then use the VID and PID to find a vendor-specific driver.
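To make the device descriptor concrete, here is a minimal sketch that packs one for a vendor-specific device, assuming the standard 18-byte layout. The VID and PID in the usage note are placeholders, not real assignments, and the remaining field values (USB 2.0, 64-byte endpoint 0, no string descriptors, one configuration) are assumptions chosen for illustration.

```python
import struct

VENDOR_SPECIFIC_CLASS = 0xFF  # bDeviceClass value meaning "vendor-specific"


def device_descriptor(vid, pid, device_class=VENDOR_SPECIFIC_CLASS):
    """Pack an 18-byte device descriptor (little-endian fields)."""
    return struct.pack(
        "<BBHBBBBHHHBBBB",
        18,            # bLength
        1,             # bDescriptorType: DEVICE
        0x0200,        # bcdUSB: USB 2.0
        device_class,  # bDeviceClass
        0,             # bDeviceSubClass
        0,             # bDeviceProtocol
        64,            # bMaxPacketSize0: packet size of endpoint 0
        vid,           # idVendor
        pid,           # idProduct
        0x0100,        # bcdDevice: device release 1.00
        0,             # iManufacturer: no string descriptor
        0,             # iProduct: no string descriptor
        0,             # iSerialNumber: no string descriptor
        1,             # bNumConfigurations
    )
```

Calling `device_descriptor(0x1234, 0x5678)` with those placeholder IDs returns the 18 bytes that would be served to the host on request.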
When the device descriptor has been dealt with, the OS assigns the USB device a number, informs the USB
device of the number (it is being enumerated), and requests the configuration descriptor that specifies each
interface in detail. In the earlier example, the configuration descriptor will specify two interfaces: one
interface of class USB Audio 2.0 with a single channel output endpoint running at 48 kHz only, the other
interface of class HID that specifies a single keyboard with a specific keymap.
There are cases where the USB device does not have any OS support and it should interact with a user
program directly. In that case, a generic driver such as the open-source libusb driver that allows an
application program to communicate with any USB device can be used. Typically, the device will be
advertised as vendor-specific. Through the libusb interface the user program can detect a device with a VID
and PID that it wants to interact with, claim an interface, open an endpoint, and send IN and OUT requests
to that endpoint.
What To Do With Your Data
The enumeration of the device typically requires static descriptors to be sent to the host. The difficult bit is
creating the descriptors. Serving them is simple, as that is the only task required of the device at the time.
After enumeration, data may arrive or be requested on all endpoints in quick succession. This requires an
interface between the software that deals with the function of the USB device (e.g., playing audio or
monitoring keystrokes on the keyboard) and the low-level USB protocol. Prior to designing this interface,
let’s look at how to handle data on various types of endpoints.
Bulk endpoints are the easiest to deal with. Since each data transfer is acknowledged, it is possible to send a
negative acknowledge (NAK) stating that the device is not yet ready to deal with the endpoint. For example,
if software is dealing with some other part of the device, or if data is simply not yet available (for example, a
read from flash memory is not yet completed), the low-level USB driver can send a NAK.
However, sending NAKs has a downside. The only sensible option for the host is to retry the request,
potentially creating a long series of requests that are aborted by NAKs. This wastes USB bandwidth that
could have been used by other endpoints or devices. In addition, the host software is blocked until the
device answers. Hence, NAKs should be a last resort. It may be more appropriate to send partial data than
to NAK an IN request. In the case of an OUT request, little can be done. If there is no room to accept the
data, then a NAK is the only answer. However, it may be more appropriate to introduce a high-level
protocol that will not allow an OUT request until there is space.
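The policy described above (send partial data rather than NAK an IN request, and NAK an OUT request only when there is no room) can be illustrated with a toy model. This is a sketch of the decision logic only, not of any real USB device library; the class and method names are hypothetical.

```python
class BulkEndpointModel:
    """Toy model of the device-side decision for one bulk IN/OUT pair."""

    def __init__(self, out_capacity):
        self.in_data = bytearray()      # data waiting to go to the host
        self.out_data = bytearray()     # data received from the host
        self.out_capacity = out_capacity

    def handle_in(self, max_packet):
        """Answer an IN request: send whatever is available, NAK only if empty."""
        if not self.in_data:
            return ("NAK", b"")
        chunk = bytes(self.in_data[:max_packet])
        del self.in_data[:max_packet]
        return ("DATA", chunk)

    def handle_out(self, payload):
        """Answer an OUT request: NAK if the payload does not fit."""
        if len(self.out_data) + len(payload) > self.out_capacity:
            return "NAK"
        self.out_data += payload
        return "ACK"
```

A higher-level protocol that only issues an OUT request when the device has reported free space would keep `handle_out` from ever reaching its NAK branch.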
Isochronous endpoints are more difficult to deal with because they are not acknowledged. The transmitter
(in either direction) assumes that the data arrives. Since there is no acknowledgement on an isochronous
endpoint, there is no possibility to send a NAK. Hence, if the device is not ready, the only course of action is
to drop the data from an OUT packet or to send no data for an IN packet.
Although this may seem harsh at first, remember that the purpose of an isochronous endpoint is to transmit
real-time data in a guaranteed time slice of the USB bus. If the device does not have room to store the OUT
data, data is probably not dealt with in real time. Dropping is a sensible course of action. If no data is
available to answer an IN request, then the device has not collected enough data. A sensible course of action
is to transmit whatever data is present, or possibly no data at all.
Assuming that the data can be processed or produced in real time, it is easy to compute the buffer
requirements for an isochronous endpoint:
For an OUT endpoint, the worst possible case is that the host posts one OUT request right at the end
of a USB frame, and then immediately after the start of frame (SOF) it posts a second OUT request.
This means that two OUT requests, carrying 250 µs of data, are received in quick succession. Hence,
the buffering scheme must be able to buffer at least 250 µs worth of data. As long as the program
does not consume data from this buffer until the SOF following the first packet, the buffer will never
empty, providing a continuous data stream from host to device.
For an IN endpoint, the worst case is similar. The host could perform two IN transfers in short
succession just before and immediately after a SOF. This means the IN buffer needs to be at least 250
µs too, and the buffer should contain 125 µs of data at the start of each frame.
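The two buffer rules above translate directly into byte counts. The following sketch assumes a steady stream with a whole number of samples per frame; the function name and its parameters are illustrative.

```python
def iso_buffer_bytes(sample_rate_hz, bytes_per_sample, channels=1, frame_us=125):
    """Return (minimum buffer size, IN prefill) in bytes.

    The buffer must hold two frames of data (two transfers can arrive
    back to back around a SOF). An IN buffer should hold one frame of
    data at the start of each frame.
    """
    samples_per_frame = sample_rate_hz * frame_us // 1_000_000
    per_frame = samples_per_frame * channels * bytes_per_sample
    return 2 * per_frame, per_frame
```

For the 48-kHz, 16-bit mono speaker used earlier, `iso_buffer_bytes(48_000, 2)` gives a 24-byte minimum buffer with 12 bytes per frame.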
It is worth comparing bulk and isochronous transfers from a perspective of coping with errors. In bulk
transfers, the data itself is critical. The host and device can retry and slow down, as long as the data is
transferred correctly, and this transfer must be acknowledged. For an isochronous transfer, the timing is
critical. Either side can throw data away, as long as the real-time characteristics of data further along in the
stream are adhered to. (Of course, the decision to drop data should not be taken lightly as it will have an
impact on the fidelity of, for example, a video or audio stream.)
The data-centric versus time-centric approach has a knock-on effect on the consequences of bit errors. A
cyclic redundancy code (CRC) for error detection protects all USB traffic. A corrupted bulk transfer must be
retried until the data is transferred without error. In contrast, a corrupted isochronous transfer will simply
be dropped. The transmitting side will be unaware that data was dropped. The receiving side may know that
the transfer was dropped (if the header with the endpoint was not corrupted), but even then how many
bytes the transfer contained may not be determined. When streaming real-time video or audio this is
important, since there will be an unknown gap in the stream that has to be filled with best effort.
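One possible best-effort strategy for filling such a gap is sketched below. It assumes the receiver knows a transfer was dropped but not how long it was, so it substitutes the nominal sample count and repeats the last good sample. Both choices are assumptions made for illustration; silence or interpolation would be equally valid.

```python
def fill_gaps(transfers, nominal_samples):
    """Rebuild a sample stream; `None` marks a transfer that was dropped."""
    stream = []
    last = 0  # assumed starting level: silence
    for transfer in transfers:
        if transfer is None:
            # The real length is unknown, so assume the nominal count and
            # hold the last good sample for the duration of the gap.
            stream.extend([last] * nominal_samples)
        else:
            stream.extend(transfer)
            if transfer:
                last = transfer[-1]
    return stream
```

For example, `fill_gaps([[1, 2], None, [5, 6]], 2)` holds the value 2 across the missing transfer.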
Interrupt endpoints inquire about current state. This may be data that is not too time-critical (such as a key
press), or it may be time-critical data (such as the X and Y location of a mouse or other pointing device). In
the first case, a few microseconds of delay between typing the key and reporting it won’t hurt. However,
when reporting mouse locations, irregular reporting may lead to unintended results.
Programming USB Devices
Having seen how to deal with different types of endpoints, we can develop a programming model for
software-based USB devices. It is helpful to keep in mind how USB operates:
There are one or more endpoints, for one or more interfaces, where traffic may arrive or depart at
any time.
Transfers on isochronous endpoints are time-critical.
At most one transfer happens at a time.
The first two points suggest a multithreaded programming structure, especially if more than a single
interface is concerned, or if isochronous endpoints are being used (Fig. 2). The basic software architecture
assumes that there is some sort of USB device library and that for each endpoint we implement a thread that
deals with USB transfers on that endpoint. Other parts of the system, not directly connected to the USB
device library, are implemented using additional threads.
Fig. 2. The USB software architecture is designed for handling multiple endpoints.
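The thread-per-endpoint structure can be sketched with ordinary threads and queues. In this sketch the queues stand in for the USB device library, which is an assumption made so that the example is self-contained; all names are hypothetical.

```python
import queue
import threading


def endpoint_thread(requests, handler, results):
    """Serve one endpoint: block for a transfer, handle it, repeat."""
    while True:
        data = requests.get()
        if data is None:          # sentinel: shut the thread down
            break
        results.append(handler(data))


def run_device(handlers, traffic):
    """Start one thread per endpoint and feed each its own traffic.

    `handlers` maps an endpoint name to a function; `traffic` is a list of
    (endpoint name, payload) pairs standing in for the USB device library.
    """
    queues = {name: queue.Queue() for name in handlers}
    results = {name: [] for name in handlers}
    threads = [
        threading.Thread(target=endpoint_thread,
                         args=(queues[name], handlers[name], results[name]))
        for name in handlers
    ]
    for t in threads:
        t.start()
    for name, payload in traffic:
        queues[name].put(payload)
    for q in queues.values():
        q.put(None)               # tell every endpoint thread to stop
    for t in threads:
        t.join()
    return results
```

Each endpoint handler runs independently, so a slow handler on one endpoint does not delay the others.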
Note that one thread per endpoint may not be required and may not be the most elegant method. Given that
only one transaction happens at a time (the third point), we can create a version of the system that relies on
fewer threads in the system. Suppose that we want to implement a synchronous protocol over two
endpoints where the host will always transmit data over a bulk OUT endpoint, prior to receiving data on an
associated IN endpoint. This protocol requires only a single thread that handles OUT and IN transactions in
order on that endpoint pair.
This optimization is not without risk. Using a single thread per endpoint naturally caters to the situation
where the host program was aborted and restarted between the OUT and IN transaction. In this case, the
sequence of transactions seen on the device will be ..., OUT, IN, OUT, IN, OUT, OUT, IN, ..., and the thread
dealing with OUT transactions must swallow the extra OUT. When optimized away to a program that
sequentially consumes OUT and IN in order, this program must be written so that at any time it may expect
the protocol to reset.
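A single-threaded handler that tolerates such a reset might look like the following sketch. The class and the `process` callback are hypothetical; the point is only that a second OUT replaces the stale reply instead of breaking the sequence.

```python
class SyncProtocol:
    """OUT-then-IN protocol on one thread that tolerates a host restart."""

    def __init__(self, process):
        self.process = process   # hypothetical function: request -> reply
        self.reply = None        # reply waiting for the next IN, if any

    def on_out(self, request):
        # A second OUT before the IN means the host restarted. The stale
        # reply is simply replaced, which resets the protocol.
        self.reply = self.process(request)

    def on_in(self):
        if self.reply is None:
            return None          # nothing pending: the library would NAK
        reply, self.reply = self.reply, None
        return reply
```

With the sequence OUT, OUT, IN, the IN returns the reply to the second OUT, and the reply to the first is discarded.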
The third point enables a further optimization. A single thread can deal with all bulk traffic on all interfaces,
optimizing multiple endpoints into a single thread (Fig. 3). The single thread receives a request (IN or OUT)
on any endpoint, deals with that request, and then moves on to the next request, possibly on a
different endpoint. If the next request arrives before the last request has been dealt with completely, the
USB device library sends NAKs, temporarily holding up the host. This optimization has one disadvantage,
which is that the single thread must keep state for each endpoint and is effectively context switching on each
request. We will show an example of this later.
Fig. 3. Multiple endpoints can be optimized into a single thread.
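The per-endpoint state that the single thread must keep can be sketched as a dictionary of state records, one per endpoint. This is an illustrative model only; the request list stands in for the USB device library, and all names are hypothetical.

```python
def serve_bulk(requests, handlers):
    """Handle bulk requests for all endpoints in arrival order on one thread.

    `requests` is an iterable of (endpoint, payload) pairs; `handlers` maps
    each endpoint to a function (state, payload) -> reply. Each endpoint
    gets its own state dictionary, which survives between requests.
    """
    states = {ep: {} for ep in handlers}
    replies = []
    for ep, payload in requests:
        # Switching endpoints means switching to that endpoint's state.
        replies.append((ep, handlers[ep](states[ep], payload)))
    return replies


def count_bytes(state, payload):
    """Example handler: keep a running byte total for one endpoint."""
    state["total"] = state.get("total", 0) + len(payload)
    return state["total"]
```

Interleaved requests on two endpoints keep separate totals, which shows the context switch the text refers to.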
The same optimization cannot be applied to isochronous endpoints. If we had a single thread dealing with all
isochronous data, it would involve FIFOs for each endpoint from which the thread will read data or post
data. These FIFOs will increase latency, which is often undesirable.
The rest of this article discusses two examples of the software architecture and optimizations. One example
uses vendor-specific drivers and mostly bulk endpoints (JTAG over USB), and the other shows a standard
USB class with mostly isochronous endpoints (Audio over USB).
JTAG Over USB
For debugging programs on embedded processors, it is common to use a protocol such as JTAG for
accessing the internal state of the processor and to use a program such as gdb to run on a PC to interpret
and modify state, set breakpoints, single step, and so on. USB can be used to provide a cross-platform
portable transport layer between the PC and JTAG wires.
These devices are often called JTAG keys. In addition to JTAG, they often contain a UART for text I/O from
the embedded program. JTAG keys do not follow any standard USB class. Hence, the descriptor labels them
as vendor-specific, and it is up to us to define an endpoint structure that is fit for purpose. One endpoint
structure would use six endpoints:
Two endpoints that control the USB device itself (endpoint 0 IN and OUT, required by USB)
An IN and OUT endpoint for JTAG traffic
An IN and OUT endpoint for UART traffic
Since there is no USB standard, we can define the protocol for the JTAG traffic and choose a set of
commands such as “send a clock with TMS high” or “read the program counter.” On the host side, our
program can use libusb (an open-source USB driver library) to search for a device with our VID and PID,
claim the interface, and then use the libusb interface to send IN and OUT transactions to both the JTAG and
UART endpoints.
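Because the command set is ours to define, a wire format has to be chosen. The sketch below is one entirely hypothetical encoding: the opcode values and the one-byte-opcode-plus-32-bit-argument layout are invented for illustration and are not taken from any real JTAG key.

```python
import struct

CMD_CLOCK_TMS_HIGH = 0x01   # hypothetical opcode: clock with TMS high
CMD_READ_PC = 0x02          # hypothetical opcode: read the program counter


def encode_command(opcode, argument=0):
    """Pack a command as one opcode byte plus a 32-bit little-endian argument."""
    return struct.pack("<BI", opcode, argument)


def decode_command(packet):
    """Unpack a 5-byte command into (opcode, argument)."""
    return struct.unpack("<BI", packet)
```

The host would send the encoded bytes in an OUT transaction on the JTAG endpoint, and the device would decode them before driving the JTAG wires.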
Figure 4 shows a suitable software architecture for the device end. Given that all endpoints are for bulk
traffic, they can all be mapped onto a single thread, with two separate threads dealing with the state
machines for JTAG traffic and UART traffic. Figure 5 shows a sample implementation.
Fig. 4. JTAG over USB employs multiple endpoints.
Fig. 5. A JTAG interface can be implemented using USB hardware and a standard 20-pin JTAG connector.
Audio Over USB
As an example of a standard USB device, let’s discuss Audio over USB. The Audio 2.0 Class standard allows
interoperability of devices across platforms: a consumer can buy a USB microphone or USB speakers and plug
them into any computer that supports Audio over USB. The number of channels, sampling rate, and sample depth
can be varied to support anything from low-channel-count consumer devices to high-quality,
high-channel-count professional audio.
Devices that are more complex also are supported. The descriptor has a syntax for describing mixers,
volume controls, equalizers, clocks, resampling, MIDI, and many other functions, although not all of those
functions are recognized by all operating systems.
On the host side, all USB traffic carrying audio samples is directed to the USB Audio driver, which interacts
through some general kernel sound interface with the program using audio, such as Skype. Other data, such
as MIDI, can be handled through a separate interface by a separate driver.
The device is designed to use USB Audio Class 2.0, and the standard specifies the endpoints that we need to
use. If the application has to support MIDI, stereo in, and stereo out with a clock controlled by the device,
then the standard dictates that there shall be seven endpoints:
Two endpoints that control the USB device itself (endpoint 0 IN and OUT, required by USB)
An isochronous IN endpoint for the I2S analog-to-digital converter (ADC)
An isochronous OUT endpoint for the I2S digital-to-analog converter (DAC)
An isochronous IN endpoint for feedback on the clock speed
A bulk IN endpoint and bulk OUT endpoint for MIDI
The endpoints for the ADC and DAC have one IN and OUT transaction every microframe, every 125 µs.
Assuming that the DAC and ADC operate with a 96-kHz sample rate, 12 samples are sent in each direction
every 125 µs. Note that there are two independent oscillators: the device controls the 96-kHz sample rate,
and the host controls the 125-µs microframe rate.
As these clocks are independent, they will drift relative to each other, and there won’t always be 12 samples
in each transfer. The vast majority of the transfers will have 12 samples, but sometimes there will be 13 or 11
samples.
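The effect of the drift can be reproduced with a simple accumulator. In this sketch the sample rate is expressed as measured against the host's microframe clock, so a device clock that runs slightly fast appears as a rate a little above 96,000. The function name and the example rates are illustrative.

```python
def samples_per_microframe(sample_rate_hz, microframes, microframes_per_second=8000):
    """Return the number of samples carried by each successive microframe.

    An integer accumulator carries the fractional remainder forward, so
    the counts always add up to the true number of samples produced.
    """
    counts = []
    remainder = 0
    for _ in range(microframes):
        remainder += sample_rate_hz
        counts.append(remainder // microframes_per_second)
        remainder %= microframes_per_second
    return counts
```

A device measured at 96,010 samples per second yields mostly 12-sample transfers with an occasional 13, and one measured at 95,990 yields an occasional 11.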
The device uses the third isochronous endpoint to inform the host of the current speed. It is sampled once
every few milliseconds and reports the current sample rate in terms of samples per microframe. The MIDI
endpoints carry MIDI data as and when available. The standard provides flexibility, allowing us to easily add
more audio channels or audio processing.
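As an illustration of how the feedback value might be encoded, the sketch below expresses the rate in samples per microframe as a 16.16 fixed-point number packed into four little-endian bytes. That layout is an assumption about the high-speed feedback format and is not stated in this article; check the class specification before relying on it.

```python
import struct


def encode_feedback(sample_rate_hz, microframes_per_second=8000):
    """Encode samples per microframe as 16.16 fixed point, little-endian.

    The 16.16 layout is assumed here for illustration.
    """
    value = (sample_rate_hz << 16) // microframes_per_second
    return struct.pack("<I", value)
```

At exactly 96 kHz the value is 12.0, which is `0x000C0000` in 16.16 format.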
Figure 6 shows the software architecture for this device. Unlike the previous example, there is little that can
be optimized. The class specification dictates the endpoint structure. With three isochronous endpoints, it is
advisable to have three processes ready to accept and provide data on these endpoints. The only
optimization that is feasible is for a single thread to handle Endpoint 0 and the MIDI endpoints (Fig. 7).
Fig. 6. Audio over USB software employs multiple endpoints.
Fig. 7. Audio over USB hardware can be implemented with XMOS hardware.
Summary
USB devices comprise many interfaces that run concurrently and endpoints that are either bulk or
isochronous. Bulk endpoints are for reliable data transport between host and device, whereas isochronous
endpoints are for real-time data transport.
When programming USB device endpoints, it is easiest to see those endpoints as individual software
threads. Some of those can be mapped onto a single thread, but the programmer has to understand the
consequences. In particular, mapping multiple isochronous endpoints onto a single software thread will
introduce an (unpredictable) latency in the real-time stream.
References
1. Audio class specification: http://www.usb.org/developers/devclass_docs/Audio2.0_final.zip
2. Libusb documentation: http://libusb.sourceforge.net/api-1.0/
Source URL: http://electronicdesign.com/boards/howcreateandprogramusbdevices