Variation in USB Audio Latency
Introduction
Not all USB sound devices are made the same with respect to latency, and the difference matters a great deal if you hope to use your audio interface for real-time effects.
This post will show two cards with widely varying latency on Linux and Jack, but many of the concepts apply to other platforms as well.
Latency is not Throughput
It’s easy to find specs associated with audio interfaces that relate to “speed”. Knowing that your audio interface supports USB3, which can transmit data at 5,000 Mbps, won’t help you evaluate latency, though, and throughput isn’t generally interesting when evaluating an audio interface.
In theory, a high-throughput interface allows you to record more simultaneous tracks at higher bit-depths and higher sample-rates. Throughput limits were important in the early days of USB1.1 interfaces, when the 12 Mbps of shared throughput could only support half-a-dozen hardware channels before saturating the USB1.1 bus. Those concerns are no longer an issue with modern audio interfaces, though. USB2, USB3, Firewire, Thunderbolt, and PCI all have more than enough throughput to handle many tens of channels of simultaneous audio at the highest bit-depths and sample-rates available. Audio manufacturers also design their interfaces so that you don’t have to think about throughput by matching channel counts to the available bandwidth. Throughput specs are easy to advertise, but they just don’t matter for an audio interface.
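To put rough numbers on the throughput argument, here is a minimal back-of-the-envelope sketch. It only multiplies channels, sample rate, and bit depth for uncompressed PCM; it ignores USB protocol overhead and bus sharing, so real-world limits are lower than the nominal bus speeds used here. The channel counts and formats are illustrative assumptions, not measurements.

```python
# Raw bandwidth of uncompressed PCM audio (no protocol overhead included).

def pcm_mbps(channels: int, sample_rate_hz: int, bit_depth: int) -> float:
    """Raw PCM bandwidth in megabits per second (1 Mbps = 1,000,000 bits/s)."""
    return channels * sample_rate_hz * bit_depth / 1_000_000

USB1_MBPS = 12    # nominal USB 1.1 full-speed rate, shared across the bus
USB2_MBPS = 480   # nominal USB 2.0 high-speed rate

# Illustrative configurations (assumptions chosen for the example).
six_ch = pcm_mbps(channels=6, sample_rate_hz=48_000, bit_depth=24)
big = pcm_mbps(channels=32, sample_rate_hz=192_000, bit_depth=24)

print(f"6 ch @ 24-bit/48 kHz:   {six_ch:.1f} Mbps of {USB1_MBPS} Mbps (USB 1.1)")
print(f"32 ch @ 24-bit/192 kHz: {big:.1f} Mbps of {USB2_MBPS} Mbps (USB 2.0)")
```

Six channels already consume more than half of the nominal USB1.1 rate before any overhead, while 32 channels at the highest common format still use well under a third of USB2.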
Round-trip latency is what you need to be concerned about if you intend to use your audio interface for real-time effects or synthesis. These specs are often much harder to come by, although some manufacturers like Focusrite do advertise their best-case measured latency. There is a great deal of variation in performance with regard to latency, and it varies not just from card to card, but from computer to computer depending on what operating system you run and how it’s configured. For some applications (like podcasting) the difference almost certainly doesn’t matter. For other applications (like real-time effects) the difference is almost certainly going to be both audible and distracting. I’m no expert on psychoacoustics, but my impression (based on reading and listening) is that ~15ms is a good rule of thumb for when audio latency becomes distracting. Even among interfaces targeted at musicians, it’s common for audio interfaces to vary between 5ms and 30ms, which means some of them induce distracting amounts of latency.
Impact of Latency
- Podcasting: Non-musical voice recording can tolerate a relatively large amount of latency. I haven’t encountered any hardware that isn’t suitable for this purpose, and I wouldn’t worry about latency if this is your use-case.
- Real-time Effects: Playing an instrument live, routing the audio signal into your interface, processing it with plugins, and then outputting it to headphones/speakers is the most demanding use-case in terms of latency… and also the most subjective. Some musicians may not notice 20ms of latency, others might. Even the same musician might not notice when playing a pad synth with a soft attack, and then be distracted by the same latency during a piano part. If your hardware isn’t capable of achieving sub-20ms latency, it’s likely you’ll hit some situation where someone notices and is distracted. Not all USB interfaces are capable of hitting that bar (probably most fail at it).
- Soft-synths: Synthesis has an advantage over real-time effects in that it is affected only by output latency, and is not affected by audio input latency. While soft-synths are subject to MIDI input latency when played from an external MIDI controller, that latency is usually much less than audio input latency. Interfaces with 25ms-30ms of round-trip latency as measured by jack_iodelay may still be able to achieve consistently acceptable latency for soft-synths.
- Multi-Track Music Recording: When overdubbing multiple audio tracks, you often end up wanting to do soft-synths or real-time effects, in which case the above advice applies. For more basic multi-track recording, you can:
- Monitor input directly (many interfaces have a direct/hardware monitoring switch or dial, or you can use an external mixer).
- Have Jack compensate output timings based on the end-to-end round-trip latency measured by jack_iodelay. See the Ardour Manual and this Ardour Community Thread for details.
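As a rough illustration of the compensation option, jack_iodelay suggests values for the ALSA backend's `-I` and `-O` arguments (extra input and output latency, in frames). The sketch below only builds a `jackd` command line and prints it; it does not start Jack. The device name `hw:USB` and the value 144 come from the Scarlett measurement later in this post, while the 48 kHz sample rate and 3 periods per buffer are assumptions inferred from that measurement rather than settings stated in the text.

```python
import shlex

def jackd_alsa_command(device, sample_rate_hz, frames_per_period, periods,
                       input_latency_frames, output_latency_frames):
    """Build (but do not run) a jackd command line for the ALSA backend."""
    return [
        "jackd", "-d", "alsa",             # select the ALSA backend
        "-d", device,                      # ALSA device name
        "-r", str(sample_rate_hz),         # sample rate
        "-p", str(frames_per_period),      # frames per period
        "-n", str(periods),                # periods per buffer
        "-I", str(input_latency_frames),   # extra input latency, in frames
        "-O", str(output_latency_frames),  # extra output latency, in frames
    ]

# Assumed settings: 48 kHz and 3 periods are inferred, not stated in the post.
cmd = jackd_alsa_command("hw:USB", 48_000, 32, 3, 144, 144)
print(shlex.join(cmd))
```

Front-ends such as cadence or qjackctl expose the same settings in their configuration dialogs, so the command line is only one way to apply them.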
USB Latency Chain
ProAudioBlog, AndroidAuthority and the LinuxAudio wiki all have articles that provide solid introductions to the different places that can introduce latency into the audio path. I’ll provide my own take below:
Name | Latency | Description |
---|---|---|
Sound Generation | 1ms - 10ms | Instruments don’t react immediately to produce sound. A synthesizer is likely to have 5ms of latency. Even acoustic instruments have latency. The low-E on a bass guitar vibrates at 41.2Hz, or roughly one complete vibration every 24 milliseconds, so it must take several milliseconds to excite a recognizable pitch. |
Analog Cables and Components | 0ms | Signals propagate through copper at roughly 2/3rds of the speed of light. It would require over 100 miles of analog cable to introduce a single ms of latency. This also applies to any purely analog outboard gear you may have: routing signals through a mixer or a compressor won’t affect latency in a detectable way. |
Digital Outboard Components | 2ms-?ms | While purely analog outboard components behave like a bunch of analog cable from a latency perspective, digital outboard components behave like a tiny computer with their own buffers that introduce latency. Expect at least several ms of latency from any digital outboard gear. |
MIDI Input Latency | 1ms | If you’re driving a soft-synth from a MIDI controller, you’ll experience MIDI input latency instead of ADC, OS, and Jack input processing. MIDI input latency is generally low compared to audio input latency, often about a millisecond. |
ADC | 1ms-10ms | The audio interface must take analog sound from the inputs and convert it to digital data, as well as apply any digital effects or processing. |
OS Input Processing | 1ms-5ms | The USB subsystem, the audio driver, and possibly other OS components must process data coming in from the audio interface. |
Jack Input Processing | 1ms-5ms on a well tuned kernel. | This is the number that jack and front-ends like Cadence and QJackctl give you. It’s NOT the end-to-end latency of your system. |
Application Processing | 0ms-?ms | Any soft-synths or effects plugins can introduce their own processing delays between when Jack delivers audio samples or MIDI signals to them and when they output their processed audio samples on their output Jack port. |
Jack Output Processing | 1ms-5ms on a well tuned kernel. | This is the same number described in “Jack Input Processing”. In addition to the delay incurred on input, jack must process the audio on output and incur a second delay. |
OS Output Processing | 1ms-5ms | The USB subsystem, audio drivers, and possibly other OS components must process data going out to the audio interface. |
DAC | 1ms-10ms | The audio interface must take the output data and convert it to analog signal on the outputs, as well as perform any effects or signal processing. |
Distance from Speaker | 0ms-10ms | Sound travels through the air at about 1 foot per millisecond. If you’re wearing headphones, this delay rounds to zero. If you’re 10 feet away from your speakers, sound traveling through the air may be eating up a good chunk of your latency budget. |
The latency chain has some notable properties:
1. There are lots of possible sources of latency, almost any one of which can completely blow a 15ms latency budget.
2. It’s often difficult to know where in the signal path latency is coming from, or to know when latency from multiple components is stacking up to cause a problem.
3. Confusion about latency sources is made even worse by the fact that few components advertise or document their latency properties. You almost always have to measure latency yourself to have any idea what’s going on.
4. Lots of audio software (including Jack and front-ends like Cadence or QJackCtl) highlight their own latency prominently but tell you nothing about end-to-end latency. If you’ve ever read a forum post where someone said “My system has 2ms of latency and it’s very distracting!” you can be certain that person is misunderstanding the latency reported by their software, has end-to-end latency of 20ms or more, and has no idea how much or what is contributing to it.
These challenges combine to ensure that there is an enormous amount of bad anecdotal advice on the internet about latency.
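To see how quickly the stages stack up, here is a small sketch that sums the ranges from the table above for a real-time effects path (instrument in, speakers out). It leaves out the MIDI and digital outboard rows, and it treats the open-ended "?ms" upper bound for application processing as 0ms, which is an optimistic assumption made only so the sum is defined.

```python
# (stage, best-case ms, worst-case ms), taken from the latency chain table.
# Application Processing has an open-ended upper bound in the table; it is
# assumed to be 0 ms here, which flatters the worst case.
CHAIN = [
    ("Sound Generation", 1, 10),
    ("Analog Cables and Components", 0, 0),
    ("ADC", 1, 10),
    ("OS Input Processing", 1, 5),
    ("Jack Input Processing", 1, 5),
    ("Application Processing", 0, 0),
    ("Jack Output Processing", 1, 5),
    ("OS Output Processing", 1, 5),
    ("DAC", 1, 10),
    ("Distance from Speaker", 0, 10),
]

BUDGET_MS = 15  # the rule-of-thumb threshold used in this post

best_ms = sum(low for _, low, _ in CHAIN)
worst_ms = sum(high for _, _, high in CHAIN)

print(f"best case:  {best_ms} ms")
print(f"worst case: {worst_ms} ms")
print(f"budget:     {BUDGET_MS} ms")
```

Even with every stage at its best-case figure the chain uses about half of the budget, and the worst case overshoots it several times over.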
Listening For Latency
The simplest way to listen for latency is just to play and listen. Somewhere between 20ms and 75ms of end-to-end latency, you will start to hear and become distracted by the delay. Use staccato notes so you can focus on the timing of the attack.
If you have the ability to mix your “direct” signal and the signal after routing it through your computer into a set of headphones, this will allow you to detect latency that may not be distracting (to you) but is perceptible (and therefore might be distracting to someone you play with if they’re more sensitive). Some audio interfaces have a built-in dial to mix “direct” (0-latency, 100% analog signal path from the interface inputs to the headphone outputs) and digital outputs from your computer. Or if you have an analog mixer with an effects bus or similar mechanism to route a signal out and back in, you can plug your mic or instrument into the analog mixer, send it to the computer via the effects bus, and monitor both signals in the headphones.
The most sensitive mechanism I’ve found to detect latency right at the threshold of what I’m able to perceive is to sing into a mic. On a zero-latency analog signal path, or one with less than about 10ms of end-to-end latency, this sounds “normal” to me. With 25ms of end-to-end latency, it sounds “weird”… similar to a vocal doubler or phaser effect. Unless you have a pretty good sense of how your equipment performs, this technique probably won’t help you. Once you recognize the sound, though, it’s a quick and sensitive test to perform.
Measuring Latency
Focusrite Scarlett 2i2
First let’s verify that ALSA detects the presence of the card by connecting it via USB and running `aplay -l` to list all available ALSA devices. Card 0 is my laptop’s built-in sound card, and card 1 shows the Scarlett 2i2 plugged in via USB:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 3: HDMI 0 [HDMI 0]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 7: HDMI 1 [HDMI 1]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 8: HDMI 2 [HDMI 2]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: USB [Scarlett 2i2 USB], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
In order to measure audio latency, we must:
- Configure Jack. You can do this via the `jackd` command-line tool, `qjackctl`, or `cadence`. All the options are reasonable, but I tend to use `cadence`. After starting it and clicking `configure`, my settings look like the screenshot above.
  - `hw:USB` is the name by which Jack knows the Scarlett.
  - The `Sample Rate`, `Buffer Size` (which is confusingly named in `cadence`, since it sets the period size rather than the buffer size and is called `frames/period` in other programs), and `Periods/Buffer` indicate that Jack itself will take 2ms to process incoming audio and an additional 2ms to process outgoing audio… so 4ms of total latency coming from Jack itself.
  - Click `start` in `cadence` to start up Jack. With these settings I do see periodic xruns every 10 or 20 minutes. This may be approaching the limits of the hardware’s ability to deliver data on time, or my Linux instance may need further tuning to process the data on time. The occasional xruns are not distracting in practice sessions, though.
- Physically connect the left output on the Scarlett to input 1 on the Scarlett using a quarter-inch cable. This will allow jack_iodelay to measure its own output once we wire things up in Jack.
- In a terminal, run `jack_iodelay`.
- Start `catia`, which will let us wire up the Jack routes.
  - Maximize `catia` and select `Canvas -> Zoom -> Auto-Fit` if the various ports are scrolled off-screen or are otherwise difficult to read.
  - Connect `capture_1` to jack_iodelay’s input, and connect jack_iodelay’s output to `playback_1`. In conjunction with our physical cable from step 2, we now have a closed loop that lets jack_iodelay analyze the signal it’s generating to see how long it takes to complete the loop.
Back in the terminal window, jack_iodelay will now have some useful output:
$ jack_iodelay
new capture latency: [0, 0]
new playback latency: [0, 0]
Signal below threshold...
Signal below threshold...
< repeated many times while we wire up the connections in catia >
new capture latency: [32, 32]
417.637 frames 8.701 ms total roundtrip latency
extra loopback latency: 289 frames
use 144 for the backend arguments -I and -O
< repeated endlessly until jack_iodelay is killed via Ctrl-c >
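The latency Jack reports for itself follows directly from its buffer settings: frames per period times periods per buffer, divided by the sample rate. The sketch below shows that arithmetic for a few period sizes. The 48 kHz sample rate and 3 periods per buffer are assumptions inferred from the numbers in this post (417.637 frames reported as 8.701 ms, and 2ms for a 32-frame period); they are not settings quoted in the text.

```python
def jack_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """One-way latency contributed by Jack's own buffering, in milliseconds."""
    return frames_per_period * periods * 1000 / sample_rate_hz

SAMPLE_RATE_HZ = 48_000  # assumed: inferred from the jack_iodelay output
PERIODS = 3              # assumed: inferred from 2 ms at 32 frames/period

# One-way latency for a few common period sizes.
one_way = {p: jack_latency_ms(p, PERIODS, SAMPLE_RATE_HZ) for p in (32, 64, 128)}

for frames, ms in one_way.items():
    print(f"{frames:>3} frames/period: {ms:.1f} ms each way, {2 * ms:.1f} ms in and out")
```

Doubling the period size doubles the latency Jack adds, which is why a card that needs a larger period to avoid xruns pays for it directly in round-trip time.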
Lexicon Omega
Again, let’s verify that ALSA is detecting the card correctly with `aplay`:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 3: HDMI 0 [HDMI 0]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 7: HDMI 1 [HDMI 1]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 8: HDMI 2 [HDMI 2]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Omega [Lexicon Omega], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Omega [Lexicon Omega], device 1: USB Audio [USB Audio #1]
Subdevices: 1/1
Subdevice #0: subdevice #0
Again, test latency with the same steps:
- Configure `jackd` via `cadence`. After selecting the appropriate device, this time Jack refuses to start with a `Buffer Size` (aka `frames/period`) of 32. I get excessive xruns at 64, so I have to bump all the way up to 128. Jack itself now adds 16ms of latency to the signal. Start Jack.
- Physically connect the Omega’s left output to its Line-1 input, and use the Omega’s hardware channel-selection button to assign Line-1/Line-2 to input 1 and input 2 as seen by `jackd`.
- Start `jack_iodelay`.
- In `catia`, wire up `capture_1` to the input on `jack_iodelay`, and wire up the output from `jack_iodelay` to `playback_1`.
$ jack_iodelay
new capture latency: [0, 0]
new playback latency: [0, 0]
Signal below threshold...
Signal below threshold...
< repeated many times while we wire up the connections in catia >
new playback latency: [384, 384]
1132.755 frames 23.599 ms total roundtrip latency
extra loopback latency: 620 frames
use 310 for the backend arguments -I and -O
< repeated endlessly until jack_iodelay is killed via Ctrl-c >
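If you want to log these measurements rather than read them off the terminal, the output is easy to parse. The following is a minimal sketch that pulls the round-trip and extra-loopback figures out of text in the format shown above. The function name and regular expressions are my own, and the "half of the extra loopback latency, rounded down" rule for `-I`/`-O` is only an observation from the two outputs in this post (289 gives 144, 620 gives 310), not something taken from jack_iodelay's documentation.

```python
import re

ROUNDTRIP_RE = re.compile(r"([\d.]+)\s+frames\s+([\d.]+)\s+ms total roundtrip latency")
EXTRA_RE = re.compile(r"extra loopback latency:\s+(\d+)\s+frames")

def parse_iodelay(text: str):
    """Extract latency figures from jack_iodelay output, or None if absent."""
    roundtrip = ROUNDTRIP_RE.search(text)
    extra = EXTRA_RE.search(text)
    if roundtrip is None or extra is None:
        return None
    extra_frames = int(extra.group(1))
    return {
        "frames": float(roundtrip.group(1)),
        "ms": float(roundtrip.group(2)),
        "extra_frames": extra_frames,
        # Observed in this post's outputs: suggested -I/-O is half the extra
        # loopback latency, rounded down.
        "suggested_io": extra_frames // 2,
    }

SAMPLE = """\
new playback latency: [384, 384]
 1132.755 frames 23.599 ms total roundtrip latency
 extra loopback latency: 620 frames
 use 310 for the backend arguments -I and -O
"""

result = parse_iodelay(SAMPLE)
print(result)
```

Lines such as "Signal below threshold..." contain neither pattern, so the function returns `None` until the loop is actually wired up.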
Analyzing Latency Measurements
- Previous reading had led me to believe that there was approximately 10ms of unavoidable latency from ADC, DAC, and USB/ALSA drivers. Apparently this isn’t always the case. For the Scarlett, the roundtrip latency is 8.7ms, of which `jackd` accounts for 4ms. This means that the ADC, DAC, and OS latency can’t total more than 4.7ms, half of what I had previously thought possible.
- The Scarlett is an excellent result, and shows that USB audio interfaces can achieve latency well below the threshold of perception and even approach latency expected from dedicated digital audio devices like hardware synths and digital effects boxes, which I believe tend to run between 2ms and 6ms.
- Not all USB audio interfaces achieve acceptable latency, and latency measurements on the internet are often wrong. Be wary of latency measurements that don’t specify how they were made, and try to test hardware on your own computer if your latency requirements are strict.
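The subtraction used above can be applied to both cards. This sketch takes the measured round-trip figures and the latency attributed to Jack in each section, and prints what is left over for the ADC, DAC, and OS layers combined. It follows the post's own simple accounting, so treat the remainders as rough upper bounds rather than precise hardware figures.

```python
# Round-trip latency measured by jack_iodelay, and the portion this post
# attributes to Jack's own buffering for each card's settings.
MEASUREMENTS = {
    "Focusrite Scarlett 2i2": {"roundtrip_ms": 8.701, "jack_ms": 4.0},
    "Lexicon Omega": {"roundtrip_ms": 23.599, "jack_ms": 16.0},
}

residual_ms = {
    name: m["roundtrip_ms"] - m["jack_ms"] for name, m in MEASUREMENTS.items()
}

for name, ms in residual_ms.items():
    total = MEASUREMENTS[name]["roundtrip_ms"]
    print(f"{name}: {total:.1f} ms round trip, {ms:.1f} ms outside Jack")
```

By this accounting, most of the gap between the two cards comes from the larger period size the Omega needs, with a smaller additional difference outside Jack.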