USB WTF

Kermos

Working on some new firmware for an embedded device and came across something interesting...

The processor it uses has built-in full-speed USB support. Attached to the processor, via the EMI bus, is a bunch of NAND memory. Now with the timings tweaked as fast as I can get them, and the fact that the bus to the NAND is only 8 bits wide, I can get ~944 KB/second of data. That comes out to roughly 7.5 Mbit/second, which is comfortably within the scope of full-speed USB.
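For anyone who wants to check that arithmetic, here is a quick sketch in Python. It assumes the throughput figure is in kilobytes with 1 KB = 1000 bytes (with 1024 it comes out closer to 7.7 Mbit/s) and uses the nominal 12 Mbit/s full-speed signalling rate. The function name is made up for illustration.

```python
FULL_SPEED_MBIT = 12.0  # nominal USB full-speed signalling rate

def kbytes_to_mbit(kbytes_per_s, bytes_per_kb=1000):
    """Convert a rate in kilobytes/second to megabits/second."""
    return kbytes_per_s * bytes_per_kb * 8 / 1_000_000

nand_mbit = kbytes_to_mbit(944)  # measured NAND throughput from the post
print(f"NAND bus: {nand_mbit:.2f} Mbit/s")
print(f"Fits within full-speed USB: {nand_mbit < FULL_SPEED_MBIT}")
```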

Now what did the guy who designed the hardware do? On the same EMI bus that the NAND is on, he attached a third-party USB chip with 480 Mbit high-speed support.

1. This is a 98MHz processor, it can't process that much incoming data.
2. As previously stated, it's on the SAME bus as the NAND. For those who don't do much hardware, there are chip select lines which determine which chip the bus currently talks to. So I can talk to the USB chip ----> OR <---- the NAND chip, not both at the same time. That means my effective bandwidth for each chip in a USB->NAND data transfer has just been cut in half.
3. Instead of the processor simply placing incoming data into memory buffers in the background and notifying me via an interrupt when the data has arrived, leaving valuable CPU time for other tasks, I now have to *manually* read the data from the USB chip (which takes CPU cycles) into system RAM, then transfer it to the NAND, effectively cutting my bandwidth on the EMI bus in half.

So the end result is, I have a frigging 480 MBit USB connection with an effective data rate of less than half full-speed plus the added CPU overhead. *sigh*
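A rough sketch of where the "cut in half" figure comes from, in Python. It assumes reads from the USB chip run at the same ~944 KB/second as NAND accesses (the post gives no separate number for the USB chip) and ignores CPU overhead entirely. The function name is hypothetical.

```python
def shared_bus_rate(usb_read_kbs, nand_write_kbs):
    """Effective rate when every byte crosses the same bus twice, one access at a time."""
    # Time to move 1 KB = time to read it from the USB chip + time to write it to NAND.
    seconds_per_kb = 1 / usb_read_kbs + 1 / nand_write_kbs
    return 1 / seconds_per_kb

effective_kbs = shared_bus_rate(944, 944)  # assumption: both legs run at 944 KB/s
effective_mbit = effective_kbs * 8 / 1000
print(f"{effective_kbs:.0f} KB/s, {effective_mbit:.2f} Mbit/s")
```

Under those assumptions the transfer tops out at roughly 472 KB/second, about 3.8 Mbit/second, which is well under half of the 12 Mbit/second full-speed rate before any CPU time is counted.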

The things people do sometimes just...amaze me. Utterly amaze me.

immibis

@Kermos said:

1. This is a 98MHz processor, it can't process that much incoming data.

You mention below that the processor can put the data directly into memory, so maybe this is faster?

@Kermos said:

2. As previously stated, it's on the SAME bus as the NAND. For those who don't do much hardware, there are chip select lines which determine which chip the bus currently talks to. So I can talk to the USB chip ----> OR <---- the NAND chip, not both at the same time. That means my effective bandwidth for each chip in a USB->NAND data transfer has just been cut in half.

But would be full speed for anything else involving USB xor NAND. Not all USB transfers involve NAND, not all NAND access involves USB.

@Kermos said:

3. Instead of the processor simply placing incoming data into memory buffers in the background and notifying me via an interrupt when the data has arrived, leaving valuable CPU time for other tasks, I now have to *manually* read the data from the USB chip (which takes CPU cycles) into system RAM, then transfer it to the NAND, effectively cutting my bandwidth on the EMI bus in half.

Kermos

@immibis said:

You mention below that the processor can put the data directly into memory, so maybe this is faster?

Only if you use the processor's own USB hardware.

@immibis said:

But would be full speed for anything else involving USB xor NAND. Not all USB transfers involve NAND, not all NAND access involves USB.

Not really, because while the USB chip can and does receive at high-speed, it then has to transfer the data to the processor over the EMI bus at a fraction of that speed. The NAND transfers only make this worse, and the majority of USB usage is data transfer to/from NAND.

Mole

Seen that before. The reason? They wanted to put "USB 2.0" on the box (as that's what customers expect, apparently), and the USB built into the processor was only USB 1.1.

lolwtf

Most hardware design WTFs can be explained by one of two thought patterns:

1. Make it cheap/simple/fast/small and fix it in software. Why add more hardware to do what the software can be doing?
2. Stick a Foo chip in there somewhere so we can say it has Foo.

This sounds like both.

Weng

@Mole said:

Seen that before. The reason? They wanted to put "USB 2.0" on the box (as that's what customers expect, apparently), and the USB built into the processor was only USB 1.1.

I'll do you one better.

I have a prototype board with THREE USB lines. One is built into the microcontroller (connected to a port on the prototype boards for flashing and debugging, but unused in the final version, despite the fact that you can disable the flash/debug capabilities). One is actually a dedicated RS232 bridge chip, taking an RS232 (9600 8N1) output from the uC; this is again connected to a port on our protoboards and disconnected on the production units. The data carried is state output for connection to companion software.

And then there's the big fancy expensive USB2.0 chip, which we had to program to do RS232 bridging all by ourselves, carrying the same 9600 8N1 RS232 signals as the 50 cent bridge chip. It's the only one populated in the production version.

Bottom line, some hardware hacker is going to become insanely happy when he realizes he has solder pads to access the uC's debugger and flash capabilities, and we haven't disabled them.

Oh yeah - I should probably mention that this device is a USB software license dongle, and I've already demonstrated a pirate attack against it. Nobody cares. You can actually gank the encryption key right off the debugger line.

OzPeter

@Weng said:

Bottom line, some hardware hacker is going to become insanely happy when he realizes he has solder pads to access the uC's debugger and flash capabilities, and we haven't disabled them.
Oh yeah - I should probably mention that this device is a USB software license dongle, and I've already demonstrated a pirate attack against it. Nobody cares. You can actually gank the encryption key right off the debugger line.

Reminds me of the ol' Protel printed circuit board CAD package back in the late '80s, when it had a dongle for licensing. The educational institution I was working for at the time used Protel a lot, so they examined the data flow between the software package and the dongle. It seemed that the package wrote an encrypted string of seemingly random challenge data to the dongle and then read back the response. But then they realized that no matter what challenge was sent, the dongle always replied with the same string of digits. Pretty soon Protel was being used to build a dongle that mimicked that response.
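To illustrate why a constant reply is fatal, here is a small Python sketch. It is not Protel's actual protocol: the class names, the made-up reply string, and the HMAC-based alternative are all assumptions for illustration. It records one exchange and checks whether playing it back keeps matching the real dongle on fresh challenges.

```python
import hashlib
import hmac
import os

class FixedReplyDongle:
    """Answers every challenge with the same bytes (the flaw described above)."""
    def respond(self, challenge: bytes) -> bytes:
        return b"1234567890"  # made-up constant reply

class HmacDongle:
    """Hypothetical alternative: reply depends on a secret key AND the challenge."""
    def __init__(self, key: bytes):
        self._key = key
    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._key, challenge, hashlib.sha256).digest()

class ReplayClone:
    """A fake dongle that just plays back one recorded reply."""
    def __init__(self, recorded: bytes):
        self._recorded = recorded
    def respond(self, challenge: bytes) -> bytes:
        return self._recorded

def clone_passes(real, trials=5):
    """Record one exchange, then compare the playback with the real dongle on fresh challenges."""
    clone = ReplayClone(real.respond(os.urandom(16)))
    fresh = [os.urandom(16) for _ in range(trials)]
    return all(hmac.compare_digest(clone.respond(c), real.respond(c)) for c in fresh)

print("fixed-reply dongle cloned:", clone_passes(FixedReplyDongle()))
print("keyed dongle cloned:", clone_passes(HmacDongle(os.urandom(32))))
```

The keyed version only helps while the key stays secret. If the key can be read off a debug port, as in the dongle described earlier in the thread, an attacker can compute valid replies directly.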

tgape

@Kermos said:

2. As previously stated, it's on the SAME bus as the NAND. For those who don't do much hardware, there are chip select lines which determine which chip the bus currently talks to. So I can talk to the USB chip ----> OR <---- the NAND chip, not both at the same time. That means my effective bandwidth for each chip in a USB->NAND data transfer has just been cut in half.

For a counterexample, back in the day, I came across an Ethernet card for a C64 (~1 MHz processor). This used the C64 expansion port, which had DMA. It also had its own buffers, which happened to be larger than the C64's total memory. As such, it could receive full-speed Ethernet for quite some time before filling its buffers. If it was on a highly congested network (this having been back in the heyday of hubs and bridges; at the time, I had access to a network with over 250 devices in the same collision domain), it could actually buffer more output than the C64 could hold, without losing any. Transferring data from/to the C64 main memory ran about as fast as Ethernet (8 Mbit, but no parity or stop bits), but during that time the processor was completely shut out from the system. If you had to run a C64 on your network, it was an insanely fast way to do it.

I cannot fathom how anyone could hook up a 480MBit USB chip to some slow device in that fashion - it should be connected via another bus to the memory. But that, of course, was what you said.

Now, the real question is, "does the fact that the 480MBit chip is connected to this 7.5 mbit bus in this fashion slow down the 480MBit chip in such a way that the entire USB line is degraded while it is talking/listening? Or does it at least have its own internal buffers, so it can send and receive packets at full speed, it just then needs an enormous amount of time to transfer them to its 'brain'?"

Mole

It's not possible to slow down the entire USB line as such, as far as I'm aware. You have a finite time to talk and to send each packet. If you're too slow, your time slot expires and you just send a small number of bytes per frame, whilst other devices on the bus can fill an entire frame.
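As a back-of-the-envelope sketch of that point, in Python: the numbers below use the nominal signalling rates (12 Mbit/s full-speed with 1 ms frames, 480 Mbit/s high-speed with 125 µs microframes) and ignore protocol overhead and bit stuffing, so they are upper bounds only. The 944 KB/second device rate is borrowed from the first post as an assumption about how fast the device can feed its USB chip, and the helper name is made up.

```python
def raw_bytes_per_frame(bit_rate, frame_us):
    """Upper bound on bytes that fit in one frame, ignoring all protocol overhead."""
    return bit_rate * frame_us // 8_000_000

FS_FRAME_BYTES = raw_bytes_per_frame(12_000_000, 1000)   # full-speed, 1 ms frame
HS_FRAME_BYTES = raw_bytes_per_frame(480_000_000, 125)   # high-speed, 125 us microframe

DEVICE_BYTES_PER_S = 944_000  # assumed: how fast the slow device can supply data
device_bytes_per_microframe = DEVICE_BYTES_PER_S * 125 // 1_000_000

print("full-speed frame capacity:", FS_FRAME_BYTES, "bytes")
print("high-speed microframe capacity:", HS_FRAME_BYTES, "bytes")
print("slow device can supply:", device_bytes_per_microframe, "bytes per microframe")
```

On these assumptions the slow device can only fill a small fraction of each microframe, and the remainder stays available to other devices on the bus.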

DaveK1

TRWTF is the Philips/NXP ISP1761 datasheet. I had to get the damn analyzer out and tell them how their own damn chip worked after the unnecessarily grievous experience of having to write a low-level driver for it a couple of years back. They up-issued the datasheet three times that year in response to my errata.

The most critical problem was the data channel write timing was waaaaay off, like 40ns vs. 87.5ns off. This meant it generally worked most of the time, but you'd get mysterious intermittent and rare breakdowns in the usb transaction protocol. Very hard to diagnose, particularly if you start by assuming you've got a bug in your software somewhere, which would be most people's first assumption.

What boggles me is that this was a popular chip that had been in production use for years by the time I came to hacking on it. How on earth did nobody notice it before? Or did everyone else just give up on solving the problems, blindly turn the access cycle times waaay up, and just give up on trying to get the full throughput from the device? It's a total mystery to me how it could have gone on so long.

DaveK1

@Mole said:

It's not possible to slow down the entire USB line as such, as far as I'm aware. You have a finite time to talk and to send each packet. If you're too slow, your time slot expires and you just send a small number of bytes per frame, whilst other devices on the bus can fill an entire frame.

The USB bus is a star topology. If you hang a full speed device off a hi-speed hub, that spur of the line operates at the lower signalling rate but the hub buffers everything back up to hi-speed upstream. Each branch of the graph only uses one signalling rate at a time, even when the data from multiple devices has been concentrated onto it up toward the root of the graph.

alegr

FWIW, I've been using a Cypress USB chip to get some 40 MB/s or so. Don't remember the part number.

tgape

@DaveK said:

The USB bus is a star topology.

That sounds like a much more sensible explanation of why my question was wrong. Thank you.

So, then, the slow device would intermittently distract its USB hub, while said hub took care of business for everyone else. Then, when the packet finally made its way to the hub in its entirety, zoom, it's on the computer and being processed there, because relative to the slow device, the rest of the USB network is, like, instant, man.

I guess my mistake was thinking USB hub is similar to Ethernet hub. While technically, ethernet hub is one of a number of possible device types, everyone apparently always means a repeater. Good to know that *some* people apparently were smart enough to learn why one shouldn't repeat that mistake...

Helix

@Kermos said:

Working on some new firmware for an embedded device and came across something interesting...
The processor it uses has built-in full-speed USB support. Attached to the processor, via the EMI bus, is a bunch of NAND memory. Now with the timings tweaked as fast as I can get them, and the fact that the bus to the NAND is only 8 bits wide, I can get ~944 KB/second of data. That comes out to roughly 7.5 Mbit/second, which is comfortably within the scope of full-speed USB.
Now what did the guy who designed the hardware do? On the same EMI bus that the NAND is on, he attached a 3rd party USB chip with 480mbit high-speed support.
1. This is a 98MHz processor, it can't process that much incoming data.
2. As previously stated, it's on the SAME bus as the NAND. For those who don't do much hardware, there are chip select lines which determine which chip the bus currently talks to. So I can talk to the USB chip ----> OR <---- the NAND chip, not both at the same time. That means my effective bandwidth for each chip in a USB->NAND data transfer has just been cut in half.
3. Instead of the processor simply placing incoming data into memory buffers in the background and notifying me via an interrupt when the data has arrived, leaving valuable CPU time for other tasks, I now have to *manually* read the data from the USB chip (which takes CPU cycles) into system RAM, then transfer it to the NAND, effectively cutting my bandwidth on the EMI bus in half.
So the end result is, I have a frigging 480 MBit USB connection with an effective data rate of less than half full-speed plus the added CPU overhead. *sigh*
The things people do sometimes just...amaze me. Utterly amaze me.

I see this about once a week at our customers.