Yesterday, I presented a diagram of a problem circuit and asked whether EB readers could spot the problem. Here’s that diagram again:
Bill was the first to respond and came up with the correct answer; Chris Zeh of Idle-Logic.com feels my pain; and Paul Clarke, our contributing author, is still giggling like a little girl. Yes, this is a circular dependency problem. The digital buffer (the triangle shape) needs its voltage supply to be up and running in order to work. But that voltage supply can only come up if the digital buffer is working and passing the power-on signal through. So if you try to power on the chip by asserting the power-on signal, the signal can never get through to the rest of the chip because the digital buffer’s voltage supply is not up and running yet. So the chip is dead.
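For readers who think better in code, here is a minimal sketch of the same problem as a dependency graph, checked with a plain depth-first search. The node names (`power_on_buffer`, `chip_supply`, `external_supply`) and the `find_cycle` helper are made up for illustration; this is not how our design tools represent the circuit.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(depends_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    color = {node: WHITE for node in depends_on}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in depends_on.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have looped back to it
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(depends_on):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical model of the buggy wiring: the buffer is powered from the very
# supply that only comes up after the buffer passes the power-on signal.
broken = {
    "power_on_buffer": ["chip_supply"],
    "chip_supply": ["power_on_buffer"],
    "external_supply": [],
}

# Hypothetical model of the intended wiring: the buffer runs off the external supply.
fixed = {
    "power_on_buffer": ["external_supply"],
    "chip_supply": ["power_on_buffer"],
    "external_supply": [],
}

print(find_cycle(broken))
print(find_cycle(fixed))
```

With the buggy wiring the search walks from the buffer to the chip supply and straight back to the buffer. With the buffer on the external supply there is nothing to loop back to.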
The correct wiring diagram should have the digital buffer’s supply voltage connected to the external supply:
So how did this bug ever come into existence? My stupidity? No need to answer. But the bug wasn’t due to idiotic design intent. That circular dependency was never meant to be. The root of it all was … a typo.
In the design tool that I use, wires can be connected on a circuit diagram without needing to explicitly draw a line between the two. Giving the same name to two disconnected wires is the same as shorting them together. I copied and pasted a few of these digital buffers on a circuit diagram. Some of those buffers needed their voltage supply wire names changed to connect to the external supply. I missed one of them. When I gave the circuit diagram to the person doing the physical layout and wiring of the design, he simply saw the way I labeled my wires and hooked them up accordingly. This is the simple and mundane reason for the problem.
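To make the connect-by-name behavior concrete, here is a rough sketch, assuming a netlist is just a list of (instance, pin, label) entries. The instance names, the `VDD_EXT` and `VDD_INT` labels, and both helper functions are hypothetical and are not taken from the actual tool. The second function is the kind of trivial lint that could have flagged my leftover label.

```python
from collections import defaultdict


def build_nets(pins):
    """Group (instance, pin) pairs by wire label; same label means same net."""
    nets = defaultdict(set)
    for instance, pin, label in pins:
        nets[label].add((instance, pin))
    return dict(nets)


def stale_supply_labels(pins, always_on_buffers, external_supply):
    """List buffers that must be always-on but whose supply pin carries another label."""
    return sorted(
        instance
        for instance, pin, label in pins
        if pin == "vdd" and instance in always_on_buffers and label != external_supply
    )


# Hypothetical netlist after copy-and-paste: one buffer kept its old supply label.
pins = [
    ("buf_a", "vdd", "VDD_EXT"),
    ("buf_b", "vdd", "VDD_EXT"),
    ("buf_pwr_on", "vdd", "VDD_INT"),  # label never renamed after pasting
    ("regulator", "out", "VDD_INT"),
]

nets = build_nets(pins)
print(nets["VDD_INT"])  # the stale label shorts the buffer supply to the regulator output
print(stale_supply_labels(pins, {"buf_a", "buf_b", "buf_pwr_on"}, "VDD_EXT"))
```

No line is ever drawn between `buf_pwr_on` and the regulator. The shared label alone puts them on the same net, which is exactly what the layout engineer then wired up.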
There is a bigger reason, however. We never simulated the power-on design with those digital buffers in place, violating the rule that if it’s not simulated, you should assume it’s broken. There are a few reasons for skipping these sims. First, we were understaffed. Second, power-up/power-down simulations are quite time-consuming and offer very little perceived gain relative to the other simulations we could run on the major functionality of the part. So we focused on the big issues, and “minor” issues such as checking the added buffers got shoved way down the priority list. Third, we ran out of time. The product had to go out the door.
The only reason this issue was caught at all was pure luck. I was manually checking the circuit diagram for a totally different issue when I noticed something didn’t look right. Then the dread set in. Oh, foodledoodles, I said to myself.
Luckily, this error was discovered before the test chip was manufactured, but after the wafer masks had already been made. Wafer masks contain the patterns (e.g. wiring trace patterns) of the various layers of an integrated circuit. A modern IC typically uses between 30 and 40 different masks, and each mask costs about $25,000 to $50,000. Three masks had to change to fix the problem. Since it needed to be a rush job (three masks at roughly $50,000 apiece) and I was the one responsible for the bug, I ended up costing the company $150,000 for that errant wire.
A costly mistake for someone barely a year into the job at FluxCorp at the time. But I owned up quickly to the error and I think my reputation didn’t suffer (much) for it. After all, I’m still here.