On my personal blog, I’ve complained more than once about FluxCorp’s zero-bug prototype policy and the ridiculous design processes that fall out of it. The whole premise was based on an internal management study showing that we engineers spend 25% of our man-hours debugging problems on prototypes. Lots of bar charts, pie charts, tables, and scatter plots, all to convince us stupid engineers that if we could save that 25% in the lab, we’d have time to design a whole new Flying Flux every year. They sell this to us by telling us we won’t be as stressed out if our designs are perfect: if engineers simulated more, simulated better, and did more dog-and-pony-show presentations, all these bugs would just melt away. I guess if I became dictator of the world, I wouldn’t have to work anymore. Both are fantasy scenarios that will never, ever come true.
During the meeting where this management edict was handed down, I asked whether management had ever considered the time-cost of simulating a design broadly enough to make a zero-bug design even close to likely. The retort from our VP: what is the cost if we don’t run enough simulations to get to zero bugs? Well, the cost is 25% of our man-hours in debugging time in the lab. But what is the cost of actually trying to implement a zero-bug design? I didn’t have an answer for him right there and then. And of course, there’s no point showing up the VP in front of the whole business unit.
So as a quick thought exercise, a colleague, Mr. Poker, and I calculated what it would really take to approach a zero-bug design. First, let’s pull out the rule I wrote about in The $150,000 Wire: if it’s not simulated, you should assume it’s broken. Also, bug finding is an activity with diminishing returns: the more bugs you find, the harder it is to find the next one. With that in mind, let’s take a very simple mixed-signal (analog + digital) circuit: a signal level detector.
The detector takes a sine wave as its input. If the signal amplitude is above the on-threshold level set by an 8-bit digital input, the output goes high. Once the output is high, if the amplitude drops below the off-threshold level set by another 8-bit digital input, the output goes low. The 8-bit threshold settings must be accurate to within +/-5%. As with most analog circuit blocks, this one also has a voltage supply pin, a ground pin, and a bias current input pin. Simple enough. So let’s see what it would take to exhaustively simulate this block to approach the zero-bug mantra. Here are the conditions under which this circuit must be analyzed:
| Condition | Count |
| --- | --- |
| Temperature range: -40°C to 125°C in 10 steps | 10 |
| Supply voltage: min, nominal, max | 3 |
| Supply noise frequency: min, mid, max | 3 |
| Bias current variation: min, mid, max | 3 |
| Manufacturing variations | 17 |
| 8-bit on-threshold | 256 |
| 8-bit off-threshold | 256 |
| Input sine wave amplitude: min to max in 20 steps | 20 |
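As an aside, the detector behavior described above amounts to a comparator with hysteresis, which can be sketched in a few lines. The 8-bit-code-to-amplitude mapping and the full-scale level here are my own assumptions, purely for illustration:

```python
# Behavioral sketch of the level detector: output goes high when the
# amplitude exceeds the on-threshold, and low again once it drops below
# the off-threshold (hysteresis). Code mapping below is an assumption.

FULL_SCALE = 1.0  # assumed amplitude corresponding to code 255


def code_to_level(code: int) -> float:
    """Map an 8-bit threshold code (0-255) to an amplitude level."""
    return (code / 255) * FULL_SCALE


def detect(amplitudes, on_code: int, off_code: int):
    """Yield the detector output for a sequence of signal amplitudes."""
    on_level = code_to_level(on_code)
    off_level = code_to_level(off_code)
    high = False
    for a in amplitudes:
        if not high and a > on_level:
            high = True
        elif high and a < off_level:
            high = False
        yield high


# Amplitude ramps past the on-level (~0.8), then falls below the
# off-level (~0.2); the output goes high in between.
amps = [0.1, 0.4, 0.7, 0.9, 0.6, 0.3, 0.1]
print(list(detect(amps, on_code=204, off_code=51)))
```

Even this toy model hints at the problem: the output depends on the history of the input, so every threshold pair and amplitude trajectory is a distinct case to verify.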
So the total number of combinations for simulations is:
10 x 3 x 3 x 3 x 17 x 256 x 256 x 20 = 6 016 204 800
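If you want to check the arithmetic yourself, the count is just the product of the rows in the table above:

```python
# Recompute the total number of exhaustive simulation combinations
# from the condition table (counts taken directly from the post).
from math import prod

conditions = {
    "temperature steps": 10,
    "supply voltage (min/nom/max)": 3,
    "supply noise frequency (min/mid/max)": 3,
    "bias current (min/mid/max)": 3,
    "manufacturing variations": 17,
    "8-bit on-threshold codes": 256,
    "8-bit off-threshold codes": 256,
    "input amplitude steps": 20,
}

total = prod(conditions.values())
print(f"{total:,}")  # 6,016,204,800
```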
Now let’s assume it takes 1 second to simulate and analyze each combination. One second! That’s a super fast simulator. Even then, that’s 6 016 204 800 seconds of simulation time, or equivalently, 190.77 years. But that’s not all. If you remember, one of the conditions above is that the threshold settings must be accurate to within +/-5%. That can only be verified through Monte Carlo simulations, and a reasonable sample size is 100 per condition. So we’re now up to 19 077 years of non-stop, continuous simulation time.
But let’s say I was pessimistic and overestimated by three orders of magnitude. Three! OK, divide by a thousand. We’re still at 19.1 years of continuous simulation time, for one tiny little block, just to get close to zero bugs.
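The time figures above follow from the same back-of-the-envelope arithmetic (1 second per simulation, a 365-day year):

```python
# Convert the combination count into wall-clock simulation time,
# assuming 1 second per simulation and a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600

total = 6_016_204_800                 # combinations from above
years = total / SECONDS_PER_YEAR
print(f"{years:.2f} years")           # 190.77 years

monte_carlo = years * 100             # 100 Monte Carlo samples per condition
print(f"{monte_carlo:,.0f} years")    # ~19,077 years

optimistic = monte_carlo / 1000       # even if overestimated by 1000x
print(f"{optimistic:.1f} years")      # ~19.1 years
```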
Do you think management will go for that approach?
But you, as an astute engineer, will no doubt tell me that good engineers should know which simulation conditions are important and which are not; which ones can be skipped or inferred from other simulations, and which are absolutely necessary. You’d be correct. To design something in a reasonable amount of time, only a limited set of simulations can be run, and it necessarily takes an engineer’s expertise to decide what is or isn’t important to simulate. But engineers are human beings, and human beings have limitations. If management is relying on human beings to narrow 6 billion cases down to a manageable handful, it is entirely possible that a couple of important ones get missed. It is even possible for bugs to slip through that none of the 6 billion simulation cases outlined above would catch.
That’s why prototypes are needed, and why prototypes are not meant to be perfect. As for spending 25% of our time debugging in the lab? I’ll take that any day over 19 years of simulating one tiny little design. After all, not everything is as simple as a signal level detector.