I often say that troubleshooting is the main skill I possess. My day job requires interfacing with manufacturing and trying to help them solve problems from time to time (among other things). And recently while working on an issue, I’ve come upon one of the most difficult problems to solve: component level problems.
The hard part about component level problems is narrowing down and eliminating all of the other potential problems in a circuit board. There’s always temptation to look at a problematic part of a circuit and immediately declare, “These parts are no good. Get me the vendor on the phone so I can yell at them!” However, that would have you chasing ghost problems and wouldn’t make you any friends at the part manufacturer or distributor. No one likes their product being accused of being shoddy, nor do they like chasing problems you have fabricated because you were impatient.
What are some of the board level problems you might encounter and have to eliminate before talking to a part manufacturer?
- Poor design — Tricky in its own right to diagnose, a bad design usually is not bad enough by itself to cause an outright failure. Otherwise, the circuit wouldn’t have passed the testing done when the product was first made. Instead, a bad design may have been designed without regard to its sensitivity to certain critical parameters. Then when a part from a vendor shifts ever so slightly (though it could still be well within specified tolerances), the circuit breaks. Marginal designs such as these are hard to detect, and their problems are more likely to show up after the initial design is released.
- Environmental factors — Another way the marginal designs mentioned above can manifest as problems is when the parts aren’t shifting, but the environmental factors are causing issues. Dirty circuit boards can cause leakage across the contaminated pathways. If the inside of your enclosure is too warm, you might see component values shift when you weren’t expecting it. These kinds of problems cannot be considered a component problem, though components definitely are involved.
- Board population problems — Whoops! Sometimes, it happens. Someone loads up the wrong reel on the pick-and-place machine. Or an engineering change order gets botched and the wrong part is called out on the Bill of Materials (BOM). Or sometimes the part isn’t there at all or an extra part is populated. The board can look benign on visual inspection (no charred or askew components), but an extra component can cause a lot of issues by providing a current leakage path or changing the frequency response of the circuit in question.
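The marginal-design case from the list above can be made concrete with a quick worst-case check. The sketch below is only an illustration: the resistor divider, the 5 V supply, the 5% tolerance, and the 2.4 V to 2.6 V acceptance window are all hypothetical values chosen for the example, not taken from any real product.

```python
from itertools import product


def divider_output(v_in, r_top, r_bottom):
    """Output voltage of a simple two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def worst_case_range(v_in, r_top_nom, r_bottom_nom, tolerance):
    """Evaluate the divider at every tolerance corner and return (min, max).

    tolerance is a fraction, e.g. 0.05 for 5% parts. Checking only the
    corners is enough here because the divider output changes
    monotonically with each resistor value.
    """
    corners = [
        divider_output(
            v_in,
            r_top_nom * (1 + sign_top * tolerance),
            r_bottom_nom * (1 + sign_bottom * tolerance),
        )
        for sign_top, sign_bottom in product((-1, 1), repeat=2)
    ]
    return min(corners), max(corners)


# Hypothetical design: 5 V across two 10 kOhm, 5% resistors.
# The (assumed) downstream circuit needs 2.4 V to 2.6 V.
low, high = worst_case_range(5.0, 10e3, 10e3, 0.05)
design_is_marginal = low < 2.4 or high > 2.6
print(f"worst case: {low:.3f} V to {high:.3f} V, marginal: {design_is_marginal}")
```

With nominal parts this divider sits at 2.5 V and passes, but in-tolerance parts at the corners push it outside the window. That is the situation where a perfectly legitimate shift at the vendor breaks a circuit that worked during initial testing.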
Once you’ve eliminated these board-level problems and are able to narrow it down to one or two possible parts on the board that are causing issues, I would do some comparison with “known good” parts. In the simplest case, you take a decent piece of test equipment and verify the part against what the manufacturer says the value should be. So for a resistor, if the part is labeled as a 2.2kΩ and it measures 1.4kΩ on a DMM, that’s an obvious issue. Active components such as transistors might need a more complex measurement on a curve tracer, so that you can compare old and new parts. For other passives with complex parameters, you can use instruments like an impedance analyzer to determine if the part is behaving correctly over all specified frequencies. If your measurements are marginal and you can’t quite figure out if the problem part has a feature that is affecting the final product, you can swap the suspect part into a “known good” unit and then put the part you just removed from the “known good” unit into the suspect unit. If you have a complete reversal of behavior between the suspect and the “known good” units, you’ve found your culprit.
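The two checks described above (comparing a measurement against the labeled value, then reading the result of a part swap) can be written down as a small sketch. The function names and the 5% tolerance are assumptions made for illustration; use whatever tolerance the part’s datasheet actually specifies.

```python
def within_tolerance(measured, nominal, tolerance):
    """Return True if measured is within nominal +/- tolerance (a fraction)."""
    return abs(measured - nominal) <= nominal * tolerance


def swap_test_verdict(suspect_unit_passes, good_unit_passes):
    """Interpret test results taken AFTER the parts have been swapped.

    suspect_unit_passes: the originally failing unit, now holding the
                         part borrowed from the known good unit.
    good_unit_passes:    the originally good unit, now holding the
                         suspect part.
    """
    if suspect_unit_passes and not good_unit_passes:
        return "part"  # the failure followed the part: complete reversal
    if not suspect_unit_passes and good_unit_passes:
        return "board"  # the failure stayed with the original board
    return "inconclusive"  # both pass or both fail: keep digging


# The resistor from the text: labeled 2.2 kOhm, measured 1.4 kOhm.
# A 5% tolerance is assumed here purely for the example.
print(within_tolerance(1400.0, 2200.0, 0.05))
print(swap_test_verdict(suspect_unit_passes=True, good_unit_passes=False))
```

Only the complete reversal points at the part. If the failure stays with the original board, the board-level causes listed earlier deserve another look.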
Once you’ve determined that the component is at fault, you can start to dig into why these parts have gone bad. It’s still not a great idea to start shouting into your telephone at the vendor. Instead, it’s better to consider if it’s some of the following scenarios, which still absolve the vendor from fault:
- Using the part outside its specifications — Sometimes using a part outside its specs is an accident, much like the “poor design” case mentioned above. Other times, it’s done intentionally because better parts can’t be found (and perhaps you’re sorting the good from the bad). In really odd cases, design engineers rely on some internal characteristic of the part that the vendor never intended to be used on its own. An example would be using the 555 timer simply for its internal 5kΩ resistors. If you call up the chip maker and complain they’re only 4.8kΩ on your DMM, you might get laughed at. Whatever your specific misuse might be, if you use the part outside how the vendor specifies it, you’re SOL when you break it.
- Electrical stress on the parts — Breaking a part could be from misusing a part. But sometimes in outlier cases, your product might experience electrical stress that it doesn’t normally see. This could be externally applied power, power applied with the wrong polarity (backwards plugs), electrostatic discharge (ESD) and a range of other conditions that can stress the part. In some cases, it will change the behavior of a part; a semiconductor’s on-resistance might change if it sees too much heat due to electrical over-stress. In the worst case, if the heat continues past a certain point, the semiconductor (or whatever part is experiencing stress) will fail catastrophically! I’ve talked to power MOSFET manufacturers before, and they apparently get many “returns” where the customer is trying to claim the part failed…while handing over a charred chip! They catch on pretty quick.
- Mechanical stress on the parts — The reality of our current day and age is that robots are not doing the majority of the electronics construction; people are (though this may change in the next few years). This is especially true in the final integration step of putting circuit boards into cases and selling them as an end product. At any point when there is physical handling of the product, the parts on the board (which are soldered down) will experience stress and strain that can change the characteristics of the component. In other cases, it will detach bonding wires and cause some or all of the pins or contacts to malfunction.
While those cases don’t encapsulate all of the potential things that could go wrong when analyzing potential component failures, I have found these are some of the more common suspect items.
So say you’ve done all this testing and you still find that the part in question is the cause of the failure, with no outside influence on that part causing the change. In the event there are no other manufacturers of the part in question or you really like the vendor, you should start engaging the part manufacturer now. This often will kick off an internal investigation at the part manufacturer’s facility, either with a customer service group or with a quality assurance group. They will want to see your data and the tests you’ve run, and they will probably want a sample of the part in question. If you can search through your stock and find a part that is failing before you put it through your manufacturing process, you will have a much easier time pointing your finger at them. If not, the burden of proof will be on you to show that your manufacturing process has been following the proper steps in order to assure the quoted specifications.
Chasing down and isolating problem parts is one of the trickiest parts of troubleshooting electronic circuits. You will run into many dead ends and have lots of false starts. But when there is money on the line, as there is in almost all manufacturing situations, you need to act quickly to determine what’s causing failures in your products.