The bathtub curve (BTC), the reliability “passport” of an electronic product, is shaped by two major irreversible processes: the statistics-related failure rate (SFR) process, which decreases with time, and the physics-of-failure-related failure rate (PFR) process, which increases with time. The first process dominates the infant-mortality portion (IMP) of the curve, and the second dominates its wear-out portion (WOP). For many electronic products these two processes compensate for each other at the BTC's steady-state portion. The SFR process can be predicted theoretically for products comprised of mass-produced components, i.e., for typical electronic products. This can be done by assuming that the failure rates of the components received by an electronic product manufacturer from numerous vendors are random variables distributed between zero and infinity, and that the SFR and PFR processes are statistically independent. The predicted non-random SFR of the product depends, of course, on the particular probability distribution function (PDF) of the random SFRs of its components. Two PDFs for the components' random SFR have been considered in this analysis: normal (Gaussian) and Rayleigh. The normal law turned out to be more conservative: it led to a slower decrease in the SFR of the product than the Rayleigh law. Future work should include an investigation of the most realistic distributions of the random SFR for the most critical and most vulnerable components obtained from the major vendors of a particular manufacturer, for particular products and applications. The computed data indicate that the decrease in the resulting failure rate at the WOP of the BTC, owing to the favorable decrease of the non-random SFR with time, can be appreciable for highly reliable products expected to function for a long time.

Products that underwent highly accelerated life testing (HALT) [1, 2], passed the existing qualification tests (QT) and survived burn-in testing (BIT) nevertheless often exhibit premature and unexpected field failures. Are the existing reliability specifications and best practices always adequate [3]? Do electronic manufacturing industries need new approaches to qualify their products [4]? Could the operational reliability of electronic products intended for applications in which failure-free performance is imperative (aerospace, military, long-haul communications, medical, etc.) be assured if it is not quantified [5, 6]? If such a quantification is found to be necessary, could it be done on a deterministic basis, or, since nothing is perfect, is the application of probabilistic risk analysis imperative [7-15]? Isn't it true that the difference between a practically failure-free product and an insufficiently reliable one is, in effect, the difference in their never-zero probabilities of failure? If the probabilistic design for reliability (PDfR) approach is decided upon, is Zhurkov's idea [16], suggested about half a century ago for fracture mechanics problems, also applicable to electronic materials and products, whose physics of failure and loading conditions are of a very different nature? Is the Boltzmann-Arrhenius-Zhurkov (BAZ) equation, recently suggested for electronic products, and particularly its multi-parametric version, an adequate PDfR model [17-21]? These critical questions were answered more or less satisfactorily in the referenced publications.

The analysis that follows addresses the information provided by the BTC, the reliability “passport” of an electronic product. This curve reflects, as is known, the input of two major governing irreversible processes: the “favorable” statistics-of-failure process, which results in a reduced failure rate with time, and the “unfavorable” physics-of-failure (aging, degradation) process, which leads to an increased failure rate. Could these two critical processes be separated? The need for that is due to the obvious incentive for understanding, quantifying and minimizing the role of materials degradation and aging, especially for products like lasers and materials like solders, which are characterized by long WOPs and are supposed to function for a long time. Clearly, a reliability engineer wants to better understand the favorable role of the SFR process and the unfavorable, but inevitable, role of the physics-of-failure (degradation) process and, if possible, to slow the latter down.

There is also another practically important and perhaps less obvious aspect of these two processes, which arises when the product of interest is fabricated from numerous components obtained from numerous vendors. It is usually assumed that the governing accelerated test model is also applicable in field conditions. One just has to consider the actual, much lower, operational loading (stress) level. The more favorable state of loading is, however, not the only difference between the testing and the field conditions. An additional favorable effect has to do with the significantly larger number of mass-produced components in the actual product of interest, compared with the relatively small number of components in the product(s) subjected to accelerated tests such as HALT or failure-oriented accelerated testing (FOAT). How could this effect be considered for a product comprised of mass-produced components obtained by the product manufacturer from different vendors with different attitudes to the reliability of their components? In the analysis that follows we try to answer this question. Note that it was initially and partially addressed in Ref. 22.

Our analysis is based on the following major assumptions:

  • The non-random SFR of the product can be evaluated from the known (assumed or determined) random SFR of the components comprising this product.

  • The SFRs of these components, obtained from numerous vendors, are random variables distributed between zero and infinity.

  • The exponential law of reliability can be applied to predict the SFR-related probability of non-failure of the product.

Two realistic distributions of the components' SFR have been considered: normal (Gaussian) and Rayleigh.

A. Typical bathtub curve (BTC)

The typical bathtub curve (Fig.1) could be approximated analytically as follows:

Here λ(t) is the time-dependent failure rate, λ0 is its “steady-state” minimum, λ1 is its initial (high) value at the beginning of the infant-mortality portion, t1 is the duration of this portion, λ2 is the final value of the failure rate at the end of the wear-out portion, t2 is the duration of this portion, and the exponents n1 and n2 are expressed through the fullnesses β1 and β2 of the BTC infant-mortality and wear-out portions as n1 = β1/(1 − β1) and n2 = β2/(1 − β2). These fullnesses are defined as the ratios of the areas above the BTC to the areas (λ1 − λ0)t1 and (λ2 − λ0)t2 of the corresponding rectangles. The exponents n1 and n2 change from 1 to infinity when the fullnesses β1 and β2 change from 0.5 (triangle) to 1 (rectangle). The lowest λ(t) values are achieved for the largest β1 and β2 (or n1 and n2) values.
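The relationship between fullness and exponent can be checked with a minimal sketch. The piecewise power-law form used below is an assumption, chosen because it reproduces the stated fullness definition (β = n/(n + 1) for each portion); the function names and the input values in the usage lines are hypothetical and are not the data of the numerical examples.

```python
def fullness_to_exponent(beta: float) -> float:
    """Exponent n for a fullness beta in [0.5, 1): n = beta / (1 - beta)."""
    if not 0.5 <= beta < 1.0:
        raise ValueError("fullness must lie in [0.5, 1)")
    return beta / (1.0 - beta)


def bathtub_rate(t, lam0, lam1, lam2, t1, t2, beta1, beta2):
    """Assumed power-law bathtub curve.

    The rate falls from lam1 to lam0 over [0, t1] and then rises from
    lam0 to lam2 over [t1, t1 + t2].
    """
    if not 0.0 <= t <= t1 + t2:
        raise ValueError("t is outside the modelled lifetime")
    n1 = fullness_to_exponent(beta1)
    n2 = fullness_to_exponent(beta2)
    if t <= t1:
        return lam0 + (lam1 - lam0) * (1.0 - t / t1) ** n1
    return lam0 + (lam2 - lam0) * ((t - t1) / t2) ** n2


# Hypothetical inputs (rates in 1/h, times in h), for illustration only.
args = dict(lam0=1.0e-4, lam1=8.0e-4, lam2=2.0e-3,
            t1=48.0, t2=39952.0, beta1=0.8, beta2=0.75)
for t in (0.0, 24.0, 48.0, 20000.0, 40000.0):
    print(t, bathtub_rate(t, **args))
```

A fullness of 0.5 gives n = 1 (a straight line, i.e., a triangle above the curve), and the exponent grows without bound as the fullness approaches 1.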

B. Statistical Failure Rate (SFR)

Let an electronic product manufacturer receive the components for this product from n independent vendors. The probability of non-failure for the product, assuming that the exponential law of reliability is applicable, is

$$P(t) = \sum_{k=1}^{n} p_k \exp(-\lambda_k t), \qquad (2)$$

where pk is the fraction of the components received from the k-th vendor, exp(−λkt) is the probability of non-failure of these components, λk is their (random) failure rate, which can be any quantity between zero and infinity, and t is time. The sum in the formula (2) can be substituted, in accordance with the ergodic property of random processes (see, e.g., [8]) and for a large number n of mass-produced components, by the integral:

$$P(t) = \int_0^\infty \exp(-\lambda t)\, dF(\lambda) = \int_0^\infty f(\lambda) \exp(-\lambda t)\, d\lambda. \qquad (3)$$

Here F(λ) is the probability distribution function and f(λ) = dF(λ)/dλ is the probability density function of the random failure rate λ. The probability of non-failure for the product can be determined by the (non-random) SFR as

This statistical failure rate is defined as the ratio of the rate of change of the number Nf(t) of products that failed by the time t to the number Ns(t) of products that remained sound by this time. Substituting, in accordance with the ergodic theorem, the number of sound items with the probability P(t) of non-failure and the number of failed items with the probability Q(t) = 1 − P(t) of failure, the formula (4) can be written as

Considering (3), this expression results in the following formula for the product's non-random SFR:

Computations indicate that the failure rate λST(t) of the product indeed decreases with time. The formula (6) also indicates that the non-random time-dependent failure rate λST(t) of the product depends on the probability distribution of the random and presumably time-independent failure rate λ of a particular component. In the special case when the probability density function f(λ) is constant (the random failure rate is uniformly distributed over its range), the formula (6) yields:

This simple result does not, however, look realistic: the non-random SFR decreases too rapidly. Therefore, two more realistic distributions of the random SFR are considered in the analysis that follows: normal (Gaussian) and Rayleigh.
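The decrease of the product's SFR with time can be illustrated numerically. The sketch below assumes that the non-random SFR is the hazard rate −P'(t)/P(t) of the mixture, i.e., the ratio of the integrals of λ f(λ) exp(−λt) and f(λ) exp(−λt); the function names, the three-vendor mix and the integration settings are hypothetical.

```python
import math


def mixture_sfr(fractions, rates, t):
    """SFR of a finite vendor mix: -P'(t)/P(t), P(t) = sum p_k exp(-lam_k t)."""
    weights = [p * math.exp(-lam * t) for p, lam in zip(fractions, rates)]
    return sum(w * lam for w, lam in zip(weights, rates)) / sum(weights)


def density_sfr(density, t, lam_max, steps=20000):
    """The same ratio for a continuous density f(lam).

    Uses the trapezoidal rule on [0, lam_max]; lam_max must be large
    enough for exp(-lam * t) to have died out at the upper limit.
    """
    h = lam_max / steps
    num = den = 0.0
    for i in range(steps + 1):
        lam = i * h
        end_weight = 0.5 if i in (0, steps) else 1.0
        w = end_weight * density(lam) * math.exp(-lam * t)
        num += w * lam
        den += w
    return num / den


# Hypothetical three-vendor mix (fractions and rates are illustrative only).
p = [0.5, 0.3, 0.2]
lam = [1.0e-4, 5.0e-4, 2.0e-3]  # 1/h
print(mixture_sfr(p, lam, 0.0))     # weighted mean rate at t = 0
print(mixture_sfr(p, lam, 5000.0))  # lower: weak populations have thinned out

# Constant density: the product t * SFR should be close to one.
t = 100.0
print(t * density_sfr(lambda x: 1.0, t, lam_max=50.0 / t))
```

For a constant density the ratio behaves like 1/t, which is the rapid decrease that motivates the more realistic distributions considered next.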

C. Components' random SFR is normally distributed

In the expression for the probability density function

$$f(\lambda) = \frac{1}{\sqrt{2\pi D}} \exp\left[-\frac{(\lambda - \bar{\lambda})^2}{2D}\right], \qquad (8)$$

λ̄ is the mean value of the random SFR λ, and D is its variance. The effective SFR λST(t) of the product can be found as a function of time by substituting the distribution (8) into the basic formula (6). Using [23], the following formula can be obtained:

where the function

depends on the dimensionless time

and so do the auxiliary function

and the probability integral (Laplace function)

The function ϕ(ξ) is tabulated in Table 1.

The formula (11) indicates that the “physical” (effective) time ξ affecting the non-random SFR of the product depends not only on the “chronological” (actual) time t, but also on the probabilistic characteristics (mean and variance) of the distribution of the components' random SFR. As follows from the formula (11), the rate of change of the dimensionless effective time ξ with the actual time t is proportional to the standard deviation of the components' random SFR, i.e., the effective time changes faster the larger this standard deviation is. The asymptotic expansion (12) of the auxiliary function φ̄(ξ) can be used to calculate this function for large ξ values, exceeding, say, 2.5. This expansion was employed when calculating the Table 1 data for the function ϕ(ξ) determined by the formula (10). As evident from the formula (12), the auxiliary function φ̄(ξ) changes from infinity to zero when the effective time ξ changes from −∞ to ∞. For effective times below −2.5 the function φ̄(ξ) is large, so that the second term in (10) becomes small compared to the first term, and the function ϕ(ξ) coincides with the dimensionless time ξ itself, taken with the opposite sign.
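The behavior described above can be reproduced with a short sketch. It rests on assumptions: the closed forms below were obtained by completing the square in the ratio of integrals for a normal density truncated at zero, with ξ = (Dt − λ̄)/√(2D), φ̄(ξ) = √π exp(ξ²) erfc(ξ) and ϕ(ξ) = −ξ + 1/φ̄(ξ). They match the limiting behavior stated in the text, but the normalization of the tabulated functions may differ, and the function names are hypothetical.

```python
import math


def phi_bar(xi: float) -> float:
    """Assumed auxiliary function: sqrt(pi) * exp(xi^2) * erfc(xi)."""
    if xi > 25.0:
        # Leading terms of the standard large-argument expansion of erfc,
        # used to avoid overflow of exp(xi^2) and underflow of erfc(xi).
        x2 = xi * xi
        return (1.0 - 1.0 / (2.0 * x2) + 3.0 / (4.0 * x2 * x2)) / xi
    return math.sqrt(math.pi) * math.exp(xi * xi) * math.erfc(xi)


def phi(xi: float) -> float:
    """Assumed dimensionless SFR: phi(xi) = -xi + 1 / phi_bar(xi)."""
    if xi < -6.0:
        return -xi  # 1 / phi_bar(xi) is below about 1e-16 here
    return -xi + 1.0 / phi_bar(xi)


def normal_sfr(t: float, mean: float, variance: float) -> float:
    """SFR of the product for a normal component rate truncated at zero."""
    scale = math.sqrt(2.0 * variance)
    xi = (variance * t - mean) / scale  # dimensionless effective time
    return scale * phi(xi)


for xi in (-5.0, -2.5, 0.0, 2.5, 5.0):
    print(xi, phi_bar(xi), phi(xi))
```

In this sketch ϕ(ξ) is practically equal to −ξ for ξ below −2.5 and decays slowly for large positive ξ, in line with the description of the tabulated function.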

At the initial moment of time (t = 0) the formulas (11), (12) and (9) yield:

Considering that the initial SFR value of the product is λST = λ1 (see notation in Fig.1) and that the degradation failure rate λDG at the initial moment of time is zero, the third formula in (14) yields:

This relationship indicates that the effective initial value of the BTC can be established based on the mean value and the variance of the distribution (8) and is tabulated in Table 2. When the ratio changes from zero to infinity, the ratio changes from to infinity. The initial failure rate can be put equal to its mean value if the ratio exceeds 2.5. This is usually the case, since the normal distribution, when applied to a random variable that physically cannot be negative, should be characterized by a large ratio of its mean value to its standard deviation, so that the negative values of such a distribution, although they formally exist, do not contribute appreciably to the sought information.

D. Numerical example #1

The numerical example in Table 3 is carried out with the following input data: mean value (initial value) factor of the random failure rate of the component population: ; standard deviation of the random failure rate of the component population: ; initial non-random failure rate of the product: λ1 = 8.4853×10⁻⁴ h⁻¹; the lowest failure rate of the product: λ0 = 9.6000×10⁻⁴ h⁻¹; the highest (allowable) failure rate of the product: λ2 = 19.8×10⁻⁴ h⁻¹; the duration of the infant-mortality portion (burn-in time) for the product: t1 = 48 h; the duration of the wear-out portion of the product's BTC: t2 = 39,952 h (obtained as the difference between the total operation time of 40,000 h and the duration of the infant-mortality portion); the “fullness” of the infant-mortality portion of the product's BTC: β1 = 0.8 (n1 = 4); the “fullness” of its wear-out portion: β2 = 0.75 (n2 = 3).

The obtained data indicate that the statistical probability of non-failure (based on the non-random SFR of the product and the exponential law of reliability) decreases with time over a significant portion of the product's lifetime, despite the decrease in the SFR of the product. At some moment of time (at about 10,000 hours in this example), the effect of the decreasing SFR starts to prevail, and the SFR-related probability of non-failure begins to increase with time. This circumstance does not, however, play an important role in this example, because the “unfavorable” degradation (reliability-physics-related) failure rates become significant and suppress the “favorable” increase in the probability of non-failure associated with the SFR. It should be emphasized, however, that although the increase in the SFR-related probability of non-failure turned out to be relatively small in this example, the situation might be quite different with other input data and for products expected to function for long periods of time.

It is also noteworthy that for short times at the beginning of the infant-mortality (burn-in) process, the function φ̄(ξ) is significant, and the second term in the formula (10) is small compared to the first term. In such a situation the linear formula λST = λ1 − Dt can be used to evaluate the SFR. Indeed, this simplified formula predicts that the SFR at the end of the infant-mortality time is λST = 84.834×10⁻⁵ h⁻¹. The exact SFR value is λST = 84.668×10⁻⁵ h⁻¹, i.e., only 0.2% lower.
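The two observations above (the accuracy of the linear formula at short times, and the eventual increase of the SFR-related probability of non-failure) can be explored with the following sketch. It assumes the closed form for a normal density truncated at zero and takes the probability of non-failure as exp(−λST(t)·t). The mean is set equal to the quoted λ1, while the variance is a hypothetical value chosen for illustration only, so the printed numbers are not expected to reproduce Table 3.

```python
import math


def normal_sfr(t, mean, variance):
    """SFR for a normal component rate truncated at zero (assumed closed form).

    Intended for moderate arguments: erfc(xi) underflows for xi above about 26.
    """
    scale = math.sqrt(2.0 * variance)
    xi = (variance * t - mean) / scale
    tail = math.exp(-xi * xi) / (math.sqrt(math.pi) * math.erfc(xi))
    return scale * (-xi + tail)


def linear_sfr(t, lam1, variance):
    """Short-time approximation: lam1 - D * t."""
    return lam1 - variance * t


def sfr_survival(t, mean, variance):
    """SFR-related probability of non-failure, exp(-lam_ST(t) * t)."""
    return math.exp(-normal_sfr(t, mean, variance) * t)


# MEAN equals the quoted lam1; VAR is a hypothetical variance.
MEAN, VAR = 8.4853e-4, 4.0e-8  # 1/h and 1/h^2

exact = normal_sfr(48.0, MEAN, VAR)
approx = linear_sfr(48.0, MEAN, VAR)
print(f"t = 48 h: closed form {exact:.5e}, linear {approx:.5e}")

# Scan the lifetime for the moment at which the probability stops decreasing.
times = range(0, 40001, 100)
t_min = min(times, key=lambda t: sfr_survival(t, MEAN, VAR))
print("probability of non-failure is lowest near t =", t_min, "h")
```

With these assumed inputs the turning point falls somewhat above 10,000 hours; a different variance would shift it, since under the linear approximation the exponent λST·t peaks at λ1/(2D).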

E. Components' random SFR is distributed in accordance with the Rayleigh law

Assume now that the random SFR of the components is distributed in accordance with the Rayleigh law

Then the formula (6) yields:

Here

is a function of the dimensionless time

and so are the auxiliary function

and the probability integral (Laplace function) Φ(ξ) defined by the formula (13).

F. Numerical example #2

The calculations using the same input data as in the case of the normally distributed SFR are carried out in Table 4.

Calculations indicate that the normal distribution of the actual (“instantaneous”) SFR results in lower probabilities of non-failure than the Rayleigh law and should therefore be preferred in engineering analyses, unless more reliable SFR-related information becomes available.
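A corresponding sketch for the Rayleigh case is given below. It assumes the common parameterization f(λ) = (λ/σ²) exp(−λ²/(2σ²)), which may differ from the one behind Table 4, and again takes the SFR as −P'(t)/P(t). Integration by parts then gives P(t) = 1 − ξφ̄(ξ) with ξ = σt/√2 and φ̄(ξ) = √π exp(ξ²) erfc(ξ). The function name and the value of σ are hypothetical.

```python
import math


def rayleigh_sfr(t: float, sigma: float) -> float:
    """SFR -P'(t)/P(t) for Rayleigh-distributed component rates.

    Assumed density: f(lam) = (lam / sigma^2) * exp(-lam^2 / (2 sigma^2)).
    Intended for dimensionless times xi up to about 10; beyond that the
    subtractions below lose accuracy.
    """
    d = sigma * sigma
    xi = t * math.sqrt(d / 2.0)  # dimensionless time
    pb = math.sqrt(math.pi) * math.exp(xi * xi) * math.erfc(xi)
    survival = 1.0 - xi * pb     # P(t), from integration by parts
    return math.sqrt(d / 2.0) * pb / survival - d * t


SIGMA = 2.0e-4  # hypothetical scale parameter, 1/h
for t in (0.0, 1000.0, 5000.0, 20000.0):
    print(t, rayleigh_sfr(t, SIGMA))
```

At t = 0 the sketch returns the Rayleigh mean σ√(π/2), and the rate then decreases with time; comparing it with the normal-law result requires the same input data for both distributions.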

G. Wear-out probability of failure

The second segment of a typical BTC for an electronic product is known to be steady-state. The failure rate is assumed to be more or less constant at this segment, and this is confirmed by numerous experiments. It has been established also that the steady-state segment of the BTC is characterized primarily by instantaneous random failures. Their occurrence could be adequately described by the one-parametric exponential law of reliability:

that establishes the probability of non-failure of a product at the moment t of operation. In the formula (21), τ1 is the mean-time-to-failure (MTTF), and its reciprocal 1/τ1 is the steady-state failure rate. The MTTF can be defined as the moment of time t = τ1 when the entropy H(P) = −P ln P of the distribution (21) reaches its maximum value 1/e. The wear-out portion of the BTC, on the other hand, is characterized primarily by continuous and accelerated physical degradation (aging) of the product. The wear-out failures are described by the two-parametric normal distribution:

Here τ2 = τ2(t) is the MTTF, Dσ is the variance of the applied load σ, and

is the (tabulated) Laplace function. Clearly, the log-normal law of reliability can also be used in this case.
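The entropy-based definition of the MTTF for the steady-state portion can be verified with a minimal sketch, assuming the one-parametric exponential law P(t) = exp(−t/τ1). The value of τ1 and the time grid are hypothetical.

```python
import math


def survival(t: float, tau: float) -> float:
    """One-parametric exponential law: P(t) = exp(-t / tau)."""
    return math.exp(-t / tau)


def entropy(p: float) -> float:
    """H(P) = -P ln P, with H(0) taken as its limit, zero."""
    return 0.0 if p <= 0.0 else -p * math.log(p)


tau = 1000.0                               # hypothetical MTTF, hours
grid = [10.0 * i for i in range(1, 501)]   # 10 h to 5000 h in 10 h steps
t_peak = max(grid, key=lambda t: entropy(survival(t, tau)))
print("entropy peaks at t =", t_peak, "h")
print("peak entropy:", entropy(survival(t_peak, tau)), "vs 1/e =", math.exp(-1.0))
```

Since H = (t/τ1) exp(−t/τ1) under this law, the maximum is reached at t = τ1, where P = 1/e.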

The MTTF τ1, when the failures are instantaneous and random, is typically assumed to be time independent, while the MTTF τ2 = τ2(t), when the failures are caused by the degradation process, is time dependent and decreases when time progresses. Generally, both types of failures could possibly occur in a particular product at any stage of its operation. The only difference is the likelihood of a particular failure mode: aging related failures are less likely at the steady-state period of the product's operation, while instantaneous failures are less likely at the wear-out portion of the BTC. When both modes could possibly occur and could be assumed statistically independent, the probability of non-failure can be evaluated as

The calculated data indicate that although the failure rate of the SFR process decreases with time, the corresponding probability of non-failure determined using the exponential law of reliability still decreases with time during most of the operation time, and this circumstance should be considered when evaluating the probability of non-failure due to the combined action of the two addressed irreversible processes. It is only for very long times in operation that the probability of non-failure of the SFR process starts increasing with time, owing to the very low statistical failure rate. This circumstance might not, however, change the situation, since the probabilities of non-failure associated with the degradation process become very low, while the total probability of non-failure is determined as the product of the probabilities of non-failure of the two addressed time-dependent governing processes. The calculated data also indicate that the product's SFR is strongly dependent on the probability distribution of the random failure rate of its components, and that the normal law is more conservative, i.e., leads to appreciably lower predicted SFR-related probabilities of non-failure than the Rayleigh law, especially for long times in operation. Future work should therefore include the analysis of the most realistic distributions of the random SFRs for the most vulnerable components obtained from the major vendors of a particular manufacturer, for particular products and applications.

[1] Suhir, E., Reliability and accelerated life testing, Semiconductor International, Feb. 2005.
[2] Suhir, E., Bensoussan, A., Nicolics, J., Bechou, L., Highly accelerated life testing (HALT), failure oriented accelerated testing (FOAT), and their role in making a viable device into a reliable product, IEEE Aerospace Conference, Big Sky, Montana, March 2014.
[3] Suhir, E., Mahajan, R., Are current qualification practices adequate?, Circuit Assembly, April 2011.
[4] Suhir, E., Assuring aerospace electronics and photonics reliability: what could and should be done differently, 2013 IEEE Aerospace Conference, Big Sky, Montana, March 2013.
[5] Suhir, E., Could electronics reliability be predicted, quantified and assured?, Microelectronics Reliability, No. 53, April 15, 2013.
[6] Suhir, E., Electronics reliability cannot be assured, if it is not quantified, Chip Scale Reviews, March-April, 2014.
[7] Suhir, E. and Poborets, B., Solder glass attachment in Cerdip/Cerquad packages: thermally induced stresses and mechanical reliability, 40th Elect. Comp. and Techn. Conf., Las Vegas, Nevada, May 1990.
[8] Suhir, E., Applied probability for engineers and scientists, McGraw-Hill, New York, 1997.
[9] Suhir, E., Probabilistic design for reliability, Chip Scale Reviews, vol. 14, No. 6, 2010.
[10] Suhir, E., Mahajan, R., Lucero, A., Bechou, L., Probabilistic design for reliability (PDfR) and a novel approach to qualification testing (QT), IEEE/AIAA Aerospace Conf., Big Sky, Montana, March 2012.
[11] Suhir, E., Predicted reliability of aerospace electronics: application of two advanced probabilistic techniques, IEEE Aerospace Conference, Big Sky, Montana, March 2013.
[12] Suhir, E., Reliability physics and probabilistic design for reliability (PDfR): role, attributes, challenges, EPTC 2014, Singapore, Nov. 2014.
[13] Suhir, E. and Yi, S., Probabilistic design for reliability of medical electronic devices: role, significance, attributes, challenges, IEEE Medical Electronics Symp., Portland, OR, Sept. 14-15, 2016.
[14] Suhir, E. and Ghaffarian, R., Probabilistic Palmgren-Miner rule with application to solder materials experiencing elastic deformations, Journal of Materials Science: Materials in Electronics, vol. 28, No. 3, 2017.
[15] Suhir, E. and Yi, S., Probabilistic design for reliability (PDfR) of medical electronic devices (MEDs): when reliability is imperative, ability to quantify it is a must, Journal of SMT, vol. 30, Issue 1, 2017.
[16] Zhurkov, S.N., Kinetic concept of the strength of solids, Int. J. of Fracture Mechanics, vol. 1, No. 4, 1965.
[17] Suhir, E., Bechou, L., Bensoussan, A., Technical diagnostics in electronics: application of Bayes formula and Boltzmann-Arrhenius-Zhurkov model, Circuit Assembly, December 3, 2012.
[18] Suhir, E. and Kang, S., Boltzmann-Arrhenius-Zhurkov (BAZ) model in physics-of-materials problems, Modern Physics Letters B (MPLB), vol. 27, April 2013.
[19] Suhir, E., Three-step concept in modeling reliability: Boltzmann-Arrhenius-Zhurkov physics-of-failure-based equation sandwiched between two statistical models, Microelectronics Reliability, Oct. 2014.
[20] Suhir, E. and Bensoussan, A., Application of multi-parametric BAZ model in aerospace optoelectronics, IEEE Aerospace Conference, Big Sky, Montana, March 2014.
[21] Suhir, E., Nicolics, J., and Yi, S., Probabilistic predictive modeling (PPM) of aerospace electronics (AE) reliability: prognostic-and-health-monitoring (PHM) effort using Bayes formula (BF), Boltzmann-Arrhenius-Zhurkov (BAZ) equation and beta-distribution (BD), EuroSimE Conf., Montpellier, France, 2016.
[22] Suhir, E., Statistics- and reliability-physics-related failure processes, Modern Physics Letters B (MPLB), vol. 28, No. 13, 2014.
[23] Gradshteyn, I.S. and Ryzhik, I.M., Tables of integrals, series, and products, Academic Press, 1980.