Prof. David Coit Keynote Speech at SRSE 2023, Beijing, China

Prof. David Coit from Rutgers University presents a keynote speech at the System Reliability & Safety Engineering (SRSE) conference, held October 20–23, 2023 in Beijing, China. Prof. Coit’s speech is titled ‘System Reliability Models with Clusters of Dependent Degrading Components’.

Hello everyone, I’m Dave Coit from Rutgers University in New Jersey, in the USA. I’m speaking to you via video from the USA, but to tell you the truth, I’d really prefer to be in Beijing right now. Probably some of the audience knows me, some very well, some not at all.

But those who know me realize that Beijing and Tsinghua University are like a second home; I feel that way. Let me give some examples. There I am in 2014. I’ve given invited seminars several times: 2008, 2014, 2019. The first time, in 2008, I was invited to Tsinghua by Kaibo Wang. Thank you, Professor Wang. I also was part of the IQR meetings and presented in 2017, in 2019 in Shenzhen, and in 2023, this year, remotely. There I am in 2017. It looks like I’m making a very important point, although M, on the left, I think he’s reading a newspaper or something. Next to him, though, we see Professor Nozer Singpurwalla. Some of you know Professor Singpurwalla died this year. He was really a great man and a leader in our field, and so here’s to you, Professor Singpurwalla.

Most importantly, I was a visiting professor in the industrial engineering department at Tsinghua from 2019 to 2022, and there I am at the appointment ceremony with Professor Yan-Fu Li. Thank you, Professor Li; I wish I was there right now.

As I said, I’m from Rutgers University. Rutgers University is in New Jersey, which is circled in red. Rutgers is one of the very oldest universities in the United States, and, in the Northeast anyway, in terms of enrollment it’s one of the very largest universities.

Let’s begin my talk now. All of engineering is based on assumptions: often general physics principles, but then assumptions are made. We need to make assumptions to move on. I started my engineering career by getting a mechanical engineering degree at Cornell University. In mechanical engineering we learned things like point masses, frictionless pulleys, and springs where the spring force is proportional to displacement.

These are assumptions that are made. Are they actually true? Well, yes and no. They’re certainly, at worst, very close approximations, and we need to make assumptions to build upon and to advance. In reliability we also make assumptions, often. Maybe a student learning reliability for the first time, or someone in industry: what are some assumptions? One is homogeneous populations, homogeneous time-to-failure populations. Another is independent failure times: if you have a complex system with different components, an assumption is often made of independent failure times. Are those good assumptions, homogeneous populations and independent failures? Often yes, but not always. My talk today focuses on the cases where they’re not: sometimes the failure times or the degradation processes are not independent, and that’s what I’m going to talk about today.

The word dependence, in an engineering sense, means many things to many people. As I said, I started in mechanical engineering. If you talk to mechanical engineers about failure processes being dependent, they think of a physical dependence: degradation of this causes degradation of that, a cause-and-effect relationship. That’s fine. There’s also economic dependence. What is that? An example would be opportunistic maintenance, where we’re going to repair one item and have an opportunity to repair another with a cost savings. Or the reverse: economic dependence could be when money or resources used to fix one component can’t be used on another. But what I’m referring to is stochastic dependence. What is that? You have two random variables, in this instance degradation at some fixed time t, and these random variables have some dependence relationship, typically measured by a covariance or correlation coefficient. My talk today is on clusters of dependent components, and that would be an example of stochastic dependence.

Now, do we care? Again, all of engineering is built on assumptions; maybe they’re not exactly right, but that’s what assumptions are. Well, consider this simple parallel system with two components. Look at the graph: there’s some sort of degradation process going on. When the degradation path reaches that horizontal line, it’s a failure, and the failure times follow some distribution. This is a hypothetical example. Both of the failure time distributions are normally distributed with a fixed variance, but I’m going to vary the covariance between T1 and T2, failure time 1 and failure time 2. First, a covariance of essentially zero, or correlation of zero. This is a graph of T1 versus T2, failure time of component 1 versus failure time of component 2. But let’s increase that covariance, and increase it again. Now the dependence relationship is very clear. This, by the way, is Monte Carlo simulation based on those distributions. Now let’s increase it even one more time, probably unrealistically, but now the dependence relationship couldn’t be more clear.

Let’s think of another example. Probably many of you know about load sharing, and if you’ve learned about load sharing, often the professor will give this example of an elevator with three cables. If one of the cables breaks, right there, look at the animation (let’s do it again), then the remaining ones have to share the load, and it decreases the reliability. And again, which means the remaining cable needs to carry all the load. That is load sharing. The way we’re going to model this, and again you can look at the graph, is that we have a degradation process, but we’re going to model the failure times here with a Weibull distribution. What we’re going to do is define some parameter alpha: the Weibull scale parameter of the remaining two cables will decrease by alpha when one fails, and any of the three could fail first. When the second one fails, the Weibull scale parameter will decrease by alpha again.

These are three graphs: T1 versus T2, T1 versus T3, T2 versus T3. This is with an alpha of 0.1. If there’s any pattern there, or any dependence, you can’t see it. Let’s increase alpha. Again, a little bit, certainly in the first graph. Again: now we can see a clear dependence relationship. We’ll increase it one more time, probably unrealistically, but now we see very clearly a pattern: the failure times have a dependence relationship. If one is going to fail early, they all will. Now, do we care? How is that going to influence our decisions? Yes, we do care.
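
The three-cable load-sharing example can be sketched with a small Monte Carlo simulation. This is only an illustration under stated assumptions, not the model behind the slides: "decrease by alpha" is read here as multiplying the Weibull scale by (1 - alpha) after each failure, the remaining life after a failure is handled with a cumulative-exposure rule (a smaller scale is treated as a faster clock), and the shape of 2 and scale of 1000 are hypothetical values.

```python
import numpy as np

def simulate_load_sharing(alpha, n_runs=20000, shape=2.0, scale=1000.0, seed=0):
    """Simulate failure times of three load-sharing cables.

    Assumption: after each failure the Weibull scale of the surviving
    cables is multiplied by (1 - alpha), applied as a time acceleration.
    """
    rng = np.random.default_rng(seed)
    # Baseline (no load sharing) lifetimes, one row per simulation run.
    base = scale * rng.weibull(shape, size=(n_runs, 3))
    order = np.argsort(base, axis=1)
    b = np.take_along_axis(base, order, axis=1)  # sorted baseline lifetimes
    f = 1.0 - alpha
    t_sorted = np.empty_like(b)
    t_sorted[:, 0] = b[:, 0]                                        # first failure
    t_sorted[:, 1] = t_sorted[:, 0] + (b[:, 1] - b[:, 0]) * f       # scale reduced once
    t_sorted[:, 2] = t_sorted[:, 1] + (b[:, 2] - b[:, 1]) * f ** 2  # scale reduced twice
    # Put the failure times back in cable order (cable 1, 2, 3).
    times = np.empty_like(b)
    np.put_along_axis(times, order, t_sorted, axis=1)
    return times

def pair_correlation(alpha):
    """Correlation between the failure times of cable 1 and cable 2."""
    times = simulate_load_sharing(alpha)
    return np.corrcoef(times[:, 0], times[:, 1])[0, 1]

for a in (0.0, 0.1, 0.5, 0.9):
    print(f"alpha={a:.1f}  corr(T1, T2)={pair_correlation(a):.3f}")
```

With alpha equal to zero the cables are independent and the correlation is near zero; as alpha grows, an early first failure pulls the other failure times down with it, which is the pattern described in the scatter plots.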

If we have a parallel system and we ignore the dependence, our reliability assessment will be optimistic: we’ll think the system is more reliable than it is. On the other hand, for a series system we’ll be pessimistic. And of course a series-parallel or some other structure could be either. So it is important to consider the dependence, or the covariance.

Now let’s go even further. Those of you who know me, and some of the audience does, know I work in system reliability optimization. So consider this formulation: I want to maximize system reliability, and we have several constraints.
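
The optimistic and pessimistic effect of ignoring dependence can be checked numerically. The sketch below is an illustration only: it assumes two components with normally distributed failure times (hypothetical mean 1000 and standard deviation 200), a chosen correlation rho, and a mission time of 1000, and estimates parallel and series reliability by Monte Carlo.

```python
import numpy as np

def system_reliability(rho, t=1000.0, mean=1000.0, sd=200.0, n_runs=100000, seed=1):
    """Monte Carlo reliability at time t for a two-component system.

    Returns (parallel, series) reliability when the two failure times are
    bivariate normal with correlation rho (hypothetical parameters).
    """
    rng = np.random.default_rng(seed)
    cov = sd ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    times = rng.multivariate_normal([mean, mean], cov, size=n_runs)
    parallel = np.mean(times.max(axis=1) > t)  # works while at least one survives
    series = np.mean(times.min(axis=1) > t)    # works only while both survive
    return parallel, series

par_indep, ser_indep = system_reliability(rho=0.0)
par_dep, ser_dep = system_reliability(rho=0.8)
print(f"parallel: assumed independent {par_indep:.3f}, with dependence {par_dep:.3f}")
print(f"series:   assumed independent {ser_indep:.3f}, with dependence {ser_dep:.3f}")
```

Under these assumptions the independent calculation overstates the parallel reliability and understates the series reliability, relative to the positively correlated case.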

We have some decision variables given by a vector x; those would be which components to choose and levels of redundancy. If we ignore the dependence, we might get the system design on the upper right: lots of redundancy, with potentially less reliable components. The size of the rectangle represents how reliable a component is. As we consider dependence more and more, we’ll get designs like the one on the bottom: less redundancy, but more reliable components. So if you’re maximizing system reliability and ignoring dependence when in fact it’s present, you’ll end up with an optimal design like the top when you want the one on the bottom.

Let’s review. What do I mean? You’ve already seen a few graphs like this. Consider the y-axis, capital X, degradation. What is degradation? We’re using the term in a generic sense: any observable and measurable deterioration of performance. This shows it increasing; it could in fact be decreasing. You see the different lines; those are different degradation paths. These are random; they’re not all the same. Eventually they reach that horizontal line labeled the failure threshold, and when that occurs, a failure occurs. You perhaps see those three things that look like normal distributions. Imagine them being rotated so they’re facing directly at you; each would be a distribution of degradation at some fixed time t, and the part of that distribution that is beyond the failure threshold line would be the probability of failure for that time t.

Now, who gets credit? Many people, but in my opinion Professor Bill Meeker receives my respect and credit for popularizing, and doing a lot of the original work on, degradation modeling. He’s been a professor at Iowa State since 1975. I saw a notice somewhere that he’s retiring this year, so it’s almost 50 years. I haven’t confirmed that, but I think if Professor Meeker does retire, it will be kind of like when Way Kuo retires: it just means he’ll work just as hard, hopefully doing more things he likes. But Meeker wrote the first papers, or at least the first papers I was aware of; of course, I wasn’t really actively researching this at the time. I asked him once, ‘What gave you that idea?’ and he said, ‘I was a young professor working at Bell Labs, and I simply was given the task of assessing the reliability of items, laser items, that never failed, but they degraded.’ And like many good research papers, at least from my observation, for maybe the first 10 or 15 years his original work on degradation didn’t really generate too much excitement. But eventually the engineers learned it, and it received lots of attention.

What are some specific examples of degradation? Well, many, as you can see here; you can read the list. Here’s one: this is metal corrosion. These bars you see are called rebars. What’s a rebar? A rebar is encased in concrete and used to build bridges, the same bridges you drive on or see every day. The rebars corrode, though. The concrete needs to crack first, but eventually it will, and contamination will occur. In the Rutgers civil engineering laboratory we are testing many, many rebars of different materials in a quest to produce more reliable bridges. The picture on the far right is Jen Hong Lee, a PhD student right now from Chengdu, who is doing his PhD on reliability. You see him there; each one of those rectangular concrete blocks has a rebar in it. Now, the great thing about working with civil engineers is they don’t understand fractional factorial designs, so they are testing every combination, producing a lot of good data. Fatigue is an example; wear is an example.

Now, how are we going to model this degradation? Again, degradation is an observable and measurable deterioration of performance. There are many models.
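
The threshold picture reviewed above (random degradation paths, a fixed failure threshold, and the share of the degradation distribution beyond the threshold at time t) can be made concrete with a minimal sketch. The linear paths, the lognormal unit-to-unit rate and the threshold of 100 are all hypothetical choices made for illustration, not a model from the talk.

```python
import numpy as np

def failure_probability(t, threshold=100.0, n_paths=100000, seed=2):
    """Probability of failure by time t for linear random-slope paths.

    Assumption: each unit degrades as X(t) = slope * t, with a lognormal
    slope that differs from unit to unit.
    """
    rng = np.random.default_rng(seed)
    slope = rng.lognormal(mean=0.0, sigma=0.3, size=n_paths)
    degradation_at_t = slope * t      # X(t) for every simulated unit
    failure_time = threshold / slope  # time at which each path crosses the threshold
    p_from_degradation = np.mean(degradation_at_t >= threshold)
    p_from_failure_time = np.mean(failure_time <= t)
    return p_from_degradation, p_from_failure_time

for t in (80.0, 100.0, 120.0):
    p_deg, p_time = failure_probability(t)
    print(f"t={t:5.0f}  P(X(t) >= H)={p_deg:.3f}  P(T <= t)={p_time:.3f}")
```

The two estimates agree because, for a monotonic path, the degradation being beyond the threshold at time t is the same event as the failure time being at most t.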

I don’t have time today, because I need to get to my new models, but I need to review a little bit. The original Meeker and Lu model I refer to as a random effects model or a general path model. Most of my work involves monotonic degradation processes, so I use a gamma process. But some degradation is not monotonic, so a Wiener process may be more appropriate. There are others as well: the inverse Gaussian process has become popular. Zhisheng Ye is a very prolific author; he’s written some very good papers.

Just briefly, this is for those of you who maybe aren’t familiar with degradation. This would be the original Meeker and Lu general path model, a random effects model. We have this function eta; look at the equation. We have a population of parts; they’re all the same, but the individual unit is subscript i, and we have different time measurements j. We see this function eta has two vectors of coefficients, phi and theta i. Phi is common for all of the components; theta i is specific just for that unit i, and that creates the divergence, or the variability, in those functions, and the randomness, I should say.

The gamma process is very important, and simple in a sense: between two points of time, the incremental degradation follows a gamma distribution with two parameters, a scale parameter and a shape parameter. Look at the equation for the difference between two different degradation measures: we see a function of time.

Now let me get to what I’m here to talk about: dependent degradation paths. Before I do that, I want to promote and advertise this brand new paper, ‘Dependent failure behavior modeling for risk and reliability: A systematic and critical literature review,’ by Zhiguo Zeng, Anne Barros and myself. Zhiguo Zeng, whose picture you see on the top, is a very innovative and successful professor at CentraleSupélec, Université Paris-Saclay, in France. His PhD is from Beihang University in Beijing. Underneath him is Anne Barros. Professor Barros is the head of the risk, reliability and resilience research team at CentraleSupélec. Notice this paper is dated November 2023, and as I give this talk it’s October 2023, so we’re looking into the future. It is a very systematic literature review, but much more than a literature review: we look at patterns in research and trends in research with time. We try to categorize research trends, the most published authors, which journals, many, many things.

In that paper we’ve categorized the research on dependence modeling. Looking to the left here, the first big categories are statistical dependency modeling and mechanical dependency modeling. Within the category of statistical dependency modeling, first we see lifetime dependence: this is where your failure time distributions exhibit some dependency relationship. Next you see system state models: this would be your Markov chains and fault tree analysis. Then you see degradation process models; that’s certainly what I’m interested in for my talk today. Well, I’m interested in all of them, but that’s what I’m interested in today. The figure on the right is just for the most recent papers. This is a mapping of keywords, and what we’ve tried to do is subjectively categorize them; those are the different colors you see.

Now, what are some ways, what are some tools? You have multiple components in a system (there could also be multiple failure mechanisms, but for me it’s components), and they have degradation paths that are not independent. How can we model those?

There are many ways. Certainly many researchers would use a Markov chain; that requires some simplification, some definition of a state space. There are shared shock exposure models, which I will talk about, if only because I was the author of many of those papers. You can have a joint distribution function of the degradation at some time t; that’s difficult to do, but you can, theoretically, or you can approximate it with a copula function. You can also have a random effects stochastic process, or degradation process. What is that? Well, you could have a stochastic process, a gamma process or a Wiener process, but one of the process parameters is itself a random variable, and how you define that random variable can introduce this dependence relationship. In fact, one of my new models, with my co-authors and colleagues, does just that, based on an extended random effects model. I’m not talking about machine learning today. I don’t consider myself a machine learning expert, although I now have quite a few publications, but that’s generally due to my colleagues and students.

Let me call attention to this book. It came out this year, 2023, in honor of Professor Huang, whom many of you know, from UESTC in Chengdu, for his 60th birthday. If you’re listening, Professor Huang, happy birthday. In this book, and credit to Yu Liu and the other co-editors, who did an outstanding job, Jen Hong Lee, Chen Yuan and I wrote one chapter. It’s kind of a tutorial (you see it on the far right) on different methods for dependent degradation modeling.

What I’m going to do next is not a comprehensive review; I don’t intend it to be. I’m going to describe the papers that influenced me and allowed me to build the clustering model, with which I’ll end my presentation.

First, just very briefly, the Markov chain, because I know many people use this approach, and it is certainly a viable approach in reliability for many types of problems. Could you use this? Yeah, sure. In this simple example we have two components, and each has three states of degradation: new, partially degraded, failed. In the way you define the transition probabilities between states, the transition probabilities may increase if the other components in the system have degraded, and so you could introduce a dependency relationship. It’s difficult to do, in my opinion; others may disagree. If you have many components, you have many states, and there are many transition probabilities to define. Degradation is generally continuous, so you would have to define these states. Let’s move on.

Now the shared shock exposure models. These were largely papers, originally anyway, by myself, my colleague Qianmei Feng and our students, and so I do want to talk about them. We have a system, and all the components are degrading, and periodically random shocks hit the whole system. When the shocks hit the system, they affect all of the components, and since the number of shocks is a random variable, that introduces a dependency relationship. Let me explain. See this figure: at the bottom part of the figure you see a series-parallel system. Now here comes a shock; watch carefully. And another one. Do you notice the shock hits the entire system? Every component is affected. When the shock hits the system, each component can fail due to the shock itself, which may surpass some threshold; but if not, there’s some incremental amount of degradation that takes place, and they all receive it. Since the number of shocks is a random variable, if there are many shocks, all of the components will, in a probabilistic sense, have higher degradation; if the number of shocks is low, less. That introduces the dependence.

Very quickly, this would be just for one component out of all of them. Notice the upper graph is the degradation: continuous degradation together with these incremental shocks. The lower graph is the shocks themselves. Notice every time there’s a shock in the lower graph, look up, and in fact there’ll be that incremental increase. Each component can fail due to a soft failure or a hard failure; that’s why we say they’re competing. We have dependent competing failure processes. For each individual component to be reliable, and again this is just one out of n, look to the right: we see it has to survive all the shocks, but even if it survives all the shocks, each shock introduces that incremental increase. That introduces the dependence.

Very quickly, and I apologize: I’ve avoided slides like this, but most of you who know me know I’m a professor, and I teach classes in optimization and reliability, so there’ll always be one or two slides like this. For a series system to be reliable, each component has to be reliable. By the way, this method isn’t limited to series systems, not at all; this is just an example. For each component to survive, it has to survive all the shocks (those are the random W’s), and in addition it has to survive the continuous degradation with incremental shifts. Then we intersect, not multiply: these aren’t independent events.
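
A small simulation shows what goes wrong if the component survival events are multiplied as though they were independent. This is a sketch under assumptions that are not taken from the papers: shocks arrive as a Poisson process shared by all components, each component has its own normally distributed linear degradation rate, shock damages and shock magnitudes are normal, and every numeric value (shock rate 0.05, soft threshold 80, hard threshold 1.6, and so on) is hypothetical.

```python
import math
import numpy as np

def shock_model_survival(t=100.0, n_comp=3, n_runs=200000, seed=3):
    """Compare joint survival of all components with the product of marginals."""
    rng = np.random.default_rng(seed)
    shock_rate, soft_threshold, hard_threshold = 0.05, 80.0, 1.6
    # One shock count per run, shared by every component in that run.
    n_shocks = rng.poisson(shock_rate * t, size=n_runs)[:, None]
    # Continuous degradation: component-specific random linear rate.
    rate = rng.normal(0.5, 0.1, size=(n_runs, n_comp))
    # Sum of n independent Normal(5, 1) shock damages is Normal(5n, sqrt(n)).
    jumps = 5.0 * n_shocks + np.sqrt(n_shocks) * rng.standard_normal((n_runs, n_comp))
    soft_ok = rate * t + jumps < soft_threshold
    # Hard failure: each shock magnitude W ~ Normal(1.2, 0.2) must stay below the threshold.
    p_one = 0.5 * (1.0 + math.erf((hard_threshold - 1.2) / (0.2 * math.sqrt(2.0))))
    hard_ok = rng.random((n_runs, n_comp)) < p_one ** n_shocks
    survive = soft_ok & hard_ok
    joint = survive.all(axis=1).mean()       # intersection of the survival events
    product = survive.mean(axis=0).prod()    # what "multiplying" would give
    return joint, product

joint, product = shock_model_survival()
print(f"series reliability (joint): {joint:.3f}")
print(f"product of marginals:       {product:.3f}")
```

Under these assumptions the joint survival probability is clearly larger than the product of the marginals, so multiplying would give a pessimistic answer for the series system.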

Why aren’t they independent? That’s because of capital N(t). Look at the equation: every one of those components has that N(t), which is the number of system-level shocks. How do I solve this? It’s not too hard: I condition on the number of shocks and then sum. Once I condition, these other events, hard failure and soft failure, become conditionally independent. Not independent, but conditionally independent given the number of shocks. And there’s the final reliability equation. I’m biased, it’s my paper, but I think the equation in the blue box is one of the most important contributions.

We’ve extended this many times. I just want to call attention to this one extended paper from 2019, by Nooshin Yousefi, myself, Qianmei Feng and Sanling Song. Look at the graph; there’s going to be an animation. The failure threshold is fixed; the on-condition threshold is a decision variable. So we have formulated this as an optimization problem: minimize the cost rate by changing the on-condition threshold for each component. Think about it. If the on-condition threshold is too low, you’ll never get an unscheduled failure, or it will be very unlikely, but you’ll be replacing components or items that still have useful life: wasteful, perhaps. If the on-condition threshold is too high, you will have unscheduled maintenance or failures at the system level, and those will be costly.

Now, I’ve grouped together multivariate joint distribution functions and random effects stochastic processes, even though they’re very different, because often they lead to the same place. Again, this is not meant to be comprehensive; in fact, I’m reviewing only those papers that were useful to me in developing the clustering models. First we have this paper, a kind of simple model, by Xu, Shen, Wang and Tang. This is a Wiener process. There are only two components, subscript s equal to 1 and 2, and as with most Wiener processes there’s a drift term and a diffusion term. But let’s look at them more closely. The drift term has this parameter alpha, and notice alpha doesn’t have a subscript s where the others do. So alpha is shared by both components, and alpha is a random variable; here it’s normally distributed. As alpha goes high, the Wiener processes for both components are affected, and that introduces dependence. The first of our clustering models that I’m going to present to you is very different.

That first clustering model is a gamma process, so it looks extremely different, but it shares a common thought, and so I would say it was influenced by this Wiener process paper.

Another paper: again, you could argue it’s a simple concept, but it’s a good one. This is a 2021 paper by Benl, Pandy, Wong and Shia. I hope some of you are in the audience, and if you are, you know this paper has influenced me. We have three independent gamma processes, Y1, Y2 and Yu, but two components: X1, which is the sum of Y1 and Yu, and X2, the sum of Y2 and Yu. Yu is shared by both of these, so I can condition on Yu, integrate, and very easily get a joint distribution function for those two stresses, for this dependent relationship. The second model I’m going to show is a direct extension of this paper.
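
The shared-component construction (X1 = Y1 + Yu, X2 = Y2 + Yu) is easy to simulate. The sketch below assumes the three gamma processes have a common scale parameter and shape rates a1, a2 and au per unit time; these values are hypothetical. With a common scale, the correlation between X1 and X2 at any time t works out to au / sqrt((a1 + au)(a2 + au)).

```python
import numpy as np

def shared_gamma_correlation(t=10.0, a1=1.0, a2=1.0, au=1.0, scale=1.0,
                             n_runs=100000, seed=6):
    """Correlation of X1 = Y1 + Yu and X2 = Y2 + Yu at time t."""
    rng = np.random.default_rng(seed)
    # A gamma process observed at time t has a gamma distribution with shape = rate * t.
    y1 = rng.gamma(a1 * t, scale, size=n_runs)
    y2 = rng.gamma(a2 * t, scale, size=n_runs)
    yu = rng.gamma(au * t, scale, size=n_runs)  # shared by both components
    x1, x2 = y1 + yu, y2 + yu
    empirical = np.corrcoef(x1, x2)[0, 1]
    theoretical = au / np.sqrt((a1 + au) * (a2 + au))
    return empirical, theoretical

emp, theo = shared_gamma_correlation()
print(f"simulated corr={emp:.3f}  formula={theo:.3f}")
```

Because Y1, Y2 and Yu are independent, conditioning on Yu makes X1 and X2 conditionally independent, which is what makes the joint distribution tractable.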

This next one is a paper with Lu, Wang, Hong and Ye. Instead of just a component, it’s a system: a system with n components, J copies of those components, and the subscript k for the different time increments where data is collected. It’s a general path model, a Meeker and Lu type model, which isn’t surprising: Yili Hong’s PhD adviser was in fact Bill Meeker. Here we see these parameters omega, and the omegas have a multivariate normal distribution with a variance-covariance matrix; that introduces the dependence. Very similarly, except that this time we have a Wiener process, we still have this omega term, but now we approximate with a copula function. I don’t have any other examples of copula functions for dependent degradation behavior, but there are many. I recall a paper Jinju has written that I’ve referenced many times, and there are many others.

Now, finally, my new models: an extended random effects stochastic model and a superposition of gamma processes. They’re models developed with my students and colleagues. The first one was published in RESS in 2020. This is where we introduce this concept of clusters of components. Let’s proceed.

Here are four components in a series system. These are four simulation runs for the same system. Look at the red and the blue. It isn’t that the red and blue are on average higher than the other two; it’s that they’re together. That’s what I mean by a cluster. In the first two graphs (and these are four simulation runs, randomly generated) they’re together: if blue is high, red tends to be high. There’s a dependency relationship between them. Components three and four, same thing. So there are two clusters, each with a dependency relationship.

How do I model this? Well, it’s not quite like that Xu paper with the alpha term, but it’s a similar idea. It’s a random effects gamma process. Each component is a gamma process, but we see the scale parameter theta (i), theta parentheses i, and it’s given by an equation. Look inside this equation: you see these terms theta 1, theta 2, up to theta K. Notice there’s no subscript i. Those are random variables for cluster 1, cluster 2, and so on. In front of each of those thetas is an alpha term; that’s a sensitivity coefficient. If a certain component does not belong to a particular cluster, alpha is equal to zero. If it’s partially in one cluster, alpha corresponds to that partial membership. So a component can be in zero clusters, in one cluster, or a partial member of more than one cluster.

Now, realistically, how many clusters? Well, for many applications, zero; you don’t need my paper. And one would be the most common answer. In fact, when I first wrote this paper we limited it to two, and I kind of wish I had left it that way; the models would have been more understandable. But before submitting we generalized it to K; that’s the way I think. Here’s an example of a reliability graph.

I think this is a good model. It’s complicated, perhaps overly complicated, and you would need a lot of data to estimate these parameters. If you have enough data, we have in fact developed a likelihood function that we can maximize. But in truth, this is a complicated model, and I wanted an easier model, so we’ve worked on this. Notice this is Lee, Coit and Zeng 2024, so it doesn’t exist yet. Well, it exists, just not in a final form. That would be Jen Hong Lee, myself and Zhiguo Zeng from France.

The way this model works is this: there are two submodels, first Model 1, then Model 2. Model 1 is a direct extension of that Benl et al. paper. In fact, Model 1 on its own would not be worthy of a journal paper; it would be a conference paper. But you’ll see we do more than that. In this instance, instead of two components and one cluster (clusters is my term), we have n components and m clusters. However, each component can be in exactly zero or one cluster, with full membership. Given those assumptions, we can very readily condition again on the clustering random variables, integrate, and very quickly get a reliability function. From the reliability function we can get a cost rate function, which is what you see on the graph on the right, and we can minimize that cost function. But let’s extend it to Model 2. Model 2 is a generalization of Model 1.
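
Model 1 as described (n components, m clusters, each component a full member of at most one cluster) can be sketched as a superposition of gamma processes. This is an illustration, not the authors' implementation: the function name, the cluster assignment, the shape rates, the common scale and the failure threshold are all hypothetical, and reliability is estimated by plain Monte Carlo rather than by the conditioning and integration mentioned in the talk.

```python
import numpy as np

def simulate_model1(t, cluster_of, own_rate, cluster_rate, scale=1.0,
                    n_runs=100000, seed=4):
    """Degradation at time t: own gamma process plus the cluster's shared one.

    cluster_of[i] is the cluster index of component i, or -1 for no cluster.
    """
    rng = np.random.default_rng(seed)
    cluster_of = np.asarray(cluster_of)
    own = rng.gamma(np.asarray(own_rate) * t, scale,
                    size=(n_runs, len(cluster_of)))
    shared = rng.gamma(np.asarray(cluster_rate) * t, scale,
                       size=(n_runs, len(cluster_rate)))
    # Append a zero column so that index -1 ("no cluster") adds nothing.
    shared = np.hstack([shared, np.zeros((n_runs, 1))])
    return own + shared[:, cluster_of]

t, threshold = 10.0, 14.0
x = simulate_model1(t, cluster_of=[0, 0, 1, 1],
                    own_rate=[0.5, 0.5, 0.5, 0.5], cluster_rate=[0.5, 0.5])
ok = x < threshold
print("series reliability (with clusters):", round(ok.all(axis=1).mean(), 3))
print("product of marginals (independence):", round(ok.mean(axis=0).prod(), 3))
print("correlation matrix:\n", np.round(np.corrcoef(x, rowvar=False), 2))
```

In this sketch components 1 and 2 move together, components 3 and 4 move together, and the two clusters are uncorrelated with each other, which mirrors the two-cluster pattern in the simulation plots.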

Look at the equation: every component can be in multiple clusters. The clustering gamma processes are the W’s, with a parameter governed by this matrix, capital A. If that small a is zero, the component is not a member of the cluster, and often that will be the case. This model is straightforward: it’s easier to understand and easier to quantify, though I still need a lot of data. Here’s a general reliability function for a series system; we can do something similar for any system structure. So this is my current research.

Now let’s see how it works. Look at this graph. These are four different alphas; as we go left to right, the dependence increases. There’ll be four simulation runs. First look at the left: is there any pattern? Just look at the left; ignore the other ones for now. No, not really, between the four components. Now look at the third one. I’m going to keep doing the simulations. Oh, we do see a pattern. Here red’s by itself, but we start to see that two components, the red and the blue, are often together, and the other two are together as well.

Now let’s go all the way to the fourth one. Ah, that’s an extreme. That’s not realistic, but we’re trying to make a point: there’s complete dependence. We have two clusters with complete dependence. In reality, the second or third graph would probably be more appropriate when there’s a dependence relationship. But you can see we’re able to successfully capture the dependence with this model; we’re able to successfully model clusters. There can be two, there can be three, there can be any number of clusters, and this model handles it, and with some reliability. How many clusters should there be? Well, what we recommend in these papers is that, if possible, it should be predetermined.

So that concludes my talk. Thank you very much. Anything I talked about today could also be done with some sophisticated machine learning models, and I know researchers are working on it now. I know some very good papers by researchers in the Netherlands, Mihaela Mitici and others. And that concludes my talk. Thank you very much.