Why do complex machines require more maintenance and break down more often than simpler machines?


I intuitively understand why this is. We always say “more moving parts makes there more chance for failure”. But why is this? Is it just how the universe has to be due to the laws of physics?

For instance, an airplane is pretty complex, but the aluminum that makes up its structure rarely needs a complete overhaul, while something like the engine is constantly being serviced and kept up to standard.

What is it about spinning parts, heat, and lots of piping that makes something constantly trend toward breakdown?


The more elements there are in a system, the more ways there are for that system to break down or be ordered incorrectly (thus, the more likely it is to get into a broken down or disordered state).

It’s just numbers. It applies to everything, not just physical things.

Take the sequence [1, 2, 3, 4, 5] and consider this sequence to be the “correct” and functional sequence. There is only one way for this sequence to be correct, but 5! - 1 = 119 ways for it to be incorrect (or broken down).

Now make the sequence [1, 2, 3, 4, 5, 6], with the same parameters. There is still one way to be correct, but 6! - 1 = 719 ways to be broken down. That is a huge increase in the number of ways to be broken down just from adding one element. Now add thousands of elements.
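A quick brute-force check of the counting argument: enumerate every ordering of [1..n] and count the ones that differ from the single correct order. The function name `count_broken_orderings` is just for illustration.

```python
from itertools import permutations
from math import factorial


def count_broken_orderings(n: int) -> int:
    """Count orderings of 1..n that are not the single 'correct' order."""
    correct = tuple(range(1, n + 1))
    return sum(1 for p in permutations(correct) if p != correct)


for n in (5, 6, 7):
    broken = count_broken_orderings(n)
    # The brute-force count agrees with the closed form n! - 1.
    assert broken == factorial(n) - 1
    print(f"{n} elements: 1 correct ordering, {broken} broken orderings")
```

Brute force only works for small n; n! grows so fast that for “thousands of elements” only the closed form is usable.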

There are a lot of parts to your question, so I am going to focus on the purely engineering part: why do you need to overhaul the engine while the airframe can go much longer without the same level of maintenance? The answer is materials science.

Metals have a crystal lattice structure, and their mechanical properties come from the alloy composition and from the grain structure set through heat treatment, etc. When you expose a piece of metal to load cycles (think of bending a piece of metal back and forth), even if you don’t break it in one go, you develop microscopic cracks in the structure. These cracks grow with each load cycle, and after a certain number of cycles the part fails. This is called fatigue. If the load is very large, the part will break after only a few cycles; if the load is light, it will last many more cycles.

Other factors, like temperature, also play a role. An engine part experiences huge temperature swings, which change its chemical and mechanical properties over time. Engine parts also typically experience huge loads and a lot of vibration. Therefore they develop cracks regularly and need to be replaced. There are entire fields of engineering devoted to predicting when this happens.

An airframe, on the other hand, is designed with a pretty big safety factor in mind and does not experience the same level of stress; mainly, the wings and fuselage flex during flight. There have certainly been many cases where an airframe had to undergo heavy maintenance, or where micro-cracks were found in the wing structure. But overall, the nature of its loading cycle makes it less likely to fail over time.
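The “big load, few cycles; light load, many cycles” relationship is often described with an S-N (stress vs. cycles) curve. Below is a minimal sketch, assuming Basquin’s power-law form σ_a = σ_f'·(2N)^b for the curve and the Palmgren-Miner linear damage rule for mixed loading. The constants `SIGMA_F` and `B` are hypothetical, chosen only to make the numbers readable, and are not data for any real alloy. The sketch ignores temperature, mean stress, and crack-growth modelling, all of which real fatigue analysis has to handle.

```python
# Hypothetical material constants, NOT data for any real alloy.
SIGMA_F = 900.0  # fatigue strength coefficient sigma_f', in MPa
B = -0.1         # fatigue strength exponent b (negative)


def cycles_to_failure(stress_amplitude_mpa: float) -> float:
    """Invert Basquin's relation sigma_a = sigma_f' * (2N)^b to get N cycles."""
    reversals = (stress_amplitude_mpa / SIGMA_F) ** (1.0 / B)  # this is 2N
    return reversals / 2.0


def miner_damage(load_blocks: list[tuple[float, float]]) -> float:
    """Palmgren-Miner rule: sum of n_i / N_i over (stress, cycles) blocks.

    A total of 1.0 or more means the model predicts a fatigue failure.
    """
    return sum(n / cycles_to_failure(s) for s, n in load_blocks)


for stress in (450.0, 300.0, 150.0):
    print(f"{stress:5.0f} MPa amplitude -> about {cycles_to_failure(stress):,.0f} cycles")

# A mixed load history: a few heavy cycles plus many light ones.
history = [(450.0, 100.0), (150.0, 1_000_000.0)]
print(f"accumulated damage: {miner_damage(history):.3f}")
```

Halving the stress amplitude multiplies the life by 2^10 with these constants, which is the steep trade-off the paragraph describes.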

It’s a statistical phenomenon. If we have independent events, the more of them there are, the more likely it is that at least one of them will happen. This is the same concept as “the more times you flip a coin, the more certain it is that your coin will land on heads at least once”.

Every additional part is another independent failure mode (another possible thing to go wrong), so the probabilities combine to make the whole machine likely to fail sooner.
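The “at least one” calculation is easiest through the complement: the chance that none of n independent events happens is (1 - p)^n. A minimal sketch, assuming every event (coin flip or part failure) is independent and has the same probability; the 0.1% per-part failure chance is hypothetical.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent events, each with
    probability p, happens: 1 minus the chance that none of them do."""
    return 1.0 - (1.0 - p) ** n


# Coin flips: chance of at least one heads.
for flips in (1, 2, 5, 10):
    print(f"{flips:2d} flips: {prob_at_least_one(0.5, flips):.4f}")

# Parts: a hypothetical 0.1% chance per part of failing this year.
for parts in (10, 100, 1_000, 10_000):
    print(f"{parts:6d} parts: {prob_at_least_one(0.001, parts):.4f}")
```

With 1,000 such parts, the chance that something fails is already about 63%, even though each part on its own is very reliable.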

Failures while running are generally spread out randomly over time, so the more possible points of failure you have, the greater the chance that one of them will come into play.

Consider a server with a bunch of hard drives in it, where each individual drive has a Mean Time Between Failures (MTBF) of 1,000,000 hours. To keep the numbers simple, treat that as a 50% chance that a given drive fails within the next million hours. (Strictly, MTBF is a mean: if failures happen at a constant random rate, about 63% of drives will have failed by the MTBF, and the 50% point comes at roughly 693,000 hours. The pattern below is the same either way.) If you have a single 10 TB hard drive, then you have a 50% chance that it’ll fail in the next million hours. Now what if you have two 5 TB drives instead? The chance of each one failing in the next million hours is still 50%, but the chance that at least one of the two fails is 75%, and you reach a 50% chance of a failure by 500k hours. Now what about 16 drives? You’re down to 62.5k hours before a failure is as likely as not, simply from adding more components to the system, even though each one is just as reliable. This is why big servers generally have some redundancy so they can tolerate one or two drive failures without losing data, because they *will* have failures every year.
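A minimal sketch of the drive arithmetic, assuming each drive fails independently at a constant random rate (an exponential lifetime) and treating 1,000,000 hours as the per-drive median life, which is what the 50% figure implies. The function names are illustrative.

```python
import math


def prob_any_failure(n_drives: int, hours: float, median_life: float) -> float:
    """P(at least one of n independent drives fails within `hours`),
    assuming exponential lifetimes with the given per-drive median."""
    rate = math.log(2) / median_life  # per-drive failure rate, per hour
    return 1.0 - math.exp(-n_drives * rate * hours)


def median_first_failure(n_drives: int, median_life: float) -> float:
    """The earliest of n exponential lifetimes is itself exponential with
    n times the rate, so its median is the per-drive median divided by n."""
    return median_life / n_drives


MEDIAN = 1_000_000.0  # hours; the example's "50% within a million hours"

for n in (1, 2, 16):
    p = prob_any_failure(n, 1_000_000.0, MEDIAN)
    t = median_first_failure(n, MEDIAN)
    print(f"{n:2d} drives: P(failure within 1M h) = {p:.3f}, "
          f"50% chance of a failure by {t:,.0f} h")

# If 1,000,000 h is read as the mean (MTBF) instead, a single drive's
# chance of failing by then is 1 - e^-1, roughly 63%.
print(f"P(fail by MTBF), constant failure rate: {1 - math.exp(-1):.3f}")
```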

If any single failure point can take down a system, then the more of those failure points you have, the more likely the system is to go down, even if each individual failure mode is very unlikely.
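When any single failure takes the system down, reliability engineers model it as a series system: it survives only if every part survives, so (assuming independent failures) the per-part reliabilities multiply. The sketch below uses hypothetical reliabilities, and `redundant_pair` shows the contrast with redundancy like the drive arrays mentioned for servers, where a part stops being a single point of failure.

```python
import math


def series_reliability(part_reliabilities: list[float]) -> float:
    """A system that fails if ANY part fails survives only if every part
    survives, so (assuming independence) the reliabilities multiply."""
    return math.prod(part_reliabilities)


def redundant_pair(r: float) -> float:
    """Two identical parts where either one alone is enough: the pair
    fails only if both fail."""
    return 1.0 - (1.0 - r) ** 2


# Hypothetical machine: 10,000 very reliable parts (99.99% each) plus
# 10 less reliable ones (99.9% each), all single points of failure.
parts = [0.9999] * 10_000 + [0.999] * 10
print(f"series system: {series_reliability(parts):.3f}")

# Duplicating a 99% part so it is no longer a single point of failure.
print(f"single part: 0.990, redundant pair: {redundant_pair(0.99):.4f}")
```

Even with every part at 99.99%, ten thousand of them in series leave the machine with only about a one-in-three chance of getting through the period without a failure.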

Luckily, many failure modes in big systems like cars and planes are wear-out failures, which are predictable, so we can proactively replace parts like bearings and tires before they fail. That avoids the catastrophic system failure that could happen if a part failed at random at an inconvenient time.
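One common way to tell wear-out failures from random ones is a Weibull lifetime model: a shape parameter above 1 means the failure risk rises with age (wear-out), while a shape of exactly 1 gives a constant, random failure rate. A sketch with hypothetical parameters, comparing a brand-new part with one that has already run for its characteristic life:

```python
import math


def prob_fail_next(age: float, interval: float, scale: float, shape: float) -> float:
    """Weibull model: chance that a part which has survived to `age` fails
    within the next `interval`, i.e. P(T <= age + interval | T > age)."""
    before = (age / scale) ** shape
    after = ((age + interval) / scale) ** shape
    return 1.0 - math.exp(-(after - before))


SCALE = 10_000.0    # hours; hypothetical characteristic life
INTERVAL = 1_000.0  # hours; hypothetical time until the next service

for shape, label in ((1.0, "random (shape 1)"), (3.0, "wear-out (shape 3)")):
    new = prob_fail_next(0.0, INTERVAL, SCALE, shape)
    old = prob_fail_next(SCALE, INTERVAL, SCALE, shape)
    print(f"{label}: new part {new:.3f}, old part {old:.3f}")
```

In the random case the old part is no more likely to fail than a new one, so replacing it early buys nothing; in the wear-out case the old part is far riskier, which is why scheduled replacement of bearings and tires pays off.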