Project: Maintenance Requirement of Micro-electronic Equipment
A SURVEY OF THE MAINTENANCE REQUIREMENT OF MICROELECTRONIC
EQUIPMENT IN THE NIGERIA ENVIRONMENT
A FINAL YEAR PROJECT REPORT
UNIVERSITY OF PORT HARCOURT
FACULTY OF ENGINEERING
1.0 LIFE CYCLE COST (LCC) FACTORS
Probably one of the most controversial and difficult subjects in LCC studies is the choice of factors to include in the evaluation, analysis and estimation. The controversy arises because almost everyone has a different view of what constitutes an important factor. There are two major types of cost parameter: non-recurring and recurring costs.
Non-Recurring cost factors: the non-recurring LCC factors can usually be related to the following:
Research or Development: This cost depends on the extent of the research or development specified in the contract. If the work reaches into the latest state of the art, costs will be high. Costs of minor development on new equipment may be insignificant in relation to other cost factors. Few items of equipment are bought off the shelf without some required modifications, especially for the military. These costs are often overlooked but are part of the unit price.
Reliability and maintainability engineering: Improving the reliability or maintainability of a design, before or after delivery, is another factor. Off-the-shelf items may be fully satisfactory to a customer, yet the manufacturer may wish to offer an improvement in these parameters in his proposal. His estimated cost for such improvement will form part of the non-recurring cost.
Qualification Approval: The approval of an item prior to acceptance or delivery requires tests, facilities and manpower. Often the tests cannot be carried out at the contractor's plant, and the equipment must be shipped to a site for testing. New test equipment special to the item may be required. In the mechanical field special-to-type test equipment is minimal, but in fields such as optics, fluidics, microelectronics and high power, new methods are often needed. The cost of facilities can be a major one. For example, on a recent test of a reconnaissance drone the facilities had to be augmented with new sites, protective bunkers, fuel pumps, an improved weather station and a survey of landing areas.
Acquisition: The cost of buying an item includes everything which forms part of the actual item, such as power supplies, cabling or mounts for a radio set, or spares. It has been suggested that taxes and duties should be added to the buying cost, but this may not be easy because each country, province, state and government department has different regulations, one sometimes offsetting the other.
Installation: Once an item has been bought it has to be installed and perhaps tested in its final environment. Additional connectors may have to be purchased, adjustments made, special mounting brackets manufactured, or interference filters added externally. This takes time and manpower. Extended field tests prior to taking the item into the final inventory for operational use can be costly.
Transportation: This factor is obviously a contributor to cost. The initial conceptual, development and acquisition phases of any program require some travel: shipment of equipment by commercial carriers, trips to contractors by car or otherwise, and mailing of bulk material such as volumes of proposals or contract data packages. All of these constitute cost factors which occur once.
Recurring Cost Factors
Operating: Under this category one must review the requirements of the item. A radar set may require special environmental conditions, such as air conditioning, to operate properly. It may also require certain services such as continuous power sources, fuel and water.
Manpower: There is little equipment which runs without manpower for any length of time. The manpower required to operate and maintain it will have to be costed and will depend on the type of equipment. A programmer will earn a different salary than a stoker on a ship. The maintainer operating automatic test equipment will earn a different income than the maintainer who puts a protective coating on a printed circuit board after repair. The number of persons involved will depend on the type of equipment, its function and its mission.
Support: Continuous support in the form of supplies and services is required for any item which is operational. For example, a typewriter or teletype requires paper. The operation of the item may require electricity, which must be bought or generated by a local power supply. There is almost no end to these support requirements, which are continuous and recurring costs.
Maintenance: Probably the largest cost in operating any item is maintenance at its various levels. These recurring costs are the major funding problem at present for any tight budget. Over the useful life of the item, often ten or more years, they may amount to more than 40% of the total life cycle cost. Many believe that with the improved reliability of modern parts the maintenance burden will be reduced. They advocate simplicity but, unfortunately, to improve simplicity and equipment reliability one must use parts which are extremely reliable, and these are very costly. A faster answer to the problem may lie in developing even more complex equipment with built-in self-maintaining features which take care of at least part of the maintenance. Maintenance also covers the equipment used for maintenance itself: repair of an equipment, calibration of a test set, lubrication of a servo motor and many more. Such costs are difficult to forecast, especially for the total life cycle of the item, but because they are high an effort must be made to develop a reasonable estimate which can be compared with historical or other data.
Inventory: Not only are spare parts required for support, but they are bought on a time scale to replenish the pipeline or stock. There are parts which cannot be stored for a long time without deterioration, and these will therefore create recurring costs.
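The split between non-recurring and recurring factors can be illustrated with a minimal sketch. All money figures and the ten-year life below are hypothetical placeholders chosen for illustration, not data from this survey, and no discounting or inflation is assumed.

```python
# Hypothetical cost figures (arbitrary currency units); none come from the report.
NON_RECURRING = {
    "research_or_development": 50_000,
    "reliability_maintainability_engineering": 10_000,
    "qualification_approval": 15_000,
    "acquisition": 200_000,
    "installation": 20_000,
    "transportation": 5_000,
}
RECURRING_PER_YEAR = {
    "operating": 8_000,
    "manpower": 30_000,
    "support": 4_000,
    "maintenance": 25_000,
    "inventory": 3_000,
}


def life_cycle_cost(non_recurring, recurring_per_year, years):
    """One-off costs plus yearly costs multiplied by the years in service."""
    return sum(non_recurring.values()) + years * sum(recurring_per_year.values())


def share_of_total(yearly_cost, years, total):
    """Fraction of the life cycle cost taken by one recurring factor."""
    return years * yearly_cost / total


YEARS = 10  # assumed useful life
total = life_cycle_cost(NON_RECURRING, RECURRING_PER_YEAR, YEARS)
maintenance_share = share_of_total(RECURRING_PER_YEAR["maintenance"], YEARS, total)
print(f"LCC = {total}, maintenance share = {maintenance_share:.0%}")
```

With real figures, the same calculation shows how strongly the recurring factors, maintenance in particular, weigh on the total over a long service life.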
Maintenance cannot be divorced from the concept of reliability: if equipment could be made perfectly reliable, no maintenance would be necessary at all. A knowledge of which components are most reliable, and which least reliable, assists in the maintenance task.
WHY PERFECT RELIABILITY CANNOT BE ACHIEVED
In theory maintenance could be avoided completely by making equipment perfectly reliable. In practice, of course perfect reliability cannot be achieved. No one has ever made anything that never wears out, never has a defect and that would last forever. Perfect reliability is like perpetual motion – interesting to imagine, but not practical.
In any equipment, increases in reliability usually lead to an increase in complexity. Also, the development of more reliable components costs money; research and testing have to be paid for. Production of more reliable components involves the introduction of new methods and machinery for manufacture, and of more stringent production testing.
A stage is reached where the cost of producing more reliable components outweighs the savings made by reducing maintenance costs and holdings of spares. Where this stage is reached, further improvement in reliability is too expensive to be practical.
The nearest approach that has been made to eliminating maintenance completely is in the manufacture of unmanned satellites, for once they are in orbit maintenance is extremely difficult.
Failure of a component or equipment is noticed when it fails to perform its intended function under operating conditions. A failure may be a loss of output, or a change in output to a value outside the specification limits.
The failure rate of a component or equipment is the number of failures divided by the time during which they occur; when quoted as a percentage, it is the percentage of items failing in a given time. Suppose there are 1000 resistors and 33 of them fail per year; the failure rate can then be stated as 3.3% per year. Expressing failure rates in this way enables the reliability of resistors to be compared directly as far as their use in a particular application is concerned.
The failure rate for components is often expressed as percentage failures per 1000 hours.
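As a small worked sketch of the resistor example, the code below computes the percentage failure rate per year and converts it to percentage failures per 1000 hours. The conversion assumes continuous operation (8760 hours in a year), which is an assumption made here and not a figure from the text; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, assuming the equipment runs continuously


def failure_rate_percent(failures, population):
    """Percentage of the population that failed in the observation period."""
    return 100.0 * failures / population


def percent_per_1000_hours(percent_per_year):
    """Convert a yearly percentage failure rate to percent per 1000 hours."""
    return percent_per_year * 1000.0 / HOURS_PER_YEAR


rate_per_year = failure_rate_percent(failures=33, population=1000)
rate_per_kh = percent_per_1000_hours(rate_per_year)
print(f"{rate_per_year:.1f}% per year = {rate_per_kh:.3f}% per 1000 hours")
```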
Fig 1.1 Variation of failure rate with time.
The initial failure period is known as the burn-in or infant-mortality period. This is a fairly short period during which failures due to faulty manufacture occur. It is followed by a period of low, constant failure rate.
The constant failure rate period is known as the useful life, and during this period failures are random in nature. This means that failures are due to chance alone, and the chance of a failure occurring is the same at all times during the useful life. This period covers the normal life-span of an equipment or component.
The wear-out period shows a rising failure rate as the effects of age, usage and chance combine to give increasingly unreliable equipment.
These three periods enable one to determine such factors as the best time to buy or sell, and second-hand values.
RELIABILITY MEASURED AS A PROBABILITY
The term reliability expresses the chance that a component or equipment will function normally for a specified period of time; it is the probability that it will perform its proper function under normal operating conditions for a specified period of time.
If it is said that an equipment has a reliability of 70% for a 100-hour period, it means that when operated for 100 hours it will function correctly, with no failure, seven times out of ten. The remaining three times out of ten it will fail. The failure rate of the equipment is thus 30% per 100 hours. If the reliability had been expressed as 7/10 or 0.7 per 100 hours, the failure rate would be expressed as 3/10 or 0.3 per 100 hours. It can be seen that:
Reliability = (1 - failure rate)
When reliability and failure rate are measured as percentages, the expression becomes:
Reliability = (100 - failure rate)
RELIABILITY MEASURED AS MEAN TIME BETWEEN FAILURES (M.T.B.F.)
Since the probability that a piece of equipment will perform successfully depends upon the conditions under which it is operating, the performance expected of it and the time of operation, the most important thing to a user is the average time that the equipment will run between failures. This time is known as the Mean Time Between Failures. Suppose a unit consists of 1000 resistors whose failure rate is 3.3% per year, i.e. 33 failures per year. Then on average 1/33 of a year will elapse between resistor failures, i.e. about 11 days. The unit may be expected to run about 11 days before a resistor fails, and this is its m.t.b.f.
It follows that reliability, failure rate and m.t.b.f. all depend on chance: the chance of a component, unit or equipment failing. It would be more accurate to say that if the above unit is operated for a very long time, on average it would fail every 11 days.
If there are n components of a particular type, e.g. resistors or capacitors, and the average failure rate is f per hour, then nf components will be expected to fail every hour. The m.t.b.f. (m) will then be:
$$ m = \frac{1}{nf} $$
The units in this formula must be consistent; if the m.t.b.f. is required in hours, the failure rate must be expressed as the number of failures per hour, not per 1000 hours.
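The relation between n, f and m can be checked numerically. The sketch below takes m = 1/(nf) and applies it to the earlier example of 1000 resistors failing at 3.3% per year, first with f per year and then with f per hour, to show the effect of keeping units consistent. The hour conversion assumes continuous operation (8760 hours per year), which is an assumption made here.

```python
HOURS_PER_YEAR = 365 * 24  # assumes continuous operation


def mtbf(n, f):
    """Mean time between failures, m = 1 / (n * f).

    The result is in the same time unit as the failure rate f
    (f per year gives years, f per hour gives hours).
    """
    if n <= 0 or f <= 0:
        raise ValueError("n and f must be positive")
    return 1.0 / (n * f)


# 3.3% per year expressed as a fraction per component per year.
f_per_year = 0.033
m_days = mtbf(1000, f_per_year) * 365

# The same rate expressed per hour gives the m.t.b.f. directly in hours.
f_per_hour = f_per_year / HOURS_PER_YEAR
m_hours = mtbf(1000, f_per_hour)

print(f"m.t.b.f. = {m_days:.1f} days = {m_hours:.0f} hours")
```

Both forms describe the same interval of roughly 11 days; only the unit of f changes.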
THE RELIABILITY OF A SYSTEM
To predict the reliability of a system, the failure rate of each of the components must be known or estimated. The choice of failure rate for each component is not easy, but a figure derived from information concerning failures of components in a similar system working under similar conditions is likely to be most useful. The table below presents the calculation for a computer using the same types of components.
DATA FOR RELIABILITY CALCULATION
Component            Number    Failure rate          Total failures
                               (% per 1000 hours)    per 1000 hours
Transistors          600       0.008                 0.480
Semiconductor        25000     0.004                 1.00
Resistors            3000      0.0004                0.120
Capacitors           6000      0.0016                0.096
Plugs and sockets    2000      0.0008                0.016
Transformers         150       0.009                 0.014
The total number of failures per 1000 hours is on average 1.726, and the m.t.b.f. for the computer is therefore
$$ \text{m.t.b.f.} = \frac{1000}{1.726} \approx 579 \text{ hours} $$
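The table calculation can be reproduced with a short sketch. It takes the "total failures per 1000 hours" column as printed in the table, rather than recomputing it from the number and failure-rate columns, sums it, and divides 1000 hours by the sum. The dictionary and function names are illustrative only.

```python
# Total failures per 1000 hours, copied from the last column of the table.
TOTAL_FAILURES_PER_1000H = {
    "Transistors": 0.480,
    "Semiconductor": 1.00,
    "Resistors": 0.120,
    "Capacitors": 0.096,
    "Plugs and sockets": 0.016,
    "Transformers": 0.014,
}


def system_failures_per_1000h(totals):
    """Sum the expected failures of every component type per 1000 hours."""
    return sum(totals.values())


def system_mtbf_hours(totals):
    """System m.t.b.f. in hours: 1000 hours divided by failures per 1000 hours."""
    return 1000.0 / system_failures_per_1000h(totals)


failures = system_failures_per_1000h(TOTAL_FAILURES_PER_1000H)
mtbf_hours = system_mtbf_hours(TOTAL_FAILURES_PER_1000H)
print(f"{failures:.3f} failures per 1000 h, m.t.b.f. = {mtbf_hours:.1f} h")
```

The result is close to the figure quoted above; any small difference comes from rounding of the total.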
MAJOR CAUSES OF UNRELIABILITY
The failure of microelectronic equipment may be due to a number of causes in addition to the failure of components. Negligence, faulty design, omission, unsatisfactory parts and poor workmanship all contribute towards failures. The list below shows some of the causes of failure in order of importance.
- Inadequate engineering design
- Component fixing
- Poor handling in operation or maintenance
- Poor workmanship and lack of checking
COMPONENTS AND RELIABILITY
HOW TESTING IS CARRIED OUT
Components may be tested under varying conditions. Variation of performance with humidity is tested in a box which is temperature-controlled by a thermostat, and whose humidity can be altered by injecting water vapour from a boiler. A fan gives an even distribution of temperature and humidity within the box, and the variation can be made cyclic if desired. Humidity testing is necessary when equipment is to be operated in different climatic conditions, from polar to tropical, and when it must retain its specification within acceptable limits.
Variation of performance with temperature is tested in a similar lagged box, or perhaps in the same one. A refrigerator can supply the box with cool air, a heater with hot air; for higher temperatures the box itself may be placed in an oven. Temperature testing is necessary to simulate the effects of different climates or environments, such as those met by microelectronic equipment in high-flying aircraft.
Vibration testing is carried out in a machine that may be electrically or mechanically driven. The effects of vibration are important because it is present in all forms of transportation, and also on working surfaces near heavy machinery.
The effects of shock are investigated by subjecting the equipment or components to a cam-operated bumping machine driven at 100 to 200 rev/min. This machine gives impact shocks; the testing supplies information on the effects of bad handling, such as a knock.
Acceleration tests are carried out by means of a small centrifuge; alteration of the length of the rotating arm and the speed of rotation produces different values of acceleration. These tests are used where components are liable to be subjected to high values of acceleration in operational use.
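The dependence of centrifuge acceleration on arm length and speed follows the standard relation a = ω²r. The sketch below is an illustration only; the arm length and speeds are hypothetical values, not test settings taken from this report.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def centrifuge_acceleration_g(arm_length_m, rev_per_min):
    """Acceleration at the end of the rotating arm, in multiples of g.

    Uses a = omega^2 * r, with omega in rad/s and r in metres.
    """
    omega = 2.0 * math.pi * rev_per_min / 60.0
    return omega ** 2 * arm_length_m / STANDARD_GRAVITY


# Hypothetical settings: a 0.5 m arm at two speeds.
g_at_600 = centrifuge_acceleration_g(0.5, 600)
g_at_1200 = centrifuge_acceleration_g(0.5, 1200)
print(f"600 rev/min: {g_at_600:.0f} g, 1200 rev/min: {g_at_1200:.0f} g")
```

Doubling the arm length doubles the acceleration, while doubling the speed quadruples it, which is why speed is the more powerful adjustment.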
The effects of low pressure on insulation breakdown, tracking at high voltages and the efficiency of sealing are investigated by using bell-jars with rotary vacuum pumps.
Testing the effects of nuclear radiation is not a common procedure, but components have been subjected to radiation from nuclear power plants and some general results obtained.
FACTORS AFFECTING MAINTENANCE
THE FACTORS INVOLVED
- Operational requirements
- Equipment characteristics
- Aids to maintenance
- Job environment.
The most important factor defining maintenance policy is the operational requirement. By operational requirement is meant the function an equipment has to perform and the conditions under which it has to perform it.
Equipment characteristics comprise the way the equipment is built electrically and mechanically, and the way in which it works to satisfy the operational requirement. The latter includes such factors as reliability, safety precautions and environment.
Complexity is an equipment characteristic. In the microelectronic sense, complexity is defined by the number of components and the interconnections between them. Clearly, the greater the complexity, the more difficult it is to isolate faults or make working adjustments where interconnections are profuse. If the task is difficult, then the need for good training, or for aids to the task, increases in importance.
Mechanical structure affects the maintenance task mainly through requirements for manual skills such as soldering small components, adjusting dust cores, and dismantling and assembling a receiver tuning cord. All are delicate tasks which generally have to be carried out in confined or otherwise difficult spaces.
From the maintenance point of view reliability is probably the most important characteristic. The combination of information about how often equipment goes wrong, with information about how much effort is required to repair it, is an important measurement of operational usefulness.
Extreme cases of each of the factors affecting maintenance are frequently encountered. For example, the very high reliability of fixed cabling is such that maintenance of it is seldom necessary, whereas, at the other end of the scale, there are examples of such gross unreliability that components in equipment have had to be re-designed.
AIDS TO MAINTENANCE
Aids to maintenance are the tools, test equipment and information which are not required for an equipment's normal operation, but which are desirable for carrying out maintenance. Some aids to maintenance are always available, but the amount is extremely variable. At one end of the scale, a circuit diagram may be all that is available; at the other end, automatic test equipment may be installed. The type of aid available will influence the training and the ability level of the men who are required to carry out maintenance. If a circuit diagram is the only aid provided, technicians must be trained to use the information in the diagram for such matters as determining the normal voltages to be expected and for making decisions about test procedures.
It is rare to find men who, without training, satisfy the requirements of a particular maintenance task. Because training takes a lot of time and money it is one of the most important factors in determining maintenance policy. The training requirement can be summarized as the difference between the ability required to do a task and the initial ability of men selected for it. That is initial ability plus training gives the required ability.
It is possible to reduce the costs of training either by raising the selection standards for technicians and shortening the training course, or by improving the aids to maintenance, which are intended to simplify the task and reduce the required ability.
The conditions in which the technician works are as important as the conditions in which the equipment is operated. Apart from the physical comfort of the working space, other factors have to be considered, such as the availability of spares, the amount of supervision and guidance given, the time available to complete the task and safety precautions.
The best maintenance policy is obtained as a result of the optimum combination of the contributing factors which have been mentioned.
It is worth bearing in mind that policy decisions can be altered as more accurate information becomes available. When equipment has been in use for some time, accurate information is obtainable concerning all the factors which were predicted in the development stages. This information can be collected and used to modify the maintenance policy and increase efficiency. The technician is a key person in this feedback process, and it cannot be over-emphasized that the meticulous recording of maintenance data is an important part of a technician's task, although all too frequently he is not told the reasons for collecting the data. Examples are recording the completion of routine tasks, recording details of operation, and recording faults. The last is of particular importance, both as specific information about the equipment and as general information about components. With the generally high mean times between failures which are now obtainable, a long time elapses before sufficient data can be obtained to check reliability predictions, and the opportunity to collect valuable data must not be wasted by defects in recording.
PREVENTIVE AND CORRECTIVE MAINTENANCE
Maintenance has two aims. The first is to prevent failure by routine checks on units or components, for which a predictable rate of wear occurs, with the object of replacing any that are nearing the end of their life. This is called preventive maintenance. The second is detection, location and repair of faults when and as they occur. This is called corrective maintenance. For most equipment both types of maintenance are carried out.
Fig 2.4 Preventive and corrective maintenance
The difference between these two categories of maintenance is related to the probabilities of failure, as shown in Figure 2.4. Figure 2.4 (a) shows the failure rate against time during the wear-out period. Figure 2.4 (b) shows the failure rate against time for a component which has a constant rate of wear; these are the conditions in which preventive maintenance is required. Figure 2.4 (c) shows the failure rate against time during the useful life.
2.4.1 PREVENTIVE MAINTENANCE
Preventive maintenance increases reliability by predicting failures and preventing them from occurring. This should not be confused with the steps taken to improve reliability in the design stage, which require no subsequent action by the maintenance technician.
When a component or unit has a known rate of wear, the task is one of measuring the amount of wear, either directly or indirectly. Examples are the number of operations of a switch, the time an equipment has been operating and the varying value of the current gain of a transistor. Such wear leads to the need for routine maintenance schedules; failure to carry these out and take any necessary remedial action would result in certain failure of the equipment.
Another reason for preventive maintenance is the need to check that parts of the equipment are working under optimum conditions. It may be the case that, although according to its output an equipment is working perfectly satisfactorily, some components are not working under optimum conditions. The power distribution between components may not be that required by the design.
A consistent variation in the characteristics of a component should be treated under preventive maintenance, whereas there is no point in attempting to rectify random variations until they are large enough to result in a fault and be treated as corrective maintenance.
When the failure of a component or unit is random, i.e. the probability of failure at any specific time in its life is constant, preventive maintenance is not necessary. Introducing preventive measures for random failures can only lead to a reduction in equipment reliability, because in general the more the normal operation of an equipment is interfered with, the less reliable it becomes. It used to be thought that regular checking of an equipment would detect potential failures, and hence improve overall reliability, but this is certainly not so in the case of random failures. No routine checking can predict random failures.
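The point that routine replacement cannot help against random failures can be illustrated numerically. The sketch below assumes the standard exponential survival model for a constant failure rate, R(t) = exp(-λt); this model and the value of λ are assumptions made here for illustration and are not stated in the report.

```python
import math


def reliability(t_hours, failure_rate_per_hour):
    """Probability of surviving t hours at a constant failure rate."""
    return math.exp(-failure_rate_per_hour * t_hours)


def survival_given_age(age_hours, t_hours, failure_rate_per_hour):
    """Probability that a unit already 'age' hours old survives t more hours."""
    return (reliability(age_hours + t_hours, failure_rate_per_hour)
            / reliability(age_hours, failure_rate_per_hour))


LAMBDA = 0.001  # hypothetical constant failure rate, failures per hour

new_unit = reliability(100, LAMBDA)
old_unit = survival_given_age(5000, 100, LAMBDA)
print(f"new unit: {new_unit:.4f}, unit aged 5000 h: {old_unit:.4f}")
```

Under this assumption a unit that has already run for 5000 hours is exactly as likely to survive the next 100 hours as a new one, so replacing it early gains nothing.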
For a technician, there are three phases in carrying out a corrective maintenance task:
- He must detect the fact that a failure has occurred; this is fault detection. Although this aspect may at first seem trivial, it is not always immediately apparent that a failure has in fact occurred.
- He must find the faulty component; this is fault location.
- He must repair or change the faulty component; this is fault rectification.
The report about a failure seldom provides the technician with much information; indeed he may not be certain that the failure is caused by a fault and is not the result of incorrect operation. The first requirement is therefore to carry out a functional check, i.e. to test the operation of the equipment. This will provide both confirmation of the failure and information to assist in diagnosis. For example, if a pocket television is reported as showing no picture, the first check is to see that the brilliance control is correctly set. If there is still no picture, a check is made to see if the alternative vision channels give an output, or if the sound channel is operating. These are functional checks.
A functional check is clearly most important in the case where equipment is reported faulty but little indication is given of symptoms.
It often happens that an enthusiastic technician, finding a familiar fault symptom, ignores the functional check and commences his checks in a specific area on the basis of that symptom alone. This is a technique which sometimes pays dividends, but there is no doubt that in the long run, and particularly with complex equipment, it is an inefficient and thus undesirable method. Systematic collection of information is the only correct procedure.
Fault location, particularly in complex equipment, is the technician's most difficult task. The reason is that it requires decisions to be made, usually based on a large amount of data, for which very little guidance has been given. The decisions may require the recall of information from previous checks, and a considerable memory load may be placed on the technician.
METHOD OF FAULT LOCATION
Given an equipment which is known to contain a fault, there are a number of methods of determining which is the faulty component; each has advantages and disadvantages. The methods fall into two main classes, sequential and non-sequential, each of which can be further sub-divided. Sequential methods require the technician to make a series of checks or measurements to narrow down the faulty area. Non-sequential methods locate the fault by one measurement, or by a number of measurements made rapidly and automatically, and do not require the technician to decide on a series of checks in order gradually to narrow down the faulty area. The non-sequential approach is seldom used by technicians, but increasing interest is being taken in it as an automatic method.
NON-SEQUENTIAL METHOD OF FAULT LOCATION
An example of a non-sequential method is the determination of a fault by the measurement of a transfer function, by which is meant an examination of the relationship between input and output signals. This example applies primarily at circuit level. A signal of varying frequency is applied to the input of a circuit and the output characteristic curve is measured. This output is compared with a set of predetermined characteristics obtained with particular component failures in the circuit. The closest match then indicates the fault. This technique has limited application, but it has been shown to be feasible for small microelectronic circuits.
In a circuit tuned to a particular frequency, a fault in any of the tuning components would change the resonant frequency, and the nature of the detected change can be used to determine which component is faulty. Similarly, resistor faults, whilst not altering the resonant frequency, could change the circuit gain at other frequencies; this gain change can also be related to the failure of a specific component.
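The closest-match idea can be sketched as a lookup against stored fault signatures. In the sketch below each signature is a list of gains (in dB) at five test frequencies; the signatures, the fault names and the measured curve are all hypothetical, and the sum of squared differences is only one possible choice of distance measure.

```python
# Hypothetical gain curves (dB) at five fixed test frequencies.
FAULT_SIGNATURES = {
    "no fault": [0.0, 3.0, 10.0, 3.0, 0.0],
    "tuning capacitor drifted": [0.0, 1.0, 4.0, 10.0, 3.0],  # peak has moved
    "resistor changed value": [0.0, 2.0, 6.0, 2.0, 0.0],     # peak is lower
}


def distance(curve_a, curve_b):
    """Sum of squared differences between two gain curves."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))


def closest_fault(measured, signatures):
    """Name of the stored signature that best matches the measured curve."""
    return min(signatures, key=lambda name: distance(measured, signatures[name]))


measured_curve = [0.1, 1.2, 4.5, 9.6, 2.8]
diagnosis = closest_fault(measured_curve, FAULT_SIGNATURES)
print(diagnosis)
```

A shift of the peak points to a tuning component, while a lower peak at the same frequency points to a resistor, in line with the two cases described above.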
Some attention has also been paid to methods of diagnosis which require no direct connection to the faulty circuit. These methods involve the detection of some secondary phenomenon which indicates the serviceability of a component. Examples of such phenomena are X-ray absorption, in which circuit defects can be detected using X-rays, and infra-red radiation measurement, in which the thermal characteristics of a circuit can be obtained.
A more commonly encountered non-sequential method is theoretical analysis. Here an attempt is made to deduce, from first principles, the components and type of failure which could cause the detected symptoms. For example, with a circuit which produces some wave output, it might be possible to note a particular distortion in the waveform and attribute this to a certain component failure in the circuit. This is an impressive and sometimes effective technique, but it is also a difficult one which is more suited to the designer than the technician.
Generally the technician will not meet the non-sequential methods described, except when they are incorporated in automatic devices. It is more likely that he will have to use a technique which requires a series of checks to be made in order to narrow down the faulty area, i.e. he will have to use sequential testing. The series of checks can be described as either systematic or non-systematic. It is systematic if a set of predetermined rules is used to decide on each check; it is non-systematic if, although each check is made in the faulty area, no particular principles or rules are used in deciding which checks to make.
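One well-known example of a systematic rule is the half-split method, in which each check is made at the midpoint of the remaining suspect area. The report does not name this method; it is used here only to illustrate what a predetermined rule looks like. The sketch assumes a simple chain of stages with a good input, a bad final output and exactly one faulty stage; the function and parameter names are illustrative.

```python
def half_split(n_stages, output_ok):
    """Locate the faulty stage in a chain by repeatedly halving the suspect area.

    output_ok(i) reports whether the signal at the output of stage i is good.
    Returns (index of faulty stage, number of checks made).
    """
    low, high = 0, n_stages - 1  # the faulty stage lies in low..high
    checks = 0
    while low < high:
        mid = (low + high) // 2
        checks += 1
        if output_ok(mid):
            low = mid + 1   # signal still good after mid: fault is further on
        else:
            high = mid      # signal already bad at mid: fault is at or before mid
    return low, checks


# Hypothetical 8-stage chain with the fault in stage 5 (stages numbered from 0).
FAULTY_STAGE = 5
found, checks_made = half_split(8, lambda stage: stage < FAULTY_STAGE)
print(f"fault in stage {found} after {checks_made} checks")
```

For eight stages the rule always needs three checks, whereas checking stage by stage from the input could need up to seven.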
2.0: POWER PROBLEMS IN MICROELECTRONICS (COMPUTERS)
Microelectronic equipment is designed to operate from a clean, constant supply of AC power. This AC power must be kept within the manufacturer's specified tolerances in order for sensitive equipment to operate properly and safely. In fact, IBM states that power line disturbances (variations above or below the voltage available at the wall plug) can cause power variations outside the specified tolerance of data processing equipment.
Microelectronic (computer) sites are constantly subjected to power disturbances that can interfere with normal computer operation. There are three basic types of power disturbance: power-line noise, voltage fluctuations and power outages.
POWER- LINE NOISE:
Power-line noise is similar to static on a radio broadcast, except that in a computer environment it is more than just an annoyance. Power-line noise can be misread as significant data by a computer, causing untraceable data entry. It can also cause program change or loss and even system damage.
The two basic types of power-line noise are:
- Ringing transients caused by network load switching and the switching of power-factor correction capacitors.
- Voltage spikes caused by lightning and by the operation of heavy equipment such as elevators and air conditioners.
Both types of noise can cause improper batch termination, damage to magnetic drum memories, extensive downtime and costly reprogramming.
Fluctuating voltage is a common phenomenon that makes lights momentarily dim or causes circuit breakers to trip. It also creates serious operational problems for sensitive electronic equipment. When voltage is too high, equipment damage may occur. When voltage is too low, a computer may lose significant portions of its data and may also function improperly. These malfunctions can result in unprogrammed data changes and in errors in logic and memory.
Sources of voltage fluctuations include:
- Transmission-line voltage drops between the utility substation and the user's service entrance, caused by normal transmission-line impedances.
- Intra-building voltage drops between the service entrance and the point of use, resulting from normal impedances found in cables, connectors and fuses.
- Brownouts (planned reductions in power), which are initiated by the utilities during periods of peak usage.
(In service cases, noise suppressors are guaranteed to get rid of power-line noise, and they feature a money-back guarantee.)
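The combined effect of these sources can be estimated with a rough sketch. It treats the line and building impedances as purely resistive and in series, so that the drop is simply current times impedance. The nominal voltage, tolerance, impedances and load current are all hypothetical values chosen for illustration; real tolerances must come from the equipment manufacturer.

```python
def voltage_at_point_of_use(source_v, load_current_a, series_impedances_ohm):
    """Supply voltage minus the drop across the series impedances (V = I * Z)."""
    drop = load_current_a * sum(series_impedances_ohm)
    return source_v - drop


def within_tolerance(voltage, nominal_v, tolerance):
    """True if the voltage lies within +/- tolerance (a fraction) of nominal."""
    return abs(voltage - nominal_v) <= tolerance * nominal_v


# Hypothetical site values.
NOMINAL_V = 230.0
TOLERANCE = 0.08        # assumed equipment tolerance of +/- 8%
LINE_OHM = 0.4          # substation to service entrance
BUILDING_OHM = 0.3      # service entrance to point of use
LOAD_A = 20.0

normal = voltage_at_point_of_use(NOMINAL_V, LOAD_A, [LINE_OHM, BUILDING_OHM])
# A brownout modelled as a 5% planned reduction at the source.
brownout = voltage_at_point_of_use(0.95 * NOMINAL_V, LOAD_A, [LINE_OHM, BUILDING_OHM])

print(f"normal: {normal:.1f} V, ok = {within_tolerance(normal, NOMINAL_V, TOLERANCE)}")
print(f"brownout: {brownout:.1f} V, ok = {within_tolerance(brownout, NOMINAL_V, TOLERANCE)}")
```

With these assumed values the ordinary drops alone stay inside the tolerance, but a brownout on top of them pushes the voltage outside it.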
PROTECTION AGAINST VOLTAGE SAGS AND BROWNOUTS
AC Voltage Regulators protect computers and other sensitive equipment against voltage fluctuations that can cause system malfunction, loss of data and other costly problems.
Fast Response: AC Voltage Regulators monitor line voltage and respond to fluctuations in less than one cycle of operating frequency.
High Efficiency: All models are 98% energy efficient at all load levels. This remarkably high efficiency keeps operating costs and air conditioning costs to a minimum, resulting in significant dollar savings over less efficient regulators.
Low Distortion: AC Voltage Regulators produce less than 0.1% total waveform distortion. Other voltage regulator devices may produce up to 5% waveform distortion in both input and output. This distortion can significantly degrade equipment performance.
Input Power Factor: AC Voltage Regulators reflect the power factor of the load almost exactly. This eliminates the need for expensive power factor correction equipment.
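To put the efficiency figure in perspective, the heat dissipated by a regulator can be estimated from its efficiency. The sketch below is illustrative only: the function name and the 5 kW load are assumptions made for the example and are not values taken from the survey.

```python
def regulator_loss_watts(load_watts: float, efficiency: float) -> float:
    """Power lost as heat in a regulator that delivers load_watts to the load."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    input_watts = load_watts / efficiency  # power drawn from the supply
    return input_watts - load_watts        # the difference is dissipated as heat


if __name__ == "__main__":
    load = 5000.0  # hypothetical 5 kW load
    for eff in (0.98, 0.90):
        loss = regulator_loss_watts(load, eff)
        print(f"efficiency {eff:.0%}: about {loss:.0f} W lost as heat")
```

The heat lost in the regulator must also be removed by the air conditioning, which is why a small difference in efficiency affects both running costs mentioned above.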
2.1 THE EFFECTS OF ENVIRONMENT ON COMPONENTS
Equipment that has to operate in difficult conditions such as high humidity, high or low temperatures, high or low pressure or vibration will naturally have a higher failure rate than that which is used in suitable surroundings. Not only does the complete failure of components become more frequent, but small changes in components also accumulate, so that the equipment as a whole becomes liable to deteriorate in performance until, without a catastrophic failure, it fails to meet its specification limits.
Humidity is the factor which affects the operation of microelectronic equipment most, particularly when high temperature exists at the same time, as in tropical conditions. Moisture has two main effects: the insulation resistance of tag strips, wafers and other insulators is reduced because of the formation of a surface film and the absorption of water; and water may form an electrolyte between dissimilar materials, thereby producing a spurious voltage by galvanic action. There are also other effects of humidity, such as corrosion of plated metals and growth of fungus. Fungus contains a high percentage of water, so that if it grows on insulation, for example, it provides a low-resistance path for current.
The most important effects of high temperatures on components are those due to softening or melting, expansion and ageing. Many of the plastics and waxes used for protection soften or melt, and protective and lubricating grease may melt. The expansion of different metals used in assemblies can lead to strains which distort the structure, and cause cut-outs to operate prematurely. High temperature accelerates all chemical processes, and rapid ageing of components is one effect which results.
The problem of heat dissipation in microelectronic equipment is always present, and if the equipment has to work at a high temperature, this problem is made more acute. The trend towards smaller components aggravates the situation further, because the reduced surface area of miniature components makes heat dissipation more difficult. Forced air cooling can often assist in heat dissipation.
At low temperature, components suffer from the effects of contraction, hardening and freezing.
Contraction of different metals produces strains in the same way as differential expansion, and hardened materials are liable to crack and break. Contraction and expansion can also affect components such as inductors and capacitors, so that readjustment of critical circuits becomes necessary. Oil and grease harden and perhaps freeze, making the operation of switches, controls and variable components stiff. Either lubrication has to be avoided, or special greases must be used. Some types of electrolytic capacitor freeze and some transistors will not work at low temperatures. Batteries, primary and secondary, lose power at low temperatures, although nickel-cadmium storage batteries stand cold conditions better than most other types.
SHOCK, VIBRATION AND PRESSURE:
The effects of shock and vibration are obvious: breakage, bending and weakening, both electrical and mechanical. The methods by which these effects can be reduced are good design of the supporting structures for the equipment, and good design of the internal structure.
Low pressure is encountered at high altitudes in airborne or missile equipment, and high pressure in mines and underwater equipment. The main effect of low pressure is the breakdown of air insulation at high frequencies and high voltages, although oozing of liquid or paste through the seals of electrolytic capacitors and similar sealed components is also experienced.
Microelectronic equipment may be subjected to radiation when it is used near x-ray tubes, electrostatic generators, cyclotrons and nuclear reactors, or when it is used in missiles, spacecraft and satellites exposed to cosmic rays. The term radiation covers a wide range of waves and particles, including alpha, beta and gamma rays, x-rays, neutrons and neutrinos. Many of these either interact little with matter or react with it by attacking the atomic structure. Components irradiated by gamma rays and neutrons, in particular, are therefore liable to change in chemical composition.
Sand and dust requirements have long been part of the environment specified for testing military equipment. Owing to the reduced use of components such as variable capacitors, unsealed switches and high impedance analog circuits, the effect of such tests is fortunately reduced. However, in real situations when humidity is combined with such contaminants, severe problems have reappeared. Correct packaging can form the basis of the solution to these problems.
Where salt spray tests are specified, very severe corrosion can be expected in a short time. The actual circumstances will be very different in practice. Total immersion in seawater is sometimes the more realistic test. Salt-humidity protection and design techniques consist of paying close attention to covers, ventilation ducts, sealing, critical circuit isolation and the avoidance of dissimilar metal combinations. Protective coatings, such as carefully chosen conformal coatings on circuit boards, have played a major role in reducing failures induced by dirt, moisture and the salt spray tests. Desiccants can also help but are rapidly overcome if exposure is extended.
DIAGNOSTIC MEASURES IN MAINTENANCE
3.0: INVESTIGATED DIAGNOSIS
In an operational system, provided that enough time has passed, failures will occur, even after a large effort has been directed toward preventing their occurrence. It is possible to design systems that will mask failure so that a mission may continue uninterrupted, but there is always a need to know that a failure has occurred (detection), at all system levels, down to the individual part or line of code. Detection and isolation of failure fall into the general area known as diagnostics. System diagnostics are necessary for complex systems where system repair is accomplished by replacement of a unit or an assembly, depending on the maintenance concept for the particular system. Careful attention must be paid to the relationship between system hardware and software, as it is only through the proper marriage of the two that efficient system diagnostics evolve.
GENERAL CHARACTERISTICS OF DIAGNOSTICS:
To be efficient, diagnostics must perform fault detection and isolation down to the smallest replaceable element. This is accomplished by a number of strategies, including start small, start big, over-lap and marginal checking. The most efficient approach will be determined by the system structure.
Start Small: This technique starts by checking a small area of circuitry. If the first check passes, then further areas of circuitry are included progressively, and in this manner, circuits found to be operating correctly are used to check other circuitry. This process is continued until all circuits that can be checked are checked.
Start Big: The reverse of start small, wherein a large group of circuits is checked and, when a fault is found, further tests are performed to locate the fault in the smallest possible area.
Marginal: This technique is used to detect the effect of part drift due to ageing. Generally, the changes brought by ageing are gradual and may remain unnoticed until a failure occurs. To accomplish marginal checking, certain operating conditions are varied from nominal values. Two methods of changing operating conditions are (a) variation of system D.C. voltage and (b) variation of system clock frequency.
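The start-big strategy described above can be sketched as a repeated halving of the suspect group. This is a minimal illustration under stated assumptions, not the method of any particular system: the unit names, the `group_passes` test function and the single-fault assumption are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def start_big_isolate(units: Sequence[str],
                      group_passes: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Isolate one faulty unit by testing a large group first, then halving it.

    group_passes(group) is an assumed test that returns True when every unit
    in the group works. The sketch assumes at most one faulty unit.
    """
    if not units or group_passes(units):
        return None  # the whole group passes, so no fault is detected
    candidates = list(units)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # If the first half passes, the fault must lie in the second half.
        candidates = second if group_passes(first) else first
    return candidates[0]


if __name__ == "__main__":
    units = ["PSU", "CPU", "MEM", "IO", "DISPLAY"]  # hypothetical replaceable units
    faulty = "IO"
    print(start_big_isolate(units, lambda group: faulty not in group))
```

Start small would proceed in the opposite direction, beginning with one unit and adding further units only after each check passes.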
Diagnostics are generally designed to detect and isolate only three types of failure:
- Catastrophic: This type of failure is permanently present until it is repaired, and for this reason is usually the easiest to detect and isolate.
- Intermittent: This type of failure is not present permanently because it occurs at random intervals. It is extremely difficult to isolate because it presents an inconsistent set of symptoms.
- Machine State: This type of failure occurs only under certain conditions, most likely (a) after a certain sequence of operations, (b) after a specific instruction followed by a delay in time, or (c) at a specific clock rate.
DESIGNING TO MEET THE REQUIREMENT:
Several levels of diagnostics may be required because operational constraints, for example memory size, do not permit the use of only one level of diagnostics. The term Built-In Test Equipment (BITE) is used here to define the levels of diagnostics generally required in a complex system, i.e.: BITE 1, BITE 2, BITE 3.
BITE 1: This level is of the continuous monitoring type and does not interfere with the operation of the system. No test-initiate signal is required; testing is initiated automatically when a system function is energized. Hardware or software may be involved but, in any case, this level is transparent to the operator.
Example: power supply over voltage protection circuitry, background interface integrity checks.
BITE 2: This is a partially interruptive level of testing which provides a confidence check for the operator should he consider that the system is malfunctioning and not providing any BITE 1 failure indication. The operation of the system is not significantly compromised other than at the man-machine interface, such as the display, which may be used to exhibit test patterns that would attempt to give indications of malfunctions.
Circuitry internal to the equipment will sequence the tests, generate the stimuli, and perform and evaluate the measurements. The circuitry is capable of being commanded into various test modes by means of operating controls and a manual test initiation. Provision is made to ensure that test output is not interpreted as prime data by the system or operator.
BITE 3: This is a totally interruptive level of testing. It is used together with BITE 1 and BITE 2 to form a first line test facility. Diagnostic tapes would normally be available from magnetic tape equipment to fully exercise the system, making extensive use of a central processing unit (CPU) aided by operator controls. The table below summarizes the diagnostic level characteristics above.
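|Diagnostic level|Major characteristics|Purposes|
|BITE 1|(a) Automatically initiated; (b) Non-interruptive|Performance monitoring|
|BITE 2|(a) Manually initiated; (b) Partially or temporarily interruptive|Operator confidence check|
|BITE 3|(a) Manually initiated; (b) Totally interruptive|Fault diagnosis and isolation|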
The object of BITE 1 is to establish the necessary confidence that the system will operate when required to do so. Since the operator is interested in the performance at a particular time, it is important to be able to carry out the performance monitoring at that time, or immediately before, so as to obtain a reasonable assurance of completing the mission satisfactorily.
When the operator is not confident that the system is functioning correctly, although there has been no indication from BITE 1 that anything is wrong, he may then use BITE 2, which is partially or temporarily interruptive, i.e. it may temporarily interrupt processing, possibly at the cost of existing data, but it will in no way destroy the prime equipment program, nor will it be possible for the results to be interpreted as prime data.
BITE 3 is essentially a maintenance function where the objective is to diagnose the faulty element so that the fault may be cleared. It may be used as an addition to performance monitoring to enable a fault to be isolated. Since fault detection normally takes place at a time devoted to maintenance, it is not necessary to maintain the normal operational performance while it is taking place, although there must be no loss of performance. Usually a family of diagnostic program tapes will be made available from magnetic tape equipment to fully exercise the system, making extensive use of the CPU and front panel controls.
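The three levels described above can be captured in a small data structure together with a rule for choosing between them. The sketch below is a simplification: the class and function names and the exact decision rule are assumptions made for illustration, based only on the descriptions of BITE 1, BITE 2 and BITE 3 in the text.

```python
from dataclasses import dataclass
from enum import Enum


class BiteLevel(Enum):
    BITE_1 = 1
    BITE_2 = 2
    BITE_3 = 3


@dataclass(frozen=True)
class BiteProfile:
    initiation: str    # "automatic" or "manual"
    interruption: str  # "none", "partial" or "total"


# Characteristics of each level as described in the text.
PROFILES = {
    BiteLevel.BITE_1: BiteProfile("automatic", "none"),
    BiteLevel.BITE_2: BiteProfile("manual", "partial"),
    BiteLevel.BITE_3: BiteProfile("manual", "total"),
}


def select_level(bite1_fault: bool, operator_suspects_fault: bool,
                 maintenance_period: bool) -> BiteLevel:
    """Choose which diagnostic level to run (assumed, simplified rule)."""
    if maintenance_period:
        # Time is devoted to maintenance, so totally interruptive tests are acceptable.
        return BiteLevel.BITE_3
    if operator_suspects_fault and not bite1_fault:
        # The operator wants a confidence check although BITE 1 reports nothing.
        return BiteLevel.BITE_2
    # Otherwise continuous background monitoring remains in charge.
    return BiteLevel.BITE_1


if __name__ == "__main__":
    level = select_level(bite1_fault=False, operator_suspects_fault=True,
                         maintenance_period=False)
    print(level.name, PROFILES[level])
```

A real system would also record the symptoms reported at each level; that detail is omitted here.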
A system diagnostic scheme using the concepts discussed above is illustrated in the figure SYSTEM DIAGNOSTIC CONCEPT, which appears later in this report.
APPROACHES TO ERROR RECOVERY
At the system level, assuming that adequate BITE 1 and BITE 2 hardware and software exist, it should be possible to detect the presence of system errors. This being so, the next question is to define the strategy in the event that a failure is announced. Several strategies are available and the appropriate one will depend upon the operational requirements and the type of system used. They are outlined as follows:
Bring the system to a complete halt: This strategy is simple and economical because the expense and trouble of redundant hardware is avoided. However, this strategy provides no possibility for recovery except within the system Mean Time To Repair (MTTR).
Fall back to less efficient processing: Depending on where the error has occurred in the system, it may be possible to reduce processing by shutting down the affected hardware or bypassing it. This will permit the system to operate in a reduced or degraded mode. A good example of this is reassigning processing to certain memory areas in the event of failure and reducing system throughput. This is also known as a fail-soft system arrangement.
Switch to a standby system: If high operational performance is of extreme concern, then a system may be completely duplicated in both hardware and software. This technique is more expensive than the other strategies but is effective where a high Mean Time Between Failures (MTBF) is demanded. This arrangement is often called a fail-safe type system.
- ATTEMPTED DIAGNOSIS
The failure of an item is a function both of the inherent physical behaviour of the piece of hardware and of the behaviour that is required if the hardware is to supply the desired service. Equipment may fail to give the ‘service’ which is arrived at as a result of diverse physical and chemical effects i.e.:
- Random defects in manufacture
- Random effects induced by physical changes over time (e.g.: oxidation, corrosion, crystallization, evaporation, migration of constituents, etc.)
- Random defects induced in use by operation and maintenance (e.g.: heat, vibration, contamination, thermal, electrical and mechanical shock, wear, etc.).
Essentially, the time of a “failure” cannot be predicted and a statistical approach is forced. With few exceptions there is no physical theory leading to a statistical description of the failure process, especially as in practice the events which are identified as “failure” arise from the combined effects of many separate processes in a complex structure.
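As an illustration of the statistical approach, the sketch below estimates a Mean Time Between Failures from observed intervals and converts it into a probability of surviving a mission. The constant failure rate (exponential) model used here is an assumption introduced for the example; the text above does not prescribe any particular distribution, and the failure intervals are made-up sample data.

```python
import math
from statistics import mean
from typing import Sequence


def estimate_mtbf(hours_between_failures: Sequence[float]) -> float:
    """Estimate MTBF as the mean of the observed times between failures."""
    if not hours_between_failures:
        raise ValueError("at least one observation is required")
    return mean(hours_between_failures)


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure in t_hours, assuming a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


if __name__ == "__main__":
    observed = [120, 340, 95, 410, 235]  # hypothetical hours between failures
    mtbf = estimate_mtbf(observed)
    print(f"estimated MTBF: {mtbf} h")
    print(f"probability of surviving 24 h: {reliability(24, mtbf):.3f}")
```

With few observations the estimate is rough, which is one reason field failure records need to be kept over a long period.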
Modeling of Maintenance Policies:
Selection of a maintenance policy determines the locations at which stocks of spares for replacement and repair are required. The depth of maintenance will determine the indent level of the spares. The location of the repair facilities relative to the location of the operating equipment will determine the delay and cost of transportation. In some cases, it may be advantageous to carry a stock of spares at a location other than the repair facilities, that is, at a central depot warehouse. Central warehousing of spares should not be decided upon unless it is shown to be advantageous by a logistic model.
[Figure: Repair of operating equipment by replacement at a high indent level, showing stocks of high indent level spares and low indent level spares.]
Intuitively it is clear that it is simpler and faster to replace at the highest level in the hierarchy of spares. This will give a high availability of the equipment, and will not require as highly skilled a technician as is needed for repair at a greater depth.
However, the high indent level spares will be much more expensive. The cost of spares will be reduced if replacement and repair can be carried out to a greater depth. Counterbalancing this, however, is the need for a more highly trained technician, more expensive test equipment, and a longer time for diagnosis and repair or replacement. If the operating equipment is dispersed at several locations, more spares will be required. It can be seen that the analysis leading to the optimum repair level is a relatively complicated matter. In many cases, a mixed policy can be shown to be the most economical overall: that is, replacement at a relatively high indent level, and repair of these units by replacement at a lower indent level at a second or third echelon of repair, as illustrated below.
[Figure: Repair of operating equipment by replacement at a low indent level, showing the flow of failed units to repair and the flow of serviceable (repaired) lower indent level units back to stock.]
Downtime of the equipment, resulting in non-availability of service, results from (a) the time to replace a failed unit; and (b) the delay incurred if a serviceable spare is not available on demand from the spares stock. It is this latter factor which is controlled by the quantity of spares. If there are no spares, every failure will result in downtime of equipment for the length of time needed to complete the repair and return the unit. This time includes transportation delays and scheduling delays in the repair facility, as well as the active repair time. The more spares available, the shorter the wait will be. If the repair facility can accept and work immediately on any number of units, then the delays are reduced to a minimum. Usually this will not be the case; there will be a scheduling delay whose magnitude will depend on the capacity of the repair facility and on the inflow rate of failed units. The rate of flow of failed units, in its turn, will depend on the number of equipment which are operating; that is, those which have not failed and are not waiting for a replacement spare.
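The relationship between the quantity of spares and the chance of waiting for one can be sketched with a simple model. The example below assumes that failures arrive at a constant average rate, so that the number of units in the repair pipeline follows a Poisson distribution. That model, the function names and the numbers used are assumptions made for illustration; they are not results of this survey.

```python
import math


def prob_no_shortage(spares: int, failure_rate_per_hour: float,
                     turnaround_hours: float) -> float:
    """Probability that the units under repair do not exceed the spares stock.

    Assumes Poisson-distributed failures; turnaround_hours covers transport,
    scheduling delay and active repair time.
    """
    mean_in_repair = failure_rate_per_hour * turnaround_hours
    return sum(math.exp(-mean_in_repair) * mean_in_repair ** k / math.factorial(k)
               for k in range(spares + 1))


def spares_needed(target: float, failure_rate_per_hour: float,
                  turnaround_hours: float, max_spares: int = 100) -> int:
    """Smallest stock of spares that meets the target probability."""
    for spares in range(max_spares + 1):
        if prob_no_shortage(spares, failure_rate_per_hour, turnaround_hours) >= target:
            return spares
    raise ValueError("target not reachable within max_spares")


if __name__ == "__main__":
    rate = 0.002        # hypothetical failures per hour for the whole fleet
    turnaround = 500.0  # hypothetical hours to repair and return a unit
    for stock in range(4):
        print(stock, round(prob_no_shortage(stock, rate, turnaround), 3))
    print("spares for 95%:", spares_needed(0.95, rate, turnaround))
```

The model ignores the scheduling delay caused by a repair facility of limited capacity, which the text identifies as the usual case, so it gives an optimistic estimate.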
APPLICATION OF MAINTENANCE ENGINEERING
There are several reasons why a part, component or group of associated components may fail prematurely or malfunction in service.
- Poor component design
- Inadequate component manufacturing quality control and testing,
- Deficient purchase specifications
- Damage in shipping, handling and in test equipment
- Poor component application leading to maintainability problems.
Most microelectronic equipment manufacturers at the present time are involved in assembling large numbers of ‘bought-out’ components into complex systems designed by ‘in-house’ design groups. Often the ‘in-house’ manufactured elements of a system constitute less than 10 percent of the manufacturing cost of a system.
Clearly, any communication problems between the component manufacturer and the systems manufacturer will show up in the form of application errors. In order to avoid such problems, application engineering groups are maintained by the major component suppliers. A wide variety of application notes are produced by component suppliers. However, in practice, few designers have time to do any more than try to understand the component manufacturer’s data sheet. Often, detailed applications data can be obtained only from direct discussions with manufacturers’ engineers. These avenues take time to search out and the great majority of components have no supporting application data.
SUGGESTED MAINTENANCE REVIEW
4.0: DESIGN REVIEW:
Design review is a process whereby engineers who are independent of the design examine it to uncover possible faults. These reviews should occur at the following times:
- Upon completion of detailed equipment design,
- Prior to commencement of production
- Periodically during the useful life of the system.
The Conceptual Design Review:
In this type of design review, the basic idea or requirement is taken and a strategy for meeting that requirement is developed. Various feasibility studies must have been performed on alternative technologies and on different support strategies concerning maintainability. The following points should be applied in a conceptual design review for maintainability:
- Requirements to disrupt the system for maintenance should be minimized, i.e.: minimize preventive maintenance;
- Modules should be used to as great an extent as possible;
- Components should be grouped functionally;
- Maintenance and supply support policies should be clearly stated and should match the operational environment;
- Self-test capabilities should be built into the system, and
- Requirements for support equipment should be minimized.
Equipment Design Review:
Clear and concise documentation should be presented which outlines component reliabilities and how these components interact to produce sub-system and system reliabilities. If new technology is being used that has not had extensive testing, a risk analysis should be available reflecting the degree of uncertainty associated with the reliability estimate and the effects on system reliability and cost in the event of the estimate not being valid. Particular emphasis should be placed on determining components which have a major effect on system reliability, ensuring that the reliability of such components is as cost effective as possible.
From the maintenance viewpoint, standardization is a key word. Component parts, equipment layouts, packaging and necessary support equipment should be as standardized as possible. This will help to reduce replacement spares, manpower and training costs, as well as assuring better availability of replacement parts. Equipment fault isolation should be as simple as possible and require as low a skill level as possible.
Adequate consideration should be taken as to the level to which repair will occur, i.e.: at what level to replace rather than repair. Another important point is access and ease of removal of repairable components. The design should be scrutinized to ensure that low reliability components requiring scheduled maintenance are readily removable and do not require extensive dismantling of other components in the system.
In-Service Design Review:
This is an analysis of the predicted reliability and support costs versus those that are being experienced. The reviewer must examine the documentation supporting the design to extract the basis for in-service estimates.
Actual usage data must then be examined to determine how the theory translated into reality. If predictions were not met, an analysis should be done to determine why this happened. Besides poor design, such uncontrollable factors as parts shortages, inflation or manpower shortages can be the cause of wide variance in predicted versus actual costs.
The maintenance training requirement is the difference between the ability required to do a job and the ability of men selected to do it.
The training requirements within an organization should be primarily influenced by three factors: the design specifications of the equipment, the philosophy being used to maintain the equipment, and the skill levels necessary to maintain the equipment properly. The diagram below is an aid to the evolution of a training plan.
EVOLUTION OF A TRAINING PLAN
The Influence of Design Specifications:
The training requirements should be most directly influenced by four design features which are, the technology level employed, the complexity of the design, the provisions for interface and the incorporation of features to enhance maintainability.
The technology level employed determines many of the basic skills required by the maintainers. Different types of skills are required to troubleshoot vacuum tube systems as opposed to systems using microcircuit chips.
The Influence of Maintenance Philosophy:
The maintenance philosophy of an organization will decide a number of factors, including repair levels, the repair philosophy at a particular level and what repair will be done in-house.
Repair levels are used to separate the maintenance task into discrete packages. Accompanying each package is a requirement for technicians with given sets of skills. In developing the training plan, these levels must be examined to determine the manpower required at each level and skill requirements.
Certain faults are designated for repair at each level. Fault correction can range from a detailed analysis to find and replace an individual component, to fault isolation to a module and then replacement of the entire module. Repair to the component level will require extensive knowledge in the areas of both fault isolation and fault correction, whereas a replace philosophy will require only skills in fault isolation.
Training requirements should be recorded by giving a clear statement of what is required and why it is required.
Furthermore, the skill should be linked to a general education level, such as that of an engineer, a technologist or a technician. It should be stated whether a general knowledge is required or a specific knowledge of a skill or technique. A wide variety of training is available; for it to be cost effective, the individual deciding on training must be able to evaluate the objectives of a given course against his requirement.
Length of course:
The length will give a good indication of the objective and background required for a course. A course lasting one day to one week is designed to give a practical knowledge of a narrowly defined subject. Courses of this type should be used to obtain specific skills. Care must be taken in assigning personnel to these courses, as their short length demands that attendees have the required background.
Courses lasting from two weeks to one month can have two objectives. The first is a general introduction to a field. Theory will generally be presented in a lecture format interspersed with case studies or exercises to relate the theory to practice. The objective is to introduce the attendee to the subject matter. Therefore prerequisites are less strict. The second objective for a course of such length is to give an in-depth knowledge of a new technique or system. An example would be a course for technicians to maintain a new radio set. These courses are very practical and are designed to have the attendee leave as an expert. The attendee at such a course must have a thorough knowledge of basic principles.
The Provision of Facts:
The facts about microelectronic equipment which are required for maintenance come under the following headings: facts concerning the electrical structure, which are information about the components and how they are connected; facts concerning the mechanical structure, which are information about how the components are laid out and mounted; facts concerning dynamic measurements, which are the voltages and waveforms expected under normal operating conditions; and facts concerning static measurements, which are measurements taken when the equipment is not working.
The Provision of Guidance:
Technicians are not expected to work out for themselves when checks on an equipment are to be made and what checks are to be made, so guidance is usually provided. For preventive maintenance, schedules are always available. These consist of a simple series of checks which are to be carried out in order, and which require no decision concerning the next step. For corrective maintenance, guidance is required for fault detection and fault location. However, the task is not always to carry out the same predetermined series of checks, so the guidance given is more complicated in form than that given for preventive maintenance. The actual form will depend on the method of fault location which is most appropriate for the equipment, and may also depend on the fault symptom. Factual information should also be available on what to check at each point.
Provision of Information:
Having decided what facts and guidance must be provided for maintenance, the problem arises of how best to present the information. The commonest and simplest method is by means of written information in equipment handbooks.
[Figure: SYSTEM DIAGNOSTIC CONCEPT. Flowchart of continuous monitoring escalating to BITE 2 and then to BITE 3, with the decision points "Is fault indicated?", "Is fault isolated?" and "1st time for symptom?", and the outcomes "May continue operation in degraded mode" and "Assume false alarm".]
Operator's decisions shown on the figure:
- Start operation
- Upon completion of operation, replace defective unit
- May abort operation. Replace defective unit
- Replace units in order of decreasing failure rate, or use external test equipment to isolate failure.
The circuit diagram is the best known written aid, and it has the widest use. Its main purpose is to provide information about the electrical structure of an equipment and it is supplied with almost all microelectronic equipment. The way in which a diagram is drawn depends upon the task for which it is required. Other forms of written aids are: tables of voltages; waveforms; structural drawings; and photographs.
Aids which cannot be incorporated into a handbook are called hardware aids. The presentation of maintenance information in hardware aids may be visual, in which film or television is used; or, less commonly, aural, in which a tape recorder and earphones are used. Some hardware aids use both aural and visual presentation; an example is a commercially available device which comprises a desk-mounted visual unit plus tape deck and control unit. Instructions are given aurally. This device has widespread application and is being used for routine servicing, fault location, repair and assembly. When a hardware aid does more than merely present facts at the request of the operator, it can be considered to be automatic test equipment.
It is generally recognized in accordance with the laws of chemical and physical degradation, that increasing the electrical, thermal and mechanical stresses on electronics parts will decrease either the time to failure or the time required to accumulate a given amount of degradation. Conversely, decreasing these stresses will reduce the rate of degradation, reduce the probability of catastrophic failure, and thus improve reliability.
Derating is a method of improving the reliability of components by operating them below their rated power, so that dissipation is reduced and hot-spots are avoided. Valves, resistors, and capacitors show significant improvement when they are derated.
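A derating check reduces to comparing the applied stress with the rated value. The sketch below is illustrative only: the function names, the 50% limit and the resistor values are assumptions chosen for the example and are not recommendations drawn from this survey.

```python
def stress_ratio(applied: float, rated: float) -> float:
    """Ratio of the applied stress (e.g. power in watts) to the rated value."""
    if rated <= 0:
        raise ValueError("rated value must be positive")
    return applied / rated


def is_derated(applied: float, rated: float, max_ratio: float = 0.5) -> bool:
    """True when the component runs at or below the chosen fraction of its rating."""
    return stress_ratio(applied, rated) <= max_ratio


if __name__ == "__main__":
    # Hypothetical 0.25 W resistor dissipating 0.10 W, checked against a 50% limit.
    print(stress_ratio(0.10, 0.25), is_derated(0.10, 0.25))
```

In practice the acceptable ratio depends on the component type and the ambient temperature, so a single limit such as the one used here is only a starting point.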
Encapsulation or potting of circuits and components should be done to provide protection against humidity, shock and vibration. Encapsulation is done by covering the component with a liquid plastic resin that sets hard on cooling. It is essential, however, that the potting is done carefully, otherwise a partial seal may let in water, which remains trapped and causes more damage than if potting had never been carried out.
Cooling of all heat-producing components is essential to good reliability. There are many factors to be taken into account in designing the best method of cooling components, and allowing for the amounts of heat lost by conduction, convection and radiation. Components themselves pose different problems. A valve, for example, has a large surface area which can be used for radiation and convection cooling, whereas a transformer generates a lot of its heat inside a mass of metal and must rely largely on conduction for heat dissipation. Encapsulation makes the problem of cooling very difficult because the plastic resin is a poor conductor of heat; thus, components with large heat dissipation are not suitable for potting.
The replacement of the valve by the transistor is solving a lot of cooling problems.
4.3: ENVIRONMENTAL EFFECT PREVENTION ON COMPONENTS:
Humidity: Prevention of the effects of humidity is carried out by choosing insulating materials which do not support a surface film of water and do not absorb it. Glass, quartz and steatite porcelain easily support a surface film; polystyrene and silicones prevent a water film from forming. On the other hand, glass does not absorb water and neither do polystyrene, many ceramics or polyethylene. Cellulose materials should be avoided in conditions of high humidity. Some plastics support the growth of fungus, but ceramics, mica, glass, nylon and polyethylene, for example, are not subject to fungal attack.
Maintenance technicians should be aware of the effects of humidity, and be on the lookout for any possible sources of failure due to dampness. Occasionally, equipment has to be used in conditions for which it is not designed, and unexpected faults can occur for this reason. Thus if unsuitable equipment is used in tropical conditions, a watch must be kept on components and materials which could absorb moisture, support a surface film of water or allow the growth of fungus.
Temperature: Some of the effects of high temperature can be overcome by using materials which will withstand the temperature to be encountered. Thus glass and ceramics are used instead of paper and certain plastics.
Equipment may incorporate special means of temperature control, such as heaters, cooling water, or thermostats, and a failure of one of these can be as disastrous as the failure of components directly concerned with the function of the equipment.
Technicians must be on the lookout for the effects of overheating, particularly during visual inspection of the equipment. Molten or distorted insulation, and charred paint in the colour coding of resistors are common symptoms which, however, are often the results of another fault, such as a short circuit, elsewhere in the equipment.
Pressure: Variable components must be kept free of dirt and irregularities, as these act as points for flash-over to occur.
CONCLUSION AND RECOMMENDATION:
From the survey made so far of the maintenance requirements of microelectronic equipment in the Nigerian environment, we can firmly hold that humidity, temperature, power fluctuation, shock and vibration, lack of correct training of technicians, neglect of safety precautions in handling the equipment, inadequate maintenance aids and fault-finding guides, and the absence of reliability improvement have drastically shortened the life-span of microelectronic equipment in both domestic and industrial use.
To combat the degradation described above, the importance of correctly training technicians to carry out maintenance must be stressed, safety precautions must always be observed, and automatic test equipment (ATE), which automates part of the maintenance task, should be considered as a means of easing maintenance.
The provision of maintenance aids and of a maintenance policy is necessary. The factors which must be taken into account when a maintenance policy is decided are operational requirements, equipment characteristics and job environment.
The effects of humidity and temperature may be minimized industrially by choosing component materials which will withstand them, as well as by drying, cooling or warming where this is possible. Most of this equipment must be kept under regular maintenance.
Reliability may also be improved industrially by good circuit design; by incorporating redundancy, i.e. providing alternative paths for some or all of the equipment functions; by derating, i.e. operating components below their rated power; by using improved constructional methods such as encapsulation, printed circuits and wrapped joints; and by paying attention to cooling.
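The benefit of redundancy and the meaning of derating can be illustrated numerically. The sketch below assumes the simplest textbook case: identical alternative paths that fail independently and are all active, so that the function is lost only if every path fails, giving R = 1 - (1 - r)^n. The function names and the example figures (a path reliability of 0.9, a 0.5 W resistor) are assumptions chosen for illustration and are not results of this survey.

```python
def parallel_reliability(path_reliability: float, paths: int) -> float:
    """Reliability of `paths` identical, independent alternative paths.

    The function survives if at least one path survives:
    R = 1 - (1 - r) ** n.
    """
    if not 0.0 <= path_reliability <= 1.0:
        raise ValueError("path_reliability must lie between 0 and 1")
    if paths < 1:
        raise ValueError("at least one path is required")
    return 1.0 - (1.0 - path_reliability) ** paths


def derating_ratio(operating_power_w: float, rated_power_w: float) -> float:
    """Fraction of rated power at which a component is actually run."""
    if rated_power_w <= 0:
        raise ValueError("rated power must be positive")
    return operating_power_w / rated_power_w


if __name__ == "__main__":
    # Assumed path reliability of 0.9 over some mission time.
    for n in (1, 2, 3):
        print(n, "path(s):", parallel_reliability(0.9, n))
    # Assumed 0.5 W resistor dissipating 0.25 W.
    print("derating ratio:", derating_ratio(0.25, 0.5))
```

Under these assumptions a second path raises the reliability from 0.9 to about 0.99, which shows why redundancy is attractive where the extra cost, weight and maintenance load can be accepted.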
In consideration of power fluctuation, every item of microelectronic equipment should have an inbuilt or connected stabilizer (converter) to stabilize the supply against the effects of the fluctuation.