Analysis of Critical Machine Reliability in Manufacturing Cells

Purpose: In an increasingly competitive business environment, the machine reliability problem merits special attention in the operation of manufacturing cells. This is mainly due to the flow-line nature of the cellular layout and the interdependency of upstream and downstream machines. This study investigates the effect of critical machine reliability improvement on production capacity and throughput time in manufacturing cells.
Design/methodology/approach: A discrete-event simulation model was developed to investigate whether a reliability plan focused on the most critical production machines can improve performance as an alternative to increasing the reliability of all machines. Four machine criticality policies are examined in the simulation experiments.
Findings: The results of this experimental study indicate that improving the reliability of a limited number of machines increases overall production capacity and speed in cellular manufacturing operations. A reliability plan that focuses on a set of critical machines potentially offers a more economical alternative to increasing the reliability of all machines in such a facility.
Research limitations/implications: The results demonstrate that, to achieve higher production capacity and shorter throughput times, managers should consider directing more resources to increasing the reliability of critical machines, particularly those with a shorter mean time to failure and higher utilization. Limitations of the study include the exclusion of the cost of improving machine reliability and maintenance resources, and of the cost of production losses due to machine breakdown.
Originality/value: The simulation model represents the dynamics of a real-world manufacturing cell environment by encoding operational functions such as machine failure, maintenance resource allocation, material flow, job sequencing and scheduling. A new machine availability metric is defined as well.


Introduction
With the advent of global competition and advancements in technology, the reliability of production facilities and the predictability of available resources have become critical in meeting market demand. One such production facility is the manufacturing cell, also known as a cellular manufacturing (CM) system, in which autonomous production cells, referred to as machine cells, are built using a group of dedicated dissimilar machines arranged in a series layout. In manufacturing firms, the cellular arrangement provides a solution for processing high-variety product mixes in small batches, as small as one part, which leads to efficient one-piece flow production. However, the independent nature of a machine cell and its dedication to producing a few part families make machine reliability more critical in CM than in other types of manufacturing systems. Specifically, in a cellular configuration, when a machine is down for scheduled maintenance or an unexpected repair, the work-in-process stalls until the machine returns to operational status. In such an event, there is no alternative machine within the same cell to process a part. This can be particularly disruptive given the flow-line and serial nature of process sequences in manufacturing cells.
A number of simulation studies indicate that the performance of a cellular system is more seriously affected by the deterioration of machine reliability than the performance of other manufacturing systems. Seifoddini and Djassemi (1996) suggested that machine reliability should be considered more carefully when operating a CM system because machines are dedicated to machine cells. Das, Lashkari and Sengupta (2007a) presented a flexible process routing approach that minimized the impact of machine failure. The authors proposed a CM design solution that consisted of assigning machines to cells and selecting, for each part, the process route with the highest overall system reliability. Diallo, Pierreval and Quilliot (2001) discussed the design of manufacturing cells in the presence of unreliable machines. Their study captured the different states of the system resulting from the availability or unavailability of unreliable machines in order to build cell configurations that remain efficient when disturbances occur. Seifoddini and Djassemi (2001) concluded that the impact of machine breakdown was not limited to a lower production rate but also disrupted the scheduling and productivity of the entire manufacturing system. Das, Lashkari and Sengupta (2007b) pointed out the importance of machine reliability in CM systems, where parts are processed on several machines in a serial fashion, which makes system reliability highly sensitive to a machine breaking down or undergoing maintenance. The authors proposed a reliability-based mathematical model using a group preventive maintenance approach. Elleuch, Masmoudi and Maalej (2008) examined the impact of disruptive events, such as machine failure, on the performance of CM and proposed a solution based on the notion of intercellular transfer for improving machine availability. Ameli, Arkat and Barzinpour (2008) proposed a multi-objective integer programming approach for the machine cell formation problem and explored the effect of machine reliability in selecting alternative process routings for improving cell performance. According to Alhourani (2016), most studies related to manufacturing cell design assume that all machines are 100% reliable, which is not realistic in manufacturing systems. The author offered a methodology that incorporated machine reliability and alternative process routings in designing manufacturing cells. Das and Abdul-Kader (2011) presented a mathematical model for dynamic changes in part demand and machine reliability. The model considered alternative processing routes for part types and evaluated the machine reliability along those routes to maximize the overall system reliability in the design of a manufacturing cell. Madu and Kuei (1992) discussed a simulation model for maximizing the steady-state availability of machines in a system with the purpose of reducing downtime cost. Furthermore, Kannan and Gosh (1996), in a reliability analysis of flexible manufacturing systems, argued that CM systems face the problem of uneven loads between cells, leading to high variation in machine utilization. In such an environment, the reliability of machines, particularly those with higher utilization, becomes critical in preventing longer queues in a shop. Finally, Chang, Ni, Bandyopadhyay, Biller, Xiao and Chang (2007) investigated the tradeoffs between maintenance personnel staffing levels and the throughput of a production line. Based on simulation results, the authors concluded that the impact of a delay in dispatching maintenance staff varies considerably between bottleneck/critical and non-bottleneck stations.
In this paper, experimental simulation data on the reliability of manufacturing cells are analyzed to compare four different machine criticality policies. In addition to the traditional throughput time, a new practical performance measure, machine availability, is defined to evaluate the performance of manufacturing cells under different levels of machine reliability.

Problem Statement
Very little research has been done to investigate the reliability of critical machines in cellular systems. The performance of such a system depends on the reliability of manufacturing operations within each cell. The development and maintenance of an effective reliability improvement program should be the highest priority in the operation of manufacturing cells. To maintain a high system reliability level at the lowest possible cost, a sound and deliberate reliability improvement plan is necessary. The core of such a plan is the determination of the critical machines whose reliability is most crucial to the operation of the cellular system. The works of Seifoddini and Djassemi (1996) and Flynn (1989) are among the few studies that have addressed the issue of machine reliability and criticality analysis in CM. The simulation study by Seifoddini and Djassemi suggests that a reliability improvement plan involving all machines in a CM system generates better performance than a reliability improvement plan that involves a few critical machines. However, the difference between the performance of the CM system in the two cases is relatively small. Flynn's findings suggest that the way the critical machines are defined does not appear to be a major factor in changing the performance of CM. In spite of this finding, the potential cost saving from increasing the reliability of a limited number of machines, as opposed to increasing the reliability of all machines, merits further investigation. Logically, the critical machines must be the primary focus of any reliability improvement plan in a manufacturing system. Some of the most widely used criticality policies for identifying these machines include (Flynn, 1989; Holtsclaw & Uzsoy, 1996; Aytug, Kempf & Uzsoy, 2002; Zimmermann & Monch, 2006):
• selection of K machines with the highest repair time
• selection of K machines with the longest queue length
• selection of K machines with the shortest mean time to failure
One criticality policy that has not been used in previous studies is the machine utilization level. Since machines with high utilization are normally more susceptible to breakdown, designating these machines as critical for the purpose of reliability improvement is more likely to have a positive impact on the performance of the CM system. In the next section, a methodology for comparing the performance of a CM system under four criticality policies and at different system reliability levels is discussed. The findings are expected to show how each criticality policy affects the selection of critical machines and, consequently, the performance of the CM system. Furthermore, the study investigates whether there is any performance difference between the criticality policies.
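Each of the criticality policies above amounts to ranking machines on one pilot-run statistic and keeping the top K. The following Python sketch illustrates that selection under stated assumptions: the `MachineStats` record, the policy keys, and the sample figures are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineStats:
    """Pilot-run statistics for one machine (field names are illustrative)."""
    name: str
    mean_repair_time: float   # minutes
    mean_queue_length: float  # parts waiting
    mttf: float               # mean time to failure, minutes
    utilization: float        # fraction of time busy, 0..1


# Each policy maps to (ranking statistic, True if larger values are more critical).
POLICIES = {
    "repair_time": (lambda m: m.mean_repair_time, True),
    "queue_length": (lambda m: m.mean_queue_length, True),
    "mttf": (lambda m: m.mttf, False),  # the shortest MTTF is the most critical
    "utilization": (lambda m: m.utilization, True),
}


def critical_machines(stats, policy, k):
    """Return the names of the k most critical machines under one policy."""
    key, larger_is_critical = POLICIES[policy]
    ranked = sorted(stats, key=key, reverse=larger_is_critical)
    return [m.name for m in ranked[:k]]


# Hypothetical pilot-run figures for four machines (not data from the study).
pilot = [
    MachineStats("M1", 120.0, 3.2, 1800.0, 0.81),
    MachineStats("M2", 200.0, 1.1, 2500.0, 0.64),
    MachineStats("M3", 90.0, 4.5, 1500.0, 0.88),
    MachineStats("M4", 150.0, 0.7, 3000.0, 0.52),
]
print(critical_machines(pilot, "mttf", 2))
print(critical_machines(pilot, "repair_time", 2))
```

Different policies can return the same subset of machines, in which case only one of them needs to be carried into the comparison.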

Methodology
Simulation modeling has been effectively used in numerous studies to deal with the complexity of manufacturing systems (Negahban & Smith, 2014; Djassemi, 2005). In this study, simulation modeling is used to investigate how the manufacturing cell performs under certain criticality policies and machine reliability levels. These policies identify a subset of machines with the: a) shortest mean time between failures, b) longest queue length, c) highest utilization level, and d) longest mean repair time.
Under these policies, the performance of a cellular system, in terms of machine availability and mean throughput time, is estimated at different machine reliability levels and compared with a scenario in which all machines are treated with the same level of reliability. The procedure for this comparative study can be summarized as follows.
1. Run a pilot simulation experiment to determine K machines based on the aforementioned criticality policies a to d.
2. Select a starting machine reliability level based on the mean time between failures (MTBF) such that no bottleneck machine is created.
3. Run simulation experiments for criticality policies a to d to determine the performance of the manufacturing cell under study in terms of mean machine availability and mean throughput time.
4. Repeat step 3 using a step-up machine reliability level.
5. Compare the outcome of the alternative machine criticality policies under different reliability levels.
To draw a statistical conclusion on the effectiveness of the criticality policies, the following hypotheses were tested:
H_o1: There is no significant gain from increasing the reliability of a subset of the machines identified as critical machines versus increasing the reliability of all machines.
H_o2: The choice of a particular criticality policy makes no significant difference in the overall performance of a cellular system.
A paired-t confidence interval test, which is known as an appropriate method for comparing alternative system configurations (Law & Kelton, 1999), is employed to test the hypotheses at the 95% confidence level. It is of interest to investigate which of the four machine criticality policies would benefit a CM system under various levels of machine reliability. If the null hypothesis H_o1 is rejected for one or more policies, it can be concluded that implementing a reliability plan that focuses on a limited number of machines would be advantageous compared to increasing the reliability of all machines. If the null hypothesis H_o2 is rejected, it can be concluded that there is a performance difference between the criticality policies. As previously mentioned, this issue was investigated by Flynn (1989), whose findings suggested that the way criticality was defined did not appear to be a major factor in changing performance.
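The paired-t comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: it assumes one observation per replication for each of the two configurations, and it hard-codes the two-sided 95% Student's t critical value for 23 degrees of freedom (24 paired replications). The function and variable names are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 23 degrees of freedom,
# i.e. for 24 paired replications. Pass a different value for other sample sizes.
T_CRIT_95_DF23 = 2.069


def paired_t_interval(a, b, t_crit=T_CRIT_95_DF23):
    """Confidence interval for the mean of the paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("both configurations need the same number of replications")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.fmean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff - half_width, mean_diff + half_width


def is_significant(interval):
    """The difference is significant when the interval excludes zero."""
    low, high = interval
    return low > 0.0 or high < 0.0
```

When the interval for the difference between two configurations excludes zero, the corresponding null hypothesis is rejected at the chosen confidence level.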
Two performance measures are considered in this study to evaluate the performance of cellular manufacturing. The first measure, mean throughput time, has been commonly used in studies pertaining to manufacturing systems. It is recorded by the simulation model as the average time spent by all parts in the system. The second measure, machine availability rate, was developed for the purpose of this study. It combines the mean time between failures, the repair time, and the maintenance staff's availability. The latter factor provides greater accuracy in the performance data generated by the simulation model because a repair action starts only when a maintenance technician is available, as opposed to assuming that a repair begins immediately after a machine fails. This subject was discussed in a case study conducted by Mosley, Teyner and Uzsoy (1998), in which the effect of several maintenance staffing and scheduling policies was investigated.
It is assumed that the time to failure follows an exponential probability distribution. The exponential distribution is a special case of the Weibull distribution in which the failure rate is constant. We assume that the machines in the shop under study are in their normal life period, with a relatively constant failure rate. The same assumption was made in two machine reliability studies, by Ameli et al. (2008) and Alhourani (2016).
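The failure rate (λ) of a machine is determined from the reliability value assigned to the machine over the time horizon t, as follows. Let the probability density function of the time to failure be defined as:
$$f(t) = \lambda e^{-\lambda t} \tag{1}$$
Then, the reliability function is:
$$R(t) = 1 - \int_0^t f(x)\,dx = e^{-\lambda t} \tag{2}$$
If the reliability of a machine over time t is assumed to be $R_T$, then
$$R_T = e^{-\lambda t} \tag{3}$$
$$\ln R_T = -\lambda t \tag{4}$$
and for the failure rate λ we have
$$\lambda = -\frac{\ln R_T}{t} \tag{5}$$
The mean time between failures, MTBF, is determined as:
$$\mathrm{MTBF} = \frac{1}{\lambda} \tag{6}$$
Taking into account the mean time to repair (MTTR) and the mean waiting time for a maintenance technician (MWMT), a practical measure for determining machine availability can be defined as:
$$\mathrm{MMR} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR} + \mathrm{MWMT}} \tag{7}$$
where MMR is the mean machine availability rate.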
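The chain from an assigned reliability value to an availability figure can be traced numerically. The Python sketch below assumes the constant-failure-rate model described above, so that the failure rate is -ln(R_T)/t and MTBF is its reciprocal. It also assumes that availability takes the form MTBF / (MTBF + MTTR + MWMT), which is one reading of the definition in the text. The function names and the sample inputs are hypothetical.

```python
import math


def failure_rate(reliability, horizon):
    """Constant failure rate implied by reliability R_T over a horizon t."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return -math.log(reliability) / horizon


def mtbf_from_reliability(reliability, horizon):
    """Mean time between failures, the reciprocal of the failure rate."""
    return 1.0 / failure_rate(reliability, horizon)


def machine_availability(mtbf, mttr, mwmt):
    """Availability including technician waiting time (assumed form, see lead-in)."""
    return mtbf / (mtbf + mttr + mwmt)


# Hypothetical inputs: 90% reliability over a 480-minute horizon,
# 155 minutes mean repair time, 20 minutes mean wait for a technician.
mtbf = mtbf_from_reliability(0.90, 480.0)
print(f"MTBF = {mtbf:.1f} min")
print(f"Availability = {machine_availability(mtbf, 155.0, 20.0):.3f}")
```

Because the waiting time sits in the denominator, the same MTBF and MTTR give a lower availability when technicians are scarce, which is the effect the metric is meant to capture.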

Experimental Framework
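A simulation model representing a cellular manufacturing shop is developed for the following tasks (Figure 1):
(1) Generate demand, assign parts to machine cells, and schedule the operations of part-families within each machine cell
(2) Assign setup times based on similarity of parts
(3) Determine MTBF based on statistical distribution
(4) Call an operator for loading/unloading parts and setting up the tools and fixtures
(5) Call/dispatch a technician upon machine breakdown
(6) Collect statistics
The simulation model is built in Flexsim, a simulation software package designed particularly for the analysis of manufacturing operations.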

Data Set
An exploratory case study consisting of 26 machines and 15 part types has been modeled for the purpose of comparing the criticality policies. The machine-part matrix for this shop environment is depicted in Figure 2. A snapshot of the animation depicting a partial view of the shop layout is shown in Figure 3. As is the case in real manufacturing environments, production data may well fluctuate. To reflect this, a probabilistic modeling approach for handling uncertainty is employed in this study: the demand for parts is exponentially distributed with a mean inter-arrival time of 40 minutes per order, and each order is composed of a batch of parts whose quantity varies uniformly between 5 and 10 parts. Only data collected under steady-state conditions are considered in estimating the true value of the performance measures. Based on an examination of plotted data, it is determined that the system reaches a steady state after a transient period of 6 months. The data collected over that period are discarded. The simulation is run for 100 days, 16 hours a day, beyond the transient period and replicated for 24 cycles, about 8 simulated years.
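The demand process described above can be sketched with the Python standard library. This is an illustration rather than the Flexsim model: it assumes that batch sizes are integers drawn uniformly from 5 to 10 inclusive, and the function name, parameter names, and seed are hypothetical.

```python
import random


def generate_orders(horizon_min, mean_interarrival=40.0, batch_range=(5, 10), seed=None):
    """Return a list of (arrival_time_min, batch_size) pairs within the horizon."""
    rng = random.Random(seed)
    orders = []
    # expovariate takes the rate, i.e. the reciprocal of the mean inter-arrival time.
    t = rng.expovariate(1.0 / mean_interarrival)
    while t < horizon_min:
        orders.append((t, rng.randint(*batch_range)))  # randint is inclusive
        t += rng.expovariate(1.0 / mean_interarrival)
    return orders


# One replication: 100 days at 16 hours a day, expressed in minutes.
horizon = 100 * 16 * 60
orders = generate_orders(horizon, seed=42)
print(len(orders), "orders,", sum(size for _, size in orders), "parts")
```

With a mean inter-arrival time of 40 minutes, a 96,000-minute replication yields about 2,400 orders on average.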

Machine Criticality Heuristic
Based on pilot simulation runs, a subset of machines is identified as critical machines. Increasing the size of the critical machine pool involves significant trade-offs: production capacity and speed are expected to increase as the pool size increases, but at the expense of higher machine maintenance cost. In this study, the deciding factor for determining the size of the pool was a threshold above which the overall performance improved in a tangible manner as the machine reliability increased. For the case under study, it was determined that a subset of 20% of the machines, approximately 5 machines, was able to meet the minimum threshold for the purpose of reliability improvement. These machines, along with the corresponding criticality policies, are listed in Table 1. Since criticality policies 2 and 4 generate the same subset of machines, only one of the two policies, MACH_MTBF, is included in the analysis of alternatives.

[Table 1. Columns: Criticality Policy | Critical Machines]

Setup and Maintenance Tasks
The processing times of manufacturing operations are randomly assigned using a uniform distribution with a range of 5 to 20 minutes. The setup time for a part is a function of the machine type and the similarity of the incoming part to its predecessor on the machine. The following coefficients are used to take this dependency into account:
(1) 1.0: When parts from two different part-families are loaded sequentially.
(2) 0.5: When two parts from the same part-family are loaded sequentially.
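The similarity coefficients can be applied as a simple multiplier on a machine's base setup time. The Python sketch below is an assumption about how the coefficient enters the calculation: the base setup time, the function name, and the treatment of the first part on a machine (charged the full setup) are all hypothetical.

```python
def setup_time(base_setup, incoming_family, previous_family):
    """Setup time in minutes, scaled by the part-family similarity coefficient."""
    # The first part on a machine has no predecessor and is charged the full setup.
    same_family = previous_family is not None and incoming_family == previous_family
    coefficient = 0.5 if same_family else 1.0
    return base_setup * coefficient


# Hypothetical 12-minute base setup on one machine.
print(setup_time(12.0, "PF1", "PF1"))  # same part-family
print(setup_time(12.0, "PF1", "PF2"))  # different part-families
```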
As a measure of machine availability, the time between failures is generated randomly from an exponential distribution with mean MTBF. The exponential distribution is a special case of the Weibull distribution in which the failure rate is constant. We assume that the machines in the shop under study are in their normal life period, with a relatively constant failure rate. The same assumption was made in two machine reliability studies, by Ameli et al. (2008) and Alhourani (2016). Based on pilot simulation runs, the MTBF for all machines is set at a level that allows production to flow within the five machine cells without creating a major bottleneck or queue saturation (Table 2). This level is used as a baseline for comparing three alternative reliability levels:
a) Level 1: 10% increase in reliability for selected critical machines
b) Level 2: 20% increase in reliability for selected critical machines
c) Level 3: 30% increase in reliability for selected critical machines
It should be noted that, to increase the machine reliability level as outlined above, we assumed that common practices for increasing reliability, such as assigning dedicated maintenance staff, more rigorous preventive and proactive maintenance management, and machine condition monitoring, could be applied in this shop.
The time to repair a machine is uniformly distributed between 60 and 250 minutes. To emulate a real manufacturing environment, the model dispatches a maintenance technician upon a machine breakdown. A repair service begins as soon as a technician is available.

The Workforce
Two types of workforce assignments were incorporated in the simulation model: machine operators and maintenance technicians. The machine operators are trained to manually load/unload parts, set up tools and fixtures, and operate multiple machines within a cell. Typically, one operator is assigned to a cell for every two machines. No inter-cellular operator assignment was allowed in the model.
Two pools of maintenance technicians were modeled. The first pool included one technician dedicated to critical machines. The second pool included three technicians who provided maintenance and repair services for the remaining machines. Under the MACH_ALL policy, the same level of maintenance attention is applied to all machines by dispatching any of the four available technicians for a service call.
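The interaction between breakdowns and a shared technician pool can be illustrated with a small event-driven sketch in Python. It is not the Flexsim model. It assumes exponential times to failure that run only while a machine is up, uniform repair times between 60 and 250 minutes, first-come-first-served dispatch, and no production logic. The 2,000-minute MTBF, the function name, and the seed are hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_availability(n_machines, n_technicians, mtbf, horizon,
                          repair_range=(60.0, 250.0), seed=None):
    """Return the mean fraction of time machines are up over the horizon."""
    rng = random.Random(seed)
    # Event queue of (time, kind, machine); kind is "fail" or "repaired".
    events = [(rng.expovariate(1.0 / mtbf), "fail", m) for m in range(n_machines)]
    heapq.heapify(events)
    free_technicians = n_technicians
    waiting = deque()   # failed machines waiting for a technician (FIFO)
    down_since = {}     # machine -> time of its current failure
    downtime = 0.0
    while events:
        now, kind, machine = heapq.heappop(events)
        if now >= horizon:
            break
        if kind == "fail":
            down_since[machine] = now
            waiting.append(machine)
        else:
            downtime += now - down_since.pop(machine)
            free_technicians += 1
            next_failure = now + rng.expovariate(1.0 / mtbf)
            heapq.heappush(events, (next_failure, "fail", machine))
        # A repair starts only when a technician is free.
        while free_technicians > 0 and waiting:
            free_technicians -= 1
            finish = now + rng.uniform(*repair_range)
            heapq.heappush(events, (finish, "repaired", waiting.popleft()))
    # Machines still down at the end stay down until the horizon.
    downtime += sum(horizon - start for start in down_since.values())
    return 1.0 - downtime / (n_machines * horizon)


horizon = 100 * 16 * 60  # one replication, in minutes
shared = simulate_availability(26, 4, 2000.0, horizon, seed=1)
understaffed = simulate_availability(26, 1, 2000.0, horizon, seed=1)
print(f"4 technicians: {shared:.3f}, 1 technician: {understaffed:.3f}")
```

The two-pool arrangement can be approximated by calling the function once for the critical machines with one technician and once for the remaining machines with three technicians.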

Analysis of Results
Raw data results for mean machine availability and mean throughput time are provided in Table 3. To address the research questions, paired t-tests were applied to the raw data to make a statistical comparison of the alternatives.

Mean Machine Availability
A graphical comparison of the results for mean machine availability is presented in Figure 4. All three criticality policies improve machine availability compared to the MACH_ALL policy by 5 to 9 percent, with the MACH_MTBF policy (K machines with the shortest MTBF) tending to yield the best performance (Figure 4a). In terms of machine reliability, the results also show that increasing the reliability of the critical machines by 10% to 30% improves overall machine availability by 3 to 5% (Figure 4b).
An important observation from the data plot in Figure 4c is that, even when all machines under the MACH_ALL policy were operated with the same increase in reliability level, the MACH_MTBF policy still yielded the best performance. This seems to be the opposite of what one may expect: when the reliability of all machines is increased, overall machine availability should be higher than when only a subset of machines is maintained at a similar reliability level. In this study, the result can be explained by the fact that, in the shop modeled, a maintenance technician is dedicated to servicing the critical machines under any critical machine policy, whereas under the MACH_ALL policy four technicians support the entire shop and all machines, including the critical ones, receive the same repair and maintenance attention. With such a maintenance management scheme, the critical machines are given more attention under any critical machine policy and, as a result, the impact on overall machine availability is more tangible than when all machines receive equal attention. Under this condition, the MACH_ALL policy is outperformed by the MACH_MTBF policy but outperforms the MACH_QUE and MACH_UTIL policies. It should be noted that any advantage of the MACH_ALL policy comes with the higher cost of maintaining a higher reliability for all machines, as opposed to the lower cost of maintaining the reliability of a limited number of critical machines.
Statistically, on the basis of the paired t-test data and in terms of mean machine availability, the null hypothesis H_o1 was rejected at the 0.05 level of significance for the MACH_MTBF and MACH_UTIL policies and could not be rejected for MACH_QUE. The pairwise confidence interval data and corresponding p-values are shown in Table 4.
Figure 5 shows the simulation results for the comparison of criticality policies in terms of mean throughput time. When the reliability level is increased for critical machines only, the data plot shows that all three policies improve the throughput time compared to the MACH_ALL policy. The mean throughput times are between 91 and 92 minutes for the three policies and near 97 minutes for the MACH_ALL policy (Figure 5a). The data plot also suggests that, on average across the three criticality policies, when the reliability of critical machines is improved by 30%, the mean throughput time falls from 96 to near 87 minutes, or about 9 percent (Figure 5b). However, when the reliability level is increased for all machines, the overall machine availability in MACH_ALL would be close to that of the three criticality policies (Figure 5c).
Statistically, on the basis of the paired t-test data, the null hypothesis H_o1 was rejected at the 0.05 level of significance for some cases. The pairwise confidence interval comparison data in Table 5 show that the three criticality policies improve the mean throughput time, but not all of the improvements are statistically significant. A significant improvement can be seen for the MACH_MTBF policy under all three reliability levels, and for the MACH_UTIL policy at 20 and 30 percent. The only exception is that, when the reliability of the K critical machines with the highest utilization (MACH_UTIL) is raised by 10%, the shop underperforms slightly compared to the MACH_ALL policy. This result suggests that, for machine utilization to be an effective policy, the reliability of the selected machines must be increased by at least 20%. The null hypothesis H_o1 cannot be rejected for the MACH_QUE policy. The null hypothesis H_o2 was rejected for all pairs of criticality policies except MACH_UTIL vs. MACH_QUE at the 20 and 30 percent reliability-increase levels, where the gap in throughput times between the two policies was very small.

Discussion and Conclusions
In this paper, simulation modeling was used to investigate the effectiveness of four machine criticality policies on the performance of manufacturing cells. Two important observations can be made with respect to the results. First, based on a new machine availability performance metric, the results suggest that, in general, the application of any machine criticality policy can significantly increase overall machine availability. The findings for mean throughput time were somewhat mixed, in the sense that the average throughput time over all reliability-increase levels was the same under the MACH_MTBF and MACH_UTIL policies, somewhat higher under MACH_QUE, and significantly higher under the MACH_ALL policy, meaning there is no advantage in increasing the reliability of all machines.
Second, all in all, the selection of a particular criticality policy makes a difference in the outcome. This differs from the findings of Flynn (1989), which suggested there was no performance difference between criticality policies. Our results suggest that the MACH_MTBF policy (K machines with the shortest MTBF) led to significantly superior performance compared with the other two policies. This discrepancy can be explained by differences in the design of the simulation models used in the two studies. First, in Flynn's study, intercellular part routing was allowed. Such interaction can reduce the degree of criticality of some machines, at the cost of higher intercellular material handling and the loss of the simple production scheduling that independent machine cells can offer. Second, in our study, the labor resources, particularly the maintenance technicians, were coded in the model. This reflects real manufacturing environments and allows maintenance resources to be focused on the most critical machines. Third, the machine utilization level has been used as a criticality policy. This policy is highly relevant to the reliability analysis of a manufacturing system because the higher a machine's utilization, the higher its expected frequency of breakdown. It is worth noting that, although the MACH_UTIL policy came second on both performance measures, focusing reliability efforts on highly utilized machines could reduce the load imbalance inherent in machine dedication in manufacturing cells. Finally, improving the reliability of any set of critical machines by 10 to 30 percent results in an increase of approximately 3 to 5 percent in overall machine availability and a reduction of 5 to 9 percent in mean throughput time. These findings suggest that the advantage of the critical machines scheme is an improvement in overall production capacity and speed.
Given the limited research on the reliability impact of critical machines in cellular manufacturing, we hope the findings of this paper represent an innovative step toward more experimental studies in similar applications. More studies are needed that incorporate important cost factors, such as the costs of machine reliability improvement and maintenance resources and the cost of production losses due to machine breakdown, in order to determine a sound reliability improvement plan for manufacturing cells.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
(2) Assign setup times based on similarity of parts
(3) Determine MTBF based on statistical distribution
(4) Call operator for loading/unloading parts, setting up the tools and fixtures
(5) Call/dispatch technician upon machine breakdown
(6) Collect statistics
The simulation modeling is based on Flexsim simulation software designed particularly for analysis of manufacturing operations.

Figure 1. Simulation model flow chart

Figure 2. The machine-part matrix used in simulation modeling

Figure 4. Mean machine availability (%) for a) all machines vs. criticality policies, b) all reliability levels, c) all machines with increased reliability

Table 1. Criticality policies and corresponding critical machines

Table 2. Reliability levels applied in the simulation model

Table 3. Summary of simulated performance data