Machine Learning and Deep Learning Based Methods Toward Industry 4.0 Predictive Maintenance in Induction Motors: A State of the Art Survey

Purpose: Developments in Industry 4.0 technologies and Artificial Intelligence (AI) have enabled data-driven manufacturing. Predictive maintenance (PdM) has therefore become the prominent approach for fault detection and diagnosis (FD/D) of induction motors (IMs). The maintenance and early FD/D of IMs are critical processes, considering that IMs constitute the main power source in the industrial production environment. Machine learning (ML) methods have enhanced the performance and reliability of PdM. Various deep learning (DL) based FD/D methods have emerged in recent years, providing automatic feature engineering and learning and thereby alleviating drawbacks of traditional ML based methods. This paper presents a comprehensive survey of ML and DL based FD/D methods of IMs that have emerged since 2015. An overview of the main DL architectures used for this purpose is also presented. A discussion of the recent trends is given, as well as future directions for research. Design/methodology/approach: A comprehensive survey has been carried out through all available publication databases using related keywords. The reviewed works have been classified according to the main ML and DL techniques and algorithms. Findings: DL based PdM methods have been mainly introduced and implemented for IM fault diagnosis in recent years. Novel DL FD/D methods are based on single DL techniques as well as hybrid techniques. DL methods have also been used for signal preprocessing and, moreover, have been combined with traditional ML algorithms to enhance the FD/D performance in feature engineering. Publicly available datasets have been mostly used to test the performance of the developed methods; however, industrial datasets should become available as well. Multi-agent system (MAS) based PdM employing ML classifiers has been explored. Several methods have investigated multiple IM faults; however, the presence of multiple faults occurring simultaneously has rarely been investigated. Originality/value: The paper presents a comprehensive review of the recent advances in PdM of IMs based on ML and DL methods that have emerged since 2015.


Introduction
Industrial production is driven by advances in Industry 4.0 technologies in order to meet the requirements of smart manufacturing systems. Due to the complexity and criticality of the procedures involved, different types of maintenance have been developed and implemented, namely reactive, preventive and predictive maintenance. Reactive maintenance is implemented upon the failure of the machine. In contrast, both predictive and preventive maintenance are implemented during normal machine operation. However, preventive maintenance adds cost and lost machine operational time, because it requires the machine's shutdown and is done periodically according to a schedule. Furthermore, it does not monitor the machine condition continuously and therefore does not build the machine's health profile. Data-driven manufacturing drives the advances in predictive maintenance, as machines are monitored continuously and analysis of the collected health condition data permits early FD/D. Smart manufacturing is central to Industry 4.0 (Frank, Dalenogare & Ayala, 2019). PdM uses the Industry 4.0 technologies of cyber-physical systems (CPS), Internet of Things (IoT) and big data analytics to enable self-aware and self-maintaining machines (Lee, Kao & Yang, 2014). AI supports PdM with big sensory data analysis in order to implement FD/D and avoid production shutdowns (Tao, Qi, Liu & Kusiak, 2018). Advances in big data analytics have given strategic importance to data-driven manufacturing (Ji & Wang, 2017). Accordingly, manufacturing systems are being transformed into data-driven self-learning systems in order to proactively make informed maintenance decisions by adapting to emergencies and by predicting and preventing faults and failures, using both historical and real-time sensor data. Data-driven PdM includes the following sub-processes: data acquisition and preprocessing; feature engineering, including feature extraction and selection; and model training and prediction (Zhang, Yang & Wang, 2019).
PdM benefits include reduced costs, prolonged machine lifetime, higher product quality and plant safety (Lee, Lapira, Yang & Kao, 2013). However, PdM in real factories faces many challenges due to the need to integrate the various Industry 4.0 technologies. Key challenges for the implementation of PdM in real factories include the processing of a large amount of sensory data from various monitored processes such as production and logistics and model development that will permit timely machine health monitoring for robust FD/D (Ruiz-Sarmiento, Monroy, Moreno, Galindo, Bonelo & Gonzalez-Jimenez, 2020).
FD/D of IMs is a main research discipline in machinery health management (MHM) (Wang, Fu, Zhang, Gao & Zhao, 2019). AI-driven FD/D methods for IMs have enhanced the reliability and efficiency of PdM through the development of different methods based on neural networks (NN), fuzzy logic, neural-fuzzy systems, expert systems and multi-agent systems (MAS), among others. ML methods have been used widely for machine FD/D; more recently, various DL methods, driven by advances in DL, have been developed to implement FD/D of industrial equipment, enhancing PdM reliability. DL uses long causal chains of NN layers, where each layer non-linearly transforms the activation of the network into a higher, more abstract representation, based on the development of computational models of the real complex system (Schmidhuber, 2015; LeCun, Bengio & Hinton, 2015). DL methods have improved the state of the art in FD/D of IMs, overcoming drawbacks of traditional data-driven FD/D methods, including their limited ability to learn from raw data and their dependence on experts for feature selection and extraction (Drakaki, Karnavas, Tzionas & Chasiotis, 2021). Figure 1 shows the flows of the FD/D sub-processes of ML and DL based methods for IMs. The recent developments in data-driven PdM methods for industrial equipment, including electrical machines, have been presented in reviews. One such review presented applications of both ML and DL based PdM methods for industrial equipment. The ML based applications included support vector machine (SVM), Logistic Regression (LR), Decision Trees and Random Forest. The DL methods included Artificial NN (ANN), deep NN and auto-encoder (AE). Zhao, Yan, Chen, Mao, Wang and Gao (2019) presented a review of DL based applications for machine health monitoring. Liu, Meng, Lu and Li (2018) reviewed AI algorithms for fault diagnosis of rotating equipment, mainly focusing on ML methods including k-nearest neighbor (k-NN), SVM, Naïve Bayes and ANN.
Karnavas, Chasiotis, Drakaki and Tziafettas (2020) presented a comprehensive review of NN based FD/D methods for IMs, targeting all the major types of faults. Drakaki et al. (2021) presented a review of recent trends in PdM of IMs focusing mainly on MAS and DL based FD/D methods that have emerged in the last five years due to their potential to be implemented in a smart manufacturing system. A comprehensive review of data-driven PdM of IMs including both ML and DL based methods has not been presented recently. In order to fill the gap in presenting a comprehensive review of recent advances in AI based PdM of IMs, this paper aims to provide a review of both ML and DL based FD/D methods that have been developed since 2015. The categorization is done based on the ML or DL techniques used for PdM. Additionally, the main DL architectures applied for FD/D of IMs are overviewed. Finally, discussion of the results, emerging trends and future directions for research are given.
The remainder of the paper is organized as follows. The methodology is presented first. In the following section, the ML based FD/D methods of IMs are presented. Next, an overview of the main DL architectures used for FD/D of IMs is given, followed by the DL based FD/D methods of IMs. Discussion of the results and managerial implications follows. Finally, conclusions are drawn.

Methodology
The research was carried out by surveying, to the largest possible extent, all available publication databases using relevant keywords; the filtering that followed retained the top ranking journals and conferences. Accordingly, based on the research experience of the authors, the search was conducted on the archives of the top ranking journals of each target database and on the archives of the conference proceedings in the fields of electrical machines operation, condition monitoring, fault diagnosis, fault detection and predictive maintenance. The explored databases included mainly IEEE, Elsevier, Wiley, Springer and Taylor and Francis. The keywords used included the terms: induction motor, electric motor, fault diagnosis, motor current signature analysis, condition monitoring, motor faults, bearing fault, broken rotor bar, stator faults, inter-turn short circuit, short-circuit fault, winding fault, machine health monitoring, health condition, pattern recognition, feature extraction, feature selection, machine learning, deep learning, AI, artificial neural networks, convolutional neural network (CNN), long short-term memory (LSTM).
Based on the initial search, 600 journal papers and conference proceedings papers were identified. Next, the authors read the abstract, main text, discussions and conclusions of each identified research item. The screening process excluded papers in the following cases: the abstract was not relevant; the methodology and the results of the main body did not show any significant novelty with respect to the state of the art; discussions and/or conclusions and/or findings were not found important with respect to the state of the art. Accordingly, the final number of papers included in this research was 109.

Machine Learning Based FD/D Methods of IMs
Manufacturing optimization processes, including monitoring, scheduling, control and diagnostics, are NP-complete and are associated with large data volumes of high dimensionality. ML methods have been used widely to address manufacturing challenges (Alpaydin, 2010; Wuest, Weimer, Irgens & Thoben, 2016; Drakaki & Tzionas, 2017). They are data-driven techniques that can identify complex and non-linear patterns in manufacturing data and transform sensor data into feature based models that can be used for detection, classification, regression and forecasting (Wuest et al., 2016). They have been used widely to implement PdM of machines, including IMs. IMs are widely used in manufacturing facilities to cover AC electric energy demand. Their main advantages include high reliability and performance, robustness to emergencies, simple design and low cost.
Since both electrical and mechanical faults can occur in IMs, their FD/D is a complex decision making process that may involve the presence of multiple faults and external factors related to the experimental data and measurements (Drakaki, Karnavas, Karlis, Chasiotis & Tzionas, 2020). The main fault types include broken rotor bars, short or open circuits in stator or rotor windings, cracked end-rings, bearing faults and eccentricity faults. The preferred PdM method for IMs is condition based maintenance (CBM) (Seera, Lim, Nahavandi & Loo, 2014). Both vibration analysis and Motor Current Signature Analysis (MCSA) are used for CBM of IMs. Online vibration condition monitoring is an invasive technique, because the sensors are attached to the motor. In contrast, MCSA is a non-invasive and low cost online CBM technique, based on electrical measurements of the stator current and post-processing of its power spectrum. However, besides the spectral positions of IM faults, the power spectrum also shows induced harmonics. Therefore, during the last decade, CBM has been used in conjunction with AI methods for robust FD/D of IMs. Early research efforts in this direction involved exclusively ML algorithms (Seera et al., 2014). During the last five years, ML based FD/D methods of IMs have been widely developed and used.
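To make the MCSA idea concrete, the following minimal sketch builds a synthetic stator current and evaluates its spectrum at the commonly cited broken rotor bar sideband frequencies (1 ± 2ks)f, where f is the supply frequency and s is the slip. The sideband relation, the signal amplitudes, the sampling rate and all function names are illustrative assumptions and are not taken from the surveyed works.

```python
import cmath
import math


def sideband_frequencies(supply_hz, slip, k=1):
    """Commonly cited broken-rotor-bar sidebands: (1 -/+ 2*k*s) * f."""
    return ((1 - 2 * k * slip) * supply_hz, (1 + 2 * k * slip) * supply_hz)


def amplitude_at(signal, sample_rate, freq_hz):
    """Single-frequency DFT: amplitude of the sinusoidal component at freq_hz."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / n


def synthetic_current(sample_rate, seconds, supply_hz, slip, fault_amp):
    """Hypothetical stator current: unit fundamental plus two fault sidebands."""
    lower, upper = sideband_frequencies(supply_hz, slip)
    n = int(sample_rate * seconds)
    return [
        math.sin(2 * math.pi * supply_hz * i / sample_rate)
        + fault_amp * math.sin(2 * math.pi * lower * i / sample_rate)
        + fault_amp * math.sin(2 * math.pi * upper * i / sample_rate)
        for i in range(n)
    ]


# Assumed operating point: 50 Hz supply, 4 % slip, 1 kHz sampling, 1 s window.
FS, F, SLIP = 1000, 50.0, 0.04
healthy = synthetic_current(FS, 1.0, F, SLIP, fault_amp=0.0)
faulty = synthetic_current(FS, 1.0, F, SLIP, fault_amp=0.05)
lower_hz, upper_hz = sideband_frequencies(F, SLIP)
healthy_sb = amplitude_at(healthy, FS, lower_hz)
faulty_sb = amplitude_at(faulty, FS, lower_hz)
print(f"lower sideband at {lower_hz:.1f} Hz: "
      f"healthy={healthy_sb:.4f}, faulty={faulty_sb:.4f}")
```

In practice the sideband amplitudes are small compared with the fundamental and coexist with other harmonics, which is one reason the surveyed methods feed such spectral features to a classifier instead of applying a fixed threshold.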
ML algorithms that have been used effectively for FD/D of IMs include Naive Bayes, k-NN, SVM/Sequential Minimal Optimization (SVM/SMO), ANN, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), K-means and C4.5 Decision Tree. Figure 2 shows various ML algorithms and methods, classified according to Wuest et al. (2016).

FD/D Based on MAS Employing ML Classifiers
MAS have been used effectively in implementing flexible manufacturing systems (FMS) by assigning agents to manufacturing resources, which communicate and interact with each other to dynamically reconfigure the manufacturing system (Shen, Yu, Miao, Li & Chen, 2006; Drakaki & Tzionas, 2016; Wang, Wan, Zhang, Li & Zhang, 2016; Leitão et al., 2016). Autonomous, intelligent and cooperative agents in a MAS achieve flexibility, robustness and modularity by distributing the control functions, including PdM, between them. In a smart manufacturing environment, MAS can effectively be implemented and operated in a CPS, having access to IoT, Wireless Sensor Networks, big data and cloud computing (Leitão, Karnouskos, Ribeiro, Lee, Strasser & Colombo, 2016; Wang et al., 2016; Bonci, Longhi, Lorenzoni & Pirani, 2020). Drakaki et al. (2020) presented a MAS based PdM methodology using intelligent agents trained with ML classifiers to achieve robust FD/D for broken rotor bars of squirrel-cage induction motors (SCIM). Current signal preprocessing was done by MCSA and FFT for feature set extraction. The MAS employed intelligent agents representing different rotor health conditions, including a healthy rotor and rotors with one, two or three broken bars. Each agent acquired intelligence for local decision making corresponding to its health state by employing a feedforward ANN (FANN). A supervisor agent made the final fault assessment after receiving the fault decisions of the individual machine agents. In case of conflict between the fault diagnoses of the lower level agents, the supervisor agent employed a k-NN classifier. The presented FD/D methodology, shown in Figure 3, achieved high accuracy during experimental validation. Drakaki, Karnavas, Chasiotis and Tzionas (2018) presented a hybrid FD/D method, based on MCSA and an intelligent MAS, to identify different broken rotor bar faults. Each health state, i.e. a healthy rotor or a rotor with one, two or three broken bars, was represented by an agent trained with a FANN, alongside an agent trained with a k-NN classifier.
A supervisor agent having a global view of the system made the final fault assessment. Palácios, da Silva, Goedtel and Godoy (2016a) introduced a MAS using intelligent classifiers for FD/D of bearing faults, broken rotor bar faults and short circuits between the coils of the stator winding. Data preprocessing was done with time discretization. The MAS consisted of a coordinator agent, a non-defective motor agent, a rotor agent, a stator agent and a bearing agent. The coordinator agent checked whether the IM was working under normal operating conditions. The coordinator agent communicated with the non-defective motor agent and, if the above condition was found to be false, each of the other agents made a fault assessment. The final fault diagnosis was achieved by mutual agreement between the rotor agent, stator agent and bearing agent. The agents used different ML classifiers, including an ANN, an SVM and a k-NN. Palácios, da Silva, Goedtel, Godoy and Lopes (2017) used a MAS employing an MLP classifier for FD/D of stator short-circuit faults. The data used as input to the MAS were current signals discretized in the time domain. Bessam, Menacer, Boumehraz and Cherif (2015) presented a method for detection and location of the inter-turn short circuit fault in the stator windings of an induction machine in the non-stationary state. The discrete wavelet energy (DWE) was used as the input for a feedforward multilayer perceptron (MLP) trained by backpropagation. A low mean squared error (MSE) target was set for training. An ANN based model for detection and identification of bearing faults using time domain features was presented by Wu, Chen, Jiang, Ning and Jiang (2015). The results showed that four different methods (Original Signal Analysis, Filtered Signal Analysis, WPT and Envelope Analysis) achieved a high classification rate. The proposed models led to an accuracy of about 100% for training data and over 90% for test data, while a low computational burden was achieved.
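The supervisor-agent arrangement reported earlier in this section for Drakaki et al. (2020) can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: a simple distance test stands in for the trained FANN of each agent, the two-dimensional feature vectors are made up, and all class and function names are hypothetical.

```python
import math
from collections import Counter


class HealthAgent:
    """Agent for one rotor health state; a distance test stands in for its FANN."""

    def __init__(self, label, centroid, radius):
        self.label = label
        self.centroid = centroid
        self.radius = radius

    def claims(self, features):
        """Local decision: does the sample look like this agent's health state?"""
        return math.dist(features, self.centroid) <= self.radius


def knn_predict(train, features, k=3):
    """Plain k-NN majority vote over (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class SupervisorAgent:
    """Accepts a unique agent claim; otherwise resolves the conflict with k-NN."""

    def __init__(self, agents, train, k=3):
        self.agents = agents
        self.train = train
        self.k = k

    def diagnose(self, features):
        claims = [a.label for a in self.agents if a.claims(features)]
        if len(claims) == 1:
            return claims[0], "agent"
        return knn_predict(self.train, features, self.k), "knn"


# Made-up labelled feature vectors (e.g. two normalized spectral features).
train = [
    ((0.0, 0.0), "healthy"), ((0.1, 0.1), "healthy"), ((0.0, 0.2), "healthy"),
    ((1.0, 1.0), "1 broken bar"), ((1.1, 0.9), "1 broken bar"),
    ((0.9, 1.1), "1 broken bar"),
    ((2.0, 2.0), "2 broken bars"), ((2.1, 1.9), "2 broken bars"),
    ((1.9, 2.1), "2 broken bars"),
]
agents = [
    HealthAgent("healthy", (0.0, 0.0), 0.8),
    HealthAgent("1 broken bar", (1.0, 1.0), 0.8),
    HealthAgent("2 broken bars", (2.0, 2.0), 0.8),
]
supervisor = SupervisorAgent(agents, train)

clear_case = supervisor.diagnose((0.1, 0.0))      # only one agent claims it
conflict_case = supervisor.diagnose((0.45, 0.5))  # two agents claim it
print(clear_case, conflict_case)
```

The second element of the returned tuple records whether the decision came from a single agent or from the k-NN fallback, which makes the conflict-resolution path visible when inspecting results.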
A methodology for the diagnosis of stator winding turn faults in a direct torque control (DTC) induction motor drive was proposed by Refaat, Abu-Rub and Iqbal (2015). It was realized through an indicator for detection and diagnosis of stator turn faults using an ANN. It was shown that the proposed strategy could improve the DTC performance without any modification in the drive configuration. A predictability analysis method that provided patterns based on measures of relative entropy, Bhattacharyya distance and Lempel-Ziv complexity was presented by Schmitt, Scalassara, Goedtel and Endo (2015). The measures were estimated over reconstructed signals obtained from wavelet packet decomposition components. The signals under study were collected from motors with faults in the inner or outer bearing races.
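Two of the measures named above have standard closed forms for discrete distributions p and q: the relative entropy is the sum of p_i ln(p_i / q_i), and the Bhattacharyya distance is the negative logarithm of the sum of sqrt(p_i q_i). The sketch below evaluates both on normalized amplitude histograms of two synthetic signals. The bin count, the smoothing constant, the signals and the function names are assumptions made for illustration; the wavelet packet reconstruction step of Schmitt et al. (2015) is not reproduced.

```python
import math


def histogram(values, bins, lo, hi, eps=1e-9):
    """Normalized histogram with a small additive floor to avoid log(0)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[max(idx, 0)] += 1
    total = len(values) + bins * eps
    return [(c + eps) / total for c in counts]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    """Negative log of the Bhattacharyya coefficient."""
    return -math.log(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))


# Synthetic stand-ins: a clean sinusoid and the same signal with periodic impulses.
healthy = [math.sin(2 * math.pi * 5 * i / 500) for i in range(500)]
faulty = [x + (0.8 if i % 50 == 0 else 0.0) for i, x in enumerate(healthy)]

p_healthy = histogram(healthy, bins=16, lo=-2.0, hi=2.0)
p_faulty = histogram(faulty, bins=16, lo=-2.0, hi=2.0)

kl = relative_entropy(p_healthy, p_faulty)
db = bhattacharyya_distance(p_healthy, p_faulty)
print(f"relative entropy = {kl:.4f} nats, Bhattacharyya distance = {db:.4f}")
```

Both measures are zero for identical distributions and grow as the histograms diverge, so they can serve as scalar features describing how far a measured signal is from a healthy reference.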

FD/D Based on NN
A method based on a feedforward MLP trained by backpropagation monitored the negative sequence voltage and the three-phase shift between the line current and phase voltage to detect inter-turn short circuit (ITSC) faults and unbalanced supply voltage. The data required for training and testing the NN was generated using a simulated model including stator faults with different unbalanced supply voltages and ITSC faults. The mathematical modelling of an IM and the prediction of broken rotor bar faults at different load conditions by an ANN technique using simulation was proposed by Patel and Bhalja (2015). An approach focused first on the application of ANNs and then fuzzy logic systems to reduce significantly the effect of load variations on the fault detection procedure was presented by Lashkari, Azgomi, Poshtan and Poshtan (2015). The proposed ANN methodology could detect and locate ITSC faults, while the fuzzy approach was capable of detecting and diagnosing the severity of ITSC faults. A study of real-time industrial drive vibration signal data for broken bearing detection using a probabilistic wavelet NN was presented by Jayakumar and Thangavel (2015) in order to increase the fault detection rate and to handle larger power demand. An ANN-based FD/D approach for stator winding turn faults was also presented, based on analysis of the (RTHF-FFT) magnitude component of the three-phase stator line current. A methodology based on the identification of the PQ disturbance using the discrete wavelet transform and subsequent classification by an ANN was proposed by Sridhar, Uma-Rao and Jade (2015). The classification accuracy reached 97%. Sridhar, Uma-Rao and Sukruth (2015) presented an FD/D method for broken rotor bars in IMs. The method combined stator current processing using wavelets and fault classification using ANNs. The classification accuracy was 93.75%.
Lin and Tan (2015) presented an intelligent controlled three-phase squirrel-cage induction generator (SCIG) system. The system was used for grid-connected power applications. A probabilistic fuzzy neural network (PFNN) using a backpropagation learning algorithm was used as the tracking controller for the DC-link voltage of the DC/AC power inverter.
A methodology in which vibration signals were decomposed by WPT was presented by Patel and Giri (2016). Features were extracted from the dominant wavelet coefficient for three bearing conditions, namely normal, defect on the inner race and defect on the outer race. The features obtained from raw and dominant wavelet coefficient vibration signals were used as inputs to the ANN classifier to evaluate its performance. The obtained results showed that the ANN performance improved considerably when data from vibration signals with higher BDI were chosen, compared to the raw vibration signals. An approach focused first on the application of ANNs and then fuzzy logic systems to reduce significantly the effect of load variations on the fault detection procedure was presented by Lashkari, Azgomi, Poshtan and Poshtan (2016). The proposed ANN methodology could detect and locate ITSC faults, while the fuzzy approach was capable of detecting and diagnosing the severity of ITSC faults. Vieira, Cunha, Medeiros and Silva (2016) designed several MLP classifiers in order to detect stator winding inter-turn short circuits of a three-phase IM. The results were used to evaluate the influence of the operational conditions of the system, consisting of converter, motor and load, on the accuracy. Several hypotheses were identified based on the characteristics of the data set. Vieira, Medeiros and Silva (2016) proposed a method based on a neural classifier and a data acquisition process embedded in a fault detection system for IMs driven by a frequency converter. In experiments, the changes to the original dataset resulted in a reduction of 96%, and the diagnosis time also decreased by 60%, compared to the previous work. Kaviarasan, TamilSelvan and Venugopal (2016) presented a bearing fault analysis of a 3-phase SCIM using an AI method, under load and no load. Wolkiewicz and Kowalski (2016) combined the symmetrical components method with NN for detection of winding faults in IMs.
An intelligent FD/D method that used multi-objective feature selection based on the non-dominated sorting genetic algorithm II was presented by Arjmand-M and Sargolzaei (2016). Wavelet packet decomposition was used in order to improve the signal-to-noise ratio. An MLP was used for the diagnosis of bearing faults. An FD/D method based on NN and feature extraction for a three-phase inverter was proposed by Asghar, Talha and Kim (2016). Several features extracted from the Clarke transformed output current were used in an NN for FD/D. Results showed that the proposed method could effectively identify multiple faults. An FD/D method for stator short-circuit faults in an inverter-fed IM was presented by Godoy, da Silva, Goedtel, Palácios, Bazan and Morinigo-Sotelo (2016a). The input to an MLP was the amplitude of the stator current signal in the time domain. After a discretization of the current signal, PCA was used to reduce the complexity of the classifier. An accuracy level exceeding 80.35% was achieved. A method for FD/D of rotor faults was proposed by Marmouch, Aroui and Koubaa (2016). Radial basis function (RBF) and probabilistic neural network (PNN) classifiers were used to complement MCSA. RBF achieved better accuracy than PNN in different conditions, and the achieved accuracy ranged between 93.333% and 98.889% for those conditions. Rama-Devi, Siva-Sarma and Ramana-Rao (2016) presented an FD/D method based on modular NNs to classify different disturbances such as supply unbalance and stator inter-turn faults. The features were extracted from three-phase residues obtained from wavelet multiresolution analysis. Different modular NNs were developed and outperformed the ANN in terms of classification accuracy. Palácios, Goedtel, Godoy and Fabri (2016b) presented an FD/D method for stator faults in IMs. The method was based on the discretization of the current signal in the time domain, using PCA as a variable optimization technique together with MLP and RBF networks.
The classification accuracy was above 84% for the MLP and above 83% for the RBF network. An FD/D technique for broken rotor bars in IMs was proposed by Bensaoucha, Ameur, Bessedik and Seghiour (2017). The envelope of the stator currents was calculated based on the Hilbert Transform. Then FFT was applied, followed by an NN classifier for the final fault diagnosis. An RBF NN and PNN based method with dimensionality reduction was proposed by Marmouch, Aroui and Koubaa (2017) for online MCSA. PCA was used for selecting the appropriate training parameters. Wang, Xie, Zheng and Hang (2017a) analyzed and compared backpropagation NN and FFT algorithms. The backpropagation NN was shown to be superior in terms of calculation, and the simulation results verified that it could achieve a detection accuracy close to that of the FFT. ANNs were used for FD/D of bearing faults in three-phase IMs connected directly to the power grid (Gongora, Goedtel, da Silva and Graciola, 2016). The method was tested experimentally. Ghosh, Ahmed, Mollaeian, Tjong and Kar (2016) presented a model that used an ANN and a lumped parameter thermal network (LPTN) for parameter estimation and calculation of losses in drives. An FD/D method for broken rotor bars based on the TSA method and an NN was proposed by Khiam, Ouassaid and Ngote (2017). The IM health conditions studied included a healthy induction motor and a motor with one, two and three broken bars. The feedforward multilayer NN was the optimal choice. The classification accuracy of motor faults was higher than 95%. A FANN was used for detection of stator winding faults in IMs (Rajamany and Srinivasan, 2017). The main input to the FANN was the sum of the absolute values of the differences in the peak values of the phase currents from each half cycle.
Dmitry, Maxim and Dmitry (2017) investigated the impact of the NN structure and presented an analysis of the choice of NN structure, based on the number of input variables and the size of the experimental part, for predicting AC motor faults. Bazan, Scalassara, Endo, Goedtel et al. proposed an FD/D method for stator winding short circuit faults based on measures of mutual information between the phase current signals. Two different ANNs were used during testing. The results showed classification accuracy over 93%. A data-driven method was also proposed in which the most significant features of the three-phase current signal were first analyzed and then Curvilinear Component Analysis (CCA), a nonlinear manifold learning technique, was implemented to compress and interpret the feature behavior. Then, an MLP network was used to develop a classifier. The effectiveness of the developed model was verified experimentally with data provided online and in real time. Kumar, Cirrincione, Cirrincione, Tortella and Andriollo (2018) proposed a fault diagnosis and classification scheme for induction machines using MCSA together with NNs. With 2 classes the fault detection accuracy was 98.4%, while with 4 classes the accuracy was 97.5%. A method for bearing FD/D in IMs was presented by Navasari, Asfani and Negara (2018). It was based on an initial current analysis using the discrete wavelet transform and an ANN. Damage was detected if the energy value obtained from the wavelet transform of the damage data was greater than the energy value under normal circumstances. Results showed a success rate of 100% for inner race damage, 98% for outer race damage and 100% for ball damage.
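The energy comparison rule described above for Navasari, Asfani and Negara (2018) can be illustrated with a small Haar discrete wavelet transform. This is a hedged sketch: the Haar wavelet, the three decomposition levels, the synthetic signals, the `margin` parameter and the function names are assumptions chosen for illustration, since the wavelet family and settings of the surveyed work are not given here.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail


def detail_energy(signal, levels=3):
    """Sum of squared detail coefficients over several Haar levels."""
    energy = 0.0
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        energy += sum(d * d for d in detail)
    return energy


def is_damaged(signal, baseline_energy, margin=1.0):
    """Flag damage when the detail energy exceeds the healthy baseline."""
    return detail_energy(signal) > margin * baseline_energy


# Synthetic stand-ins: a smooth current and the same current with periodic impulses.
FS = 1024
normal = [math.sin(2 * math.pi * 5 * i / FS) for i in range(FS)]
damaged = [x + (0.5 if i % 64 == 0 else 0.0) for i, x in enumerate(normal)]

baseline = detail_energy(normal)
print("normal flagged:", is_damaged(normal, baseline))
print("damaged flagged:", is_damaged(damaged, baseline))
```

The impulses add high-frequency content that concentrates in the detail coefficients, so the detail energy of the damaged signal rises above the healthy baseline. A `margin` above 1.0 could be used to tolerate measurement noise.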
An FD/D method based on feature extraction and classification using intelligent systems was presented by Bazan, Scalassara, Endo, Goedtel, Palácios and Godoy (2018). Information from delayed stator current signals was used as input to C4.5 decision trees and subsequently to MLPs. Various offline and online experimental tests were implemented to test voltage unbalance, load torque variations, and 1% to 10% short circuit levels. The minimum classification accuracy was 95% for short circuit levels equal to or greater than 3%.
An NN based method was presented by Bensaoucha, Ameur, Bessedik and Moati (2018) to diagnose short circuit faults between turns of the same coil in SCIM. The results showed the efficiency and accuracy of the NNs for FD/D. Shrivastava (2019) presented an FD/D method for bearing faults of AM. A laboratory setup was developed and the results were based on the analysis of AM generated vibration signals using time domain analysis (TDA) and a feedforward NN. The general fault classification rate in AM was 97.45%. A multiple FD/D method using an ANN was presented by Jigyasu, Mathew and Sharma (2018). The current and vibration responses of a healthy motor and of motors with bearing, rotor and stator defects were analyzed. The feature extraction process was implemented in the time domain only. Results showed that, among the various training functions of the ANN, trainlm performed best and traingdm performed worst for FD/D. A method to diagnose broken rotor bar faults in IMs was developed by Hajiaghasi, Rafiee, Salemnia, Aghamohammadi and Soleymaniaghdam (2019). The flux density of motors with broken rotor bars was analyzed by the finite element method (FEM). The amplitudes of the stator current signal in the time-frequency domain were used to extract coefficients that were subsequently used as inputs to a PNN. Leite (2019) proposed an evolutionary NN (EANN) and an evolving fuzzy granular NN (EGNN) for FD/D of inter-turn short circuits in the stator windings of IMs. A classification accuracy of 96.28% was achieved by the EGNN using product (T_prod) and maximum (S_max) fuzzy neurons. The EANN achieved a classification accuracy of 94.67% using arithmetic crossover, global and local random mutation, and tournament selection. Bayar, Terzi and Ozgonenel (2019) presented a model that simulated the effects of asymmetric external failures that could not easily be implemented in a laboratory or in the field. The obtained data was used as input to a PCA-ANN jointly with the steady state operation data.
The model could identify the type of failure with minor deviations, namely 0.25% for phase-ground failure, 1% for phase-phase failure and 0.8% for phase-phase-ground failure. An FD/D method based on a radial basis function NN (RBFNN) was presented by Raj and Balaji (2020). Data preprocessing was implemented with a Gabor filter and segmentation was done with the DCT-DOST transformation. The proposed method achieved a classification accuracy of 98.3% and a sensitivity of 98%. A method based on the discrete cosine transform (DCT) for analyzing speed and on a DL PNN was presented for the diagnosis of bearing faults (Salih & Loganathan, 2020). The achieved classification accuracy was 95%, higher than the accuracy achieved with the SVM and ANN classifiers.
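The accuracy and sensitivity figures reported throughout this section follow from confusion matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN) and sensitivity = TP / (TP + FN). The short sketch below computes both from label lists. The labels and counts are made up for illustration and do not reproduce any result of the surveyed studies.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN with `positive` as the fault class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of faulty samples that were detected (recall)."""
    return tp / (tp + fn)


# Made-up test set: 50 faulty and 50 healthy motors.
y_true = ["fault"] * 50 + ["healthy"] * 50
y_pred = ["fault"] * 49 + ["healthy"] * 1 + ["healthy"] * 48 + ["fault"] * 2

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="fault")
acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
print(f"accuracy = {acc:.2%}, sensitivity = {sens:.2%}")
```

Reporting sensitivity alongside accuracy matters for PdM because a missed fault (false negative) is usually more costly than a false alarm, and accuracy alone can hide missed faults when healthy samples dominate the dataset.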

FD/D Based on Other ML Techniques (Besides ANN)
Skowron, Wolkiewicz, Orlowska-Kowalska and Kowalski (2019) used the ISCA analysis of stator currents along with the Kohonen self-organizing network for FD/D of the converter-fed IM. The method achieved an accuracy of 93-95% for rotor faults, 74% for stator faults, 95% for no fault, and 72% for mixed faults. Palácios, da Silva, Goedtel and Godoy (2015) investigated FD/D approaches for IMs based on intelligent classifiers, namely Naive Bayes, k-NN, SVM/SMO, MLP, RIPPER and C4.5 Decision Tree. The achieved accuracy for fault classification was larger than 90%. Specifically, for 1 hp motors, the k-NN and MLP FD/D methods showed accuracies of 100% for the classification of stator faults, 99.7% for the classification of broken rotor bars and over 92.5% for the classification of multiple faults. The best classification accuracy for bearing faults, 99.9%, was achieved by the k-NN FD/D method. The k-NN method showed overall accuracy of 92.3% and 93.4% for the classification of stator faults in 2 hp motors. Godoy, da Silva, Lopes, Goedtel, Palácios and Lopes (2015) presented four different pattern recognition approaches (Fuzzy ARTMAP (FAM), SVM, k-NN and MLP) in order to classify broken rotor bar defects in inverter-fed IMs. The severity of each type of fault was evaluated by testing situations of one to four broken bars. The SVM/SMO, k-NN and MLP methods were capable of providing fast and accurate diagnosis and classification (over 95% accuracy rate) for predicting broken rotor bar defects. For multi-classification, considering the set composed of all inverters, the overall accuracy was above 90% for classifying healthy motors. Godoy, Palácios, da Silva, Goedtel and da Silva (2016b) used FAM, SVM/SMO, k-NN and MLP for the FD/D of bearing faults.
MLP and k-NN methods could quickly and accurately identify bearing faults in IMs driven by different models of frequency inverters. Two FD/D methods were presented to identify stator faults of IMs by analyzing current signals in the time domain. The methods were based on a MAS with classifier behavior and a NN. The faults were related to short circuits between turns in the stator coil of 1% to 10%. Palácios, Godoy, Goedtel, da Silva, Morinigo-Sotelo and Duque-Perez (2017) used different types of intelligent classifiers in order to diagnose IM faults: ANN/MLP, k-NN and SVM/SMO. The investigated faults were related to stator short-circuit, broken rotor bars and bearing defects. The MLP and k-NN classifiers showed accuracy above 89%. Various ML techniques for rotor FD/D were evaluated and compared by Martin-Diaz, Morinigo-Sotelo, Duque-Perez and Romero-Troncoso (2018). The fault types included half broken rotor bar and one broken rotor bar. Results showed that Naive Bayes and Bagging were the best classifiers and k-NN, MLP and SVM (RBF) were the worst classifiers. Godoy, da Silva, Goedtel, Palácios, Scalassara, Morinigo-Sotelo and Duque-Perez (2018) compared different ML methods in terms of classification accuracy. The methods included Fuzzy ARTMAP network, SVM, k-NN and MLP. An analogue and a digital filter applied to the current signal provided by the IM were also compared. MLP, k-NN and SVM methods provided fast and accurate diagnosis for broken rotor bar defect classification. Patel and Giri (2018) provided a hybrid feature pool using calculated features from the time, frequency, and time-frequency domains. Forty-six features in total were included in the feature pool and the optimal features were selected from it using PCA and minimum Redundancy Maximum Relevance (mRMR). The representative features were input to the RBF NN, SVM and RF classifiers. 
The results showed that, for the four-class problem, the RF classifier achieved an accuracy of 99.93% against 98.125% for the RBF NN and 99.189% for the SVM model. For the ten-class problem, a classification accuracy of 99.81% was achieved by the RF and SVM classifiers against 99.43% for the RBF NN classifier.

Deep Learning - A Brief Overview
DL advancements enable new approaches in the field of fault diagnosis, due to the ability of deep neural structures to learn invariant features (LeCun et al., 2015; Lee, Jo & Hwang, 2017). In this section we present a brief theoretical background of some recent deep architectures in the field of machine health monitoring such as CNN, RNN and AE architectures. These architectures have brought significant innovation for fault diagnosis systems, either as standalone or as integrated multi-model structures (Shao, McAleer, Yan & Baldi, 2018). It is important to mention that data-driven DL approaches have been shown to achieve state of the art performance in fault diagnosis because of their capability to transform complex information fusion problems into supervised multi-label classification tasks (Hoang & Kang, 2019; Wang et al., 2019).

Convolutional Neural Networks (CNNs)
A typical CNN architecture consists of alternating and stacked convolution and pooling layers followed by fully connected layers. Convolutional layers use trainable filters called kernels that traverse the input data with a continuous or windowed sliding technique. The output of this process is a series of generated feature maps that quantify the stimuli produced by each corresponding filter based on its activations (Sun, Zhao, Yan, Shao & Chen, 2017). Therefore, each filter produces a unique view of the input data through the prism of the filter. The number and the type of filters used in this step are key factors in the performance of the network because different perspectives of translation-invariant features can have a major impact on the feature extraction of the following layers. Each convolution layer in the network provides a level of abstraction in the feature extraction process. Each convolutional layer is followed by a pooling layer. These pooling layers are responsible for grouping neighboring values of the feature maps generated by the convolutional layer into a single value as well as identifying the most important features extracted by each filter regardless of the positional importance of the feature on other filters. Grouping the feature maps leads to dimensionality reduction which is crucial to the continuity of operations on the data in the network. The pooling operation can be described as a desegmentation of the subsets of data generated by the convolutional layer (Xiao, Huang, Qin, Liu, Li, & Liu, 2019). Succeeding the convolutional and pooling layers, a typical CNN architecture involves a flatten layer, which converts the multidimensional features to one dimensional data, followed by fully connected layers identical to the layers of an MLP. Finally, a softmax layer is used as a probabilistic classifier. Figure 4 shows a CNN architecture.
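The layer sequence described above (convolution, pooling, flattening, fully connected layer, softmax) can be illustrated with a minimal sketch in plain Python. It is not taken from any of the surveyed works: the function names, the toy signal and the hand-picked, untrained weights are all assumptions chosen for illustration, and the convolution is computed as a cross-correlation, as is common in DL libraries.

```python
import math

def conv1d(signal, kernel):
    """Slide the kernel over the signal ('valid' mode, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Keep the largest activation in each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tiny_cnn_forward(signal, kernels, dense_weights, pool_size=2):
    # One feature map per kernel: convolution -> ReLU -> pooling.
    feature_maps = [max_pool1d(relu(conv1d(signal, k)), pool_size)
                    for k in kernels]
    # Flatten: concatenate all pooled feature maps into one vector.
    flat = [v for fmap in feature_maps for v in fmap]
    # Fully connected layer (bias omitted) followed by softmax.
    logits = [sum(w * v for w, v in zip(row, flat)) for row in dense_weights]
    return softmax(logits)

# Hypothetical toy input and hand-picked (untrained) weights.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
kernels = [[1.0, -1.0], [0.5, 0.5]]
dense_weights = [[0.2, 0.2, 0.2, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.5, 0.5, 0.5]]
class_probabilities = tiny_cnn_forward(signal, kernels, dense_weights)
```

In this sketch each kernel yields its own feature map, so the two kernels give two different views of the same signal, and pooling halves the length of each map before the flattened vector reaches the classifier. In practice the kernels and dense weights would be learned by backpropagation rather than fixed by hand.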

Recurrent Neural Networks (RNNs)
RNNs are networks that perform the same computation for every element in a set of sequential input data. Even though their typical architecture is simple, consisting of an input layer, a hidden layer and an output layer, they are difficult to train and require considerable fine-tuning. RNNs use recurrent loops where the internal state of the cells is fed as input on the next time step (Shipmon, Gurevitch, Piselli & Edwards, 2017). Consequently, RNNs can handle sequential data by using their structure as a form of short-term memory. In essence, each prediction is affected by the provided input data as well as the history of the output states of the network. Furthermore, it follows from the above that each prediction affects future predictions. Common problems that occur when training RNNs are those of vanishing or exploding gradients. These problems happen when the slope (or gradient) of the loss function, with respect to the weights of the network, becomes too small or grows exponentially during backpropagation. Some methods to tackle this problem include gradient clipping (Goodfellow, Bengio & Courville, 2016), weight regularization using kernel regularizers (Pascanu, Mikolov & Bengio, 2013), and the use of Long Short-Term Memory units (LSTMs). LSTMs are a form of RNNs with a different internal cell architecture that are capable of dealing with the vanishing gradient problem. More specifically, the LSTM has explicit memory cells, called Constant Error Carousels (CECs), that determine whether to overwrite, retrieve or save information in each time step (Zaremba, Sutskever & Vinyals, 2015). LSTM networks perform well in motor fault diagnosis because they are able to identify and evaluate the temporal evolution of a fault, whereas other neural network architectures ignore the sequential nature of the problem. Figure 5 shows an RNN.

Figure 5. The RNN unit consists of one hidden layer and the output. The LSTM unit comprises a more complex architecture in order to determine which data will be used in each time step.
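The recurrence, the vanishing gradient and gradient clipping discussed above can be made concrete with a small sketch. It assumes a scalar RNN with a tanh activation, which is a deliberate simplification; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # previous state is fed back
        states.append(h)
    return states

def state_gradient(states, w_h):
    """d h_T / d h_0 for the scalar RNN: product of w_h * (1 - h_t^2)."""
    grad = 1.0
    for h in states:
        grad *= w_h * (1.0 - h * h)
    return grad

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

# A single impulse at t = 0 still influences later states through w_h.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)

# With |w_h| < 1 the gradient through 50 time steps shrinks geometrically.
long_states = rnn_forward([0.1] * 50, w_x=1.0, w_h=0.5, b=0.0)
vanishing = state_gradient(long_states, w_h=0.5)

# Clipping bounds the update size when gradients explode instead.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```

Because each factor in `state_gradient` is at most `w_h` in magnitude, the product over 50 steps is bounded by 0.5 to the power 50, which shows why long-range dependencies are hard to learn with a plain RNN. Clipping only addresses the exploding case; the vanishing case is what motivates gated units such as the LSTM.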

Auto-Encoders (AEs)
The AE is a particular type of neural network that compresses the input data into a low-dimensional code and attempts to reconstruct the data from this latent-space representation. Specifically, AE training aims to approximate the identity function, so that the output X̂ is similar to the input X. Traditionally, the AE architecture consists of three core parts: the encoder, the decoder and the code. The encoder and the decoder are fully-connected feedforward neural networks, similar to MLPs. The input data passes through the encoder to produce the code, which is a low-dimensional nonlinear representation of the data. The decoder utilises the code to reproduce the input data as output. The training procedure minimises the reconstruction error by learning to represent the most salient features of the training data in the code (Goodfellow et al., 2016). It is important to mention that hyperparameters such as the number of layers of the encoder and decoder along with the code size significantly affect the performance of the AE. Figure 6 shows an AE. The benefit of using AE in machine health monitoring is twofold. First, the vibration signal data size is often too big to process effectively, therefore a meaningful compression that retains the most important information may provide considerable performance improvements in the following steps of fault diagnosis, such as in a CNN (Shao et al., 2018). Second, recent studies suggest using the decoder structure of the AE as a feature extraction operation in order to ensure only the meaningful information is kept, which increases the quality of monitoring data. This variant of AE is called denoising AE (DA).
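The encoder, code and decoder, together with training by minimising the reconstruction error, can be sketched as follows. The sketch makes strong simplifying assumptions: the encoder and decoder are single linear maps rather than the nonlinear multi-layer networks described above, the code is one-dimensional, and the two-dimensional toy data, learning rate and step count are hypothetical values chosen only so that the example runs quickly.

```python
def encode(x, w):
    """Encoder: project the input onto a one-dimensional code."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(code, v):
    """Decoder: reconstruct the input from the code."""
    return [code * vi for vi in v]

def reconstruction_loss(data, w, v):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w), v)
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, w, v, lr=0.05, steps=500):
    """Plain gradient descent on the reconstruction error."""
    n = len(data)
    w, v = list(w), list(v)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_v = [0.0] * len(v)
        for x in data:
            code = encode(x, w)
            err = [xh - xi for xh, xi in zip(decode(code, v), x)]
            # Error propagated back through the decoder to the code.
            back = sum(e * vi for e, vi in zip(err, v))
            for k in range(len(v)):
                grad_v[k] += 2.0 * err[k] * code / n
            for j in range(len(w)):
                grad_w[j] += 2.0 * back * x[j] / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v

# Hypothetical 2-D samples lying on a line, so a 1-D code can represent them.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w0, v0 = [0.1, 0.1], [0.1, 0.1]
loss_before = reconstruction_loss(data, w0, v0)
w, v = train_autoencoder(data, w0, v0)
loss_after = reconstruction_loss(data, w, v)
```

The toy data have one underlying degree of freedom, so a code of size one is sufficient and the reconstruction error can be driven close to zero. With real vibration data the code size becomes a hyperparameter that trades compression against reconstruction quality, as noted above.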

Deep Learning Based FD/D Methods of IMs
Supervised learning methods have been classified as shallow-based and deep-based ones (Afrasiabi, Afrasiabi, Parang & Mohammadi, 2019). Shallow based methods do not have the potential to extract features from the raw data and therefore are combined with manual feature extraction methods such as Fast Fourier Transform (FFT) and wavelet transformation. DL methods have been introduced lately in order to increase the reliability and the efficiency of the FD/D process, focusing on learning the high level features directly from raw data. DL aims to eliminate the shortcomings of the traditional manual feature extraction methods by using nonlinear transformations through many NN layers to identify the relations of the features to the raw data and learn the features of the underlying model. DL learns to exhibit the desired system behavior using long causal chains of computational layers, where each layer transforms, either linearly or non-linearly, the activation of the network (Schmidhuber, 2015). In industrial IoT, data-driven machine FD/D of production systems can benefit from the computing power of the DL based methods to implement the analytics for the big data coming from multiple sensors that monitor the machine health condition. Shallow based FD/D methods are processed sequentially in stages. A range of available techniques can be used to implement the preprocessing of the current signals including FFT, wavelet transformation and time discretization. Manual feature extraction and selection follows in order to provide the training and test data sets to be used for learning the FD/D of IMs. In contrast, DL methods circumvent the drawbacks of the conventional FD/D methods by training the classifiers directly from raw data. As a result, the manual process of feature extraction and selection is not part of the methodology, avoiding human introduced errors. 
Furthermore, big sensory data collected for PdM of IMs in industrial IoT environments have to be processed in real time and provide instant FD/D, rendering the use of DL approaches necessary to maximize the system performance. The critical characteristic of the DL FD/D methods is the development of an end-to-end learning system, which learns the intrinsic system features directly from raw data. Traditional and DL based FD/D methods are shown in Figure 7.
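For contrast with the end-to-end approach, the manual feature extraction stage of a shallow FD/D pipeline can be sketched as below. The choice of features (RMS, kurtosis and the dominant spectral component) is an assumption made for illustration and is not the feature set of any specific surveyed work; the 50 Hz test signal and the 1 kHz sampling rate are likewise hypothetical. A direct DFT is used instead of an FFT to keep the sketch dependency-free.

```python
import cmath
import math

def rms(x):
    """Root mean square value of the signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def kurtosis(x):
    """Fourth standardized moment (non-excess kurtosis)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return sum((v - mean) ** 4 for v in x) / (n * var ** 2)

def dft_magnitudes(x):
    """One-sided magnitude spectrum via a direct O(n^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

def dominant_frequency(x, fs):
    """Frequency in Hz of the largest spectral component, ignoring DC."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * fs / len(x)

def extract_features(x, fs):
    """Hand-crafted feature vector passed to a shallow classifier."""
    return {"rms": rms(x),
            "kurtosis": kurtosis(x),
            "dominant_hz": dominant_frequency(x, fs)}

# Hypothetical current signal: a 50 Hz sine sampled at 1 kHz for 0.2 s.
fs = 1000.0
signal = [math.sin(2.0 * math.pi * 50.0 * t / fs) for t in range(200)]
features = extract_features(signal, fs)
```

Each of these features encodes a design decision made by the analyst, which is exactly the step that DL based methods replace with features learned from the raw signal.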

FD/D Based on CNN
Ince, Kiranyaz, Eren, Askar and Gabbouj (2016) proposed an FD/D method for IMs based on 1D CNNs achieving the fusion of the feature extraction and classification phases of the motor faults. The method was applied to the raw data with a fault detection accuracy higher than 97%. A CNN based FD/D method for IMs was proposed by Wang, Zhao, Wu, Xie and Zhang (2017b). The vibration signals from various IMs were preprocessed using short time Fourier transform (STFT) to obtain the corresponding time-frequency map. A CNN was then used to extract the features of the time-frequency map. A test set accuracy of up to 100% was achieved. Kao, Wang, Lai and Perng (2018) presented two FD/D methods. The first method used a wavelet packet transform (WPT) and the second one used a deep 1D CNN with a softmax layer. Results showed a classification accuracy equal to 98.8% achieved with the 1D CNN and 98.1% for the WPT. Chattopadhyay, Saha, Delpha and Sil (2018) investigated the benefits of DL and presented FD/D results for motor faults using novel semi 2D CNNs. The experimental results showed improved performance by 3-10% compared to conventional feature engineering based methods. The semi 2D CNN was also found to be computationally superior to the 2D CNN. An online CNN based FD/D method was presented for fault identification (Pandarakone, Masuko, Mizuno & Nakamura, 2018). The FFT was implemented using the stator load current, followed by the feature extraction of certain frequency components used subsequently to train the CNN. Shao, Jiang, Zhang & Liang (2017) presented an FD/D method for mechanical fault diagnosis using deep transfer learning. Initially the raw data were converted to images by using a wavelet transformation to obtain time-frequency distributions. Then, a pre-trained network was used to extract lower level features. The time-frequency images were used for better adjustment of higher level CNN layers. 
The method showed state of the art results on each dataset, with test accuracies for some datasets close to 100%, and in the gearbox dataset the accuracy improved from 94.8% to 99.64%. Lal-Senanayaka, Van Khang & Robbersmyr (2018) presented a DL based FD/D method to identify faults in electric powertrains. A CNN was used to identify both single and multiple faults. Hoang and Kang (2019) presented an FD/D method that used raw current signals from multiple phases as direct input to a CNN. Feature sets obtained from each phase were classified separately by the CNN. An information fusion technique was introduced to fuse information from all of the developed CNNs. An FD/D method to identify normal condition, rotor and bearing faults in IMs based on CNNs was presented by Lee, Pack and Lee (2019). Vibration signal data was used as input to the CNN. The achieved classification accuracy was 98%, 98%, and 100% for the IMs under normal, rotor fault, and bearing fault conditions respectively. A CNN based method to diagnose faults in IMs taking into account the motor speed was also proposed (Han, Choi, Hong & Kim, 2019). The vibration signal was used as input to a CNN. Afrasiabi et al. (2019) presented a deep NN based FD/D method for the identification of bearing faults in IMs. A method that accelerated and compressed the CNN was developed. The proposed method achieved high accuracy compared with conventional CNN and SVM, ANN and learning vector quantization (LVQ). The achieved classification accuracy was 96.13% for ball faults, 96.8% for inner race faults and 97.2% for outer race faults. A framework for FD/D of IMs based on transfer learning was proposed by Xiao et al. (2019). A modified TrAdaBoost algorithm and CNNs were used on the small amount of target data. 
Different IMs and fault types were tested and the proposed method achieved higher accuracy compared to other classifiers including traditional 1D CNN, k-NN, SVM, multilayer perceptron (MLP), transfer component analysis (TCA) and original TrAdaBoost. The authors claimed that the method provides superior efficiency for different sizes of training and test data sets. They argued that the assumption that training and test data sets show the same distribution is not valid in a real industrial environment with continuously changing operating conditions.
An FD/D method for bearing and rotor fault detection in squirrel cage IM (SCIM) was proposed using an adaptive gradient optimizer based on deep CNN (Kumar & Hati, 2020). The method achieved an average accuracy of 99.70%. An FD/D method based on MCSA and novel 2D CNN avoiding manual feature extraction was proposed (Azamfar, Singh, Bravo-Imaz & Lee, 2020). Skowron (2020) investigated the possibility of using deep NNs to detect stator and rotor faults. CNNs were used to identify stator shorted turns and broken rotor bars using an axial flux signal. An FD/D approach based on a CNN model with small kernel size, an adaptive gradient optimizer and batch normalization was proposed by Kumar and Hati (2021). The CNN model with higher number of computational layers achieved reasonable classification accuracy for different health states of SCIM. The efficiency for the Case Western Reserve University (CWRU) dataset was higher than 99.50%. An approach for fault diagnosis that is applied directly on raw vibration signals avoiding conventional feature engineering was presented, based on dilated CNN (D-CNN) (Khan, Kim & Choo, 2018). A method based on CNN for diagnosis of bearing faults on embedded devices using acoustic emission signals was proposed by Pham, Kim and Kim (2020). The method achieved classification accuracy up to 99.58% with less computation overhead compared to other DL based FD/D methods. A PdM model using CNN (PdM-CNN), was proposed for fault classification of rotating equipment and guidance on timing of maintenance actions (Souza, Nascimento, Miranda, Silva & Lepikson, 2020). Data was collected from a vibration sensor mounted on the motor-drive end bearing. The achieved classification accuracies were equal to 99.58% and 97.3%, when the method was applied to the MaFaulDa and CWRU publicly available databases respectively. 
A non-invasive thermal image-based method for bearing FD/D in rotating machines was proposed using both ANN and CNN (Choudhary, Mian & Fatima, 2021). Comparison results between the shallow and deep learning approaches showed that the CNN based on the LeNet-5 structure achieved better classification accuracy than the ANN, equal to 99.80%. A 1D CNN based intelligent FD/D method was proposed for identification of rolling element bearing faults (Shashank, Prasad, Srinivasa, Adarsh & Raj, 2021). In contrast to conventional CNN architectures, the proposed model did not use a fully connected layer. The method achieved classification accuracy of 99.47%. Chen, Liu, Yang, Wu and Ye (2021) proposed a 1D CNN to improve the accuracy of the diagnosis of rolling bearing faults. The number of convolution kernels decreased with the reduction of the convolution kernel size. The method introduced the dropout operation, improving the generalization ability. The experimental results showed an average classification accuracy of 99.2% under a single load and 98.83% under different loads. A CNN based method, FaultNet, for FD/D of bearing faults was presented by Magar, Ghule, Li, Zhao and Farimani (2021). Classification accuracies of approximately 99% were achieved on the CWRU and Paderborn University datasets. Xiao, Huang, Zhang, Shi, Liu and Li (2018) presented a method based on LSTM to identify faults of the three-phase asynchronous motor. The performance of the proposed method was compared to the performance of other classification methods such as LR, SVM, MLP, and basic RNN. The achieved accuracy was 98.28%. An FD/D method based on LSTM and using the rotating sound of the motor in the no-load test was presented by Nakamura, Asano, Usuda and Mizuno (2021).
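Several of the CNN based methods surveyed above first convert the raw signal into a time-frequency map, for example with the STFT, before feeding it to a 2D CNN. The sketch below shows one way such a map could be computed. The Hann window, the window length, the hop size and the two-tone test signal are assumptions made for illustration and do not reproduce the settings of any surveyed work.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def stft_magnitude(x, win_len, hop):
    """Magnitude STFT returned as frames[time_index][frequency_bin]."""
    window = hann(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        # Window one segment of the signal.
        seg = [x[start + i] * window[i] for i in range(win_len)]
        # Direct DFT of the windowed segment (one-sided spectrum).
        spectrum = [abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / win_len)
                            for t in range(win_len)))
                    for k in range(win_len // 2 + 1)]
        frames.append(spectrum)
    return frames

def peak_bin(spectrum):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])

# Hypothetical signal whose frequency content changes halfway through:
# 4 cycles per 32 samples at first, then 8 cycles per 32 samples.
signal = ([math.sin(2.0 * math.pi * 4.0 * t / 32.0) for t in range(64)] +
          [math.sin(2.0 * math.pi * 8.0 * t / 32.0) for t in range(64, 128)])
time_frequency_map = stft_magnitude(signal, win_len=32, hop=32)
```

Each row of `time_frequency_map` is a short-time spectrum, so the change in frequency content appears as a shift of the peak from bin 4 in the early frames to bin 8 in the late frames. Treated as an image, this map is the kind of input that a 2D CNN can classify.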

FD/D Based on AE
Sun, Shao, Zhao, Yan, Zhang and Chen (2016) presented an NN based on AE for FD/D of IM faults. Mao, He, Li and Yan (2016) proposed a variant of the AE to classify bearing faults. Jia, Lei, Guo, Lin and Xing (2018) used a local connection network on a normalized SAE, called NSAE-LCN, to overcome disadvantages of traditional AEs for mechanical signal feature extraction. Sun, Yan and Wen (2018) presented an intelligent FD/D method for rotating machinery focused on bearing faults, based on compressed sensing and DL. Compressed sensing based on the random Gaussian matrix captured the information hidden in the compressed data and a deep NN based on stacked sparse AEs was used to implement the fault classification. Mao, Feng, Liu, Zhang and Liang (2021) proposed fusion of discriminant information and structural information among different fault conditions in a deep AE model in order to improve the representative ability of features. Chen and Li (2017) proposed data fusion from multiple sensors using an SAE and a DBN for bearing fault identification and diagnosis. A semi-supervised DL algorithm, called semi-supervised deep learning network, aimed at industrial environments, was adapted for gearbox fault classification in an FD/D scheme (Razavi-Far, Hallaji, Farajzadeh-Zanjani, Saif, Hedayati-Kia, Henao et al., 2018). The developed FD/D framework included an information fusion and a decision making part and could identify multiple faults and simultaneous faults. The proposed method was found superior compared to other methods. Torabi, Sundaram and Toliyat (2017) presented a method to identify inverter faults in less than one millisecond without using additional hardware by using Discriminant Analysis and SVM. An adaptive self-recurrent wavelet NN was used as a nonlinear system identifier to estimate a nonlinear model to generate appropriate fault symptoms based on the gate signals and actual motor currents. A fault diagnosis model was presented by Liu et al. (2018) for a three-phase asynchronous motor based on SAE NN. An SVM was used for classification and the achieved accuracy was 100%. Shao et al. (2018) proposed a bearing fault diagnosis method based on convolutional deep belief network (CDBN). Vibration signal data was compressed initially using an AE. Then, feature learning was implemented using the developed CDBN. Gaussian visible units were used to design the convolutional RBM (CRBM) and the exponential moving average method was employed to improve the overall performance. Jalayer, Orsenigo and Vercellis (2020) performed a sensitivity analysis on the input channels to evaluate the efficiency of the proposed multi-domain feature set in different DL architectures. The convolutional LSTM (CLSTM) showed superior performance. A review of the state of the art models was presented, and the performance of the proposed method was compared with the performance of twelve algorithms. Results showed that the proposed model achieved 100% accuracy with shorter inputs compared to the other models.

FD/D Based on Hybrid DL Techniques
A multi-level information fusion method for FD/D of IMs has been introduced, termed Multi-Resolution and Multi-Sensor Fusion Network, that includes multimodal feature learning, LSTM for temporal coding of signals from multiple sources and multi-resolution information extraction. Raw current and vibration signals were measured online and joint representation and temporal encoding of raw data streams was implemented using both CNN and LSTM. Plakias and Boutalis (2020) presented the Attentive Dense Convolutional Neural Network (ADCNN), using Dense Convolutional blocks with an attention mechanism. Based on simulation results, the proposed method achieved accurate results with fewer unknown learning parameters, showing an accuracy of 99.51%. This work was enhanced further and two later works (Karnavas, Plakias & Chasiotis, 2021a; Karnavas, Plakias & Chasiotis, 2021b) presented results with an accuracy of 99.57% and 99.6% respectively. An FD/D method based on a Convolutional LSTM RNN (CRNN) was proposed in order to diagnose bearing faults in short time (Khorram, Khalooei & Rezghi, 2020). The proposed approach achieved the highest classification accuracy in the literature, equal to 97.13% for the IMS and 99.77% for the CWRU bearing datasets respectively. Viola, Chen and Wang (2020) proposed the FaultFace method for detection of different types of failure on Ball-Bearing joints, based on a deep convolutional generative adversarial network (DCGAN) to obtain a balanced dataset and CNNs for fault detection. The method used a 2D representation of a signal called FacePortrait, using time-frequency representations. A novel FD/D method based on dual-path RNN with a wide first kernel and deep CNN pathway (RNN-WDCNN) that used raw temporal signals such as vibration signals as input was proposed in order to diagnose rolling element bearing faults from electromechanical drive systems (Shenfield & Howarth, 2020). 
The method could classify input sequences faster than conventional FFT based methods. A fault diagnosis method based on a combination of hierarchical symbolic analysis (HSA) and particle swarm optimization (PSO) with a convolutional neural network (PSO-CNN) named HPC model was proposed by Saravanakumar, Krishnaraj, Venkatraman, Sivakumar, Prasanna & Shankar (2020). The proposed model achieved a maximum classification accuracy of 98.97% and 99.09% under two different datasets. An FD/D hybrid DL based model using CNN and gcForest for bearing faults identification was proposed by Xu, Li, Wang, Li, Sarkodie-Gyan and Feng (2020). The raw bearing vibration signals were converted into time frequency images using the CWT.
Results showed that the proposed model achieved high classification accuracy for the bearing faults, with a fault detection rate of more than 98% for datasets of different sizes. A method based on thermal images was proposed for FD/D of IMs (Khanjani & Ezoji, 2020). Initially, thermograms with SIFT-based key-points matching were used to identify the region of interest. The images were transformed into feature vectors based on a pre-trained CNN. K-means was then used to cluster the training vector samples into cold and hot clusters. An SVM-based classifier was trained for each cluster. The method achieved 100% fault classification accuracy. Zhang, Cong, Yuan, Zhang and Bai (2021) proposed an approach for early FD/D of rolling bearing faults based on multiscale CNN and gated recurrent unit network with attention mechanism (MCNN-AGRU). The training data consisted of normal data. A wise local response CNN based Naïve Bayes algorithm (WCNN-NB) was proposed for multiple fault diagnosis in rotating machines (Aljemely, Xuan, Xu, Jawad, & Al-Azzawi, 2021). The results showed classification accuracies of 99.68%, 92.5% and 97.5% for three data sets with tolerable misclassification rates under the investigated operational conditions. A deep metric learning method based on Yu norm was proposed for FD/D of rolling bearing faults that could measure the similarities and differences between data samples and reduced intraclass scatter and interclass similarity (Xu, Li, Lin, Wang & Peng, 2021). A back propagation NN was used to integrate feature extraction and classification in the proposed FD/D scheme. The results showed that the presented method achieved superior performance to the deep metric learning models based on Euclidean distance, DBN and SVM. The achieved classification accuracy was 95.72%. An improved AOctConv (Attention Octave Convolution) structure was presented for FD/D and applied to the ResNet50 backbone network (AOC-ResNet50) (Xiao, Liu, Zhang & Zhang, 2021). 
The results showed classification accuracy up to 98.0% using the AOC-ResNet50 network, higher than the accuracy achieved using the ResNet50 and Oct-ResNet50 networks. An improved DBN was proposed for FD/D of rolling bearing faults (Hu, Tang, Wu & Liu, 2021). The forward training part employed an RBM to learn the hidden features of vibration signal data. The results showed that the presented method achieved a classification accuracy of 92.15%, higher than standard DBN and CNN, while avoiding manual feature extraction. An FD/D method for bearing fault identification in IMs was proposed using the image classification transformer (ICT) adapted to work as an image classifier trained in a supervised manner (Alexakos, Karnavas, Drakaki & Tziafettas, 2021). The short time Fourier transform (STFT) was proposed for pre-processing in order to acquire time-frequency representation vibration images from raw data in variable healthy and faulty conditions. The ICT, inspired by the transformers used for natural language processing, was also proposed as an alternative approach to CNNs. The proposed method achieved a classification accuracy of 98.3%.

FD/D Based on Other DL Techniques
Deep NNs were used to identify five classes of gearbox faults applied to three common monitoring signals (Heydarzadeh, Kia, Nourani, Henao & Capolino, 2016). The initial features used as inputs to the deep NNs were extracted from discrete wavelet transform. The method was validated using vibration, acoustic and torque measurements. The corresponding accuracy for each measurement was 97.31%, 93.24% and 95.31% respectively. PdM based on a MAS that employs intelligent agents trained with ML classifiers has recently emerged as a promising FD/D approach for IMs. Agents in a MAS employing ML classifiers, such as MLP or k-NN, achieve distributed control of the FD/D tasks of various fault types of IMs and provide global system health diagnostics. DL FD/D methods have shown high efficiency and classification accuracy, exceeding 99% in many cases. They have increased the reliability and robustness of traditional ML based FD/D methods, introducing an end-to-end learning system by implementing FD/D directly from raw data, thus avoiding manual feature selection and extraction. The majority of the proposed DL based methods have been tested by using the CWRU dataset; however, other datasets such as the IMS dataset, as well as domain specific datasets and laboratory ones, have also been used. The performance of the developed DL based FD/D methods was shown in terms of one or more of the following performance indicators: classification accuracy, computational cost reduction, reduced number of epochs, robustness to the training data size, robustness to over-fitting, data usage from shifted domains, and data usage obtained under noisy conditions. Based on the surveyed studies, most of the developed methods are CNN based FD/D methods or hybrid methods. Other DL based methods involve RNN, LSTM and AE. 
CNN based FD/D methods consume significant computational resources compared to traditional time and frequency based signal processing methods such as WPT (Pham et al., 2020). Therefore, in order to circumvent the disadvantages of CNN architectures when they are applied with limited resources, signal processing and pruning approaches can be applied (Pham et al., 2020). Pre-processing followed by DL methods has been proposed more recently (such as Pham et al., 2020; Aljemely et al., 2021; Alexakos et al., 2021) in order to increase the classification accuracy and improve overall FD/D performance. Differences between laboratory datasets and real industrial ones have been highlighted in research works as a topic for future research, as well as the need for more available industrial datasets.

Discussion and Managerial Implications
Based on the survey results, future research should focus on increasing the performance of DL based FD/D methods of IMs mainly in terms of computational resource consumption. Accordingly, future research should involve the development of pre-processing and pruning approaches in order to circumvent the limitations of CNN architectures. Future research should also focus on the development of FD/D methods for multiple IM faults. Moreover, focus should be placed on producing more public datasets, including industrial as well as laboratory ones. Furthermore, future research should focus on the integration of DL based methods in a MAS based PdM system. In a CPS environment a MAS based FD/D decision support system empowered by DL methods could assist in the development of self-aware and self-maintenance machines (Drakaki et al., 2021). In real production environments under Industry 4.0, big sensory data are transmitted over the network and transferred to the cloud where DL can assist in big data analytics and provide feedback to the agents in a MAS to implement online monitoring and FD/D tasks. Additionally, for each fault type, the merits and limitations of the developed ML and DL based FD/D methods could be thoroughly investigated in order to better guide future research.

Conclusions
The paper has presented a state of the art survey of the ML and DL based PdM methods that have been developed since 2015. DL based methods have dominated the research field in recent years, mostly enabled by advances in data-driven and smart manufacturing. MAS based methods employing ML classifiers have also been developed. MAS technology is an enabling technology for the implementation of smart manufacturing and CPS, and therefore of PdM, in an Industry 4.0 environment. Moreover, DL has shown the potential to mine information from big sensory data. DL methods have mostly been developed using the CNN architecture and its variants and, to a lesser degree, using RNN and its variants. Hybrid DL based methods have been introduced, including methods combining ML and DL, as well as methods combining signal pre-processing with DL methods. The need for domain independent learning methods, as well as for more available public datasets including industrial ones, has been highlighted in several research studies. The advantages of DL based methods, including the end-to-end learning process with automatic feature engineering, have been demonstrated through comparisons with traditional ML methods, which show a corresponding performance improvement for the DL methods. However, computational complexity and increased resource consumption, as well as the limited availability of suitable datasets, remain challenges in the PdM/FD/D research field. A more detailed analysis of the pros and cons of each PdM method for FD/D of different types of IM faults, as well as the investigation of the applicability of the methods for industrial applications, could be explored in future research. MAS based PdM leveraging the advantages of DL methods could also be explored for its potential for industrial applications.

Article's contents are provided under a Creative Commons Attribution-NonCommercial 4.0 International License. Readers are allowed to copy, distribute and communicate the article's contents, provided the author's and Journal of Industrial Engineering and Management's names are included. The contents must not be used for commercial purposes. To see the complete license contents, please visit https://creativecommons.org/licenses/by-nc/4.0/.