A Machine Learning Approach to Enable Bulk Orders of Critical Spare‐Parts in the Shipping Industry

Purpose: The main purpose of this paper is to propose a methodological approach and a decision support tool, based on prescriptive analytics, to enable bulk ordering of spare parts for shipping companies operating fleets of vessels. The developed tool utilises Machine Learning (ML) and operations research algorithms to forecast the bulk spare parts orders needed to cover planned maintenance requirements on an annual basis and to optimize the company’s purchasing decisions.
Design/methodology/approach: The proposed approach consists of three discrete methodological steps, each one supported by a decision support tool based on clustering and Machine Learning (ML) algorithms. In the first step, clustering is applied in order to identify high-interest items. Next, a forecasting tool is developed to estimate the expected needs of the fleet and to test whether the needed quantity is influenced by the source of purchase. Finally, the selected items are cost-effectively allocated to a group of vendors. The performance of the tool is assessed by running a simulation of a bulk order process on a mixed fleet totaling 75 vessels.
Findings: The overall findings and approach are quite promising. Indicatively, shifting the demand planning focus to critical spares, via clustering, can reduce administrative workload. Furthermore, the proposed forecasting approach results in a Mean Absolute Percentage Error (MAPE) of 10% for specific components, with a potential for further reduction as data availability increases. Finally, the cost optimizer can prescribe spare part acquisition scenarios that yield a 9% overall cost reduction over the span of two years.
Originality/value: By adopting the proposed approach, shipping companies have the potential to produce meaningful results ranging from soft benefits, such as the rationalization of the workload of the purchasing department and its third party collaborators to hard, quantitative benefits, such as reducing the cost of the bulk ordering process, directly affecting a company’s bottom line.


Introduction
The maintenance of the machinery onboard a vessel is a critical task, since any engine failure results in delays and downtime in the voyage of a vessel, which translates into additional cost and penalties (Kian, Bektaş & Ouelhadj, 2019). Therefore, ship management companies establish foolproof and robust planned maintenance frameworks and systems, give high priority to the planned maintenance of their vessels and undertake cross-departmental projects to ensure the timely delivery of high-quality spare parts at as low a cost to the ownership as possible. Depending on the age of the vessel and the type of machinery, those needs may vary, but the overhaul needs in parts and services, when accumulated for the whole fleet, may amount to a considerable expense for the company. At the same time, costs of maintenance and administration increase in a diminishing proportion as the size of a ship increases (Lun, Lai & Cheng, 2010), which makes the fleet mix an important parameter and further complicates the design, planning and execution of the maintenance strategy.
In the research presented in this paper, the authors focus on the planned maintenance needs of a fleet of ships and more specifically on sixteen (16) components commonly found when addressing the technical needs of the vessels, e.g. connecting rods, pumps etc. The process of simultaneously assessing spare part needs, and consequent purchasing, for a fleet of ships is commonly known as a bulk ordering process. Our research simulated the process of bulk ordering and the subsequent planned maintenance needs of a ship management company operating approximately 75 vessels. The bulk ordering process is far from a trivial task, since the number of distinct items ordered each year is considerably high, amounting to several thousand different spare parts, and the delivery locations are not constant and are subject to the vessels' movements. Furthermore, the suppliers that can provide the necessary parts in the necessary volumes for overhauling processes are few and are mostly concentrated in two geographic regions, i.e. Europe and Asia.
In that context, the main challenges that shipping companies encounter are uncertainty, volume and administrative workload. Uncertainty is inherent in the process, since the demand for ship spares is erratic in nature and can arise at any time (Jiang, Kong & Liu, 2011). Ordering in high volumes is also essential for the ordering process to achieve economies of scale. In that sense, bulk orders would ideally cover more than thirty (30), and sometimes up to ninety (90), vessels with a span of several thousand unique items every year, making it very time-consuming to negotiate with the suppliers involved and conclude the selection process. Finally, administrative workload is undoubtedly high due to the large number of interconnected parties and stakeholders participating in the process, which makes it very unwieldy and slow-moving. It is essential that the administrative costs do not increase unreasonably as a proportion of the value of the purchased items (Huiskonen, 2001). Empirical research results indicate that even in cases with very mature planned maintenance systems installed, significant additional FTE (Full Time Equivalent) effort is needed for the smooth completion of the process.
The above challenges have prompted the case company presented in this study to search for ways to address them optimally and unlock further value from the bulk order process. It has to be noted that current practice in the shipping industry is to leverage experience and simple analytics to determine the optimum quantities, timing and allocation of suppliers among the procured items. The absence of a sophisticated and well-structured process drives the total duration of this project to almost nine months (the project is undertaken every year) with the involvement of multiple departments. The case company looked to ML due to its current strong standing and high maturity profile in deploying advanced analytics to increase effectiveness and boost efficiency in supply chain areas such as general consumables forecasting, crew scheduling and strategic network design. This gave rise to the topic of this research paper, which aims to address the aforementioned key challenges by taming a very sizeable and overly complex dataset, providing ways to extract useful information and insights from historical data, facilitating the ability to forecast the needs of the fleet, reducing administrative workload and supporting the decision-making process by generating indicative solutions.
In the following sections, the applicability of ML in dealing with similar business issues will be examined and the design of an integrated tool that aims to tackle challenges throughout the bulk ordering process will be attempted. More specifically, clustering and forecasting of the quantities needed by the vessels will be undertaken, to provide a focused and current view of the critical needs of the vessels. This will be done by integrating exogenous factors, such as ship age, which partly determine demand for spare parts. To further reduce the administrative workload and generate cost-optimal scenarios step by step, the blending of analytics with traditional operations research, i.e. prescriptive analytics, will be examined in an attempt to drive the optimum sourcing decisions on the basis of minimum Total Cost of Ownership (TCO). As a result, a decision-support tool is formulated that can lead to a significant decrease in the administrative workload and in the total time needed to complete the project, and that provides viable scenarios for spare parts acquisition strategies that yield an overall cost saving of 9%.

Literature Review
Despite its importance, spare parts management literature has paid little attention to its integration in supply chain management to optimize ordering policy and reduce costs of spares in the maritime industry (Vukić, Stazić, Pijaca & Peronja, 2021). Few are the authors who identify and highlight the significance of the subject, one of them being Nenni and Schiraldi (2013), who state that spare parts management in the maritime industry is indeed a very important issue due to the complexity and uniqueness of the ship operational environment, where reliability and safety are particularly essential.
Still, according to our review and to the best of our knowledge, only three journal papers have been published in the last five years that are directly relevant to the research presented in this paper. First, the paper by Eruguz, Tan and van Houtum (2018) attempts to minimize the expected total discounted cost of spare part deliveries, part replacements, and inventory holding over an infinite planning horizon. To do so, the authors formulate the problem as a Markov decision process and use numerical experiments to show that the cost savings obtained by the integrated optimization of spare part inventory and part replacement decisions are significant. Finally, they attempt to validate their approach by using real-life data from a collaborating company, i.e. Fugro Marine Services. In the research presented in this paper, we also attempt to provide an integrated approach to forecast and optimize the spare parts quantities in bulk needed to cover planned maintenance requirements on an annual basis. Still, this is done quite differently, by utilizing ML algorithms for clustering and forecasting, while introducing cost optimization only in the third step of the approach. Considering the two approaches side by side, one can note that they are largely complementary. The model proposed by Eruguz et al. (2018) deals with Condition Based Maintenance (CBM), hence the Markov chain, while this paper deals with tactical Planned Maintenance. The CBM model in Eruguz et al. (2018) seems to have difficulties in scaling, while the one proposed in this paper utilises a more condition-agnostic approach and is thus able to scale more efficiently at fleet level and for numerous components. In our opinion, these two approaches could work in tandem, offering a very strong predictive maintenance approach, albeit using telemetry data on P-F curves instead of Markov chains.
Second, we have to note the very recent efforts by Jimenez, Bouhmala and Gausdal (2020). In their paper, the authors develop a predictive maintenance solution based on a computational artificial intelligence model using real-time monitoring data in the shipping industry. In doing so, they analyze a set of historical sensor data, using the statistical programming language R. Their results highlight the potential of using big data analytics for developing a predictive vessel maintenance model. Still, the authors state that there are a number of further issues that have to be addressed prior to designing the algorithms and a solution based on artificial intelligence.
Third, the work of Kian et al. (2019) is considered marginally related to the work presented in this paper, in the sense that it also provides an integrated solution for the challenge of spare parts management for maintenance scheduling in the maritime industry. But this is where the relevance to the research presented in our paper stops, since the authors focus on a specific problem of Condition-Based Monitoring predictive maintenance dealing with a vessel operating on a given route that is defined by a sequence of port visits. When a warning on part failure is received, the model decides when and to which port each part should be ordered, where the latter is also the location at which the maintenance operation would be performed. The authors use a mathematical programming model of the problem and a shortest path dynamic programming formulation for a single part to solve the problem. Furthermore, the validation they use is based on simulation tests of different scenarios and not on actual case data.
Finally, our research produced a small number of references, which attempt to address the issue of spare parts management in the shipping industry from different perspectives. Their approaches are not directly comparable with the one presented in this paper, still they are highlighted in the remainder of this section, for completeness purposes. Azizah and Subiono (2018) focus exclusively on the spare parts of the ship engine and propose a Petri-Net approach representing its component's spare part ordering. Then they elaborate a max-plus algebra model to obtain the date when the spare part should be ordered. Efficiency measures or cost benefits from applying the method are not mentioned in the paper. Hmida, Regan and Lee (2013), propose a multicriteria inventory policy using inventory classification method integrated with a preventive maintenance program. The authors report savings of thirty-four (34) service days as a result of the decrease in the number of downtime days due to pumps failure and 10% reduction in inventory, by the application of their approach. Last, but not least, we have to note the emergence of Additive Manufacturing (AM) in the scientific field of spare parts management for the shipping industry, which promises to provide disruptive solutions, enhanced flexibility and significant economies for the whole process (Kostidi & Nikitakos, 2008).

The Clustering Component
The ordering of spare parts for planned maintenance purposes is a time-consuming project, in part due to the high number of items that comprise a bulk order. In the interest of industry-wide standardization, each item in the spare parts industry can be referred to by a unique number called a maker reference. In each bulk order, there could be over 4,000 distinct product codes, making it an arduous process to compare the items or even to systematically log the prices of each supplier. Therefore, it is important to be able to narrow down the high-interest product codes for each bulk order to facilitate and expedite the process. From an analytics perspective this is a task best tackled by the clustering approach, which is one of the most common unsupervised ML techniques (Hinton & Sejnowski 1999). This way the analysis can be focused only on product codes that have been identified as high-interest, and therefore the volume of administrative workload for the departments will be smaller.
Identifying the input variables of the unsupervised learning algorithm is of great importance, as the relationship between them will determine the product codes upon which forecasting will be attempted. In this study, we identify the following input variables: a) Price: which indicates the acquisition price of the item (also accounting for discounts, if any), b) Quantity: which indicates the number of times the item was bought in the past for planned maintenance purposes, c) Total Volume: as provided by the product of price times quantity. This variable highlights the importance of items that have a medium price but are ordered in considerable quantities, thus making the total volume quite high, d) Number of Unique Vessels: which indicates the number of different vessels that the item is installed on. This variable increases the importance of an item, even if it does not have a considerable volume, price or quantity, if it is installed on many vessels and therefore has an increased influence on uniformity and on possible problems across several vessels and e) Average Age: which indicates the average age of the vessels this item is installed on. The main goal is to determine the product codes that have an abnormally high price, an abnormally high quantity or a combination of both, and/or are installed on several vessels. Therefore, a clustering process is required in order to identify the 'outliers' of the dataset, thus labelling the items that have the characteristics described above. This process is often called anomaly detection (Zimek & Filzmoser 2018). In this paper, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (Ester, Kriegel, Sander & Xu, 1996) is used, as it automatically creates a cluster containing the outliers. DBSCAN is a data clustering algorithm that, given a set of points in some space, groups together points that are closely packed. At the same time, it marks as outliers those points that lie in low-density regions. The items are clustered using the DBSCAN algorithm with minPoints = 10 and the local radius for expanding a cluster set to eps = 1.8. The results are visualized in Figure 1.
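The density-based labelling described above can be illustrated with a minimal pure-Python sketch. The function name `dbscan`, the `NOISE` label and the toy points are illustrative assumptions and are not taken from the study, whose own settings (minPoints = 10, eps = 1.8) apply to the five item-level variables listed above. Note that this sketch labels outliers as -1, whereas the study reports them as Cluster 0.

```python
from math import dist

NOISE = -1  # label for points in low-density regions (outliers)

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels

# Toy data (hypothetical): five tightly packed items and one isolated item.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

In this toy run the five packed points share one cluster label and the isolated point is marked as noise, which is the property the approach relies on to isolate high-interest items.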
Cluster 0, hereafter the "Outlier Cluster", contains the outliers of the analysis. The mean quantity of the Outlier Cluster is considerably higher than that of Cluster 1, which contains the clear majority of the data, hereafter the "Average Cluster". The same can be said for the price of the Outlier Cluster as compared to the price of the Average Cluster. Evidently, the total volume, which is computed as the product of the aforementioned characteristics (price and quantity), is also considerably higher. Finally, the number of unique vessels, which as mentioned before describes the number of distinct vessels that the specific item is installed on, is also considerably higher in the Outlier Cluster. However, the average age of the vessels is virtually the same for the two clusters.
Cluster 2 contains a small fraction of the total items that have a large quantity and are installed on several young vessels. The product codes that will be included in the analysis are those contained in the Outlier Cluster. To further analyse the data, k-means clustering is performed on the previously identified high-interest items. The resulting k-means cluster label is then used as an independent variable in the forecasting analysis of the next section. The product codes previously identified as cost drivers will be used as a basis for the bulk order price collection and winner nomination. Simulation results indicate that approximately 4% of the total items can represent roughly 50% of the total cost. By leveraging this finding, the purchasing department of any ship management company can focus only on the pre-identified items to collect prices, assess the quotations, negotiate with and select the winners, and therefore decrease its administrative workload. Concurrently, the product codes identified as cost drivers will provide the basis for the subsequent steps of this analysis, i.e. the predictive and prescriptive components.
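The concentration of cost in a few items, as in the 4% / 50% finding above, can be checked on any item list with a short calculation. The sketch below is only an illustration: the function name `share_of_items_for_cost` and the toy volumes are assumptions, not figures from the simulation.

```python
def share_of_items_for_cost(volumes, cost_share=0.5):
    """Smallest fraction of items whose summed volume reaches cost_share of the total."""
    total = sum(volumes)
    running = 0.0
    # Walk through the items from the most to the least expensive in total volume.
    for count, volume in enumerate(sorted(volumes, reverse=True), start=1):
        running += volume
        if running >= cost_share * total:
            return count / len(volumes)
    return 1.0

# Hypothetical total volumes (price x quantity) for eight product codes.
volumes = [500, 300, 50, 50, 40, 30, 20, 10]
fraction = share_of_items_for_cost(volumes)  # one item of eight covers half the cost
```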

The Predictive Forecasting Component
The proposed forecasting model aims to calculate the nominal quantity that a vessel needs in the coming year for each maker reference. The model variables are: a) Average Age: The age of the vessel is one of the most important vessel characteristics and, as described in the introduction, it is highly correlated with the maintenance of the vessel and thus with the quantities of the items that will be ordered; b) Average Price: The price of an item is one of the most important demand characteristics and in the sections below its relationship with the final quantity will be examined; c) DWT: This variable is an indicator of the size of the vessel and of its needs; d) Type: A categorical variable which, in combination with DWT, determines the class of the vessel; e) Origin: This variable indicates the country of construction of the vessel. It is a categorical variable of three levels (South Korea, Japan and China) and aims to unveil correlations between the shipyard and the quality of the vessel and f) k-Means cluster: denoting which of the three clusters the item belongs to, as per above.
The available dataset is divided into 16 core components. In the simulation dataset, there is an evident scarcity of data for this application; therefore, one of the main challenges that needs to be addressed is the overfitting of the models. The data are split into a training set and a test set with a random 80-20 partition, since for most of the components there are not enough data points to also create a validation set. This is a commonplace finding in the shipping industry, where data availability and quality remain elusive. In addition, time series analysis, which seems the most direct approach to such a problem, is not applicable in this case due to factors commonly plaguing such algorithms, e.g. multicollinearity, heteroscedasticity and autocorrelation (Hanke and Wichern, 2009), as well as due to the high degree of influence of exogenous factors on demand. Considering the aforementioned limitations, the models below were chosen:
• Random Forest (RF), due to its ability to avoid overfitting (Hastie, Tibshirani & Friedman, 2008) and its superior efficiency (Ho, 1995).
• Generalized Linear Model (GLM), due to its simplicity and its ability to handle error distributions other than normal ones (Nelder & Wedderburn, 1972).
• Principal Component Regression (PCR), due to its ability to be applied when the number of variables is high in relation to the number of available data (Jackson, 1991).
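For each component, the mean absolute percentage error (MAPE) is computed as per the below formula: $\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$, where $A_t$ is the actual quantity, $F_t$ is the forecast quantity and $n$ is the number of test observations. The results are presented in Table 2. For a number of components, forecasting did not take place, as the entries were not enough to properly train and test the algorithms.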
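As a worked illustration of the error metric, the standard MAPE definition can be computed as follows. The function name `mape` and the toy quantities are assumptions made for this sketch; skipping observations with a zero actual value is likewise an assumption, made because the percentage error is undefined there.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent; observations with zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Hypothetical ordered quantities for three vessels versus their forecasts.
actual = [10, 20, 40]
forecast = [12, 18, 30]
error = mape(actual, forecast)  # (20% + 10% + 25%) / 3, roughly 18.3%
```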
As can be seen from Table 2, the forecasting error is considerable in some cases, while in others, where it is below 40%, it qualifies as satisfactory. The satisfactory error level was determined after interviews with practitioners of the shipping industry, specifically in the purchasing department. To better visualize the performance of the algorithms, a snapshot of the forecasting results for the component 'Compressors' is depicted in Figure 3. For the component 'Fuel Injection Valves', the error of the random forest algorithm is 98%. As seen in Figure 4, there are two data points that have an order quantity of over a hundred items, which is significantly higher than the average quantity observed. Therefore, a data cleansing method is used to identify such data points and eliminate them from the training and evaluation sets of the algorithms. In the specific dataset, it is common to come across data points that can be considered as outliers. In a business sense, this can be explained by a superintendent engineer over- or under-estimating actual demand, or by a vessel having abnormally high or low needs for a specific year due to a sequence of unplanned maintenance events. Consequently, DBSCAN is used to determine the outliers and exclude those data points from the analysis, with minPts = 6 and eps = 0.5. Parameter minPts was set to six by multiplying the number of dimensions by two. The dimensions on which DBSCAN is applied are only the numeric ones (age of the vessel, DWT, price of the product code). Parameter eps was chosen by interpreting the k-NN distance graph (Ester et al., 1996).
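The parameter choice described above can be sketched as follows: minPts follows from the number of numeric dimensions, and eps is read off the "knee" of the sorted k-nearest-neighbour distances. The function name `k_distances` and the toy points are illustrative assumptions, and the choice of k used for the curve (commonly minPts or minPts - 1) is not stated in the text.

```python
from math import dist

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

n_dims = 3            # age of the vessel, DWT, price of the product code
min_pts = 2 * n_dims  # rule stated in the text, giving minPts = 6

# Toy one-dimensional data (hypothetical): four close points and one distant point.
toy = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
curve = k_distances(toy, k=2)  # the jump at the end of the curve marks the outlier
```

A candidate eps is then picked just below the sharp rise at the end of the curve, so that the isolated points fall outside every neighbourhood.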
Next, the three models described before are trained and tested on the 'cleaned' dataset. Training and testing the algorithms on a sanitized dataset produces better results, reducing the mean absolute percentage error by around 36%. Still, there are some components for which the outlier handling could not decrease the error to satisfactory levels, i.e. 'Cylinder Heads' and 'Cylinder Liners'. For those components, an analysis of the optimum eps value is presented in Figure 5. According to Ester et al. (1996), when the eps parameter decreases, the number of data points included in the analysis decreases (the number of outliers increases). At the same time the forecasting error decreases as well. As can be seen from the graph, the optimum error (without simultaneous elimination of a considerable number of data points) is at eps=0.8. However, the MAPE after the optimization process is still considerably high, making it very difficult to rely on the applied forecasting methods for these components with this dataset. The lowest mean absolute percentage error for each component across all methods (RF, PCR, GLM) is presented in Table 3, showcasing that the best performing method for the vast majority of the components is the random forest algorithm. This result was expected, as the random forest algorithm handles the exogenous factors that influence the outcome in a stochastic manner that makes it robust to over- and under-fitting (Hastie et al., 2008). Finally, the exceptional case of items that are reordered during the same period is studied. This reordering is due to a miscalculation of the vessel's needs, unplanned maintenance events or failure of previously bought equipment. To determine those additional quantities a forecasting process is used. The forecasts are generated by following the same sequence of actions as for the nominal quantities with one additional dimension, i.e. Bulk Market. This variable is a categorical variable with two levels: maker or parallel, i.e. purchased from the original equipment manufacturer or from a source providing imitation spares, respectively. The variable will be used to explore possible correlation between the source of purchase and the additional quantities. The results are presented in Table 4. The forecasting of the extra quantities is performed only on a subset of the components (12 out of 16), as the rest do not exhibit variations in the data, making the completion of the training very challenging. After outlier handling the number of forecastable components drops even further, from 12 to 8. This particular finding could generate an area for further research in the future.

In conclusion, forecasting the nominal needs of the vessels exhibits satisfactory results (average MAPE 53%) and could, in the future when the training samples increase, become more accurate.
For some specific components that show increased accuracy, e.g. fuel oil system (MAPE = 10%) the tool can be used to expedite the process, while decreasing the workload both for the vessel and for the shore-based engineers. However, the forecasting of the extra needs does not yield such results. The average MAPE is increased compared to the forecasting of the nominal quantities, while the number of components upon which forecasting is applied decreases. In the next section, the forecasting results are used to create the cost optimizer that leads to optimum allocation of items to vendors so as to minimize total cost of the bulk orders.

Cost Optimization Component
The cost optimization prescriptive model ties together the entire bulk order analytics framework and shifts it toward the decision support domain by serving as a guideline on the optimal cost basis of spare parts procurement. The model supports the choice of whether each spare part should be ordered in quantities above the nominal need of the vessel and whether it should be bought from the maker or from the parallel market. The components of the cost function are the following:
• Acquisition cost: it represents the cost of purchase for each item. It depends on the total quantities and on the acquisition price of each item. What needs to be noted here is that the acquisition price differs considerably between the two main categories of suppliers, makers and parallels.
Where safety stock depends on the desired service level (SL). The safety stock will also be added to the acquisition and transportation cost as it is assumed that both the target inventory and the safety stock are bought together, considering that price fluctuations in the spare parts are not high. The safety stock follows the formula below (Ballou, 2003).
Where LT is the lead time, Z_SL is the inverse distribution function of a standard normal distribution evaluated at the cumulative probability of the underlying service level, and demand refers to the historical demand of the relevant item. For both the lead time and the average demand there are more than 30 observations; therefore, by the central limit theorem, it can be said that these variables satisfy the underlying assumptions (i.e. normal distribution) of the above formula.
• Transportation cost: this cost component represents the cost of the transportation of each item on board the vessel. This cost depends on several parameters such as the location of the supplier, the trading route of the vessel, any specific requirements for clearance etc. For the purposes of this analysis it is assumed that the transportation cost depends mainly on the lead time which determines the transportation mode to main logistics hubs, e.g. the Netherlands.
Empirical research indicates that there are two main regions from which ship spare parts can be sourced: Europe and Far East (Japan, Korea and China). Without loss of generality, we assume that around 30% of the spare parts are sourced in Europe wherein we will assume transportation costs to be zero, given the proximity of the vessels and the high frequency that they call European ports. Therefore, according to the above the final formula for the transportation cost is the below:
• Inventory Cost: this cost component represents the costs that are incurred because of the inventory held on the vessel. The inventory cost follows the simple formula below where: SS is the safety stock and WACC is the weighted average cost of capital with WACC ∈ [3%, 8%].
• Stock out cost: this cost component represents the costs that are incurred when an item that should have been on board the vessel is not. For its computation the authors present a novel approach as an inverse function of the Safety Stock SL variable.
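To make the role of the service level in the safety stock concrete, the sketch below assumes the common textbook form that combines demand and lead-time variability, $SS = Z_{SL}\sqrt{\overline{LT}\,\sigma_d^2 + \bar{d}^2\,\sigma_{LT}^2}$; the exact expression used in the study may differ. The function name `safety_stock`, its parameter names and the numbers in the example are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time):
    """Safety stock under demand and lead-time variability (assumed textbook form).

    Demand figures are per period and lead times are expressed in the same periods.
    """
    z = NormalDist().inv_cdf(service_level)  # Z_SL for the desired service level
    return z * sqrt(mean_lead_time * sd_demand**2 + mean_demand**2 * sd_lead_time**2)

# Hypothetical item: demand 5 +/- 3 units per week, lead time 4 +/- 1 weeks.
ss_95 = safety_stock(0.95, mean_demand=5, sd_demand=3, mean_lead_time=4, sd_lead_time=1)
ss_99 = safety_stock(0.99, mean_demand=5, sd_demand=3, mean_lead_time=4, sd_lead_time=1)
```

A higher service level raises Z_SL and therefore the safety stock, which is the trade-off that the cost function has to balance against the stock-out cost.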

Acquisition Cost
The main components of the acquisition cost are the total quantities and the purchase price. To determine the level of the nominal needs and any additional quantities for each product code, the best performing model (the one exhibiting the smallest MAPE) was used (see Table 3). Figure 6 depicts the final quantities (nominal + extra) for 'Fuel Injection Valves' and the differences created by the sourcing parameter (maker/parallel). Next, the safety stock for each item is calculated. The average and the standard deviation of demand are computed regardless of the market, using past data. The lead time assumes on-hand stock availability from makers and a range between 5 and 35 days from the parallel market. Lastly, to make the calculation stochastic in nature rather than deterministic, and to account for the forecast errors of the previous models, the forecast bias is computed for each component and it is determined whether there is an over-forecasting or an under-forecasting bias. If there is an over-forecasting bias, then the safety stock computed is multiplied by the accuracy of the forecast of nominal quantities. The forecast bias is computed using the following formula: This forecast bias is sometimes called the normalized forecast metric. The metric ranges within [-1, +1], where 0 indicates the absence of forecast bias (Singh, 2017). Negative values show a tendency to under-forecast and positive values a tendency to over-forecast. In a business sense, the safety stock is needed to cover needs arising from demand and/or lead time variability, e.g. unplanned maintenance events or manufacturing issues at the supplier. However, if the demand has been forecasted with a method that indicates positive bias, the final quantity that will be purchased will be unnecessarily high. This reasoning explains the final formula of the safety stock. An example of a positive forecast bias for the component 'Cylinder Liners' is shown in Figure 7.
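The bias adjustment described above can be sketched as follows. The form (forecast − actual) / (forecast + actual) is a commonly used definition of the normalized forecast metric that matches the stated [-1, +1] range; whether the study applies it per observation or on totals is not stated, so aggregating on totals is an assumption here, as are the function names and the toy numbers.

```python
def normalized_forecast_bias(forecast, actual):
    """Normalized forecast metric in [-1, +1] for non-negative quantities; 0 means no bias."""
    total_f, total_a = sum(forecast), sum(actual)
    if total_f + total_a == 0:
        return 0.0
    return (total_f - total_a) / (total_f + total_a)

def adjusted_safety_stock(safety_stock, bias, forecast_accuracy):
    """Scale the safety stock by the forecast accuracy only when the forecast over-predicts."""
    return safety_stock * forecast_accuracy if bias > 0 else safety_stock

# Hypothetical component: forecasts slightly above the quantities actually needed.
bias = normalized_forecast_bias(forecast=[12, 11, 9], actual=[10, 10, 10])
stock = adjusted_safety_stock(safety_stock=20, bias=bias, forecast_accuracy=0.8)
```

With a positive bias, part of the over-forecasted quantity already acts as a buffer, so the safety stock is reduced instead of being added in full on top of an inflated forecast.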
As seen in Figure 7, the forecast for some components tends to overestimate the quantities that will ultimately be needed to cover the planned maintenance needs of the vessel. Therefore, the over forecasted quantities can be used as safety stock. This will avoid over-stocking the vessels with unnecessarily high quantities of items that have been forecasted with methods that exhibit high positive forecast bias. To determine the optimum service level of each product code exhaustive enumeration was used, as the problem size is limited and there are no problem-specific heuristics to reduce the set of candidate solutions to a manageable size. Random service levels were used to compute the total cost of the items and the service level having the minimum total cost was identified as the optimum service level and was used in the final step of the prescriptive model. The random service levels were chosen in the range of 95% to 99.9%. As the items ordered in the bulk process are critical for the smooth running of machinery, this range was chosen to address business and technical needs often found in the shipping industry. In Figure 8, one can see a high fluctuation of the total cost (around 20%) as the service level changes, highlighting the need to determine the optimum service level.
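The search for the cost-minimizing service level can be sketched as a simple enumeration over the 95% to 99.9% range. The cost function below is a hypothetical stand-in (stock bought and held, plus an expected stock-out penalty), not the study's cost model, and the sketch enumerates an evenly spaced grid instead of randomly drawn levels; all names and parameter values are assumptions.

```python
from statistics import NormalDist

def total_cost(service_level, unit_price=100.0, sigma_lt=10.0,
               holding_rate=0.05, stockout_penalty=50000.0):
    """Hypothetical cost: safety stock bought and held, plus expected stock-out penalty."""
    safety_stock = NormalDist().inv_cdf(service_level) * sigma_lt
    buy_and_hold = safety_stock * unit_price * (1.0 + holding_rate)
    expected_stockout = (1.0 - service_level) * stockout_penalty
    return buy_and_hold + expected_stockout

def best_service_level(cost_fn, low=0.95, high=0.999, step=0.001):
    """Enumerate candidate service levels on a grid; return (level, cost) at the minimum."""
    n = int(round((high - low) / step))
    candidates = [round(low + i * step, 6) for i in range(n + 1)]
    return min(((sl, cost_fn(sl)) for sl in candidates), key=lambda pair: pair[1])

level, cost = best_service_level(total_cost)
```

Because the candidate set is small, evaluating every level is cheap, which is why enumeration is a reasonable choice over a dedicated optimizer for this step.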
Minimum cost is achieved for different service levels for makers (maker optimum service level = 95.18%) and for non-makers (parallel optimum service level = 96.31%), which is explained by the changes in the underlying quantities which can be observed in Figure 9.
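The service level search can be sketched in Python as follows. This is only an illustration under stated assumptions: the cost function `total_cost` is a hypothetical stand-in (acquisition of nominal plus safety quantities, plus an expected stock-out cost weighted by 1 - SL) and not the cost model of this study, and all names and parameters are invented for the example. Only the search range of 95% to 99.9% and the rule of keeping the candidate with the minimum total cost come from the text.

```python
import random
from statistics import NormalDist


def total_cost(service_level, nominal_qty, unit_price, sd_lead_time_demand, urgent_unit_cost):
    """Hypothetical cost model: acquisition cost plus expected stock-out cost."""
    z = NormalDist().inv_cdf(service_level)
    safety_qty = z * sd_lead_time_demand  # safety stock grows with the service level
    acquisition = (nominal_qty + safety_qty) * unit_price
    # Assumption: the chance of an urgent re-supply is 1 - service level.
    stock_out = (1.0 - service_level) * nominal_qty * urgent_unit_cost
    return acquisition + stock_out


def best_service_level(cost_fn, n_candidates=500, low=0.95, high=0.999, seed=0):
    """Evaluate randomly drawn service levels and return the cheapest one."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
    return min(candidates, key=cost_fn)
```

A call such as `best_service_level(lambda sl: total_cost(sl, 10, 100.0, 3.0, 400.0))` would return the sampled service level with the lowest cost for one item. Running the search separately with maker and parallel-market prices and lead times is one way in which different optima per source could arise.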
To calculate the acquisition cost of each item, the prices of the items in each market were determined using historical data.

Stock-out Cost
This cost component represents the cost of urgently re-supplying the vessel with the item if the existing stock runs out. Stock-out cost is normally higher than the previous costs, since there is no time to receive quotations from several vendors or to conduct extensive price negotiations. When a requisition is made on an urgent basis, lead time is of critical importance, and the transportation cost of the shipment can be higher due to an inconvenient delivery port and/or because there might be no additional orders to achieve economies of scale in airfreight, road freight or launch boat cost. The components of the stock-out cost are analysed below, where a is the percentage of times that, on an urgent basis, the parts are bought from the maker. For the purposes of this simulation we assume a = 0.5. SL is the service level for each item.
where the urgency factor can be defined as -

Results
To evaluate the performance of the proposed model, hereinafter 'Model 1', we include a comparison with the common practice in the maritime industry, hereinafter 'Model 2'. The results are shown in Table 5.
From empirical research, we have concluded that ship management companies do not take stock-out cost into consideration when determining the quantities needed for the fleet, and that the underlying stock-out probability is 25% on average. As can be derived from Table 6, there is an expected difference in total costs between the two alternatives, i.e. Model 1 and Model 2. The cost components that drive this difference are the acquisition and the stock-out costs. The increased acquisition cost can be attributed to two main factors: the quantity increase due to safety stock estimation, and the choice of more expensive spare parts purchased from original manufacturers over imitation spare parts. As noted above, the stock-out cost of Model 2 was calculated assuming that the actual quantities did not account for safety stocks. Figure 11 visualizes the effect of the probability of stock-out on the actual savings from the implementation of Model 1. The implementation of the model is expected to generate savings from the first year of implementation when the underlying probability of stock-out for Model 2 is more than 30%. It should be noted that the increased quantities purchased in the first year can be considered an investment which, keeping all other factors stable, would be paid back in full in the second year of implementation of Model 1, regardless of the probability of stock-out. In the first year of implementation, the purchase of higher quantities to serve as inventory is proposed, increasing the acquisition cost by 27% compared to Model 2. In total, the acquisition cost increases by 13%, accounting for the nominal purchases of the two years and the creation of a safety stock on board the vessels. On the other hand, by implementing Model 1 the stock-out cost decreases by 22%, resulting in total cost savings of around 9%, as shown in Figure 12.
In conclusion, the analysis indicates that the purchase of safety stock will lead to decreased costs from the second year of implementation. The safety stock demands an initial investment of almost one quarter of the total acquisition cost that, according to the model, will decrease the stock out probability, thus decreasing the total costs of the process.

Conclusions
The main objective of this paper is to propose a methodological approach supporting the spare parts bulk ordering process of companies managing and operating a fleet of vessels in the shipping industry. Studying the literature on spare parts management in the shipping industry and its intersection with ML-enhanced forecasting techniques and tools confirmed the authors' initial assumption that the shipping industry lags behind in both understanding and utilizing these techniques and tools in everyday business practice. The literature indicates that spare parts forecasting with the use of ML is still in its infancy for companies in the shipping industry, and that is exactly where the research presented in this paper sets its focus: to demonstrate to shipping industry professionals that ML can be a useful and efficient tool that they can understand, master and apply in their everyday practice.
The proposed approach is based on the development of a comprehensive decision support tool able to facilitate the bulk ordering process and optimize the purchasing decisions. The approach consists of three discrete methodological steps, each one supported by a decision support tool based on clustering and ML algorithms. In the first step, the initial dataset is rationalized using clustering techniques to reduce the base of analysis by identifying the high-interest items. This rationalization directly reduces the workload of the departments involved in the process and creates a targeted, value-adding subset for further analyses. The next step involves the development of a forecasting tool for estimating the expected needs of the fleet regarding the previously identified items and for testing whether the needed quantity is influenced by the source of purchase. Finally, in the third methodological step, a cost-related decision support tool is developed to cost-effectively allocate the selected items to a group of vendors.
The application of our proposed approach produced several interesting findings that can fuel fruitful discussions at the decision-making level of companies in the shipping industry. First, one has to note the value of clustering when applied to the item codes dataset. In the benchmarking dataset chosen in this paper, focusing on the 4% of high-interest items, which contribute 50% of the total process cost, results in a significant reduction of the administrative workload for both the internal departments involved in such a process and the suppliers. The use of the proposed statistical forecasting tool for the nominal needs of the vessels initially produces satisfactory results (average MAPE 53%), which can potentially improve further as more training data become available. Additionally, for specific components that show increased accuracy, e.g. the fuel oil system (MAPE = 10%), the tool can be used to expedite the process while decreasing the workload for both the vessel and the shore-based engineers. However, a further analysis including more independent variables, such as the types of machinery and their nominal running hours, could be performed in the future, as these were out of the scope of this paper.
In addition, the proposed prescriptive model suggests that increased quantities will potentially lead to total cost savings of 9% versus the baseline case depicted in Model 2. This is a result of the high contribution of the stock-out cost to the total cost function. Indeed, the cost of delivering an item on board under expedited/urgent conditions is very high. Even though the previous exercise showed no influence of the source of purchase on the extra quantities, the prescriptive model suggests an increased allocation of items to makers. This, however, mainly stems from the fact that makers have lower delivery lead times, thus driving the safety stock down. In summary, the application of the proposed prescriptive model suggests an initial investment in increased safety stock that would be paid back in full (all other factors constant) after the second year of implementation.
Based on the results described above, the authors conclude that the adoption and application of the proposed methodology by shipping companies managing and operating a large fleet of vessels has the potential to produce meaningful results, ranging from soft benefits, such as the rationalization of the workload of the purchasing department and its third-party collaborators, to hard, quantitative benefits, such as reducing the cost of the bulk ordering process, which directly affects a company's bottom line. Inherent limitations do exist in the proposed approach, such as endogeneity. In purchasing, prices are traditionally 'manipulated' by the choice of the supplier. The proposed approach indirectly assumes that quantity is, among other things, a function of price, which is often not accurate, since the price suffers from shocks relative to the choice of the supplier. The authors, in coordination with the case company, decided not to treat the endogeneity issue, for two reasons. First, the actual business impact of such an intervention was deemed by purchasing executives to be neither practical nor efficient, since year-to-year price surges were spotted in just a few large suppliers. Second, development was at risk of falling into an unnecessary and not cost-effective vicious cycle, performing an iterative process in order to correct the error between the quantity that was forecasted with a certain price and the actual quantity that might have a different price minimum. In other words, a change in the supplier of an item would create a new price, and the combination of price and item as an input to the forecasting tool could generate a different quantity, potentially leading back to a different choice of supplier, service level, etc.
Further research on the subject includes the re-evaluation of the entire algorithmic framework using datasets with larger breadth in terms of independent variables and depth in terms of observations. More specifically, with a sizeable enough critical mass of data, deep learning algorithms could be applied and assessed to see whether they would further improve accuracy and performance indicators, such as the Mean Absolute Percentage Error produced by the statistical forecasting tool. Moreover, the companies adopting the tools proposed in this paper should re-evaluate the P-F curves of the critical items identified by the clustering algorithms, especially if they originate from the parallel market. Retrieving and consolidating this information in a structured format from the engineering crew onboard, in between overhauls with the bulk order spare parts, would subsequently pave the way for a more holistic predictive/condition-based maintenance model. The latter should draw on this improved visibility into spare part reliability when predicting demand or prescribing outcomes for optimum total cost of ownership.