Port Throughput Influence Factors Based on Neighborhood Rough Sets: An Exploratory Study

Purpose: This paper devises an efficient method for analyzing the importance of port throughput influence factors. Design/methodology/approach: Neighborhood rough sets are applied to the factor selection problem. First, the throughput index system is established. Then, an attribute reduction model is built using an improved numerical attribute reduction algorithm based on neighborhood rough sets; the algorithm is optimized for efficiency. Finally, the model is validated empirically using historical data on Guangzhou Port throughput and its influencing factors from 2000 to 2013. Findings: With the model and algorithm, port enterprises can identify the importance of port throughput factors, which can support their decisions. Research limitations: The empirical data are historical data from 2000 to 2013, so the amount of data is small. Practical implications: The results provide support for port business investment, decisions and risk control, and also assist port enterprises and other researchers in throughput forecasting.


Introduction
In recent years, China's national economy has grown rapidly, total import and export trade has kept rising, and port throughput has grown steadily. According to the Highway and Waterway Transportation Industry Statistical Bulletin of the Ministry of Transportation, national port cargo throughput reached 10.776 billion tons in 2012, an increase of 7.3% over the previous year.
From 2006 to the end of 2012, the national average growth rate of total port cargo throughput was about 13%; in 2009, owing to the restructuring of the national economy, growth slowed slightly, to an average increase of about 7%. The number of ports with cargo throughput over one hundred million tons increased from 22 in the previous year to 26.
Various factors can affect port cargo throughput, such as the political, economic, cultural and natural environment. The uncertainties of these factors and the interactions among them affect the growth trend of throughput.
Rough set theory is a powerful mathematical tool for dealing with inconsistent information in decision situations. Rough set methods can be applied as a component of hybrid solutions in machine learning and data mining. They have been found to be particularly useful for rule induction and feature selection (semantics-preserving dimensionality reduction). Rough set-based data analysis methods have been successfully applied in bioinformatics, economics and finance, medicine, multimedia, web and text mining, signal and image processing, software engineering, robotics, and engineering.
Based on the above discussion, this paper proposes an efficient method based on rough sets to analyze the importance of port throughput influence factors. The rest of the paper is organized as follows. In Section 2, we provide some basic material about rough set theory. In Section 3, we describe the Port Throughput Influence Index System (Hu, 2005), provide an attribute reduction algorithm and make some improvements to it. We test the performance of our model and present the empirical results in Section 4. Finally, Section 5 summarizes the conclusions of the paper. The experiment demonstrates that the proposed method can process attribute reduction efficiently. The results can also help management avert risk in a timely and accurate way, and encourage enterprises to adjust their business strategies in a targeted manner.

Deficiencies of the Classical Rough Set
The classical rough set theory (Pawlak, 1998; Pawlak & Skowron, 2007a) proposed by Prof. Pawlak simulates the human learning and reasoning process, and uses production rules to represent the learned knowledge (Jiang & Sui, 2015). It is easy to understand, accept and use, and it is particularly applicable when dealing with incomplete or inaccurate information. As a granular computing model, however, it cannot directly process the numeric data that exist in the real world.
The classical rough set theory is based on equivalence relations (Pawlak & Skowron, 2007b; Pawlak & Skowron, 2007c; Qin, Yang & Pei, 2008). However, an equivalence relation is often hard to satisfy because of its restrictions and limitations (Zhu & Wang, 2003). Classical rough set theory has therefore been extended from equivalence relations to other relations, such as similarity relations, tolerance relations and arbitrary binary relations (Zhu, 2007a; Liu & Zhu, 2008). Notably, classical rough set theory has also been extended to covering-based rough sets (Zhu, 2007b; Kondo, 2006). The port indicator system used in this article consists entirely of numeric data. If we want to use the classical rough set theory to reduce the attributes (Hu, Yu & Xie, 2008), we must discretize the data, which inevitably brings about an irreversible loss of information and directly influences the forecast results of port throughput. Neighborhood rough sets avoid the effect of data discretization on data accuracy (Wang, 2006; Hu et al., 2008; Hu, Yu, & Liu, 2010).
Numerical data sets can be processed directly by neighborhood rough sets without the need for data discretization (Yu, Bai & Yun, 2013; Liu, Huang, Jiang & Zeng, 2014). This article therefore uses reduction methods based on Neighborhood Rough Sets theory.

Reduction Theory Based on Neighborhood Rough Sets
Granulation and approximation are basic problems of rough set theory and granular computing. Pawlak's rough set model is built on equivalence relations over a discrete space, and the granulation of the domain is generated by the partition that an equivalence relation induces on it. In a real-valued space, however, the values of objects are not discrete, and using an equivalence relation results in overfitting to the values of individual attributes (Hu et al., 2008).
Neighborhood Rough Sets theory extends the classical rough set theory. It replaces the processing of symbolic data sets based on equivalence and indiscernibility relations with the processing of hybrid data based on a neighborhood relation defined by distance, so it can deal directly with continuous and hybrid data sets. Consequently, Neighborhood Rough Sets theory can avoid the potentially important loss of information caused by data preprocessing and discretization (Jin, Tung, Han & Wang, 2006; Wang, 2006).
The core concept of the neighborhood rough set model is to replace the equivalence-based approximation of the classical rough set model with a neighborhood-based approximation, which enables it to support both numerical and discrete data types. This section introduces only the necessary concepts of the neighborhood rough set model and its reduct; further details can be found in (Hu et al., 2008; Hu, Yu & Liu, 2010).
Definition 1 (Metric space): Given an N-dimensional real number space Ω and a function Δ: R^N × R^N → R, Δ is a metric on R^N if, for all x1, x2, x3 ∈ R^N, it satisfies: (1) Δ(x1, x2) ≥ 0, with Δ(x1, x2) = 0 if and only if x1 = x2; (2) Δ(x1, x2) = Δ(x2, x1); (3) Δ(x1, x3) ≤ Δ(x1, x2) + Δ(x2, x3). Then we call <Ω, Δ> a metric space.
Definition 2 (Neighborhood Particle): Let U = {x1, x2, …, xn} be a non-empty finite set in the given real number space. For any object xi in U, its neighborhood is defined as δ(xi) = {x | x ∈ U, Δ(x, xi) ≤ δ}, where δ ≥ 0. δ(xi) is the neighborhood information particle generated by xi, called the neighborhood particle of xi for short. In a two-dimensional real number space, the neighborhood based on the 1-norm is a diamond, the neighborhood based on the 2-norm is a circle, and the neighborhood based on the infinity norm is a square, as shown in Figure 1.
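To make Definition 2 concrete, the following minimal Python sketch computes the neighborhood of one point under the 1-norm, the 2-norm and the infinity norm. The function names `distance` and `neighborhood` and the five sample points are hypothetical and chosen only for illustration; they are not taken from the article's data.

```python
import math

def distance(x, y, p=2):
    """Minkowski distance between two points; p may be 1, 2 or math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def neighborhood(points, i, delta, p=2):
    """Indices of the points whose distance to points[i] is at most delta."""
    return [j for j, x in enumerate(points)
            if distance(x, points[i], p) <= delta]

# Hypothetical points in a two-dimensional real space
points = [(0.0, 0.0), (0.3, 0.3), (0.4, 0.4), (0.45, 0.0), (1.0, 1.0)]

n1 = neighborhood(points, 0, delta=0.5, p=1)           # diamond-shaped region
n2 = neighborhood(points, 0, delta=0.5, p=2)           # circular region
ninf = neighborhood(points, 0, delta=0.5, p=math.inf)  # square region
```

For the same radius, the diamond lies inside the circle and the circle inside the square, so in this sketch the three neighborhoods of the first point are nested.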
3. Attribute Reduction of Port Throughput's Influencing Factors

Analysis of Throughput's Influencing Factors
The factors that influence port throughput are very complicated; in general, they can be divided into two categories: macro and micro influencing factors (Zhang, Yan & Xu, 2006).
Macro factors mainly refer to objective regional factors, such as the size of the hinterland
The Port Throughput Influence Index System is shown in Figure 1.

Attribute Reduction
In this article, we use the Forward-Greedy Numerical Attribute Reduction Algorithm based on the neighborhood model. It is a heuristic algorithm built on the dependency function. Its basic idea is as follows: the reduct starts as an empty set; in each round, the importance of every remaining attribute is calculated, and the attribute with the maximum importance is added to the reduct; the process stops when the importance of all remaining attributes is 0, which means that adding any new attribute leaves the value of the dependency function in the decision system unchanged (Kryszkiewicz, 1998).
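The basic forward-greedy idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's MATLAB implementation and not the improved algorithm: it assumes a discrete decision label per sample, the Euclidean distance, and the usual neighborhood dependency (the share of samples whose whole neighborhood carries the same label); the importance of an attribute is taken as the gain in dependency it brings. All function names and the toy data are hypothetical.

```python
import math

def neighbors(data, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i on attrs."""
    return [
        j for j, row in enumerate(data)
        if math.sqrt(sum((row[a] - data[i][a]) ** 2 for a in attrs)) <= delta
    ]

def dependency(data, labels, attrs, delta):
    """Share of samples whose whole neighborhood carries their own label."""
    if not attrs:
        return 0.0
    consistent = sum(
        all(labels[j] == labels[i] for j in neighbors(data, i, attrs, delta))
        for i in range(len(data))
    )
    return consistent / len(data)

def forward_greedy_reduct(data, labels, delta):
    """Start from an empty set and add the most important attribute each round."""
    remaining = list(range(len(data[0])))
    reduct, current = [], 0.0
    while remaining:
        # Dependency reached if each candidate attribute joined the reduct
        deps = {a: dependency(data, labels, reduct + [a], delta)
                for a in remaining}
        best = max(deps, key=deps.get)
        if deps[best] - current <= 0:  # importance of every candidate is 0
            break
        reduct.append(best)
        remaining.remove(best)
        current = deps[best]
    return reduct

# Toy data (made up): attribute 0 separates the classes,
# attribute 1 is noise, attribute 2 is constant.
data = [
    [0.0, 0.2, 0.5], [0.1, 0.9, 0.5], [0.2, 0.4, 0.5],
    [0.8, 0.3, 0.5], [0.9, 0.8, 0.5], [1.0, 0.1, 0.5],
]
labels = [0, 0, 0, 1, 1, 1]
reduct = forward_greedy_reduct(data, labels, delta=0.15)
```

On the toy data, only attribute 0 raises the dependency, so the loop stops after one round. The loop recomputes every candidate's dependency in each round, which is the cost that reordering the calculation aims to reduce.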
Li Sanle put forward an optimization method for this algorithm (Li, 2012). In this paper, the algorithm is further optimized by adjusting the calculation sequence, so as to obtain the best attribute reduction.
The improved algorithm is described as follows:
Input: NDT = <U, A, D, δ>
Output: Reduct RED
Step 1: For every ai ∈ A, calculate the neighborhood relation Nai;
Step 2: Calculate CoreD(A) and set RED ← CoreD(A); if posRED(D) = posA(D), go to Step 7; otherwise, go to Step 3;
Step 3: For every ai ∈ A − RED, calculate E(ai ∪ RED);
Step 4: Choose the attribute ak with the minimum value of E(ai ∪ RED); when two or more attributes attain the minimum, choose the attribute that has the fewest attribute values.

Experimental Evaluation
Building on the work in Section 3.1 and considering the availability of data, we searched the Statistical Yearbook and chose nine influential factors for attribute reduction, such as GDP, foreign trade turnover, the added value of primary industry and the added value of secondary industry.
This article selects Guangzhou Port and its hinterland economy data as the empirical analysis sample. The arranged data are shown in Table 2. When calculating the neighborhood of a sample, we standardize the numeric attributes to the interval [0,1] (Jian, Liu, Fang, Dang, Zhu, Wu, et al., 2007) in order to reduce the impact on the results of inconsistent attribute dimensions. The standardized data are shown in Table 3. In this problem, we seek a complete coverage of the domain without duplication; the principle is to give priority to the neighborhood with the maximum coverage of U. This article sets the step length of δ to 0.05 and adjusts the relation between δ and the neighborhood to reach the best attribute reduction.
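As a hedged illustration of this preprocessing, the sketch below rescales one attribute column linearly to [0,1] and builds a grid of candidate radii with step 0.05. Min-max scaling is assumed here as the standardization; the radius range 0.1 to 0.4 is the range reported in this article; the function name `min_max_normalize` and the sample values are hypothetical and are not taken from Table 2.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map everything to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical yearly values of one indicator (not taken from Table 2)
indicator = [120.0, 150.0, 210.0, 300.0]
scaled = min_max_normalize(indicator)

# Candidate neighborhood radii from 0.1 to 0.4 in steps of 0.05;
# rounding avoids floating-point drift in the grid values.
deltas = [round(0.1 + 0.05 * k, 2) for k in range(7)]
```

After scaling, a radius such as 0.15 means the same fraction of each attribute's range, regardless of whether the attribute was originally measured in yuan or in tons.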
In this article, we use MATLAB to carry out the attribute reduction. The results are as follows. As Guangzhou is southern China's largest trading center for minerals such as coal, secondary industry is still the mainstay of economic development in Guangzhou and the surrounding areas. A highly developed collection and distribution system can improve cargo turnover efficiency and increase the port's production capacity. The above analysis gives the experimental results a reasonable explanation, which supports the accuracy and reliability of the method.

Conclusion
This article reduces the attributes of port throughput factors using Neighborhood Rough Sets theory in order to find the key influential factors for port throughput and thereby support port business investment, decisions and risk control. The main results of this article are as follows:
1. Port Throughput Indicator System. First, we analyzed the throughput influential factors of the entire port industry from various aspects and delimited the system of throughput influential factors for the industry as a whole. Then, combined with Guangzhou Port's practical data, we established the Port Throughput Indicator System, considering internal and external factors, on the basis of an analysis of Guangzhou Port's formation mechanism.
2. Attribute Reduction with Neighborhood Rough Sets. Many factors can influence port throughput, so selecting the most important factors as forecast factors is the key to a successful forecast model. This article used the improved Forward-Greedy Numerical Attribute Reduction Algorithm based on the neighborhood model to reduce the Port Throughput Indicator System, and then chose the most important factors according to the reduction results. This method avoids the interference of unrelated factors with the model results and also improves the model's operating efficiency.

Figure 1. Two-dimensional spatial neighborhood particles
The index system contains some unquantifiable indicators, such as the natural environment and the political environment. In this article, we mainly use the quantifiable indicators to analyze the port throughput forecast. The indicators used are as follows: GDP, the added value of primary industry, the added value of secondary industry, the added value of tertiary industry, foreign trade turnover, port fixed investment, area added value (transportation, warehousing, postal), road freight, sea freight, rail freight and throughput of peripheral ports.
First, calculate the relative D-core of the condition attribute set B and assign it to RED. Then calculate the equivalence relations of all condition attributes in the decision table except those in the relative D-core. Finally, evaluate each equivalence relation using the boundary heuristic factor EB (Equation 6). Sort the EB values in descending order, take the attribute with the minimum EB value and add it to RED, using posRED(D) = posA(D) as the ending condition of the improved recursive algorithm. Here posRED(D) represents the set of objects in the domain U that can be assigned to a decision class of D according to the condition attributes in RED. Repeat the above attribute selection process and retain the most suitable attribute, using the newly generated RED and the remaining attributes to generate new equivalence relations. The heuristic factor EB acts as the measure for selecting attributes. When the recursive operations satisfy the constraints, collect all the attributes and delete those whose EB values are large and whose classification capability is low when combined with the relative D-core.

Figure 2. The Importance Degree of Attribute
assistance for port enterprises' or other researchers' throughput forecasting. On the basis of practical research on and analysis of the factors for port throughput, this article selects Guangzhou Port and its hinterland economy data as the empirical analysis sample, and uses the improved Forward-Greedy Numerical Attribute Reduction Algorithm based on the neighborhood model to obtain the key influential factors of Guangzhou Port throughput.

Table 1 .
area, the social product development level, the level of export-oriented economic development and the number of import and export commodities. Micro factors refer to the port's own construction conditions, including natural conditions and social and economic factors, such as topography, waterways, hydrological and meteorological conditions, vehicle type, ship type, handling and loading capacity and technology level, labor organization and management level, and the type of cargo passing through the port and handling services. All of the above factors are likely to become important factors that influence the port throughput capacity.
Port Throughput Influence Index System
In this section, we build the Port Throughput Influence Index System on the basis of our own practical research and analysis and the research of scholars at home and abroad. The index system includes seven aspects: natural environment, political environment, hinterland economy, hardware, service level, collection and distribution capabilities, and port logistics. Specific information is shown in Table 1.
Remark: The source of Table 2 is the China Statistical Yearbook (1990-2013).
Index Number 1: GDP (total GDP of Guangzhou, Foshan and Dongguan; unit: hundred million Yuan).
Index Number 2: Foreign trade turnover of Guangzhou (unit: hundred million Dollars).
Index Number 3: The added value of primary industry (unit: hundred million Dollars).
Index Number 4: The added value of secondary industry (unit: hundred million Yuan).
Index Number 5: The added value of tertiary industry (unit: hundred million Yuan).
Index Number 6: Construction investment of Guangzhou Port (unit: hundred million Yuan).
Index Number 7: Collection and distribution capabilities of Guangzhou Port (unit: ten thousand tons).
Index Number 8: Transportation, warehousing and postal added value of Guangzhou (unit: hundred million Yuan).
Index Number 9: Throughput of peripheral port cargo (unit: hundred million tons).
Index Number 10: Decision attribute, real throughput of Guangzhou Port (unit: hundred million tons).

Table 2. Original Data of the Throughput of Guangzhou Port
On the basis of a comprehensive review of the literature, we set the neighborhood radius between 0.1 and 0.4. To avoid data overlap at the neighborhood boundaries, we use a left-open, right-closed interval for the neighborhood of δ, so that the neighborhood relations constitute a full coverage of the domain.