Kidney Impairment Prediction Due to Diabetes Using Extended Ensemble Learning Machine Algorithm

Diabetes is the main cause of diabetic kidney disease (DKD), which slowly damages the filtering units of the kidneys and eventually stops their function. This consequence is common to both genetic-based (type 1) and lifestyle-based (type 2) diabetes. However, type 2 diabetes plays a significant role in increased urine albumin excretion, decreased glomerular filtration rate (GFR), or both. These changes cause the kidneys to fail stage by stage. Herein, the implementation of the extended ensemble learning machine (EELM) algorithm with the improved elephant herd optimization (IEHO) algorithm helps in identifying the severity stages of kidney damage. The data preprocessing and feature extraction process extracts three vital features, namely period of diabetes (in years), GFR, and albumin (creatinine ratio), for accurate prediction of kidney damage due to diabetes. The predicted results show an accuracy of 98.869%, a precision of 97.899%, a recall of 97.993%, and an F-measure of 96.432%.

predicting the prevalence of cancer cells. They identified that classification techniques run faster with feature selection and slow down in its absence. Velliangiri et al. [8] applied EHO for detecting security attacks in a cloud environment. The algorithm is combined with fuzzy techniques for rule learning. The performance of the algorithm is evaluated using continuous computer simulations and is compared with various state-of-the-art techniques. El Asnaoui et al. [9] applied single and ensemble learning models to pneumonia disease classification. The results obtained are compared with single and combined forms of the MobileNet and ResNet 50 models. The performance metrics used are accuracy, sensitivity, precision, recall, and F1-score.
The ensemble model emerged as the best-performing model in pneumonia disease classification. Pérez, E., et al. [10] presented a melanoma detection convolutional neural network architecture based on ensemble learning and genetic algorithms. An accurate prediction of the affected level of DKD is therefore an important life-strengthening factor for diabetic patients. To achieve this, the proposed method involves the Improved Elephant Herd Optimization (IEHO) algorithm for feature extraction and the Extended Ensemble Learning classifier for the classification of DKD levels.
II. RELATED WORK
Elshaarawy et al. [11] established an irrational and quick return to the genesis. Because of the balanced management of the clan updating operator and the separating operator, the EEHO algorithm is more exploitative than the EHO algorithm. Wei Li et al. [12] developed an Improved Elephant Herd Optimization algorithm to enhance parameter control and selection, convergence speed, and efficiency of optimal solutions. An Ning & Ding et al. [13] implemented deep ensemble learning for Alzheimer's disease classification and obtained 4% better performance than six ensemble algorithms. Gupta, A et al. [14] developed and tested an ensemble-based model for identifying Covid-19-related health issues, which showed good performance. Ibomoiye Domor Mienye et al. [15] created a model for predicting heart disease risk, in which multiple CART models are combined into a homogeneous ensemble model. ROCC is used to validate the accuracy of the suggested ensemble learning approach. Prasad et al. [16] employed machine learning techniques to assess kidney illness prediction. Naive Bayes, random forest, decision table, and J48 algorithms were tested, and their performance was evaluated for better detection of kidney problems caused by diabetes. Olayinka et al. [17] devised an ensemble approach to the diagnosis of chronic renal disease. Ensemble approaches such as the Bagging and Random Subspace methods effectively diagnosed chronic kidney disease.
Dong, Z et al. [18] created an ensemble model. The model predicted that DKD was more likely to occur over the following three years in older T2DM patients with high homocysteine (Hcy), bad glycemic control, low serum albumin (ALB), low estimated glomerular filtration rate (eGFR), and high bicarbonate. The ensemble model outperforms. Ghelichi-Ghojogh et al. [19] analyzed the links between CKD and a variety of behavioral and health-related factors in Iranian patients using the logistic regression algorithm. Factors such as low birth weight, diabetes, and chemotherapy are identified as the most relevant causes of CKD. Xu et al. [20] applied the random forest algorithm to predict diabetic kidney disease and obtained an accuracy of 89.831%. The analysis involved a total of 29 indicative markers, including microalbuminuria (ALB) and albumin-to-creatinine ratio. Based on the performance results, they concluded that the random forest algorithm is more suitable for clinical prediction of kidney diseases caused by poor management of diabetes.
Kandasamy Vidhya et al. [21] analyzed the possible complications of diabetes based on the habits of patients. A Deep Belief Network (DBN) model constructed for disease prediction identifies diabetes-related risks depending on the day-to-day activities of patients. Ilyas et al. [22] developed a model that can successfully and sustainably identify all CKD phases using the Random Forest and J48 algorithms, and the J48 algorithm is identified as the better one. Kandasamy Vidhya et al. [23] implemented a Modified Adaptive Neuro Fuzzy Inference System (MANFIS) to analyze the diseases prevalent in society. Possible diseases are predicted based on multivariate combinations of symptoms.
Gazi et al. [24] performed a comparative study on CKD prediction using various algorithms, and the LR algorithm is recognized as the best-performing one based on the metrics of precision and accuracy. Satish Kumar David et al. [25] experimented with the WEKA machine learning tool to predict diabetic kidney disease by applying different techniques. The framework's effectiveness is evaluated using a variety of criteria, and the decision tree algorithms are found to be the most effective at forecasting DKD. Lin, CC et al. [26] developed a risk prediction model for CKD in patients suffering from diabetes. The risk variables for CKD were identified using the Cox proportional hazards regression model. Violeta et al. [27] applied machine learning techniques to identify the biomarkers for diabetic nephropathy (DN). They determined that random forest and logistic regression are the best-performing techniques for accurate prediction.
Dunkler et al. [28] determined the relative influence of predictors using two risk prediction models for the occurrence and development of CKD after 5.5 years. For type 2 diabetes patients, albuminuria and eGFR were the most important markers in predicting the onset and progression of early CKD. Two machine learning algorithms were trained by Allen A et al. [29] to predict the phases of DKD severity, and their results were compared to the CDC risk score. The models were evaluated using both an external dataset compiled from various sources and a hold-out test set. A new temporal-enhanced gradient boosting machine (GBM) model was created by Song X et al. [30] that dynamically updates and groups learners in response to new events in a patient's life, with the greatest calibration in both moderate and high-risk categories. Gao et al. [31] created a model for predicting renal function deterioration on an individual basis in patients with type 2 DKD (T2DKD) and identified the nomogram and risk table as clinical indicators for predicting renal function deterioration in T2DKD patients at the bedside.

Materials And Methods
Poor diabetes treatment over time may damage the clusters of kidney blood vessels that filter waste from the circulation, raising blood pressure. Renal disease is aggravated by high blood pressure, which increases pressure in the kidney's delicate filtering system. As a consequence of kidney damage, mortality increases. So, an effective kidney damage prediction system is needed for the wellbeing of the diabetic community.

Dataset Description
The Chronic Kidney Disease Dataset gathered from the UCI Machine Learning Repository is used to estimate the severity of CKD. It contains 400 instances with 26 attributes, including 14 nominal and 12 numerical attributes. The cases are labelled in advance as having CKD or not. Along with the existing attributes, a 'gender' attribute is added. The 26 attributes are clinical and physiological in nature. The attribute information of the dataset is given in Table 1.

Pre-Processing Of Diabetic Data
Data pre-processing is a method for converting noisy and irrelevant data into clean data suitable for further analysis and prediction. It is an important process in data handling which lessens the dimensionality of the data and helps to achieve better results. Data pre-processing is necessary before model development in order to remove undesired noise and outliers that could cause the model to deviate from the intended training set. The effectiveness of the model is addressed at this step. Pre-processing is carried out by mode-based missing value imputation: each missing value is filled with the mode (most frequent value) of its attribute.
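The mode imputation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `impute_mode`, the set of missing-value markers, and the sample records (with column names `htn` and `al` used purely as examples) are assumptions.

```python
from collections import Counter

# Assumption: these markers denote a missing entry in the raw records.
MISSING = {None, "", "?"}

def impute_mode(rows, columns):
    """Return a copy of rows in which missing entries are replaced by the column mode."""
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) not in MISSING]
        if not observed:
            continue  # no observed value to compute a mode from
        mode_value = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r.get(col) in MISSING:
                r[col] = mode_value
    return filled

# Hypothetical records for illustration only.
records = [
    {"htn": "yes", "al": 1},
    {"htn": "?", "al": 1},
    {"htn": "yes", "al": None},
    {"htn": "no", "al": 0},
]
cleaned = impute_mode(records, ["htn", "al"])
```

The mode is used for both nominal and numerical columns here, since the text mentions only mode calculation. The original records are left unmodified.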

Feature Extraction
The process of data reduction by removing extraneous data is closely related to feature extraction. Feature extraction reduces the dimensionality of the data and simplifies its use, which in turn minimizes the training time and improves accuracy. The IEHO algorithm extracts features in an optimized way.

Improved Elephant Herd Optimization (IEHO) Algorithm
EHO is a global optimization technique modeled on elephant behavior in nature. Unlike traditional meta-heuristic algorithms, it does not refer to previous measurements when processing present data. If the features extracted in previous iterations are fully exploited and reused in later optimization steps, the performance may improve significantly. So, in the current implementation, the EHO algorithm retains the three most significant features from the previous iterations and is further enhanced by performing crossover and mutation. This procedure greatly increases the optimization effect, and the proposed variant of EHO is termed Improved EHO (IEHO). Elephants' herding behaviors inspired the development of this algorithm. Elephants are sociable creatures with a composite social organization consisting of multiple clans (groups or networks), each led by a matriarch. A clan is made up of one or more mother elephants and their calves. Female elephants prefer to live in family groups, whereas male elephants like to live alone and leave the clan as they grow older. Clan behavior corresponds to exploitation, whereas elephants leaving the clan correspond to exploration of the population.
where $h_{\min}$ denotes the lower limit of the search space, $h_{\max}$ denotes the upper limit of the search space, and $rand \in [0, 1]$ is a random number chosen from a uniform distribution.
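The clan behavior and the separation of elephants can be illustrated with the two operators commonly used in EHO. The sketch below follows the widely used textbook form of the algorithm, not necessarily the exact equations of this paper. The function names, the scale factors `alpha` and `beta`, and the minimisation convention are assumptions, and some formulations of the separating operator use `h_max - h_min + 1` instead of `h_max - h_min`.

```python
import random

def clan_update(clan, fitness, alpha=0.5, beta=0.1, rng=random):
    """Move each elephant toward the clan's best member (minimisation assumed)."""
    best = min(clan, key=fitness)
    dim = len(best)
    center = [sum(e[d] for e in clan) / len(clan) for d in range(dim)]
    updated = []
    for elephant in clan:
        if elephant is best:
            # The matriarch is repositioned relative to the clan centre.
            updated.append([beta * c for c in center])
        else:
            # Other elephants take a random step toward the matriarch.
            updated.append([x + alpha * (b - x) * rng.random()
                            for x, b in zip(elephant, best)])
    return updated

def separate_worst(clan, fitness, h_min, h_max, rng=random):
    """Replace the worst elephant by a uniform random point in [h_min, h_max]."""
    worst_index = max(range(len(clan)), key=lambda i: fitness(clan[i]))
    new_clan = [list(e) for e in clan]
    new_clan[worst_index] = [h_min + (h_max - h_min) * rng.random()
                             for _ in clan[worst_index]]
    return new_clan

def sphere(position):
    """Toy fitness used only for illustration: sum of squares."""
    return sum(v * v for v in position)

demo_clan = [[1.0, 1.0], [4.0, -3.0], [0.5, 0.2]]
demo_clan = clan_update(demo_clan, sphere)
demo_clan = separate_worst(demo_clan, sphere, h_min=-5.0, h_max=5.0)
```

The clan update drives exploitation around the current best solution, while the separating operator reintroduces diversity by resampling the worst member within the search bounds.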
To make the optimization more effective, crossover and mutation operations are performed when attribute positions are assessed. Among the several forms of crossover, the two-point crossover is chosen. In this crossover, two points are selected on the parental attributes, and the genes between the two points are swapped between the parents, resulting in the child's attributes. These points are assessed as follows: As shown in Equations 6 and 7, the parameter swapping is applied to each attribute of the new population. The newly created set of attributes is populated arbitrarily until a better fitness is acquired. The IEHO algorithm is explained as follows. The algorithm selects features such as diabetes mellitus (nominal), sugar stages, and albumin (creatinine) ratio. Then, based on the level of serum creatinine and the patient's age, the glomerular filtration rate is calculated.
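A two-point crossover followed by mutation can be sketched as below. The binary feature-mask encoding (1 = attribute selected), the bit-flip form of mutation, the mutation rate, and the function names are assumptions made for illustration, since the encoding is not stated here.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the genes lying between two random cut points (needs length >= 3)."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(1, n), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

def bit_flip_mutation(mask, rate=0.05, rng=random):
    """Flip each bit of a binary feature mask independently with probability rate."""
    return [1 - gene if rng.random() < rate else gene for gene in mask]

# Hypothetical parents: each position marks whether an attribute is selected.
parent_a = [1, 1, 0, 0, 1, 0, 1, 0]
parent_b = [0, 1, 1, 0, 0, 1, 0, 1]
child_a, child_b = two_point_crossover(parent_a, parent_b)
child_a = bit_flip_mutation(child_a)
```

Only the segment between the two cut points is exchanged, so each child keeps the head and tail of one parent and the middle of the other.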

Classification Of Kidney Damage Level Using Extended Ensemble Learning Machine (EELM) Algorithm
The Ensemble Learning Machine technique uses a feed-forward neural network with a single hidden layer, in which the weights between the hidden and output layers are generated using the least squares method. In the Extended ELM (EELM), the bias values are derived from the input entropy: instead of generating random bias values, the proposed method uses functions of entropy as the selection criterion for optimal bias values. The ensemble learning algorithm is further extended by using the IEHO algorithm to select the optimal features. In addition, the entropy calculation approach is combined with the ensemble algorithm. The architecture of the proposed EELM is shown in Table 2, and the steps involved in EELM are explained below.
Step 1: Initially, refer to the training sample in Equation (9), in which the values m and n specify the dimensions of the input and output matrices.
Step 2: The weights assigned between the input layer and the hidden layer are represented by the weight matrix in Equation (10).
Step 5: Choose the SoftMax activation function, which calculates the probability distribution over the classes for classifying the level of kidney damage. Each SoftMax output lies between 0 (false) and 1 (true), and the outputs sum to one. The predicted level of kidney damage is the class with the maximum probability. The resultant matrix H is given below, and the column vector of the output matrix H is as follows:
Step 6: By referring to Equations (12) and (13), compute R as per Equation (15),
where K' is the transpose of K and R is the weight difference. The weight matrix values are calculated using the least squares approach to obtain a unique solution with minimal error.
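The SoftMax step can be illustrated with a short sketch. The function names and the example scores are hypothetical. The subtraction of the maximum score is a standard numerical-stability measure and not something stated in the text.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_stage(scores):
    """Return the index of the most probable class together with all probabilities."""
    probs = softmax(scores)
    stage = max(range(len(probs)), key=probs.__getitem__)
    return stage, probs

# Hypothetical raw scores for the five damage levels 0..4.
stage, probs = predict_stage([0.1, 2.0, 0.3, -1.0, 0.5])
```

Each probability lies between 0 and 1, and the predicted damage level is the index with the largest probability.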
A regularization term is also included to increase the network's generalization ability and make the findings more stable.
When there are fewer hidden layer neurons than training samples, the condition can be represented as,
When there are more hidden layer nodes than training samples, it can be stated as,
A standard ELM with d hidden neurons and activation function f_A is mathematically modelled by Equation (19), where the activation function f_i is calculated with the weighted function and the bias value generated by the generator function G for the i-th and j-th layers of the ELM structure involved.
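For reference, the regularized least squares solution of a standard ELM is usually written in the following textbook form. The notation is an assumption and may differ from the symbols used in this paper: $H$ is the hidden layer output matrix, $T$ the target matrix, $\beta$ the output weight matrix, $C$ the regularization coefficient, $N$ the number of training samples, $d$ the number of hidden neurons, and $w_i$, $b_i$ the input weights and bias of the $i$-th hidden neuron.

```latex
% Standard single-hidden-layer ELM output for sample x_j
\sum_{i=1}^{d} \beta_i \, f\left(w_i \cdot x_j + b_i\right) = o_j, \qquad j = 1, \dots, N

% Fewer hidden neurons than training samples (d < N)
\beta = \left(\frac{I}{C} + H^{\top} H\right)^{-1} H^{\top} T

% More hidden neurons than training samples (d > N)
\beta = H^{\top} \left(\frac{I}{C} + H H^{\top}\right)^{-1} T
```

Both expressions minimise the same regularized squared error. The form is chosen so that the matrix being inverted has the smaller of the two possible sizes, $d \times d$ in the first case and $N \times N$ in the second.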

Results and Discussion
Prediction of Kidney Damage Stages due to Diabetes
The proposed EELM algorithm classifies the level of DKD into five stages, namely no DKD (0), mild (1), moderate (2), severe (3), and end-stage renal (4), based on the period of diabetes, the eGFR value, and albumin (creatinine ratio), as shown in Table 3. It is to be noted that the period of diabetes has a direct impact on kidney function. Long-standing diabetes can harm the blood vessels inside the kidneys, which degrades the filtration capacity of the kidney, so the GFR starts decreasing. Due to the reduced filtration rate, albumin in the urine starts increasing as it drains through the filters. The level of kidney damage is classified into five stages: level 0 indicates that there is no kidney damage due to diabetes, level 1 indicates mild damage, level 2 indicates moderate damage, level 3 indicates severe damage, and level 4 indicates that the patient has reached end-stage renal failure, where the kidney completely stops functioning. Thus, the EELM algorithm effectively identifies the levels of kidney damage.

Performance Analysis Of EELM Algorithm
The performance of the EELM method is evaluated using a confusion matrix, which is a tabular representation of the performance of a classification technique on test data in which the correct and incorrect predictions are clearly shown. True Positive (TP) indicates that a positive instance is correctly predicted as positive, whereas False Positive (FP) indicates that the prediction is positive but the actual class is negative. False Negative (FN) is a false prediction that labels a positive instance as negative. True Negative (TN) indicates that a negative instance is correctly predicted as negative. The confusion matrix is constructed based on these interpretations, and the corresponding error rate, accuracy, and precision are derived from it.
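The way these quantities follow from a multi-class confusion matrix can be sketched as below. The function name, the row/column convention (rows are true classes, columns are predicted classes), and the toy matrix are assumptions for illustration and do not reproduce the reported results.

```python
def classification_report(confusion):
    """Derive accuracy, error rate and per-class metrics from a confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # class k predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp    # other classes predicted as k
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall,
                          "f_measure": f_measure})
    accuracy = correct / total
    return {"accuracy": accuracy, "error_rate": 1.0 - accuracy,
            "per_class": per_class}

# Toy three-class matrix, not taken from the experiments.
toy = [[5, 1, 0],
       [1, 3, 1],
       [0, 0, 4]]
report = classification_report(toy)
```

In the multi-class setting, each class is treated in turn as the positive class, and all remaining classes together form the negative class.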
The confusion matrix shows that the proposed EELM algorithm classifies stage 0 accurately for all 150 patients who have not been affected by DKD and are normal, whereas stage 1 has 24 false predictions, stage 2 has 8 false predictions, stage 3 has 7 false predictions, and stage 4 has 8 false predictions. So, the error rate of EELM is 0.117.
The performance of the implemented EELM classification algorithm is measured as an accuracy of 99.481%, a precision of 98.231%, a recall of 98.953%, and an F-measure of 98.582%. It is compared with the performance of other algorithms such as SVM, RBF, MLP, and ELM for classifying the affected stage of DKD, as shown in Fig 5. Similarly, the visual inspection of the validation performance of the EELM algorithm is shown in Fig 6.
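The stated error rate can be checked with simple arithmetic. The sketch below assumes that the confusion matrix covers all 400 instances of the dataset. That denominator is an assumption, since the size of the evaluated set is not stated alongside the matrix.

```python
# False predictions per stage as reported for the confusion matrix.
false_predictions = {0: 0, 1: 24, 2: 8, 3: 7, 4: 8}

# Assumption: the matrix covers all 400 instances of the dataset.
total_instances = 400

misclassified = sum(false_predictions.values())
error_rate = misclassified / total_instances
```

Under this assumption, 47 misclassified instances out of 400 give an error rate of 0.1175, which agrees with the reported value of 0.117.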

III. CONCLUSION
Thus, the proposed EELM algorithm for kidney disease prediction automatically determines the phases of kidney damage caused by diabetes. This method classifies kidney damage into five categories: no DKD, mild, moderate, severe, and end-stage renal disease. The IEHO algorithm is used to select features from the pre-processed data. The IEHO algorithm and the entropy-based bias value generation support optimal feature extraction and the adjustment of the weights on the activation function. Furthermore, accuracy, precision, recall, and F-measure are used to assess the classification performance during both training and validation. When comparing the performance of the proposed system to that of individual classifiers, it is clear that the extended ensemble learning classifier outperforms the individual classifiers.

Availability Of Data And Materials
The dataset analyzed in this experiment was taken from the UCI repository where the CKD dataset is publicly available. The users can download the datasets from the link shared below. https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease

Data Availability
No data was used to support this study.

Conflicts of Interests
The author(s) declare(s) that they have no conflicts of interest.

Funding
No funding was received to assist with the preparation of this manuscript.

Ethics Approval and Consent to Participate
The research has consent for Ethical Approval and Consent to participate.