Transfer Driven Ensemble Learning Approach using ROI Pooling CNN For Enhanced Breast Cancer Diagnosis

Abstract – Cancer, including breast cancer, is a major cause of death brought on by abnormal cell proliferation in the body. It poses a significant threat to the health and safety of people globally. Several imaging methods, such as mammography, CT scans, MRI, ultrasound

II. LITERATURE SURVEY
Numerous cutting-edge techniques for detecting breast cancer have emerged from advances in biomedical research. This section reviews prior work on effective cancer diagnosis using machine learning and deep learning.
Jing et al. [12] proposed a loss function that combines an improved squared-error loss with a pairwise ranking loss based on the ranking of the survival data. With this loss, the researchers improved a deep feed-forward network used to analyze the collected information. Using the resulting RankDeepSurv approach, the authors predicted relapse in nasopharyngeal carcinoma. RankDeepSurv used eight clinical indicators to predict recurrence, resulting in a C-index of 0.681, higher than that of conventional survival models.
Dmitrii Bychkov et al. [13] proposed a deep learning network that combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to estimate the prognosis of colorectal cancer from images of tumor tissue samples. The researchers examined digitized tumor samples from 420 cancer patients. By analyzing the tissue structure, they found that DL systems may predict the outcome of colorectal cancer cases better than human observers.
DeepSurv, developed by Katzman et al. [14], is a Cox proportional hazards deep neural network (DNN) and a state-of-the-art method designed to provide personalized therapy recommendations. It is a feed-forward DL network that learns, through its connection weights, how a patient's covariates influence their risk level. The research demonstrates that DeepSurv outperforms other advanced survival models and reliably captures subtle relationships between a patient's characteristics and their risk.
The MesoNet methodology developed by Pierre Courtiol et al. [15] predicts the survival of mesothelioma patients from whole-slide digitized images rather than from local region annotations made by a pathologist. The approach is built on deep convolutional neural networks (DCNNs). MesoNet identified a number of regions that may affect the final evaluation and therapy for patients. Unexpectedly, the study found that these regions lie primarily in the matrix and show histological features associated with infection, cell variability, and degeneration. These findings imply that novel biomarkers may be discovered by using DL techniques to identify distinctive features that are predictive of medical outcomes.
Jakob Nikolas et al. [16] propose three methods for assessing the quality of CNN training. The first determines whether classification succeeds on a separate batch of data. The second applies t-distributed stochastic neighbor embedding (t-SNE) to deep-layer activations to show how well the classes are separated. The third uses DeepDream to visualize deep neuron responses across 46 layers of the VGG19 DL model. A pyramid level of 12, 75 iterations, a scale of 1.1, and enlarging the image's spectrum gave the best visual quality.
Panagiotis Korfiatis et al. [17] evaluate three residual deep neural network (ResNet) models for their ability to identify methylation status without a separate tumour segmentation stage. Statistically, the ResNet50 design is superior to both the ResNet18 and the ResNet34 designs. The paper highlights the use of DNNs for identifying biological markers in routine medical imaging and suggests a strategy that avoids costly pre-processing. The currently available approaches for predicting cancer are outlined in Table 1.
Although compression tasks and feature-extraction-based classification differ notably, using DL models pre-trained on extensive databases such as ImageNet has proven advantageous in clinical imaging. Prior research has shown that transfer learning from related tasks and training on comparable datasets can improve performance. It is therefore reasonable to expect that a system combining knowledge transferred from related tasks with transfer-driven ensemble learning could achieve superior outcomes.
III. PROPOSED TRANSFER DRIVEN ENSEMBLE MODEL
This work introduces a novel ensemble classifier that detects and classifies images simultaneously, without user involvement. The proposed method incorporates several components, including a convolutional neural network (CNN) for generating and recognizing pseudo-color images and for segmentation. The framework then extracts features using an ensemble of EfficientNet, ResNet101, and VGG19 models. Before feature extraction, random cropping, horizontal flips, and color-space augmentations are applied. These transformations help capture important patterns and characteristics in image recognition problems. The publicly available MIAS breast cancer dataset is used to evaluate the approach. The architecture of the proposed method is shown in Fig 1. The proposed model is also intended to work with multi-modal inputs.
When segmenting pseudo-color images, the ROI Pooling CNN can be trained on a labeled dataset in which each pseudo-color image is associated with its segmentation masks or labels. During training, the network learns to identify patterns and features within the pseudo-color images that correspond to different object classes or regions. For grayscale images, the ROI Pooling CNN can be trained on a similarly labeled dataset. Although the absence of color information may limit the network's ability to distinguish certain features, ROI Pooling CNNs can still learn other discriminative visual cues, such as texture, edges, or shapes, to perform accurate segmentation. Once trained, the ROI Pooling CNN model can segment unseen pseudo-color or grayscale images.
The input images are fed into the network, and the ROI Pooling CNN generates predictions or probability maps indicating the likelihood of each pixel or region belonging to specific classes or segments.

Pre-Processing The Input Data
A digital image's histogram shows how often each intensity level occurs. The gray levels of a digital image range from 0 to L-1. As a discrete function, the histogram is expressed as $g(r_k) = n_k$, where $r_k$ is the kth gray level and $n_k$ is the number of pixels with gray level $r_k$.
The main objective of histogram equalization is to achieve a uniform distribution of intensities by adjusting the intensity values of pixels. During equalization, each intensity level $r_k$, for k between 0 and L-1, is transformed into a new intensity $s_k$. The transformation is determined by the normalized histogram $p(r_k) = n_k / n$, where $n_k$ is the number of pixels with brightness level $r_k$ and $n$ is the total number of pixels in the image.
In the first step, two sub-histograms are created from the input image based on the mean brightness intensity of the breast region. These sub-histograms are referred to as historgl and historgu. Historgl represents the histogram of intensity levels that are higher than the mean intensity (Imean), while historgu represents the histogram of intensity levels that are lower than the mean intensity (Imean). A further histogram, histu, corresponds to a uniform distribution. Equations (2) and (3) are used to modify historgl and historgu, respectively.
The value of α in these equations ranges from 0 to 1, and the histograms historgl, historgu, histmodl, histmodu, and histu belong to the space $\mathbb{R}^{256 \times 1}$.
The overall modified histogram, histmod, is calculated using equation (4). When α is 0, the modified histogram is identical to the uniform histogram, histu. When α is 1, the modified histogram is the same as the histogram of the original image.
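The limiting behaviour described above (α = 0 gives the uniform histogram, α = 1 gives the original one) can be illustrated with a minimal sketch. Equations (2) to (4) are not reproduced in this text, so the linear blend used below is an assumption chosen only because it matches those two limiting cases; the function name `modified_histogram` and the names `below` and `above` are hypothetical.

```python
import numpy as np

def modified_histogram(image, alpha, levels=256):
    """Blend the two sub-histograms of an 8-bit image with a uniform histogram.

    Assumption: a linear blend, picked because it reproduces the stated
    limits (alpha=0 -> uniform histogram, alpha=1 -> original histogram).
    """
    image = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    hist /= hist.sum()                        # normalized histogram p(r_k) = n_k / n
    mean_level = int(image.mean())            # split point: mean intensity

    below_mask = np.arange(levels) <= mean_level
    below = hist * below_mask                 # sub-histogram up to the mean
    above = hist * ~below_mask                # sub-histogram above the mean
    uniform = np.full(levels, 1.0 / levels)

    # Each sub-histogram is blended with the uniform histogram over its own range.
    mod_below = alpha * below + (1 - alpha) * uniform * below_mask
    mod_above = alpha * above + (1 - alpha) * uniform * ~below_mask
    return mod_below + mod_above
```

Intermediate values of α trade contrast enhancement (closer to uniform) against fidelity to the original intensity distribution.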
Consider a window of size n × n centered at pixel (x, y) in an image I. The local mean at this position is the average of the intensities inside the window.
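As a small illustration of the local mean, the sketch below averages the n × n neighbourhood around every pixel. The border handling (reflecting the image at its edges) is an assumption, since the text does not say how border pixels are treated, and the function name `local_mean` is hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_mean(image, n=3):
    """Mean of the n-by-n window centered on every pixel.

    Assumption: borders are handled by reflection; the source text
    does not specify a border rule.
    """
    return uniform_filter(np.asarray(image, dtype=float), size=n, mode="reflect")
```

For interior pixels and odd n, the result equals the plain average of the surrounding n × n block.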

Fig 1. Proposed System
Data Exploration Phase
Data exploration techniques are employed to identify outliers and detect correlated variables. Feature distribution, correlation coefficients, and recursive feature elimination were the data exploration methods used in our research.

Feature Distribution
To understand the nature of the dataset, we first examined the distribution of each feature. This involves assessing how feature values differ between benign and malignant instances in the WDBC database and between the presence and absence of breast cancer in the BCCD database. Distribution plots were made for each feature. Labels were binary-coded, with benign represented as 0, malignant as 1, absence as 0, and presence as 1. A distribution plot between 0 and 1 was then generated to visualize the distribution of each feature.

Feature Correlation
The correlation between every pair of features is calculated using the Pearson correlation coefficient (r). Based on this coefficient, feature pairs fall into three groups: positively correlated, negatively correlated, and uncorrelated. Positively correlated features move in the same direction, with a coefficient approaching +1. Negatively correlated features move in opposite directions, with a coefficient approaching -1. A coefficient near 0 indicates no linear relationship. This allows us to determine the relationship between two features from their correlation.
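The grouping above can be sketched as follows. The Pearson formula is standard; the cut-off of 0.5 used to assign a group is a hypothetical threshold for illustration only and does not come from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D feature vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def correlation_group(r, threshold=0.5):
    """Assign a feature pair to a group; the threshold is an assumed cut-off."""
    if r >= threshold:
        return "positively correlated"
    if r <= -threshold:
        return "negatively correlated"
    return "uncorrelated"
```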

Recursive Features Elimination (RFE)
In machine learning, feature selection is a necessary step, particularly when working with datasets that contain many features. The objective is to identify the number of features that maximizes model reliability and prediction accuracy. Reducing features with RFE keeps the data interpretable. RFE was used in this research to reduce the number of features from 30 to 15. RFE starts from the full feature set and removes features one at a time until the required number remains. At each iteration, a chosen learning algorithm is fitted, the features are ranked by importance, and the least important feature is removed. This iterative process continues until the specified number of features is retained. In this way RFE seeks a subset of features that describes the data well and improves the model's predictive ability.
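A minimal sketch of this reduction from 30 to 15 features uses scikit-learn's `RFE`. Two points are assumptions: the ranking estimator (logistic regression here, since the text does not name one) and the use of scikit-learn's bundled Wisconsin diagnostic breast cancer data as a stand-in for the WDBC database.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in data: 569 samples with 30 features (assumed to match WDBC).
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Assumed estimator: logistic regression ranks features by coefficient size.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=15, step=1)
selector.fit(X, y)

X_reduced = selector.transform(X)   # keeps the 15 retained columns
kept = selector.support_            # boolean mask over the original 30 features
```

With `step=1`, one feature is dropped per iteration, which mirrors the one-at-a-time elimination described above.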

Hyperparameter Optimization
The process of tuning the settings that govern a machine learning model's training is known as hyperparameter optimization. These settings control the learning procedure and strongly affect how well the model performs. Grid search, random search, Bayesian optimization, gradient-based optimization, and population-based and evolutionary approaches are among the strategies available. This research used grid search because of its proven success. Grid search builds a grid of parameter values and evaluates every candidate combination by brute force. The objective is to find the settings that give the best cross-validation score. Because our datasets relate to disease prediction, we employed GridSearchCV from scikit-learn to evaluate the hyperparameters of all prediction models. GridSearchCV explores the combinations of the supplied hyperparameter values and is configured through the estimator, param_grid, scoring metric, verbosity level (verbose), and number of parallel jobs (n_jobs). By systematically comparing the hyperparameter configurations, we sought the parameters that gave our prediction models the highest accuracy.
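The following sketch shows how the GridSearchCV arguments named above fit together. The estimator (a scaled support vector classifier), the grid values, and the stand-in dataset are assumptions for illustration; the study's actual models and grids are not given in this text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset (assumption)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {                                # illustrative grid, not the study's
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
    verbose=0,
    n_jobs=1,
)
search.fit(X, y)

best_params = search.best_params_   # combination with the best mean CV score
best_score = search.best_score_     # that mean cross-validated accuracy
```

Every one of the 3 × 2 = 6 combinations is fitted once per fold, which is why grid search becomes expensive as the grid grows.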

Segmentation With ROI Pooling CNN
In Faster R-CNN, the RoIPool operation crops each Region of Interest (RoI) from the feature map and applies max pooling so that regions of varying size yield fixed-size features. However, this approach loses spatial information, so the pooled features no longer correspond exactly to the true RoI in the image.
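To make the source of that spatial loss concrete, here is a minimal single-channel sketch of quantized RoI max pooling. It is a simplified illustration, not the Faster R-CNN implementation: the function name is hypothetical, and the RoI is assumed to be at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Quantized RoI max pooling on a single-channel 2-D feature map.

    roi = (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    Assumes the RoI is at least as large as output_size.
    """
    # Rounding the RoI to integer cells is the step that discards sub-cell position.
    x0, y0, x1, y1 = (int(round(v)) for v in roi)
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size
    # Bin edges are truncated to integers too, so bins can differ in size.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            pooled[i, j] = region[row_edges[i]:row_edges[i + 1],
                                  col_edges[j]:col_edges[j + 1]].max()
    return pooled
```

Regions of any size produce the same output shape, but two RoIs that differ by a fraction of a cell are rounded to the same crop.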
To overcome this problem, ROI-CNN (Mask R-CNN) replaces the RoIPool of Faster R-CNN with RoIAlign. RoIAlign preserves spatial detail, and an added mask branch takes advantage of this precise alignment. After the network design is completed, the radiologist trains the Mask R-CNN using ultrasound imaging data, biopsy data, and tumor contours. During training, randomly selected samples are divided into a training set and a validation set. The consistency and stability of the model learned on the training set are tested against the validation set. Training is considered effective when the value of the loss function L on the training data decreases. The trained model is then used to examine and predict new data. The loss function of Mask R-CNN is

$L = L_{class} + L_{box} + L_{mask}$

where $L_{class}$ and $L_{box}$ are identical to those in Faster R-CNN and $L_{mask}$ is the average binary cross-entropy loss. The trained Mask R-CNN model is evaluated quantitatively using mean average precision (mAP) as the accuracy of detection and segmentation on the validation set, where A is the output of the segmentation model and B is the corresponding tumor contour delineated by an expert radiologist. The remaining quantities in the metric are the number of images, the area of overlap between the true clinical and detected lesion regions, and the true clinical lesion size.
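Two of the quantities above can be sketched in a few lines. The mask loss is the standard average binary cross-entropy. The overlap measure, the overlapping area divided by the true lesion size, is an assumption about how the evaluation terms combine, because the original formula is not reproduced in this text; both function names are hypothetical.

```python
import numpy as np

def mask_bce_loss(pred_prob, target_mask, eps=1e-7):
    """Average binary cross-entropy between predicted probabilities and a 0/1 mask."""
    p = np.clip(np.asarray(pred_prob, dtype=float), eps, 1 - eps)
    t = np.asarray(target_mask, dtype=float)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())

def overlap_ratio(pred_mask, true_mask):
    """Overlap between detected and true lesion, divided by the true lesion size.

    Assumption: this ratio is one plausible reading of the evaluation terms.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    return float((pred & true).sum() / true.sum())
```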

Data Augmentation
Deep learning techniques are widely recognized for their effectiveness when large amounts of data are available. One popular approach is data augmentation, which increases the diversity of the training data by transforming the samples already available. Common augmentation methods for training large neural networks include padding, horizontal flipping, and cropping. However, most existing training pipelines rely only on such simple augmentation types.
To address this limitation, augmentation policies are employed as a more powerful form of data augmentation, capturing more of the variation in the data and making better use of deep architectures. Given the inherent variation among subtypes of breast cancer images, data augmentation is crucial. The Python library Augmentor is used to facilitate data collection and augmentation. The method applies random cropping, rotation, resizing, and flipping.
A fivefold cross-validation approach is used to create the training and testing datasets. The available data is split into five subsets, and each subset serves as the testing dataset once while the remaining subsets are used to train the models. When physician annotations and public annotations disagree, the breast cancer category is discussed and then decided.
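The fivefold split can be sketched with scikit-learn's `KFold`. The 20 sample indices are placeholders standing in for image identifiers, and shuffling with a fixed seed is an assumption made for reproducibility.

```python
import numpy as np
from sklearn.model_selection import KFold

samples = np.arange(20)   # placeholder indices for 20 images (hypothetical)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
folds = [(train_idx, test_idx) for train_idx, test_idx in kfold.split(samples)]

for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    # Each sample lands in the test subset of exactly one fold.
    print(f"fold {fold_number}: {len(train_idx)} train, {len(test_idx)} test")
```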
In medical image collections, data preprocessing is essential, and data augmentation is necessary because few patient volunteers are available. Regardless of the type of transformation applied, every augmented sample retains the Region of Interest (ROI). In every training epoch, several transformations were applied to each input image, including random brightness adjustment, random contrast modification, random translation, random flipping, and standardization.
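A minimal NumPy sketch of these per-epoch transformations follows. All parameter ranges are illustrative assumptions, and the translation is implemented with a wrap-around roll for brevity, which is a simplification. The ROI mask receives the same geometric transformations as the image, so the ROI is retained in every augmented sample.

```python
import numpy as np

def augment(image, roi_mask, rng):
    """Apply one random augmentation to an image and its ROI mask.

    Parameter ranges are assumptions for illustration, not values from the study.
    """
    image = image.astype(float)
    image = image + rng.uniform(-0.1, 0.1)                        # random brightness
    mean = image.mean()
    image = (image - mean) * rng.uniform(0.8, 1.2) + mean         # random contrast
    shift = int(rng.integers(-4, 5))                              # random translation
    image = np.roll(image, shift, axis=1)                         # (wrap-around roll)
    roi_mask = np.roll(roi_mask, shift, axis=1)
    if rng.random() < 0.5:                                        # random flip
        image, roi_mask = image[:, ::-1], roi_mask[:, ::-1]
    image = (image - image.mean()) / (image.std() + 1e-8)         # standardization
    return image, roi_mask
```

Calling `augment` once per image per epoch yields a different variant each time, while brightness and contrast changes leave the mask untouched.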
Because a new random transformation is applied in each epoch, training for N epochs exposes the network to N augmented variants of every input. During the testing phase, input samples are only standardized to ensure consistency. The subsequent steps focus on the precise location of the breast tumor. First, small connected regions whose area is less than 40% of the largest region are removed. Second, connected regions close to the center of the image are selected using the Chan-Vese (C-V) level-set method.

Feature Extraction Using Efficientnet
For feature extraction, EfficientNet is used as the encoder and UNet as the decoder. Specifically, the EfficientNet-B5 and EfficientNet-B6 networks are employed, which use various scales, sizes, and convolutions in their initial stages. To reduce the computational cost, convolution is applied before the larger kernels, effectively reducing the channel dimension. Pooling layers are used mainly to shrink the input and improve computational efficiency. Max pooling reduces a 4×4 matrix to a 2×2 matrix. The pooling process is governed by the filter size, the stride, and the pooling type. A filter size of 2, a stride of 2, and 2×2 max pooling were selected for this investigation.
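The pooling setting just described (filter size 2, stride 2) can be illustrated on a single 4×4 matrix. This is a plain NumPy sketch that assumes even input dimensions; the function name is hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; assumes even height and width."""
    h, w = x.shape
    # Split into non-overlapping 2x2 blocks, then take the maximum of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool_2x2(x)   # [[6, 8], [3, 4]]
```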
The primary output of the convolutional layers is the set of features extracted from the input images. The architecture of the ensemble model with VGG19, ResNet, and EfficientNet is depicted in Fig 2. Different convolutional layers capture different types of features, including texture, edges, highlighted patterns, and colors. Classification is performed by three fully connected layers placed after the convolutional and pooling layers. The softmax activation function is used because the output is binary. The output of Brainstorm has the shape (5, 5, 2048), which flattens to a feature vector of dimension 5 × 5 × 2048 = 51,200, so further empirical procedures are needed to weed out noisy and unnecessary features. This step is crucial for the stability and effectiveness of the techniques.
The chi-square method is employed to eliminate features. The main criteria for removal are interdependence among features and high computed correlation values. The statistic is computed for all classes and all features using Equation (10):

$\chi^2 = \sum_k \frac{(O_k - E_k)^2}{E_k}$ (10)

where $E_k$ represents the expected values and $O_k$ the observed values.
A tree-based classifier is then used to compute feature importance, which significantly improves classification performance. This approach is valued for its simplicity, reliability, and high accuracy. In each decision tree, Gini importance determines the significance of each node. Assuming each split produces two child nodes, Equation (11) gives the node importance:

$ni_j = w_j C_j - w_{left(j)} C_{left(j)} - w_{right(j)} C_{right(j)}$ (11)

where $ni_j$ is the importance of node j, $w_j$ is the weighted number of samples reaching node j, $C_j$ is the impurity of node j, and left(j) and right(j) are the left and right child nodes of node j. The importance of each feature is then

$fi_i = \frac{\sum_{j:\ \text{node } j \text{ splits on feature } i} ni_j}{\sum_{k \in \text{all nodes}} ni_k}$ (12)

where $fi_i$ is the importance of feature i and $ni_j$ is the importance of node j. To normalize the scores to values between 0 and 1, each feature importance is divided by the sum of all feature importances:

$normfi_i = \frac{fi_i}{\sum_{j \in \text{all features}} fi_j}$ (13)
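Both selection steps can be sketched with scikit-learn. The stand-in dataset, the choice of `ExtraTreesClassifier` as the tree-based classifier, and the value k = 10 are assumptions for illustration. The helper `chi_square` is a hypothetical name that implements the textbook statistic for one set of observed and expected counts.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2

def chi_square(observed, expected):
    """Chi-square statistic: sum over k of (O_k - E_k)^2 / E_k."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())

# Stand-in data with non-negative features, as chi2 scoring requires.
X, y = load_breast_cancer(return_X_y=True)
X_top = SelectKBest(chi2, k=10).fit_transform(X, y)   # keep 10 highest-scoring features

# Assumed tree ensemble; Gini impurity drives the node importances.
forest = ExtraTreesClassifier(n_estimators=50, criterion="gini", random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_             # normalized to sum to 1
```

Because the importances are normalized, they can be compared directly across features and thresholded to drop the weakest ones.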
The EfficientNet-UNet architecture follows two pathways. The first, the contraction path, functions as the encoder and stacks convolution, activation, and pooling layers. The second, the expansion path or decoder, gradually upsamples the encoder output back toward the input resolution. The decoder enables precise localization through transposed convolutions. It combines feature maps from the contraction path with higher-level features and spatial information. The low-level feature maps generated by the encoder are useful for analyzing complex scenes with multiple objects and their relative outlines. Both EfficientNet and UNet generate intermediate-level feature maps, which are then integrated. By upsampling with a large number of feature channels, contextual information is propagated to the higher-resolution layers.

Resnet50 And Resnet101
ResNet-50 and ResNet-101 are variants of ResNet, which stands for residual network. ResNet addresses the performance degradation and increased error that can occur when more layers are added to a neural network. By incorporating residual blocks and shortcut connections, ResNet models have made significant advances in image recognition and classification tasks.
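The shortcut connection can be shown with a minimal residual block. Dense (matrix) layers stand in for the convolutions of a real ResNet, which is a simplification for brevity, and the names `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is a two-layer transform.

    Dense layers replace convolutions here purely for illustration.
    """
    out = relu(x @ w1)      # first layer with activation
    out = out @ w2          # second layer, no activation yet
    return relu(out + x)    # shortcut: add the input back before activating
```

If the learned transform F contributes nothing (all-zero weights), the block reduces to relu(x), so adding such blocks cannot make the representation worse by construction.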
ResNet-50 is available as a pre-trained model with 50 weight layers. Two deeper variants, ResNet-101 and ResNet-152, offer more sophisticated feature representations, which can be beneficial for complex image recognition tasks.
These pre-trained transfer learning models, including ResNet-50, ResNet-101, EfficientNet, and VGG19, were used in our study to leverage their learned features and improve the accuracy of breast cancer identification. By drawing on knowledge gained from extensive training on large-scale datasets, these models can enhance the performance of our classification task.

VGG16 And VGG19
The VGG model has proven to be one of the top-performing models in the ImageNet classification challenge, whose 1000-class task is drawn from the ImageNet database of over 14 million images.

Fig 2. Ensemble Model With VGG19, Resnet, Efficientnet
VGG is a powerful feature extractor that can be used for various tasks, including image classification and detection. It was initially pretrained on the ImageNet dataset, which contributes to its ability to generalize to unseen images. VGG16 and VGG19 are the two primary VGG topologies. VGG16 has 16 weight layers: 13 convolutional layers followed by 3 fully connected layers, with five max-pooling layers that reduce the spatial dimensions and a final softmax activation. VGG19, with 16 convolutional layers, three fully connected layers, five max-pooling layers, and one softmax layer, has a deeper topology for feature extraction and encoding.

Empirical Results
Numerous thorough tests were run on the MIAS database to demonstrate the efficacy of the proposed system and compare it with state-of-the-art methods. The proposed system was implemented in MATLAB R2020b on a Windows 10 PC with a Core i7-4650U CPU and 8 GB of RAM. In line with the recommended training methodology, 80% of the mammographic images, selected at random, were used to train the proposed deep learning systems. During training, 10% of the training data was randomly held out as a validation set to assess the learners and record the weight combinations that produced the highest accuracy. The proposed system was trained on the MIAS database using the specified classifiers. When validation performance stops improving for a set number of epochs (the validation patience), a schedule that progressively lowers the learning rate is applied. The proposed approach was trained with the following hyperparameters: 15 epochs, batch sizes from 32 to 128 in twofold increments, a patience of 6, and a momentum of 0.95. Furthermore, a batch rebalancing technique was employed to improve the distribution of disease classes within each batch, and batch normalization was used to limit overfitting. Deep neural network (DNN) algorithms inherently involve randomness at different stages, so their results display a certain level of unpredictability [30]. Ensemble learning has therefore been adopted to improve the proposed methods. In this work, we use a multiple-runs ensemble strategy, which retrains the identical model several times and combines the runs through stacked generalization.
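A minimal sketch of combining several runs of the same model follows. It averages the class probabilities of the runs (soft voting), which is a simpler combiner than the stacked generalization named above and is used here only as an illustrative assumption; the function name is hypothetical.

```python
import numpy as np

def ensemble_predict(run_probabilities):
    """Average class probabilities over runs and pick the most likely class.

    Assumption: plain averaging stands in for the stacking step of the study.
    run_probabilities: list of arrays, each of shape (n_samples, n_classes).
    """
    mean_prob = np.mean(np.stack(run_probabilities, axis=0), axis=0)
    return mean_prob.argmax(axis=1), mean_prob

# Three hypothetical runs scoring two images over two classes.
runs = [
    np.array([[0.6, 0.4], [0.3, 0.7]]),
    np.array([[0.4, 0.6], [0.2, 0.8]]),
    np.array([[0.7, 0.3], [0.6, 0.4]]),
]
labels, mean_prob = ensemble_predict(runs)
```

Averaging dampens the run-to-run variation caused by random initialization, since a single unusual run is outvoted by the others.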

Performance Metrics
The following metrics are used to compare the proposed system with other approaches. The assessment relies on several performance measures built from three counts: false positives (FPs), individuals incorrectly identified as having a cancerous mass; false negatives (FNs), individuals incorrectly identified as not having the condition despite having a cancerous mass; and true positives (TPs), individuals correctly identified as having a cancerous mass.
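Commonly used measures built from these counts can be sketched as below. The true-negative count (TN) and the specific set of metrics are assumptions, since the exact formulas used in the study are not listed in this text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tn (true negatives) is an assumed input; the text names only TP, FP, and FN.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a screening setting, recall is usually watched most closely, because a false negative means a missed cancer.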

Analyzing The Suggested Approach
In this part, we used the MIAS database and separated it into an 80% training set and a 20% test set to evaluate the proposed techniques. This split was chosen to keep execution times for the experiments reasonable.
For the first experiment, we trained the ensemble and transfer models for 15 epochs. The learning rate varied from 0.0002 to 0.008, and the batch size was between 32 and 128. As the starting point, we froze the weights of the first 50 layers of the ensemble model and the first 250 layers of the transfer model. We ran this training procedure three times, monitoring and recording the average accuracy on the validation set. The mean accuracy scores of the variants are shown in Table 2.
As mentioned earlier, the ensemble model was designed to support multiple training runs with identical parameters. Because the weights are randomly initialized for each run, the accuracy may vary from one run to another. Fig 5 shows only the best result from each run. Both the ensemble and the transfer model achieved a best accuracy of 98.3%. Table 2 gives the average accuracy attained by the ensemble model, trained for 15 epochs, when the first 50 layers were frozen.
The segmentation procedure took raw images as input and applied morphological closing. Morphological closing applies dilation followed by erosion with a structuring element, whereas opening applies erosion followed by dilation. Opening was employed to remove small objects, while closing was used to fill small holes. Connected components analysis (CC) was then used to identify connected regions in the binary images. Among the identified regions, the largest connected area was chosen for the masking process. This process, shown in Fig 4, sets the background pixels to zero.
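The masking steps above can be sketched with SciPy's morphology tools. The order of opening before closing, the 3×3 structuring element, and the function names are assumptions for illustration; the text does not specify them.

```python
import numpy as np
from scipy import ndimage

def largest_region_mask(binary, structure_size=3):
    """Clean a binary image and keep only its largest connected region.

    Assumptions: opening is applied before closing, with a square
    structuring element of the given size.
    """
    structure = np.ones((structure_size, structure_size), dtype=bool)
    opened = ndimage.binary_opening(binary, structure=structure)   # drop small objects
    closed = ndimage.binary_closing(opened, structure=structure)   # fill small holes
    labels, count = ndimage.label(closed)                          # connected components
    if count == 0:
        return np.zeros(closed.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]                        # skip background label 0
    return labels == (np.argmax(sizes) + 1)

def apply_mask(image, mask):
    """Set every pixel outside the mask (the background) to zero."""
    return np.where(mask, image, 0)
```

Keeping only the largest component assumes the breast region is the biggest foreground object, which may not hold when large labels or artifacts are present.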

Evaluate The State-Of-The-Art Methods
We assessed the effectiveness and reliability of the proposed method by comparing it with previous research on mammogram mass recognition. In this section, we present the results of the proposed system, which incorporates oversampling, and compare them with the outcomes of current techniques (refer to Table 3). As Table 3 indicates, the proposed approach surpasses the accuracy of existing methods. Additionally, our improved transfer learning system offers greater portability than other models.
Table 3 also compares the top accuracies and the execution times of each classifier, with bolded entries indicating the best performance. The proposed classifier stands out with the highest accuracy of 98.3%, achieved in just 0.03 seconds. The simple CNN classifier has the quickest execution time but the lowest accuracy. LR using the VGG approach performs reasonably well, with a cross-validation [35] accuracy of 84.4%. In contrast, the ResNet50 classifier with hyperparameters has the longest execution time, 4.023 seconds, despite achieving a somewhat higher accuracy of 85%.
In Fig 7, correctly classified images are highlighted with green frames, while incorrectly classified images are marked with red frames. The misclassified images show that certain benign and malignant images share similar textures, potentially because of factors such as high breast density. Nevertheless, the results clearly outperform the other methods. Previous studies that categorized mammography mass lesions used shallow convolutional neural networks, basic neural networks, or a combination of extracted ROI pooling CNN features and hand-crafted descriptors. A noteworthy advantage of CNNs is their end-to-end learning, which improves speed and reduces the need for complex algorithms. The internal components of ROI Pooling CNNs self-optimize, leading to improved performance. Additionally, ROI Pooling CNNs are computationally more efficient than traditional neural networks, requiring fewer parameters and less training time. The depth and design of the model were found to affect its performance significantly in comparison with other methods. Fig 8 shows the results of the ensemble model-driven transfer-based framework applied to an image of a large lesion that is identified as cancer.
IV. CONCLUSION
This study addresses the challenges of accurately identifying the ROI in breast cancer diagnosis. The complexity of pre-processing, feature extraction, and segmentation in conventional machine learning methods often reduces efficiency and accuracy. To narrow the variability gap among observers, the research introduces an improved deep-learning approach that combines generation and detection of pseudo-color images with ROI Pooling CNN-based segmentation. The output of this segmentation process is then fed into an ensemble of models, namely EfficientNet, ResNet101, and VGG19, for image classification. The proposed technique removes the need for humans to identify and categorize cancerous breast images. During the feature selection procedure, data augmentation approaches are used to increase the models' robustness. By implementing and simulating the proposed segmentation and classification algorithms, the study aims to reduce the frequency of incorrect diagnoses and improve classification accuracy. This approach can provide valuable second opinions for pathologists and contribute to early disease identification. With an accuracy of 98.3%, the findings indicate the effectiveness of the proposed methodology. It outperforms the individual pre-trained models EfficientNet, ResNet101, VGG16, and VGG19 by 2.3%, 1.71%, 2.01%, and 1.47%, respectively. These results demonstrate the potential of the proposed deep-learning strategy to improve the validity and precision of breast cancer evaluation, assisting in the early and precise identification of the disease. The research could be extended by incorporating relevant clinical data, such as patient demographics, medical history, or genetic information, into the classification system, which could provide additional context and potentially improve accuracy.
Combining clinical and imaging information may make it possible to develop a more thorough and individualized approach to identifying and treating breast tumors.

Data Availability
No data was used to support this study.

Conflicts of Interests
The author(s) declare(s) that they have no conflicts of interest.

Funding
No funding was received to assist with the preparation of this manuscript.

Ethics Approval and Consent to Participate
Ethical approval and consent to participate were obtained for this research.