Development and Evaluation of a Computer Vision System for Robot Navigation and Object Recognition in Real-World Environments

Abstract: This article presents a computer vision framework that comprises image recognition, classification, prioritization, and navigation control modules. Within this framework, a reactive model is used to implement the robot controllers, an approach that performs well in dynamic contexts. The vision module uses multilayer perceptron (MLP) neural networks for color segmentation, image segmentation, and object recognition, while the control module uses Position-Based Visual Servoing (PBVS) together with the behaviors Avoid Collision(), Go-Ahead(), and Follow() to control the motion of the robot. The system was tested successfully on an ActivMedia Robotics Pioneer I robot in a real environment. The results show that the system provides effective guidance and obstacle avoidance. The study also examines the use of artificial neural networks for image recognition and classification, and the use of SpCoMapping to add language information to maps. In summary, the study emphasizes the potential of computer vision and neural networks to improve robotic communication and language learning.


I. INTRODUCTION
Robotics is a multidisciplinary domain that encompasses the creation, assembly, functioning, and use of robots [1]. It spans several disciplines, each specializing in certain facets of the technology. Within mechanical engineering, the word robotics pertains to the creation of the physical components of robots. In contrast, within computer science, robotics mostly involves the study of robotic software. Several other fields also intersect with robotic development, such as electrical, control, software, information, electronic, telecommunications, computer, mechatronic, materials, and biomedical engineering. The objective of robotics is to develop devices capable of aiding and supporting humans.
The discipline of robotics focuses on the creation of robots capable of automating activities and performing a wide range of tasks, some of which may surpass human capabilities. Robots are employed in various scenarios and serve diverse functions. Presently, they are particularly used in hazardous settings, such as inspecting radioactive substances and detecting and neutralizing bombs, as well as in manufacturing processes [2]. Robots are also deployed in environments where human presence is not feasible, such as outer space, underwater, high-temperature conditions, and the cleanup and containment of dangerous materials and radiation. Robots can take many physical forms, and some are specifically designed to closely resemble human beings. This is said to make robots more acceptable when they perform actions typically carried out by humans. Such robots are intended to mimic human activities such as thinking, lifting, speaking, and walking. A significant number of contemporary robots draw inspiration from nature, contributing to the domain of bio-inspired robotics.
Some robots need operator input for operation, whilst others run independently. The idea of developing independent robots may be traced back to ancient times, but significant advances in understanding their capabilities and potential applications did not occur until the 20th century [3]. Historically, many researchers, inventors, engineers, and technicians have believed that robots would eventually be able to imitate human behavior and perform activities in a manner similar to humans. The area of robotics is growing quickly due to ongoing technological advances. Researching, creating, and constructing new robots serves a range of practical uses, including home, commercial, and military applications. Several robots are specifically designed to do tasks that pose significant risks to human beings, such as disarming explosive devices, locating individuals trapped in unsafe structures, and investigating underground mines and sunken vessels. Robotics is also used in STEM education as a pedagogical tool.
Computer vision is a multidisciplinary domain that focuses on enabling computers to gain high-level understanding from digital images or videos. From an engineering standpoint, its objective is to automate tasks that the human visual system can perform. Computer vision focuses on the automated extraction, analysis, and understanding of useful information from a single image or a series of images [4]. We aim to develop a framework that lets devices interpret what they see using an automatic visual recognition system. As a scientific discipline, computer vision studies how artificial systems interpret image-based information. The image data can take many forms, such as video sequences or images from multiple cameras and scanners. In this work, the task involves preprocessing the images, segmenting them, and passing the result to a module that controls navigation. Studies show that when robots must work in environments full of change and motion, a reactive approach works well for their control systems.
The vision component is based on multilayer perceptron (MLP) neural networks, which handle tasks such as color classification, image segmentation, and object recognition. The article also covers system tuning, that is, choosing network designs that allow the neural networks to learn effectively. The results highlight the precision and effectiveness of the vision system and its capacity for practical use in real-life scenarios. In addition, the research examines the use of artificial neural network models in image processing and their potential to enhance language learning. In summary, the research presents a justification for creating and using a computer vision system to enable robots to navigate and avoid obstacles.
The rest of the article is organized as follows: Section II discusses related work on computer vision systems for robot navigation and object recognition in real-world environments. Section III focuses on the computer vision system, discussing elements such as the robotic hardware, the Saphira environment, the interface, the vision module, and the navigation control module. Section IV presents a critical analysis of the results, covering system tuning and the related experiments. Lastly, Section V presents the conclusions on the system used for robotic navigation and object recognition.

II. RELATED WORKS
Human-robot interaction is a contemporary field of study that explores the use of robots with vision systems in production, opening up new possibilities [5]. By prioritizing the gesture modality and image processing capacity, an industrial robot can immediately interpret visual input and initiate an action. The majority of industrial robotic systems are focused on the duties and objectives particular to a certain context. The primary objective of the robotic program is to investigate the interface to a tele-laboratory, which is specifically designed to provide the user with several capabilities for the robot. These capabilities include cameras, a control framework, and different representation methods such as augmented and virtual reality. When the application is introduced into an industrial setting, the interaction between humans and robots may become difficult. However, this issue has become easier to manage in recent times thanks to advances in vision systems.
In the manufacturing environment, robots with vision systems can avoid problems more easily. Recent innovations have resulted in robots with vision that can independently find clear paths in industrial areas. Such independence requires these machines to travel without human assistance. Depending on the program, vision technologies fall into two categories: event-based or object-specific. Both equip the robots with cameras and sensors that capture the footage their tasks require. 3D imaging technology improves the ability to identify features of interest while also identifying potential obstacles, including humans. Object handling strategies, such as object selection functions, depend directly on the locations and visibility of objects in the environment. The added value of 3D vision to a robot's capabilities is high in such workplaces. This improvement comes from advanced hardware combined with sophisticated software.
Using 3D vision, robots acquire refined skills that allow them to perform precise automated tasks. Accuracy in these areas therefore increases as such systems continue to improve. The OPC architecture is a safe and reliable communication protocol designed for manufacturers, capable of connecting the smallest sensor all the way up to the corporate IT level and the cloud. OPC Vision is a specialized image processing specification built for use on factory floors and general industrial platforms. Its aim is to integrate any image processing component into industrial automation applications in order to develop machine vision technologies capable of interacting with the whole plant. At the user level, the image processing system offers a linguistic representation of visual data.
As presented by Quigley [6], in manufacturing inspection tasks, image processing systems interact with programmable logic controllers (PLCs). In this approach, a pass or fail outcome is transmitted to the PLC after image analysis. OPC Vision implements standardized communication methods. An ERP system can identify the attributes of a frame grabber or retrieve image processing system streams via events for clients. An industrial automation application requires data and code to allocate tasks in the cloud. The cloud serves as a comprehensive middleware solution used with the Robot Operating System (ROS) to facilitate communication for robots. The cloud-based robotic framework is specifically designed for high-bandwidth robotic applications in industry, enabling the outsourcing of vision-enabled tasks.
This study presents a computer vision system consisting of two primary parts. Computer vision is a branch of artificial intelligence (AI) that enables systems and computers to extract meaningful information from videos, digital images, and other visual inputs. This information may then be used to initiate actions or provide recommendations. AI gives computers the ability to reason, whereas computer vision gives them the capacity to see, observe, and understand. According to Cyert and DeGroot [7], computer vision operates similarly to human vision, but humans have a head start. The human visual system benefits from a lifetime of exposure to contextual information, enabling it to discern objects, determine their distance, detect motion, and identify abnormalities within an image. Computer vision enables machines to perform these tasks using algorithms, data, and cameras instead of the visual cortex, optic nerves, and retinas. Moreover, it accomplishes these tasks in a much shorter amount of time. Because it can assess a large number of processes or items per minute, a system trained to monitor manufactured assets may detect faults or problems that are imperceptible to humans.

III. COMPUTER VISION SYSTEM
The computer vision system described in this study consists of two primary elements. The first module is responsible for the preprocessing, segmentation, and recognition of the image. The second module implements the navigation control, which is responsible for directing the robot through its surroundings. An RS232 connection carries data between the laptop and the robot [8]. A USB interface links the camera to the laptop. The proposed system design follows the reactive paradigm, meaning that the robot has no prior knowledge of the environment and does not retain the information it obtains.
The reactive paradigm is very appealing for implementing robot controllers that operate in dynamic, real environments. This follows from the fact that, under this model, reactive behaviors are clearly defined relationships that dictate how the robot should respond to the data gathered from its sensors. Furthermore, the behaviors established in this paradigm may serve as a foundation for developing controllers that follow the hybrid paradigm. In that case, the robot would be capable of more intricate tasks, such as exploring the surroundings by generating a map and navigating around obstacles.

The Robotic Hardware
The trials of the proposed vision system were conducted and evaluated only on an ActivMedia Robotics Pioneer I robot. The robot's basic perception is provided by a set of seven Polaroid 6500 sonars. External communication may be conducted via radio or the RS232 interface. To give the robot visual awareness of its surroundings, the proposed vision system used a Creative WebCam Go Plus camera. The camera was mounted on top of the robot and oriented downwards towards the floor. This arrangement allows the robot to detect visual data up to 4.8 meters in front of its base.

Saphira Environment
The interaction between the Pioneer I robot and the high-level system is handled by the Saphira environment [9]. Saphira comprises several libraries used for the development of robotics applications. The Artificial Intelligence Center at Stanford Research Institute is responsible for its maintenance. The library contains functions that enable users to develop applications in C/C++ for controlling mobile robots such as the Pioneer I. It provides an abstract interface to the robot's hardware.

The Interface
The interface is a simple component that handles the communication between the system and the user. The parameters required for experiment control include the color selected for image segmentation (the color to be tracked), the maximum velocity, the behavior weights, and the translation and rotation angles. The interface also includes connection routines that establish and terminate the connection with the simulator or the Pioneer I robot. The interface allows the user to configure the test's settings and view all the processing data through the images and data produced throughout the experiment.

Vision Module
The term module has several meanings. In computer software, a module is a distinct, self-contained unit of code that may be developed and managed separately for use in other systems. For instance, a developer may design a module that contains the code needed to use a sound card or to perform input/output operations on a certain kind of filesystem. The module may then be distributed and used by any system that requires that feature, and its development can progress independently. This approach is often referred to as modular design. In computer hardware, a module is a self-contained component of a larger and more intricate system. For example, a memory module interfaces with a computer motherboard to function as an integral part of the system. In this work, the vision module is responsible for acquiring and processing data, which is then passed to the control module. The vision module acts as an interpreter that takes an image of the environment and produces a high-level description of that image as its output.
An example of such a description is: there is a red object at location XY. The control module can then generate the required navigation instructions for the robot based on the output of the vision module. The vision module comprises three distinct tasks: preprocessing, image segmentation, and recognition. The first task involves capturing an image from the camera at a resolution of 320×240 pixels and reducing it to a resolution of 80×60 pixels. The reduction is achieved by averaging: a 4×4 pixel window is applied to the acquired image, and each window yields one point whose value is the mean of the 16 pixels inside it. The 80×60 resolution was determined empirically using real images of the environment; it is the minimum resolution the system requires for object recognition. The image segmentation step is performed by a color classification system.
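The 4×4 averaging step described above can be illustrated with a minimal sketch. It assumes a single-channel image stored as a list of rows (for an RGB frame the same function would be applied to each channel); the function name `downsample_by_block_mean` and the synthetic test frame are hypothetical and are not taken from the original system.

```python
def downsample_by_block_mean(image, block=4):
    """Reduce an image by replacing each block x block window with its mean.

    `image` is a list of rows of numbers; height and width must be
    multiples of `block`. (Hypothetical helper, for illustration only.)
    """
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    reduced = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Sum the 16 pixels of the current 4x4 window.
            total = sum(
                image[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        reduced.append(row)
    return reduced


# Synthetic 320x240 frame (240 rows of 320 values) reduced to 80x60.
frame = [[(x + y) % 256 for x in range(320)] for y in range(240)]
small = downsample_by_block_mean(frame)
```

Each output point depends only on its own window, so the reduction costs one pass over the input frame.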
The classification system consists of a set of MLP neural networks (see Fig. 1), where each network is responsible for classifying one color and distinguishing it from the other colors in the image. For example, to find the color red, every pixel of the image is labeled as red or not red. This is done by examining the RGB values of each pixel [10]. The RGB values are fed to an MLP neural network, whose output indicates whether the pixel is red or some other color. This segmentation approach isolates a single color, so the segmented image contains only two colors: white represents the selected color and black represents everything else. The next objective is to recognize the segmented object within the visual field. For this purpose, a second MLP neural network dedicated to recognition is used. The pixels of the segmented image (80×60 points) serve as its input signals [11]. The network then indicates whether an object is present or absent, and, if it is present, its location in the image. This recognition network has three neurons in its output layer.
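The per-pixel color classification described above can be sketched as a forward pass through a small MLP followed by thresholding into a binary image. This is only an illustration: the weights below are hand-set, hypothetical values chosen so that the example favors red, not the weights learned in the study, and the two-output decision rule (compare the "target" and "other" outputs) is an assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-set weights for a 3-3-2 network (NOT the trained values).
HIDDEN_W = [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 8.0]]
HIDDEN_B = [-4.0, -4.0, -4.0]
OUTPUT_W = [[6.0, -6.0, -6.0], [-6.0, 6.0, 6.0]]
OUTPUT_B = [-2.0, 2.0]


def is_target_color(rgb):
    """Classify one pixel: True if the 'target' output beats the 'other' output."""
    inputs = [channel / 255.0 for channel in rgb]
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    target, other = layer(hidden, OUTPUT_W, OUTPUT_B)
    return target > other


def segment(image):
    """Return a binary image: 1 (white) for the target color, 0 (black) otherwise."""
    return [[1 if is_target_color(px) else 0 for px in row] for row in image]


# Tiny 2x2 RGB image: red, blue, yellow, and a slightly off-red pixel.
tiny = [[(255, 0, 0), (0, 0, 255)], [(255, 255, 0), (250, 20, 10)]]
mask = segment(tiny)
```

In a full pipeline, `segment` would be applied to the 80×60 reduced frame, and the resulting binary image would be flattened into the input vector of the recognition network.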
The first and second outputs give the X and Y positions of the object in the image. The third output (P) indicates whether or not the object is present in the image. The X and Y outputs were discretized based on empirical observations from trials conducted in a real setting, where the images were labeled according to the following values: X varies from 0.0 to 1.0 in increments of 0.1, and Y ranges from 0.0 to 1.0 with the same increment. These values describe the object's location in the visual field and do not depend on the real environment.
Nevertheless, testing showed that the Y output values 0.0, 0.4, 0.7, 0.8, 0.85, 0.88, 0.9, and 0.91 could be associated with intervals of 0.5 m in the real environment. A Y output of 0.0 indicates that the object is very close to the robot, a Y output of 0.4 indicates that the object is about 0.5 m from the robot, and each subsequent Y value represents an additional 0.5 m. This method of estimating the object's location in the real environment rests on the assumption that the true 3D position of an object cannot be obtained without stereo vision.
In PBVS, the pose of the robot's end effector is estimated from the calibrated vision sensor's representation of the object by means of a transformation matrix. The extracted features f, obtained from the reconstruction, are used to estimate the pose Pa, which is a function of both the position of the end effector and its orientation θ. The alignment is determined using inverse kinematics, which compute the location of the target object relative to the camera frame. The joint controller is designed to drive the error e, the difference between the actual pose Pa and the reference pose Pd, to zero.
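The calibration between the Y output and real-world distance reported above (0.0, 0.4, 0.7, and so on, each step adding 0.5 m) can be sketched as a lookup. The nearest-value matching rule and the name `estimate_distance_mm` are assumptions made for this illustration; the text does not state how intermediate Y values were handled.

```python
# Y outputs observed in the trials; each successive value is a further 0.5 m.
Y_CALIBRATION = [0.0, 0.4, 0.7, 0.8, 0.85, 0.88, 0.9, 0.91]
STEP_MM = 500.0


def estimate_distance_mm(y_output):
    """Map the network's Y output to an approximate forward distance in mm.

    Assumption: the calibration value nearest to y_output is selected.
    """
    nearest = min(
        range(len(Y_CALIBRATION)),
        key=lambda i: abs(Y_CALIBRATION[i] - y_output),
    )
    return nearest * STEP_MM
```

Because the calibration values crowd together near 0.9, small changes in Y correspond to large changes in distance for far objects, so estimates at long range are inherently coarse.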
Finally, PBVS control computes the angular velocity (ωc) and linear velocity (vc). These values are used with the inverse Jacobian matrix to determine the joint velocities required for the robot's movement. A benefit of PBVS is that the desired task can be formulated in Cartesian space, which makes the error easy to calculate. Vijayan and Ashok [12] conducted a comparative investigation of PBVS and image-based visual servoing (IBVS) for industrial assembly, focusing on accuracy and speed. PBVS was applied to an ABB robot and was found to execute faster than IBVS. PBVS was also evaluated using multiple cameras. Sharma et al. used PBVS to determine the transformation between the camera and the robot base of a mobile robot, employing gradient descent for this estimation. Cheng, Li, Jiao, and An [13] used PBVS to estimate the pose of a robot and navigate around obstacles. The authors used a repulsive torque approach to solve the manipulator's inverse kinematics, yielding results characterized by speed, flexibility, and accuracy.
Using the navigation controller, the robot can move autonomously and randomly around its surroundings while looking for an object of the designated color. Once the robot detects the object, the controller guides the robot towards it. The controller was designed using the potential field approach, with the objective of producing the trajectory that the robot should follow from the data produced by three implemented behaviors: Avoid Collision(), Follow(), and Go-Ahead(). The basic behavior, Go-Ahead(), keeps the robot moving continuously. The Follow() behavior drives the robot towards the target using information obtained from the vision module. The last behavior, Avoid Collision(), ensures that the robot moves in the direction opposite to any obstacles encountered. Each behavior is represented by a 2D vector with a direction and a magnitude. The Go-Ahead() behavior does not take any perception of the environment into account; its only objective is to keep the robot in motion. It is represented by a uniform field, that is, a vector with constant direction and magnitude.
In contrast, the Follow() behavior uses the vision module to guide the robot towards the target. As previously stated, the output of the recognition neural network gives the X and Y coordinates of the object's location in the image. The Y output is then discretized in the real world in steps of 0.5 m. This makes it possible to compute an estimate of the object's real-world coordinates, referred to as X3D and Y3D. The conversion function takes the X and Y outputs of the MLP recognition network and returns the pair (X3D, Y3D) in millimeters, an estimate of the object's location in the real environment. The Follow() behavior can then be executed. It operates as an attractive force, represented by a vector with constant magnitude (Ca) and a rotation angle that gives the robot's orientation towards the target. The direction is derived from the information provided by the vision module.
The Avoid Collision() behavior is responsible for maintaining a safe distance between the robot and any obstacles. Here the sonar signals are taken into consideration. Avoid Collision() calculates its resulting force by summing the seven individual vectors generated by the robot's seven sonars, each representing a repulsion force. The repulsion force is generated by a radial field with exponential decay. The variables of the decay equation are the distance between the obstacle and the robot, the proximity limit, and a constant that governs the rate at which the exponential function decays. The direction of the repulsion force is an angle computed from the x and y coordinates, measured in millimeters, of the reading from each of the seven sonars. The sign is negated to reverse the direction, since the behavior must keep the robot away from the obstacles. The resulting repulsion force is the vector sum of the individual vectors produced by each sonar, as given in Equation 7.
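The repulsion computation described above can be sketched as follows. The exact decay law is not legible in the text, so the form used here (full magnitude inside the proximity limit, exponential decay beyond it) is an assumed one that merely fits the description; the parameter values `limit_mm` and `decay`, and the function names, are hypothetical.

```python
import math


def repulsion_vector(x_mm, y_mm, limit_mm=300.0, decay=0.002):
    """Repulsive force (fx, fy) for one sonar reading in the robot frame.

    Assumed decay law: magnitude 1 within limit_mm, then
    exp(-decay * (distance - limit_mm)). Values are illustrative only.
    """
    distance = math.hypot(x_mm, y_mm)
    if distance <= limit_mm:
        magnitude = 1.0
    else:
        magnitude = math.exp(-decay * (distance - limit_mm))
    # Negating both coordinates points the force away from the obstacle.
    angle = math.atan2(-y_mm, -x_mm)
    return magnitude * math.cos(angle), magnitude * math.sin(angle)


def avoid_collision(sonar_points):
    """Vector sum of the repulsion forces of all sonar readings."""
    vectors = [repulsion_vector(x, y) for x, y in sonar_points]
    return sum(v[0] for v in vectors), sum(v[1] for v in vectors)
```

With this form, an obstacle directly ahead produces a force pointing straight backwards, and obstacles placed symmetrically to the left and right cancel each other's lateral components.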
The resulting force, which defines the robot's trajectory, is determined by combining the forces computed for each behavior.
The weight of each force in the system is denoted by Pi, where i = 1, 2, 3. The values of Pi used in the experiments were P1 = 0.5, P2 = 1.0, and P3 = 0.7. Once the resultant force has been computed by combining the behaviors, the robot is commanded using two values: its translation velocity, which is bounded by the robot's maximum speed, and its direction, which is determined by the arctan function of the components of the resultant force.
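The weighted combination of behaviors described above can be sketched as below. Several details are assumptions made for illustration: the text does not say which weight belongs to which behavior, so the mapping in `WEIGHTS` is a guess; the speed rule (resultant magnitude clipped to 1 and scaled by the maximum speed) and the value of `max_speed_mm_s` are likewise hypothetical.

```python
import math

# Weights reported in the text; the behavior assigned to each is an assumption.
WEIGHTS = {"go_ahead": 0.5, "follow": 1.0, "avoid": 0.7}


def combine(go_ahead, follow, avoid, weights=WEIGHTS):
    """Weighted vector sum of the three behavior outputs, each an (x, y) pair."""
    fx = (weights["go_ahead"] * go_ahead[0]
          + weights["follow"] * follow[0]
          + weights["avoid"] * avoid[0])
    fy = (weights["go_ahead"] * go_ahead[1]
          + weights["follow"] * follow[1]
          + weights["avoid"] * avoid[1])
    return fx, fy


def motion_command(resultant, max_speed_mm_s=200.0):
    """Return (heading in radians, speed in mm/s) for a resultant force.

    Heading uses atan2 of the resultant's components; the speed rule is an
    assumed one, capped at max_speed_mm_s.
    """
    fx, fy = resultant
    heading = math.atan2(fy, fx)
    speed = min(math.hypot(fx, fy), 1.0) * max_speed_mm_s
    return heading, speed


resultant = combine(go_ahead=(1.0, 0.0), follow=(0.0, 1.0), avoid=(0.0, 0.0))
heading, speed = motion_command(resultant)
```

Using `atan2` rather than a plain arctangent of the ratio keeps the heading in the correct quadrant when the resultant points backwards.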

IV. RESULTS AND DISCUSSION
The findings are divided into two parts: the first discusses the system tuning procedure, including parameter design and neural network learning, while the second details the tests conducted with the Pioneer I robot in a real-world setting.

System Tuning
This first part outlines the tests conducted during system tuning and their outcomes. More precisely, we present the results of the learning process of the neural networks responsible for the image segmentation and recognition tasks. Two algorithms, backpropagation [14] and Rprop [15], were used for learning. Two different implementations of the backpropagation method were used: batch mode and on-line mode.

Image Segmentation Neural Network
A database containing 100 patterns was built for training the image segmentation neural network. To train the neural network to identify the color red, a dataset consisting of 50 red pixel patterns and 50 patterns of other colors was generated. This work considered three colors: red, blue, and yellow. The database was split into two sections, 80% for learning and 20% for testing the configured topology, and several experiments were conducted to determine an appropriate neural network topology. These experiments showed that the topology 3×3×2, which denotes a neural network with three input neurons (RGB), three hidden-layer neurons, and two output-layer neurons, was well suited to the red and blue colors, whereas for yellow the topology 3×5×2 produced better results. After an appropriate topology had been identified, tests using 10-fold cross-validation were conducted. Fig. 4 shows the outcomes of the learning process for the red color with the following algorithms: On-Line Backpropagation, Batch Backpropagation, and Rprop (the results were almost the same for the blue and yellow colors).
Fig. 4 reports the number of iterations the neural network required to converge, together with the averages of the mean square error (MSE) and the standard deviation (SD) over the ten runs of the learning process. The findings show that the accuracy of the algorithms was almost identical; the only difference was the number of iterations each learning method used. The processing time required for learning was about equal for every approach. To assess the real-time performance of the image segmentation process, tests were run on a Pentium IV 2.26 GHz machine. The neural network took 0.0125 seconds to segment an 80×60 pixel image for the red and blue colors, and 0.0187 seconds per frame for yellow.
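The 10-fold cross-validation procedure used in the tuning experiments can be illustrated with a short sketch that only generates the index splits. It assumes a shuffled, evenly divided dataset of 100 patterns; the function name `k_fold_indices` and the fixed seed are hypothetical, and the training and evaluation of the network on each split are left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 100 patterns, 10 folds: each run trains on 90 patterns and tests on 10.
splits = list(k_fold_indices(100, k=10))
```

Averaging the MSE over the ten test folds gives the kind of per-algorithm summary that the text reports, with every pattern used for testing exactly once.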

Recognition Neural Network
The use of artificial neural network models in image processing, where parallel architectures and high computing speed are necessary, is becoming more appealing. Neural network applications to problems requiring some level of intelligence or human-like performance have been the subject of several recently published articles. A novel neural network design for image identification and classification is described in [16]. According to Cios and Shin [17], an object's attributes may be estimated or recognized using a proposed neural network known as an image recognition neural network (IRNN). An analog gray-level image is fed into the IRNN, which outputs an appropriate recognition code.
A database of seven hundred patterns was developed in order to aid the MLP in charge of the recognition job during its learning phase.Every pattern is made up of a preprocessed, segmented frame (a binary picture of 80 by 60 pixels) and the label for that frame, which indicates where the item is in the visual field.We also partition the database into two portions, 20% for testing the configured neural network and 80% for the process of learning, in order to determine the topology of this neural network.The three algorithms-On-Line Backpropagation, Batch Backpropagation, and Rprop-were used to determine that the topology 4800X10X3 is suitable for accomplishing this recognition task.The design of this neural

Experimentations
Following system tuning, a number of tests using the Pioneer I robot were conducted to evaluate the performance of the computer vision system (CVS) in a real setting. A Creative WebCam Go Plus and a Pentium III 500 MHz laptop were used to run the CVS. As stated in [18], the purpose of the first experiments was to determine appropriate values for the behavior force weights. These tests were conducted in a simple setting in order to fine-tune the system's ability to track a target and avoid obstacles without colliding. After these first trials, we constructed several scenes in a real setting, with obstacles and a target placed arbitrarily inside them (as shown in Fig. 6). In every trial conducted with the chosen settings, the robot demonstrated its navigational ability by avoiding obstacles and tracking a target, characterized by a distinct color, once it entered the robot's field of vision.
Wolf and Sukhatme [19] conducted an experiment in which they generated a semantic map in a real-world setting. Fig. 8 displays the robot and the environment we used. The laboratory room serves both as the researchers' study area and as living quarters for experiments. In this experiment, we show that SpCoMapping can learn words as place names without these being set in advance, by extracting word information from the given sentences. To demonstrate how many words can be associated with a place without presetting the place names, we used sentences as word features. Each of the 20 sentences, which together contain 50 vocabulary words, was given five times. The RGB information was given approximately 407 times. We used SpCoMapping in this experiment because the number of spatial concepts was not known in advance. The maximum number of spatial concepts was set to 120. The hyperparameters were set to 1.0 × 10^6, 0.6, 100.0, and 4.0. The weight of the vocabulary features was established using tf-idf as mutual information between sentences and words. The following equation was used to determine the weight of each word in a sentence.
$h_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \log \frac{D}{D_i}$ (11)
where $n_{i,j}$ is the number of occurrences of word $i$ in sentence $j$, $D$ is the number of sentences, and $D_i$ is the number of sentences that include word $i$. The tf-idf weighting decreases the weights of words that are used in numerous sentences, such as "is," "here," and "you." This procedure aids the learning of vocabulary associated with places. The proposed approach is stable if the sampled variable is drawn 100 times and the mean of the samples is used as its value. Furthermore, we used word information to facilitate pre-learning via spatial concept formation. Klein et al. [20] computed 1000 iterations while keeping the hyperparameters set to the same values. The authors initialized π and μ in line 1 of Algorithm 1 using the pre-learning result. In line 2 of Algorithm 1, the value is set as follows: if mxt is not an unoccupied area, then the corresponding cell has no spatial concept.
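The tf-idf weighting of Equation (11) can be sketched in a few lines. The sketch assumes the standard form of tf-idf, namely the relative frequency of a word within a sentence multiplied by the logarithm of the number of sentences divided by the number of sentences containing that word; the function name `tfidf_weights` and the example sentences are hypothetical.

```python
import math


def tfidf_weights(sentences):
    """Return one {word: weight} dict per sentence (each a list of words).

    weight = (count of word in sentence / words in sentence)
             * log(number of sentences / sentences containing the word)
    """
    n_sentences = len(sentences)
    # Number of sentences in which each word appears at least once.
    doc_freq = {}
    for words in sentences:
        for word in set(words):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    weights = []
    for words in sentences:
        counts = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
        total = len(words)
        weights.append({
            word: (count / total) * math.log(n_sentences / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


example = [["here", "is", "kitchen"], ["here", "is", "desk"]]
w = tfidf_weights(example)
```

In this example, "here" and "is" occur in every sentence and receive a weight of zero, while "kitchen" and "desk" keep a positive weight, which is the effect the text describes for frequent words.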
For map cells that are not unoccupied, the corresponding value is set to 0 (Eq. 12). The sampling equation is based on the multivariate Gaussian distribution N(µ, Σ), where µpre is the mean vector of the position distribution for the l-th pre-learning category, and Σ denotes the covariance matrix of the position distribution for the l-th pre-learning category. Pre-learning enhances the stability of language acquisition and reduces the number of iterations needed for learning.

V. CONCLUSION
This research introduced a computer vision system comprising preprocessing, segmentation, and image recognition, together with a module for navigation control. The responsive model was found to be well suited to the design of robots operating in a realistic, dynamic environment. The vision module used a multilayer perceptron (MLP) neural network for color segmentation, image segmentation, and object recognition. The navigation control module used position-based visual servoing (PBVS) to guide the robot and incorporated the Go-Ahead(), Follow(), and Avoid Collision() functions to facilitate obstacle avoidance and navigation. The system was deployed on the ActivMedia Robotics Pioneer I robot with a Creative WebCam Go Plus camera. The tuning process included parameter calibration and training of the neural network using the backpropagation and RProp algorithms. The findings indicated that the algorithms exhibited comparable levels of accuracy, differing mainly in the number of iterations used by each learning approach. The real-time performance of the segmentation task was evaluated on a Pentium IV 2.26 GHz machine. The research also examined the use of artificial neural network models in image processing and their capacity to estimate or identify object properties. A further experiment was carried out using SpCoMapping to produce a semantic map in an actual environment. The researchers used sentences as lexical features to illustrate the extent to which words can be linked to a specific area without prior specification of place names.

Fig 1. The Recognition MLP Neural Network.

Navigation Control Module
A control module was created to enable the robot to travel around the environment while avoiding obstacles, using position-based visual servoing (PBVS) for guidance. The PBVS system consists of three fundamental modules: the Feature Extraction module and the Pose Estimation module, which use image processing methods, and the Control module, where conventional or intelligent controllers are implemented, as seen in Fig 2. The pose of the robot's end effector is estimated from the calibrated vision sensor's representation of the object. This estimation is achieved by means of a transformation matrix. The extracted features f, derived from the reconstruction, are used to evaluate the pose Pa. This pose is a function of both the position of the end effector and its orientation θ. The alignment is determined using inverse kinematics, which compute the position of the target object relative to the camera frame. The joint controller is designed to ensure that the error e, which represents the difference between the actual pose Pa and the reference pose Pd, is reduced to zero. Ultimately, PBVS control computes the linear velocity (vc) and the angular velocity (ωc). The inverse Jacobian matrix is then used with these values to determine the joint velocities required for the robot's movement. A benefit of PBVS is that it allows the desired tasks to be formulated in Cartesian space, making the error easy to calculate. In their study, Vijayan and Ashok [12] conducted a comparative investigation of PBVS and IBVS for industrial assembly,

Fig 4. Results of the Image Segmentation Neural Network's Learning Procedure.

Fig 5. Final Result of Segmenting the Image Using the RGB Color Space.

Fig 7. Results of the Recognition Neural Network's Learning Process.
The recognition network has 4800 neurons in the input layer (corresponding to 80×60 binary pixels), 10 neurons in the hidden layer, and 3 neurons in the output layer. A 10-fold cross-validation procedure was applied to this 4800×10×3 neural network. The results of this procedure for the learning algorithms considered are presented in Fig 7. Fig 7 shows that the methods produced almost identical results, with the advantage that the RProp algorithm ran faster than the others.
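The 4800×10×3 layout and the 10-fold partition described above can be illustrated with a minimal Python sketch. Several details are assumptions rather than the paper's choices: the sigmoid activation, the random weight initialization, the helper names, and the sample count of 50 are all hypothetical, and the training step (backpropagation or RProp) is omitted.

```python
import math
import random

INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE = 80 * 60, 10, 3  # 4800 x 10 x 3

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def layer_forward(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid (assumed activation)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(pixels, hidden, output):
    """Forward pass: 4800 binary pixels -> 10 hidden units -> 3 outputs."""
    return layer_forward(layer_forward(pixels, *hidden), *output)

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

rng = random.Random(0)
hidden = init_layer(INPUT_SIZE, HIDDEN_SIZE, rng)
output = init_layer(HIDDEN_SIZE, OUTPUT_SIZE, rng)
image = [rng.randint(0, 1) for _ in range(INPUT_SIZE)]  # hypothetical 80x60 binary image
scores = mlp_forward(image, hidden, output)             # three output activations
splits = list(k_fold_indices(50, k=10))                 # 50 is a hypothetical sample count
```

In a full cross-validation run, the network would be trained on each `train_idx` set and evaluated on the matching `test_idx` set, and the ten results would then be averaged.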

Fig 8. The robot and an example of trials conducted in an actual setting.

The robot used for research in a person's daily living environment was the Human Support Robot (HSR). We used the HSR developed by TOYOTA, which is equipped with an Xtion PRO LIVE sensor manufactured by ASUS for capturing RGB information and a UST-20LX sensor. Fig 9 displays the occupancy grid map of the user's living environment and the pre-learning result. The occupancy grid map has 19,255 pixels that are categorized as unoccupied space.