Real Night-time Road Sign Detection by the Use of Cascade Object Detector

Variations in perspective, illumination, motion blur, and weather-worn deterioration of signs are all important factors in road-sign identification. The purpose of the current research is to evaluate the effectiveness of an image processing technique in detecting road signs and to find the appropriate range of threshold values for doing so. The efficiency of the cascade object detector in detecting road signs was tested under variations of speed and threshold value. The suggested system uses video data to calculate the number of frames per second and creates an output file that contains the specified targets with their labels for later use in the final process (i.e., the training stage). In the current research, two videos captured several types of traffic signs (40, 60, 80, and cross signs) on Palestine and Al-Rubaie streets in Baghdad city during night time. The practical significance is demonstrated by using the optimal threshold value for more accurate object detection. With increasing threshold values, the results show that the highest precision value, equal to one, occurred for the cross sign with stable behavior, followed by the 80 speed sign (i.e., 1-0.824) and the 60 speed sign (i.e., 1-0.315), respectively, both with positive relationships. The 40 speed sign showed an inverse relationship with increasing threshold values (i.e., 0.471-0.134) until the breakdown case took place, which usually occurred above a threshold value of thirty.


Introduction
With the increase in the number of vehicles on the road, the management of traffic is under increased strain, and hence speed limit signs must warn drivers and pedestrians. Such signs are usually used to regulate traffic and show the state of the road, hence the persistent need for an intelligent system [1,2].
Many methods for detecting and identifying traffic signs have been developed. Major automakers are conducting extensive research on real-time, automatic traffic sign recognition in partnership with universities and other institutes, to incorporate it into so-called "Driver Support Systems" [3]. Signs are recognized using the cascade object detector against a database previously established by training. Any object recognition system involves two key processes: the first is detection, and the second is recognition. In detection, color distinguishes the road signs in each frame. The traffic signs are then classified into groups using a large database created by training on the footage after colour detection [4]. As a result, training is an essential component of any object detection system.
Developing a system that can independently navigate a vehicle is becoming an increasingly attractive topic. For monitoring the environment, the vehicle is outfitted with sensors such as radar, laser, GPS, and a camera. The most widely used method for developing such a system is to combine a camera with computer vision technology. In comparison to other sensors, a camera provides a lot of information and is a low-cost instrument [5]. Road signs are sometimes positioned in various orientations and at various heights, and they may be obscured by trees or dust, or fade over time. Moreover, because road signs are lit at varying intensities at different times of the day, traffic sign detection requires a very robust algorithm [6].
In traffic-sign detection, several problems may be involved, such as variations in perspective, illumination, occlusion, motion blur, and weather-worn deterioration of signs [7]. Some of the challenges that drivers confront are time constraints during the day, overcast weather, and poor visibility [8]. Drivers often fail to notice traffic signs, and negligence on the part of drivers is one of the biggest causes of accidents. On many occasions, accidents happen when the weather is bad or when people are intoxicated. Accidents have become a major social problem in recent years, and the need to create an automatic system to cope with traffic congestion is critical [9]. The presented work aims to detect several road signs by adopting a cascade object detector under different speeds at night, in order to investigate the best circumstances for obtaining robust road sign identification. The following sections present the related work on this topic; a general overview of the cascade object detector; the proposed system in detail; the results, which show the performance of the technique in detecting road signs; and finally the conclusion, which summarizes the most important findings of the research.

Literature Review
Many researchers have adopted various techniques for detecting and recognizing road signs. R. A. S. et al. [10] used a basic image processing technique for automatically recognizing two different traffic signs: stop and speed signs. The proposed method detects the location of the sign in the image, and the processing methods include RGB domain thresholding, image dilation, region mapping, and thresholding. The algorithm recorded an accuracy of over 80%.
M. B. Mohammad et al. [11] used a driver assistance system located on board the vehicle to detect traffic signs, alert the driver about the environment ahead, and help prevent possible accidents. They used the Lucy-Richardson filter to preprocess the corrupted frame and to identify and extract potential symbols. They performed eight-connected component analysis and used a multi-class SVM classifier to classify the symbols. The system plays an audio output corresponding to the extraction process, and the extracted objects are finally displayed on a video screen.
A. Chigorin et al. [12] used a system for large-scale automatic traffic sign recognition and mapping. The system was trained on synthetically generated data and did not require labor-intensive labeling of the training data. The authors evaluated the proposed system on Russian traffic signs. Their results showed that using a deep neural network on top of a cascade detector improved the hit rate of the detector by 7% on average in comparison to a cascade trained on dipole features. Their results also showed that color features significantly enhanced the accuracy and speed of the detector, and that training on synthetic data gave better accuracy than training on real data.
F. Nasser et al. [13] proposed a video system for continuous feature description and matching. The system consisted of three steps: image transformation using the Haar filter; detection of interest points using Features from Accelerated Segment Test (FAST) corner detection; and description of the points using Speeded Up Robust Features (SURF). They found that FAST corner detection combined with the SURF descriptor for feature tracking and matching is faster, better, and more efficient than the Scale Invariant Feature Transform (SIFT) descriptor and SURF key-points, and could be considered optimal in terms of matching accuracy. S. J. Shahbaz et al. [14] used a SURF object detector to recognize different samples of road signs in daylight. Results showed that the highest precision occurs in the threshold range (20-25) for all the signs used except for the cross sign, which showed its highest precision at a lower threshold value (i.e., 5).
H. Ayad et al. [15] introduced a modified adaptive segmentation technique, namely the Saliency Cut method, to address the segmentation problem on real-world images. Their proposed method mitigates some of the segmentation problems and outperforms the current segmentation method with an accuracy of 77.368%. It can also be used as a very useful step in enhancing the performance of visual object categorization.
L. Wang et al. [16] addressed one of the most essential functions in intelligent transportation: the Advanced Driver-Assistance System (ADAS). ADAS outperforms traditional modes of transportation in terms of passenger safety. Their results showed that color recognition of traffic cones was extremely accurate, with success rates of 85%, 100%, and 100% for red, blue, and yellow cones, respectively. Additionally, by combining color and depth images, the distance of 90% of the traffic cones was correctly perceived.
S. B. Wali et al. [17] introduced a method that is insensitive to changes in lighting, rotation, translation, and viewing angle and yields a short processing time with a low false-positive rate. The system, which included RGB color segmentation and shape matching as well as a support vector machine (SVM) classifier, yielded good results in terms of accuracy (95.71%), false-positive rate (0.9%), and processing time (0.43 s). The system's accuracy was good and its processing time relatively short, which is useful for identifying traffic signs, particularly on Malaysia's highways.

Cascade Object Detector
The cascade object detector comes with several pretrained classifiers for detecting frontal faces, profile faces, noses, eyes, and the upper body. However, these classifiers are not always sufficient for a particular application [18]. A cascade of classifiers is a degenerate decision tree made up of stages of increasing complexity. The first stage is trained to detect almost all objects of interest (traffic signs); a positive result from it triggers the evaluation of the second-stage classifier, which has also been adjusted to achieve a high detection rate [19]. Each sub-window is subjected to this series of classifiers, and the number of sub-windows decreases drastically after several stages of processing [20]. The detection process thus takes the shape of a degenerate decision tree and is depicted in Figure 1 as a "cascade" detector [21]. A cascade of boosted classifiers is made up of several integrated (nested) layers, each containing a boosted classifier. The cascade works as a single classifier that combines the results of the previous stages.
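The early-rejection behaviour described above can be illustrated with a minimal Python sketch. It is not the detector used in this work: the three stage functions, their thresholds, and the toy "sub-windows" (short lists of pixel intensities) are hypothetical and chosen only to show how most sub-windows are discarded by the cheap early stages.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a sub-window (here, a flat list of intensities in [0, 1]) to a score.
Stage = Callable[[Sequence[float]], float]


def run_cascade(window: Sequence[float],
                stages: List[Stage],
                stage_thresholds: List[float]) -> Tuple[bool, int]:
    """Pass a sub-window through the stages in order.

    Returns (accepted, number_of_stages_evaluated). The window is rejected
    as soon as one stage score falls below that stage's threshold, so later
    (more expensive) stages are never evaluated for it.
    """
    for index, (stage, threshold) in enumerate(zip(stages, stage_thresholds), start=1):
        if stage(window) < threshold:
            return False, index
    return True, len(stages)


# Hypothetical stages of increasing strictness (assumptions for illustration only).
def mean_brightness(w: Sequence[float]) -> float:
    return sum(w) / len(w)


def contrast(w: Sequence[float]) -> float:
    return max(w) - min(w)


def bright_fraction(w: Sequence[float]) -> float:
    return sum(1 for v in w if v > 0.5) / len(w)


stages = [mean_brightness, contrast, bright_fraction]
stage_thresholds = [0.2, 0.3, 0.4]

dark_window = [0.0, 0.1, 0.05, 0.1]   # fails the first stage
flat_window = [0.5, 0.5, 0.5, 0.5]    # passes stage 1, fails stage 2
sign_like_window = [0.9, 0.1, 0.8, 0.7]  # passes every stage

for name, window in [("dark", dark_window),
                     ("flat", flat_window),
                     ("sign-like", sign_like_window)]:
    accepted, evaluated = run_cascade(window, stages, stage_thresholds)
    print(f"{name}: accepted={accepted}, stages evaluated={evaluated}")
```

In a real boosted cascade each stage is itself a trained ensemble rather than a single hand-written feature, but the control flow is the same: only sub-windows that survive every stage are reported as detections.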

Proposed System
For each frame that contains the specified road sign, the first step in the cascade technique is to label the road sign manually with a rectangular box using an image labeller, after dividing the captured video into frames. The labeling process is presented in Figure 2. The results are saved in a .mat file (named A, for example) for use in the next part, the training stage, where the saved file (A) is loaded so that the cascade approach can carry out the training process. The training process consists of twenty steps and produces an XML file, which is saved in a specified folder (B) for later use in the differentiation process. The resulting detector can also be applied to additional videos that were not included in the training and marking processes. Finally, the images are identified as real or false detected targets (dit). The cascade object detector is summarized by the block diagram shown in Figure 3. The sequential steps are given in the following algorithm: Algorithm I: Cascade object detector in detecting road signs.

Output: new folder (result) that includes frames (images) extracted from the reading frames of an input video (vid).
Occasionally, the computer does not identify the signs correctly, so a false sign may be detected, as in the example presented in Figure 4. This problem can be reduced by optionally raising the specified threshold value.
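The text does not state how the threshold is applied internally. One plausible reading, stated here as an assumption, is that it behaves like a merge threshold: a final detection is declared only where at least that many overlapping raw detections are found, so isolated false hits are suppressed as the threshold rises. The sketch below illustrates this idea in Python; the function names, the overlap criterion (`min_iou`), and the box coordinates are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_detections(boxes: List[Box], merge_threshold: int,
                     min_iou: float = 0.5) -> List[Box]:
    """Group overlapping raw detections; keep groups with enough members.

    Each box joins the first group whose seed box it overlaps by at least
    min_iou. A group yields one averaged box only if it contains at least
    merge_threshold raw detections.
    """
    groups: List[List[Box]] = []
    for box in boxes:
        for group in groups:
            if iou(box, group[0]) >= min_iou:
                group.append(box)
                break
        else:
            groups.append([box])

    merged: List[Box] = []
    for group in groups:
        if len(group) >= merge_threshold:
            n = len(group)
            merged.append(tuple(sum(b[i] for b in group) / n for i in range(4)))
    return merged


# Hypothetical raw detections: six hits around a real sign, two on background clutter.
sign_cluster = [(100 + d, 50 + d, 40, 40) for d in range(6)]
clutter = [(300, 200, 30, 30), (302, 201, 30, 30)]
raw = sign_cluster + clutter

low = merge_detections(raw, merge_threshold=2)    # sign and clutter both reported
mid = merge_detections(raw, merge_threshold=5)    # only the sign survives
high = merge_detections(raw, merge_threshold=7)   # nothing survives (breakdown)
print(len(low), len(mid), len(high))
```

Under this reading, a threshold that is too low lets false signs through, while one that is too high removes true detections as well, which matches the trade-off described above.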

Tools and Dataset Acquisition
In the current work, the video was recorded with an iPhone 12 Max mobile camera with a 12 MP resolution. Two videos, each no longer than five seconds, captured several types of traffic signs. The road signs are four different traffic signs (40, 60, 80, and cross), as shown in Figure 5. The signs were located at different positions with various orientations in Baghdad city (Palestine and Al-Rubaie Streets). The videos were shot at night-time under different speeds at a height of 1.5-1.6 m above the ground, as shown in Figure 6. A laptop was used with the following specifications: Core i7-8550U, RAM: 8 GB, 64-bit architecture, 1.99 GHz Intel(R) CPU. The software was implemented in MATLAB (R2020a).

Results and Discussions
To measure cascade performance, the precision estimator can be calculated from the following equation [24]:

$$P = \frac{TP}{TP + FP}$$

where TP and FP are the correctly and incorrectly classified positive instances, respectively. Figure 7 shows the result of target detection using the cascade technique under various threshold values (5-35) for the 40, 60, 80, and cross signs, respectively, where the first column represents the detected image, the second column is the labeling stage, and the third column shows the final extracted detected sign. Table 1 summarizes the variation of the estimators (i.e., dit, TP, FP, and P) with increasing threshold values, where th, dit, TP, FP, and P represent the threshold value, total detected instances, true positives, false positives, and precision value, respectively. After several trials, threshold values in the range of 5-35 were adopted. Increasing the threshold value eventually restricted the cascade's performance. With increasing threshold values, the number of detected instances sometimes decreases, as seen for speed signs 60 and 80 and the cross sign, and sometimes remains constant, as for speed sign 40. Figure 8 shows the variation of the parameters with increasing threshold values (i.e., 5-35). As the threshold value increases, the value of TP varies differently for each of the signs used. TP showed stable behavior over the threshold range (5-25) for speed sign 40, as shown in Figure (8a). For speed sign 60, the maximum TP value was recorded at a threshold value of 15, as shown in Figure (8b). On the other hand, the largest TP values occurred at the lowest threshold values for speed sign 80 and the cross sign, as in Figures (8c) and (8d), respectively. Across all signs, the largest values of the detection parameter occurred for speed sign 80, followed by speed sign 40, the cross sign, and finally speed sign 60.
In contrast to the previous relationships, the FP curves behave oppositely to the TP curves and follow approximately linear relations; the FP values closest to zero were obtained for the cross sign, followed by speed sign 80. In the precision-threshold relationship shown in Figure 9, a stable state with the highest value was recorded for the cross sign, followed by positive relationships with increasing threshold values for speed signs 80 and 60. Due to its multi-angle shape, speed sign 40 showed an inverse relationship with increasing threshold value, reaching its breakdown state at a threshold value greater than thirty.
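As a small worked illustration of the precision estimator and of how a threshold sweep such as the one above can be summarized, the following Python sketch computes P = TP / (TP + FP) for each threshold and picks the threshold with the highest precision. The counts are hypothetical and are not the values of Table 1; returning 0.0 when nothing is detected is also an assumption made here to represent a breakdown case.

```python
def precision(tp: int, fp: int) -> float:
    """Precision P = TP / (TP + FP).

    Returns 0.0 when there are no detections at all (TP + FP == 0);
    this convention is an assumption of the sketch.
    """
    detected = tp + fp
    if detected == 0:
        return 0.0
    return tp / detected


# Hypothetical (TP, FP) counts per threshold value; NOT taken from Table 1.
hypothetical_counts = {
    5: (8, 8),
    15: (8, 4),
    25: (6, 2),
    35: (3, 0),
}

precision_by_threshold = {
    th: precision(tp, fp) for th, (tp, fp) in hypothetical_counts.items()
}

# Threshold with the highest precision in this hypothetical sweep.
best_threshold = max(precision_by_threshold, key=precision_by_threshold.get)

for th, p in precision_by_threshold.items():
    print(f"threshold={th:2d}  precision={p:.3f}")
print("best threshold:", best_threshold)
```

Precision alone rewards thresholds that suppress false positives even when few true signs remain, so the TP counts should be read alongside it when choosing an operating threshold.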

Conclusion
The proposed system involved calculating the number of frames per second from video data and producing an output file containing the specified targets along with their labels for use later in the process (i.e., the training stage). The important challenge of automatic night-time detection of traffic signs has been addressed for the signs studied. With increasing threshold values, the technique detects road signs with high precision for the cross sign, followed by speed signs 80 and 60, respectively, with positive relationships. Owing to its shape, speed sign 40 showed an inverse relationship with increasing threshold values. Studying alternative strategies for object detection and comparing them is a promising avenue for future research.

Compliance with Ethical Standards
The authors declare no conflicts of interest. The research did not receive any specific funding but was performed as part of the employment of the authors.