Automated Deception Detection Systems, a Review

Humans use deception daily, since it can significantly affect their lives and offer a way out of undesired situations. Deception is either low-stakes (e.g., innocuous) or high-stakes (e.g., with harmful consequences). The importance of deception investigation has grown over the years, and it has become a critical issue as security levels have increased around the globe. Technology has made remarkable achievements in many fields of human life, including deception detection. Automated deception detection systems (DDSs) are widely used in different fields, especially for security purposes. A DDS comprises multiple stages, each of which should be built and trained to perform intelligently so that the whole system can correctly decide whether the person involved is telling the truth. Thus, researchers have utilized different artificial intelligence (AI) algorithms over the past years. In addition, previous works have considered different cues for DDS, which are either verbal or non-verbal. This paper reviews the basic methods and the deception detection techniques studied and applied in the field of DDS over the last 10 years, with a comparison of the detection accuracy reached and the number of participants used for system training.


I. Introduction
Deception is defined as concealing the truth from other individuals using face and body gestures [1]. People tend to use deception for many reasons. From a psychological perspective, there are two types of deception: low-stakes (face-saving) and high-stakes (malicious deception). Low-stakes deception is related to human social life and does not need to be detected, whereas high-stakes deception must be detected because it is malicious. For example, interviewing is necessary to determine whether a suspect is guilty or innocent [2]. Many studies have been conducted to detect the second type. Moreover, a person who lies carries a higher cognitive load than an innocent person, because deception requires thinking and imagination before answering any question [3][4][5][6][7][8]. Recently, DDS has been widely used in different applications, such as security, hiring new employees for business, criminal investigation, law enforcement, and terrorism detection [9].
The earliest implementation of DDS was the polygraph test, commonly referred to as the lie detector, which screens suspected persons by measuring different physiological cues, such as blood pressure, pulse rate, brain activity, respiration, and skin color change [10][11][12]. The polygraph test has several drawbacks, such as requiring a high level of training and being invasive, since it needs physical contact with the participant's body. It is also prone to a high error rate, with false positives for stressed innocent participants and false negatives when guilty participants control their emotions [13][14][15][16][17]. These problems prompted the use of other methods, yielding more reliable and non-invasive techniques, such as visual feature extraction from the suspect's face and body. Deception features can be classified as either verbal or non-verbal, and each type contains specific categories. Verbal cues are extracted from voice analysis, while non-verbal cues are extracted from various physical measures, including full body motion, head movement, facial expressions, eye gaze, pupil dilation, and eye blinking [18,19]. Figure-1 shows the classification of deception detection features. The next two sections discuss verbal and non-verbal features.

II. Verbal Features
Voice tone can directly reveal the internal intent of participants and help determine whether the subject is deceptive. The tone changes in one of two ways: it rises when a person becomes angry or excited, and it lowers in sadness and shame. When a suspect talks, the voice tone differs depending on whether the person is under stress. Thus, voice can be used as a non-invasive technique for DDS. Voice tone is considered a verbal feature from which researchers can determine the deception state of participants [20]. A voice analysis-based DDS study [5] measured the mean fundamental frequency (F0) and the formant frequencies (F1, F2). It concluded that when a person is under stress due to deception, the F0 value increases for all participants. The values of F1 and F2 also increase for some participants, but not all. Figure-2 shows the mean F0 in the normal (baseline) and stressed states for 12 participants.
Another study was designed to investigate deception using the human voice [21]. The database used was available online, and the collected video clips were taken from real-world situations. The designed algorithm consisted of several steps. First, the extracted speech segments are passed to a normalization process, after which a Hamming window is applied to each speech signal. Then, the Discrete Wavelet Transform (DWT) is used to obtain time-frequency features of the selected speech signal. A reduction process is performed on the collected features, which includes the calculation of signal energy, entropy, skewness, kurtosis, and standard deviation. Finally, an Extreme Learning Machine (ELM) is used for classification. The detection accuracy was 91.66%, although the system was tested on only 24 speech samples.
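The feature-extraction steps of the voice-based pipeline in [21] (normalization, Hamming window, DWT, statistical reduction) can be illustrated with a minimal sketch. It is not the implementation of [21]: the peak normalization, the one-level Haar wavelet, and the entropy definition (Shannon entropy of the normalized coefficient energies) are all assumptions made for illustration, and the ELM classifier is not included.

```python
import math


def normalize(signal):
    """Scale the segment so its largest absolute sample is 1 (assumed scheme)."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else list(signal)


def hamming(signal):
    """Apply a Hamming window to the segment."""
    n = len(signal)
    if n < 2:
        return list(signal)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(signal)]


def haar_dwt(signal):
    """One-level Haar DWT (assumed wavelet); returns (approximation, detail).

    A trailing unpaired sample of an odd-length signal is dropped.
    """
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    root2 = math.sqrt(2.0)
    approx = [(a + b) / root2 for a, b in pairs]
    detail = [(a - b) / root2 for a, b in pairs]
    return approx, detail


def band_statistics(coeffs):
    """Reduce one sub-band to energy, entropy, skewness, kurtosis and std."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)
    energy = sum(c * c for c in coeffs)
    # Shannon entropy of the normalized energy distribution (assumed definition).
    probs = [c * c / energy for c in coeffs] if energy > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    # Population skewness and (non-excess) kurtosis; zero for a constant band.
    skew = sum((c - mean) ** 3 for c in coeffs) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((c - mean) ** 4 for c in coeffs) / (n * std ** 4) if std > 0 else 0.0
    return {"energy": energy, "entropy": entropy, "skewness": skew,
            "kurtosis": kurt, "std": std}


def extract_features(segment):
    """Normalize, window, transform, then summarize each sub-band."""
    approx, detail = haar_dwt(hamming(normalize(segment)))
    return {"approx": band_statistics(approx), "detail": band_statistics(detail)}
```

Calling `extract_features` on a list of speech samples returns ten statistics (five per sub-band) that could be fed to any classifier. A real system would more likely use a multi-level decomposition from a wavelet library.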

III. Non-verbal Features
Non-verbal features are more often considered for DDS because of their efficiency and high detection accuracy. These features are listed in Figure-1. Some of them, including eye blinking, head movements, and facial expressions, are discussed in detail below, since they are the most commonly used and have recently attracted the most attention in research. The psychological theories behind non-verbal cue-based DDS techniques were proposed in 1850 [22], in 1851 [23], and in 1872 [24]. However, Darwin's theory was not tested until recently [25], after several experiments were performed. The team reported that the facial expressions produced when concealing emotions differ completely from those of persons who are not concealing anything [25]. Another study found that emotional leakage can happen anywhere on the human face. All of the above-mentioned works utilized the Facial Action Coding System (FACS), which was developed in an earlier work [26]. FACS is a comprehensive system that distinguishes seven classes of emotions, namely anger, surprise, fear, sadness, happiness, disgust, and contempt. It categorizes all visual facial activities into 44 unique Action Units (AUs), each of which is related to specific facial muscles. These AUs, taken either singly or in combination, are also referred to as emotion-specified facial expressions. For example, representing the happy state requires activating both AU 6 and AU 12 [6,27,28,29]. The non-verbal features are as follows:

1) Full Body Motion
Full body motion means tracking the motion of all parts of the human body. Several techniques are used to detect and recognize full body motion. The first depends on silhouettes, without more detailed appearance information. The second depends on Histograms of Oriented Gradients (HOG), while the third uses deep learning [18].

2) Eye Gaze
This is another non-verbal technique, based on identifying eye gaze direction. The signal sent from the human eye is considered a rich source of information because it directly reveals the mental process. Gaze direction estimation is used to infer states such as feeling, imagining, remembering a past event, lying, and conducting an internal dialogue. The gaze direction can therefore give an indication of the mental process, which helps to detect whether the person is innocent or guilty. Both eye motion and the estimation of gaze direction are non-verbal cues used in DDS [18,30,31].

3) Head Movement
Other cues for deception detection are based on analysing head movement and position. When suspected persons try to deceive, they tend to move their heads in irregular patterns or in different directions because of the higher cognitive load. Innocent persons, by contrast, move their heads regularly or in a specific direction, or keep them still during the interview, because they carry less cognitive load. Many algorithms are used for determining head position. Most of them depend on the holistic approach, which can use the displacement of either a Region of Interest (ROI), that is, a part of the face such as the eyes or mouth, or the whole head. The major advantage of this approach is that it provides a complete picture, since it builds on a local approach using the ROI, which yields more comprehensive information [7].
A technique for head movement detection was developed in [7]. The algorithm consists of several steps. The first step captures the first frame, transforms it from a coloured (RGB) image into a grayscale image, and then performs face detection using the Viola-Jones algorithm. The second step selects, from the detected face image, a local ROI with little or no movement, to be used for optimal head motion estimation. The third step applies a convex hull function to determine the centroid and then determines the reference points. When the next frame arrives, the centroid for this frame is computed to determine the output. In Figure-3, the red point represents the centroid of the first frame, which is taken as the reference, while the yellow point is the centroid of the current frame, which is used to compute the current position of the head. Finally, the blue points mark the reference points used for computing the centroid of the current frame. This study was performed on ten participants, with a detection accuracy of 58.25%. Another study [32] focused on deception detection based on blob analysis. The technique analysed the movement of both the head and the hands and depended on identifying skin colour [32].
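The centroid-tracking idea behind the head movement technique can be sketched as follows. This is only an illustration under stated assumptions, not the code of [7]: the reference points are assumed to be already available as (x, y) pixel tuples, the hull is built with Andrew's monotone chain algorithm, the centroid is taken as the area centroid of the hull polygon, and the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_centroid(points):
    """Area centroid of the convex hull of a non-empty set of reference points."""
    hull = convex_hull(points)
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1]):
        w = x0 * y1 - x1 * y0          # shoelace term for this edge
        area2 += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    if area2 == 0:                     # degenerate hull: use the vertex mean
        return (sum(x for x, _ in hull) / len(hull),
                sum(y for _, y in hull) / len(hull))
    return (cx / (3 * area2), cy / (3 * area2))


def head_displacement(reference_points, current_points):
    """Shift (dx, dy) and distance of the current centroid from the reference."""
    rx, ry = hull_centroid(reference_points)
    cx, cy = hull_centroid(current_points)
    dx, dy = cx - rx, cy - ry
    return dx, dy, math.hypot(dx, dy)
```

For each new frame, `head_displacement` compares the centroid of the current reference points with that of the first frame. How the resulting displacement sequence is turned into a deceptive or truthful decision is not specified here.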

4) Pupil Dilation
When the pupil of the eye dilates, it becomes bigger than normal. The size of the pupil is affected by two factors: the muscles in the coloured part of the eye (the iris) and the amount of light directed at the eye.

5) Eye Blinking
Eye blinking count is one of the most common non-verbal cues for deception detection. It is the number of times the human eye blinks, and it is usually used together with eye blinking duration to distinguish lying from telling the truth. One related study [8] showed that blinking count and duration increased during deception. An algorithm was designed for detecting blinking using AU 45. The algorithm starts by capturing a sequence of images and then performs landmark detection on these images, as shown in the accompanying figure. In that study [8], the distances between the landmarks are used to detect whether the eye is open or closed. For instance, the distance for eye opening can be determined from the left eye lower (eye LL), left eye upper (eye LU), right eye upper (eye RU), and right eye lower (eye RL) points, as in Equation (1).
For blinking duration calculation, the study emphasized the necessity of using a high-speed camera with a specific resolution in order to calculate the time required for one frame. This time is multiplied by the total number of effective frames collected during the participant's interview. The following formula calculates the eye blinking duration [8]:
Blinking duration = number of frames × time for one frame    (2)
Another study, performed by Elkins, employed blinking rate to identify deception. The results of this study on 176 subjects showed that the blinking rate increased during deception, because the subject carried a cognitive load while thinking about how to answer the question during the interview. The detection accuracy of this study was 93% [33].
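The blink counting and duration calculation described above can be illustrated with a short sketch. Several parts are assumptions rather than the method of [8]: the eye opening is taken as the mean Euclidean distance between the upper and lower eyelid landmarks of both eyes, a frame counts as "closed" when that opening falls below a hypothetical `closed_threshold`, and the "effective frames" of Equation (2) are interpreted as the closed-eye frames.

```python
import math


def eye_opening(eye_lu, eye_ll, eye_ru, eye_rl):
    """Mean upper-to-lower eyelid distance over both eyes (assumed measure)."""
    left = math.dist(eye_lu, eye_ll)
    right = math.dist(eye_ru, eye_rl)
    return (left + right) / 2.0


def blink_statistics(openings, fps, closed_threshold):
    """Count blinks and total closed-eye time from per-frame eye openings.

    closed_threshold is a hypothetical tuning value: frames whose opening
    falls below it are treated as closed.
    """
    frame_time = 1.0 / fps             # time for one frame, in seconds
    blink_count = 0
    closed_frames = 0
    was_closed = False
    for opening in openings:
        is_closed = opening < closed_threshold
        if is_closed:
            closed_frames += 1
            if not was_closed:
                blink_count += 1       # a new run of closed frames starts a blink
        was_closed = is_closed
    # Equation (2): blinking duration = number of frames x time for one frame
    return blink_count, closed_frames * frame_time
```

With a high-speed camera the frame time is small, so each blink spans several frames and the duration estimate becomes finer, which is consistent with the camera requirement stated in [8].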

6) Facial Expressions
The human face is considered a rich source of emotional expression. Each facial muscle is responsible for a specific emotion, and these muscles are encoded into AUs according to FACS.
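As a small illustration of how AU combinations map to emotions, the sketch below checks whether the AUs detected in a frame contain a complete emotion prototype. Only the happiness entry (AU 6 together with AU 12) is taken from Section III. The table name and function name are hypothetical, and the remaining emotions are deliberately left out rather than guessed.

```python
# Emotion prototypes as sets of Action Units. Only the happiness entry
# (AU 6 + AU 12) comes from the text; entries for the other six emotions
# would have to be filled in from the FACS documentation.
EMOTION_PROTOTYPES = {
    "happiness": frozenset({6, 12}),
}


def matching_emotions(active_aus, prototypes=EMOTION_PROTOTYPES):
    """Return emotions whose full AU combination is active in this frame."""
    active = set(active_aus)
    return sorted(name for name, aus in prototypes.items() if aus <= active)
```

For example, a frame in which AUs 6, 12, and 45 are active matches the happiness prototype, while a frame with AU 12 alone does not.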
Facial expressions are the most popular and most reliable cues for DDS. Each expression can be described by its related AU, where each AU is related to a single facial muscle or a combination of facial muscles. The AUs are encoded based on FACS to design a DDS that can distinguish innocent from guilty participants. A previous work [34] presented a DDS that consists of three stages. The first stage is video recording and dataset collection, in which each participant was asked several questions and gave either truthful or deceptive answers. The second stage is feature extraction in the form of AUs. Eight AUs serve as indicators of deception. Table-1 lists the selected features for the proposed DDS. The study was performed on 43 participants, and the recorded videos were used for training and testing the system. The detection accuracy was 84%. Another research team [35] designed an automatic deception detection system that depends on facial clues. They detected specific AUs and used them as indicators of deception. These AUs are AU1, AU2, AU4, AU12, AU15, and AU45. Table-2 shows the facial expression for which each AU is responsible in that study. The detection accuracy of this technique was 76.92%. Finally, the differences between verbal and non-verbal features are summarized in Table-3.

Table 3-The main difference between verbal and non-verbal features
Verbal Features | Non-Verbal Features
Using direct communication between participants and interviewer | Using non-direct communication between participants and interviewer
Speech signal is considered as the only feature used for direct communication | Include different types of cues like facial expressions, eye gaze, pupil dilation, head movements, eye blinking and full body motion
Easy to analyze | Difficult to detect and analyze
Less popular and considered less efficient compared with non-verbal features, because they achieve less detection accuracy | More popular and high efficiency because they achieve high detection accuracy

IV. Discussion of the Used Deception Detection Techniques
A DDS mainly consists of three stages: the video capturing and pre-processing stage, the feature extraction stage, and finally the classification stage. Within these stages, each research work applied different deception detection techniques. The techniques used over the last decade are listed in Table-4, together with the number of participants, the details of the features, and the detection accuracy of each work. The table provides a broad view of recent DDS works and helps identify their pros and cons, so that a decision can be made on the most efficient techniques [34].
Analysing Table-4 with respect to accuracy shows that the DDS technique based on extracting facial micro-expressions, used by [36], scored the highest accuracy of 85%. However, the number of participants was only 4, which reduced the load on the classification process. The works of [34] and [37] share the second highest accuracy of 84%. The work in [34] relied on facial expressions, specifically AUs, as the base technique for deception detection, while that in [37] measured temperature change in the nose area. In [34], 43 participants were tested, while in [37] only 11 participants were tested. Accordingly, the work of [34] is considered to present the best-performing deception detection technique, since it achieved high accuracy with a relatively high number of participants. Other researchers used other DDS techniques, achieving accuracies between 70% and 80%. The techniques used were facial expressions, facial micro-expressions, thermal imaging, and measuring brain activity. The highest accuracy among them was 79.2%, obtained by [40] using a thermal imaging technique and testing 27 participants. An accuracy of 70.26% with 270 participants was obtained by [43], which relied on facial expressions and used a Mafia game database collected from the web. The lowest accuracy of 58.25% was obtained by [7], which detected head movement on 10 participants. Finally, other works that did not report accuracy cannot be discussed and compared here.
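The selection reasoning above, which favours high accuracy only when it is backed by a reasonably large number of participants, can be made explicit with a short sketch. The accuracy and participant figures are those quoted in this section. The cut-off `min_participants` is a hypothetical parameter introduced for illustration; the review itself does not state a numeric threshold.

```python
# (reference, accuracy in %, participants) as quoted in the discussion above.
REPORTED = [
    ("[36]", 85.0, 4),
    ("[34]", 84.0, 43),
    ("[37]", 84.0, 11),
    ("[40]", 79.2, 27),
    ("[43]", 70.26, 270),
    ("[7]", 58.25, 10),
]


def best_work(results, min_participants):
    """Highest-accuracy work among those with at least min_participants subjects."""
    eligible = [r for r in results if r[2] >= min_participants]
    if not eligible:
        return None
    # Ties on accuracy are broken in favour of the larger sample.
    return max(eligible, key=lambda r: (r[1], r[2]))
```

With no restriction on sample size, the top entry is [36]. Requiring, for instance, at least 30 participants leaves [34] and [43], of which [34] has the higher accuracy, matching the conclusion drawn in the text.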

V. Conclusions
This paper presented an overview of the automated deception detection systems that are used in different applications, such as security, hiring new employees for business, criminal investigation, law enforcement, and terrorism detection. Different types of cues are used for deception detection, and these cues or signals fall into one of two categories, either verbal or non-verbal. The non-verbal features are more likely to be used than the verbal ones, due to their simplicity and high detection accuracy. Further details of these features were given in this paper. The deception detection techniques introduced by research works over the past decade were listed, including the accuracy levels, the number of participants, and the type of features used. The results of these works were analysed in detail following Table-4, and accordingly the work in [34] was found to offer the best DDS technique, due to its high accuracy level and relatively high number of participants.