Effects of Using Static Methods with Contourlet Transformation on Speech Compression

Compression of speech signals is an essential field in signal processing. Speech compression is very important in today’s world because of limited transmission bandwidth and storage capacity. This paper explores a Contourlet transformation based methodology for the compression of speech signals. In this methodology, the speech signal is analysed using Contourlet transformation coefficients, with statistical methods supplying the threshold values, namely Interquartile Filter (IQR), Average Absolute Deviation (AAD), Median Absolute Deviation (MAD) and standard deviation (STD), followed by the application of run length encoding (RLE). The methods are applied to speech recordings of different durations (5, 30, and 120 seconds). A comparative study of the performance of the different methods is made in terms of Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Normalized Root Mean Square Error (NRMSE), Normalized Cross-Correlation (NCC), and the compression ratio (CR). The most stable results of our speech compression algorithm are obtained at level1 with AAD or MAD. The algorithm was implemented in MATLAB 2013a.


ISSN: 0067-2904

INTRODUCTION
Speech communication plays an important role in everyday applications, particularly since the arrival of cell phones and Internet services, which made it possible to transmit voice over networks in a digital format. Therefore, there is a need for a method that stores the speech signal in as little space as possible and for an algorithm that compresses the data, thus reducing both the amount of space needed for the data and the size of the required equipment [1]. Speech compression methods primarily aim to remove the short-term correlation among speech samples and the long-term correlation among repeated pitch patterns. In practice, there is always a trade-off between bandwidth utilization and speech quality. Speech compression is widely used in many applications, such as mobile telephony, video teleconferencing systems and mobile satellite communication [2]. The main methods of speech compression used nowadays are waveform coding, transform coding, and parametric coding. Waveform coding attempts to reproduce the input signal waveform at the output. In transform coding, the signal is first transformed into the frequency domain, and afterwards only the dominant spectral components of the signal are retained. In parametric coding, signals are represented through a small set of parameters that can describe them accurately [3].

Related work
Many researchers have presented work on speech compression that addresses the compression rate and the quality of the compressed speech. Maher [4] suggested a one-step combination of speech compression and encryption, using Contourlet transformation and compressive sensing simulation. The results demonstrated the quality of the reconstructed speech and the coding strength of the audio signal, with a good compression ratio. Maher and Ali [5] implemented a new single-step method for speech signal encryption and compression. The combined compression and encryption procedures are carried out using compressive sensing (CS). The Contourlet transform is used to increase the signal sparsity required by CS. A chaotic system is used to generate the CS sensing matrix because of its randomness and very high sensitivity to initial conditions. This significantly increases the encryption key size to $10^{135}$ when using a logistic map. Alsaif and Albadrani [6] suggested a method for speech compression using wavelet transformation and Contourlet transformation. They converted the speech signal (which is normally one-dimensional) into a two-dimensional array (so that it is suitable for the Contourlet transformation) and then applied the wavelet transform. Afterwards, the Contourlet transform was applied to the high-frequency wavelet coefficients. After the speech was transformed or stored, decompression was applied by inverting those transformations. The SNR, PSNR, NRMSE and correlation (Corr.) measures were used to test the performance, and they indicated very good results.

Problem statement
We noticed from previous research that the Contourlet coefficients (especially the high-frequency band HH) have not been studied through the application of the four statistical methods of IQR, STD, AAD and MAD, in terms of the effects of zeroing or cancelling those coefficients (especially HH), their effect on the percentage of compression, and the quality of the sound recovered from the compression process. Based on this observation, this research presents a method of compressing speech by deleting non-influential samples in several steps. In the first step, the speech signal is converted into a two-dimensional matrix to fit the Contourlet transformation. In the next step, four statistical methods (IQR, STD, AAD and MAD) are used to obtain threshold values. The performance of the work is assessed using the measures of Compression Ratio (CR), Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Normalized Root Mean Square Error (NRMSE), and Normalized Cross-Correlation (NCC), which were measured for the reconstructed speech obtained from Contourlet based speech compression techniques [7].

CONTOURLET TRANSFORMATION
Contourlet transformation was presented in an earlier work [8]. It is a representation of two-dimensional data that can capture the fundamental geometry of the information and offers a flexible multiresolution, local, and directional expansion of the data. It is based on the integration of two filter structures: the Laplacian Pyramid (LP) and the Directional Filter Bank (DFB). The final result is called Contourlet transformation [9]. This is illustrated in Figure-1.
The Laplacian Pyramid (LP) filter is used to decompose the input data into low frequency (LL), low-high frequency (LH), high-low frequency (HL) and high frequency (HH) bands. At each level, LL and the band pass bands (LH, HL, and HH) are produced. The band pass data enters the Directional Filter Bank (DFB), which produces the Contourlet coefficients. The low-frequency band enters the Laplacian pyramid a second time. This process is repeated until the required level of detail is obtained. Figure-1 demonstrates the process of obtaining the Contourlet coefficients by using LP and DFB [10].

Laplacian Pyramid
The Laplacian pyramid is a hierarchical filter derived from the Gaussian pyramid. It represents the incoming data at several levels, which are obtained through repeated filtering with the hierarchical Laplace filter. This involves two steps. In the first step, the Gaussian pyramid is built from the data. In the second step, the Laplacian pyramid is derived from the Gaussian pyramid [10].
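The two steps can be sketched on a one-dimensional signal. This is only an illustration under stated assumptions: the paper works on two-dimensional arrays, and the pairwise-average low-pass filter and the helper names `reduce_level`, `expand_level`, `lp_decompose` and `lp_reconstruct` are hypothetical choices, not the filters used by the authors.

```python
def reduce_level(signal):
    """Gaussian-pyramid step (assumed box filter): average pairs, keep one value per pair."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def expand_level(coarse, length):
    """Upsample by repeating each sample, then pad or trim to `length`."""
    out = []
    for value in coarse:
        out.extend([value, value])
    while len(out) < length:
        # Pad with the last value when the original length was odd.
        out.append(out[-1] if out else 0.0)
    return out[:length]


def lp_decompose(signal):
    """Return (low-frequency band, band-pass detail) for one pyramid level."""
    coarse = reduce_level(signal)
    predicted = expand_level(coarse, len(signal))
    detail = [s - p for s, p in zip(signal, predicted)]
    return coarse, detail


def lp_reconstruct(coarse, detail):
    """Invert lp_decompose: expanded low band plus detail."""
    predicted = expand_level(coarse, len(detail))
    return [p + d for p, d in zip(predicted, detail)]
```

Calling `lp_decompose` again on the returned low-frequency band gives the next level, which mirrors the repeated filtering described above.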

Direction filter bank
The directional filter bank captures high-frequency information, such as smooth contours and speech details. It is implemented by a k-level binary tree decomposition that yields $2^k$ directional sub bands, where k is a positive integer [4]. This is illustrated in Figure -

THRESHOLDING METHOD
The success of Contourlet transformation in speech compression and in reducing the size of speech depends on the thresholding method. Different threshold methods have been proposed, sharing a common approach of reducing the Contourlet transformation coefficients. These thresholds can be applied in the HH band pass at each level. In this research, four statistical thresholding methods are compared.

Interquartile Filter (IQR)
The IQR is the range of the middle 50% of a distribution. It is calculated as the difference between the upper quartile and the lower quartile of a distribution. Since an outlier is an observation that deviates markedly from the other observations, any outliers must lie at the ends of the distribution. Measures of dispersion can be strongly influenced by outliers. One solution to this problem is to discard the ends of the distribution and measure the range of scores in the middle. Thus, the IQR eliminates 25% of the distribution at the bottom and 25% at the top. Then, the distance between the extremes of the central 50% of the distribution that remains is measured [11].
The IQR is a robust measure of variability. The general formulas for calculating Q1 and Q3 are given in equations (1), (2), and (3).

Standard Deviation (STD)
Standard deviation is the measure of dispersion of a set of data from its mean. It measures the absolute variability of a distribution. The greater the dispersion or variability, the greater the standard deviation and the higher the magnitude of the deviation of the data from their mean. Standard deviation is also known as the root-mean-square deviation, as it is the square root of the mean of the squared deviations from the arithmetic mean [12], as shown in equation (4):

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (4)$$

where $x_i$ is the value of a point in the data set, $\bar{x}$ is the mean value of the data set, and $n$ is the number of points in the data set.
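As a rough sketch of how these two statistics might be computed for one sub band, the snippet below uses Python's `statistics` module. The quartile convention (`method="inclusive"`) is an assumption, since several conventions exist and the one used by the authors is not stated. The function names are illustrative.

```python
import statistics


def iqr(values):
    """Interquartile range: upper quartile minus lower quartile.

    Assumption: quartiles follow the 'inclusive' interpolation convention.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def std(values):
    """Population standard deviation (divides by n, as in a root-mean-square deviation)."""
    return statistics.pstdev(values)
```

Either value could then serve as the threshold for the coefficients of the sub band from which it was computed.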

Average Absolute Deviation (AAD)
Average Absolute Deviation is a statistical measure that uses absolute values to express how far the data points lie from a central value [13]. The central value used here is the average, which is the mean value of the data set, as in equation (5).

Median Absolute Deviation (MAD)
For forecast results, Median Absolute Deviation (MAD) is used as a measure of accuracy. MAD plays a critical role in establishing whether predictions are accurate and reliable. By examining how successful the current forecast has been, the forecast error can be reduced and a more accurate forecast can be generated in the next round [14]. MAD is the median of the distances of a set of numbers from their mean, as in equation (6).
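A minimal sketch of AAD and MAD follows, under stated assumptions. AAD is taken as the mean of the absolute deviations from the mean. For MAD, the widely used definition measures deviations from the median, whereas the text above mentions the mean, so the hypothetical `center` argument lets either be chosen. The function names are illustrative.

```python
import statistics


def aad(values):
    """Average absolute deviation: mean of |x - mean(x)|."""
    mean = statistics.fmean(values)
    return statistics.fmean([abs(v - mean) for v in values])


def mad(values, center=None):
    """Median absolute deviation: median of |x - center|.

    Assumption: `center` defaults to the median of the data (the common
    definition); pass the mean explicitly to follow the wording in the text.
    """
    if center is None:
        center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])
```

For example, `mad(band, center=statistics.fmean(band))` gives the mean-centred variant for a list of coefficients `band`.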

EFFICIENCY CRITERIA
The quality of the speech data recovered from the Contourlet coefficients is assessed using efficiency criteria (SNR, NRMSD, correlation, NCC).

Signal to Noise Ratio (SNR)
This test gives the ratio of speech signal energy to noise energy [15]. It is calculated as shown in equation (7), where X is the original speech signal and Y is the compressed speech signal.
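One common way to compute this ratio is sketched below. It assumes the usual decibel form, $10\log_{10}\left(\sum X^2 / \sum (X - Y)^2\right)$, which may differ in detail from the authors' equation (7). The function name `snr_db` is illustrative.

```python
import math


def snr_db(original, compressed):
    """Signal-to-noise ratio in dB, treating (original - compressed) as the noise."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, compressed))
    if noise_energy == 0:
        return math.inf  # identical signals: no noise at all
    if signal_energy == 0:
        return -math.inf  # silent original with a non-silent reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)
```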

Normalized Root Mean Square Deviation (NRMSD)
The root mean square deviation (RMSD) is a common way of measuring the performance of models and is given by equation (8) [16], where y is the original speech signal and x is the reconstructed signal. Normalizing the RMSD makes it simple to compare data sets and models of different scales. The NRMSD is given by equation (9). Lower NRMSD values indicate less variation among data or models.
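The sketch below shows one way to compute both quantities. Normalizing by the range of the original signal (maximum minus minimum) is an assumption; other normalizations, such as division by the mean, are also in use and the authors' choice is not stated. The function names are illustrative.

```python
import math


def rmsd(original, reconstructed):
    """Root mean square deviation between two equal-length signals."""
    n = len(original)
    squared = sum((y - x) ** 2 for y, x in zip(original, reconstructed))
    return math.sqrt(squared / n)


def nrmsd(original, reconstructed):
    """RMSD divided by the range of the original signal (assumed normalization).

    Assumes the original signal is not constant, so the range is non-zero.
    """
    span = max(original) - min(original)
    return rmsd(original, reconstructed) / span
```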

Normalized Cross-Correlation (NCC)
When two signals such as (x, y) are compared, the normalized cross-correlation ranges between -1 and 1. If the normalized cross-correlation value is 1, the two signals are the same. When the value is -1, the two signals differ [5]. The normalized cross-correlation for two signals is defined in equation (10).
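A zero-lag form of this measure can be sketched as follows. It assumes the definition $\sum x_i y_i / \sqrt{\sum x_i^2 \sum y_i^2}$, which stays within -1 and 1; the authors' equation (10) may be normalized differently. The function name `ncc` is illustrative.

```python
import math


def ncc(x, y):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        return 0.0  # at least one signal is all zeros; correlation is undefined
    return numerator / denominator
```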

The Correlation
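It is an indication of the relationship between variables, such as the signals (x, y) [13], as in equation (11):

$$r_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \qquad (11)$$

where Cov is the covariance and $\sigma_x$, $\sigma_y$ are the standard deviations of x and y.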
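The correlation coefficient can be sketched directly from covariance and standard deviations. The population form (division by n) is assumed for both, which cancels in the ratio. The function name `correlation` is illustrative.

```python
import math


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    # Assumes neither signal is constant, so both deviations are non-zero.
    return cov / (std_x * std_y)
```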

Compression Ratio (CR)
The compression ratio, also known as the compression power, measures the reduction in data-representation size produced by a data compression algorithm [6]. It is the ratio of the number of zeros in the compressed speech to the number of coefficients in the original speech, as shown in equation (12), where X is the set of data in the compressed speech and Y is the set of data in the original speech.
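Following the verbal definition above, the ratio can be sketched as a count of zeros. Expressing it as a percentage is an assumption, made because the results are reported as CR%. The function name `compression_ratio` is illustrative.

```python
def compression_ratio(compressed, original):
    """Percentage of zeroed coefficients relative to the original coefficient count."""
    zeros = sum(1 for value in compressed if value == 0)
    return 100.0 * zeros / len(original)
```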

Run length encoding (RLE)
It is a simple way to compress data without losing information. The concept of this method is that when a value x is repeated successively n times, the run of repeated x values is replaced by the value x together with the count n. This method is called run length encoding (RLE) [6].
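A minimal sketch of the idea is given below, storing each run as a (value, count) pair. The pair layout and the function names `rle_encode` and `rle_decode` are illustrative choices, not necessarily the format used by the authors.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    decoded = []
    for value, count in runs:
        decoded.extend([value] * count)
    return decoded
```

Long runs of zeros, such as those produced by thresholding, are where this encoding saves the most space.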

THE ALGORITHM (METHODOLOGY)
Speech compression is one of the important operations in the digital signal processing domain. It can be done using various algorithms or methods. The algorithm of this research uses lossy data compression for speech signal processing. We suggest a method that zeroes the values falling below the threshold in each high-frequency band of the Contourlet coefficients. The threshold that we apply is one of four statistical methods (IQR, STD, MAD, and AAD). The methodology uses the following steps:
• Step1: In this step, recorded speech is stored for (5, 30, and 120 seconds). It is then split into data frames, which are converted from one-dimensional vectors into two-dimensional arrays.
• Step2: After the pre-processing of the data in Step 1, a Contourlet transformation of the two-dimensional array is calculated at levels 1, 2 and 3, as shown in Figure-2. At each level, a low-frequency band is produced together with high-frequency band pass sub bands. The low-frequency band is passed to the next level, while the high-frequency sub bands enter the directional filter bank, which produces the Contourlet coefficients. Then the threshold limit (thr) is calculated with the statistical methods for each high-frequency sub band, as the data is not normally distributed. The data in each sub band is compared with the threshold (thr), and the data with a value less than the threshold is set to zero. Then the run length encoding (RLE) algorithm is executed, as shown in Figure-3.
• Step3: This step represents the receiving side of the signal. The received data is decoded using a run length decoding (RLD) algorithm, then the inverse Contourlet transform is applied to rebuild the two-dimensional set of data, which is converted back into a vector speech signal. The efficiency criteria (SNR, PSNR, NRMSE, correlation, NCC, CR) are then applied.
The final result is that the speech is clear and acceptable to be heard after decompression if the correlation coefficient of the original signal with the signal after decompression is greater than the threshold limit, as shown in Figure-
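The framing in Step1 and the zeroing in Step2 can be sketched as below. The Contourlet transform itself is omitted, since it requires a dedicated toolbox. The sketch assumes zero-padding of the last frame and a comparison on coefficient magnitude, neither of which is specified in the text. The names `to_matrix` and `zero_below` are hypothetical.

```python
def to_matrix(signal, width):
    """Split a 1-D signal into rows of `width` samples (last row zero-padded)."""
    rows = []
    for start in range(0, len(signal), width):
        row = list(signal[start:start + width])
        row.extend([0.0] * (width - len(row)))  # assumed padding for a short last frame
        rows.append(row)
    return rows


def zero_below(band, thr):
    """Set coefficients whose magnitude is below `thr` to zero (assumed magnitude test)."""
    return [[c if abs(c) >= thr else 0.0 for c in row] for row in band]
```

In the full method, `zero_below` would be applied to each high-frequency sub band with a threshold computed from that sub band by IQR, STD, AAD or MAD, and the result would then be run length encoded.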

EXPERIMENTS and COMPRESSION
The algorithm suggested in the previous section is applied to speech samples of different durations (5 s, 30 s, and 120 s) sampled at 22050 Hz, using MATLAB 2013a. The results of implementing speech compression on the first three Contourlet transformation levels with the four statistical methods are recorded in table1, which shows a wide range of values for the efficiency criteria. We can observe that CR decreased progressively from level1 to level3. We can also note that the CR values were stable in level1 for all speech samples, whereas in level2 the CR values vary considerably, and the CR in level3 was lower than that in the other levels. The data in table1 also show that the size of the speech file affects the correlation and the other efficiency criteria: for small speech files, high values of the speech quality criteria are obtained. Figure-3 shows the relation between correlation and speech file size in level1.

To determine the best statistical method (IQR, STD, MAD, AAD) as a threshold with Contourlet transformation in speech compression from the data of table1, the following equation (Eq13) was proposed, with weights that reflect the preference given to each efficiency criterion. The results are shown in table-2:

$$\mathrm{BRT} = \frac{(\mathrm{NCC} \times 109) \times 40 + \mathrm{correlation} \times 30 + (\mathrm{SNR} + \mathrm{PSNR} + \mathrm{CR}) \times 10}{100} \qquad (13)$$

where BRT is the best result threshold.

At level2, it is clear that AAD and STD were better than the other methods. However, CR varied widely across the speech samples; it was unstable within a certain limit and lower than the level1 rate. At level3, STD achieved the best result, but the values were generally below the limit required by the suggested algorithm. For this reason, level3 can be neglected. Figures (4, 5, and 6) show that the best BRT result is achieved with AAD or MAD in level1. As mentioned, IQR failed to give satisfactory results at all levels and with the different samples. Although some samples have a high CR, the values of the remaining efficiency criteria were unacceptable.

Table-3 A comparison of the SNR value and CR% in the proposed algorithm and other related methods.

From the table, it is clear that the proposed method gives better SNR and CR% values than those provided by M. K.M. and A. M [4] and Khalil I. Alsaif and H. S. Albadrani [6], but the results are not better than those of Maher K. Mahmood Al-Azawi and Ali M. Gaz [5].
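The weighted score of Equation (13) can be sketched as a small function. The first argument, `ncc_term`, stands for the scaled NCC quantity in the first term of the equation; its exact scaling is taken as given by the caller, since it is an assumption here. The function name `brt` and the parameter names are illustrative.

```python
def brt(ncc_term, correlation, snr, psnr, cr):
    """Best result threshold score: weights 40, 30 and 10, divided by 100.

    `ncc_term` is the already-scaled NCC quantity (scaling assumed by the caller).
    """
    return (ncc_term * 40 + correlation * 30 + (snr + psnr + cr) * 10) / 100
```

Computing this score for each statistical method at each level and selecting the largest value corresponds to the comparison described above.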

CONCLUSIONS
The application of the algorithm showed that the most stable results for the efficiency criteria, when statistical methods are used as thresholds for Contourlet transformation in speech compression, are obtained at level1 with AAD or MAD, whereas IQR did not give good results. It is also clear that the percentage of correlation can be increased by dividing large speech files into smaller ones. It is also noticed that zeroing samples in the Contourlet coefficients did not affect the quality of the received sound at the receiving end.