Machine Learning Approach for New COVID-19 Cases Using Recurrent Neural Networks and Long-Short Term Memory

This research aims to predict new COVID-19 cases in Bandung, Indonesia. The system implemented two deep learning methods for this task: recurrent neural networks (RNN) and long short-term memory (LSTM). The data used in this study were the numbers of confirmed COVID-19 cases in Bandung from March 2020 to December 2020. The data were pre-processed by splitting and scaling. During model training, hyperparameter tuning was carried out on the sequence length and the number of layers. The results showed that RNN gave the better performance. The models were evaluated with RMSE, MAE, and R2; on the test data, the best RNN reached an RMSE of 0.66975075, an MAE of 0.47075, and an R2 of -0.29616625, compared with an RMSE of 0.7644 for the best LSTM.


Introduction
Coronavirus, or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a virus that attacks the respiratory system. The disease caused by this viral infection is called COVID-19. The coronavirus can cause mild disorders of the respiratory system, severe lung infections, and even death. According to the official website of the West Java government, on May 15, 2022, 106,028 people in Indonesia were confirmed to be infected with the coronavirus. This is not a small number: Indonesia ranks 7th in the number of COVID-19 cases on the Asian continent.
Many preventive measures have been taken by the government and the public to stop the spread of COVID-19, such as the implementation of the large-scale social restriction system in Bandung from April 22, 2020, to May 3, 2020. The restriction is a government regulation to prevent the transmission of COVID-19. It contains many rules, such as provisions for when the people of Bandung may be outside their homes, provisions for the operating hours of places of business, and others. Although it aims to reduce the spread of COVID-19 in the city, it has side effects that are felt by many people in Bandung. Many businesses cannot operate during the restriction period, which disrupts the community's economy. In this situation, one solution is to predict the number of future COVID-19 cases so that this information can help the government decide whether the restriction should be enforced again. The number of confirmed COVID-19 cases can be forecast using machine learning, especially RNN and LSTM.
ISSN: 0067-2904. Yulita et al., Iraqi Journal of Science, 2023, Vol. 64, No. 11, pp: 5887-5895
Machine learning is the ability of a machine or computer to learn something [1]. With artificial intelligence (AI) embedded in a machine or computer, the machine can process a given input and produce the desired output. Machine learning is the part of AI that works with statistics and data to learn patterns. Just as in human learning, a machine needs to be given examples so that it can learn which process to follow. One machine learning method is the artificial neural network.
Recurrent neural networks (RNN) are one type of artificial neural network. An RNN is able to represent sequential or time-series data. Each processed data point is influenced by the previous data points, so the network is said to be able to remember historical data [2]. Thus, the prediction of the number of COVID-19 cases in the city of Bandung can make use of this machine learning technology. Long short-term memory (LSTM) is an evolution of the RNN architecture that aims to make accurate predictions of a variable, which in this case is the number of COVID-19 cases. In many previous studies, the LSTM model provided better performance than traditional machine learning models such as ARIMA [3]. The difference between deep learning and traditional machine learning is the ability of deep learning to perform feature extraction and feature selection automatically.
Research that specifically addresses the prediction of new cases of COVID-19 can be found in a number of machine learning studies. Yulita et al. studied it for a province in Indonesia with traditional machine learning [11]. Deep learning has also been applied to this prediction [12][13]. However, the pattern of the spread of COVID-19 differs from region to region, so different models are needed. This research uses deep learning for prediction in a city in Indonesia. With such prediction models, local governments can better anticipate this disease.

Method
A "time series" is a sequence of events or observations taken sequentially over time. The data points are recorded at fixed time intervals. Time series analysis is the process of analyzing the relationship between the variables in the data and the time variable. On its own, a time series is only historical data and says nothing directly about future data. However, by processing existing time series data, a prediction of what will happen in the future can be made [14]. This study analyzed time-series predictions for new cases of COVID-19 in Bandung, Indonesia. The research was conducted in stages that include data collection, pre-processing (which is divided into data splitting and data scaling), model creation, and training. Figure 1 shows the flow of the research carried out.

Data Pre-processing
Before the training process, the data needed to be processed in order to produce better performance. The pre-processing stage in this research consisted of data splitting and data scaling. Data splitting is a process where the overall data is divided into training data and test data [15]. The method used was holdout, which divided the entire data into 80% training data and 20% test data. Because the data form a time series, both parts were kept in their original chronological order. Figure 3 is a visualization of the division of the overall data into training data and test data. The next pre-processing stage was data scaling using the min-max scaling method. Min-max scaling maps the data into a fixed range between a minimum and a maximum value. The range of values used was 0 to 1. Table 1 shows the data after normalization.
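
The two pre-processing steps can be illustrated with a minimal Python sketch. The case counts below are hypothetical, and fitting the scaler on the training part only is an assumption; the paper does not state which part of the data the minimum and maximum were taken from.

```python
import numpy as np

def holdout_split(series, train_fraction=0.8):
    """Split a time series chronologically (no shuffling)."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def min_max_scale(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

# Hypothetical daily case counts, not the Bandung data.
cases = np.array([0, 2, 5, 3, 8, 10, 7, 12, 9, 6], dtype=float)

train, test = holdout_split(cases)       # first 80% / last 20%
lo, hi = train.min(), train.max()        # assumption: fit on training data only
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

Reusing the training minimum and maximum for the test part keeps information about the test period out of the pre-processing step.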

RNN and LSTM
RNN is an artificial neural network architecture in which the output of the neurons is reused and fed back as input to the previous layer of neurons [16]. Thus, when processing data at time t, the network also carries weight values from time t-N [17]. The network can use errors or predictions from the past, which are represented as output or hidden unit activities, for more precise and accurate future prediction calculations.
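
The recurrence described above can be sketched as a forward pass in NumPy. This is a generic simple (Elman-style) RNN, not necessarily the exact architecture used in the study; the weight names, the hidden size, and the random initialization are illustrative assumptions.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a simple RNN over a sequence and return a one-step prediction."""
    h = np.zeros(W_hh.shape[0])              # hidden state starts at zero
    for x_t in x_seq:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return W_hy @ h + b_y                    # prediction from the last state

rng = np.random.default_rng(0)
hidden, features = 4, 1                      # illustrative sizes
W_xh = rng.normal(scale=0.1, size=(hidden, features))
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
b_h = np.zeros(hidden)
W_hy = rng.normal(scale=0.1, size=(1, hidden))
b_y = np.zeros(1)

# A hypothetical scaled input sequence of length 5.
x_seq = np.array([[0.1], [0.2], [0.3], [0.4], [0.5]])
y_hat = rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y)
```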
LSTM is an evolution of the RNN architecture that adds a memory cell that can store information for a long period of time [18]. LSTM can be a solution to the vanishing gradient problem of RNN, which causes RNN to fail to capture long-term dependencies and thereby reduces the accuracy of a calculation or prediction [19]. There are three types of gate units in LSTM: the input gate, the forget gate, and the output gate. The input gate determines whether an input will be added to the memory cell. The forget gate determines whether a memory from a previous time will be kept or forgotten. The output gate determines how strongly the memory in the cell state influences the result of the calculation or prediction [20].
The forget gate uses a sigmoid activation function, whose result is a value between 0 and 1. A result close to 1 means that the stored data is kept, and a result close to 0 means that it is discarded. The input gate executes two activation functions. The first is a sigmoid function that determines which values will be updated. The second is a tanh function that creates a new vector of candidate values to be stored in the memory cell. The cell state is then updated: the value in the previous memory cell is replaced with a new memory cell value obtained by combining the results of the forget gate and the input gate. The output gate also executes two functions. The first is a sigmoid function that decides which part of the memory cell value will be output. The second is a tanh function applied to the memory cell value. The results of these two functions are multiplied to produce the final output [21].
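
The gate computations can be written out as a single LSTM time step. The sketch below follows the standard LSTM formulation; the parameter names, sizes, and random weights are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state, cell state, and forget gate."""
    z = np.concatenate([h_prev, x_t])                   # previous output + current input
    f = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate, values in (0, 1)
    i = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate: what to update
    g = np.tanh(params["W_g"] @ z + params["b_g"])      # candidate values
    c = f * c_prev + i * g                              # new cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h = o * np.tanh(c)                                  # final output of the step
    return h, c, f

rng = np.random.default_rng(0)
hidden, features = 3, 1                                 # illustrative sizes
params = {}
for gate in "figo":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + features))
    params[f"b_{gate}"] = np.zeros(hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in np.array([[0.2], [0.4], [0.6]]):             # hypothetical scaled inputs
    h, c, f = lstm_step(x_t, h, c, params)
```

Because the forget gate is a sigmoid, its entries lie strictly between 0 and 1, so it scales the previous cell state rather than switching it fully on or off.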
There are several hyperparameters in RNN and LSTM. Hyperparameters are variables in a model that affect how the model works, and their values are determined before the model training process. In this study, the values of several hyperparameters were varied to find the best value for each one; this process is called hyperparameter tuning. The tuned hyperparameters were the length of the input sequence entered into the model and the number of layers. The sequence lengths used were 5, 7, 10, and 14. The numbers of layers used were 1, 2, 3, and 4. The number of epochs was fixed at 1000 in every experiment. One epoch is counted when all data has gone through one forward pass and one backward pass in the model training process. This study used only one input variable, namely the number of confirmed cases of COVID-19. Each hyperparameter setting was run five times, with 1000 epochs per run. This was done to obtain the average value of the evaluation scores and to anticipate errors in the evaluation results.
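
A minimal sketch of how the input sequences and the tuning settings could be set up is shown below. It assumes one-step-ahead targets, a window stride of 1, and a full grid over both hyperparameters; none of these details are stated in the paper, and `make_windows` is a hypothetical helper name.

```python
import itertools
import numpy as np

def make_windows(series, seq_len):
    """Turn a 1-D series into (input window, next value) pairs."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]                     # value that follows each window
    return X, y

SEQ_LENGTHS = (5, 7, 10, 14)
NUM_LAYERS = (1, 2, 3, 4)
REPEATS = 5                                  # runs per setting, averaged afterwards
EPOCHS = 1000                                # fixed for every run

# Assumption: every sequence length is combined with every layer count.
grid = list(itertools.product(SEQ_LENGTHS, NUM_LAYERS))
total_runs = len(grid) * REPEATS

# Hypothetical series, only to show the shapes.
series = np.arange(20, dtype=float)
X, y = make_windows(series, seq_len=5)
```

Under these assumptions, a series of n points yields n - seq_len training pairs, so longer sequences leave fewer pairs to learn from.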

Evaluation
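An evaluation stage was used to examine the accuracy and performance of the trained models. This study applied three evaluation methods to measure the accuracy of the forecasts. The mean absolute error (MAE) calculates the average absolute error [22]:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (1)$$

The root mean squared error (RMSE) sums the squared errors, divides the sum by the number of data points, and takes the square root:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$

R2 is an evaluation method that calculates the proportion of the variance in the observed values that is explained by the model:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (3)$$

The R2 calculation gives a maximum value of 1. The closer the value is to 1, the better the evaluation value. R2 becomes negative when the model predicts worse than a constant prediction equal to the mean of the observed values.
Where: $y_i$: actual value; $\hat{y}_i$: predicted value of y; $\bar{y}$: mean value of y; $n$: number of data points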
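
The three evaluation metrics described above can be computed directly with NumPy. The sketch below uses the standard definitions of MAE, RMSE, and R2; the sample values are hypothetical and are not results from the study.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination; at most 1, can be negative."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Hypothetical actual and predicted values.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 4.5])

scores = {"MAE": mae(y, y_hat), "RMSE": rmse(y, y_hat), "R2": r2(y, y_hat)}

# Predicting the mean everywhere gives R2 = 0; worse predictions give R2 < 0.
r2_mean = r2(y, np.full_like(y, y.mean()))
r2_bad = r2(y, y[::-1])
```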

Results and Discussion
Hyperparameter tuning was done to find the hyperparameters that gave the best performance. The RMSE, MAE, and R2 evaluation scores were determined through tests. Tables 2 and 3 show the hyperparameter tuning results for the RNN and LSTM models. The RNN model was better than the LSTM model. The best sequence length for the RNN model was 14, and the best number of layers was 1. However, the test results of the RNN model with its best hyperparameters show that the model did not generalize well: it could not learn the pattern in the data, so its predictions look like a straight line in Figure 4.
LSTM showed a different pattern than RNN, as seen in Figure 5. Its RMSE, MAE, and R2 values were best when the sequence length was 10 and worst when it was 7, as shown in Table 3. The sequence length had a very large effect on model learning because it determines the amount of data that enters the model for training. The training data amounted to 80% of the total data, namely 244 data points. A sequence length of 5 means the machine predicts a value by looking at the 5 data points before it, and likewise for sequence lengths of 7, 10, and 14. The longer the sequence, the fewer sequences can be formed from the training data; the fewer the sequences, the fewer patterns the machine can learn and the more difficult it is for the machine to predict future data trends. However, a sequence length that is too short is not optimal either: if the amount of data per sequence is too small, the machine cannot see the pattern or trend of the data in each sequence. Therefore, the optimal sequence length was 10, which was long enough for the machine to learn the trend of the data.
With limited data, stacked LSTM layers caused the predictions to deviate further from the actual values. The optimal number of LSTM layers was therefore one, that is, a layer that is not stacked. The best LSTM model used a sequence length of 10 with a single LSTM layer and reached an RMSE of 0.764423, an MAE of 0.560741, and an R2 of -0.688390.
Figure 5 is a visualization of the LSTM's results. There are two lines with different colors; the red line is the prediction on the test data. It can be seen that the model could not follow the test data well. This was not caused by a failure of the model in the training process but by the characteristics of the dataset. In the test data, many values exceeded the peak value in the training data. During training, the model never saw values as high as those in the test data. Therefore, the model could not predict any value higher than the peak in the training data. Although it could not predict spikes in the data, the model seems to be able to follow the up and down patterns in the data.
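
One way to see why peaks above the training maximum are hard to predict is to look at the scaled values. The sketch below assumes (the paper does not state this) that the min-max scaler was fitted on the training data only; the numbers are hypothetical.

```python
import numpy as np

# Hypothetical case counts: the test period contains higher peaks than training.
train = np.array([3.0, 10.0, 25.0, 40.0])
test = np.array([35.0, 60.0, 80.0])

lo, hi = train.min(), train.max()            # scaler fitted on training data only

def scale(values):
    """Min-max scaling with the training minimum and maximum."""
    return (values - lo) / (hi - lo)

test_scaled = scale(test)
exceeds = test_scaled > 1.0                  # values outside the range seen in training
```

Under this assumption, every training target lies in the range 0 to 1, while test values above the training peak map to values greater than 1 that the model never encountered during training.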

Conclusion
This research shows that the number of confirmed COVID-19 cases in the city of Bandung can be predicted using the RNN and LSTM models. To work optimally, however, the hyperparameters used in the model need to be tuned. In this study, two hyperparameters were selected: the length of the sequence and the number of layers. Both RNN and LSTM performed best with a single layer. According to the test results, the optimal RNN model built in this study has an RMSE value of 0.66975075, an MAE value of 0.47075, and an R2 value of -0.29616625. The best LSTM has an RMSE value of 0.764423, an MAE value of 0.560141, and an R2 value of -0.688390. This shows that RNN is better than LSTM.
Our suggestions for further development are related to data quality. The more data available for time series forecasting, the better the prediction of the model. The greater the amount of data, the more data trends will be formed, ranging from short-term to long-term or even seasonal data trends. The model's ability to remember information or data trends will therefore be maximized. In this study, the available data was too little, but due to the urgency of the spread of COVID-19, the study was conducted to provide an overview and immediate results for handling the spread of COVID-19 in the city of Bandung. For further research, it is recommended to repeat the study with the latest conditions so that it can help handle the spread of COVID-19. After the hyperparameter tuning process has been carried out and the most optimal combination of hyperparameters has been found, the model can be used to predict the number of new COVID-19 cases in the city of Bandung in the future.

Acknowledgements
We thank the Rector of Universitas Padjadjaran. Financial support was received from the online data and library research grant from Universitas Padjadjaran in 2020.

Figure 1: The system implementation

Data
The data used in this study is the number of new cases of COVID-19 in the city of Bandung from March 1, 2020, to December 31, 2020, which was taken from the official website of the Coordination Center for COVID-19 Information and Coordination of West Java Province, which can be accessed at the following link:

Figure 2: Number of Confirmed Cases of COVID-19 in Bandung

Figure 3: Splitting data: The blue and red lines show the train and test data, respectively

Figure 4: RNN's results: The blue and red lines show the actual and predicted data, respectively.

Figure 5: LSTM's results: The blue and red lines show the actual and predicted data, respectively.

Table 1: Data after scaling

Table 2: The performance of RNN

Table 3: The performance of LSTM