Enhancing Early Cancer Detection: An Investigation of DNA Sequences and Machine Learning

Authors

A. G. Ganie, S. Dadvandipour

DOI:

https://doi.org/10.24996/ijs.2024.65.8.15

Keywords:

Cancer Classification, Machine Learning, Model Deployment, Cancer Identification on DNA Reads

Abstract

The study addresses the global burden of cancer-related mortality by investigating whether the early-stage presence of three cancers (colon, thyroid, and urothelial carcinoma) can be identified or predicted from raw DNA sequences. The data, sourced from the NCBI database, were pre-processed using k-mer analysis, under-sampling, and count vectorization. Two machine learning algorithms, logistic regression and multinomial Naive Bayes, were then trained on the pre-processed data. Logistic regression achieved the higher accuracy: 78.54% without calibration and 80.10% after calibration with the sigmoid method, which was applied to improve the model's ability to generalize. The final model was deployed using the open-source Streamlit package.
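The pre-processing steps named in the abstract (k-mer analysis, under-sampling, count vectorization) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy reads, the value of k, and the random seed are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter

def kmers(seq, k=6):
    """Return all overlapping k-mers of a DNA sequence, lowercased.
    k=6 is an illustrative default, not a value taken from the paper."""
    seq = seq.lower()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n = min(len(v) for v in by_class.values())
    out_samples, out_labels = [], []
    for y in sorted(by_class):
        for s in rng.sample(by_class[y], n):
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

def count_vectorize(docs):
    """Turn each list of k-mers into a row of counts over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows

# Hypothetical toy reads and labels, invented purely for this example.
reads = ["ATGCATGCA", "ATGCATTTA", "GGGCCCATA", "TTTAAACCC", "ACGTACGTA"]
labels = ["colon", "colon", "colon", "thyroid", "urothelial"]

balanced_reads, balanced_labels = undersample(reads, labels)
vocab, X = count_vectorize([kmers(r, k=4) for r in balanced_reads])
```

In a fuller pipeline, a count matrix like `X` would be passed to a classifier. With scikit-learn, for instance, `LogisticRegression` could be wrapped in `CalibratedClassifierCV(method="sigmoid")` to mirror the sigmoid calibration mentioned above, though the abstract does not say which library or settings the authors used.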

Published

2024-08-30

Section

Biotechnology

How to Cite

[1]
A. G. Ganie and S. Dadvandipour, “Enhancing Early Cancer Detection: An Investigation of DNA Sequences and Machine Learning”, Iraqi Journal of Science, vol. 65, no. 8, pp. 4303–4312, Aug. 2024, doi: 10.24996/ijs.2024.65.8.15.
