Deep Learning Techniques for Video Summarization Based on Object Detection

Authors

  • Shadan Abdul Haleem, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq
  • Eman Hato, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2026.67.5.%25g

Keywords:

Deep Learning, Video Summarization, Keyframe Selection, Object Detection, Clustering Algorithm

Abstract

With the rapid growth of video content, effective video summarization methods are essential. This paper introduces a deep learning framework for video summarization based on object detection. First, YOLOv8 detects objects in each frame taken from every 15-frame sequence. The detected objects are then cropped and resized, and their features are extracted with a Residual Neural Network (ResNet-50). Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) then groups the objects into clusters. Finally, keyframes are randomly selected from each object cluster to create a concise summary. The main contributions are the identification of video objects, such as people and vehicles, so that the most informative content is retained, and the generation of a summary that is much shorter than the original video while preserving a diverse range of its content. The framework was evaluated on the SumMe dataset, with accuracy and F1-score as the key metrics. The results show an overall detection accuracy of 0.8988 and an F1-score of 0.9451. The resulting summaries were very short, saving an average of 95% of the viewing time relative to the original videos while maintaining summary reliability.
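
The final step of the pipeline, random keyframe selection per object cluster, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that detection, feature extraction, and clustering have already produced one cluster label per detected object, that noise objects carry the label -1 (the usual HDBSCAN convention) and are skipped, and that one frame is sampled per 15-frame window. The function names `sample_frame_indices`, `select_keyframes`, and `length_reduction` are hypothetical, and the reduction is measured here in frames rather than in viewing time.

```python
import random
from collections import defaultdict


def sample_frame_indices(total_frames, step=15):
    """Return one frame index per `step`-frame window (assumed sampling scheme)."""
    return list(range(0, total_frames, step))


def select_keyframes(frame_ids, cluster_labels, per_cluster=1, seed=None):
    """Randomly pick up to `per_cluster` frames from each object cluster.

    frame_ids[i] is the frame in which object i was detected and
    cluster_labels[i] is its cluster label. Objects labelled -1 are
    treated as noise and ignored (an assumption, not stated in the abstract).
    """
    rng = random.Random(seed)
    frames_by_cluster = defaultdict(set)
    for frame_id, label in zip(frame_ids, cluster_labels):
        if label == -1:
            continue
        frames_by_cluster[label].add(frame_id)

    keyframes = set()
    for label in sorted(frames_by_cluster):
        candidates = sorted(frames_by_cluster[label])
        count = min(per_cluster, len(candidates))
        keyframes.update(rng.sample(candidates, count))
    return sorted(keyframes)


def length_reduction(total_frames, num_keyframes):
    """Fraction of the original frames left out of the summary."""
    return 1.0 - num_keyframes / total_frames


if __name__ == "__main__":
    # Toy labels for six detected objects (made-up values for illustration).
    frame_ids = [0, 0, 15, 30, 30, 45]
    labels = [0, 1, 0, -1, 1, 2]
    keyframes = select_keyframes(frame_ids, labels, seed=0)
    print("keyframes:", keyframes)
    print("reduction:", length_reduction(60, len(keyframes)))
```

Because a frame can contain objects from several clusters, the selected frames are merged into a set, so the summary never repeats a frame even when two clusters pick the same one.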

Section

Computer Science

How to Cite

[1]
S. A. Haleem and E. Hato, “Deep Learning Techniques for Video Summarization Based on Object Detection”, Iraqi Journal of Science, vol. 67, no. 5, doi: 10.24996/ijs.2026.67.5.%g.