Deep Learning Techniques for Video Summarization Based on Object Detection
DOI:
https://doi.org/10.24996/ijs.2026.67.5.%25g
Keywords:
Deep Learning, Video Summarization, Keyframe Selection, Object Detection, Clustering Algorithm
Abstract
With the rapid growth of video content, effective video summarization methods are essential. This paper introduces a new deep learning framework for video summarization based on object detection. YOLOv8 first identifies objects in each frame from every 15-frame sequence. The detected objects are cropped and resized, and features are extracted from them with a Residual Neural Network (ResNet-50). Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) then groups the objects into clusters. Finally, keyframes are randomly selected from each object cluster to create a concise summary. The main contribution is the identification of video objects, such as people and vehicles, so that the most informative content is retained. The framework also generates a video summary that is much shorter than the original while preserving a diverse range of video content. Performance was tested on the SumMe dataset, with accuracy and F1-score as the key metrics. Results show an overall detection accuracy of 0.8988 and an F1-score of 0.9451. The summaries were very short, reducing viewing time by an average of 95% compared to the original videos while maintaining summary reliability.
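
The final step of the pipeline, random keyframe selection from each object cluster, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `select_keyframes` and `length_reduction`, the `seed` parameter, and the input layout (one frame index and one cluster label per detected object) are hypothetical. The sketch also assumes that objects labelled -1, the label HDBSCAN implementations commonly assign to noise points, are left out of the summary, and that one frame is drawn per cluster. The abstract does not state either choice.

```python
import random
from collections import defaultdict


def select_keyframes(frame_ids, cluster_labels, seed=0):
    """Pick one random frame index per object cluster.

    frame_ids[i] is the frame that detected object i came from, and
    cluster_labels[i] is the cluster assigned to that object.
    Assumption: label -1 marks noise and is skipped.
    """
    if len(frame_ids) != len(cluster_labels):
        raise ValueError("frame_ids and cluster_labels must have equal length")

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    # Group the source frames of all objects by their cluster label.
    frames_by_cluster = defaultdict(list)
    for frame_id, label in zip(frame_ids, cluster_labels):
        if label != -1:
            frames_by_cluster[label].append(frame_id)

    # Draw one frame per cluster; a set removes frames chosen twice.
    chosen = {
        rng.choice(frames_by_cluster[label])
        for label in sorted(frames_by_cluster)
    }
    return sorted(chosen)


def length_reduction(n_keyframes, n_frames):
    """Fraction of the original frames that the summary leaves out."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return 1.0 - n_keyframes / n_frames
```

For example, calling `select_keyframes([0, 15, 30, 45, 60, 75], [0, 0, 1, 1, -1, 2])` returns one frame for each of the three clusters and never returns frame 60, because its only object is labelled as noise. Under this frame-count definition, a summary of 5 keyframes from a 100-frame video corresponds to a reduction of 0.95.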



