26 result(s) for "semantic video summary"
Ontology reasoning scheme for constructing meaningful sports video summarisation
As digital sports video becomes increasingly pervasive, semantic video summary becomes one of the important components for the next generation of multimedia applications. Ontology is a feasible way to mine the semantic information from the video stream. However, current ontology-based methods did not concentrate on the effectiveness and soundness of semantic reasoning. Here, the authors propose a content-directed ontology reasoning approach to produce meaningful sports video summarisation. The proposed ontology can facilitate the metadata acquisition of video and the improvement of query performance. It also provides a flexible way to query the sports video database, which cannot be achieved by simple keyword search. For annotating, describing and managing the sports video content, we propose a sports video descriptive language (SVDL) based on the proposed ontology. Moreover, the semantically meaningful sports video abstraction is produced by a reasoning engine based on an extension of the Tableau algorithm. Meanwhile, the soundness and completeness of the reasoning algorithm can be solidly proved. Subjective assessment experimental results reveal the reliability and efficiency of the proposed scheme.
Personalized Video Summarization: A Comprehensive Survey of Methods and Datasets
In recent years, the scientific and technological developments have led to an explosion of available videos on the web, increasing the necessity of fast and effective video analysis and summarization. Video summarization methods aim to generate a synopsis by selecting the most informative parts of the video content. The user’s personal preferences, often involved in the expected results, should be taken into account in the video summaries. In this paper, we provide the first comprehensive survey on personalized video summarization relevant to the techniques and datasets used. In this context, we classify and review personalized video summary techniques based on the type of personalized summary, on the criteria, on the video domain, on the source of information, on the time of summarization, and on the machine learning technique. Depending on the type of methodology used by the personalized video summarization techniques for the summary production process, we classify the techniques into five major categories, which are feature-based video summarization, keyframe selection, shot selection-based approach, video summarization using trajectory analysis, and personalized video summarization using clustering. We also compare personalized video summarization methods and present 37 datasets used to evaluate personalized video summarization methods. Finally, we analyze opportunities and challenges in the field and suggest innovative research lines.
Fuzzy C-mean clustering technique based visual features fusion for automatic video summarization method
Video summarization is one of the most important processes in multimedia applications. It is the process of taking a few segments of each scene to create a video summary that describes the story of an entire video in a short amount of time. Automatic video summarization has many applications that would benefit domains such as healthcare, security, and surveillance. However, creating a comprehensive video summary that encompasses keyframes of video shots is a challenging task, because it requires representative feature extraction and feature classification techniques. The main objective of this study is to produce informative video summaries that convey interestingness, representativeness, and visual semantic information while preserving the continuous flow of motion information in the video sequence. To this end, the paper proposes SURF and HOG techniques for feature description and a covariance matrix (CM) for feature reduction. A fuzzy C-means clustering algorithm is proposed for feature classification and for generating the summary through keyframe selection. The experiments are conducted on annotated video sequences from the SumMe and VSUMM datasets. The proposed method performs well, achieving an average F-score of 0.5197 on SumMe videos and 0.9221 on VSUMM videos.
Query-attentive video summarization: a comprehensive review
Over the last decade, the diverse applications of video summarization have gained increased attention, motivating researchers in the domain of computer vision to generate optimal and comprehensible video summaries. The main challenge in the research of video summarization is user perception and preference, as humans are the ultimate consumers of the generated summary. A single video summary cannot satisfy all users unless the summarization algorithm interacts with end users and adapts to their requirements. Conventional video summarization cannot address these user requirements. This study explores various state-of-the-art techniques developed for generating user-intended video summaries, focusing on query-attentive video summarization. Query-attentive video summarization is a multi-modal summarization method that generates a video summary that satisfies the viewer’s requirements by taking input queries from the viewers. This paper discusses the fundamental aspects of query-attentive video summarization, tracing its progress and evolution over time. Contemporary approaches are explored in detail, highlighting the developed techniques along with their advantages and limitations. Additionally, the article studies publicly available datasets, including the extensively utilized Query-Focused Video Summarization dataset, since these datasets ensure the validity and applicability of developed techniques. Evaluation metrics, which are essential tools for measuring performance and assessing user satisfaction, are also studied, and performance comparisons are presented. After investigating the domain of query-attentive video summarization, this article addresses the current research challenges and identifies potential future research objectives. This comprehensive review offers a complete guide for new researchers in the field of query-attentive video summarization, covering both existing and future real-time applications.
Automatic highlight detection in videos of martial arts tricking
We propose a novel strategy for the automatic detection of highlight events in user-generated tricking videos. To the best of our knowledge, it is the first one specifically tailored for this complex sport. Most current methods for related sports leverage high-level semantics such as predefined camera angles or common editing practices, or rely on depth cameras to achieve automatic detection. In contrast, our approach relies only on the contents of the frames of a given video, and consists of a four-stage pipeline. The first stage identifies foreground key points of interest along with an estimation of their motion in the video frames. In the second stage, these points are grouped into regions of interest based on their proximity and motion. Their behavior over time is evaluated in the third stage to generate an attention map indicating the regions participating in the most relevant events. The fourth and final stage provides the extracted video sequences where highlights have been identified. Experimental results attest to the effectiveness of our approach, which shows high recall and precision values at frame level, with detections that fit the ground truth events well.
Video Summarization Based on Feature Fusion and Data Augmentation
During the last few years, several technological advances have led to an increase in the creation and consumption of audiovisual multimedia content. Users are overexposed to videos via several social media or video sharing websites and mobile phone applications. For efficient browsing, searching, and navigation across several multimedia collections and repositories, e.g., for finding videos that are relevant to a particular topic or interest, this ever-increasing content should be efficiently described by informative yet concise content representations. A common solution to this problem is the construction of a brief summary of a video, which could be presented to the user, instead of the full video, so that she/he could then decide whether to watch or ignore the whole video. Such summaries are ideally more expressive than other alternatives, such as brief textual descriptions or keywords. In this work, the video summarization problem is approached as a supervised classification task, which relies on feature fusion of audio and visual data. Specifically, the goal of this work is to generate dynamic video summaries, i.e., compositions of parts of the original video, which include its most essential video segments, while preserving the original temporal sequence. This work relies on annotated datasets on a per-frame basis, wherein parts of videos are annotated as being “informative” or “noninformative”, with the latter being excluded from the produced summary. The novelties of the proposed approach are: (a) prior to classification, a transfer learning strategy to use deep features from pretrained models is employed. These models have been used as input to the classifiers, making them more intuitive and robust to objectiveness, and (b) the training dataset was augmented by using other publicly available datasets. The proposed approach is evaluated using three datasets of user-generated videos, and it is demonstrated that deep features and data augmentation are able to improve the accuracy of video summaries based on human annotations. Moreover, it is domain independent, could be used on any video, and could be extended to rely on richer feature representations or include other data modalities.
VSMCNN-dynamic summarization of videos using salient features from multi-CNN model
A dynamic video summarization system detects key parts of the input video to generate its compact representation. The summaries can be used for efficient management of video data. This paper proposes an approach, Video summarization based on multi-CNN model (VSMCNN), that exploits major aspects of human cognition to generate meaningful summaries from videos. As the method focuses on dynamic summarization, the input video is divided into a set of shots. A multi-CNN model, which is a combination of different pre-trained CNN models, is used for feature extraction from shots. The salient features are extracted from the high-dimensional feature vector using an unsupervised feature reduction technique applied in multiple subspaces to rank features in the vector. The distance measure between feature vectors is then thresholded to detect prime parts of the tested video. Experiments are performed on the SumMe dataset, and the results show that our approach is successful in detecting portions of the tested video that carry an essential message. The analysis shows that the method outperforms the state-of-the-art methods in the literature. Further evaluation against human-generated summaries in the ground truth supports the effectiveness of the proposed method. The paper also presents a detailed analysis to show which combination of pre-trained CNN models is best suited for generating dynamic summaries.
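The thresholding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each shot is already reduced to a small feature vector, uses Euclidean distance, and compares each shot with the last kept shot; the abstract does not specify the distance measure or the comparison scheme, so `euclidean`, `select_shots`, and the `threshold` value are hypothetical names and choices, not the VSMCNN implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_shots(features, threshold):
    """Return indices of shots kept for the summary.

    Assumption (not from the paper): shot 0 is always kept, and a later
    shot is kept only if it is farther than `threshold` from the most
    recently kept shot.
    """
    if not features:
        return []
    kept = [0]
    for i in range(1, len(features)):
        if euclidean(features[i], features[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Toy 2-D feature vectors standing in for reduced multi-CNN features.
shots = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.05, 1.0], [3.0, 0.0]]
print(select_shots(shots, threshold=0.5))  # [0, 2, 4]
```

Raising the threshold yields a shorter summary, and lowering it keeps more near-duplicate shots.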
Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Past efforts have invariably involved training summarization models with annotated summaries or heuristic objectives. In this work, we reveal that features pre-trained on image-level tasks contain rich semantic information that can be readily leveraged to quantify frame-level importance for zero-shot video summarization. Leveraging pre-trained features and contrastive learning, we propose three metrics that characterize a desirable keyframe: local dissimilarity, global consistency, and uniqueness. We show that the metrics capture well the diversity and representativeness of frames commonly used for the unsupervised generation of video summaries, demonstrating competitive or better performance compared to past methods when no training is needed. We further propose a contrastive learning-based pre-training strategy on unlabeled videos to enhance the quality of the proposed metrics and, thus, improve the evaluated performance on the public benchmarks TVSum and SumMe.
Static video summarization using multi-CNN with sparse autoencoder and random forest classifier
A summarization system detects the parts of the input video that contain an essential message. Such a system aims to generate a very compact and meaningful representation of the original video. A novel method to detect key-frames for static summarization is presented in this paper. The method detects key-frames based on feature vectors extracted from multiple pre-trained Convolutional Neural Network models (Multi-CNN). The features are extracted using four pre-trained CNN models. These vectors are fed to a Sparse Autoencoder, which outputs a combined representation of the input feature vectors. The key-frames of the input video are extracted from the combined feature vectors using a Random Forest classifier. The method is evaluated on two datasets, VSUMM and OVP, based on the user summaries present in the ground truth. The method achieved an average F-score of 0.83 on the VSUMM dataset and 0.82 on the OVP dataset. The method attained promising results compared to other state-of-the-art methods in the literature. The Multi-CNN model was also able to generate high-quality summaries consistently from videos of all categories. Further experiments show that the Multi-CNN model in combination with the Random Forest classifier performs better than the other classifiers considered in the study.
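For readers unfamiliar with the F-score figures quoted in these abstracts, the following minimal sketch shows how the metric combines precision and recall. It assumes a key-frame counts as correct only when its frame index appears in the reference summary; evaluations on VSUMM and OVP typically match frames by visual similarity instead, so exact index matching and the function name `f_score` are simplifying assumptions.

```python
def f_score(selected, reference):
    """Harmonic mean of precision and recall over key-frame indices.

    Assumption: a selected frame is a match only if the same index
    appears in the reference (user) summary.
    """
    selected, reference = set(selected), set(reference)
    matched = len(selected & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(selected)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Three of four selected frames appear in a five-frame user summary:
# precision = 0.75, recall = 0.6.
print(round(f_score([10, 40, 75, 120], [10, 42, 75, 120, 200]), 4))  # 0.6667
```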
Predicting learning performance using NLP: an exploratory study using two semantic textual similarity methods
Most learning analytics (LA) systems provide generic feedback, because they primarily draw on performance data based on quiz scores. This study explored the potential of student-generated summaries as an alternative method for predicting learning performance. Two hundred and fifty-four undergraduates first watched a series of six short video lectures and then wrote a short summary for each one. Based on their median performance quiz scores, the participants were divided into two performance groups. Sparse and dense text vectorization methods were used to represent the video lectures and student summaries. Three semantic textual similarity features were computed using cosine similarity and were used as input into seven common machine learning algorithms. The results indicated that the sparse similarity features outperformed dense ones in classifying performance. Also, the best classification accuracy was achieved using the K-Nearest Neighbors and Random Forest algorithms. Overall, the findings suggest that semantic similarity measures can be used as additional proxy measures of learning, thereby enabling the real-time monitoring and evaluation of student understanding in LA contexts.
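The cosine-similarity feature described in this abstract can be sketched as follows, assuming a plain bag-of-words count vector as the sparse representation. The study does not state its exact vectorization in this abstract, so `bag_of_words`, the whitespace tokenization, and the sample texts are illustrative assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Sparse term-count vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical lecture transcript and student summary.
lecture = bag_of_words("neural networks learn features from data")
summary = bag_of_words("networks learn features")
print(cosine_similarity(lecture, summary))
```

A score near 1 means the summary reuses much of the lecture's vocabulary, while a score of 0 means no shared terms.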