69 result(s) for "Video-annotation"
Video content analysis of surgical procedures
Background: In addition to its therapeutic benefits, minimally invasive surgery offers the potential for video recording of the operation. The videos may be archived and used later for purposes such as cognitive training, skills assessment, and workflow analysis. Methods from the broader field of video content analysis and representation are increasingly applied in the surgical domain. In this paper, we review recent developments and analyze future directions in the field of content-based video analysis of surgical operations. Methods: The reviewed articles were retrieved from PubMed and Google Scholar searches using combinations of the following keywords: ‘surgery’, ‘video’, ‘phase’, ‘task’, ‘skills’, ‘event’, ‘shot’, ‘analysis’, ‘retrieval’, ‘detection’, ‘classification’, and ‘recognition’. The collected articles were categorized and reviewed based on the technical goal sought, type of surgery performed, and structure of the operation. Results: A total of 81 articles were included. Publication activity is increasing steadily; more than 50% of these articles were published in the last 3 years. Significant research has been performed on video task detection and retrieval in eye surgery. In endoscopic surgery, the research activity is more diverse: gesture/task classification, skills assessment, tool type recognition, and shot/event detection and retrieval. Recent works employ deep neural networks for phase and tool recognition as well as shot detection. Conclusions: Content-based video analysis of surgical operations is a rapidly expanding field. Several future prospects for research exist, including, inter alia, shot boundary detection, keyframe extraction, video summarization, pattern discovery, and video annotation. The development of publicly available benchmark datasets to evaluate and compare task-specific algorithms is essential.
Efficiently Scaling up Crowdsourced Video Annotation
We present an extensive three-year study on economically annotating video with crowdsourced marketplaces. Our public framework has annotated thousands of real-world videos, including massive data sets unprecedented for their size, complexity, and cost. To accomplish this, we designed a state-of-the-art video annotation user interface and demonstrate that, despite common intuition, many contemporary interfaces are sub-optimal. We present several user studies that evaluate different aspects of our system and demonstrate that minimizing the cognitive load of the user is crucial when designing an annotation platform. We then deploy this interface on Amazon Mechanical Turk and discover expert and talented workers who are capable of annotating difficult videos with dense and closely cropped labels. We argue that video annotation requires specialized skill; most workers are poor annotators, mandating robust quality control protocols. We show that traditional crowdsourced micro-tasks are not suitable for video annotation and instead demonstrate that deploying time-consuming macro-tasks on MTurk is effective. Finally, we show that by extracting pixel-based features from manually labeled key frames, we are able to leverage more sophisticated interpolation strategies to maximize performance given a fixed budget. We validate the power of our framework on difficult, real-world data sets and demonstrate an inherent trade-off between the mix of human and cloud computing used and the accuracy and cost of the labeling. We further introduce a novel, cost-based evaluation criterion that compares vision algorithms by the budget required to achieve acceptable performance. We hope our findings will spur innovation in the creation of massive labeled video data sets and enable novel data-driven computer vision applications.
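The keyframe-plus-interpolation idea mentioned at the end of this abstract can be illustrated with a short sketch. The example below is hypothetical (it is not the authors' tool, which combines pixel-based features with more sophisticated strategies): it simply linearly interpolates bounding boxes between two manually labeled key frames, the baseline strategy such systems build on.

```python
# Minimal sketch of keyframe interpolation for video annotation.
# Hypothetical example; real systems use richer, feature-based strategies.
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Linearly interpolate boxes for frames between labeled key frames."""
    frames = sorted(keyframes)
    result: Dict[int, Box] = dict(keyframes)
    for start, end in zip(frames, frames[1:]):
        b0, b1 = keyframes[start], keyframes[end]
        span = end - start
        for f in range(start + 1, end):
            t = (f - start) / span
            result[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    return result

# Example: label frames 0 and 10 by hand, fill in frames 1-9 automatically.
labels = {0: (100.0, 50.0, 40.0, 60.0), 10: (140.0, 70.0, 40.0, 60.0)}
dense = interpolate_boxes(labels)
print(dense[5])  # (120.0, 60.0, 40.0, 60.0)
```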
Making sense of danmu: Coherence in massive anonymous chats on Bilibili.com
Although coherence has been widely studied in computer-mediated communication (CMC), insufficient attention has been paid to emergent multimodal forms. This study analyzes a popular commentary system on Chinese and Japanese video-sharing sites, known as danmu or danmaku, where anonymous comments are superimposed on and scroll across the video frame. Through content and multimodal discourse analysis, we unpack danmu-mediated communication by analyzing the newest interface (on Bilibili.com), the comments, the interpersonal interactions, and the unusual use of the second-person pronoun. Results show that despite the technological constraints (hidden authorship, unmarked sending date, and lack of options to structure comments), users construct order in interactions through repetition, danmu-specific expressions, and multimodal references, while using playful language for humorous effect. This study provides an up-to-date analysis of an increasingly popular CMC medium beyond well-studied social networking sites, and broadens the understanding of coherence in contemporary CMC.
Virtual pointer for gaze guidance in laparoscopic surgery
Background: A challenge of laparoscopic surgery is learning how to interpret the indirect view of the operative field. Acquiring professional vision, i.e., understanding what to see and which information to attend to, is therefore an essential part of laparoscopic training and one that trainers exert great effort to convey. We designed a virtual pointer (VP) that enables experts to point or draw free-hand sketches over an intraoperative laparoscopic video for a novice to see. This study aimed to investigate the efficacy of the virtual pointer in guiding novices’ gaze patterns. Methods: We conducted a counter-balanced, within-subject trial to compare novices’ gaze behaviors in laparoscopic training with the virtual pointer against a standard training condition, i.e., verbal instruction with un-mediated gestures. In the study, seven trainees performed four simulated laparoscopic tasks guided by an experienced surgeon as the trainer. A Tobii Pro X3-120 eye-tracker was used to capture the trainees’ eye movements. The measures included fixation rate, i.e., the frequency of trainees’ fixations; saccade amplitude; and fixation concentration, i.e., the closeness of trainees’ fixations. Results: No significant difference in fixation rate or saccade amplitude was found between the virtual pointer condition and the standard condition. In the virtual pointer condition, trainees’ fixations were more concentrated (p = 0.039) and longer fixations were more clustered than in the standard condition (p = 0.008). Conclusions: The virtual pointer effectively improved surgical trainees’ in-the-moment gaze focus during laparoscopic training by reducing their gaze dispersion and concentrating their attention on the anatomical target. These results suggest that technologies which support gaze training should be expert-driven and intraoperative to efficiently modify novices’ gaze behaviors.
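As a rough illustration of the gaze measures named in this abstract (fixation rate, saccade amplitude, fixation concentration), the sketch below computes simple versions of them from a list of fixations. The exact definitions used here are assumptions for illustration; the study's operationalizations and the Tobii Pro X3-120 processing pipeline may differ.

```python
# Illustrative gaze metrics from a fixation list; definitions are
# simplified assumptions, not the study's exact formulas.
import numpy as np

# Each fixation: (x, y) position in pixels; durations in seconds.
fixations_xy = np.array([[512, 300], [520, 310], [700, 420], [705, 415]], float)
recording_len_s = 10.0

# Fixation rate: fixations per second of recording.
fixation_rate = len(fixations_xy) / recording_len_s

# Saccade amplitude: distance between consecutive fixations (pixels here;
# degrees of visual angle would require the screen geometry).
saccade_amplitudes = np.linalg.norm(np.diff(fixations_xy, axis=0), axis=1)

# Fixation concentration proxy: mean distance of fixations from their
# centroid (smaller value = more concentrated gaze).
centroid = fixations_xy.mean(axis=0)
dispersion = np.linalg.norm(fixations_xy - centroid, axis=1).mean()

print(fixation_rate, saccade_amplitudes.mean(), dispersion)
```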
A Distributed Automatic Video Annotation Platform
In the era of digital devices and the Internet, thousands of videos are recorded and shared over the Internet. Similarly, CCTV cameras in the digital city produce a large amount of video data that carry essential information. To handle this growing volume of video data and extract knowledge from it, there is an increasing demand for distributed video annotation. In this paper, we therefore propose a novel distributed video annotation platform that exploits both spatial and temporal information and, on top of it, provides higher-level semantic information. The proposed framework is divided into two parts: spatial annotation and spatiotemporal annotation. To this end, we propose a spatiotemporal descriptor, namely volume local directional ternary pattern-three orthogonal planes (VLDTP–TOP), implemented in a distributed manner using Spark. Moreover, we implemented several state-of-the-art appearance-based and spatiotemporal feature descriptors on top of Spark. We also provide distributed video annotation services for end users, along with APIs that developers can use to build new video annotation algorithms. Because no existing video annotation dataset provides ground truth for both spatial and temporal information, we introduce such a dataset, namely STAD. An extensive experimental analysis was performed to validate the performance and scalability of the proposed feature descriptors and demonstrated the effectiveness of our approach.
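The distributed, per-frame descriptor computation described in this abstract can be sketched as a generic PySpark job. This is not the paper's VLDTP-TOP implementation; a placeholder intensity histogram stands in for the real descriptor, and only the general pattern of parallelizing frame-level feature extraction over Spark is shown.

```python
# Generic sketch of distributing a per-frame descriptor with PySpark.
# The descriptor below is a placeholder, not the paper's VLDTP-TOP.
import numpy as np
from pyspark.sql import SparkSession

def frame_descriptor(frame: np.ndarray) -> list:
    """Toy appearance descriptor: a normalized 16-bin intensity histogram."""
    hist, _ = np.histogram(frame, bins=16, range=(0, 255))
    return (hist / max(hist.sum(), 1)).tolist()

spark = SparkSession.builder.appName("video-annotation-sketch").getOrCreate()
sc = spark.sparkContext

# In practice frames would be decoded from video files on shared storage;
# random frames are used here so the sketch stays self-contained.
frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(100)]

descriptors = (
    sc.parallelize(list(enumerate(frames)))
      .mapValues(frame_descriptor)   # compute descriptors in parallel
      .collect()
)
print(len(descriptors), len(descriptors[0][1]))
spark.stop()
```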
Video Region Annotation with Sparse Bounding Boxes
Video analysis has been moving towards more detailed interpretation (e.g., segmentation) with encouraging progress. These tasks, however, increasingly rely on training data that are densely annotated in both space and time. Since such annotation is labor-intensive, few densely annotated video datasets with detailed region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparsely annotated bounding boxes of target regions. We achieve this with a Volumetric Graph Convolutional Network (VGCN), which learns to iteratively find keypoints on the region boundaries using the spatio-temporal volume of surrounding appearance and motion. We show that the global optimization of VGCN leads to more accurate annotation that generalizes better. Experimental results on three recent datasets (two real and one synthetic), including ablation studies, demonstrate the effectiveness and superiority of our method.
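The abstract does not spell out the VGCN architecture, but its core operation, mixing features between neighboring keypoints on a region boundary via graph convolution, can be illustrated generically. The sketch below uses plain degree-normalized graph convolution in PyTorch on a ring graph of boundary keypoints; this is an assumption for illustration, not the authors' volumetric variant, which additionally draws on spatio-temporal appearance and motion.

```python
# Generic graph convolution over boundary keypoints (illustrative only;
# the paper's VGCN additionally uses spatio-temporal appearance and motion).
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain degree-normalized graph convolution: H' = D^-1 (A + I) H W."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))   # add self-loops
        deg = a_hat.sum(dim=1, keepdim=True)   # node degrees
        return self.linear(a_hat @ x / deg)    # normalized neighbor mixing

# 40 keypoints on a region boundary, connected in a ring graph.
n, feat_dim = 40, 64
adj = torch.zeros(n, n)
idx = torch.arange(n)
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0

features = torch.randn(n, feat_dim)            # per-keypoint features
layer1, layer2 = GraphConv(feat_dim, 64), GraphConv(64, 2)
offsets = layer2(torch.relu(layer1(features, adj)), adj)  # (dx, dy) per keypoint
print(offsets.shape)  # torch.Size([40, 2])
```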
Validation of Sensor-Based Food Intake Detection by Multicamera Video Observation in an Unconstrained Environment
Video observations have been widely used to provide ground truth for wearable systems that monitor food intake in controlled laboratory conditions; however, video observation requires participants to be confined to a defined space. The purpose of this analysis was to test an alternative approach for establishing activity types and food intake bouts in a relatively unconstrained environment. The accuracy of a wearable system for assessing food intake was compared with that of video observation, and inter-rater reliability of the annotation was also evaluated. Forty participants were enrolled. Multiple participants were monitored simultaneously in a 4-bedroom apartment using six cameras for three days each. Participants could leave the apartment overnight and for short periods during the day, during which time monitoring did not take place. A wearable system (Automatic Ingestion Monitor, AIM) was used to detect and monitor participants’ food intake at a resolution of 30 s using a neural network classifier. Two food intake detection models were tested, one trained on data from an earlier study and the other on the current study’s data using leave-one-out cross-validation. Three trained human raters annotated the videos for major activities of daily living, including eating, drinking, resting, walking, and talking, and further annotated individual bites and chewing bouts for each food intake bout. For inter-rater reliability, the raters achieved an average (±standard deviation, STD) kappa value of 0.74 (±0.02) for activity annotation and an average kappa (Light’s kappa) of 0.82 (±0.04) for food intake annotation. Validity results showed that AIM food intake detection matched human video-annotated food intake with kappas of 0.77 (±0.10) for activity annotation and 0.78 (±0.12) for food intake bout annotation. Results of a one-way ANOVA suggest that there are no statistically significant differences between the average eating durations estimated from the raters’ annotations and from the AIM predictions (p-value = 0.19). These results suggest that the AIM provides accuracy comparable to video observation and may be used to reliably detect food intake in multi-day observational studies.
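Because the reliability and validity results in this abstract are all reported as kappa statistics, a brief sketch of how such agreement is computed may be useful. The example below uses scikit-learn's cohen_kappa_score on made-up epoch labels; Light's kappa, used in the study for more than two raters, is simply the average of the pairwise Cohen's kappas.

```python
# Agreement between two label sequences (e.g., 30 s epochs), illustrated
# with Cohen's kappa. The labels below are made up for the example.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

rater_a = ["eat", "eat", "rest", "walk", "eat", "rest", "rest", "talk"]
rater_b = ["eat", "rest", "rest", "walk", "eat", "rest", "talk", "talk"]

print(cohen_kappa_score(rater_a, rater_b))  # pairwise Cohen's kappa

# Light's kappa for >2 raters: average Cohen's kappa over all rater pairs.
raters = {
    "r1": rater_a,
    "r2": rater_b,
    "r3": ["eat", "eat", "rest", "walk", "eat", "rest", "rest", "walk"],
}
pairwise = [cohen_kappa_score(raters[i], raters[j])
            for i, j in combinations(raters, 2)]
print(sum(pairwise) / len(pairwise))  # Light's kappa
```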
Using expansive learning to design and implement video-annotated peer feedback in an undergraduate general education module
Existing studies have measured the effect of video-based feedback on student performance or satisfaction. Other issues are underacknowledged or merit further investigation. These include sociocultural aspects that may shape the design and implementation of video-based feedback, the ways students use technology to engage in feedback, and the processes through which technology may transform learning. This study investigates the design and implementation of a video-annotated peer feedback activity to develop students’ presentation skills and knowledge of climate science. It explores how their use of a video annotation tool re-mediated established feedback practices and how the systematic analysis of contradictions in emerging practices informed the subsequent redesign and reimplementation of the approach. Employing a formative intervention design, the researchers intervened in the activity system of a first-year undergraduate education module to facilitate two cycles of expansive learning with an instructor and two groups of Hong Kong Chinese students (n = 97, n = 94) across two semesters. Instructor interviews, student surveys, and video annotation and system data were analysed using Activity Theory-derived criteria to highlight contradictions in each system and suggest how these could be overcome. The findings highlight the critical importance of active instructor facilitation; building student motivation by embedding social-affective support and positioning peer feedback as an integrated, formative process; and supporting students’ use of appropriate cognitive scaffolding to encourage their interactive, efficient use of the annotation tool. Conclusions: In a field dominated by experimental and quasi-experimental studies, this study reveals how an Activity Theory-derived research design and framework can be used to systemically analyse cycles of design and implementation of video-annotated peer feedback. It also suggests how the new activity system might be consolidated and generalised.
Semi-automation of gesture annotation by machine learning and human collaboration
Gesture and multimodal communication researchers typically annotate video data manually, even though this can be a very time-consuming task. In the present work, a method to detect gestures is proposed as a fundamental step towards a semi-automatic gesture annotation tool. The proposed method can be applied to RGB videos and requires annotations of part of a video as input. The technique deploys a pose estimation method and active learning. In the experiment, it is shown that if about 27% of the video is annotated, the remaining parts of the video can be annotated automatically with an F-score of at least 0.85. Users can run this tool with a small number of annotations first. If the predicted annotations for the remainder of the video are not satisfactory, users can add further annotations and run the tool again. The code has been released so that other researchers and practitioners can use the results of this research. This tool has been confirmed to work in conjunction with ELAN.
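The annotate-part, predict-the-rest workflow described in this abstract can be sketched as a small pool-based active-learning loop: train a classifier on the frames annotated so far, predict the remainder, and send the least certain frames back to the user before re-running. The features and classifier below are placeholders, not the paper's pose-estimation pipeline.

```python
# Sketch of a pool-based active-learning loop for frame-wise gesture labels.
# Features and model are placeholders (the paper uses pose-estimation features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_frames, n_feats = 2000, 34            # e.g., 17 pose keypoints x (x, y)
X = rng.normal(size=(n_frames, n_feats))
y = rng.integers(0, 2, size=n_frames)   # 1 = gesture, 0 = no gesture (toy labels)

labeled = set(range(0, n_frames, 10))   # start from a sparse manual annotation
for _ in range(3):
    train_idx = sorted(labeled)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])

    pool = np.array(sorted(set(range(n_frames)) - labeled))
    proba = clf.predict_proba(X[pool])[:, 1]
    uncertainty = np.abs(proba - 0.5)           # closest to 0.5 = least certain
    query = pool[np.argsort(uncertainty)[:50]]  # frames to hand back to the user
    labeled.update(query.tolist())              # user "annotates" them here

# Remaining frames receive the model's predictions as draft annotations.
rest = np.array(sorted(set(range(n_frames)) - labeled))
draft_labels = clf.predict(X[rest])
print(len(labeled), len(rest))
```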
Current Trends and Future Directions of Large Scale Image and Video Annotation: Observations From Four Years of BIIGLE 2.0
Marine imaging has evolved from small, narrowly focussed applications to large-scale applications covering areas of several hundred square kilometers or time series covering observation periods of several months. The analysis and interpretation of the accumulating large volume of digital images or videos will continue to challenge the marine science community to keep this process efficient and effective. It is safe to say that any strategy will rely on some software platform supporting manual image and video annotation, either for a direct manual annotation-based analysis or for collecting training data to deploy a machine learning-based approach for (semi-)automatic annotation. This paper describes how computer-assisted manual full-frame image and video annotation is currently performed in marine science and how it can evolve to keep up with the increasing demand for image and video annotation and the growing volume of imaging data. As an example, observations are presented on how the image and video annotation tool BIIGLE 2.0 has been used by an international community of more than one thousand users in the last 4 years. In addition, new features and tools are presented to show how BIIGLE 2.0 has evolved over the same time period: video annotation, support for large images in the gigapixel range, machine learning assisted image annotation, improved mobility and affordability, application instance federation, and enhanced label tree collaboration. The observations indicate that, despite the novel concepts and tools introduced by BIIGLE 2.0, full-frame image and video annotation is still mostly done in the same way as two decades ago, where single users annotated subsets of image collections or single video frames with limited computational support. We encourage researchers to review their protocols for education and annotation, making use of newer technologies and tools to improve the efficiency and effectiveness of image and video annotation in marine science.