Catalogue Search | MBRL
153 result(s) for "Padoy, Nicolas"
A generalizable approach for multi-view 3D human pose regression
by Padoy, Nicolas; Kadkhodamohammadi, Abdolrahim
in Annotations; Cameras; Communications Engineering
2021
Despite the significant improvement in the performance of monocular pose estimation approaches and their ability to generalize to unseen environments, multi-view approaches are often lagging behind in terms of accuracy and are specific to certain datasets. This is mainly due to the fact that (1) contrary to real-world single-view datasets, multi-view datasets are often captured in controlled environments to collect precise 3D annotations, which do not cover all real-world challenges, and (2) the model parameters are learned for specific camera setups. To alleviate these problems, we propose a two-stage approach to detect and estimate 3D human poses, which separates single-view pose detection from multi-view 3D pose estimation. This separation enables us to utilize each dataset for the right task, i.e. single-view datasets for constructing robust pose detection models and multi-view datasets for constructing precise multi-view 3D regression models. In addition, our 3D regression approach only requires 3D pose data and its projections to the views for building the model, hence removing the need for collecting annotated data from the test setup. Our approach can therefore be easily generalized to a new environment by simply projecting 3D poses into 2D during training according to the camera setup used at test time. As 2D poses are collected at test time using a single-view pose detector, which might generate inaccurate detections, we model its characteristics and incorporate this information during training. We demonstrate that incorporating the detector’s characteristics is important to build a robust 3D regression model and that the resulting regression model generalizes well to new multi-view environments. Our evaluation results show that our approach achieves competitive results on the Human3.6M dataset and significantly improves results on a multi-view clinical dataset that is the first multi-view dataset generated from live surgery recordings.
Journal Article
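The abstract above hinges on projecting 3D poses into 2D according to the test-time camera setup. This is not the authors' code, just a minimal pinhole-projection sketch (function name, camera parameters, and joint coordinates are made up for illustration):

```python
import numpy as np

def project_poses(poses_3d, K, R, t):
    """Project 3D joint positions into a camera view (pinhole model).

    poses_3d: (N, J, 3) array of joints in world coordinates.
    K: (3, 3) intrinsics; R: (3, 3) rotation; t: (3,) translation.
    Returns (N, J, 2) pixel coordinates.
    """
    cam = poses_3d @ R.T + t               # world -> camera coordinates
    proj = cam @ K.T                       # apply intrinsics
    return proj[..., :2] / proj[..., 2:3]  # perspective divide

# Toy camera looking down +z: focal length 1000, principal point (500, 500)
K = np.array([[1000., 0., 500.], [0., 1000., 500.], [0., 0., 1.]])
R, t = np.eye(3), np.zeros(3)
pose = np.array([[[0., 0., 2.], [0.5, 0., 2.]]])  # one "pose" with two joints
uv = project_poses(pose, K, R, t)
```

Repeating this per camera in the target setup yields the synthetic 2D training views the paper describes, without annotating data from the test environment.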
Computer vision in surgery: from potential to clinical value
2022
Hundreds of millions of operations are performed worldwide each year, and the rising uptake in minimally invasive surgery has enabled fiber optic cameras and robots to become both important tools to conduct surgery and sensors from which to capture information about surgery. Computer vision (CV), the application of algorithms to analyze and interpret visual data, has become a critical technology through which to study the intraoperative phase of care with the goals of augmenting surgeons’ decision-making processes, supporting safer surgery, and expanding access to surgical care. While much work has been performed on potential use cases, there are currently no CV tools widely used for diagnostic or therapeutic applications in surgery. Using laparoscopic cholecystectomy as an example, we reviewed current CV techniques that have been applied to minimally invasive surgery and their clinical applications. Finally, we discuss the challenges and obstacles that remain to be overcome for broader implementation and adoption of CV in surgery.
Journal Article
SAGES consensus recommendations on an annotation framework for surgical video
by Madani, Amin; Hashimoto, Daniel A; Altieri, Maria S
in Algorithms; Annotations; Artificial intelligence
2021
Background: The growing interest in analysis of surgical video through machine learning has led to increased research efforts; however, common methods of annotating video data are lacking. There is a need to establish recommendations on the annotation of surgical video data to enable assessment of algorithms and multi-institutional collaboration. Methods: Four working groups were formed from a pool of participants that included clinicians, engineers, and data scientists. The working groups were focused on four themes: (1) temporal models, (2) actions and tasks, (3) tissue characteristics and general anatomy, and (4) software and data structure. A modified Delphi process was utilized to create a consensus survey based on suggested recommendations from each of the working groups. Results: After three Delphi rounds, consensus was reached on recommendations for annotation within each of these domains. A hierarchy for annotation of temporal events in surgery was established. Conclusions: While additional work remains to achieve accepted standards for video annotation in surgery, the consensus recommendations on a general framework for annotation presented here lay the foundation for standardization. This type of framework is critical to enabling diverse datasets, performance benchmarks, and collaboration.
Journal Article
Surgical data science for next-generation interventions
by
Kikinis, Ron
,
Hashizume, Makoto
,
Vedula, Swaroop S.
in
692/700
,
692/700/565/545
,
Biomedical Engineering/Biotechnology
2017
Interventional healthcare will evolve from an artisanal craft based on the individual experiences, preferences and traditions of physicians into a discipline that relies on objective decision-making on the basis of large-scale data from heterogeneous sources.
Journal Article
Multicentric validation of EndoDigest: a computer vision platform for video documentation of the critical view of safety in laparoscopic cholecystectomy
2022
Background: A computer vision (CV) platform named EndoDigest was recently developed to facilitate the use of surgical videos. Specifically, EndoDigest automatically provides short video clips to effectively document the critical view of safety (CVS) in laparoscopic cholecystectomy (LC). The aim of the present study is to validate EndoDigest on a multicentric dataset of LC videos. Methods: LC videos from 4 centers were manually annotated with the time of the cystic duct division and an assessment of CVS criteria. Incomplete recordings, bailout procedures and procedures with an intraoperative cholangiogram were excluded. EndoDigest leveraged predictions of deep learning models for workflow analysis in a rule-based inference system designed to estimate the time of the cystic duct division. Performance was assessed by computing the error in estimating the manually annotated time of the cystic duct division. To provide concise video documentation of CVS, EndoDigest extracted video clips showing the 2 min preceding and the 30 s following the predicted cystic duct division. The relevance of the documentation was evaluated by assessing CVS in automatically extracted 2.5-min-long video clips. Results: 144 of the 174 LC videos from 4 centers were analyzed. EndoDigest located the time of the cystic duct division with a mean error of 124.0 ± 270.6 s despite the use of fluorescent cholangiography in 27 procedures and great variations in surgical workflows across centers. The surgical evaluation found that 108 (75.0%) of the automatically extracted short video clips documented CVS effectively. Conclusions: EndoDigest was robust enough to reliably locate the time of the cystic duct division and efficiently video document CVS despite the highly variable workflows. Training specifically on data from each center could improve results; however, this multicentric validation shows the potential for clinical translation of this surgical data science tool to efficiently document surgical safety.
Journal Article
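The clip-extraction rule in the abstract above (2 min before and 30 s after the predicted cystic duct division) is simple to state in code. This is a hypothetical sketch, not EndoDigest's implementation; the function name and defaults are assumptions:

```python
def clip_window(division_time_s, before_s=120.0, after_s=30.0, video_len_s=None):
    """Return (start, end) in seconds for a documentation clip around the
    predicted cystic duct division: 2 min before and 30 s after by default,
    clamped to the bounds of the recording."""
    start = max(0.0, division_time_s - before_s)
    end = division_time_s + after_s
    if video_len_s is not None:
        end = min(end, video_len_s)
    return start, end

start, end = clip_window(600.0)  # division predicted at 10:00 into the video
```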
Preserving privacy in surgical video analysis using a deep learning classifier to identify out-of-body scenes in endoscopic videos
by
Vardazaryan, Armine
,
Mutter, Didier
,
Lavanchy, Joël L.
in
639/705/117
,
692/308/575
,
Cholecystectomy
2023
Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if the endoscopic camera is moved out of the body of patients and out-of-body scenes are recorded. Therefore, identification of out-of-body scenes in endoscopic videos is of major importance to preserve the privacy of patients and operating room staff. This study developed and validated a deep learning model for the identification of out-of-body images in endoscopic videos. The model was trained and evaluated on an internal dataset of 12 different types of laparoscopic and robotic surgeries and was externally validated on two independent multicentric test datasets of laparoscopic gastric bypass and cholecystectomy surgeries. Model performance was evaluated compared to human ground truth annotations measuring the receiver operating characteristic area under the curve (ROC AUC). The internal dataset consisting of 356,267 images from 48 videos and the two multicentric test datasets consisting of 54,385 and 58,349 images from 10 and 20 videos, respectively, were annotated. The model identified out-of-body images with 99.97% ROC AUC on the internal test dataset. Mean ± standard deviation ROC AUC on the multicentric gastric bypass dataset was 99.94 ± 0.07% and 99.71 ± 0.40% on the multicentric cholecystectomy dataset, respectively. The model can reliably identify out-of-body images in endoscopic videos and is publicly shared. This facilitates privacy preservation in surgical video analysis.
Journal Article
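The out-of-body study above reports ROC AUC against human ground truth. As a reminder of what that metric computes, here is a minimal rank-based (Mann-Whitney) AUC on toy data; this is an illustrative sketch, not the study's evaluation code:

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank formulation: the probability that a randomly
    chosen positive frame scores higher than a randomly chosen negative
    frame, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy frames: 1 = out-of-body, 0 = in-body; scores = model confidence
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
```

An AUC near 99.97%, as reported, means almost every out-of-body frame is ranked above every in-body frame.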
Formalizing video documentation of the Critical View of Safety in laparoscopic cholecystectomy: a step towards artificial intelligence assistance to improve surgical safety
2020
Background: In laparoscopic cholecystectomy (LC), achievement of the Critical View of Safety (CVS) is commonly advocated to prevent bile duct injuries (BDI). However, BDI rates remain stable, probably due to inconsistent application or a poor understanding of CVS as well as unreliable reporting. Objective video reporting could serve for quality auditing and help generate consistent datasets for deep learning models aimed at intraoperative assistance. In this study, we develop and test a method to report CVS using videos. Method: LC videos performed at our institution were retrieved and the video segments starting 60 s prior to the division of cystic structures were edited. Two independent reviewers assessed CVS using an adaptation of the doublet view 6-point scale and a novel binary method in which each criterion is considered either achieved or not. Feasibility to assess CVS in the edited video clips and inter-rater agreements were evaluated. Results: CVS was attempted in 78 out of the 100 LC videos retrieved. CVS was assessable in 100% of the 60-s video clips. After mediation, CVS was achieved in 32/78 (41.03%). Kappa scores of inter-rater agreements using the doublet view versus the binary assessment were as follows: 0.54 versus 0.75 for CVS achievement, 0.45 versus 0.62 for the dissection of the hepatocystic triangle, 0.36 versus 0.77 for the exposure of the lower part of the cystic plate, and 0.48 versus 0.79 for the 2 structures connected to the gallbladder. Conclusions: The present study is the first to formalize a reproducible method for objective video reporting of CVS in LC. Minute-long video clips provide information on CVS and binary assessment yields a higher inter-rater agreement than previously used methods. These results offer an easy-to-implement strategy for objective video reporting of CVS, which could be used for quality auditing, scientific communication, and development of deep learning models for intraoperative guidance.
Journal Article
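The kappa scores above measure chance-corrected agreement between the two reviewers. For a binary criterion (achieved or not), Cohen's kappa is straightforward to compute; this sketch on made-up ratings is for illustration only and is not the study's analysis code:

```python
def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' binary judgments
    (e.g. CVS criterion achieved = 1, not achieved = 0)."""
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n        # observed agreement
    p1_yes = sum(r1) / n                                 # rater 1 "achieved" rate
    p2_yes = sum(r2) / n                                 # rater 2 "achieved" rate
    pe = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)   # chance agreement
    return (po - pe) / (1 - pe)

ratings_a = [1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical reviewer A
ratings_b = [1, 1, 0, 1, 1, 0, 1, 0]  # hypothetical reviewer B
kappa = cohen_kappa(ratings_a, ratings_b)
```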
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy
by
Mutter, Didier
,
Mascagni, Pietro
,
Okamoto, Nariaki
in
639/166/985
,
639/705/117
,
692/698/2741/44
2025
Minimally invasive image-guided surgery heavily relies on vision. Deep learning models for surgical video analysis can support surgeons in visual tasks such as assessing the critical view of safety (CVS) in laparoscopic cholecystectomy, potentially contributing to surgical safety and efficiency. However, the performance, reliability, and reproducibility of such models are deeply dependent on the availability of data with high-quality annotations. To this end, we release Endoscapes2023, a dataset comprising 201 laparoscopic cholecystectomy videos with regularly spaced frames annotated with segmentation masks of surgical instruments and hepatocystic anatomy, as well as assessments of the criteria defining the CVS by three trained surgeons following a public protocol. Endoscapes2023 enables the development of models for object detection, semantic and instance segmentation, and CVS prediction, contributing to safe laparoscopic cholecystectomy.
Journal Article
Segmentation-Free Estimation of Left Ventricular Ejection Fraction Using 3D CNN Is Reliable and Improves as Multiple Cardiac MRI Cine Orientations Are Combined
by
Roy, Catherine
,
Vardazaryan, Armine
,
Labani, Aissam
in
Artificial intelligence
,
cardiac MRI
,
Classification
2024
Objectives: We aimed to study classical, publicly available convolutional neural networks (3D-CNNs) using a combination of several cine-MR orientation planes for the estimation of left ventricular ejection fraction (LVEF) without contour tracing. Methods: Cine-MR examinations carried out on 1082 patients from our institution were analysed by comparing the LVEF provided by the CVI42 software (V5.9.3) with the estimation resulting from different 3D-CNN models and various combinations of long- and short-axis orientation planes. Results: The 3D-Resnet18 architecture appeared to be the most favourable, and the results gradually and significantly improved as several long-axis and short-axis planes were combined. Simply pasting multiple orientation views into composite frames increased performance. Optimal results were obtained by pasting two long-axis views and six short-axis views. The best configuration provided an R2 = 0.83, a mean absolute error (MAE) = 4.97, and a root mean square error (RMSE) = 6.29; the area under the ROC curve (AUC) for the classification of LVEF < 40% was 0.99, and for the classification of LVEF > 60%, the AUC was 0.97. Internal validation performed on 149 additional patients after model training provided very similar results (MAE 4.98). External validation carried out on 62 patients from another institution showed an MAE of 6.59. Our results in this area are among the most promising obtained to date using CNNs with cardiac magnetic resonance. Conclusion: (1) The use of traditional 3D-CNNs and a combination of multiple orientation planes is capable of estimating LVEF from cine-MRI data without segmenting ventricular contours, with a reliability similar to that of traditional methods. (2) Performance significantly improves as the number of orientation planes increases, providing a more complete view of the left ventricle.
Journal Article
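The LVEF study above obtained its best results by "simply pasting multiple orientation views into composite frames" (two long-axis plus six short-axis views). A minimal sketch of that pasting step, assuming equally sized grayscale views and a 2×4 grid (not the authors' code):

```python
import numpy as np

def composite_frame(views, rows, cols):
    """Paste equally sized 2D views into one grid image, e.g. 2 long-axis
    and 6 short-axis cine-MR views into a 2x4 composite frame."""
    h, w = views[0].shape
    canvas = np.zeros((rows * h, cols * w), dtype=views[0].dtype)
    for i, v in enumerate(views):
        r, c = divmod(i, cols)  # fill the grid row by row
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = v
    return canvas

# Eight dummy 64x64 views, each filled with its own index for visibility
views = [np.full((64, 64), i, dtype=np.float32) for i in range(8)]
frame = composite_frame(views, rows=2, cols=4)
```

The composite is then fed to a standard 3D CNN as a single input stream, which is how the paper combines orientations without changing the network architecture.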
Deep Learning to Classify AL versus ATTR Cardiac Amyloidosis MR Images
by
Roy, Catherine
,
Vardazaryan, Armine
,
Labani, Aissam
in
algorithm vs. human comparison
,
Amyloidosis
,
Biopsy
2023
The aim of this work was to compare the classification of cardiac MR images of AL versus ATTR amyloidosis by neural networks and by experienced human readers. Cine-MR images and late gadolinium enhancement (LGE) images of 120 patients were studied (70 AL and 50 ATTR). A VGG16 convolutional neural network (CNN) was trained with a 5-fold cross-validation process, taking care to strictly assign all images of a given patient to either the training group or the test group. The analysis was performed at the patient level by averaging the predictions obtained for each image. The classification accuracy obtained between AL and ATTR amyloidosis was 0.750 for cine-CNN, 0.611 for gado-CNN and between 0.617 and 0.675 for human readers. The corresponding AUC of the ROC curve was 0.839 for cine-CNN, 0.679 for gado-CNN (p < 0.004 vs. cine) and 0.714 for the best human reader (p < 0.007 vs. cine). Logistic regression combining cine-CNN and gado-CNN, as well as analysis focused on the specific orientation plane, did not change the overall results. We conclude that cine-CNN discriminates between AL and ATTR amyloidosis significantly better than gado-CNN or human readers, but with lower performance than reported in studies where visual diagnosis is easy, and that it remains suboptimal for clinical practice.
Journal Article
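The amyloidosis study above classifies at the patient level "by averaging the predictions obtained for each image." A minimal sketch of that aggregation step on made-up probabilities (illustrative only; identifiers and the 0.5 threshold are assumptions, not the authors' code):

```python
from collections import defaultdict

def patient_level_predictions(image_preds, threshold=0.5):
    """Average per-image AL probabilities into one call per patient.

    image_preds: list of (patient_id, p_AL) pairs, one per image, where
    p_AL is the predicted probability of AL (vs ATTR) amyloidosis.
    Returns {patient_id: "AL" or "ATTR"}.
    """
    scores = defaultdict(list)
    for pid, p in image_preds:
        scores[pid].append(p)
    return {pid: ("AL" if sum(ps) / len(ps) >= threshold else "ATTR")
            for pid, ps in scores.items()}

preds = [("p1", 0.9), ("p1", 0.7), ("p2", 0.2), ("p2", 0.4), ("p2", 0.3)]
calls = patient_level_predictions(preds)
```

Averaging over all of a patient's images smooths out per-frame noise, which is also why the study strictly kept each patient's images within a single fold.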