8 result(s) for "2d/3d convolutional neural networks"
Tropical Cyclone Intensity Estimation Using Multi-Dimensional Convolutional Neural Networks from Geostationary Satellite Data
For a long time, researchers have tried to find a way to analyze tropical cyclone (TC) intensity in real time. Since there is no standardized method for estimating TC intensity and the most widely used method is a manual algorithm using satellite-based cloud images, there is a bias that varies depending on the TC center and shape. In this study, we adopted convolutional neural networks (CNNs), a state-of-the-art approach to analyzing image patterns, to estimate TC intensity by mimicking human cloud pattern recognition. Both two-dimensional CNNs (2D-CNNs) and three-dimensional CNNs (3D-CNNs) were used to analyze the relationship between multi-spectral geostationary satellite images and TC intensity. Our best-optimized model produced a root mean squared error (RMSE) of 8.32 kts, about 35% better than the existing CNN-based model that uses a single-channel image. Moreover, we analyzed the characteristics of multi-spectral satellite-based TC images according to intensity using a heat map, one of the visualization methods for CNNs. The heat map shows that the stronger the TC, the greater the influence of the TC center in the lower atmosphere. This is consistent with the results from the existing TC initialization method with numerical simulations based on dynamical TC models. Our study suggests that a deep learning approach can be used to interpret the behavioral characteristics of TCs.
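The abstract above reports model quality as a root mean squared error in knots. As a minimal sketch of how that metric is computed, the following uses only the Python standard library; the function name `rmse` and the sample intensity values are hypothetical and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical TC intensities in knots, for illustration only.
predicted_kts = [35.0, 60.0, 95.0, 120.0]
observed_kts = [40.0, 55.0, 100.0, 115.0]
print(rmse(predicted_kts, observed_kts))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.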
Unified End-to-End YOLOv5-HR-TCM Framework for Automatic 2D/3D Human Pose Estimation for Real-Time Applications
Three-dimensional human pose estimation is widely applied in sports, robotics, and healthcare. In the past five years, numerous CNN-based studies of 3D human pose estimation have yielded impressive results. However, studies often focus only on improving the accuracy of the estimation results. In this paper, we propose a fast, unified end-to-end model for estimating 3D human pose, called YOLOv5-HR-TCM (YOLOv5-HRet-Temporal Convolution Model). Our proposed model is based on the 2D-to-3D lifting approach for 3D human pose estimation and addresses each step in the estimation process: person detection, 2D human pose estimation, and 3D human pose estimation. The proposed model combines best practices at each stage. It is evaluated on the Human 3.6M dataset and compared with other methods at each step. The method achieves high accuracy without sacrificing processing speed. The whole process runs at an estimated 3.146 FPS on a low-end computer. In particular, we propose a sports scoring application based on the deviation angle between the estimated 3D human posture and the standard (reference) origin. The average deviation angle evaluated on the Human 3.6M dataset (Protocol #1–Pro #1) is 8.2 degrees.
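The abstract does not say how the deviation angle is computed. One plausible reading, shown here purely as an assumption, is the angle between corresponding bone vectors of the estimated and reference poses, averaged over bones. The function names and the sample vectors below are hypothetical.

```python
import math


def deviation_angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length vector has no direction")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def mean_deviation_deg(estimated_bones, reference_bones):
    """Average per-bone deviation angle (assumed scoring basis)."""
    angles = [deviation_angle_deg(e, r)
              for e, r in zip(estimated_bones, reference_bones)]
    return sum(angles) / len(angles)


# Hypothetical bone direction vectors, for illustration only.
estimated = [(1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
reference = [(0.0, 1.0, 0.0), (0.0, 0.0, 5.0)]
print(mean_deviation_deg(estimated, reference))
```

The angle depends only on direction, so bones of different lengths in the two poses do not affect the score.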
2D MRI image analysis and brain tumor detection using deep learning CNN model LeU-Net
Analyzing and segmenting MRI images for accurate, automatic detection of brain tumors at an early stage is crucial for diagnosing disorders and saving human lives. Since most deep learning models have a large number of layers, they also take longer to process, making them unsuitable for smaller image datasets. Hence, we propose detecting abnormalities in brain MR images using a less-layered and less complex U-Net model (LeU-Net) architecture. LeU-Net is inspired by the Le-Net and U-Net models but differs from both in design and architecture. Here, abnormality detection means classifying tumorous cells in magnetic resonance images. The performance of the proposed deep learning model (LeU-Net) was compared with the existing basic CNN models Le-Net, U-Net, and VGG-16, using the evaluation metrics accuracy, precision, F-score, recall, and specificity. The experiment was performed on an MR dataset with uncropped images and cropped images (unwanted area removed), and the results were compared with those of all three models. The LeU-Net model achieves 98% overall accuracy on cropped images and 94% accuracy on uncropped images. The LeU-Net model also has a much faster processing (simulation) time: it takes only 244.42 s and 252.36 s, respectively, to train the model for 100 epochs on the uncropped and cropped images. We compared the performance of our proposed model with various state-of-the-art techniques, and it provides the best classification accuracy among them.
Enhancing visual quality of spatial image steganography using SqueezeNet deep learning network
The aims of improving steganographic methods fall into two groups: the first is to make the hiding capacity as high as possible; the second is to make the visible distortion as low as possible. The higher the visual quality of the stego-image, the less suspicious it becomes, which can increase security. However, the distortion caused by embedding data into images is not predictable and is typically image-dependent. If the user has a database of possible cover images, finding a suitable cover image that can sustain high visual quality after embedding is challenging. Thus, an automatic cover selection method is needed. In this paper, the problem of visual quality of the stego-image is tackled as a classification problem, where a CNN-based classifier is employed to select images that can have high imperceptibility after embedding. To achieve that, a CNN was trained to classify images into “High Quality” and “Low Quality”. The CNN was based on the SqueezeNet architecture and was trained in two scenarios: transfer learning and learning from scratch. The two classifiers achieved very high F1 scores of 0.926 and 0.904.
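The abstract above reports classifier quality as F1 scores. As a minimal sketch, F1 is the harmonic mean of precision and recall and can be computed from confusion-matrix counts as follows; the function name and the counts are hypothetical and do not come from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 score: harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a "High Quality" vs "Low Quality" classifier.
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

Unlike plain accuracy, F1 ignores true negatives, which makes it more informative when the two classes are unevenly sized.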
Learning to detect anatomical landmarks of the pelvis in X-rays from arbitrary views
Purpose: Minimally invasive alternatives are now available for many complex surgeries. These approaches are enabled by the increasing availability of intra-operative image guidance. Yet, fluoroscopic X-rays suffer from projective transformation and thus cannot provide direct views onto anatomy. Surgeons could highly benefit from additional information, such as the anatomical landmark locations in the projections, to support intra-operative decision making. However, detecting landmarks is challenging since the viewing direction changes substantially between views, leading to varying appearance of the same landmark. Therefore, and to the best of our knowledge, view-independent anatomical landmark detection has not been investigated yet. Methods: In this work, we propose a novel approach to detect multiple anatomical landmarks in X-ray images from arbitrary viewing directions. To this end, a sequential prediction framework based on convolutional neural networks is employed to simultaneously regress all landmark locations. For training, synthetic X-rays are generated with a physically accurate forward model that allows direct application of the trained model to real X-ray images of the pelvis. View invariance is achieved via data augmentation by sampling viewing angles on a spherical segment of 120° × 90°. Results: On synthetic data, a mean prediction error of 5.6 ± 4.5 mm is achieved. Further, we demonstrate that the trained model can be directly applied to real X-rays and show that these detections define correspondences to a respective CT volume, which allows for analytic estimation of the 11-degree-of-freedom projective mapping. Conclusion: We present the first tool to detect anatomical landmarks in X-ray images independent of their viewing direction. Access to this information during surgery may benefit decision making and constitutes a first step toward global initialization of 2D/3D registration without the need of calibration. As such, the proposed concept has a strong prospect to facilitate and enhance applications and methods in the realm of image-guided surgery.
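The abstract describes view augmentation by sampling viewing angles on a 120° × 90° spherical segment but does not give the sampling scheme. The sketch below assumes the segment is centered on a reference view and that the two angles are drawn independently and uniformly; both choices, and all names used, are assumptions for illustration.

```python
import random


def sample_view_angles(count, span_a_deg=120.0, span_b_deg=90.0, seed=None):
    """Draw (angle_a, angle_b) pairs, in degrees, within the given spans.

    Angles are measured relative to a reference view at (0, 0).
    Uniform, independent sampling is an assumption, not the paper's method.
    """
    rng = random.Random(seed)
    half_a = span_a_deg / 2.0
    half_b = span_b_deg / 2.0
    return [(rng.uniform(-half_a, half_a), rng.uniform(-half_b, half_b))
            for _ in range(count)]


# Each pair would parameterize one synthetic X-ray rendering.
for angle_a, angle_b in sample_view_angles(3, seed=0):
    print(f"{angle_a:+.1f} deg, {angle_b:+.1f} deg")
```

Sampling uniformly in the two angles is not uniform in solid angle on the sphere; a different distribution may suit the application better.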
Dewarping of document images: A semi-CNN based approach
Camera-captured digital documents are often distorted and warped because of varying document surfaces or camera angles, and OCR systems have difficulty reading such distorted images. In this paper, a framework is proposed for dewarping images by estimating the change in pixel positions caused by the unevenness of the surface. First, the changes in pixel positions are measured using warping factors, which depend on warping position and control parameters. The warping control parameters are calculated from the top and bottom text lines of the document. The warping positional parameters are estimated using a convolutional neural network (CNN), which needs many images for training. Because capturing such a large number of images is very difficult, we synthetically generated a warped document image dataset. The proposed dewarping technique works for both alphabetic and alphasyllabary scripts. The results on Bangla (alphasyllabary) and English (alphabetic) are encouraging.
Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room
Purpose: For many years, deep convolutional neural networks have achieved state-of-the-art results on a wide variety of computer vision tasks. 3D human pose estimation is no exception, and results on public benchmarks are impressive. However, specialized domains, such as operating rooms, pose additional challenges. Clinical settings include severe occlusions, clutter and difficult lighting conditions. Privacy concerns of patients and staff make it necessary to use unidentifiable data. In this work, we aim to bring robust human pose estimation to the clinical domain. Methods: We propose a 2D–3D information fusion framework that makes use of a network of multiple depth cameras and strong pose priors. In a first step, probabilities of 2D joints are predicted from single depth images. This information is fused in a shared voxel space, yielding a rough estimate of the 3D pose. Final joint positions are obtained by regressing into the latent pose space of a pre-trained convolutional autoencoder. Results: We evaluate our approach against several baselines on the challenging MVOR dataset. Best results are obtained when fusing 2D information from multiple views and constraining the predictions with learned pose priors. Conclusions: We present a robust 3D human pose estimation framework based on a multi-depth camera network in the operating room. Using depth images as the only input modality makes our approach especially interesting for clinical applications because it preserves the anonymity of patients and staff.