40,343 result(s) for "Image Processing and Computer Vision"
Robotics, Vision and Control: Fundamental Algorithms in MATLAB®, Second, Completely Revised, Extended and Updated Edition
Robotic vision, the combination of robotics and computer vision, involves the application of computer algorithms to data acquired from sensors. The research community has developed a large body of such algorithms, but for a newcomer to the field this can be quite daunting. For over 20 years the author has maintained two open-source MATLAB® Toolboxes, one for robotics and one for vision. They provide implementations of many important algorithms and allow users to work with real problems, not just trivial examples. This book makes the fundamental algorithms of robotics, vision and control accessible to all. It weaves together theory, algorithms and examples in a narrative that covers robotics and computer vision separately and together. Using the latest versions of the Toolboxes, the author shows how complex problems can be decomposed and solved using just a few simple lines of code. The topics covered are guided by real problems observed by the author over many years as a practitioner of both robotics and computer vision. It is written in an accessible but informative style, easy to read and absorb, and includes over 1000 MATLAB and Simulink® examples and over 400 figures. The book is a walk through the fundamentals of mobile robots and arm robots, then camera models, image processing, feature extraction and multi-view geometry, finally bringing it all together with an extensive discussion of visual servo systems. This second edition is completely revised, updated and extended with coverage of Lie groups, matrix exponentials and twists; inertial navigation; differential-drive robots; lattice planners; pose-graph SLAM and map making; restructured material on arm-robot kinematics and dynamics; series-elastic actuators and operational-space control; Lab color spaces; light-field cameras; structured light, bundle adjustment and visual odometry; and photometric visual servoing. "An authoritative book, reaching across fields, thoughtfully conceived and brilliantly accomplished!" Oussama Khatib, Stanford.
Predicting image credibility in fake news over social media using multi-modal approach
Social media are the main contributors to the spread of fake images: manipulated images altered through software or other means to change the information they convey. Fake images propagated over microblogging platforms misrepresent events and stimulate polarization among people, so detecting them is critical to mitigating their spread. Because fake images are often accompanied by text, multi-modal frameworks that combine visual and textual feature learning are commonly employed; however, the few multi-modal frameworks proposed so far depend on additional subtasks to learn the correlation between modalities. In this paper, an efficient multi-modal approach for detecting fake images on microblogging platforms is proposed that requires no additional subcomponents. The framework uses the convolutional neural network EfficientNetB0 for images and a sentence transformer for text analysis. The visual and textual feature embeddings are passed through dense layers and then fused to predict whether an image is fake. To validate its effectiveness, the proposed model is tested on the publicly available microblogging datasets MediaEval (Twitter) and Weibo, where it achieves accuracies of 85.3% and 81.2%, respectively. The model is also verified on a newly created Twitter dataset containing images from significant events in India in 2020. The experimental results show that the proposed model outperforms other state-of-the-art multi-modal frameworks.
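The described pipeline reduces to two feature extractors feeding a late-fusion classifier. Below is a minimal PyTorch sketch of that architecture; the hidden sizes, the binary output head, and text_dim=384 (e.g. a MiniLM-style sentence transformer) are assumptions for illustration, not values from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalFakeImageDetector(nn.Module):
    """Sketch of the described pipeline: EfficientNetB0 visual features and
    sentence-transformer text embeddings pass through dense layers, are
    fused, and feed a fake/real classifier. Sizes are assumptions."""
    def __init__(self, text_dim=384, hidden=256):
        super().__init__()
        backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")
        # keep the conv features and global pooling, drop the ImageNet head
        self.visual = nn.Sequential(*list(backbone.children())[:-1])  # (N, 1280, 1, 1)
        self.visual_fc = nn.Sequential(nn.Flatten(), nn.Linear(1280, hidden), nn.ReLU())
        self.text_fc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, 2)  # logits: fake / real

    def forward(self, image, text_emb):
        v = self.visual_fc(self.visual(image))  # visual branch
        t = self.text_fc(text_emb)              # textual branch
        return self.classifier(torch.cat([v, t], dim=1))  # late fusion
```

Training would pair this with a standard cross-entropy loss over the two classes; the sentence embeddings are assumed to be precomputed.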
Enhanced balancing GAN: minority-class image generation
Generative adversarial networks (GANs) are among the most powerful generative models, but they typically require a large and balanced dataset to train, and traditional GANs are ill-suited to generating minority-class images from a highly imbalanced dataset. Balancing GAN (BAGAN) was proposed to mitigate this problem, but it is unstable when images in different classes look similar, e.g., flowers and cells. In this work, we propose a supervised autoencoder with an intermediate embedding model to disperse the labeled latent vectors. With this enhanced autoencoder initialization, we also build an architecture of BAGAN with gradient penalty (BAGAN-GP). Our proposed model overcomes the instability of the original BAGAN and converges faster to high-quality generations. It achieves high performance on imbalanced, scaled-down versions of Fashion-MNIST and CIFAR-10 and on a small-scale medical image dataset. Code is available at https://github.com/GH920/improved-bagan-gp.
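The "GP" in BAGAN-GP refers to a gradient penalty on the discriminator, in the style of WGAN-GP. A minimal PyTorch sketch of that term follows; the discriminator signature taking class labels alongside images is an assumption about this label-conditioned setting.

```python
import torch

def gradient_penalty(discriminator, real, fake, labels, device="cpu"):
    """WGAN-GP-style penalty on interpolates between real and fake images;
    a sketch of the GP term in BAGAN-GP. `fake` should be detached from the
    generator graph before calling."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = discriminator(interp, labels)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    # penalize deviation of the per-sample gradient norm from 1
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((norms - 1) ** 2).mean()
```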
Synthetic Data generation using DCGAN for improved traffic sign recognition
Traffic sign detection and recognition play a vital role in real-world driver guidance applications, including driver assistance systems. Research into vision-based traffic sign detection (TSD) and traffic sign recognition (TSR) has gained considerable attention in the scientific community, driven mainly by three tasks: identification, monitoring, and classification. In addition, TSR provides valuable details and alerts for smart cars, including advanced driver assistance systems (ADAS) and cooperative intelligent transport systems (CITS). Our work generates high-quality synthetic prohibitory-sign images using a deep convolutional generative adversarial network (DCGAN). This paper analyzes and discusses CNN models incorporating different backbone architectures and feature extractors, focusing on ResNet-50 and DenseNet for object detection. Assessment of the models provides key information, including mean average precision (mAP), workspace capacity, detection time, and the number of billion floating-point operations (BFLOPs). The highest mAP is 92% (DenseNet + DCGAN), followed by 91% (ResNet-50 + DCGAN), 88% (DenseNet), and 63% (ResNet-50). We find that when both original and synthetic images are used for training, accuracy increases while detection time falls. Our findings show that combining original and synthetic images in the training dataset can improve intersection over union (IoU) and traffic sign recognition performance.
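For readers unfamiliar with DCGAN, the generator below is a standard PyTorch rendering of the Radford et al. architecture the paper builds on; the 64x64 output resolution, latent size of 100, and channel widths are conventional defaults, not values reported in this paper.

```python
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Standard DCGAN generator: a latent vector is upsampled through
    strided transposed convolutions with BatchNorm and ReLU, ending in
    a Tanh-activated RGB image."""
    def __init__(self, z_dim=100, ngf=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0, bias=False),   # -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, channels, 4, 2, 1, bias=False),    # -> 64x64
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):  # z: (N, z_dim, 1, 1)
        return self.net(z)
```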
Exploiting Diffusion Prior for Real-World Image Super-Resolution
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution. Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.
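The controllable trade-off described above amounts to scaling a fidelity-driven correction of the generative decoder features by a user-chosen scalar at inference time. The sketch below illustrates one plausible form of such a feature wrapping step; the concrete convolutional design is an assumption, and the actual module lives in the linked StableSR repository.

```python
import torch
import torch.nn as nn

class ControllableFeatureWrap(nn.Module):
    """Sketch of a controllable feature wrapping step: encoder (fidelity)
    features modulate decoder (generative) features, with a scalar w set
    at inference to trade quality against fidelity. The fusion conv is an
    assumption, not the paper's exact design."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_dec, feat_enc, w=0.5):
        residual = self.fuse(torch.cat([feat_dec, feat_enc], dim=1))
        # w = 0 keeps the purely generative decoder features (quality);
        # w = 1 applies the full fidelity-driven correction
        return feat_dec + w * residual
```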
Group Normalization
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling a wide variety of networks to train. However, normalizing along the batch dimension introduces problems: BN's error increases rapidly as the batch size shrinks, because batch statistics become inaccurate to estimate. This limits BN's usage for training larger models and for transferring features to computer vision tasks such as detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes, within each group, the mean and variance for normalization. GN's computation is independent of batch size, and its accuracy is stable across a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart at a batch size of 2; at typical batch sizes, GN is comparable to BN and outperforms other normalization variants. Moreover, GN transfers naturally from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation on COCO (https://github.com/facebookresearch/Detectron/blob/master/projects/GN) and for video classification on Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
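The closing claim is easy to make concrete: below is a NumPy sketch of GN for an NCHW tensor, following the per-group mean/variance computation the abstract describes (the paper itself gives similar few-line pseudocode). It assumes the channel count C is divisible by the number of groups G.

```python
import numpy as np

def group_norm(x, gamma, beta, G=32, eps=1e-5):
    """Group Normalization for an NCHW array x, with per-channel scale
    gamma and shift beta (each of shape (C,)). C must be divisible by G."""
    N, C, H, W = x.shape
    x = x.reshape(N, G, C // G, H, W)
    # statistics are computed per sample over each group of channels,
    # so the result is independent of the batch size N
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    # learnable per-channel affine transform, as in BN
    return x * gamma.reshape(1, C, 1, 1) + beta.reshape(1, C, 1, 1)
```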
Deep Image Deblurring: A Survey
Image deblurring is a classic problem in low-level computer vision that aims to recover a sharp image from a blurred input. Advances in deep learning have led to significant progress on this problem, and a large number of deblurring networks have been proposed. This paper presents a comprehensive and timely survey of recently published deep learning-based image deblurring approaches, aiming to serve the community as a useful literature review. We start by discussing common causes of image blur, introducing benchmark datasets and performance metrics, and summarizing different problem formulations. Next, we present a taxonomy of convolutional neural network (CNN) based methods organized by architecture, loss function, and application, offering a detailed review and comparison. In addition, we discuss domain-specific deblurring applications, including face images, text, and stereo image pairs. We conclude by discussing key challenges and future research directions.
Deep Learning for Generic Object Detection: A Survey
Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.
SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion
In this paper, a squeeze-and-decomposition network (SDNet) is proposed to perform multi-modal and digital-photography image fusion in real time. First, we cast multiple fusion problems as the extraction and reconstruction of gradient and intensity information, and accordingly design a universal loss function composed of an intensity term and a gradient term. For the gradient term, we introduce an adaptive decision block that decides the optimization target of the gradient distribution according to the texture richness at the pixel scale, guiding the fused image to contain richer texture details. For the intensity term, we adjust the weight of each intensity loss component to change the proportion of intensity information drawn from different source images, so that the loss adapts to multiple image fusion tasks. Second, we introduce the idea of squeeze and decomposition into image fusion: we consider not only the squeeze process from the source images to the fused result, but also the decomposition process from the fused result back to the source images. Because the quality of the decomposed images depends directly on the fused result, this forces the fused result to contain more scene details. Experimental results demonstrate the superiority of our method over the state of the art in terms of subjective visual quality and quantitative metrics across a variety of fusion tasks. Moreover, our method is much faster than state-of-the-art alternatives, enabling real-time fusion.
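To make the two-term loss concrete, here is a minimal PyTorch sketch of an intensity-plus-gradient fusion loss in the spirit of the description above. The fixed source weights, the Sobel gradient operator, and the elementwise-max gradient target standing in for the paper's adaptive decision block are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F

# Sobel kernel for horizontal gradients; its transpose gives the vertical one
SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

def sobel_grad(img):
    """Per-pixel gradient magnitude for grayscale images of shape (N,1,H,W)."""
    k = SOBEL_X.to(img.device, img.dtype)
    gx = F.conv2d(img, k, padding=1)
    gy = F.conv2d(img, k.transpose(2, 3), padding=1)
    return gx.abs() + gy.abs()

def fusion_loss(fused, src_a, src_b, w_a=0.5, w_b=0.5, lam=1.0):
    """Intensity term weights each source's contribution; the gradient term
    pushes the fused image toward the per-pixel richer texture, approximated
    here by the elementwise max of the source gradients."""
    intensity = w_a * F.l1_loss(fused, src_a) + w_b * F.l1_loss(fused, src_b)
    target_grad = torch.max(sobel_grad(src_a), sobel_grad(src_b))
    gradient = F.l1_loss(sobel_grad(fused), target_grad)
    return intensity + lam * gradient
```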
Learning to Prompt for Vision-Language Models
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that transfer across a wide range of downstream tasks. Unlike traditional representation learning, which is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to downstream tasks via prompting: classification weights are synthesized from natural language describing the classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming; one can spend a significant amount of time tuning words, since a slight change in wording can have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach for adapting CLIP-like vision-language models to downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while keeping the entire set of pre-trained parameters fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp needs as few as one or two shots to beat hand-crafted prompts by a decent margin, and gains significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
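The core mechanism of unified-context CoOp is a small matrix of learnable context vectors shared across classes and prepended to each class-name embedding before CLIP's frozen text encoder. A minimal PyTorch sketch follows; the 512-dimensional context (matching CLIP ViT-B text width), the initialization scale, and the precomputed class-name embeddings are assumptions, and real CoOp also handles start/end tokens and prompt positioning.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """Sketch of CoOp-style unified context: M learnable context vectors are
    shared across classes and concatenated with each class-name embedding to
    form the prompts fed to a frozen text encoder."""
    def __init__(self, class_name_emb, n_ctx=16, ctx_dim=512):
        super().__init__()
        # only these context vectors are trained; CLIP itself stays frozen
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # precomputed token embeddings of the class names: (n_cls, n_tok, ctx_dim)
        self.register_buffer("cls_emb", class_name_emb)

    def forward(self):
        n_cls = self.cls_emb.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)  # shared context
        # prompts of shape (n_cls, n_ctx + n_tok, ctx_dim): "[CTX]... [CLASS]"
        return torch.cat([ctx, self.cls_emb], dim=1)
```

The class-specific variant mentioned in the abstract would simply allocate a separate context matrix per class instead of expanding one shared matrix.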