156 result(s) for "Renderers"
3D Tune-In Toolkit: An open-source library for real-time binaural spatialisation
The 3D Tune-In Toolkit (3DTI Toolkit) is an open-source standard C++ library that includes a binaural spatialiser. This paper presents the technical details of this renderer, outlining its architecture and describing the processes implemented in each of its components. To put this description into context, the basic concepts behind binaural spatialisation are reviewed through a chronology of research milestones in the field over the last 40 years. The 3DTI Toolkit renders the anechoic signal path by convolving sound sources with Head Related Impulse Responses (HRIRs), obtained by interpolating those extracted from a set that can be loaded from any file in a standard audio format. Interaural time differences are managed separately, in order to customise the rendering according to the head size of the listener and to reduce comb-filtering when interpolating between different HRIRs. In addition, geometrical and frequency-dependent corrections for simulating near-field sources are included. Reverberation is computed separately using a virtual-loudspeaker Ambisonic approach and convolution with Binaural Room Impulse Responses (BRIRs). In all these processes, special care has been taken to avoid audible artefacts produced by changes in gains and audio filters due to movements of the sources and the listener. The performance of the 3DTI Toolkit, as well as other relevant metrics such as non-linear distortion, is assessed and presented, followed by a comparison between the features offered by the 3DTI Toolkit and those found in other currently available open- and closed-source binaural renderers.
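To make the anechoic signal path concrete, here is a minimal sketch in Python (not the 3DTI Toolkit's C++ API): a mono source is convolved with an HRIR pair interpolated between two measurements, and the interaural time difference is applied as a separate delay, mirroring the separation the abstract describes. The HRIR arrays, interpolation weight, and ITD value are illustrative placeholders.

```python
# Minimal sketch (not the 3DTI Toolkit API): anechoic binaural rendering by
# interpolating between two measured HRIRs and applying the interaural time
# difference (ITD) as a separate delay, as the abstract describes.
# `hrir_a`, `hrir_b`, `weight`, and `itd_samples` are illustrative placeholders.
import numpy as np

def render_binaural(mono, hrir_a, hrir_b, weight, itd_samples):
    """mono: (N,) source; hrir_a/hrir_b: (L, 2) left/right HRIR pairs."""
    # Interpolate HRIRs for a source direction between the two measurements.
    hrir = (1.0 - weight) * hrir_a + weight * hrir_b
    left = np.convolve(mono, hrir[:, 0])
    right = np.convolve(mono, hrir[:, 1])
    # Apply the ITD separately (here: delay the far ear by whole samples),
    # which helps avoid comb filtering when interpolating between HRIRs.
    if itd_samples > 0:
        right = np.concatenate([np.zeros(itd_samples), right])
        left = np.concatenate([left, np.zeros(itd_samples)])
    return np.stack([left, right], axis=1)

# Example: a 1 kHz tone rendered with two dummy HRIRs.
fs = 44100
t = np.arange(fs) / fs
mono = np.sin(2 * np.pi * 1000 * t)
hrir_a = np.zeros((256, 2)); hrir_a[0] = [1.0, 0.6]
hrir_b = np.zeros((256, 2)); hrir_b[0] = [0.8, 0.8]
out = render_binaural(mono, hrir_a, hrir_b, weight=0.5, itd_samples=20)
```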
Utilizing the Neural Renderer for Accurate 3D Face Reconstruction from a Single Image
Recently, deep learning-based methods have shown strong results in 3D face reconstruction. By harnessing the power of convolutional neural networks, researchers have made significant progress in recovering 3D face shapes from single images using the 3D Morphable Model approach. However, training neural networks typically requires a large amount of data, while face images with ground-truth 3D face shapes are scarce. In this paper, we propose an unsupervised learning framework for accurate 3D face reconstruction from a single image. Our key idea is to process the images generated by a differentiable renderer and leverage the advantages of Generative Adversarial Networks to train a powerful neural renderer that produces highly realistic face images resembling the input image. We then modify traditional fitting methods to exploit the advantages of the neural renderer in finding optimal face parameters for improved 3D face reconstruction. The experimental results demonstrate that our method generates more accurate 3D face reconstructions.
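The fitting step the abstract alludes to can be pictured as analysis-by-synthesis: optimise face parameters by back-propagating an image reconstruction loss through a differentiable renderer. The sketch below is a generic PyTorch illustration, not the authors' implementation; `DifferentiableRenderer` is a hypothetical stand-in for their neural renderer.

```python
# Illustrative analysis-by-synthesis loop, not the authors' implementation:
# fit face parameters by back-propagating an image reconstruction loss
# through a differentiable (here: stand-in) renderer.
import torch

class DifferentiableRenderer(torch.nn.Module):
    """Stand-in: maps 3DMM-style parameters to an image, differentiably."""
    def __init__(self, n_params=80, image_size=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_params, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 3 * image_size * image_size), torch.nn.Sigmoid())
        self.image_size = image_size
    def forward(self, params):
        img = self.net(params)
        return img.view(-1, 3, self.image_size, self.image_size)

renderer = DifferentiableRenderer()
target = torch.rand(1, 3, 64, 64)                # input face image (dummy here)
params = torch.zeros(1, 80, requires_grad=True)  # face parameters to fit
opt = torch.optim.Adam([params], lr=1e-2)

for step in range(200):
    opt.zero_grad()
    rendered = renderer(params)
    loss = torch.nn.functional.l1_loss(rendered, target)  # image-space loss
    loss.backward()
    opt.step()
```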
Video-driven speaker-listener generation based on Transformer and neural renderer
Traditional speaker-centric synthesis methods prioritize linguistic accuracy but overlook emotional connection with the audience and feedback mechanisms. This paper is dedicated to an in-depth exploration of responsive speaker-listener generation, aiming to enhance communication by providing real-time non-verbal feedback such as head movements and facial expressions. Driven by video, we extract 3DMM coefficients to model facial features and head poses. Combining this with a Transformer speech encoder that extracts 45-dimensional acoustic features, we achieve speaker generation at the sentence level. For responsive listener generation, we introduce two attention mechanisms in the Transformer decoder: cross-modal multi-head attention that aligns the audio and motion modalities, and biased causal self-attention suited to longer audio sequences. Finally, by aligning audio with a behavioral model and optimizing an enhanced neural renderer for facial images, we achieve precise control over facial movements. Extensive experiments demonstrate the superiority of our approach over existing technologies.
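A schematic PyTorch sketch of the decoder idea, not the paper's exact architecture: causal self-attention over past listener motion combined with cross-modal attention over speaker audio features. Only the 45-dimensional audio features come from the abstract; the model width, head count, and plain (unbiased) causal mask are assumptions.

```python
# Schematic decoder block: causal self-attention over motion tokens plus
# cross-modal attention over audio features. The paper uses a *biased* causal
# variant; a plain causal mask is used here for brevity.
import torch
import torch.nn as nn

class ListenerDecoderBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_audio=45):
        super().__init__()
        self.audio_proj = nn.Linear(d_audio, d_model)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, motion_tokens, audio_feats):
        T = motion_tokens.size(1)
        # Causal mask: each motion frame attends only to earlier frames.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x = motion_tokens
        a, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + a)
        audio = self.audio_proj(audio_feats)
        c, _ = self.cross_attn(x, audio, audio)   # align motion with audio
        x = self.norm2(x + c)
        return self.norm3(x + self.ff(x))

# Example shapes: 2 sequences, 100 motion frames, 100 audio frames.
block = ListenerDecoderBlock()
out = block(torch.randn(2, 100, 256), torch.randn(2, 100, 45))
```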
Single-view facial reflectance inference with a differentiable renderer
We introduce a deep learning-based algorithm to infer high-fidelity facial reflectance from a single image. The algorithm uses convolutional neural networks to encode the input image into a latent representation, from which a decoder and a detail enhancing network reconstruct decoupled facial reflectance (albedo, specular, and normal) as well as the environmental lighting. These decoupled components, together with a 3D facial mesh estimated from the image, are then fed into a differentiable renderer to produce a rendered facial image. This allows us to iteratively optimize the latent representation of the facial image by minimizing the image-space reconstruction loss. Experimental results show that optimizing the latent representation through the differentiable renderer can effectively reduce the discrepancy between the original image and the rendered one, leading to a more accurate reconstruction of characteristic facial features such as skin tone, lip color, and facial hair.
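The role of the decoupled reflectance components can be illustrated with a minimal differentiable shading function, sketched below in PyTorch. This is not the paper's renderer: it combines albedo, specular, and normal maps with a single directional light (Lambertian plus Blinn-Phong terms), and the light direction and shininess are assumptions; it only shows how gradients can flow from pixels back to the reflectance maps.

```python
# Minimal differentiable shading sketch (not the paper's renderer): combine
# decoupled reflectance maps (albedo, specular, normals) with one directional
# light so the output image stays differentiable with respect to all inputs.
import torch
import torch.nn.functional as F

def shade(albedo, specular, normals, light_dir, view_dir, shininess=32.0):
    """albedo: (3,H,W), specular: (1,H,W), normals: (3,H,W) unit vectors."""
    l = F.normalize(light_dir, dim=0).view(3, 1, 1)
    v = F.normalize(view_dir, dim=0).view(3, 1, 1)
    h = F.normalize(l + v, dim=0)                       # half vector
    n_dot_l = (normals * l).sum(0, keepdim=True).clamp(min=0.0)
    n_dot_h = (normals * h).sum(0, keepdim=True).clamp(min=0.0)
    diffuse = albedo * n_dot_l                           # Lambertian term
    spec = specular * n_dot_h.pow(shininess)             # Blinn-Phong term
    return (diffuse + spec).clamp(0.0, 1.0)

H = W = 64
albedo = torch.rand(3, H, W, requires_grad=True)
specular = torch.rand(1, H, W, requires_grad=True)
normals = F.normalize(torch.randn(3, H, W), dim=0)
img = shade(albedo, specular, normals,
            light_dir=torch.tensor([0.3, 0.5, 1.0]),
            view_dir=torch.tensor([0.0, 0.0, 1.0]))
img.mean().backward()   # gradients flow back to albedo and specular
```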
Deterministic Global 3D Fractal Cloud Model for Synthetic Scene Generation
This paper describes the creation of a fast, deterministic, 3D fractal cloud renderer for the AFIT Sensor and Scene Emulation Tool (ASSET). The renderer generates 3D clouds by ray marching through a volume and sampling the level-set of a fractal function. The fractal function is distorted by a displacement map, which is generated using horizontal wind data from a Global Forecast System (GFS) weather file. The vertical windspeed and relative humidity are used to mask the creation of clouds so that they match realistic large-scale weather patterns over the Earth. Small-scale detail is provided by the fractal functions, which are tuned to match natural cloud shapes. The model is designed to run quickly, generating each cloud type in about 700 ms. The generated clouds visually match large-scale satellite imagery while reproducing natural small-scale shapes. This should enable future versions of ASSET to generate scenarios in which the same scene is viewed consistently from multiple perspectives by both GEO and LEO satellites.
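The core loop is illustrated below with a hedged Python sketch (not ASSET's renderer): a ray is marched through the volume, a fractal-style field is sampled at each step, and values above a level-set threshold are treated as cloud density accumulated with Beer-Lambert absorption. The placeholder noise function, step count, and threshold are assumptions standing in for the paper's tuned fractal functions and displacement map.

```python
# Illustrative ray-marching sketch: march along a ray, sample a fractal-like
# field, and treat values above a level-set threshold as cloud density.
import numpy as np

def fractal(p, octaves=4):
    """Cheap fBm-like field: summed sine octaves (placeholder for real noise)."""
    val, freq, amp = 0.0, 1.0, 0.5
    for _ in range(octaves):
        val += amp * np.sin(freq * p).prod()
        freq *= 2.0
        amp *= 0.5
    return val

def march(origin, direction, steps=128, step_len=0.05, threshold=0.2):
    """Accumulate opacity through the level-set of the fractal field."""
    transmittance = 1.0
    p = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    for _ in range(steps):
        density = max(fractal(p) - threshold, 0.0)    # inside the level-set?
        transmittance *= np.exp(-density * step_len)  # Beer-Lambert absorption
        p = p + d * step_len
    return 1.0 - transmittance                        # cloud opacity along ray

opacity = march(origin=[0.0, 0.0, 0.0], direction=[0.0, 0.0, 1.0])
```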
Continuous Talking Face Generation Based on Gaussian Blur and Dynamic Convolution
In the field of talking face generation, two-stage audio-based generation methods have attracted significant research interest. However, these methods still face challenges in achieving lip–audio synchronization during face generation, as well as discontinuities between the generated regions and the original face in rendered videos. To overcome these challenges, this paper proposes a two-stage talking face generation method. The first stage is landmark generation. A dynamic convolutional transformer generator is designed to capture complex facial movements. A dual-pipeline parallel processing mechanism is adopted to strengthen temporal correlations in the input features and the ability to model detail at the spatial scale. In the second stage, an adaptive Gaussian renderer is designed to achieve a seamless, natural transition between the upper- and lower-boundary regions through a Gaussian blur masking technique. We conducted quantitative analyses on the LRS2, HDTF, and MEAD neutral expression datasets. Experimental results demonstrate that, compared with existing methods, our approach significantly improves the realism and lip–audio synchronization of talking face videos. In particular, on the LRS2 dataset, the lip–audio synchronization rate improved by 18.16% and the peak signal-to-noise ratio improved by 12.11% compared to state-of-the-art works.
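The blending idea behind the Gaussian blur masking can be sketched as follows (this is only the general technique, not the paper's adaptive Gaussian renderer): a hard mask around the generated lower-face region is feathered with a Gaussian blur so generated and original pixels mix smoothly at the boundary. The mask placement and blur sigma are illustrative assumptions.

```python
# Sketch of the blending idea only: feather a hard mask with a Gaussian blur
# so the boundary between generated and original pixels stays smooth.
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_with_blurred_mask(original, generated, mask, sigma=8.0):
    """original/generated: (H, W, 3) float images; mask: (H, W) in {0, 1}."""
    soft = gaussian_filter(mask.astype(float), sigma=sigma)   # feathered mask
    soft = np.clip(soft, 0.0, 1.0)[..., None]
    return soft * generated + (1.0 - soft) * original

H = W = 256
original = np.random.rand(H, W, 3)
generated = np.random.rand(H, W, 3)
mask = np.zeros((H, W)); mask[H // 2:, :] = 1.0   # lower half = generated region
frame = blend_with_blurred_mask(original, generated, mask)
```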
PaintNet: A shape-constrained generative framework for generating clothing from fashion model
Recent years have witnessed the proliferation of online fashion blogs and communities, where a large number of fashion model images with chic clothes in various scenarios are publicly available. To help users find the corresponding clothes, we focus on studying how to generate pure, well-shaped clothing items with the best view from complex model images. Towards this end, inspired by painting, where the initial sketches and the subsequent coloring are both essential, we propose a two-stage shape-constrained clothing generative framework, dubbed PaintNet. PaintNet comprises two coherent components: a shape predictor and a texture renderer. The shape predictor is devised to predict the intermediate shape map for the to-be-generated clothing item based on latent representation learning, while the texture renderer generates the final clothing image with the guidance of the predicted shape map. Extensive qualitative and quantitative experiments conducted on the public Lookbook dataset verify the effectiveness of PaintNet in clothing generation from fashion model images. Moreover, we also explore the potential of PaintNet in the task of cross-domain clothing retrieval, and the experimental results show that PaintNet achieves, on average, a 5.34% performance improvement over traditional non-generative retrieval methods.
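A schematic two-stage forward pass in the spirit of the abstract is sketched below in PyTorch: a shape predictor maps the model image to a shape map, and a texture renderer generates the clothing image conditioned on that map. The tiny convolutional networks and tensor sizes are illustrative assumptions, not PaintNet's actual architecture.

```python
# Schematic two-stage pipeline: shape first, then texture guided by the shape map.
import torch
import torch.nn as nn

class ShapePredictor(nn.Module):
    """Model image -> 1-channel clothing shape map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, model_image):
        return self.net(model_image)

class TextureRenderer(nn.Module):
    """(Model image, shape map) -> clothing image guided by the shape map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, model_image, shape_map):
        return self.net(torch.cat([model_image, shape_map], dim=1))

model_image = torch.rand(1, 3, 128, 128)
shape = ShapePredictor()(model_image)
clothing = TextureRenderer()(model_image, shape)
```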
Quest markup for developing FAIR questionnaire modules for epidemiologic studies
Background: Online questionnaires are commonly used to collect information from participants in epidemiological studies. This requires building questionnaires in machine-readable formats that can be delivered to study participants using web-based technologies such as progressive web applications. However, the paucity of open-source markup standards with support for complex logic makes collaborative development of web-based questionnaire modules difficult. This often prevents interoperability and reusability of questionnaire modules across epidemiological studies. Results: We developed Quest, an open-source markup language for presenting questionnaire content and logic, together with a real-time renderer that enables the user to test logic (e.g., skip patterns) and view the structure of data collection. We provide the Quest markup language, an in-browser markup rendering tool, a questionnaire development tool, and an example web application that embeds the renderer, developed for The Connect for Cancer Prevention Study. Conclusion: A markup language can specify both the content and logic of a questionnaire as plain text. Questionnaire markup, such as Quest, can become a standard format for storing questionnaires or sharing them across the web. Quest is a step towards the generation of FAIR data in epidemiological studies, facilitating reusability of questionnaires and data interoperability using open-source tools.
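To illustrate the general idea of expressing questionnaire logic as data with skip patterns, here is a small Python sketch. It deliberately does not use Quest's actual markup syntax; the question structure, field names, and skip condition are invented for the example.

```python
# Generic illustration of questionnaire logic as data (NOT Quest markup):
# questions are plain structures with an optional skip condition, so a
# renderer can decide which question to show next.
questions = [
    {"id": "smoker", "text": "Have you ever smoked?", "choices": ["yes", "no"]},
    {"id": "packs", "text": "Packs per day?", "choices": ["<1", "1-2", ">2"],
     "show_if": lambda answers: answers.get("smoker") == "yes"},  # skip pattern
    {"id": "exercise", "text": "Days of exercise per week?",
     "choices": [str(n) for n in range(8)]},
]

def next_question(questions, answers):
    """Return the first unanswered question whose skip condition passes."""
    for q in questions:
        if q["id"] in answers:
            continue
        if q.get("show_if") and not q["show_if"](answers):
            continue
        return q
    return None

answers = {"smoker": "no"}
q = next_question(questions, answers)  # -> the exercise question; packs is skipped
```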
State of the Art on Deep Learning-enhanced Rendering Methods
Photorealistic rendering of the virtual world is an important and classic problem in the field of computer graphics. With the development of GPU hardware and continuous research in computer graphics, representing and rendering virtual scenes has become easier and more efficient. However, there are still unresolved challenges in efficiently rendering global illumination effects. At the same time, machine learning and computer vision provide real-world image analysis and synthesis methods, which can be exploited by computer graphics rendering pipelines. Deep learning-enhanced rendering combines techniques from deep learning and computer vision with the traditional graphics rendering pipeline to enhance existing rasterization or Monte Carlo integration renderers. This state-of-the-art report summarizes recent studies of deep learning-enhanced rendering in the computer graphics community. Specifically, we focus on renderers represented using neural networks, whether the scene itself is represented by neural networks or by traditional scene files. These works target either general scenes or specific scenes, distinguished by whether the network must be retrained for new scenes.
maplibre-rs: Toward Portable Map Renderers
Map renderers play a crucial role in Web, desktop, and mobile applications. In this context, code portability is a common problem, often addressed by maintaining multiple code bases: one for the Web, usually written in JavaScript, and one for desktop and mobile, often written in C/C++. The maintenance of several code bases slows down innovation and makes evolution time-consuming. In this paper, we review existing open-source map renderers, examine how they address this problem, and identify the downsides of the current strategies. With a proof of concept, we demonstrate that Rust, WebAssembly, and WebGPU are now sufficiently mature to address this problem. Our new open-source map renderer written in Rust runs on all platforms and showcases good performance. Finally, we explain the challenges and limitations encountered while implementing a modern map renderer with these technologies.