Catalogue Search | MBRL

An Assessment of the Seasonal Uncertainty of Microwave L-Band Satellite Soil Moisture Products in Jiangsu Province, China

by Xing, Zanpin; Yi, Chuanxiang; Zhou, Hongwei in Agricultural management, Algorithms, Autumn

2024

Accurate surface soil moisture (SM) data are crucial for agricultural management in Jiangsu Province, one of the major agricultural regions in China. However, the seasonal performance of different SM products in Jiangsu is still unknown. To address this, this study evaluates the applicability of four L-band microwave remotely sensed SM products, namely, the Soil Moisture Active Passive Single-Channel Algorithm at Vertical Polarization Level 3 (SMAP SCA-V L3, hereafter SMAP-L3), SMOS-SMAP-INRAE-BORDEAUX (SMOSMAP-IB), Soil Moisture and Ocean Salinity in version IC (SMOS-IC), and SMAP-INRAE-BORDEAUX (SMAP-IB) in Jiangsu at the seasonal scale. In addition, the effects of dynamic environmental variables such as the leaf area index (LAI), mean surface soil temperature (MSST), and mean surface soil wetness (MSSM) on the performance of the above products are investigated. The results indicate that all four SM products exhibit significant seasonal differences when evaluated against in situ observations between 2016 and 2022, with most products achieving their best correlation (R) and unbiased root-mean-square difference (ubRMSD) scores during the autumn. Conversely, their performance significantly deteriorates in the summer, with ubRMSD values exceeding 0.06 m³/m³. SMOS-IC generally achieves better R values across all seasons but has limited temporal availability, while SMAP-IB typically has the lowest ubRMSD values, even reaching 0.03 m³/m³ during morning observations in the winter. Additionally, the sensitivity of different products’ skill metrics to environmental factors varies across seasons. For ubRMSD, SMAP-L3 shows a general increase with LAI across all four seasons, while SMAP-IB exhibits a notable increase as the soil becomes wetter in the summer. Conversely, wet conditions notably reduce the R values during autumn for most products. These findings are expected to offer valuable insights for the appropriate selection of products and the enhancement of SM retrieval algorithms.
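
The two skill metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard definitions of the Pearson correlation (R) and of ubRMSD, which is the root-mean-square difference computed after removing each series' mean. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(satellite, in_situ):
    """Pearson correlation between two equal-length series."""
    n = len(satellite)
    mean_s = sum(satellite) / n
    mean_i = sum(in_situ) / n
    cov = sum((s - mean_s) * (i - mean_i) for s, i in zip(satellite, in_situ))
    var_s = sum((s - mean_s) ** 2 for s in satellite)
    var_i = sum((i - mean_i) ** 2 for i in in_situ)
    return cov / math.sqrt(var_s * var_i)


def ubrmsd(satellite, in_situ):
    """Unbiased RMSD: RMSD of the anomalies, so a constant bias cancels out."""
    n = len(satellite)
    mean_s = sum(satellite) / n
    mean_i = sum(in_situ) / n
    squared = [((s - mean_s) - (i - mean_i)) ** 2
               for s, i in zip(satellite, in_situ)]
    return math.sqrt(sum(squared) / n)


if __name__ == "__main__":
    # Hypothetical soil moisture values in m³/m³, for illustration only.
    satellite = [0.30, 0.34, 0.28, 0.40]
    in_situ = [0.25, 0.30, 0.22, 0.33]
    print("R =", pearson_r(satellite, in_situ))
    print("ubRMSD =", ubrmsd(satellite, in_situ))
```

Because the means are subtracted first, a product that is consistently too wet or too dry by a fixed amount still scores a ubRMSD of zero. This is why ubRMSD is usually reported alongside R rather than in place of a bias measure.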

Journal Article

Script-a-Video: Deep Structured Audio-visual Captions via Factorized Streams and Relational Grounding

by Team, Tencent Hunyuan in Large language models

2026

Advances in Multimodal Large Language Models (MLLMs) are transforming video captioning from a descriptive endpoint into a semantic interface for both video understanding and generation. However, the dominant paradigm still casts videos as monolithic narrative paragraphs that entangle visual, auditory, and identity information. This dense coupling not only compromises representational fidelity but also limits scalability, since even local edits can trigger global rewrites. To address this structural bottleneck, we propose Multi-Stream Scene Script (MTSS), a novel paradigm that replaces monolithic text with factorized and explicitly grounded scene descriptions. MTSS is built on two core principles: Stream Factorization, which decouples a video into complementary streams (Reference, Shot, Event, and Global), and Relational Grounding, which reconnects these isolated streams through explicit identity and temporal links to maintain holistic video consistency. Extensive experiments demonstrate that MTSS consistently enhances video understanding across various models, achieving an average reduction of 25% in the total error rate on Video-SALMONN-2 and an average performance gain of 67% on the Daily-Omni reasoning benchmark. It also narrows the performance gap between smaller and larger MLLMs, indicating a substantially more learnable caption interface. Finally, even without architectural adaptation, replacing monolithic prompts with MTSS in multi-shot video generation yields substantial human-rated improvements: a 45% boost in cross-shot identity consistency, a 56% boost in audio-visual alignment, and a 71% boost in temporal controllability.

Paper

AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics

by Team, Tencent HY in Anime, Coders, Collapse

2026

Video generation models internalize physical realism as their prior. Anime deliberately violates physics: smears, impact frames, chibi shifts; and its thousands of coexisting artistic conventions yield no single "physics of anime" a model can absorb. Physics-biased models therefore flatten the artistry that defines the medium or collapse under its stylistic variance. We present AniMatrix, a video generation model that targets artistic rather than physical correctness through a dual-channel conditioning mechanism and a three-step transition: redefine correctness, override the physics prior, and distinguish art from failure. First, a Production Knowledge System encodes anime as a structured taxonomy of controllable production variables (Style, Motion, Camera, VFX), and AniCaption infers these variables from pixels as directorial directives. A trainable tag encoder preserves the field-value structure of this taxonomy while a frozen T5 encoder handles free-form narrative; dual-path injection (cross-attention for fine-grained control, AdaLN modulation for global enforcement) ensures categorical directives are never diluted by open-ended text. Second, a style-motion-deformation curriculum transitions the model from near-physical motion to full anime expressiveness. Third, deformation-aware preference optimization with a domain-specific reward model separates intentional artistry from pathological collapse. On an anime-specific human evaluation with five production dimensions scored by professional animators, AniMatrix ranks first on four of five, with the largest gains over Seedance-Pro 1.0 on Prompt Understanding (+0.70, +22.4 percent) and Artistic Motion (+0.55, +16.9 percent). We are preparing accompanying resources for public release to support reproducibility and follow-up research.

Paper

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

by Team, Tencent HY in Adaptation, Learning, Parameters

2026

Foundation models are transitioning from offline predictors to deployed systems expected to operate over long time horizons. In real deployments, objectives are not fixed: domains drift, user preferences evolve, and new tasks appear after the model has shipped. This elevates continual learning and instant personalization from optional features to core architectural requirements. Yet most adaptation pipelines still follow a static weight paradigm: after training (or after any adaptation step), inference executes a single parameter vector regardless of user intent, domain, or instance-specific constraints. This treats the trained or adapted model as a single point in parameter space. In heterogeneous and continually evolving regimes, distinct objectives can induce separated feasible regions over parameters, forcing any single shared update into compromise, interference, or overspecialization. As a result, continual learning and personalization are often implemented as repeated overwriting of shared weights, risking degradation of previously learned behaviors. We propose HY-WU (Weight Unleashing), a memory-first adaptation framework that shifts adaptation pressure away from overwriting a single shared parameter point. HY-WU implements functional (operator-level) memory as a neural module: a generator that synthesizes weight updates on-the-fly from the instance condition, yielding instance-specific operators without test-time optimization.
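
The general idea of synthesizing a weight update from an instance condition, rather than overwriting shared weights, can be shown with a toy sketch. This is not the HY-WU architecture, which the abstract does not specify. The rank-1 update, the function names, and the matrices `gen_u` and `gen_v` are all hypothetical assumptions made for illustration.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def generate_update(condition, gen_u, gen_v):
    """Hypothetical generator: map a condition vector to a rank-1 update.

    gen_u and gen_v stand in for learned generator parameters (assumed).
    """
    u = matvec(gen_u, condition)  # one entry per output dimension
    v = matvec(gen_v, condition)  # one entry per input dimension
    return [[ui * vj for vj in v] for ui in u]


def apply_with_generated_update(base_w, condition, x, gen_u, gen_v):
    """Apply base weights plus a condition-specific update to input x.

    The shared base weights are read but never modified.
    """
    delta = generate_update(condition, gen_u, gen_v)
    effective = [[b + d for b, d in zip(b_row, d_row)]
                 for b_row, d_row in zip(base_w, delta)]
    return matvec(effective, x)


if __name__ == "__main__":
    base_w = [[1.0, 0.0], [0.0, 1.0]]  # shared weights, left untouched
    gen_u = [[1.0, 0.0], [0.0, 1.0]]   # assumed generator parameters
    gen_v = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, 1.0]
    for condition in ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]):
        y = apply_with_generated_update(base_w, condition, x, gen_u, gen_v)
        print(condition, "->", y)
```

Each condition yields a different effective operator from a single forward pass through the generator, with no per-instance optimization loop, and `base_w` is the same before and after every call.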

Paper

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

by Wang, Oran; HY Vision Team; Liang, Yves in Benchmarks, Cognition & reasoning, Effectiveness

2026

We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelligence: spatial and temporal visual perception, alongside advanced embodied reasoning for prediction, interaction, and planning. The HY-Embodied-0.5 suite comprises two primary variants: an efficient model with 2B activated parameters designed for edge deployment, and a powerful model with 32B activated parameters targeted for complex reasoning. To support the fine-grained visual perception essential for embodied tasks, we adopt a Mixture-of-Transformers (MoT) architecture to enable modality-specific computing. By incorporating latent tokens, this design effectively enhances the perceptual representation of the models. To improve reasoning capabilities, we introduce an iterative, self-evolving post-training paradigm. Furthermore, we employ on-policy distillation to transfer the advanced capabilities of the large model to the smaller variant, thereby maximizing the performance potential of the compact model. Extensive evaluations across 22 benchmarks, spanning visual perception, spatial reasoning, and embodied understanding, demonstrate the effectiveness of our approach. Our MoT-2B model outperforms similarly sized state-of-the-art models on 16 benchmarks, while the 32B variant achieves performance comparable to frontier models such as Gemini 3.0 Pro. In downstream robot control experiments, we leverage our robust VLM foundation to train an effective Vision-Language-Action (VLA) model, achieving compelling results in real-world physical evaluations. Code and models are open-sourced at https://github.com/Tencent-Hunyuan/HY-Embodied.

Paper
