Catalogue Search | MBRL
67 result(s) for "Zheng, Sipeng"
MR imaging for the quantitative assessment of brain iron in aceruloplasminemia: A postmortem validation study
by Vroegindeweij, Lena H.P.; Bossoni, Lucia; Bonnet, Sylvestre
in Aceruloplasminemia, Alzheimer's disease, Autopsies
2021
Non-invasive measures of brain iron content would be of great benefit in neurodegeneration with brain iron accumulation (NBIA) to serve as a biomarker for disease progression and evaluation of iron chelation therapy. Although magnetic resonance imaging (MRI) provides several quantitative measures of brain iron content, none of these have been validated for patients with a severely increased cerebral iron burden. We aimed to validate R2* as a quantitative measure of brain iron content in aceruloplasminemia, the most severely iron-loaded NBIA phenotype.
Tissue samples from 50 gray- and white matter regions of a postmortem aceruloplasminemia brain and control subject were scanned at 1.5 T to obtain R2*, and biochemically analyzed with inductively coupled plasma mass spectrometry. For gray matter samples of the aceruloplasminemia brain, sample R2* values were compared with postmortem in situ MRI data that had been obtained from the same subject at 3 T – in situ R2*. Relationships between R2* and tissue iron concentration were determined by linear regression analyses.
Median iron concentrations throughout the whole aceruloplasminemia brain were 10 to 15 times higher than in the control subject, and R2* was linearly associated with iron concentration. For gray matter samples of the aceruloplasminemia subject with an iron concentration up to 1000 mg/kg, 91% of variation in R2* could be explained by iron, and in situ R2* at 3 T and sample R2* at 1.5 T were highly correlated. For white matter regions of the aceruloplasminemia brain, 85% of variation in R2* could be explained by iron.
R2* is highly sensitive to variations in iron concentration in the severely iron-loaded brain, and might be used as a non-invasive measure of brain iron content in aceruloplasminemia and potentially other NBIA disorders.
Journal Article
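The abstract above reports the share of variation in R2* explained by iron concentration, as determined by linear regression. A minimal sketch of that kind of calculation follows, assuming an ordinary least-squares fit with one predictor; the function name `linear_fit_r2` and all numeric values are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the fraction of the
    variance in y that the fitted line explains.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the explained variance.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


# Made-up example values (NOT measurements from the study).
iron_mg_per_kg = [100.0, 300.0, 500.0, 700.0, 900.0]
r2star_per_s = [22.0, 58.0, 101.0, 139.0, 182.0]

slope, intercept, r2 = linear_fit_r2(iron_mg_per_kg, r2star_per_s)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

An R^2 of 0.91 from such a fit would correspond to the statement that 91% of the variation in R2* is explained by iron.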
Spatio-temporal transcriptomic analysis reveals distinct nephrotoxicity, DNA damage, and regeneration response after cisplatin
2025
Nephrotoxicity caused by drug or chemical exposure involves complex mechanisms as well as a temporal integration of injury and repair responses in different nephron segments. Distinct cellular transcriptional programs regulate the time-dependent tissue injury and regeneration responses. Whole kidney transcriptome analysis cannot dissect the complex spatio-temporal injury and regeneration responses in the different nephron segments. Here, we used laser capture microdissection of formalin-fixed paraffin embedded sections followed by whole genome targeted RNA-sequencing (TempO-Seq) and co-expression gene-network (module) analysis to determine the spatial–temporal responses in rat kidney glomeruli (GM), cortical proximal tubules (CPT) and outer-medulla proximal tubules (OMPT), in comparison with whole kidney, after a single dose of the nephrotoxicant cisplatin. We demonstrate that cisplatin induced early onset of DNA damage in both CPT and OMPT, but not GM. Sustained DNA damage response was strongest in OMPT, coinciding with OMPT-specific inflammatory signaling, actin cytoskeletal remodeling and increased glycolytic metabolism with suppression of mitochondrial activity. Later responses reflected regeneration-related cell cycle pathway activation and ribosomal biogenesis in the injured OMPT regions. Activation of modules containing kidney injury biomarkers was strongest in OMPT, with OMPT Clu expression correlating highly with urinary clusterin biomarker measurements compared with the correlation of Kim1. Our findings also showed that whole kidney responses were less sensitive than OMPT. In conclusion, our LCM-TempO-Seq method reveals a detailed spatial mechanistic understanding of renal injury/regeneration after nephrotoxicant exposure and identifies the most representative mechanism-based nephron segment specific renal injury biomarkers.
Highlights
• Different nephron segments exhibit distinct transcriptomic perturbation with different degrees of sensitivity.
• Sustained activation of DNA damage responses upon cisplatin exposure is linked to progressive outcomes of injured nephron regions.
• Mechanistic kidney injury biomarkers such as urinary clusterin outperform conventional biomarkers in reflecting the condition of the damaged nephron segments.
Journal Article
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
2024
We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view. However, current methods struggle to replicate this ability of view adaptation from third-person to first-person. Although some approaches attempt to learn view-agnostic representation from large-scale video datasets, they ignore the relationships among multiple third-person views. To this end, we propose a Prompt-Oriented View-agnostic learning (POV) framework in this paper, which enables this view adaptation with few egocentric videos. Specifically, we introduce interactive masking prompts at the frame level to capture fine-grained action information, and view-aware prompts at the token level to learn view-agnostic representation. To verify our method, we establish two benchmarks for transferring from multiple third-person views to the egocentric view. Our extensive experiments on these benchmarks demonstrate the efficiency and effectiveness of our POV framework and prompt tuning techniques in terms of view adaptation and view generalization. Our code is available at https://github.com/xuboshen/pov_acmmm2023.
SPAFormer: Sequential 3D Part Assembly with Transformers
2025
We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's poses in sequential steps. As the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy of 3D-PA. SPAFormer addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. Since the sequence of parts conveys construction rules similar to sentences structured through words, our model explores both parallel and autoregressive generation. We further strengthen SPAFormer through knowledge enhancement strategies that utilize the attributes of parts and their sequence information, enabling it to capture the inherent assembly pattern and relationships among sequentially ordered parts. We also construct a more challenging benchmark named PartNet-Assembly covering 21 varied categories to more comprehensively validate the effectiveness of SPAFormer. Extensive experiments demonstrate the superior generalization capabilities of SPAFormer, particularly with multi-tasking and in scenarios requiring long-horizon assembly. Code is available at https://github.com/xuboshen/SPAFormer.
SPAFormer: Sequential 3D Part Assembly with Transformers
2024
We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy of 3D-PA. SPAFormer addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. Since assembly part sequences convey construction rules similar to sentences being structured through words, our model explores both parallel and autoregressive generation. It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information, enabling it to capture the inherent assembly pattern and relationships among sequentially ordered parts. We also construct a more challenging benchmark named PartNet-Assembly covering 21 varied categories to more comprehensively validate the effectiveness of SPAFormer. Extensive experiments demonstrate the superior generalization capabilities of SPAFormer, particularly with multi-tasking and in scenarios requiring long-horizon assembly. Codes and model weights will be released at https://github.com/xuboshen/SPAFormer.
Robust Motion Generation using Part-level Reliable Data from Videos
2025
Extracting human motion from large-scale web videos offers a scalable solution to the data scarcity issue in character animation. However, some human parts in many video frames cannot be seen due to off-screen captures or occlusions. This creates a dilemma: discarding the data missing any part limits scale and diversity, while retaining it compromises data quality and model performance. To address this problem, we propose leveraging credible part-level data extracted from videos to enhance motion generation via a robust part-aware masked autoregression model. First, we decompose a human body into five parts and detect the parts clearly seen in a video frame as "credible". Second, the credible parts are encoded into latent tokens by our proposed part-aware variational autoencoder. Third, we propose a robust part-level masked generation model to predict masked credible parts, while ignoring the noisy parts. In addition, we contribute K700-M, a challenging new benchmark comprising approximately 200k real-world motion sequences, for evaluation. Experimental results indicate that our method outperforms baselines on both clean and noisy datasets in terms of motion quality, semantic consistency and diversity. Project page: https://boyuaner.github.io/ropar-main/
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
2025
Egocentric video-language pretraining has significantly advanced video representation learning. Humans perceive and interact with a fully 3D world, developing spatial awareness that extends beyond text-based understanding. However, most previous works learn from 1D text or 2D visual cues, such as bounding boxes, which inherently lack 3D understanding. To bridge this gap, we introduce EgoDTM, an Egocentric Depth- and Text-aware Model, jointly trained through large-scale 3D-aware video pretraining and video-text contrastive learning. EgoDTM incorporates a lightweight 3D-aware decoder to efficiently learn 3D-awareness from pseudo depth maps generated by depth estimation models. To further facilitate 3D-aware video pretraining, we enrich the original brief captions with hand-object visual cues by organically combining several foundation models. Extensive experiments demonstrate EgoDTM's superior performance across diverse downstream tasks, highlighting its superior 3D-aware visual understanding. Code: https://github.com/xuboshen/EgoDTM.
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
2023
Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)": performance degrades as the SNR decreases. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbors, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also offering faster inference and fewer model parameters, thanks to its lightweight architecture.
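The idea of restricting each video token to aggregate context only from its neighbors can be illustrated with generic windowed self-attention. The sketch below is a plain single-head dot-product attention with a local window, written under stated assumptions: the function names, the window `radius` parameter, and the choice of radii are hypothetical, and this is not the authors' module.

```python
import math


def neighbor_attention(tokens, radius):
    """Self-attention where token i attends only to tokens j with |i - j| <= radius.

    `tokens` is a list of equal-length feature vectors (lists of floats).
    Queries, keys and values are the raw tokens (no learned projections),
    which keeps the sketch small.
    """
    n, dim = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = range(lo, hi)
        # Scaled dot-product scores against neighbors only.
        scores = [
            sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(dim)
            for j in window
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum((w / total) * tokens[j][d] for w, j in zip(weights, window))
            for d in range(dim)
        ])
    return out


def multi_scale_neighbor_attention(tokens, radii=(1, 2, 4)):
    """Run the windowed attention at several window sizes (one output per scale)."""
    return [neighbor_attention(tokens, r) for r in radii]


# Toy token sequence (made-up features, for illustration only).
video_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
scales = multi_scale_neighbor_attention(video_tokens)
print(len(scales), len(scales[0]), len(scales[0][0]))
```

With a small radius, a distant token has no influence on the output at a given position, which is the locality property the abstract describes.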
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning
by Liu, Jiazheng; Karlsson, Börje F; Zheng, Sipeng
in Annotations, Datasets, Large language models
2025
Multimodal large language models (MLLMs), built on large-scale pre-trained vision towers and language models, have shown great capabilities in multimodal understanding. However, most existing MLLMs are trained on single-turn vision question-answering tasks, which do not accurately reflect real-world human conversations. In this paper, we introduce MMDiag, a multi-turn multimodal dialogue dataset. This dataset is collaboratively generated through deliberately designed rules and GPT assistance, featuring strong correlations between questions, between questions and images, and among different image regions; thus aligning more closely with real-world scenarios. MMDiag serves as a strong benchmark for multi-turn multimodal dialogue learning and brings more challenges to the grounding and reasoning capabilities of MLLMs. Further, inspired by human vision processing, we present DiagNote, an MLLM equipped with multimodal grounding and reasoning capabilities. DiagNote consists of two modules (Deliberate and Gaze) interacting with each other to perform Chain-of-Thought and annotations respectively, throughout multi-turn dialogues. We empirically demonstrate the advantages of DiagNote in both grounding and jointly processing and reasoning with vision and language information over existing MLLMs.
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
by Wang, Ye; Mei, Yuting; Qin, Jin
in Cognition, Knowledge bases (artificial intelligence), Locomotion
2024
As robotic agents increasingly assist humans in the real world, quadruped robots offer unique opportunities for interaction in complex scenarios due to their agile movement. However, building agents that can autonomously navigate, adapt, and respond to versatile goals remains a significant challenge. In this work, we introduce QuadrupedGPT, an agent designed to follow diverse commands with agility comparable to that of a pet. The primary challenges addressed include: i) effectively utilizing multimodal observations for informed decision-making; ii) achieving agile control by integrating locomotion and navigation; iii) developing advanced cognition to execute long-term objectives. Our QuadrupedGPT interprets human commands and environmental contexts using a large multimodal model. Leveraging its extensive knowledge base, the agent autonomously assigns parameters for adaptive locomotion policies and devises safe yet efficient paths toward its goals. Additionally, it employs high-level reasoning to decompose long-term goals into a sequence of executable subgoals. Through comprehensive experiments, our agent shows proficiency in handling diverse tasks and intricate instructions, representing a significant step toward the development of versatile quadruped agents for open-ended environments.