Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
by Mitrovic, Jovana; Pathak, Shreya; Kumaran, Dharshan; Banino, Andrea; Kaplanis, Christos; Sharifzadeh, Sahand; Ilic, Anastasija; Blundell, Charles
in Datasets / Image enhancement / Image processing / Image quality / Large language models / Resource utilization / Synthetic data
2024
Paper
Overview
The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM. Despite the text-to-image model and VLM initially being trained on the same data, our approach leverages the image generator's ability to create novel compositions, resulting in synthetic image embeddings that expand beyond the limitations of the original dataset. Extensive experiments demonstrate that our VLM, finetuned on synthetic data, achieves comparable performance to models trained solely on human-annotated data, while requiring significantly less data. Furthermore, we perform a set of analyses on captions that reveal that semantic diversity and balance are key aspects for better downstream performance. Finally, we show that synthesizing images in the image embedding space is 25% faster than in the pixel space. We believe our work not only addresses a significant challenge in VLM training but also opens up promising avenues for the development of self-improving multi-modal models.
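The data flow described in the overview (LLM-written captions, mapped straight to image embeddings with no pixel rendering, then paired for VLM training) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the authors' implementation: `toy_caption_generator` replaces the LLM with fixed templates, `toy_text_to_embedding` replaces the pretrained text-to-image model with a hash, and the class names and dimension are made up for illustration.

```python
import hashlib
import random
from typing import Callable, List, Tuple

Embedding = List[float]


def toy_caption_generator(classes: List[str], n_per_class: int, seed: int = 0) -> List[str]:
    """Hypothetical stand-in for the LLM.

    Emits the same number of captions for every class, mirroring the
    overview's point that caption balance matters.
    """
    rng = random.Random(seed)
    templates = [
        "a photo of a {} on a table",
        "a {} in the rain",
        "two people holding a {}",
    ]
    return [rng.choice(templates).format(c) for c in classes for _ in range(n_per_class)]


def toy_text_to_embedding(caption: str, dim: int = 8) -> Embedding:
    """Hypothetical stand-in for the text-to-image model.

    Maps a caption directly to a fixed-size vector with values in [0, 1);
    no pixel image is ever produced.
    """
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    return [b / 256.0 for b in digest[:dim]]


def build_synthetic_pairs(
    captions: List[str],
    embed: Callable[[str], Embedding],
) -> List[Tuple[Embedding, str]]:
    """Pair each caption with its synthetic image embedding."""
    return [(embed(c), c) for c in captions]


captions = toy_caption_generator(["dog", "kettle"], n_per_class=3)
pairs = build_synthetic_pairs(captions, toy_text_to_embedding)
```

In a real system the resulting `(embedding, caption)` pairs would be fed to the VLM's training loop in place of encoded real images; the sketch stops at pair construction because the overview does not specify the training details.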
Publisher
Cornell University Library, arXiv.org