Catalogue Search | MBRL
28 result(s) for "Meyers, Luke"
Optimizing image capture for computer vision-powered taxonomic identification and trait recognition of biodiversity specimens
by Fox, Nathan; Berger-Wolf, Tanya; Betancourt, Isabelle
in Biodiversity; Biological collections; Check lists
2025
Biological collections house millions of specimens with digital images increasingly available through open‐access platforms. However, most imaging protocols were developed for human interpretation without considering automated analysis requirements. As computer vision applications revolutionize taxonomic identification and trait extraction, a critical gap exists between current digitization practices and computational analysis needs. This review provides the first comprehensive practical framework for optimizing biological specimen imaging for computer vision applications. Through interdisciplinary collaboration between taxonomists, collection managers, ecologists and computer scientists, we synthesized evidence‐based recommendations addressing fundamental computer vision concepts and practical imaging considerations. We provide immediately actionable implementation guidance while identifying critical areas requiring community standards development. Our framework encompasses 10 interconnected considerations for optimizing image capture for computer vision‐powered taxonomic identification and trait extraction. We translate these into practical implementation checklists, equipment selection guidelines and a roadmap for community standards development, including filename conventions, pixel density requirements and cross‐institutional protocols. By bridging biological and computational disciplines, this approach unlocks automated analysis potential for millions of existing specimens and guides future digitization efforts towards unprecedented analytical capabilities.
Journal Article
Optimizing Image Capture for Computer Vision-Powered Taxonomic Identification and Trait Recognition of Biodiversity Specimens
by Fox, Nathan; Perry, Kayla I; Betancourt, Isabelle
in Computer vision; Digital imaging; Digitization
2025
1) Biological collections house millions of specimens with digital images increasingly available through open-access platforms. However, most imaging protocols were developed for human interpretation without considering automated analysis requirements. As computer vision applications revolutionize taxonomic identification and trait extraction, a critical gap exists between current digitization practices and computational analysis needs. This review provides the first comprehensive practical framework for optimizing biological specimen imaging for computer vision applications. 2) Through interdisciplinary collaboration between taxonomists, collection managers, ecologists, and computer scientists, we synthesized evidence-based recommendations addressing fundamental computer vision concepts and practical imaging considerations. We provide immediately actionable implementation guidance while identifying critical areas requiring community standards development. 3) Our framework encompasses ten interconnected considerations for optimizing image capture for computer vision-powered taxonomic identification and trait extraction. We translate these into practical implementation checklists, equipment selection guidelines, and a roadmap for community standards development including filename conventions, pixel density requirements, and cross-institutional protocols. 4) By bridging biological and computational disciplines, this approach unlocks automated analysis potential for millions of existing specimens and guides future digitization efforts toward unprecedented analytical capabilities.
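The review's pixel-density recommendation can be made concrete with a small check: given an image's pixel width and the physical width of the imaged field, verify that the resulting density meets a target. A minimal sketch, assuming a placeholder target of 300 px/cm (the function names and threshold are illustrative, not values from the paper):

```python
# Hypothetical pixel-density check for specimen digitization.
# The 300 px/cm target is an invented placeholder, not a community standard.

def pixels_per_cm(image_width_px: int, field_width_cm: float) -> float:
    """Pixel density across the imaged field of view."""
    return image_width_px / field_width_cm

def meets_target(image_width_px: int, field_width_cm: float,
                 target_px_per_cm: float = 300.0) -> bool:
    """True if the capture resolves the target density or better."""
    return pixels_per_cm(image_width_px, field_width_cm) >= target_px_per_cm

# A 6000 px wide image covering a 15 cm specimen tray gives 400 px/cm.
print(pixels_per_cm(6000, 15.0))   # 400.0
print(meets_target(6000, 15.0))    # True
```

A check like this could run at capture time, flagging images that would need re-shooting before the specimen is returned to storage.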
Tracking Phenological Status and Ecological Interactions in a Hawaiian Cloud Forest Understory using Low-Cost Camera Traps and Visual Foundation Models
2026
Plant phenology, the study of cyclical events such as leafing out, flowering, or fruiting, has wide ecological impacts but is broadly understudied, especially in the tropics. Image analysis has greatly enhanced remote phenological monitoring, yet capturing phenology at the individual level remains challenging. In this project, we deployed low-cost, animal-triggered camera traps at the Pu'u Maka'ala Natural Area Reserve in Hawaii to simultaneously document shifts in plant phenology and flora-faunal interactions. Using a combination of foundation vision models and traditional computer vision methods, we measure phenological trends from images comparable to on-the-ground observations without relying on supervised learning techniques. These temporally fine-grained phenology measurements from camera-trap images uncover trends that coarser traditional sampling fails to detect. When combined with detailed visitation data detected from images, these trends can begin to elucidate drivers of both plant phenology and animal ecology.
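The supervision-free measurements described above combine foundation models with traditional computer-vision steps. A toy sketch of one such traditional step, estimating the fraction of flower-coloured pixels in a frame by simple RGB thresholding (the threshold rule and the tiny 3x3 frame are invented for illustration and are not the authors' method):

```python
# Illustrative colour-threshold estimate of flowering cover in a frame.
# Threshold values are made up; a real pipeline would calibrate them.

def is_flower_pixel(r, g, b):
    """Crude rule: bright and strongly red-dominant pixels count as flower."""
    return r > 150 and r > g + 40 and r > b + 40

def flowering_fraction(frame):
    """frame: nested list of (r, g, b) tuples; returns flower-pixel fraction."""
    pixels = [px for row in frame for px in row]
    hits = sum(1 for px in pixels if is_flower_pixel(*px))
    return hits / len(pixels)

frame = [
    [(200, 40, 50), (90, 120, 80), (210, 60, 40)],
    [(80, 110, 70), (70, 100, 60), (220, 50, 55)],
    [(60, 90, 50), (75, 105, 65), (85, 115, 75)],
]
print(flowering_fraction(frame))  # 3 of 9 pixels pass: 0.333...
```

Applied per camera-trap image and plotted over time, a fraction like this yields exactly the kind of temporally fine-grained phenology series the abstract describes.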
Arcee's MergeKit: A Toolkit for Merging Large Language Models
by Karpukhin, Vlad; Meyers, Luke; McQuade, Mark
in Large language models; Machine learning; Open source software
2025
The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of vast numbers of task-specific models, typically specialized in individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including the difficulties of catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source library designed to facilitate the application of model merging strategies. MergeKit offers an extensible framework to efficiently merge models on any hardware, providing utility to researchers and practitioners. To date, thousands of models have been merged by the open-source community, leading to the creation of some of the world's most powerful open-source model checkpoints, as assessed by the Open LLM Leaderboard. The library is accessible at https://github.com/arcee-ai/MergeKit.
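The simplest merge strategy in this family is a weighted linear average of parameters across checkpoints. A minimal pure-Python sketch of the idea, with dicts of plain float lists standing in for real tensor checkpoints (names and values are illustrative, not MergeKit's API):

```python
# Toy linear model merge: weighted average of matching parameters.
# Real toolkits like MergeKit operate on full tensors; this shows the math.

def merge_linear(checkpoints, weights):
    """Weighted average of parameter dicts sharing the same keys."""
    total = sum(weights)
    merged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(w * p for w, p in zip(weights, col)) / total
                        for col in columns]
    return merged

task_a = {"layer.0.w": [1.0, 2.0], "layer.1.w": [0.0, 4.0]}
task_b = {"layer.0.w": [3.0, 0.0], "layer.1.w": [2.0, 0.0]}
print(merge_linear([task_a, task_b], weights=[0.5, 0.5]))
# {'layer.0.w': [2.0, 1.0], 'layer.1.w': [1.0, 2.0]}
```

Because no gradient steps are taken, the merged model is produced at the cost of a single pass over the parameters, which is why merging scales to commodity hardware.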
Arcee's MergeKit: A Toolkit for Merging Large Language Models
by Karpukhin, Vlad; Meyers, Luke; McQuade, Mark
in Large language models; Libraries; Open source software
2024
The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of vast numbers of task-specific models, typically specialized in individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including the difficulties of catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source library designed to facilitate the application of model merging strategies. MergeKit offers an extensible framework to efficiently merge models on any hardware, providing utility to researchers and practitioners. To date, thousands of models have been merged by the open-source community, leading to the creation of some of the world's most powerful open-source model checkpoints, as assessed by the Open LLM Leaderboard. The library is accessible at https://github.com/arcee-ai/MergeKit.
Interfacial properties of the apolipoprotein Cs: Implications for the regulation of lipoprotein catabolism and atherosclerosis
2015
The risk of cardiovascular disease increases with elevated plasma levels of very-low density lipoproteins (VLDL) and chylomicrons. The human apolipoprotein Cs (apo C1, C2, C3) are small secretory proteins that circulate in plasma and play unique roles in the metabolism of VLDL and chylomicrons. ApoC2 is the required cofactor for lipoprotein lipase (LPL), which hydrolyzes plasma triacylglycerol. ApoC3 promotes VLDL synthesis in hepatocytes, and both apoC1 and apoC3 inhibit LPL. The molecular details of these processes are largely unknown, but we hypothesized that apoC functions depend on protein structure, protein:lipid interactions, and surface pressure. Each apoC contains amphipathic N- and C-terminal helices that bind to and remodel lipid surfaces. Surface pressure—or the density of amphipathic molecules—increases significantly as LPL hydrolyzes triacylglycerol in VLDL. To probe the effects of protein structure and surface pressure on protein:lipid interactions, we used wild-type and point mutant variants of the apoCs, which differed in helical content and hydrophobicity. We used Oil-Drop Tensiometry to characterize the adsorption, conformational rearrangement, and desorption of each protein at lipid/water interfaces that mimic the core and surface of VLDL. This technique measured the effect of protein adsorption on surface pressure, and the surface area and pressure response of protein/lipid/water interfaces to volume changes that mimic lipogenic and lipolytic processes. We showed that the degree of protein amphipathic α-helical structure correlated with lipid affinity, and we provide a model for phenotypes in subjects with point mutations in apoC2 and apoC3. Each apoC exhibited multiple, pressure-dependent conformations at lipid surfaces, which indicates that the C-terminus of apoC2 likely desorbs from lipid at higher pressures to interact with LPL. ApoC3 exhibited a marked preference for lipid in the VLDL core, which provides novel insight into its role in VLDL assembly and secretion.
Dissertation
Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification
2023
In this paper, we show that paint markings are a feasible approach to automating the analysis of behavioral assays involving honey bees in the field, where marking must be as lightweight as possible. We contribute a novel dataset for bee re-identification with paint markings, comprising 4392 images and 27 identities. Contrastive learning with a ResNet backbone and triplet loss led to identity representation features with almost perfect recognition in a closed setting where identities are known in advance. Diverse experiments evaluate the capability to generalize to separate IDs and show the impact of using different body parts for identification, such as using the unmarked abdomen only. In addition, we show the potential to fully automate visit detection and provide preliminary compute-time results toward future real-time deployment in the field on an edge device.
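The triplet loss used to train these identity embeddings pulls an anchor image toward a positive (same bee) and pushes it away from a negative (different bee) by at least a margin. A minimal sketch on plain lists, with an illustrative margin value rather than the one used in the paper:

```python
# Toy triplet loss: max(d(a, p) - d(a, n) + margin, 0).
# Embeddings are plain lists here; in training they come from the ResNet.

import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the positive is closer than the negative by >= margin."""
    return max(euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin, 0.0)

a = [0.0, 0.0]
p = [0.1, 0.0]   # same identity: already close to the anchor
n = [1.0, 0.0]   # different identity: far from the anchor
print(triplet_loss(a, p, n))  # 0.0 — the margin is already satisfied
```

Minimizing this loss over many (anchor, positive, negative) triplets is what makes same-identity paint marks cluster in embedding space, enabling the near-perfect closed-set recognition reported above.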
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
2024
We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instructive abilities. The model is accessible on Hugging Face at https://huggingface.co/arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough evaluations to understand the entire process.
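The report does not spell out its exact merge recipe, but one method commonly used to blend a continually pre-trained checkpoint back with its instruct base is spherical linear interpolation (SLERP), which interpolates along the arc between the two weight vectors instead of cutting straight across. A toy version on flat two-element vectors (the vectors and the interpolation factor are illustrative):

```python
# Toy SLERP between two weight vectors; real merges apply this per tensor.

import math

def slerp(v0, v1, t):
    """Spherical linear interpolation from v0 (t=0) to v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))        # guard acos against rounding
    theta = math.acos(dot)
    if theta < 1e-6:                      # nearly parallel: plain lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

base = [1.0, 0.0]   # stand-in for instruct-model weights
cpt  = [0.0, 1.0]   # stand-in for continually pre-trained weights
print(slerp(base, cpt, 0.5))  # midpoint on the arc, roughly [0.707, 0.707]
```

Interpolating on the arc preserves the magnitude of the weights for unit-norm inputs, which is one intuition for why such merges can retain the base model's instruction-following while absorbing domain knowledge from CPT.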