Catalogue Search | MBRL
230 result(s) for "Yamasaki, Toshihiko"
Intraocular Cytokine Level Prediction from Fundus Images and Optical Coherence Tomography
2025
The relationship between retinal images and intraocular cytokine profiles remains largely unexplored, and no prior work has systematically compared fundus- and OCT-based deep learning models for cytokine prediction. We aimed to predict intraocular cytokine concentrations using color fundus photographs (CFP) and retinal optical coherence tomography (OCT) with deep learning. Our pipeline consisted of image preprocessing, convolutional neural network-based feature extraction, and regression modeling for each cytokine. Deep learning was implemented using AutoGluon, which automatically explored multiple architectures and converged on ResNet18, reflecting the small dataset size. Four approaches were tested: (1) CFP alone, (2) CFP plus demographic/clinical features, (3) OCT alone, and (4) OCT plus these features. Prediction performance was defined as the mean coefficient of determination (R²) across 34 cytokines, and differences were evaluated using paired two-tailed t-tests. We used data from 139 patients (152 eyes) and 176 aqueous humor samples. The cohort consisted of 85 males (61%) with a mean age of 73 (SD 9.8). Diseases included 64 exudative age-related macular degeneration, 29 brolucizumab-associated endophthalmitis, 19 cataract surgeries, 15 retinal vein occlusion, and 8 diabetic macular edema. Prediction performance was generally poor, with mean R² values below zero across all approaches. The CFP-only model (-0.19) outperformed CFP plus demographics (-24.1; p = 0.0373), and the OCT-only model (-0.18) outperformed OCT plus demographics (-14.7; p = 0.0080). No significant difference was observed between CFP and OCT (p = 0.9281). Notably, VEGF showed low predictability (31st with CFP, 12th with OCT).
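A mean coefficient of determination below zero, as reported in this abstract, is possible because the metric compares a model's squared error with that of always predicting the sample mean; a model that does worse than this baseline scores below zero. The following is a minimal Python sketch of that arithmetic. The function names `r2_score` and `mean_r2` and all numbers are illustrative assumptions, not the study's code or data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_r2(per_target):
    """Average R^2 over several targets, e.g. one (y_true, y_pred) pair per cytokine."""
    scores = [r2_score(y_true, y_pred) for y_true, y_pred in per_target]
    return sum(scores) / len(scores)


# Hypothetical concentrations for one target (made-up values).
y = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5, 2.5, 2.5, 2.5]  # always predicts the sample mean
worse = [4.0, 3.0, 2.0, 1.0]     # predictions in reversed order

print(r2_score(y, baseline))  # equals zero: no better than the mean
print(r2_score(y, worse))     # negative: worse than the mean
print(mean_r2([(y, baseline), (y, worse)]))
```

Because the average is unbounded below, a few badly predicted targets can pull the mean far under zero even when most targets sit near it.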
Journal Article
Transferability prediction among classification and regression tasks using optimal transport
by Xueting Wang, Toshihiko Yamasaki, Tomoyuki Hatakeyama
in Accuracy, Classification, Computer Communication Networks
2024
Transfer learning is a method for improving generalization performance by first training a model on a different task and then further training the pre-learned weights on the target task. However, transferability, that is, how effectively a given source task can be transferred to a given target task, is often unknown. Existing works have proposed methods for measuring transferability between classification tasks using images and discrete labels, but these cannot be applied to regression tasks. In this work, we investigate transferability among classification and regression tasks and propose a method for predicting transferability by extending optimal transport theory. Our transferability prediction model can also be applied to subjective tasks (e.g., aesthetics and memorability), which are usually regression tasks. We show that appropriate source (pre-training) tasks can be predicted for a chosen target task without conducting actual pre-training and transfer trials. Experimental results demonstrated high prediction accuracy (correlation coefficient of ρ = 0.791) and a speed improvement of approximately 300 times compared with the greedy approach of running those trials.
Journal Article
A large-scale television advertising dataset for detailed impression analysis
by Tao, Li; Tamura, Gen; Yamasaki, Toshihiko
in Advertisements, Audio data, Computer Communication Networks
2024
Creating impressive video content such as movies and advertisements is an important yet challenging business task that requires both creativity and extensive experience. Even professionals cannot always evoke the impressions and emotions they aim for. Many video advertisements are created and then disappear without making a large impact on viewers. This paper presents a large-scale dataset of television (TV) advertisements consisting of 14,490 videos. The impressions of each video, such as the recognition rate and interestingness rate, come from questionnaires answered by 620 participants. We also present a baseline for predicting the impression effects of TV advertisements using visual and audio information; metadata such as broadcasting pattern, business category, and the popularity of the cast; and text information, including text appearing in the videos and narration in the audio. We predict four viewer impressions: 1) how much participants remember the video afterward, 2) how much they feel like buying the product/service, 3) how much they become interested in the product/service, and 4) how much they like the content of the advertisement itself. By combining images, audio, metadata, cast data, and text data, our baseline method predicts these impressions with a correlation of 0.69-0.82, much better than using a single-modal feature such as visual or audio data alone. This paper also describes some possible applications, such as estimating the importance score of each key frame, which offers insight into how to make advertisement content more impressive.
Journal Article
Sketch-based manga retrieval using manga109 dataset
2017
Manga (Japanese comics) are popular worldwide. However, current e-manga archives offer very limited search support, i.e., keyword-based search by title or author. To make the manga search experience more intuitive, efficient, and enjoyable, we propose a manga-specific image retrieval system. The proposed system consists of efficient margin labeling, edge orientation histogram feature description with screen tone removal, and approximate nearest-neighbor search using product quantization. For querying, the system provides a sketch-based interface. Based on the interface, two interactive reranking schemes are presented: relevance feedback and query retouch. For evaluation, we built a novel dataset of manga images, Manga109, which consists of 109 comic books of 21,142 pages drawn by professional manga artists. To the best of our knowledge, Manga109 is currently the biggest dataset of manga images available for research. Experimental results showed that the proposed framework is efficient and scalable (70 ms from 21,142 pages using a single computer with 204 MB RAM).
Journal Article
Attention-Based Multimodal Neural Network for Automatic Evaluation of Press Conferences
2020
In this study, a multimodal neural network is proposed to automatically predict the evaluation of press conferences by a professional consultant team, using text and audio data. Seven publicly available press conference videos were collected, and all the Q&A pairs between speakers and journalists were annotated by the consultant team. The proposed multimodal neural network consists of a language model, an audio model, and a feature fusion network. The word representation is made up of a token embedding using ELMo and a type embedding. The language model is an LSTM with an attention layer. The audio model is based on a six-layer CNN that extracts segmental features, together with an attention network that measures the importance of each segment. Two approaches to feature fusion are proposed: a shared attention network and the production of text features and audio features. The former can explain the relative importance of speech content and speaking style. The latter achieved the best performance, with an average accuracy of 60.1% across all evaluation criteria.
Journal Article
Motion Segmentation and Retrieval for 3D Video Based on Modified Shape Distribution
2007
A similar-motion search and retrieval system for 3D video is presented, based on a modified shape distribution algorithm. 3D video is a sequence of 3D models made for a real-world object. In the present work, three fundamental functions for efficient retrieval have been developed: feature extraction, motion segmentation, and similarity evaluation. Stable shape feature representation of 3D models has been realized by a modified shape distribution algorithm. Motion segmentation has been conducted by analyzing the degree of motion using the extracted feature vectors. Then, similar motion retrieval has been achieved by employing the dynamic programming algorithm in the feature vector space. Experiments using 3D video sequences of dances have demonstrated very promising results for motion segmentation and retrieval.
Journal Article
Spatially adaptive multi-scale contextual attention for image inpainting
by Yamasaki, Toshihiko; Chen, Yiyan; Wang, Xueting
in Artificial neural networks, Computer Communication Networks, Computer Science
2022
Image inpainting is the task of filling missing regions of an image. Recently, researchers have achieved strong performance by combining convolutional neural networks (CNNs) with the conventional patch-matching method. Existing methods compute attention scores based on the similarity of patches between the known and missing regions. Considering that patches at different spatial positions can convey different levels of detail, we propose a spatially adaptive multi-scale attention score that uses patches of different scales to compute scores for each pixel at different positions. In experiments on the Paris Street View and Places datasets, our proposal shows a slight improvement over some related methods on the quantitative evaluation metrics commonly used in existing work. Moreover, we found that these quantitative metrics do not adequately reflect subjective impressions of the generated images. Therefore, we conducted a subjective evaluation through a user study for comparison, which shows that our proposal performs better, generating much more detailed and subjectively plausible images.
Journal Article
Image aesthetics prediction using multiple patches preserving the original aspect ratio of contents
2023
The spread of social networking services has created an increasing demand for selecting, editing, and generating impressive images. This trend increases the importance of evaluating image aesthetics as a fundamental function of automatic image processing. However, most existing methods for aesthetics score prediction require image rescaling for input, which can affect the prediction, especially for images with unusual aspect ratios. We propose a multi-patch method, called a multi-patch aggregation network (MPA-Net), to predict image aesthetics scores while maintaining the original aspect ratios of the contents in the images. One of our key contributions is the adoption of an equal-interval multi-patch selection approach for the prediction of the aesthetics score. The effectiveness of our strategy is shown through experiments on the large-scale AVA dataset. Our MPA-Net outperformed the reported scores of the baseline methods and achieved a better mean square error (MSE) than the state-of-the-art end-to-end continuous aesthetics score prediction methods. Most notably, MPA-Net yields a significantly lower MSE for images with aspect ratios far from 1.0, indicating that MPA-Net is useful for a wide range of image aspect ratios. Moreover, MPA-Net has several benefits for training and evaluation procedures. MPA-Net meets the conditions of end-to-end learning and mini-batch learning simultaneously, and it uses only images, requiring no external information during the training or prediction stages. Thus, our easy-to-handle method improves the prediction of image aesthetics scores, particularly for images with extreme aspect ratios.
Journal Article
Which account will you follow? Recommending influential accounts on social media
2023
In the age of social media, brands spend a large part of their budget on social media marketing to promote their products. Finding potential followers for brands has become an immense business opportunity. On the other hand, recommending influential accounts (such as brands and influencers) to ordinary users (customers) can help users find content that interests them. Therefore, matching influential accounts with ordinary users is a necessary task and could be a powerful marketing tool. To effectively calculate compatibility between influential accounts and ordinary users, we consider that the hashtags posted by users represent their preferences to some extent and could be useful resources. We collected two Instagram datasets: a brand dataset consisting of 99 brands with 78,996 followers and an influencer dataset consisting of 80 influencers with 43,992 followers. We use these users and their posted hashtags to create an account-user-tag graph. We propose a novel framework that incorporates graph embedding and pairwise learning to rank for better recommendation. The random-walk-based graph embedding method can capture high-order proximity in the interaction data, but it ignores some parts of the graph because of its randomness. The pairwise learning-to-rank component is designed to compensate for this. Experimental results showed that the proposed method is effective at recommending influential accounts when compared with existing methods. In the top-10 recommendation task, our proposed method achieves a hit ratio of 0.416 on the brand dataset and 0.524 on the influencer dataset.
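The abstract does not spell out how its top-10 hit ratio is computed. A common definition is the fraction of test users whose held-out item appears among their top-K recommendations. The sketch below assumes that definition; the function name `hit_ratio_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
def hit_ratio_at_k(ranked_lists, held_out, k=10):
    """Fraction of users whose held-out account is in their top-k list.

    ranked_lists: dict mapping user -> list of recommended accounts, best first.
    held_out:     dict mapping user -> the one account hidden during training.
    """
    hits = sum(
        1
        for user, target in held_out.items()
        if target in ranked_lists.get(user, [])[:k]
    )
    return hits / len(held_out)


# Toy example with made-up users and accounts.
ranked = {
    "u1": ["brand_a", "brand_b", "brand_c"],
    "u2": ["brand_d", "brand_e", "brand_f"],
    "u3": ["brand_g", "brand_h", "brand_i"],
    "u4": ["brand_j", "brand_k", "brand_l"],
}
held = {"u1": "brand_b", "u2": "brand_z", "u3": "brand_i", "u4": "brand_x"}

print(hit_ratio_at_k(ranked, held, k=2))
print(hit_ratio_at_k(ranked, held, k=3))
```

Under this definition the score can only stay the same or rise as K grows, so values are comparable only at the same K.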
Journal Article
Motion Segmentation for Time-Varying Mesh Sequences Based on Spherical Registration
2009
A highly accurate motion segmentation technique for time-varying mesh (TVM) is presented. In conventional approaches, the motion of objects was analyzed using shape feature vectors extracted from TVM frames. This was because it was very difficult to locate and track feature points on objects in 3D space, since the number of vertices and their connectivity vary from frame to frame. In this study, we developed an algorithm to analyze the objects' motion in 3D space using spherical registration based on the iterative closest-point algorithm. With this method, rough motion tracking is conducted and the degree of motion is robustly calculated. Although the approach is straightforward, it obtains much better motion segmentation results than the conventional approaches, with precision and recall rates of 95% and 92% on average.
Journal Article