115 results for "Aizawa, Kiyoharu"
Statistical characteristics of comic panel viewing times
Comics are a bimodal form of art involving a mixture of text and images. Since comprehending comics requires a combination of various cognitive processes, analyzing human comic reading behavior sheds light on how humans process such bimodal media. In this paper, we focus on the viewing time of each comic panel as a quantitative measure of attention and analyze the statistical characteristics of the distributions of comic panel viewing times. We created a user interface that presents comics panel by panel and measured the viewing time of each panel in a user study. We collected data from 18 participants reading 7 comic book volumes, resulting in over 99,000 viewing-time data points, which will be released publicly. The results show that the average viewing time is proportional to the text length in a panel's speech bubbles, with a rate of proportion that differs for each reader, despite the bimodal setting. Additionally, we find that the viewing times of all users follow a common heavy-tailed distribution.
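The proportionality the abstract reports, average viewing time scaling with speech-bubble text length at a per-reader rate, maps onto a simple through-the-origin fit. A minimal Python sketch with hypothetical data (the paper's actual analysis pipeline is not specified here):

```python
# Sketch: per-reader fit of panel viewing time vs. text length,
# as described in the abstract. Data values are hypothetical placeholders.
import numpy as np

def fit_reading_rate(text_lengths, viewing_times):
    """Least-squares slope through the origin: time ~ rate * text_length."""
    x = np.asarray(text_lengths, dtype=float)
    y = np.asarray(viewing_times, dtype=float)
    return (x @ y) / (x @ x)  # per-reader rate of proportion

# Hypothetical example: one reader's panels.
lengths = [12, 40, 7, 55, 23]       # characters in speech bubbles
times = [1.1, 3.8, 0.9, 5.2, 2.0]   # seconds per panel
rate = fit_reading_rate(lengths, times)
print(f"estimated seconds per character: {rate:.3f}")
```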
Agreement Between an Artificial Intelligence-Based Meal Image Recognition System and the Weighed Dietary Record for Estimating Energy and Nutrient Intakes
Objectives: In Japan, smartphone applications are increasingly used for dietary recording in healthcare settings. This study aimed to examine the agreement between energy and nutrient intake estimates obtained using an artificial intelligence (AI)-based dietary recording application and those obtained using the weighed dietary record (WDR). Methods: The AI-based dietary recording method (FoodLog Athl method) was compared with the WDR. Thirty-six university students (35 women and 1 man) simultaneously recorded their dietary intake using FoodLog Athl (FLA) and the WDR for 10 consecutive days. Energy and nutrient intakes were estimated using each method, and correlations and agreement between the two methods were evaluated. Results: Significant positive correlations were observed between the two methods for energy and most nutrients, except for iron, vitamin B1, and sodium chloride equivalent (p < 0.01). Compared with the WDR, the FLA method systematically overestimated energy and the major macronutrients (protein, fat, and carbohydrate) and underestimated total dietary fiber. Bland–Altman analysis indicated fixed bias and relatively wide limits of agreement for several nutrients. Conclusions: The FLA method demonstrated moderate agreement with the WDR, with systematic bias observed for selected nutrients. These findings suggest that the application may be useful for monitoring overall dietary trends or relative intake over time, but caution is warranted when precise individual-level nutrient quantification is required. Professional review by registered dietitians may help improve estimation accuracy and reduce bias.
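Bland–Altman analysis, which the abstract uses to report fixed bias and limits of agreement, is straightforward to reproduce in outline. A minimal sketch; `fla` and `wdr` below are hypothetical placeholder values, not study data:

```python
# Sketch: Bland-Altman bias and 95% limits of agreement between two
# intake-estimation methods. Arrays are hypothetical placeholders.
import numpy as np

def bland_altman(method_a, method_b):
    a, b = np.asarray(method_a, dtype=float), np.asarray(method_b, dtype=float)
    diff = a - b                                 # e.g., FLA minus WDR
    bias = diff.mean()                           # fixed bias
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    return bias, loa

# Hypothetical daily energy estimates (kcal) for one participant.
fla = [2100, 1850, 2300, 1990]
wdr = [1950, 1800, 2150, 2050]
bias, (lo, hi) = bland_altman(fla, wdr)
print(f"bias={bias:.0f} kcal, LoA=({lo:.0f}, {hi:.0f})")
```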
Gaze distribution analysis and saliency prediction across age groups
Knowledge of the human visual system helps in developing better computational models of visual attention. State-of-the-art models have been developed to mimic the visual attention system of young adults but largely ignore the variations that occur with age. In this paper, we investigate how visual scene processing changes with age and propose an age-adapted framework that helps to develop a computational model that can predict saliency across different age groups. Our analysis uncovers how the explorativeness of an observer varies with age, how well the saliency maps of an age group agree with the fixation points of observers from the same or different age groups, and how age influences the center bias tendency. We analyzed the eye movement behavior of 82 observers belonging to four age groups while they explored visual scenes. Explorativeness was quantified in terms of the entropy of a saliency map, and the area under the curve (AUC) metric was used to quantify agreement and the center bias tendency. The analysis results were used to develop age-adapted saliency models. Our results suggest that the proposed age-adapted model outperforms existing saliency models in predicting regions of interest across age groups.
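The entropy measure of explorativeness described above can be sketched directly: treat the saliency map as a probability distribution and take its Shannon entropy. The map here is synthetic, and the paper's preprocessing is not reproduced:

```python
# Sketch: entropy of a saliency map as an explorativeness measure.
import numpy as np

def saliency_entropy(saliency_map, eps=1e-12):
    """Shannon entropy of the map treated as a probability distribution."""
    p = saliency_map.astype(float).ravel()
    p = p / (p.sum() + eps)
    return float(-(p * np.log2(p + eps)).sum())

# Synthetic example: a flat map means highly explorative viewing (high
# entropy); a single peak means concentrated viewing (near-zero entropy).
flat = np.ones((64, 64))
peaked = np.zeros((64, 64))
peaked[32, 32] = 1.0
print(saliency_entropy(flat), saliency_entropy(peaked))
```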
Computational attention model for children, adults and the elderly
Computational models of saliency estimation have been studied in a wide range of research fields, including visual perception, image processing, computer vision, multimedia, and their intersections. However, most of them seek to simulate scene viewing by adults only, and the impact of the observer's age has rarely been considered. In this paper, we quantitatively analyze age-related differences in gaze landing positions during scene viewing. From the results, we draw three conclusions: child observers focus more on the foreground of a scene, i.e., locations that are near, while elderly observers tend to explore the background, i.e., locations farther in the scene; adult observers are more explorative than child and elderly observers; and adult observers show significantly lower center bias than child and elderly observers. Based on these observations, we developed a novel computational model for age-dependent saliency estimation. Its prediction accuracy suggests that our model fits the collected eye-gaze data of observers from different age groups better than several existing models do.
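One simple way to realize age-dependent center bias, in the spirit of (but not necessarily identical to) the model above, is to blend a base saliency map with a Gaussian center prior whose weight varies by age group. The weights below are hypothetical illustrations, not the paper's fitted values:

```python
# Sketch: age-weighted blend of a base saliency map with a center prior.
# AGE_CENTER_WEIGHT values are hypothetical, chosen only to reflect the
# abstract's finding that adults show lower center bias.
import numpy as np

def center_prior(h, w, sigma_frac=0.25):
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    d2 = ((ys - cy) / (sigma_frac * h)) ** 2 + ((xs - cx) / (sigma_frac * w)) ** 2
    return np.exp(-d2 / 2)

AGE_CENTER_WEIGHT = {"child": 0.6, "adult": 0.3, "elderly": 0.6}  # hypothetical

def age_adapted_saliency(base_map, age_group):
    w = AGE_CENTER_WEIGHT[age_group]
    prior = center_prior(*base_map.shape)
    out = (1 - w) * base_map + w * prior
    return out / out.max()

print(age_adapted_saliency(np.random.rand(48, 64), "adult").shape)
```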
Very fast generation of content-preserved photo collage under canvas size constraint
Photo collage, which constructs a compact and visually appealing representation from a collection of input images, can offer a convenient and impressive user experience. Most previous approaches to collage construction rely on saliency detection and visibility optimization. However, such methods are computationally expensive and infeasible for real-time applications such as online image retrieval or interactive photo browsing. Moreover, the effectiveness of automatic saliency detection may be questionable: even if the main regions of interest are retained accurately, items that are not visually salient but are semantically important, such as logos, captions, and copyright information located at margins and corners, may be missed. In our alternative approach, we address content-preserved collage, which avoids content-harmful operations such as cropping or changes to aspect ratio and orientation. Based on a full balanced binary layout tree, our algorithm packs all the input images tightly onto the collage canvas while keeping their visual information unchanged. The proposed algorithm is fast, requiring less than 0.5 ms to generate a 100-image collage. We also present several extensions and applications oriented to a variety of usage contexts and device platforms.
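The balanced-binary-layout-tree recursion at the heart of such a method can be sketched by splitting the image list in half at each node and splitting the canvas rectangle in alternating directions. This is a minimal illustration of the tree idea only; the paper's aspect-ratio-preserving packing is abstracted away:

```python
# Sketch: balanced binary layout recursion. Each node splits its image
# list in half and its canvas rectangle proportionally to leaf counts.
def layout(images, rect, horizontal=True):
    """images: list of ids; rect: (x, y, w, h). Returns {id: rect}."""
    x, y, w, h = rect
    if len(images) == 1:
        return {images[0]: rect}
    mid = len(images) // 2
    frac = mid / len(images)  # area share proportional to image count
    if horizontal:
        left = (x, y, w * frac, h)
        right = (x + w * frac, y, w * (1 - frac), h)
    else:
        left = (x, y, w, h * frac)
        right = (x, y + h * frac, w, h * (1 - frac))
    out = layout(images[:mid], left, not horizontal)
    out.update(layout(images[mid:], right, not horizontal))
    return out

print(layout(list(range(5)), (0, 0, 1920, 1080)))
```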
Efficiency-enhanced cost-volume filtering featuring coarse-to-fine strategy
Cost-volume filtering (CVF) is one of the most widely used techniques for solving general multi-labeling problems based on a Markov random field (MRF). However, it is inefficient when the label space size (i.e., the number of labels) is large. This paper presents a coarse-to-fine strategy for cost-volume filtering that efficiently and accurately addresses multi-labeling problems with a large label space. Based on the observation that the true labels at the same coordinates in images of different scales are highly correlated, we truncate unimportant labels during cost-volume filtering by leveraging the labeling output of lower scales. Experimental results show that our algorithm achieves much higher efficiency than the original CVF method while maintaining a comparable level of accuracy. Although our experiments cover only stereo matching and optical flow estimation, the proposed method can be employed in many other applications because CVF applies to general discrete pixel-labeling problems based on an MRF.
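The coarse-to-fine truncation idea can be sketched as follows: labels considered at the fine scale are restricted to a window around the upsampled coarse labeling. The windowing below is a hypothetical simplification, and the cost-volume filtering step itself is omitted:

```python
# Sketch: restricting fine-scale candidate labels to a window around the
# upsampled coarse-scale labeling (e.g., disparities in stereo matching).
import numpy as np

def candidate_labels(coarse_labels, scale=2, radius=2, num_labels=64):
    """Upsample coarse labels; return per-pixel candidate label bounds."""
    # Label values scale with resolution (disparity doubles at 2x size).
    fine = np.kron(coarse_labels * scale, np.ones((scale, scale), dtype=int))
    lo = np.clip(fine - radius, 0, num_labels - 1)
    hi = np.clip(fine + radius, 0, num_labels - 1)
    return lo, hi   # only labels in [lo, hi] are filtered at the fine scale

coarse = np.random.randint(0, 32, (4, 4))
lo, hi = candidate_labels(coarse)
print(lo.shape, hi.shape)   # (8, 8) each
```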
Sketch-based manga retrieval using manga109 dataset
Manga (Japanese comics) are popular worldwide. However, current e-manga archives offer very limited search support, i.e., keyword-based search by title or author. To make the manga search experience more intuitive, efficient, and enjoyable, we propose a manga-specific image retrieval system. The proposed system consists of efficient margin labeling, edge orientation histogram feature description with screen tone removal, and approximate nearest-neighbor search using product quantization. For querying, the system provides a sketch-based interface. Based on the interface, two interactive reranking schemes are presented: relevance feedback and query retouch. For evaluation, we built a novel dataset of manga images, Manga109, which consists of 109 comic books of 21,142 pages drawn by professional manga artists. To the best of our knowledge, Manga109 is currently the biggest dataset of manga images available for research. Experimental results showed that the proposed framework is efficient and scalable (70 ms from 21,142 pages using a single computer with 204 MB RAM).
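An edge orientation histogram of the kind named in the abstract can be sketched with plain NumPy gradients; the paper's exact grid layout, binning, and screen-tone removal are not reproduced here:

```python
# Sketch: a global edge orientation histogram (EOH) descriptor,
# weighting each pixel's orientation bin by its gradient magnitude.
import numpy as np

def edge_orientation_histogram(gray, bins=8):
    """Histogram of gradient orientations over a grayscale image."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

print(edge_orientation_histogram(np.random.rand(128, 128)))
```

In a retrieval system like the one described, such per-region histograms would be concatenated and then compressed for approximate nearest-neighbor search, e.g., with product quantization.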
Emotype: Expressing emotions by changing typeface in mobile messenger texting
Instant messaging is a popular form of text-based communication. However, text-based messaging lacks the ability to convey nonverbal information such as facial expressions and voice tones, even though a multitude of emotions may underlie the text of a conversation. In this paper, we propose an approach that uses typefaces to communicate emotions. We investigated which typefaces are useful for delivering emotions and introduced them into a mobile chat app. We conducted a survey to demonstrate how changing the typeface of a message affects the meaning conveyed, and our user study provides an understanding of the actual user experience with the application. The results show that the use of multiple typefaces in a message can affect and intensify the valence perceived by users, and that it elicited more active responses and a livelier mood during texting.
Self-similarity-based partial near-duplicate video retrieval and alignment
There have been recent studies on partial near-duplicate videos, in which segments of different videos are near duplicates of each other. State-of-the-art search schemes usually segment the input video into clips and perform clip-level near-duplicate retrieval. However, the segmentation results are always poorly aligned, which leads to a difficult "unbalance" problem. In this paper, we introduce a self-similarity-based feature representation called the Self-Similarity Belt (SSBelt), which derives from the Self-Similarity Matrix (SSM). In addition, a distinctive pattern in the SSBelt called the Interest Corner is detected and described by a bag-of-words representation. The visual words are then combined into visual shingles and indexed in an inverted file for fast retrieval. Another important task is to accurately align the unbalanced clips, for which we propose the Intensity Mark (IMark) and design a coarse-to-fine near-duplicate video localization scheme. Experimental results show the effectiveness of our approach on both web-based near-duplicate video and unbalanced video datasets, and IMark is also shown to align near-duplicate clips effectively.
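The Self-Similarity Matrix from which the SSBelt derives is simple to state: pairwise similarities between per-frame feature vectors. A minimal sketch with random placeholder descriptors standing in for real frame features:

```python
# Sketch: a cosine-similarity Self-Similarity Matrix (SSM) over
# per-frame feature vectors.
import numpy as np

def self_similarity_matrix(features):
    """features: (n_frames, dim) array. Returns (n_frames, n_frames) SSM."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

frames = np.random.rand(100, 64)   # hypothetical frame descriptors
ssm = self_similarity_matrix(frames)
print(ssm.shape)                   # (100, 100)
```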
Motion Segmentation and Retrieval for 3D Video Based on Modified Shape Distribution
A similar-motion search and retrieval system for 3D video is presented, based on a modified shape distribution algorithm. 3D video is a sequence of 3D models of a real-world object. In the present work, three fundamental functions for efficient retrieval have been developed: feature extraction, motion segmentation, and similarity evaluation. A stable shape-feature representation of the 3D models is realized by the modified shape distribution algorithm. Motion segmentation is conducted by analyzing the degree of motion using the extracted feature vectors. Similar-motion retrieval is then achieved by applying a dynamic programming algorithm in the feature vector space. Experiments on 3D video sequences of dances demonstrated very promising motion segmentation and retrieval performance.
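The classic D2 shape distribution, which a modified shape distribution algorithm of this kind builds on, histograms distances between random point pairs sampled from the model surface. A sketch with hypothetical points (the paper's specific modification is not reproduced):

```python
# Sketch: the D2 shape distribution - a histogram of distances between
# randomly chosen surface-point pairs, used as a per-frame shape feature.
import numpy as np

def d2_shape_distribution(points, n_pairs=10_000, bins=32, rng=None):
    rng = rng or np.random.default_rng(0)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0, d.max()))
    return hist / n_pairs   # normalized feature vector

pts = np.random.rand(500, 3)   # hypothetical sampled surface points
print(d2_shape_distribution(pts))
```

Per-frame vectors like this could then be compared across time, e.g., with dynamic programming, to segment and retrieve similar motions as the abstract describes.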