Publications

2025

Conference

(2025). No Annotations for Object Detection in Art through Stable Diffusion. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV).
(2025). PALADIN: Understanding Video Intentions in Political Advertisement Videos. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV).

2024

Article

(2024). Cross-modal Guided Visual Representation Learning for Social Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence.
(2024). Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks. Journal of Imaging.
(2024). A picture may be worth a hundred words for visual question answering. Electronics.
(2024). Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization. Journal of Imaging.
(2024). Situating the social issues of image generation models in the model life cycle: a sociotechnical approach. AI and Ethics.
(2024). Exploring Emotional Stimuli Detection in Artworks: A Benchmark Dataset and Baselines Evaluation. Journal of Imaging.
(2024). GOYA: Leveraging Generative Art for Content-Style Disentanglement. Journal of Imaging.

Conference

(2024). DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models. Proc. Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS).
(2024). From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment. Proc. 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
(2024). Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes. Proc. 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
(2024). MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Subtle Clue Dynamics in Video Dialogues. Proc. 2nd International Workshop on Multimodal and Responsible Affective Computing.
(2024). Stable Diffusion Exposed: Gender Bias from Prompt to Image. Proc. AAAI/ACM Conference on AI, Ethics, and Society.
(2024). Auditing Image-based NSFW Classifiers for Content Filtering. Proc. ACM Conference on Fairness, Accountability, and Transparency (FAccT).
(2024). Would Deep Generative Models Amplify Bias in Future Models? Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2024). Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. Proc. 2024 International Conference on Multimedia Retrieval (ICMR).
(2024). Retrieving Emotional Stimuli in Artworks. Proc. 2024 International Conference on Multimedia Retrieval (ICMR).
(2024). Instruct me more! Random prompting for visual in-context learning. Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
(2024). Revisiting pixel-level contrastive pre-training on scene images. Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

2023

Article

(2023). Societal Bias in Vision-and-Language Datasets and Models. Journal of the Imaging Society of Japan.
(2023). Automatic evaluation of atlantoaxial subluxation in rheumatoid arthritis by a deep learning model. Arthritis Research & Therapy.
(2023). ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation. Proceedings of the ACM on Computer Graphics and Interactive Techniques.
(2023). Multi-modal humor segment prediction in video. Multimedia Systems.
(2023). Real-time estimation of the remaining surgery duration for cataract surgery using deep convolutional neural networks and long short-term memory. BMC Medical Informatics and Decision Making.
(2023). Improving facade parsing with vision transformers and line integration. Advanced Engineering Informatics.
(2023). Development of a vertex finding algorithm using recurrent neural network. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment.

Conference

(2023). Enhancing Fake News Detection in Social Media via Label Propagation on Cross-Modal Tweet Graph. Proc. ACM International Conference on Multimedia (MM).
(2023). Learning bottleneck concepts in image classification. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2023). Model-agnostic gender debiased image captioning. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2023). Not only generative art: Stable diffusion for content-style disentanglement in art analysis. Proc. 2023 ACM International Conference on Multimedia Retrieval (ICMR).
(2023). Toward verifiable and reproducible human evaluation for text-to-image generation. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2023). Uncurated image-text datasets: Shedding light on demographic bias. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2023). Inference Time Evidences of Adversarial Attacks for Forensic on Transformers. Proc. AAAI-23 Workshop on Artificial Intelligence for Cyber Security (AICS).
(2023). Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization. Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

2022

Article

(2022). Corpus Construction for Historical Newspapers: A Case Study on Public Meeting Corpus Construction Using OCR Error Correction. SN Computer Science.
(2022). Depthwise spatio-temporal STFT convolutional neural networks for human action recognition. IEEE Trans. Pattern Analysis and Machine Intelligence.
(2022). Match them up: Visually explainable few-shot image classification. Applied Intelligence.
(2022). Information Extraction from Public Meeting Articles. SN Computer Science.
(2022). Anonymous identity sampling and reusable synthesis for sensitive face camouflage. Journal of Electronic Imaging.

Conference

(2022). Emotional Intensity Estimation based on Writer’s Personality. Proc. 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP): Student Research Workshop.
(2022). Deep Gesture Generation for Social Robots Using Type-Specific Libraries. Proc. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
(2022). Multi-label disengagement and behavior prediction in online learning. Proc. International Conference on Artificial Intelligence in Education.
(2022). A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain. Proc. Thirteenth Language Resources and Evaluation Conference (LREC).
(2022). AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2022). Gender and racial bias in visual question answering datasets. Proc. ACM Conference on Fairness, Accountability, and Transparency (FAccT).
(2022). Optimal Correction Cost for Object Detection Evaluation. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2022). Quantifying Societal Bias Amplification in Image Captioning. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2022). Tone Classification for Political Advertising Video using Multimodal Cues. Proc. 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval.
(2022). Integration of gesture generation system using gesture library with DIY robot design kit. Proc. IEEE/SICE International Symposium on System Integration (SII).

2021

Article

(2021). The semantic typology of visually grounded paraphrases. Computer Vision and Image Understanding.
(2021). A comparative study of language Transformers for video question answering. Neurocomputing.
(2021). Noisy-LSTM: Improving temporal awareness for video semantic segmentation. IEEE Access.
(2021). Generation and detection of media clones. IEICE Trans. Information and Systems.
(2021). Preventing fake information generation against media clone attacks. IEICE Trans. Information and Systems.

Conference

(2021). Explain me the painting: Multi-topic knowledgeable art description generation. Proc. IEEE/CVF International Conference on Computer Vision (ICCV).
(2021). GCNBoost: Artwork Classification by Label Propagation Through a Knowledge Graph. Proc. ACM International Conference on Multimedia Retrieval (ICMR).
(2021). Image Retrieval by Hierarchy-aware Deep Hashing Based on Multi-task Learning. Proc. ACM International Conference on Multimedia Retrieval (ICMR).
(2021). SCOUTER: Slot attention-based classifier for explainable image recognition. Proc. IEEE/CVF International Conference on Computer Vision (ICCV).
(2021). Transferring domain-agnostic knowledge in video question answering. Proc. British Machine Vision Conference (BMVC).
(2021). Built year prediction from Buddha face with heterogeneous labels. Proc. Workshop on Structuring and Understanding of Multimedia Heritage Contents (SUMAC).
(2021). Visual question answering with textual representations for images. Proc. IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
(2021). Learners' efficiency prediction using facial behavior analysis. Proc. International Conference on Image Processing (ICIP).
(2021). Museum Experience into a Souvenir: Generating Memorable Postcards from Guide Device Behavior Log. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL).
(2021). PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation. Proc. International Conference on Image Processing (ICIP).
(2021). Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers. Proc. Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop.
(2021). MTUNet: Few-shot image classification with visual explanations. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
(2021). WRIME: A new dataset for emotional intensity estimation with subjective and objective annotations. Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
(2021). The laughing machine: Predicting humor in video. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV).

2020

Article

(2020). ContextNet: Representation and exploration for painting classification and retrieval in context. International Journal on Multimedia Information Retrieval.
(2020). Cross-lingual visual grounding. IEEE Access.
(2020). Improving topic modeling through homophily for legal documents. Applied Network Science.
(2020). Visually grounded paraphrase identification via gating and phrase localization. Neurocomputing.
(2020). Warmer environments increase implicit mental workload even if learning efficiency is enhanced. Frontiers in Psychology.
(2020). Speech-driven face reenactment for a video sequence. ITE Trans. Media Technology and Applications.

Conference

(2020). IDSOU at WNUT-2020 Task 2: Identification of informative COVID-19 English tweets. Proc. Workshop on Noisy User-Generated Text (W-NUT).
(2020). Uncovering hidden challenges in query-based video moment retrieval. Proc. British Machine Vision Conference (BMVC).
(2020). A dataset and baselines for visual question answering on art. Proc. European Conference on Computer Vision Workshops (VISARTS).
(2020). Demographic Influences on Contemporary Art with Unsupervised Style Embeddings. Proc. European Conference on Computer Vision Workshops (VISARTS).
(2020). Knowledge-based video question answering with unsupervised scene descriptions. Proc. European Conference on Computer Vision (ECCV).
(2020). Privacy sensitive large-margin model for face de-identification. Proc. International Conference on Neural Computing for Advanced Applications (NCAA).
(2020). Joint learning of vessel segmentation and artery/vein classification with post-processing. Proc. Medical Imaging with Deep Learning (MIDL).
(2020). Knowledge-Based Visual Question Answering in Videos. Proc. Workshop on Women in Computer Vision.
(2020). Yoga-82: A new dataset for fine-grained classification of human poses. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
(2020). Constructing a public meeting corpus. Proc. Conference on Language Resources and Evaluation (LREC).
(2020). BERT representations for video question answering. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV).
(2020). IterNet: Retinal image segmentation utilizing structural redundancy in vessel networks. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV).
(2020). Toward predicting learners' efficiency for adaptive e-learning. Proc. International Learning Analytics and Knowledge Conference (LAK).
(2020). Video analytics in blended learning: Insights from learner-video interaction patterns. Proc. Workshop on Addressing Drop-Out Rates in Higher Education (ADORE).
(2020). KnowIT VQA: Answering knowledge-based questions about videos. Proc. AAAI Conference on Artificial Intelligence (AAAI).
(2020). 3D image reconstruction from multi-focus microscopic images. Proc. Pacific-Rim Symposium on Image and Video Technology (PSIVT).

2019

Conference

(2019). Human shape reconstruction with loose clothes from partially observed data by pose specific deformation. Proc. Pacific-Rim Symposium on Image and Video Technology (PSIVT).
(2019). Legal information as a complex network: Improving topic modeling through homophily. Proc. International Conference on Complex Networks and Their Applications.
(2019). Adaptive gating mechanism for identifying visually grounded paraphrases. Proc. Multi-Discipline Approach for Learning Concepts.
(2019). BUDA.ART: A multimodal content-based analysis and retrieval system for Buddha statues. Proc. ACM International Conference on Multimedia (MM).
(2019). Historical and modern features for Buddha statue classification. Proc. Workshop on Structuring and Understanding of Multimedia Heritage Contents (SUMAC).
(2019). Facial expression recognition with skip-connection to leverage low-level features. Proc. IEEE International Conference on Image Processing (ICIP).
(2019). Context-aware embeddings for automatic art analysis. Proc. International Conference on Multimedia Retrieval (ICMR).
(2019). Rethinking the evaluation of video summaries. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
(2019). Multimodal learning analytics: Society 5.0 project in Japan. Proc. International Conference on Learning Analytics and Knowledge (LAK).

2018

Article

(2018). Finding important people in a video using deep neural networks with conditional random fields. IEICE Trans. Information and Systems.
(2018). Iterative applications of image completion with CNN-based failure detection. Journal of Visual Communication and Image Representation.
(2018). Summarization of user-generated sports video by using deep action recognition features. IEEE Trans. Multimedia.

Conference

(2018). iParaphrasing: Extracting visually grounded paraphrases via an image. Proc. International Conference on Computational Linguistics (COLING).
(2018). Representing a partially observed non-rigid 3D human using eigen-texture and eigen-deformation. Proc. International Conference on Pattern Recognition (ICPR).

2017

Article

(2017). Augmented reality marker hiding with texture deformation. IEEE Trans. Visualization and Computer Graphics.
(2017). Video summarization using textual descriptions for authoring video blogs. Multimedia Tools and Applications.
(2017). Increasing pose comprehension through augmented reality reenactment. Multimedia Tools and Applications.

Conference

(2017). Realtime novel view synthesis with eigen-texture regression. Proc. British Machine Vision Conference (BMVC).
(2017). Video question answering to find a desired video segment. Proc. Open Knowledge Base and Question Answering Workshop (OKBQA).
(2017). Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD. Proc. IEEE International Conference on Multimedia and Expo (ICME).
(2017). ReMagicMirror: Action learning using human reenactment with the mirror metaphor. Proc. International Conference on Multimedia Modeling (MMM).

2016

Article

(2016). Flexible human action recognition in depth video sequences using masked joint trajectories. EURASIP Journal on Image and Video Processing.
(2016). Privacy protection for social video via background estimation and CRF-based videographer's intention modeling. IEICE Trans. Information and Systems.
(2016). Novel View Synthesis Based on View-dependent Texture Mapping with Geometry-aware Color Continuity. Transactions of the Virtual Reality Society of Japan.
(2016). Evaluating protection capability for visual privacy information. IEEE Security & Privacy.

Conference

(2016). Video summarization using deep semantic features. Proc. Asian Conference on Computer Vision (ACCV).
(2016). Learning joint representations of videos and sentences with web image search. Proc. Workshop on Web-scale Vision and Social Media.
(2016). Human action recognition-based video summarization for RGB-D personal sports video. Proc. IEEE International Conference on Multimedia and Expo (ICME).
(2016). 3D shape template generation from RGB-D images capturing a moving and deforming object. Proc. Electronic Imaging.

2015

Article

(2015). AR image generation using view-dependent geometry modification and texture mapping. Virtual Reality.
(2015). Protection and utilization of privacy information via sensing. IEICE Trans. Information and Systems.

Conference

(2015). Facial expression preserving privacy protection using image melding. Proc. IEEE International Conference on Multimedia and Expo (ICME).
(2015). Textual description-based video summarization for video blogs. Proc. IEEE International Conference on Multimedia and Expo (ICME).

2014

Article

(2014). Background estimation for a single omnidirectional image sequence captured with a moving camera. IPSJ Trans. Computer Vision and Applications.

Conference

(2014). Free-viewpoint AR human-motion reenactment based on a single RGB-D video stream. Proc. IEEE International Conference on Multimedia and Expo (ICME).

2013

Conference

(2013). Augmented reality image generation with virtualized real objects using view-dependent texture and geometry. Proc. IEEE International Symposium on Mixed and Augmented Reality (ISMAR).
(2013). Inferring what the videographer wanted to capture. Proc. IEEE International Conference on Image Processing (ICIP).
(2013). Real-time privacy protection system for social videos using intentionally-captured persons detection. Proc. IEEE International Conference on Multimedia and Expo (ICME).

2012

Article

(2012). Intended human object detection for automatically protecting privacy in mobile video surveillance. Multimedia Systems.

Conference

(2012). Markov random field-based real-time detection of intentionally-captured persons. Proc. IEEE International Conference on Image Processing (ICIP).

2011

Article

(2011). Indoor positioning system using digital audio watermarking. IEICE Trans. Information and Systems.

Conference

(2011). Extracting intentionally captured regions using point trajectories. Proc. ACM International Conference on Multimedia (MM).
(2011). Automatic generation of privacy-protected videos using background estimation. Proc. IEEE International Conference on Multimedia and Expo (ICME).

2010

Conference

(2010). Automatically protecting privacy in consumer generated videos using intended human object detector. Proc. ACM International Conference on Multimedia (MM).
(2010). Discriminating intended human objects in consumer videos. Proc. International Conference on Pattern Recognition (ICPR).
(2010). Real-time user position estimation in indoor environments using digital watermarking for audio signals. Proc. International Conference on Pattern Recognition (ICPR).
(2010). Detecting intended human objects in human-captured videos. Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
(2010). Digital diorama: Sensing-based real-world visualization. Proc. International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.

2009

Article

(2009). Watermarked movie soundtrack finds the position of the camcorder in a theater. IEEE Trans. Multimedia.

2007

Conference

(2007). Maximum-likelihood estimation of recording position based on audio watermarking. Proc. International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP).
(2007). Determining Recording Location Based on Synchronization Positions of Audio Watermarking. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

2006

Conference

(2006). Estimation of recording location using audio watermarking. Proc. Workshop on Multimedia and Security (MM&Sec).