Ninth International Workshop on Symbolic-Neural Learning (SNL2025)
October 29-30, 2025
Nakanoshima Center 10F, The University of Osaka (Osaka, Japan)
Keynote Talks
Yun-Nung (Vivian) Chen (National Taiwan University)
"Strategizing Conversations: Reasoning for Personalized AI Agents"

Abstract:
This talk explores how to build more intelligent and personalized AI agents by strategizing conversations. We'll show how to use synthetic dialogue simulation to derive effective conversational strategies. By having LLMs role-play as simulated customers and sales agents, we can generate and analyze rich behavioral data. We'll introduce a plug-and-play mechanism that allows us to inject these derived strategies directly into an agent's reasoning process. This approach not only improves performance but also enhances an agent's controllability and explainability, making it a practical tool for real-world marketing. Our method offers a scalable way to understand what influences customer engagement and conversion, ultimately enabling us to strategize more effective and successful interactions.
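To make the simulation setup concrete, here is a minimal, hypothetical sketch of the role-play loop the abstract describes: an LLM alternately plays a customer persona and a sales agent, and the resulting transcripts become the behavioral data to mine for strategies. The chat() stub, persona strings, and all names below are illustrative assumptions, not the authors' actual code or API.

    # Illustrative sketch: LLM role-play between a simulated customer and a sales agent.
    # chat() is a hypothetical stand-in for whatever LLM API is actually used.
    def chat(system_prompt: str, history: list[str]) -> str:
        """Placeholder LLM call; returns a canned reply so the sketch runs as-is."""
        role = system_prompt.split(":")[0]
        return f"({role} reply #{len(history)})"

    CUSTOMER = "Customer persona: budget-conscious shopper browsing for a laptop."
    SALES = "Sales agent: strategy = ask about needs first, then recommend one product."

    history: list[str] = []
    for _ in range(3):                       # simulate a short dialogue
        history.append("Agent: " + chat(SALES, history))
        history.append("Customer: " + chat(CUSTOMER, history))

    print("\n".join(history))                # transcripts like this are then analyzed for strategies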
Bio:
Yun-Nung (Vivian) Chen is currently a professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. from Carnegie Mellon University, and her research interests focus on spoken dialogue systems and natural language processing. She was listed among the World's Top 2% Scientists for 2023 impact, was named one of Taiwan's Outstanding Young Women in Science, and has received Google Faculty Research Awards, Amazon AWS Machine Learning Research Awards, the MOST Young Scholar Fellowship, and the FAOS Young Scholar Innovation Award. Her team was selected to participate in the first Alexa Prize TaskBot Challenge in 2021. Prior to joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond. (http://vivianchen.idv.tw)
David Chiang (University of Notre Dame)
"What Transformers Can and Can't Do: A Logical Approach"

Abstract:
Neural networks are advancing the state of the art in many areas of artificial intelligence, but in many respects they remain poorly understood. At a time when new abilities as well as new limitations of neural networks are continually coming to light, a clear understanding of what they can and cannot do is needed more than ever. The theoretical study of transformers, the dominant neural network for sequences, is just beginning, and we have helped to make this into a fruitful and fast-growing area of research. Our particular approach is to explore these questions by relating neural networks to formal logic. We have proven that one variant of transformers, unique-hard attention transformers, is exactly equivalent to the first-order logic of strings with ordering and to linear temporal logic (LTL), which allows numerous expressivity results from logic to be carried over to unique-hard attention transformers. We have also proven that softmax attention transformers, under suitable assumptions about numeric precision, are exactly equivalent to an extension of LTL with counting. Among other things, this predicts that deeper transformers recognize more languages than shallower transformers, which we have confirmed experimentally.
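As a concrete flavor of the logical side of this correspondence (an illustrative textbook-style example, not one drawn from the talk), a property such as "every request position is eventually followed by a grant position" can be written both ways:

    % LTL over finite words:
    \mathbf{G}\,\bigl(\mathrm{req} \rightarrow \mathbf{F}\,\mathrm{grant}\bigr)
    % First-order logic of strings with ordering (FO[<]):
    \forall x\,\bigl(\mathrm{req}(x) \rightarrow \exists y\,(x \le y \wedge \mathrm{grant}(y))\bigr)

Kamp's theorem states that every FO[<]-definable property of strings has such an LTL counterpart and vice versa, which is what allows expressivity results of this kind to be transferred to unique-hard attention transformers.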
Bio:
David Chiang (PhD, University of Pennsylvania, 2004) is an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame. His research is on computational models for learning human languages, particularly on connections between formal language theory and natural language, and on speech and language processing for low-resource, endangered, and historical languages. He is the recipient of best paper awards at ACL 2005 and NAACL HLT 2009, and a social impact award and outstanding paper award at ACL 2024. He has received research grants from DARPA, NSF, Google, and Amazon, has served on the executive board of NAACL and the editorial board of Computational Linguistics and JAIR, and is currently on the editorial board of Transactions of the ACL.
Katsushi Ikeuchi (Irobomation LLC)
"Learning-from-Observation2.0"

Abstract:
We are developing a Learning-from-Observation (LfO) system that acquires robotic behaviors through the observation of human demonstrations. Unlike the bottom-up approach known as "Learning-from-Demonstration" or "Imitation Learning," which replicates human movements as they are, we are employing a top-down approach (top-down learning-from-observation). This method entails observing only the critical components of human actions through a task model representation (akin to Minsky's frame) and generating an abstract representation based on these observations, which is subsequently mapped onto the robot's behavior.
The advantages of this top-down approach include the ability to generalize and correct observational errors by utilizing an intermediate task model representation, thereby enhancing the affinity with large language models. Furthermore, by tailoring the mapping to each individual robot, the system can be applied to different robotic platforms without necessitating significant modifications to the recognition system.
The initial step of the system involves the utilization of a large language model (LLM) to comprehend the "what-to-do" from human demonstrations and subsequently retrieve the corresponding task model. This task model directs the CNN-based observation module to focus on specific aspects of human behavior and fills in the requisite parameters for "how-to-do," thereby completing the intermediate representation. Based on this finalized task model, the system activates the appropriate agents from a pre-trained group of agents—trained through reinforcement learning on the "how-to-do" aspect—to execute the robot's actions.
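As a rough illustration of what such a frame-style intermediate representation might look like (all names and slots below are hypothetical, not the actual LfO 2.0 data structures), the "what-to-do" is fixed when the frame is chosen and the "how-to-do" slots are filled by the observation module:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    # Hypothetical sketch of a frame-style task model: the verb ("what-to-do") is
    # selected by the LLM; the remaining slots ("how-to-do") are filled from
    # observation of the human demonstration.
    @dataclass
    class GraspTask:
        verb: str = "grasp"                          # what-to-do, chosen by the LLM
        target_object: Optional[str] = None          # e.g. "cup", from the demonstration
        grasp_type: Optional[str] = None             # how-to-do slot, e.g. "top" or "side"
        approach_direction: Optional[Tuple[float, float, float]] = None  # how-to-do slot

        def is_complete(self) -> bool:
            # The frame is executable once every how-to-do slot has been filled.
            return None not in (self.target_object, self.grasp_type, self.approach_direction)

    task = GraspTask()
    task.target_object = "cup"                       # filled by the observation module
    task.grasp_type = "side"
    task.approach_direction = (0.0, -1.0, 0.0)
    assert task.is_complete()                        # ready to dispatch to a pre-trained skill agent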
This presentation will provide a comprehensive overview of the system architecture, the design methodologies for the pre-trained skill sets, and other pertinent details. Furthermore, it will discuss a comparison between this hybrid approach, which integrates traditional robotic techniques with LLMs, and end-to-end (E2E) methodologies, including foundation models.
Bio:
Dr. Ikeuchi founded Irobomation in 2025. Prior to that, he held distinguished positions at MIT's Artificial Intelligence Laboratory, Japan's National Institute of Advanced Industrial Science and Technology (AIST), Carnegie Mellon University's Robotics Institute (CMU-RI), the University of Tokyo and Microsoft Research. His research interests span computer vision, robotics, and Intelligent Transportation Systems (ITS).
He has served as the Editor-in-Chief of the International Journal of Computer Vision (IJCV) and the International Journal of Intelligent Transportation Systems (IJITS), as well as the Encyclopedia of Computer Vision. Dr. Ikeuchi has also chaired numerous international conferences, including IROS95, CVPR96, ICCV03, ITSW07, ICRA09, ICPR12, and ICCV17.
He has been the recipient of several prestigious awards, such as the IEEE PAMI Distinguished Researcher Award, the Okawa Award, the Funai Award, the IEICE Outstanding Achievements and Contributions Award, as well as the Medal of Honor with Purple Ribbon from the Emperor of Japan. Dr. Ikeuchi is a Fellow of IEEE, IAPR, IEICE, IPSJ, and RSJ. He earned his Ph.D. in Information Engineering from the University of Tokyo and his Bachelor's degree in Mechanical Engineering from Kyoto University.
Kyle Richardson (AI2)
"Understanding the Logic of Generative AI through Logic and Programming"

Abstract:
Symbolic logic has long served as the de facto language for expressing complex knowledge throughout computer science, owing to its clean semantics. Symbolic approaches to reasoning that are driven by declarative knowledge, in sharp contrast to purely machine learning-based approaches, have the advantage of allowing us to reason transparently about the behavior and correctness of the resulting systems. In this talk, we focus on the broad question: Can the declarative and symbolic approaches be leveraged to better understand and formally specify algorithms for large language models (LLMs)? In the first part of the talk, we will focus on formalizing recent direct preference alignment (DPA) loss functions, such as DPO, that are commonly used for LLM alignment. Specifically, we ask: Given an existing DPA loss, can we systematically decompile it into a high-level symbolic program that characterizes its semantics? We outline a novel formalism we developed for this purpose based on probabilistic logic. We discuss how this formal view of preference learning sheds new light on both the size and the structure of the DPA loss landscape and makes it possible to derive new algorithms from first principles. In the second part, we extend this analysis to distilling test-time inference algorithms (e.g., chain-of-thought prompting) into other forms of symbolic programs, ones that rely on a shared set of algorithmic and semantic tools drawn from probabilistic programming and neuro-symbolic modeling. Our general framework and approach aim not only to provide guidance for the AI alignment community, but also to open the door to the development of new high-level programming languages and tooling that make LLM development easier and more transparent.
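For readers who want a concrete anchor, one of the DPA losses in question is the original DPO objective (Rafailov et al., 2023), reproduced here for reference; the talk's contribution concerns the symbolic decompilation of losses like this one, not the formula itself:

    \mathcal{L}_{\mathrm{DPO}}(\theta)
      = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
        \left[\log \sigma\!\left(
          \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right)\right]

where y_w and y_l are the preferred and dispreferred responses, \pi_{\mathrm{ref}} is the frozen reference model, and \beta controls how strongly the policy is kept close to the reference.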
Bio:
Kyle Richardson is a senior research scientist at the Allen Institute for AI (AI2) in Seattle. He works at the intersection of NLP and machine learning on the Aristo team, with a particular focus on generative AI and language models. Recently, he has been interested in using formal methods to better understand and specify algorithms for large language models. Prior to AI2, he was at the IMS at the University of Stuttgart, where he obtained his PhD in 2018. Website: https://www.krichardson.me/
Matt Walter (Toyota Technological Institute at Chicago)
"From Representations to Policies: Neural-Symbolic Robot Learning from Demonstrations"

Abstract:
Reinforcement learning (RL) has shown promising performance; however, its high sample complexity limits its broader application across a variety of domains, particularly in real-world, sparse-reward settings. Recent advances in robot learning increasingly draw on demonstrations to address these challenges, using examples of desirable behavior to guide exploration, provide structural priors, and bootstrap learning when explicit rewards are scarce.
In this talk, I will begin with work that employs tokenization methods popularized by modern language models to learn temporal abstractions of the action space from demonstrations. By representing sequences of low-level actions as behavioral symbols, the framework enables policies to reason over extended time horizons and achieve more efficient exploration in sparse-reward environments.
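As a hedged illustration of the general idea (not the specific tokenization method used in this work), low-level actions can be vector-quantized into a discrete codebook and then merged BPE-style into longer behavioral symbols:

    import numpy as np
    from collections import Counter
    from sklearn.cluster import KMeans

    # Illustrative sketch: quantize continuous actions into discrete tokens, then
    # perform one BPE-style merge so the most frequent two-step behavior becomes
    # a single symbol. Names and numbers here are arbitrary.
    rng = np.random.default_rng(0)
    actions = rng.normal(size=(500, 4))      # a demonstration: 500 low-level 4-D actions

    K = 16                                   # codebook size
    tokens = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(actions)

    pairs = Counter(zip(tokens[:-1], tokens[1:]))
    (bp, count) = pairs.most_common(1)[0]
    best_pair = (int(bp[0]), int(bp[1]))
    new_symbol = K                           # id for the merged two-step symbol

    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best_pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(int(tokens[i]))
            i += 1

    print(f"merged pair {best_pair} ({count} occurrences); "
          f"sequence length {len(tokens)} -> {len(merged)}")

A policy that acts over such longer symbols is what gives the extended time horizons and more efficient exploration described above.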
I will then present a class of imitation learning algorithms capable of directly learning from demonstrations, even when they are suboptimal. Key to these algorithms is their ability to adaptively determine when and how to rely on different demonstrators, and to transition from imitation-based to reinforcement-based learning as experience accumulates.
Finally, I will conclude with recent work that learns task-agnostic reward functions from human videos, enabling goal-conditioned policy learning through offline RL without manual reward engineering or labeled supervision. Together, these directions illustrate how learned abstractions, imitation, and reinforcement can be combined to advance learning in complex, real-world domains.
Bio:
Matthew R. Walter is an associate professor at the Toyota Technological Institute at Chicago (TTIC). His interests revolve around the realization of intelligent, perceptually aware robots that are able to act robustly and effectively in unstructured environments, particularly with and alongside people. His research focuses on machine learning-based solutions that allow robots to learn to understand and interact with the people, places, and objects in their surroundings. Matthew has investigated these areas in the context of various robotic platforms, including autonomous underwater vehicles, self-driving cars, voice-commandable wheelchairs, mobile manipulators, and autonomous cars for (rubber) ducks. Matthew obtained his Ph.D. from the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, where his thesis focused on improving the efficiency of inference for simultaneous localization and mapping.
Invited Talks
Mayumi Bono (National Institute of Informatics)
"Improvisational Signing: How Deaf People Orient to and Engage with Symbolic Resources"
Abstract:
Sign languages differ fundamentally from spoken languages. One of the most significant differences lies in how deaf people orient to and engage with symbolic resources. Like spoken languages, sign languages rely heavily on a lexicon—that is, a repertoire of conventionalized symbols. However, in Japanese Sign Language (JSL) interaction, when referring to entities for which a fixed lexical item has not yet been established, or when a concrete depiction is required to convey an idea, signers often employ alternative strategies. These may include temporarily substituting related lexical items, or incorporating other symbolic resources available to them—for example, mouthing elements from spoken Japanese—into their improvised signing. I refer to this phenomenon as “Improvisational Signing” and have been investigating it within the framework of conversation analysis and multimodal interaction research (Bono, 2017).
In sign language AI research, benchmark datasets for continuous sign language recognition (CSLR) are typically based on news broadcasts with gloss annotations or scripted content prepared in advance in written language. Crucially, these datasets consist of monologic data—one signer addressing the camera—rather than dialogic interaction. While progress has been made in sign language recognition and translation research, the next major challenge will clearly be the development of technologies for dialogic sign language recognition (DSLR) in social interaction. In this talk, I will introduce phenomena such as Improvisational Signing and discuss whether, and to what extent, current approaches in sign language AI research are able to account for them.
Bio:
Mayumi Bono is an Associate Professor at the National Institute of Informatics (NII) in Tokyo, Japan. She received her Ph.D. in Applied Linguistics from Kobe University in 2005. In her Ph.D. project, she demonstrated how to build a machine-readable model of the “Participation Framework,” originally proposed by Canadian-American sociologist Erving Goffman. After receiving her Ph.D., she worked in informatics at ATR Media Information Science Laboratories, Kyoto University, and the National Institute of Informatics. She is currently conducting several research projects aimed at building sharable spoken and sign language multimodal corpora within an open science framework, making them accessible to academic researchers interested in human communication, and applying AI techniques such as machine learning.
Gou Koutaki (Kumamoto University)
"Ensemble System Using Semi-Automatic Instrument-Playing Robots"
Abstract:
The presenter has been developing devices that assist with musical instrument playing motions. While fully automatic playing robots have been extensively researched, they leave no room for human interaction in the performance. In contrast, this research has developed a semi-automatic instrument device in which only part of the playing motions are automated by a robot, while the remaining motions are performed physically by the human player. This allows users to experience physical sound while easily playing their favorite songs. Furthermore, using multiple instrument-playing robots enables ensemble performance, which expands the scope to multi-person interaction. The presenter will introduce the guitar and saxophone robots that have been developed, and an example of a robot saxophone quartet system featuring soprano, alto, tenor, and baritone saxophones will be presented.
Bio:
Gou Koutaki received a Doctor of Engineering from Kumamoto University, Japan, in 2007. He joined the Production Engineering Research Laboratory, Hitachi, Ltd., in 2007 and is currently a professor at Kumamoto University. His research interests include image processing and musical-instrument support systems. Originally a computer vision researcher who has published at venues such as CVPR, ICCV, SIGGRAPH (technical papers), and IJCV, he currently designs and manufactures robotic instruments.
Shuhei Kurita (National Institute of Informatics)
"Real-world foundation models: from Text toward Egocentric-vision, 3D and Robotics"
Abstract:
With the rapid progress of large language models (LLMs) and multimodal language models (MLLMs), new attempts have emerged to process and reason about physical real-world information in textual form. Text remains one of the most intuitive and universal symbolic media we use, and its abundance on the Internet provides unparalleled accessibility. Yet, as a representation of the physical real world, text alone captures only a limited portion of the underlying information. In this talk, I will discuss the evolving roles and potential of textual information within the broader landscape of LLM and MLLM research. By examining cross-disciplinary applications such as egocentric vision, 3D understanding, and robotic foundation models, I will explore how text serves as both a bridge and a bottleneck in grounding language models in the real world.
Bio:
Shuhei Kurita is an Assistant Professor at the National Institute of Informatics and a Specially Appointed Associate Professor at the Institute of Science Tokyo. He obtained his PhD in Informatics from Kyoto University in 2019. His research interests range from language modeling to visual foundation models, spanning vision, language, and action modeling. He has a keen interest in developing language models with real-world understanding and in their application to computer vision and robotics.
Takashi Matsubara (Hokkaido University)
"How First-order Logic Helps Diffusion-based Image Generation"
Abstract:
Despite the remarkable progress of diffusion models in text-to-image generation, they often struggle to faithfully capture the intended meaning of text prompts. A specified object may not appear, an adjective may incorrectly modify unintended objects, or an agent may fail to possess a specified object. In this talk, I introduce Predicated Diffusion, a unified framework designed to more effectively convey user intentions. It represents the intended meaning as propositions in first-order logic and interprets pixels in attention maps as fuzzy predicates. This formulation guides the generation process so that the resulting images more faithfully satisfy the specified propositions.
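To give a flavor of the recipe (a hedged sketch in the spirit of the approach; the exact propositions and relaxation used in Predicated Diffusion may differ), a prompt such as "a red ball" can be encoded as

    \exists x\, \mathrm{Ball}(x) \;\wedge\; \forall x\,\bigl(\mathrm{Ball}(x) \rightarrow \mathrm{Red}(x)\bigr)

Reading the attention-map value A_{\mathrm{ball}}(x) \in [0,1] at pixel x as the fuzzy truth value of \mathrm{Ball}(x), a product-logic-style relaxation gives

    \bigl[\!\bigl[\exists x\, \mathrm{Ball}(x)\bigr]\!\bigr] \approx 1 - \prod_x \bigl(1 - A_{\mathrm{ball}}(x)\bigr),
    \qquad
    \bigl[\!\bigl[\mathrm{Ball}(x) \rightarrow \mathrm{Red}(x)\bigr]\!\bigr] \approx 1 - A_{\mathrm{ball}}(x)\bigl(1 - A_{\mathrm{red}}(x)\bigr),

and taking the negative logarithm of the proposition's overall fuzzy truth value yields a differentiable loss that can guide the denoising process toward images satisfying the proposition.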
Bio:
Takashi Matsubara is a Professor at the Graduate School of Information Science and Technology, Hokkaido University. He received his Ph.D. in Engineering from Osaka University in 2015. He then served as an Assistant Professor at the Graduate School of System Informatics, Kobe University, and later as an Associate Professor at the Graduate School of Engineering Science, Osaka University, before assuming his current position in April 2024. Since May 2025, he has also been a Research Scientist at CyberAgent AI Lab. In 2021, he received the Research and Development Encouragement Award under the Strategic Information and Communications R&D Promotion Programme (SCOPE) from the Ministry of Internal Affairs and Communications. His primary research interests lie in scientific machine learning (SciML) and its applications to computer vision.
Yusuke Matsui (The University of Tokyo)
"Where Learned Data Structures Meet Computer Vision"
Abstract:
Learned data structures are a new type of data structure that enhances the performance of classical data structures, such as B-trees, by leveraging the power of machine learning. Learned data structures have been actively studied in the database field and hold the potential to accelerate many procedures across various domains. However, their capabilities are not yet widely recognized. In this talk, I will explore whether learned data structures can be applied to tasks in computer vision, and also discuss how applications in computer vision may influence learned data structures. Through this discussion, I will explore the potential of learned data structures as next-generation data structures that incorporate machine learning.
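As a minimal sketch of the core idea behind learned indexes (the canonical example of a learned data structure; this toy version is written for illustration and is not from the talk), a model predicts a key's position and a bounded local search corrects the prediction:

    import numpy as np

    # Toy learned index: replace a B-tree's key -> position lookup with a linear
    # model of the key distribution plus a bounded local search. Illustrative only.
    keys = np.sort(np.random.default_rng(0).uniform(0, 1e6, size=100_000))
    pos = np.arange(len(keys))

    a, b = np.polyfit(keys, pos, deg=1)                          # "learn" the cumulative key distribution
    max_err = int(np.ceil(np.max(np.abs(a * keys + b - pos))))   # worst-case model error

    def lookup(q: float) -> int:
        # Predict the position, then search only within the error window.
        guess = int(a * q + b)
        lo = max(0, guess - max_err)
        hi = min(len(keys), guess + max_err + 1)
        i = lo + int(np.searchsorted(keys[lo:hi], q))
        return int(i) if i < len(keys) and keys[i] == q else -1

    assert lookup(keys[1234]) == 1234
    assert lookup(-1.0) == -1

The same prediction-plus-bounded-search pattern is what makes such structures attractive whenever the key distribution is learnable.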
Bio:
Yusuke Matsui is a senior assistant professor at the Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. He received his Ph.D. in information science and technology from the University of Tokyo in 2016. His research focuses on computer vision, data structures, and machine learning. He is particularly interested in developing foundational technologies for large-scale and high-performance AI systems, including vector databases, retrieval-augmented generation (RAG), and learned data structures.
Sho Sonoda (RIKEN, CyberAgent)
"Conjecturing-Proving Loop: Discovering New Theorems via LLMs with In-Context Proof Learning in Lean"
Abstract:
Large Language Models have demonstrated significant promise in formal theorem proving. Previous studies mainly focus on solving existing problems. In this study, we focus on the ability of LLMs to find novel theorems.
We propose the Conjecturing-Proving Loop (CPL) pipeline for automatically generating mathematical conjectures and proving them in Lean 4 format. A feature of our approach is that we generate and prove further conjectures with a context that includes previously generated theorems and their proofs, which enables the generation of more difficult proofs through in-context learning of proof strategies without changing the parameters of the LLMs. We demonstrated that our framework rediscovered, with verification, theorems that were published in past mathematical papers but had not yet been formalized. Moreover, at least one of these theorems could not be proved by the LLM without in-context learning, even in natural language, which means that in-context learning was effective for neural theorem proving.
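For readers unfamiliar with the target format, the pipeline's outputs are machine-checkable Lean 4 statements together with their proofs; a toy example of that format (written here for illustration, not an actual CPL output) looks like this:

    import Mathlib

    -- Toy illustration of the Lean 4 statement-plus-proof format; not a CPL output.
    theorem sq_add_sq_nonneg (a b : ℤ) : 0 ≤ a ^ 2 + b ^ 2 :=
      add_nonneg (sq_nonneg a) (sq_nonneg b)

In the actual loop, previously generated theorems and their proofs are kept in context so that later, harder conjectures can reuse earlier proof strategies.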
Bio:
Sho Sonoda is a Permanent Senior Research Scientist in the Deep Learning Theory Team (PI: Prof. Taiji Suzuki) at RIKEN AIP and a Research Scientist at AI Lab, CyberAgent, Inc. He received the degree of Doctor of Engineering from Waseda University in 2017 under the supervision of Prof. Noboru Murata. He joined RIKEN in 2018 and was tenured in 2021. His expertise is in the theory and application of machine learning, especially harmonic analysis for neural networks. Since early 2023, he has also been working on AI4Math.
Shinnosuke Takamichi (Keio University)
"How Do Audio Foundation Models Understand Sound?"
Abstract:
Audio foundation models are general-purpose models designed to handle a wide range of sounds: speech, general acoustic sounds, and music. How do such models understand sound? Unraveling a foundation model's understanding of sound contributes not only to explainable machine learning but also to elucidating human auditory perception. This talk introduces research on this question from the perspectives of statistics and structure.
Bio:
Shinnosuke Takamichi received the Ph.D. degree from the Nara Institute of Science and Technology, Japan, in 2016. He is currently an Associate Professor at Keio University, Japan. He has received more than 20 paper/achievement awards, including the IEEE Signal Processing Society Young Author Best Paper Award and the MEXT Young Scientists' Prize.
Tatsuya Yokota (Nagoya Institute of Technology)
"Tensor Network Decompositions and Their Applications in Machine Learning"
Abstract:
Matrix and tensor decompositions are classical mathematical models that have been widely used for feature extraction and reconstruction of multidimensional array data. In recent years, tensor network decompositions have emerged as more advanced models, and their applications in machine learning are actively being explored. In this talk, I will introduce the fundamentals of tensor network decompositions, highlight their unique and intriguing characteristics, and present examples of their applications.
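As a small, concrete example of one such model (a minimal tensor-train decomposition via sequential truncated SVDs, in the spirit of Oseledets' TT-SVD; this sketch is illustrative and not drawn from the talk):

    import numpy as np

    # Minimal tensor-train (TT) decomposition by sequential truncated SVDs.
    def tt_svd(x: np.ndarray, max_rank: int):
        dims, cores, r_prev = x.shape, [], 1
        c = np.asarray(x)
        for k in range(len(dims) - 1):
            c = c.reshape(r_prev * dims[k], -1)
            u, s, vt = np.linalg.svd(c, full_matrices=False)
            r = min(max_rank, len(s))
            cores.append(u[:, :r].reshape(r_prev, dims[k], r))   # k-th TT core
            c = s[:r, None] * vt[:r]                             # pass the remainder to the next step
            r_prev = r
        cores.append(c.reshape(r_prev, dims[-1], 1))
        return cores

    def tt_reconstruct(cores):
        x = cores[0]
        for g in cores[1:]:
            x = np.tensordot(x, g, axes=([-1], [0]))
        return x.squeeze(axis=(0, -1))

    x = np.random.default_rng(0).normal(size=(6, 7, 8))
    cores = tt_svd(x, max_rank=8)            # ranks are not truncated here, so reconstruction is exact
    err = np.linalg.norm(tt_reconstruct(cores) - x) / np.linalg.norm(x)
    print([g.shape for g in cores], f"relative error = {err:.1e}")

Lowering max_rank trades accuracy for a representation whose parameter count grows only linearly with the tensor order, which is part of what makes such decompositions attractive as machine learning models.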
Bio:
Tatsuya Yokota received his Ph.D. in Engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 2014. He is currently an Associate Professor in the Department of Computer Science at the Nagoya Institute of Technology, Japan, and a Visiting Research Scientist with the Tensor Learning Team at RIKEN AIP. His research interests include matrix and tensor factorizations, signal and image processing, and machine learning. He serves as an Associate Editor for the IEEE Transactions on Signal Processing.
Naoto Yokoya (The University of Tokyo, RIKEN)
"Open and Equitable AI for Earth Observation"
Abstract:
Machine learning has accelerated the automation and advancement of Earth-observation (EO) analysis. For submeter-resolution imagery, where spatial pattern recognition is central, deep learning is essential, yet openness in data and tools has lagged under restrictive policies. We begin with globally applicable dense-prediction benchmarks: land-cover segmentation built from real, diverse imagery, and height estimation learned with high-fidelity synthetic data and domain adaptation. We then introduce single-image 3D plant reconstruction that combines modeling and machine learning to recover fine structure from one photograph, pointing to extensions from ground to aerial EO. We close with vision–language models for EO and their reliability, with benchmarks for disaster scene understanding and long-term temporal understanding.
Bio:
Naoto Yokoya is a Professor at the University of Tokyo (Graduate School of Frontier Sciences) and leads the Geoinformatics Team at RIKEN AIP. He received his Ph.D. in aerospace engineering from the University of Tokyo in 2013. His research lies at the intersection of remote sensing and computer vision, with applications to disaster management and environmental assessment. He previously held an Alexander von Humboldt Fellowship at DLR/TUM and currently serves as Associate Editor for IEEE TPAMI, IEEE TGRS, and ISPRS JPRS; he is a Clarivate Highly Cited Researcher (2022–).