Kyoto Information and Society Seminar (KISS)

This seminar series invites speakers from a variety of research fields, ranging from informatics and the mathematical sciences to the humanities, as well as from fields concerned with the social application and implementation of information technology. Seminars are held in English, once or twice a month. If you are interested in attending or giving a presentation, please contact takeuchi@i.kyoto-u.ac.jp. [Japanese page]

Current schedule:
  • [KISS-004]
    • Title: A tutorial on metaheuristics for combinatorial optimization problems
    • Presenter: Shunji Umetani (Senior Researcher, Advanced Technology Lab., Recruit Co., Ltd.) [web]
    • Date: 12/17, 13:30-15:00
    • Location: Lecture room 3, Research Bldg. No. 7 (104, 1st floor)
    • Abstract:
      We often encounter computationally hard (a.k.a. NP-hard) combinatorial optimization problems in a wide range of industrial applications. A standard approach is to formulate the real-world problem as a mixed integer programming (MIP) problem and then solve it with one of the state-of-the-art MIP solvers. Continuous development of MIP technology, accompanied by advances in computing machinery, has greatly improved the performance of MIP solvers. However, many real-world problems still remain unsolved due to a large gap between the lower and upper bounds on their optimal values. For such hard problems, we often consider an alternative approach based on heuristic algorithms, which give us feasible solutions that are not necessarily optimal but practically good.
      Metaheuristics can be regarded as a collection of ideas for designing heuristic algorithms for combinatorial optimization problems. These ideas give us a systematic view when incorporated into basic strategies such as greedy and local search algorithms. In this tutorial, we first introduce how to design efficient local search algorithms along with their ingredients. We then introduce their extensions, called “metaheuristics”, along with representative strategies such as iterated local search (ILS), simulated annealing (SA), genetic algorithms (GA), guided local search (GLS), and so on. (A minimal simulated annealing sketch follows this announcement.)
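
For concreteness, here is a minimal, self-contained sketch of the simulated annealing idea mentioned above, applied with a 2-opt neighborhood to a random travelling salesman instance. The instance, cooling schedule, and all parameters are illustrative assumptions, not material from the tutorial.

```python
# A minimal sketch of simulated annealing (SA) with a 2-opt neighborhood
# on a random Euclidean TSP instance. Parameters are arbitrary choices.
import math
import random

random.seed(0)
n = 30
cities = [(random.random(), random.random()) for _ in range(n)]

def dist(i, j):
    (x1, y1), (x2, y2) = cities[i], cities[j]
    return math.hypot(x1 - x2, y1 - y2)

def tour_length(tour):
    return sum(dist(tour[k], tour[(k + 1) % n]) for k in range(n))

tour = list(range(n))
random.shuffle(tour)                 # random starting tour
cur = tour_length(tour)
T = 1.0                              # initial temperature (arbitrary)
while T > 1e-3:
    # 2-opt move: reverse the segment tour[i+1..j]
    i, j = sorted(random.sample(range(n), 2))
    cand = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
    delta = tour_length(cand) - cur
    # Accept improving moves always; accept worsening moves with
    # probability exp(-delta / T), which shrinks as T cools.
    if delta < 0 or random.random() < math.exp(-delta / T):
        tour, cur = cand, cur + delta
    T *= 0.999                       # geometric cooling schedule

print(f"final tour length: {cur:.3f}")
```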

Past schedule:

  • [KISS-003]
    • Title: How Transformers Learn Causal Structure with Gradient Descent
    • Presenter: Jason D. Lee (Associate Professor, Princeton University) [web]
    • Date: 12/5, 13:30-15:00
    • Location: Lecture room 2, Research Bldg. No. 7 (101, 1st floor)
    • Abstract:
      The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode causal structure, which makes them particularly suitable for sequence modeling. However, the process by which transformers learn such causal structure via gradient-based training algorithms remains poorly understood. To better understand this process, we introduce an in-context learning task that requires learning latent causal structure. We prove that gradient descent on a simplified two-layer transformer learns to solve this task by encoding the latent causal graph in the first attention layer. The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens. As a consequence of the data processing inequality, the largest entries of this gradient correspond to edges in the latent causal graph. As a special case, when the sequences are generated from in-context Markov chains, we prove that transformers learn an induction head (Olsson et al., 2022). We confirm our theoretical findings by showing that transformers trained on our in-context learning task are able to recover a wide variety of causal structures. (A toy illustration of the mutual-information idea appears after the past schedule below.)
  • [KISS-002]
    • Title: Best Arm Identification: Fixed Confidence and Fixed Budget Settings
    • Presenter: Jumpei Komiyama (Assistant Professor, NYU) [web]
    • Date: 10/29, 13:30-15:00
    • Location: Lecture room 3, Research Bldg. No. 7 (104, 1st floor)
    • Abstract:
      We consider the best arm identification problem, where the goal is to find the arm with the largest mean. In this problem, there are two popular settings: the fixed-confidence setting, where the desired confidence level is given, and the fixed-budget setting, where the sample size is predetermined. We introduce the basic ideas of this problem and discuss how differences in the problem setting affect algorithmic design. If time permits, the speaker will also introduce his other recent work. (A minimal fixed-confidence sketch appears after the past schedule below.)
  • [KISS-001]
    • Title: Outlier-Robust Neural Network Training: Efficient Optimization of Transformed Trimmed Loss with Variation Regularization
    • Presenter: Akifumi Okuno (Assistant Professor, ISM) [web]
    • Date: 10/8, 13:30-15:00
    • Location: Lecture room 3, Research Bldg. No. 7 (104, 1st floor)
    • Abstract:
      In this study, we consider outlier-robust predictive modeling using highly expressive neural networks. To this end, we employ (1) a transformed trimmed loss (TTL), a computationally feasible variant of the classical trimmed loss, and (2) a higher-order variation regularization (HOVR) of the prediction model. Note that a neural network trained with TTL alone may remain vulnerable to outliers, since its high expressive power lets it fit even the outliers perfectly. Simultaneously introducing HOVR, however, constrains the effective degrees of freedom and thereby avoids fitting the outliers. We further provide an efficient stochastic gradient supergradient descent (SGSD) algorithm for the optimization, together with a theoretical guarantee of its convergence. (This is joint work with Shotaro Yagishita (ISM).) (A rough sketch of the trimmed-loss and variation-penalty ingredients appears below.)
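
For KISS-003, here is a toy illustration of the mutual-information argument: when sequences are drawn from a Markov chain, the plug-in mutual information between token positions is largest along the causal edges (adjacent positions). The vocabulary, transition matrix, and sample sizes below are illustrative assumptions, not the paper's construction.

```python
# Toy illustration for KISS-003: empirical mutual information between
# positions of sequences sampled from a Markov chain is largest on the
# chain's edges. A hypothetical stand-in, not the paper's task.
import math
import random
from collections import Counter

random.seed(0)
V, T, N = 3, 5, 20000               # vocab size, sequence length, #sequences
# Transition matrix favoring the "next" symbol; rows sum to 1.
P = [[0.7 if j == (i + 1) % V else 0.15 for j in range(V)] for i in range(V)]

def sample_seq():
    s = [random.randrange(V)]
    for _ in range(T - 1):
        s.append(random.choices(range(V), weights=P[s[-1]])[0])
    return s

data = [sample_seq() for _ in range(N)]

def mutual_info(a, b):
    # Plug-in estimate of I(X_a; X_b) from the sampled sequences.
    joint = Counter((s[a], s[b]) for s in data)
    pa = Counter(s[a] for s in data)
    pb = Counter(s[b] for s in data)
    return sum((c / N) * math.log(c * N / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

for a in range(T):
    for b in range(a + 1, T):
        print(f"I(X_{a}; X_{b}) = {mutual_info(a, b):.3f}")
# Adjacent pairs (the Markov edges) show the largest values, consistent
# with the data-processing-inequality argument in the abstract.
```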
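
For KISS-002, here is a minimal sketch of successive elimination, a standard fixed-confidence algorithm: pull every surviving arm each round and eliminate arms whose upper confidence bound falls below the empirical leader's lower bound. The Bernoulli means and the Hoeffding-style radius are illustrative assumptions, not material from the talk.

```python
# A minimal successive-elimination sketch for fixed-confidence best arm
# identification with Bernoulli arms. Parameters are illustrative.
import math
import random

random.seed(0)
means = [0.5, 0.6, 0.7, 0.8]         # hypothetical (unknown) arm means
delta = 0.05                          # target error probability

active = list(range(len(means)))
sums = [0.0] * len(means)
pulls = [0] * len(means)
t = 0
while len(active) > 1:
    t += 1
    for a in active:                  # pull every surviving arm once
        sums[a] += 1.0 if random.random() < means[a] else 0.0
        pulls[a] += 1
    # Hoeffding-style confidence radius with a crude union bound
    # over arms and rounds.
    rad = math.sqrt(math.log(4 * len(means) * t * t / delta) / (2 * t))
    best = max(active, key=lambda a: sums[a] / pulls[a])
    # Keep only arms whose upper bound still reaches the leader's lower bound.
    active = [a for a in active
              if sums[a] / pulls[a] + rad >= sums[best] / pulls[best] - rad]

print(f"identified arm {active[0]} after {sum(pulls)} total pulls")
```

In the fixed-budget setting, by contrast, the total number of pulls is fixed in advance and the algorithm must allocate it across arms before recommending one; the talk discusses how this difference shapes algorithm design.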
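
For KISS-001, here is a rough PyTorch sketch of the two ingredients in spirit: a classical trimmed loss (drop the largest per-sample losses) combined with a crude finite-difference variation penalty. This is a generic illustration under stated assumptions, not the talk's TTL or its SGSD algorithm.

```python
# Rough sketch: classical trimmed loss + a crude variation penalty for
# outlier-robust 1-D regression. Not the talk's TTL/HOVR/SGSD; all data,
# names, and parameters here are assumptions for illustration.
import torch

torch.manual_seed(0)
# 1-D regression data with a few gross outliers injected.
x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = torch.sin(3 * x) + 0.05 * torch.randn_like(x)
y[::25] += 3.0                        # inject outliers

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

keep_ratio, lam, order = 0.9, 1e-2, 2
for step in range(2000):
    pred = model(x)
    per_sample = (pred - y).pow(2).squeeze(1)
    # Trimmed loss: average only the smallest 90% of squared errors,
    # so the injected outliers are excluded from the fit.
    k = int(keep_ratio * per_sample.numel())
    trimmed = torch.topk(per_sample, k, largest=False).values.mean()
    # Crude variation penalty: squared order-2 finite differences of the
    # prediction on the sorted input grid, standing in for HOVR.
    d = pred.squeeze(1)
    for _ in range(order):
        d = d[1:] - d[:-1]
    loss = trimmed + lam * d.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final objective: {loss.item():.4f}")
```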