The very heart of sequential decision making

Facing the traveler tree, you wonder: which path shall I pick this time? Choosing the right alternative in an uncertain world is not easy. Advancing multi-armed bandit theory will help you.

Mathematical Statistics for Sequential Learning

The more applied you go, the stronger the theory you need. It is an equilibrium between questions and answers, between dreams and practice. Mathematics is the door and the key to optimisation, to learning guarantees, and to making your dreams come true.

Provably adaptive decisions in the wild

Providing algorithms with truly adaptive capabilities when facing unknown dynamics and environments. Reinforcement Learning is the basic formalism and optimism in the face of uncertainty a good tool, but robustness and adaptivity to unknown structure are the real challenges.

Sequential Learning for Sustainable Systems

Understanding the dynamics of complex systems, and how to act optimally in them, can have a huge positive impact on all aspects of human societies that require careful management of natural, energetic, human and computational resources. It is our duty to answer this challenge optimally.

The wind of change - an avenue of novel applications

Choosing which future we want to shape is as important as picturing the world we dream of beyond the existing applications of current research. From e-learning to permaculture or the circular economy, embrace the potential of sequential learning for our societies.

All you need is a deep passion for mathematics, computer science and changing the world.

On these pages, you will find information about my research activities in the wide fields of Mathematics (Statistical Theory) and Computer Science (Machine Learning). You may want to read and comment on my publications, or follow much more interesting links. For open positions (we do have several open positions in 2021), please read this page as well as this one, and do not hesitate to get in touch by email, as I have a bunch of exciting research projects to work on these days.

...
In case you
  • believe that understanding the dynamics of complex systems, as well as how to act optimally in them, can have a huge positive impact on all aspects of human societies that require careful management of natural, energetic, human and computational resources, and that it is thus our duty to answer this challenge optimally,
  • consider that for that purpose, due to the limitations of human capabilities to process large amounts of data, we should pursue the long-term development of an optimal and automatic method that can, from mere observations and interactions with a complex system, understand its dynamics and how to optimally act in it,
  • want to attack this problem by using any combination of the following four pillar domains: Machine Learning, Mathematical Statistics, Dynamical Systems and Optimization,
  • then do not hesitate to contact me; I'll be very happy to help you achieve this goal.
Research Domains

Budgeted RL with Continuous States
A Budgeted Markov Decision Process is considered in continuous-space environments with unknown dynamics. This requires a few modifications of the standard MDP theory.

Reinforcement Learning State Representations
We consider online reinforcement learning when several state representations are available, and revisit a few results.

Adaptive Allocation for Learning Markov Chains
We extend the active allocation strategies for multi-armed bandits to the setup of multiple Markov chains.

Practical Open-Loop Optimistic Planning
We revisit the OLOP strategy to make it more efficient in practice.

Exploiting State-Action Equivalence in RL
Dynamical systems often exhibit equivalence of dynamics when playing different actions from different states. We exploit this.

Sequential change-point detection
When you want to provably detect a change as fast as possible in the context of streaming data.
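The page does not name a specific detector here, so purely as a minimal illustration: below is a sketch of a CUSUM-type test for a known mean shift in a unit-variance Gaussian stream. The pre- and post-change means `mu0`, `mu1` and the alarm `threshold` are hypothetical inputs, not parameters taken from any particular paper.

```python
import numpy as np

def cusum_detector(stream, mu0, mu1, threshold):
    """Sequential CUSUM test for a mean shift from mu0 to mu1 in a
    unit-variance Gaussian stream. Returns the (1-based) detection
    time, or None if no alarm is ever raised."""
    s = 0.0
    for t, x in enumerate(stream, start=1):
        # Log-likelihood ratio of the post-change vs pre-change model.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0)
        # CUSUM recursion: clip at zero so the statistic restarts.
        s = max(0.0, s + llr)
        if s > threshold:
            return t  # alarm raised at time t
    return None

# Example: the mean shifts from 0 to 1 at time 200.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(1, 1, 100)])
print(cusum_detector(xs, mu0=0.0, mu1=1.0, threshold=10.0))
```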

Robust Control of Uncertain Non-linear Dynamics
Designing safe control policies for non-linear systems with unknown dynamics.

Fully adaptive streaming kernel regression
Confidence bounds for streaming kernel regression while adapting regularization to an unknown variance. Application to bandits.

Variance of value function in RL regret bounds
Finally! Regret bounds involving the local variance of the value function in undiscounted RL.

Aggregating a growing number of experts
How do you aggregate the decisions of learners when more learners may arrive at each time step?
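As a rough sketch of the setting (not the algorithm studied in the paper): an exponentially weighted average forecaster in which a newly arriving expert enters with the current average weight. The entry rule and the learning rate `eta` are assumptions made for illustration.

```python
import numpy as np

def growing_experts_aggregation(rounds, eta=0.5):
    """Exponentially weighted average forecaster over a growing set of
    experts. Each round supplies (predictions, loss_fn): one prediction
    per currently alive expert (the list may grow between rounds) and a
    loss function mapping a prediction to a loss in [0, 1]."""
    weights = np.array([])
    for predictions, loss_fn in rounds:
        n_new = len(predictions) - len(weights)
        if n_new > 0:
            # Assumed entry rule: newcomers get the current average weight.
            entry = weights.mean() if len(weights) else 1.0
            weights = np.concatenate([weights, np.full(n_new, entry)])
        probs = weights / weights.sum()
        yield float(np.dot(probs, predictions))  # aggregated forecast
        # Exponential update on each expert's own loss.
        losses = np.array([loss_fn(p) for p in predictions])
        weights = weights * np.exp(-eta * losses)

# Example: two experts in round 1, a third arrives in round 2;
# the target to predict is 1.0 in both rounds.
rounds = [([0.2, 0.8], lambda p: abs(p - 1.0)),
          ([0.3, 0.7, 0.5], lambda p: abs(p - 1.0))]
print(list(growing_experts_aggregation(rounds)))
```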

Boundary Crossing Probabilities
When you want to control, in finite time, the probability of crossing a threshold in exponential families of arbitrary finite dimension K.

One trajectory Spectral Learning
How to apply the spectral method of moments when only observing a single trajectory of a dynamical system?

Non-stationary Stochastic Bandits I
A first study to understand how to identify a best option and minimize regret when distributions are changing.

Random Shuffling for non-stationary bandits
A simple idea to improve the handling of non-stationary bandits.

Low-rank Latent bandits
Combining the RTP methods with linear bandits makes it possible to handle low-rank structure in bandits, albeit at a high sampling cost.

Streaming confident regression
In a streaming regression setting with dependent data, we build a history-based confidence distribution on the next point.

Pliable Rejection Sampling
Using kernel estimates to make rejection sampling applicable at provably low cost.

Random Projections MCMC is hard
Random Projections may replace sub-sampling techniques for MCMC with large data. Whether it actually works is a tricky issue.

How hard is my MDP?
How many samples do you need for tight enough confidence to solve an MDP? The Bernstein norm of the value function helps!

Selecting State Representations
When you have many possible notions of state, perhaps all wrong, you don't know which is best, but still want optimal regret guarantees.

Sub-sampling Bandits
A surprisingly simple bandit strategy that achieves the state of the art in a vast range of settings, without knowing the reward model.
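For a flavour of the idea, here is a two-armed sketch in the spirit of sub-sampling duels such as BESA (Baransi, Maillard and Mannor, 2014): the longer reward history is sub-sampled down to the size of the shorter one before comparing empirical means. The `bandit` interface and the tie-breaking detail are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def besa_two_arms(bandit, horizon, rng=np.random.default_rng(0)):
    """Two-armed sub-sampling duel: to compare arms, the longer reward
    history is sub-sampled (without replacement) down to the size of
    the shorter one, and the arm with the larger sub-sampled mean is
    played. Ties go to the less-sampled arm."""
    hist = [[bandit(0)], [bandit(1)]]  # pull each arm once to start
    for _ in range(horizon - 2):
        n = min(len(hist[0]), len(hist[1]))
        means = [np.mean(rng.choice(h, size=n, replace=False)) for h in hist]
        if means[0] == means[1]:
            arm = int(len(hist[1]) < len(hist[0]))  # favour the shorter history
        else:
            arm = int(means[1] > means[0])
        hist[arm].append(bandit(arm))
    return [len(h) for h in hist]  # pull counts per arm

# Example: Bernoulli arms with means 0.4 and 0.6.
reward_rng = np.random.default_rng(1)
pulls = besa_two_arms(lambda a: float(reward_rng.random() < (0.4, 0.6)[a]),
                      horizon=1000)
print(pulls)  # arm 1 should receive most of the pulls
```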

Sampling without replacement
What concentration inequalities can you show when sampling without replacement?
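For context, one classical answer is the Hoeffding-Serfling inequality, which improves on Hoeffding's bound through a finite-population factor when drawing n out of N bounded values without replacement (stated here in its common simplified form):

```latex
% X_1,\dots,X_n drawn without replacement from x_1,\dots,x_N \in [a,b],
% with population mean \mu:
\mathbb{P}\left( \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \ge \varepsilon \right)
\le \exp\left( - \frac{2 n \varepsilon^2}{\left(1 - \frac{n-1}{N}\right)(b-a)^2} \right)
```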

Latent Bandits
In recommender systems, not all features of the users may be known. Ignoring the latent features can lead to disastrous results.

Robust risk-averse Bandits
Choosing the right action when minimizing the risk of each trial, instead of simply the mean, and how to get near-optimal guarantees.

Handling infinitely many state models
Solving an RL problem in a single stream of interactions when you don't know the state model, but have infinitely many candidates.

Better selecting the state representation
Solving an RL problem in a single stream of interactions in an optimal way when you have not one but many plausible state models.

Optimal bandit allocation strategy
An old KL-based class of bandit algorithms is not only shown to be optimal in the limit, but is also analyzed for a finite number of pulls.
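To make the flavour of such KL-based indices concrete, here is a sketch of computing a KL-UCB-style upper confidence index for Bernoulli rewards by bisection; the exploration function log(t) + c log log(t) is a common choice, not necessarily the exact variant analyzed in the paper.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, n_pulls, t, c=0.0, iters=50):
    """Largest q >= mean such that n_pulls * KL(mean, q) stays below the
    exploration budget log(t) + c * log(log(t)), found by bisection.
    At each round, a KL-UCB-type policy plays the arm with largest index."""
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if n_pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Example: index of an arm with empirical mean 0.5 after 10 pulls at t = 100.
print(kl_ucb_index(mean=0.5, n_pulls=10, t=100))
```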

Random Projections Linear Regression
For a high-dimensional function space, how can you reduce the dimension to a manageable size while preserving risk-minimization guarantees?

Active curiosity-based sampling
Curiosity-driven learning naturally trades off between choosing too-complex and too-easy tasks. Here it is applied to active sampling.

Active sampling and partitioning
Building a piecewise-constant approximation of a function, by actively sampling and refining a partition of the space in a near-optimal way.

Finite-time optimal bandit strategy
Proving that an old bandit strategy based on KL divergence is optimal for discrete distributions.

Selecting the state-representation in RL
Solving an RL problem in a single stream of interactions when you have many plausible state models and don't know which is right.

Sparse recovery with Brownian sensing
When compressed sensing fails because your sampling matrix has no good properties, apply Brownian sensing and you'll be fine.

Online learning with smooth opponents
Given a continuum of actions, an opponent chooses your feedback, which is only assumed to be smooth. How can you get efficient, optimal actions?

Bound for Bellman residual minimization
In the setting of discounted MDPs, we show a generalization bound for the Bellman residual in linear approximation spaces.

History-dependent Adaptive Bandits
Say you face an opponent in a bandit game. Knowing her limitations, you can design an optimal strategy. What if you don't know them?

Scrambled function spaces for regression
Building from a large function space a subspace of manageable dimension by scrambling your basis functions.

Random Projections for MDPs
Applying random-projection regression to approximate a value function that is only available via a fixed-point formulation.

Compressed least-squares regression
You build a random matrix to solve a regression problem in a smaller space, while handling the approximation-error overhead.
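A generic sketch of the technique, under standard assumptions (Gaussian projection, ordinary least squares in the compressed space); the scaling of the projection matrix is a conventional choice, not necessarily the paper's.

```python
import numpy as np

def compressed_least_squares(X, y, m, rng=np.random.default_rng(0)):
    """Project the d features down to m << d with a Gaussian random
    matrix, then solve ordinary least squares in the compressed space.
    Returns the projection and the compressed weight vector."""
    d = X.shape[1]
    # Entries of variance 1/m keep squared norms unchanged in expectation
    # (Johnson-Lindenstrauss-type scaling).
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(d, m))
    Z = X @ A  # compressed design matrix, shape (n, m)
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return A, w  # predict on new inputs X_new via (X_new @ A) @ w

# Example: n = 500 points in dimension d = 1000, compressed to m = 50.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 1000))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=500)
A, w = compressed_least_squares(X, y, m=50)
print(np.mean(((X @ A) @ w - y) ** 2))  # in-sample mean squared error
```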

Many views agreement regularization
You observe the same data under different representations, each learner providing different answers, and you want them to agree.

Have a good day :)

If you are interested in actively saving academic research in France, you may ask your university to open a "Travail de Communication de la Recherche" (T.C.R.), a Teaching Unit (Unité d'Enseignement) in which students practice communicating research activities.