Sprekeler, H and Hennequin, G and Gerstner, W (2009) Code-specific policy gradient rules for spiking neurons. Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. pp. 1741-1749.
Full text not available from this repository.
Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general "full spike train" code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
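As a rough illustration of the policy gradient idea in the abstract, the sketch below applies the REINFORCE update of Williams (1992) to a single stochastic neuron whose reward depends only on its spike count. It is not the set of learning rules derived in the paper. The discrete-time Bernoulli spiking model, the sigmoid spike probability, the quadratic reward, the running-average baseline, and all names and parameter values (`run_episode`, `train`, `n_target`, `eta`) are assumptions made for this example.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def run_episode(w, x, rng):
    """Sample one spike train and return it with the gradient of its log-likelihood."""
    p = sigmoid(x @ w)                            # spike probability per time bin, shape (T,)
    s = (rng.random(p.shape) < p).astype(float)   # Bernoulli spikes (assumed neuron model)
    grad_logp = (s - p) @ x                       # d/dw log P(spike train | w), shape (N,)
    return s, grad_logp


def train(n_target=5, T=50, N=10, episodes=2000, eta=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random((T, N))      # fixed, made-up presynaptic input pattern
    w = np.zeros(N)             # synaptic weights
    baseline = None             # running estimate of the mean reward
    rewards = np.empty(episodes)
    for k in range(episodes):
        s, grad_logp = run_episode(w, x, rng)
        # Assumed reward: depends on the spike train only through its spike count.
        R = -(s.sum() - n_target) ** 2
        if baseline is None:
            baseline = R
        # REINFORCE: reward (minus baseline) times the log-likelihood gradient.
        w += eta * (R - baseline) * grad_logp
        baseline += 0.05 * (R - baseline)
        rewards[k] = R
    return w, rewards


if __name__ == "__main__":
    w, rewards = train()
    print("mean reward, first 20 episodes:", rewards[:20].mean())
    print("mean reward, last 200 episodes:", rewards[-200:].mean())
```

Here the update uses the gradient of the full spike train likelihood even though the reward reads out only the count. The abstract's point is that assuming a particular code (count, timing, or full spike train) leads to a different gradient estimator and hence a different learning rule.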
Divisions: Div F > Computational and Biological Learning
Depositing User: Cron Job
Date Deposited: 07 Mar 2014 12:00
Last Modified: 08 Dec 2014 02:19