Spike-Timing-Dependent Plasticity and its Reward-Modulated Variant
Dejan Pecevski
Institute for Theoretical Computer Science
Graz University of Technology
Neural Networks B, April, 2009
• Spike-Timing-Dependent Plasticity
  - Experimental evidence
  - Model of STDP
  - Analysis of STDP: learning equation
  - Extensions of the model to fit additional experimental data
  - Functional role of STDP (hypotheses)
• Reward-modulated STDP
  - Model of RM-STDP and the learning equation
  - Modeling a biofeedback experiment
  - Discrimination of temporal spike patterns with RM-STDP
Hebbian learning
Synaptic changes are thought to be the neurochemical basis for learning and memory.
Hebb’s postulate
“When an axon of cell A repeatedly or persistently takes part in firing cell B, then A’s efficiency as one of the cells firing B is increased.”
Hebb, 1949
Local rule
Driven by the correlations of the firing between cells
Cells that are correlated form cell assemblies
Spike-timing-dependent plasticity
Two connected neurons are stimulated to fire at specific times t_pre (presynaptic neuron) and t_post (postsynaptic neuron).
These paired stimulations are performed repeatedly, always with the same interval Δt = t_post − t_pre between the corresponding spikes.
After the repeated stimulation, the change of the EPSP amplitude is measured.
The same procedure is repeated for different values of Δt to examine how the change of the EPSP amplitude depends on the magnitude and the sign of Δt.
Spike-timing-dependent plasticity
If the presynaptic neuron fires within a time window of tens of ms before the postsynaptic neuron, the EPSP increases => the synapse is strengthened (LTP)
If the presynaptic neuron fires within a time window of tens of ms after the postsynaptic neuron, the EPSP decreases => the synapse is weakened (LTD)
Phenomenological model of STDP
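We denote by $S_i(t)$ the spike train of neuron $i$, a sum of Dirac pulses at its spike times $t_i^{(f)}$:
$S_i(t) = \sum_f \delta(t - t_i^{(f)})$
The STDP model is defined through a learning window function $W(\Delta t)$ with $\Delta t = t_{post} - t_{pre}$.
Each pair of a presynaptic and a postsynaptic spike causes an instantaneous weight change $\Delta w = W(t_{post} - t_{pre})$ that happens at the time of the later spike.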
Model of STDP
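The weight change of the synapse from presynaptic neuron $j$ to postsynaptic neuron $i$ can be expressed as
$\frac{d}{dt} w(t) = S_i(t) \int_0^\infty W(s)\, S_j(t-s)\, ds + S_j(t) \int_0^\infty W(-s)\, S_i(t-s)\, ds$
Substituting the spike trains $S_i(t)$ and $S_j(t)$, we can calculate the total weight change in a time interval $[0,T]$:
$\Delta w = \sum_f \sum_g W(t_i^{(g)} - t_j^{(f)})$
where we sum over all spikes of the presynaptic and postsynaptic neuron that are within the interval $[0,T]$, i.e. $0 \le t_j^{(f)}, t_i^{(g)} \le T$.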
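The pair-based model can be sketched in a few lines of Python: every presynaptic/postsynaptic spike pair inside [0,T] contributes W(t_post − t_pre) to the total weight change. The exponential shape of the window and all parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`) are illustrative assumptions, not values taken from the slides.

```python
import math

def stdp_window(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Learning window W(dt), dt = t_post - t_pre in ms (assumed exponential form)."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)    # pre before post: potentiation
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)  # post before pre: depression
    return 0.0  # simultaneous spikes: no change in this sketch

def total_weight_change(pre_spikes, post_spikes, t_end):
    """Sum W(t_post - t_pre) over all spike pairs that lie inside [0, t_end]."""
    pre = [t for t in pre_spikes if 0.0 <= t <= t_end]
    post = [t for t in post_spikes if 0.0 <= t <= t_end]
    return sum(stdp_window(tp - tq) for tq in pre for tp in post)

# Pairing protocol: 60 pre-before-post pairs with dt = +10 ms, one pair per second.
pre = [1000.0 * n for n in range(60)]
post = [t + 10.0 for t in pre]
print(total_weight_change(pre, post, 60000.0))
```

Because all pairs are summed, spikes from neighbouring pairings also interact, but at a 1 s spacing their contribution is negligible with a 20 ms window.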
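STDP model by local variables
Presynaptic trace $x_j^{pre}$: updated with every presynaptic spike
Postsynaptic trace $y_i^{post}$: updated with every postsynaptic spike
At a postsynaptic spike the weight change is proportional to the presynaptic trace $x_j^{pre}$, and at a presynaptic spike it is proportional to the postsynaptic trace $y_i^{post}$.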
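A minimal event-driven sketch of the local-variable formulation: each neuron keeps one exponentially decaying trace that is incremented at its own spikes, and the weight is read out from the other neuron's trace. The trace increment of 1, the time constants and the amplitudes are assumptions made for illustration, and spike times are assumed to be distinct and non-negative.

```python
import math

def stdp_with_traces(pre_spikes, post_spikes, a_plus=0.01, a_minus=0.012,
                     tau_plus=20.0, tau_minus=20.0):
    """All-to-all pair STDP computed with one trace per neuron (times in ms)."""
    events = sorted([(t, "pre") for t in pre_spikes] +
                    [(t, "post") for t in post_spikes])
    x_pre, y_post = 0.0, 0.0   # presynaptic and postsynaptic traces
    t_last, dw = 0.0, 0.0
    for t, kind in events:
        # Let both traces decay over the time since the previous event.
        x_pre *= math.exp(-(t - t_last) / tau_plus)
        y_post *= math.exp(-(t - t_last) / tau_minus)
        t_last = t
        if kind == "pre":
            dw -= a_minus * y_post   # depression, proportional to the post trace
            x_pre += 1.0             # update the pre trace at a presynaptic spike
        else:
            dw += a_plus * x_pre     # potentiation, proportional to the pre trace
            y_post += 1.0            # update the post trace at a postsynaptic spike
    return dw
```

With exponential windows this gives the same result as summing the window over all spike pairs, but it needs only two state variables per synapse instead of the full spike history.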
A biophysical model of STDP
Arrival of an action potential at the presynaptic terminal induces release of
the neurotransmitter glutamate into the synaptic cleft.
Glutamate binds with AMPA and NMDA receptors.
AMPA receptors open, but NMDA receptors remain blocked by Mg2+.
A biophysical model of STDP
Depolarization of the postsynaptic membrane removes the Mg2+ block, allowing influx of Ca2+ through the NMDA receptors.
The depolarization is caused by a back-propagating action potential that travels up the dendrites.
Change of Ca2+ concentration in the postsynaptic cell triggers processes
that change the synaptic efficacy.
Different STDP windows for different types of cells
Analysis of STDP
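We assume that the presynaptic and postsynaptic spike trains are stochastic processes drawn from a stochastic ensemble E.
We compute the expected weight change over some time interval [0,T], averaged over the ensemble E and over time (the temporal average).
Instantaneous firing rate of neuron i: $\nu_i(t) = \langle S_i(t) \rangle_E$
Correlations of the firing of neurons i and j: the ensemble average of the product of their spike trains at a given time lag.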
Analysis of STDP
The weight changes depend on the correlations between the inputs and the output of a neuron.
The output of the neuron depends on the input.
For simple neuron models (e.g. the linear Poisson neuron model), the weight changes can be derived analytically from the statistics of the inputs.
If the input is weakly correlated with the output, the weight decreases.
Groups of inputs that are strongly correlated (and drive the neuron) are strengthened.
STDP selects inputs which are correlated on the timescale of the learning
window and the postsynaptic potential.
Weight dependence of STDP
From (Bi and Poo, 1998)
The amount of change of the EPSC amplitude depends on the initial EPSC amplitude.
The dependency is different for positive and negative spike pairings.
Model from Gütig et al. 2001: smooth transition between an additive and a weight-dependent STDP rule
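One way to picture the smooth transition is a single exponent `mu` that scales potentiation by (1 − w)^mu and depression by w^mu for a weight normalized to [0,1]: `mu = 0` gives an additive rule and `mu = 1` a fully weight-dependent (multiplicative) rule. The sketch below follows that commonly cited form; the parameter values (`lam`, `alpha`, `tau`) and the clipping to [0,1] are assumptions for illustration.

```python
import math

def guetig_update(w, dt, mu=0.5, lam=0.005, alpha=1.05, tau=20.0):
    """One spike-pair update of a normalized weight w; dt = t_post - t_pre in ms."""
    k = math.exp(-abs(dt) / tau)               # temporal filter of the pairing
    if dt > 0:
        dw = lam * (1.0 - w) ** mu * k         # potentiation shrinks as w approaches 1
    else:
        dw = -lam * alpha * w ** mu * k        # depression shrinks as w approaches 0
    return min(1.0, max(0.0, w + dw))          # keep the weight inside [0, 1]

# Compare the potentiation step at a weak and a strong synapse for mu = 0 and mu = 1.
for mu in (0.0, 1.0):
    print(mu, guetig_update(0.2, 10.0, mu=mu) - 0.2, guetig_update(0.8, 10.0, mu=mu) - 0.8)
```

For `mu = 0` both synapses receive the same step, while for `mu = 1` the strong synapse is potentiated less, which matches the direction of the weight dependence described above.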
Dependence on pairing frequency
15 bursts of 5 spikes at different frequencies are induced in the neurons
pre-before-post spike pairing at Δt = +10 ms
post-before-pre spike pairing at Δt = −10 ms
(Sjöström et al. 2001)
Triplet rule (Pfister et al. 2006)
$y^{post}$: postsynaptic trace
$y^{post2}$: second postsynaptic trace
reproduces experimental data for the dependence of STDP on pairing frequency from (Sjöström et al. 2001)
reproduces other experimental data involving protocols with triplets of spikes (Wang et al.)
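A hedged sketch of a triplet-style rule with a second, slower postsynaptic trace: depression at a presynaptic spike uses the fast postsynaptic trace as in pair STDP, while potentiation at a postsynaptic spike is boosted by the slow trace left by earlier postsynaptic spikes. This is a reduced illustration, not the full published model; all amplitudes and time constants below are placeholder assumptions.

```python
import math

def triplet_stdp(pre_spikes, post_spikes, a2_plus=0.005, a3_plus=0.006,
                 a2_minus=0.007, tau_plus=17.0, tau_minus=34.0, tau_y2=114.0):
    """Weight change from a pre trace, a fast post trace and a slow post trace (ms)."""
    events = sorted([(t, "pre") for t in pre_spikes] +
                    [(t, "post") for t in post_spikes])
    x_pre = y_post = y_post2 = 0.0
    t_last, dw = 0.0, 0.0
    for t, kind in events:
        gap = t - t_last
        x_pre *= math.exp(-gap / tau_plus)
        y_post *= math.exp(-gap / tau_minus)
        y_post2 *= math.exp(-gap / tau_y2)
        t_last = t
        if kind == "pre":
            dw -= a2_minus * y_post                       # pair term: depression
            x_pre += 1.0
        else:
            # The triplet term uses the slow trace *before* this spike updates it.
            dw += x_pre * (a2_plus + a3_plus * y_post2)
            y_post += 1.0
            y_post2 += 1.0
    return dw
```

Because the slow trace accumulates when postsynaptic spikes follow each other quickly, potentiation grows with pairing frequency, which is the qualitative effect the triplet rule is meant to capture.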
Functional role of STDP (hypotheses)
Formation of memories
Developmental learning – receptive field development
Stabilization of network activity – prevents synaptic weights from blowing up (Abbott et al. 2000)
Reward-based learning – neuromodulation with a reward signal
Formation of memories
Synaptic changes are sensitive to input-output correlations (Hebb, 1949)
STDP explains the induction of LTP and LTD but not their maintenance
Stability-plasticity dilemma
formation of new memories should be possible, but
existing memories also need to be retained
Long-term synaptic changes happen in two phases:
induction (tagging), on the order of seconds, e.g. by pairing protocols
consolidation, which takes more than 60 min
Receptive field development
[Figure: input rates ν_in and output rate ν_out]
Gaussian profile of the rates at the 100 inputs
The center of the Gaussian shifts randomly every 200 sec.
Pfister et al. 2006
The triplet STDP rule is used.
The neuron becomes selective to one position of the Gaussian profile.
Reward-Modulated STDP
Reward-modulated STDP
Synaptic changes depend on a reward signal
Based on the experimentally found influence of neuromodulators like dopamine on LTD and LTP (Izhikevich, 2007)
Dopamine enables or enhances synaptic plasticity
Link between
  - local synaptic changes on the microscopic level (STDP)
  - behaviorally relevant adaptive changes on the macroscopic level that increase the reward signal
Influence of Dopamine on Plasticity
The activity of dopaminergic neurons codes the reward-prediction error (Schultz et al. 2002)
The DA signal is thought to carry the reward in reward-modulated STDP
Model of Reward-modulated STDP
The model is from (Izhikevich, 2007)
Weight changes by STDP are collected in an eligibility trace c(t)
The actual weight changes are triggered by a reward signal d(t):
$\frac{d}{dt} w(t) = c(t)\, d(t)$
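The interplay of eligibility trace and reward can be sketched as a small time-stepped simulation: STDP increments are added to a decaying trace, and the weight only changes while the reward signal is non-zero. The exponential decay of the trace, the time constant `tau_c`, the 1 ms step and the dictionary of precomputed STDP increments are assumptions made for this illustration.

```python
import math

def rm_stdp(stdp_events, reward, dt=1.0, tau_c=1000.0):
    """Integrate dw/dt = c(t) * d(t) with an exponentially decaying trace c.

    stdp_events: dict mapping a time step to the STDP increment occurring there.
    reward:      list with the reward signal d(t) at every time step.
    """
    c, dw = 0.0, 0.0
    decay = math.exp(-dt / tau_c)
    for step, d in enumerate(reward):
        c = c * decay + stdp_events.get(step, 0.0)  # collect STDP in the trace
        dw += c * d * dt                            # reward gates the weight change
    return dw

# A potentiating pairing at t = 0 ms followed by a brief reward 500 ms later.
reward = [0.0] * 1000
reward[500] = 1.0
print(rm_stdp({0: 0.01}, reward))
```

A reward that arrives before the pairing, or no reward at all, leaves the weight unchanged in this sketch, since the trace is either still empty or never read out.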
Theoretical Analysis of Weight Changes
From (Legenstein, Pecevski, Maass, 2008)
We treat presynaptic and postsynaptic spike trains and the reward signal as stochastic processes.
We derive expected weight changes over some time interval T, taken over
  - stochastic realizations of the presynaptic and postsynaptic spike trains
  - stochastic realizations of the reward signal
denoted by the ensemble average $\langle \cdot \rangle_E$
The temporal average of a signal f(t) is taken over the same interval.
Theoretical Analysis of Weight Changes
Learning equation for reward-modulated STDP
It contains two factors: one is the average of the reward after a pre-postsynaptic spike pair, and the other describes the correlations of the spike timings between neurons j and i.
Weight changes are driven by co-occurrences of rewards and spike pairings within the time scale of the eligibility kernel function.
Biofeedback Experiment
Biofeedback Experiment by Fetz and Baker
[Fetz and Baker, 1973]
The spiking activity of a single neuron in monkey motor cortex was recorded.
The current firing rate was made visible to the monkey in the form of an illuminated meter.
The monkey received liquid rewards for high firing rates.
The monkey learnt (within tens of minutes) to change the firing rate.
Model of the experiment
As a model we consider a recurrent neural circuit in which we reward the spiking activity of one neuron k.
A reward pulse of fixed shape is delivered to all synapses with a delay d_r every time the reinforced neuron produces an action potential.
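The reward signal described here can be sketched as a sum of delayed pulses, one per spike of the reinforced neuron. The alpha-shaped `reward_kernel`, its time constant and the 200 ms delay are hypothetical choices for illustration; the pulse shape used in the original model is not recoverable from the slides.

```python
import math

def reward_kernel(r, tau=200.0):
    """Hypothetical alpha-shaped reward pulse, zero before onset (times in ms)."""
    if r < 0:
        return 0.0
    return (r / tau) * math.exp(1.0 - r / tau)   # peaks with value 1 at r = tau

def reward_signal(t, spikes_k, delay=200.0, kernel=reward_kernel):
    """d(t): one delayed reward pulse for every spike of the reinforced neuron k."""
    return sum(kernel(t - delay - tf) for tf in spikes_k)

# Reward over time caused by a single spike of neuron k at t = 0 ms.
print([round(reward_signal(t, [0.0]), 3) for t in (100.0, 300.0, 400.0, 800.0)])
```

Since the same d(t) reaches every synapse, only synapses whose eligibility trace overlaps with these pulses change their weight.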
Theoretical predictions
A linear Poisson neuron model is used in the analysis.
Expected weight change for the reinforced neuron: weights change according to STDP with a constant learning rate.
Expected weight change for the other neurons: weights change according to STDP with a learning rate proportional to the correlation with the reinforced neuron.
Simulation of the Biofeedback Experiment
We simulated a recurrent circuit with 4000 LIF neurons.
Spontaneous activity of 4.6 Hz was induced by injection of Ornstein-Uhlenbeck noise.
The circuit had 228954 conductance-based synapses with short-term plasticity.
Reward-modulated STDP was applied to all 142813 excitatory-to-excitatory synapses.
The 80 synapses of the reinforced neuron self-organize to increase the firing rate of the neuron.
Simulation Results
The firing rate of the reinforced neuron increases from 4 to 11 Hz.
The average firing rate of 20 other neurons remains unchanged.
[Figure: firing rates of the reinforced neuron and of the other neurons]
Pattern Discrimination with Reward-Modulated STDP
Model for Theoretical Analysis
Two patterns, P and N, with one spike per input channel are presented to a neuron.
The reward signal is positive for pattern P and negative for pattern N.
Theoretical Predictions
A linear Poisson neuron model is used in the analysis.
We estimate the expected weight change of synapse i for the presentation
of pattern P followed after time T’ by a presentation of pattern N
Result of the analysis
The variance of the membrane potential of the neuron is increased
for the positive pattern, and decreased for the negative pattern.
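To make the quantity in this result concrete, the sketch below computes the variance of a threshold-free membrane potential for a pattern with one spike per input channel. The membrane potential is modeled as a weighted sum of exponentially decaying EPSPs; the kernel shape, the time constant and the sampling step are assumptions, not the neuron model used in the analysis.

```python
import math
import statistics

def membrane_potential(weights, spike_times, t, tau=20.0):
    """Vm(t) without threshold: weighted sum of exponentially decaying EPSPs (ms)."""
    return sum((w * math.exp(-(t - s) / tau)
                for w, s in zip(weights, spike_times) if s <= t), 0.0)

def vm_variance(weights, spike_times, duration=500.0, step=1.0):
    """Variance of Vm over one pattern presentation, sampled every `step` ms."""
    n = int(duration / step) + 1
    trace = [membrane_potential(weights, spike_times, k * step) for k in range(n)]
    return statistics.pvariance(trace)

# One spike per channel; learning that scales up the weights raises Var[Vm].
pattern = [50.0, 120.0, 300.0, 410.0]
print(vm_variance([1.0, 0.5, 2.0, 1.0], pattern))
print(vm_variance([2.0, 1.0, 4.0, 2.0], pattern))
```

Larger fluctuations of Vm make threshold crossings more likely once a firing threshold is added, which is why an increased variance for one pattern goes along with more spikes for that pattern.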
Simulations of the Model
Experimental setup:
LIF neuron with 100 afferents
Patterns of 500 ms duration
Randomly drawn spike times for the patterns
[Figure, without threshold: Vm(t) before learning and Vm(t) after learning]
Var[Vm] and the number of spikes for P increase.
Var[Vm] and the number of spikes for N decrease.
[Figure, with threshold]
Training a Readout Neuron to Recognize Isolated Spoken Digits
20 different utterances of the digits “one” and “two”
Raw waveforms were transformed by a model of the cochlear hair cells [Verstraeten et al., 2005].
The analog output signals were encoded as spikes with the BSA algorithm [Schrauwen and Van Campenhout, 2003].
Cortical microcircuit model of 560 LIF neurons with noise.
Spiking readout (LIF) neuron connected to all excitatory neurons in the circuit.
Strong decrease of the number of spikes for digit “one”
Slight increase of the number of spikes for digit “two”
Increase of the variance of Vm(t) for digit “two” utterances
Decrease of the variance of Vm(t) for digit “one” utterances
[Figure panels: before learning, after learning]
