Spike-Timing-Dependent Plasticity and its Reward-Modulated Variant
Dejan Pecevski, dejan@igi.tugraz.at
Institute for Theoretical Computer Science, Graz University of Technology
Neural Networks B, April, 2009

Agenda
• Spike-Timing-Dependent Plasticity
    • Experimental evidence
    • Model of STDP
    • Analysis of STDP: learning equation
    • Extensions of the model to fit additional experimental data
    • Functional role of STDP (hypotheses)
• Reward-modulated STDP
    • Model of RM-STDP and the learning equation
    • Modeling a biofeedback experiment
    • Discrimination of temporal spike patterns with RM-STDP

Hebbian learning
• Synaptic changes are thought to be the neurochemical basis for learning and memory.
• Hebb’s postulate: “When an axon of cell A repeatedly or persistently takes part in firing cell B, then A’s efficiency as one of the cells firing B is increased” (Hebb, 1949)
• Local rule
• Driven by the correlations of the firing between cells
• Cells that are correlated form cell assemblies

Spike-timing-dependent plasticity: Experiment
• Two connected neurons are stimulated to fire at specific times t_pre (presynaptic neuron) and t_post (postsynaptic neuron).
• Such paired stimulations at t_pre and t_post are performed repeatedly, always with the same interval Δt = t_post - t_pre between the corresponding spike pairs.
• After the repeated stimulation, the change of the EPSP amplitude is observed.
• The same procedure is repeated for different values of Δt, to examine how the change of the EPSP depends on the magnitude and the sign of Δt.
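The pairing protocol can be mimicked numerically. The following is a minimal sketch, not the experimental analysis itself: it assumes an exponential learning window W(Δt) with made-up amplitudes and time constants (`A_PLUS`, `A_MINUS`, `TAU_PLUS`, `TAU_MINUS` and all function names are illustrative, not from the slides), ignores interactions between successive pairs, and simply accumulates the change for a fixed Δt = t_post - t_pre.

```python
import math

# Illustrative parameters (assumptions, not values from the slides).
A_PLUS = 0.01     # potentiation amplitude
A_MINUS = 0.012   # depression amplitude
TAU_PLUS = 20.0   # potentiation time constant in ms
TAU_MINUS = 20.0  # depression time constant in ms


def stdp_window(dt_ms):
    """Assumed exponential learning window W(dt), with dt = t_post - t_pre."""
    if dt_ms > 0:
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)    # pre before post: LTP
    if dt_ms < 0:
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)  # post before pre: LTD
    return 0.0


def pairing_protocol(dt_ms, n_pairs=60, w0=1.0):
    """Apply n_pairs spike pairs at a fixed interval; return the relative weight change."""
    w = w0
    for _ in range(n_pairs):
        # Pairs are assumed to be far apart in time, so they do not interact.
        w += stdp_window(dt_ms)
    return (w - w0) / w0


def sweep(dts_ms):
    """Repeat the protocol for several intervals, as in the experiment."""
    return {dt: pairing_protocol(dt) for dt in dts_ms}


if __name__ == "__main__":
    for dt, change in sweep([-40, -20, -10, 10, 20, 40]).items():
        print(f"dt = {dt:+4d} ms -> relative change {change:+.3f}")
```

Under these assumptions the sign of the change follows the sign of Δt, and its magnitude shrinks as |Δt| grows, which is the qualitative dependence the experiment is designed to reveal.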
Spike-timing-dependent plasticity: Results
• If the presynaptic neuron fires within a time window on the order of tens of milliseconds before the postsynaptic neuron, then the EPSP increases => the synapse is strengthened (LTP).
• If the presynaptic neuron fires within a time window on the order of tens of milliseconds after the postsynaptic neuron, then the EPSP decreases => the synapse is weakened (LTD).

Phenomenological model of STDP
• We denote by S_i(t) the spike train of neuron i: $S_i(t) = \sum_f \delta(t - t_i^{(f)})$, where the $t_i^{(f)}$ are the firing times of neuron i.
• The STDP model is defined through a learning window function W(Δt).
• Each pair of a presynaptic and a postsynaptic spike contributes an instantaneous weight change that happens at the time of the later spike.

Model of STDP
• The weight change can be expressed in terms of the spike trains S_j(t) and S_i(t) and the learning window W.
• Substituting the spike trains, we can calculate the total weight change in a time interval [0,T], $\Delta w_{ij} = \sum_f \sum_g W(t_i^{(g)} - t_j^{(f)})$, where we sum over all spikes of the presynaptic neuron j and the postsynaptic neuron i that are within the interval [0,T].

STDP model by local variables
[Figure: presynaptic neuron j with spike times t_j^(f) and trace x_j^pre, updated with every presynaptic spike; synapse w_ij; postsynaptic neuron i with spike times t_i^(g) and trace y_i^post, updated with every postsynaptic spike]
• At a postsynaptic spike the weight change is proportional to the presynaptic trace x_j^pre, and at a presynaptic spike it is proportional to the postsynaptic trace y_i^post.

A biophysical model of STDP
• Arrival of an action potential at the presynaptic terminal induces release of the neurotransmitter glutamate into the synaptic cleft.
• Glutamate binds to AMPA and NMDA receptors.
• AMPA receptors open, but NMDA receptors are blocked by Mg2+.
• Depolarization of the postsynaptic membrane removes the Mg2+ block, allowing influx of Ca2+.
• The depolarization is caused by a back-propagating action potential that travels up the dendrites.
• The change of Ca2+ concentration in the postsynaptic cell triggers processes that change the synaptic efficacy.

Different STDP windows for different types of cells
[Figure: STDP windows for different types of cells]

Analysis of STDP
• We assume that the presynaptic and postsynaptic spike trains are stochastic processes drawn from a stochastic ensemble E.
• We compute the expected weight change, averaged temporally over some time interval [0,T].
• The analysis uses the instantaneous firing rate of neuron i and the correlations of firing of neurons i and j.
• The weight changes depend on the correlations between the inputs and the output of a neuron.
• The output of the neuron depends on the input.
• For simple neuron models (e.g. the linear Poisson neuron model), the weight changes can be derived analytically given the statistics of the inputs.
• If the input is weakly correlated with the output, the weight decreases.
• Groups of inputs that are strongly correlated (and drive the neuron) are strengthened.
• STDP selects inputs that are correlated on the timescale of the learning window and the postsynaptic potential.

Weight dependence of STDP
• From (Bi and Poo, 1998):
    • The amount of change of the EPSC amplitude depends on the initial EPSC amplitude.
    • The dependency is different for positive and negative spike pairing.
• Model from Gütig et al. 2001: smooth transition between an additive and a weight-dependent STDP rule

Dependence on pairing frequency
• 15 bursts of 5 spikes at different frequencies are induced in the neurons with Δt = ±10 ms (Sjöström et al. 2001).
[Figure: pre-before-post spike pairing at Δt = +10 ms; post-before-pre pairing at Δt = -10 ms]

Triplet rule (Pfister et al. 2006)
[Figure: presynaptic neuron j with spike times t_j^pre, synapse w_ij, postsynaptic neuron i with spike times t_i^post, the postsynaptic trace y^post and a second postsynaptic trace y_2^post]
• Reproduces experimental data for the dependence of STDP on pairing frequency (Sjöström et al. 2001)
• Reproduces other experimental data involving protocols with triplets of spikes (Wang et al. 2005)

Functional role of STDP (hypotheses)
• Formation of memories
• Developmental learning: receptive field development
• Stabilization of network activity: prevents the synaptic weights from blowing up (Abbott et al. 2000)
• Reward-based learning: neuromodulation with a reward signal (dopamine)

Formation of memories
• Synaptic changes are sensitive to input-output correlations (Hebb, 1949).
• STDP explains the induction of LTP and LTD but not their maintenance.
• Stability-plasticity dilemma: formation of new memories should be possible, but memories need to be retained and stable.
• Long-term synaptic changes happen in two phases:
    • induction (tagging), on the order of seconds, e.g. by pairing protocols
    • consolidation, which takes more than 60 min

Receptive field development (Pfister et al. 2006)
[Figure: input rates ν_in and output rate ν_out]
• Gaussian profile of the rates at the 100 inputs
• The center of the Gaussian shifts randomly every 200 s.
• The triplet STDP rule is used in the synapses.
• The neuron becomes selective to one position of the Gaussian profile.

Reward-modulated STDP
• Synaptic changes are dependent on a reward signal.
• Based on the experimentally found influence of neuromodulators like dopamine on LTD and LTP (Izhikevich, 2007)
• Dopamine enables or enhances synaptic plasticity.
• Link between:
    • local synaptic changes on the microscopic level (STDP)
    • behaviorally relevant adaptive changes on the macroscopic level that increase the reward signal

Influence of Dopamine on Plasticity
• The activity of dopaminergic neurons codes the reward-prediction error (Schultz et al. 2002).
• The DA signal is thought to carry the reward in reward-modulated STDP.

Model of Reward-modulated STDP
• The model is from (Izhikevich, 2007).
• Weight changes by STDP are collected in an eligibility trace.
• The actual weight changes are triggered by a reward signal d(t).

Theoretical Analysis of Weight Changes
• From (Legenstein, Pecevski, Maass, 2008)
• We treat the presynaptic and postsynaptic spike trains and the reward signal as stochastic processes.
• We derive expected weight changes over some time interval T, taken over:
    • stochastic realizations of the presynaptic and postsynaptic spike trains
    • stochastic realizations of the reward signal
• The expectation over realizations is denoted by the ensemble average 〈.〉E; a temporal average of a signal f(t) is used as well.
• The learning equation for reward-modulated STDP combines two quantities: the average of the reward after a pre-postsynaptic spike pair, and the correlations of the spike timings between neurons j and i.
• Weight changes are driven by co-occurrences of rewards and spike pairings within the time scale of the eligibility kernel function.

Biofeedback Experiment by Fetz and Baker [Fetz and Baker, 1973]
• The spiking activity of a single neuron in monkey motor cortex was recorded.
• The current firing rate was made visible to the monkey in the form of an illuminated meter.
• The monkey received liquid rewards for high firing rates.
• The monkey learned (within tens of minutes) to change the firing rate accordingly.

Model of the experiment
• As a model we consider a recurrent neural circuit, and we reward the spiking activity of one neuron k.
• A reward pulse of fixed shape is delivered to all synapses with a delay d_r every time the reinforced neuron produces an action potential.

Theoretical predictions
• A linear Poisson neuron model is used in the analysis.
• Expected weight change for the reinforced neuron: weights change according to STDP with a constant learning rate.
• Expected weight change for the other neurons: weights change according to STDP with a learning rate proportional to the correlation with the reinforced neuron.

Simulation of the Biofeedback Experiment
• We simulated a recurrent circuit with 4000 LIF neurons.
• Spontaneous activity of 4.6 Hz was induced by injection of Ornstein-Uhlenbeck noise.
• The circuit had 228954 conductance-based synapses with short-term dynamics.
• Reward-modulated STDP was applied to all 142813 excitatory-to-excitatory synapses.
• The 80 synapses of the reinforced neuron self-organize to increase the firing rate of the neuron.
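The interplay of eligibility trace and delayed reward in this setup can be sketched for a single synapse onto the reinforced neuron. This is a toy illustration under stated assumptions, not the simulation reported in the slides: the exponential STDP window, the exponentially decaying reward pulse, the all-to-all spike pairing, and every parameter value and name (`tau_c`, `tau_r`, `reward_delay_ms`, `simulate_rm_stdp`, ...) are hypothetical choices. STDP contributions are collected in an eligibility trace c(t), each spike of the reinforced (postsynaptic) neuron triggers a reward pulse d(t) after a delay, and the weight moves in proportion to c(t)·d(t).

```python
import math


def stdp_window(dt_ms, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Assumed exponential STDP window, with dt = t_post - t_pre in ms."""
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau)
    return 0.0


def simulate_rm_stdp(pre_times, post_times, reward_delay_ms=200,
                     reward_amplitude=1.0, t_end_ms=2000,
                     tau_c=1000.0, tau_r=50.0, lr=1e-3):
    """Integrate c(t), d(t) and the weight change in 1 ms steps.

    pre_times / post_times are integer spike times in ms; the postsynaptic
    neuron plays the role of the reinforced neuron.
    """
    pre, post = set(pre_times), set(post_times)
    reward_onsets = {t + reward_delay_ms for t in post_times}
    decay_c = math.exp(-1.0 / tau_c)  # per-step decay of the eligibility trace
    decay_r = math.exp(-1.0 / tau_r)  # per-step decay of the reward pulse
    c = d = dw = 0.0
    seen_pre, seen_post = [], []
    for t in range(t_end_ms):
        c *= decay_c
        d *= decay_r
        if t in pre:
            # Pair this presynaptic spike with all earlier postsynaptic spikes.
            c += sum(stdp_window(tp - t) for tp in seen_post)
            seen_pre.append(t)
        if t in post:
            # Pair this postsynaptic spike with all earlier presynaptic spikes.
            c += sum(stdp_window(t - tp) for tp in seen_pre)
            seen_post.append(t)
        if t in reward_onsets:
            d += reward_amplitude  # delayed reward for a reinforced-neuron spike
        dw += lr * c * d           # weight changes only while reward is present
    return dw


if __name__ == "__main__":
    base = list(range(100, 1100, 200))
    causal = simulate_rm_stdp(base, [t + 10 for t in base])
    acausal = simulate_rm_stdp([t + 10 for t in base], base)
    print(f"pre-before-post: {causal:+.4f}, post-before-pre: {acausal:+.4f}")
```

In this sketch a synapse whose presynaptic spikes tend to precede the reinforced neuron's spikes accumulates a positive trace and is strengthened when the reward arrives, while without reward the weight does not move at all.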
Simulation Results
• The firing rate of the reinforced neuron increases from 4 to 11 Hz.
• The average firing rate of 20 other neurons remains unchanged.
[Figure: firing rates of the reinforced neuron and of the other neurons]

Pattern Discrimination with Reward-Modulated STDP

Model for Theoretical Analysis
• Two patterns with one spike per input channel are presented to a neuron.
• The reward signal depends on the presented pattern, with one setting for pattern P and another for pattern N.

Theoretical Predictions
• A linear Poisson neuron model is used in the analysis.
• We estimate the expected weight change of synapse i for the presentation of pattern P followed after time T’ by a presentation of pattern N.
• Result of the analysis: the variance of the membrane potential of the neuron is increased for the positive pattern and decreased for the negative pattern.

Simulations of the Model
Experimental setup:
• LIF neuron with 100 afferents
• Patterns of 500 ms duration
• Randomly drawn spike times for the patterns
Results:
• Var[Vm] and the number of spikes for P increase.
• Var[Vm] and the number of spikes for N decrease.
[Figure: Vm(t) before learning and after learning, without threshold and with threshold]

Training a Readout Neuron to Recognize Isolated Spoken Digits
• 20 different utterances of the digits “one” and “two”
• Raw waveforms were transformed by a model of the cochlear hair cells [Verstraeten et al., 2005].
• The output analog signals were encoded as spikes with the BSA algorithm [Schrauwen and Van Campenhout, 2003].
• Cortical microcircuit model of 560 LIF neurons with noise
• Spiking readout (LIF) neuron connected to all excitatory neurons in the circuit
[Figure: utterances of “one” and “two” encoded as spikes with BSA]
Results
• Strong decrease of the number of spikes for digit “one”
• Slight increase of the number of spikes for digit “two”
• Increase of the variance of Vm(t) for digit “two” utterances
• Decrease of the variance of Vm(t) for digit “one” utterances
[Figure: before learning and after learning]