Transcription
Hierarchical models of the visual cortex
Thomas Serre
Brown University
Department of Cognitive, Linguistic & Psychological Sciences
Brown Institute for Brain Sciences
Center for Vision Research
Classifier-based importance maps
patient #004, ERP signals
Arslan Singer Madsen Kreiman & Serre (unpublished)
Rapid presentation paradigms
• Subjects get the gist of a scene from ultra-rapid image presentations
  - No time for eye movements
  - No top-down influences / expectations
• Coarse initial base representation
  - Enables rapid object categorization
  - Does not require attention
  - Sensitive to background clutter
  - Insufficient for object localization
Potter 1971; Biederman 1972; Thorpe et al 1996; Li et al 2002; Evans & Treisman 2005; Serre et al 2007; see Fabre-Thorpe 2011 for review
Rapid categorization: Behavior
Animal present or not?
[Figure: image → blank interval → 1/f noise mask, ~50 ms image–mask SOA; accuracy (%, 50–100) for human subjects and two monkeys (Dy, Ry), familiar vs. novel images; Mon: 0.49, Hum: 0.47]
Cauchoix Crouzet Fize & Serre (unpublished)
[Figure: (A) human accuracy (%), familiar vs. new images; (B) accuracy for monkeys M1 and M2, familiar vs. new; (C) monkey vs. human per-image "animalness" scores; corrected correlations for Hum/Hum, Hum/Dy, Hum/Ry, and Dy/Ry pairs]
Cauchoix Crouzet Fize & Serre (unpublished)
Setup
Ventral visual stream
Button release and touch screen on targets
Cauchoix Crouzet Fize & Serre (unpublished)
Image source: DiCarlo
Decoding
• Robust single-trial decoding of category information from fast ventral stream neural activity, ~70–80 ms on fastest trials
• Neural activity linked to behavioral responses (both accuracy and reaction times)
Cauchoix Crouzet Fize & Serre (unpublished)
Ventral visual stream
Image source: DiCarlo
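The analysis named on this slide — reading out category information trial by trial from population activity in sliding time windows — can be illustrated in a few lines. This is a minimal sketch, not the authors' pipeline: the array layout, window size, and the choice of a linear SVM via scikit-learn are all illustrative assumptions.

```python
# Minimal sketch of sliding-window, single-trial category decoding.
# Assumes X has shape (n_trials, n_channels, n_timepoints) and y holds
# binary labels (e.g., animal vs. non-animal). Parameters are illustrative.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sliding_window_decoding(X, y, win=10, step=5, cv=5):
    """Cross-validated decoding accuracy in each sliding time window."""
    n_trials, n_channels, n_time = X.shape
    accs = []
    for t0 in range(0, n_time - win + 1, step):
        feats = X[:, :, t0:t0 + win].mean(axis=2)   # one feature per channel
        clf = make_pipeline(StandardScaler(), LinearSVC())
        accs.append(cross_val_score(clf, feats, y, cv=cv).mean())
    return np.array(accs)

# Example with synthetic data: 200 trials, 32 channels, 100 time bins.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 32, 100))
X[y == 1, :, 40:60] += 0.5          # inject a category signal mid-trial
print(sliding_window_decoding(X, y).round(2))  # accuracy rises above 0.5
```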
Feedforward hierarchical model of object recognition
Riesenhuber & Poggio 1999; Serre Kouh Cadieu Knoblich Kreiman Poggio ’05 ’07; Serre Oliva & Poggio ’07
• System-level feedforward computational model, large-scale (100M units), spans several areas of the visual cortex
• Some similarities with state-of-the-art computer vision systems (e.g., convolutional and deep belief nets; see also Fukushima’s neocognitron)
• But constrained by anatomy and physiology and shown to be consistent with experimental data across areas of visual cortex
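A minimal sketch of the model's two alternating operations may help: an S stage (template matching, here Gabor filtering as in V1 simple cells) followed by a C stage (invariance, a local MAX over position and scale). The filter parameters and pooling grid below are illustrative assumptions, not those of the full 10^8-unit model.

```python
# Minimal sketch of the two alternating HMAX operations:
# S1 (Gabor filtering, V1 simple cells) and C1 (local MAX pooling over
# position and scale, complex cells). Sizes and grids are illustrative.
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import convolve2d

def gabor(size, wavelength, theta, sigma, gamma=0.3):
    """Return a 2-D Gabor filter (model of a V1 simple-cell RF)."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return g * np.cos(2 * np.pi * xr / wavelength)

def s1_c1(image, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4),
          sizes=(7, 9), pool=8):
    """One S1 (filtering) + C1 (MAX pooling) stage per orientation."""
    c1_maps = []
    for theta in thetas:
        # S1: rectified Gabor responses at two neighboring scales.
        s1 = [np.abs(convolve2d(image, gabor(s, s * 0.8, theta, s * 0.4),
                                mode='same')) for s in sizes]
        # C1: MAX over the two scales, then MAX over local position.
        c1 = maximum_filter(np.maximum(*s1), size=pool)[::pool, ::pool]
        c1_maps.append(c1)
    return np.stack(c1_maps)

# Example: C1 responses to a random image.
img = np.random.default_rng(0).normal(size=(64, 64))
print(s1_c1(img).shape)  # (4 orientations, 8, 8)
```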
Feedforward hierarchical model of object recognition
• An initial attempt to reverse-engineer the ventral stream of the visual cortex
• Large-scale (10^8 units), spans several areas of the visual cortex
• Some similarities with state-of-the-art computer vision systems based on hierarchies of reusable parts (Geman, Bienstock, Yuille, Zhu, etc) as well as convolutional and deep belief networks (LeCun, Hinton, Bengio, Ng, etc)

Area        Type of data                                        Ref. biol. data   Ref. model data
Psych.      Rapid animal categorization                         (1)               (1)
Psych.      Face inversion effect                               (2)               (2)
LOC (fMRI)  Face processing                                     (3)               (3)
PFC         Differential role of IT and PFC in categorization   (4)               (5)
IT          Tuning and invariance properties                    (6)               (5)
IT          Read out for object category                        (7)               (8,9)
IT          Average effect in IT                                (10)              (10)
V4          MAX operation                                       (11)              (5)
V4          Tuning for two-bar stimuli                          (12)              (8,9)
V4          Two-spot interaction                                (13)              (8)
V4          Tuning for boundary conformation                    (14)              (8,15)
V4          Tuning for Cartesian and non-Cartesian gratings     (16)              (8)
V1          Simple and complex cells tuning properties          (17–19)           (8)
V1          MAX operation in subset of complex cells            (20)              (5)
1. Serre, T., Oliva, A., and Poggio, T. Proc. Natl. Acad. Sci. 104, 6424 (Apr. 2007).
2. Riesenhuber, M. et al. Proc. Biol. Sci. 271, S448 (2004).
3. Jiang, X. et al. Neuron 50, 159 (2006).
4. Freedman, D.J., Riesenhuber, M., Poggio, T., and Miller, E.K. Journ. Neurosci. 23, 5235 (2003).
5. Riesenhuber, M. and Poggio, T. Nature Neuroscience 2, 1019 (1999).
6. Logothetis, N.K., Pauls, J., and Poggio, T. Curr. Biol. 5, 552 (May 1995).
7. Hung, C.P., Kreiman, G., Poggio, T., and DiCarlo, J.J. Science 310, 863 (Nov. 2005).
8. Serre, T. et al. MIT AI Memo 2005-036 / CBCL Memo 259 (2005).
9. Serre, T. et al. Prog. Brain Res. 165, 33 (2007).
10. Zoccolan, D., Kouh, M., Poggio, T., and DiCarlo, J.J. Journ. Neurosci. 27, 12292 (2007).
11. Gawne, T.J. and Martin, J.M. Journ. Neurophysiol. 88, 1128 (2002).
12. Reynolds, J.H., Chelazzi, L., and Desimone, R. Journ. Neurosci. 19, 1736 (Mar. 1999).
13. Taylor, K., Mandon, S., Freiwald, W.A., and Kreiter, A.K. Cereb. Cortex 15, 1424 (2005).
14. Pasupathy, A. and Connor, C. Journ. Neurophysiol. 82, 2490 (1999).
15. Cadieu, C. et al. Journ. Neurophysiol. 98, 1733 (2007).
16. Gallant, J.L. et al. Journ. Neurophysiol. 76, 2718 (1996).
17. Schiller, P.H., Finlay, B.L., and Volman, S.F. Journ. Neurophysiol. 39, 1288 (1976).
18. Hubel, D.H. and Wiesel, T.N. Journ. Physiol. 160, 106 (1962).
19. De Valois, R.L., Albrecht, D.G., and Thorell, L.G. Vision Res. 22, 545 (1982).
20. Lampl, I., Ferster, D., Poggio, T., and Riesenhuber, M. Journ. Neurophysiol. 92, 2704 (2004).
Beyond spatial orientation and spatial frequency
[Figure: Ts’o et al. 2001, Fig. 10 — subcompartments for color, orientation and disparity within stripes of V2, a finer-grain level of V2 organization than the view of V2 as a collection of CO stripes. (A) Color-preferring and luminance-preferring subcompartments within a single thin stripe. (B) Orientation domains within pale and thick stripes, separated by thin stripes with little apparent orientation organization. (C) Patches of tuned excitatory disparity cells within thick stripes and patches of color cells within thin stripes. Subcompartments are 0.7–1.5 mm in size regardless of functional type, vs. ~0.2 mm for color blobs and iso-orientation domains in V1]
Issa et al 2000; Ts’o et al 2001; Shmuel & Grinvald 1996
Color processing
Conway ’01
Spatio-chromatic opponent operator
[Figure: schematic of the spatio-chromatic opponent representation — R/G/B color channels feed single-opponent (SO) units through half-squaring / half-wave rectification and divisive normalization, followed by double-opponent (DO) units]
Single vs. double-opponent
[Figure: SO vs. DO responses for the R/G, G/R, B/Y, and Wh/Bl opponent channels]
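The stages named on this slide (opponent color channels → half-squaring rectification → divisive normalization → double-opponency) can be sketched directly. This is a minimal illustration under assumed parameters, not the published implementation: the filter scales, normalization constant, and use of a simple x-gradient for the DO stage are illustrative choices.

```python
# Minimal sketch of single-opponent (SO) / double-opponent (DO) color
# operators: an R-G opponent channel is filtered, half-squared, and
# divisively normalized; a DO unit then responds to spatial transitions
# in color rather than to uniform color. Parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def so_do(rgb, sigma_center=1.0, sigma_surround=3.0, k=0.01):
    r, g = rgb[..., 0], rgb[..., 1]
    # Single-opponent: center-surround operator on the R-G channel.
    so = gaussian_filter(r, sigma_center) - gaussian_filter(g, sigma_surround)
    # Half-squaring rectification into R+G- and G+R- channels.
    so_pos, so_neg = np.maximum(so, 0)**2, np.maximum(-so, 0)**2
    # Divisive normalization across the rectified channels.
    norm = k + so_pos + so_neg
    so_pos, so_neg = so_pos / norm, so_neg / norm
    # Double-opponent: spatial derivative of the SO response, so the
    # unit fires at red/green boundaries, not within uniform regions.
    do = np.abs(np.gradient(so_pos - so_neg, axis=1))
    return so_pos, so_neg, do

# Example: a vertical red/green edge drives DO; uniform color does not.
img = np.zeros((32, 32, 3)); img[:, :16, 0] = 1.0; img[:, 16:, 1] = 1.0
so_p, so_n, do = so_do(img)
print(do[:, 14:18].mean() > do[:, :4].mean())  # True: edge response
```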
Comparison with glob cells in V4/PIT
[Figure: polar plots of hue tuning for glob cells (Conway & Tsao ’09) vs. model units, panels A–C; tuning curves for red, yellow, green, cyan, and blue hues]
Zhang & Serre in prep
Color processing
[Figure: model vs. Munsell data; SO: R² = 0.9952; CIELAB R² value cut off]
Zhang & Serre in prep
Color processing
• SO/DO approach improves on all recognition and segmentation datasets tested, as compared to existing color representations
• Color datasets
• Pascal challenge
Zhang Barhomi & Serre ’12

Table 2. Recognition performance on the soccer team and 17-category flower datasets. Entries are classification accuracy (%); values in parentheses are the initial performance reported by [10, 31] using the same features in a bag-of-words scheme.

Method           Soccer team                 Flower
                 Color    Shape    Both      Color    Shape    Both
Hue/sift         69 (67)  43 (43)  73 (73)   58 (40)  65 (65)  77 (79)
Opp/sift         69 (65)  43 (43)  74 (72)   57 (39)  65 (65)  74 (79)
SOsift/DOsift    82       66       83        68       69       79
SOHmax/DOHmax    87       76       89        77       73       83

Table 3. Recognition performance on the Pascal VOC 2007 dataset. Performance corresponds to the mean average precision (AP) over all 20 classes; values in parentheses are the best performance reported in [37, 6] for approaches that do not rely on any prior knowledge about the object categories.

Method   sift       Huesift    Opponentsift  Csift             SODOsift          SODOHmax
AP       40 (38.4)  41 (42.5)  43 (44.0)     46.5 (33.3/39.8)  46.8 (30.1/36.4)  [value cut off]

It was shown, however, that the performance of various color descriptors could be further improved on this dataset (up to 96%) when used in conjunction with semantic color features (i.e., Color Names) and bottom-up and top-down attentional mechanisms [32]. Whether such an approach would similarly boost the performance of the SO and DO descriptors should be further studied.

[Table 4, recognition performance on scene categorization: values not recoverable]

Fig. 4. Filters and their components used in the spatio-chromatic opponent operator. (A) Gradient in the y direction used in sift computation [14]. (B) Gabor filters used in HMAX [15]. (C) Gaussian derivatives used in segmentation [19]. From left to right: the original filter and the individual center and surround components used to process the input color channels. Additional filter orientations, scales and phases are also used in Hmax and sift.
Disparity processing
Extends the energy model of stereo disparity (Ohzawa et al ’90, Qian ’94, Fleet et al ’96)
Riesen & Serre (unpublished data)
See Sasaki et al ‘10 for qualitatively similar results
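For reference, the energy model being extended works as follows: left- and right-eye inputs are filtered by a quadrature pair of Gabors whose phase difference encodes a preferred disparity, and the "complex cell" energy is the sum of squared binocular simple-cell responses. The sketch below is a minimal 1-D illustration with assumed parameters, not the model developed in this work.

```python
# Minimal sketch of the binocular energy model (Ohzawa et al. '90):
# disparity is encoded as a phase offset between left- and right-eye
# Gabor receptive fields. Parameters are illustrative.
import numpy as np

def gabor_1d(x, sigma=2.0, freq=0.25, phase=0.0):
    """1-D Gabor: receptive-field profile of a model simple cell."""
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x + phase)

def disparity_energy(left, right, x, dphase=0.0):
    """Complex-cell energy: squared quadrature-pair binocular responses."""
    energy = 0.0
    for phase in (0.0, np.pi / 2):              # quadrature pair
        s_l = gabor_1d(x, phase=phase) @ left
        s_r = gabor_1d(x, phase=phase + dphase) @ right
        energy += (s_l + s_r) ** 2              # binocular simple cell
    return energy

# Example: averaged over random stimuli, a zero-phase-disparity cell
# responds more to matched left/right inputs than to a 2-sample shift.
x = np.arange(-16.0, 17.0)
rng = np.random.default_rng(0)
stims = rng.normal(size=(500, x.size))
e_same = np.mean([disparity_energy(s, s, x) for s in stims])
e_shift = np.mean([disparity_energy(s, np.roll(s, 2), x) for s in stims])
print(e_same > e_shift)  # True
```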
...
Motion processing
...
[Figure: DeAngelis, Ohzawa & Freeman, receptive-field dynamics (Fig. 3) — spatiotemporal RF profiles (X-T plots) for neurons recorded from the LGN and striate cortex of the cat; the horizontal axis represents space (X) and the vertical axis time (T), with solid contours delimiting bright-excitatory and dashed contours dark-excitatory regions. (A) ON-center non-lagged LGN X-cell: for T < 50 ms the RF has a bright-excitatory center and dark-excitatory surround; for T > 50 ms the polarity reverses. (B) ON-center lagged X-cell: the second temporal phase of the profile is strongest. (C, D) Simple cells with space-time separable X-T profiles. (E, F) Simple cells with clearly inseparable X-T profiles, in which the spatial arrangement of bright- and dark-excitatory subregions changes over time. Separable vs. non-separable space-time RFs]
[Figure 13.1: neural model for the processing of dynamic face stimuli — form and motion features are extracted in two separate pathways (ventral "shape" pathway: V1/V2 → V4/IT → STS; dorsal "motion" pathway: V1/MT → MT/MST → STS). The addition of asymmetric recurrent connections at the top levels makes the units selective for temporal order; the highest level consists of neurons that fuse form and motion information]
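The separable/inseparable distinction in these X-T plots is what makes a filter direction selective, which a short sketch can make concrete. This is a minimal illustration in the spirit of the Adelson & Bergen '85 motion energy model, not any specific implementation from the talk; the grid and parameters are illustrative assumptions.

```python
# Minimal sketch of separable vs. non-separable space-time receptive
# fields: a separable RF is a product of a spatial and a temporal
# profile; an inseparable RF is tilted in X-T and therefore prefers
# one direction of motion. Parameters are illustrative.
import numpy as np

x = np.linspace(-3, 3, 64)          # space
t = np.linspace(0, 1, 32)           # time
X, T = np.meshgrid(x, t)            # T-by-X grid (an X-T plot)
env = np.exp(-X**2) * np.exp(-((T - 0.5)**2) / 0.05)
v = 2.0                             # preferred speed

# Separable RF: spatial Gabor times a temporal envelope.
separable = np.cos(2 * np.pi * X) * env
# Inseparable RF: the spatial phase drifts with time (tilted in X-T).
inseparable = np.cos(2 * np.pi * (X - v * T)) * env

def response(rf, direction):
    """Energy-style squared response to a drifting grating."""
    stim = np.cos(2 * np.pi * (X - direction * v * T))
    return (rf * stim).sum() ** 2

for name, rf in [("separable", separable), ("inseparable", inseparable)]:
    r_pref, r_null = response(rf, +1), response(rf, -1)
    print(name, "direction index:",
          round((r_pref - r_null) / (r_pref + r_null), 2))
# The inseparable RF yields a large direction index; the separable one ~0.
```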
Automated rodent behavioral analysis
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms1064

Table 1 | Accuracy of the system.

                                      Our system    CleverSys commercial system   Human (‘Annotator group 2’)
‘Set B’ (1.6 h of video)              77.3%/76.4%   60.9%/64.0%                   71.6%/75.7%
‘Full database’ (over 10 h of video)  78.3%/77.1%   61.0%/65.8%                   —

Accuracies are reported as averaged across frames / across behaviours (computed as the average of the diagonal entries in the Figure 3 confusion matrix; chance level is 12.5% for an eight-class classification problem).

Assessing the accuracy of the system is a critical task. Therefore, we made two comparisons: (I) between the resulting system and commercial software (HomeCageScan 2.0, CleverSys Inc.) for mouse home-cage behaviour classification and (II) between the system and human annotators. The level of agreement between human annotators sets a benchmark for the system performance, as the system relies entirely on human annotations to learn to recognize behaviours. To evaluate the agreement between two sets of labellers,
Jhuang Serre et al ‘07 ’10; Kuehne Jhuang et al ‘11
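The two accuracy numbers in Table 1 (across frames / across behaviours) are easy to conflate, so here is a minimal sketch of both measures, assuming per-frame integer labels and the eight-class setup; the function and variable names are illustrative, not from the published code.

```python
# Minimal sketch of the two accuracy measures in Table 1:
# "across frames" = plain per-frame accuracy; "across behaviours" =
# mean of per-class accuracies, i.e., the average of the diagonal of
# the row-normalized confusion matrix (rare behaviors count equally).
import numpy as np

def accuracies(y_true, y_pred, n_classes=8):
    conf = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        conf[t, p] += 1                      # rows: true class, cols: predicted
    per_frame = np.trace(conf) / conf.sum()
    per_class = np.diag(conf / conf.sum(axis=1, keepdims=True)).mean()
    return per_frame, per_class

# Example: chance level for 8 balanced classes is 1/8 = 12.5%.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 8, size=10000)
y_pred = rng.integers(0, 8, size=10000)
print([round(a, 3) for a in accuracies(y_true, y_pred)])  # both ~0.125
```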
Visual control of navigation
[Figure: Humans vs. Model]
What have we learned about visual processing?
What matters:
• Multi-stage / pooling mechanisms
• Normalization circuits (see the sketch after this slide):
  - Tuning for 2D shape
  - Color similarity ratings
  - Tuning for relative disparity
  - Perceived motion tuning in MT
  - Classification accuracy (see also Jarrett et al 2009; Pinto et al 2009)
What does not matter:
• Separate classes of simple and complex cells (Pinto et al 2009; O’Reilly et al 2013)
• Max in HMAX
• Learning mechanisms?
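As referenced in the list above, here is a minimal sketch of the canonical divisive normalization computation, E_i^2 / (sigma^2 + sum_j E_j^2). The normalization pool (here the whole population), the squaring exponent, and sigma are illustrative choices rather than any specific circuit from the talk.

```python
# Minimal sketch of a divisive normalization circuit: each unit's
# rectified, squared drive is divided by the pooled activity of the
# population. Parameters are illustrative.
import numpy as np

def divisive_normalization(drive, sigma=0.1):
    """Normalize a population of feedforward drives (pool = whole array)."""
    e2 = np.maximum(drive, 0.0) ** 2          # half-squaring rectification
    return e2 / (sigma**2 + e2.sum())         # divisive normalization

# Example: normalization makes responses ratio-like, so doubling every
# input barely changes the normalized pattern (contrast invariance).
drive = np.array([0.2, 0.4, 0.8])
print(divisive_normalization(drive).round(3))
print(divisive_normalization(2 * drive).round(3))  # nearly the same pattern
```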
Acknowledgments
• Past work at CBCL: C. Cadieu, H. Jhuang, M. Kouh, U.
Knoblich, G. Kreiman, E. Meyers, A. Oliva, T. Poggio, M.
Riesenhuber
• Lab members / Brown collaborators: A. Arslan, Y. Barhomi, K. Bath, S. Crouzet (now Charité – Universitätsmedizin Berlin), J. Kim, X. Li, M. McGill (now Caltech), D. Mely, S. Parker, D. Reichert, I. Sofer, W. Warren, J. Zhang (Hefei University of Technology), S. Zhang
• CNRS (France): E.J. Barbeau, G. Barragan-Jason, M.
Cauchoix, D. Fize