DEMO SESSION - CNNA 2012

WELCOME MESSAGE
On behalf of the 2012 CNNA Organising Committee, it is our great pleasure to welcome
you to Torino, to the International Workshop on Cellular Nanoscale Networks and their
Applications (August 29th-31st, 2012). CNNA 2012 is the 13th event in the series of IEEE
CNNA biennial international workshops started in Budapest in 1990.
In addition, the 3rd Memristor and Memristive Systems Symposium will be held on August 28th-29th,
before the beginning of CNNA 2012. This year, we are delighted to host these two
conferences for the first time at the Politecnico di Torino.
The world of computational devices and architectures has witnessed dramatic changes in
the last few years with the emergence of many-core processors and memristive systems.
Their long-term significance lies in their enabling potential for designing nano CNNs and
intelligent machines with learning and adaptive capabilities. Even more fundamental is
their nonlinear dynamics, which underpins the biological basis of life itself.
CNNA 2012 covers a wide range of topics and technical challenges, in view of the growing
interest in mega-processor nanoscale computing. The 3rd Memristor and Memristive Systems
Symposium will be a multidisciplinary forum for researchers to grasp the latest advances in
the field of memristors and memristive circuits and their latest breakthrough applications.
The total number of submissions was 83. The technical programme includes a rich
presentation of the latest technology breakthroughs in CNNs and memristors and is
organised into 4 regular, 10 special, and 1 demo session. Four special sessions are
devoted to memristor theory, devices and architectures. The programme comprises
fourteen plenary lectures, given by distinguished invited speakers, with strong industry
involvement – including IBM, Intel, FIAT, STMicroelectronics – and startup companies,
mainly focused on technologies beyond CMOS. In particular, the 3rd Memristor and
Memristive Systems Symposium and the CNNA 2012 Workshop will be opened by keynote lectures
by Leon O. Chua, Tamas Roska and Daniel Hammerstrom.
We would like to thank all members of the organising and scientific committee for their
constant support and valuable work and all institutions and companies that sponsor both
conferences for their generous support, in particular the “Cassa di Risparmio di
Torino” (CRT) Foundation, the Chamber of Commerce of Torino and the “Compagnia di
San Paolo” Foundation.
We also hope that, in addition to appreciating the technical programme, you will enjoy your
stay in Torino and find the time to visit the campus of the Politecnico di Torino and our
beautiful city. Torino was the first capital of Italy and, on the occasion of the 150th anniversary of
the unification of Italy, which we celebrated last year, many historical buildings were
completely restored. Among them is the Valentino Castle, a historical residence of the
Royal family donated to the Politecnico di Torino, where we will have the welcome
reception.
We hope your stay here will be both rewarding and memorable.
Marco Gilli
General Chair
Fernando Corinto
Technical Program Chair
CONFERENCE VENUE
POLITECNICO DI TORINO
CORSO DUCA DEGLI ABRUZZI, 24
Room A is Aula Magna
Room B is Sala Consiglio di Facoltà
ORGANIZING COMMITTEE
HONORARY CHAIRS
• PIER PAOLO CIVALLERI, POLITECNICO DI TORINO, ITALY
• LEON O. CHUA, U. OF CALIF., BERKELEY, U.S.A.
GENERAL CHAIRS
• MARCO GILLI, POLITECNICO DI TORINO, ITALY
• TAMÁS ROSKA, MTA-SZTAKI / PPCU, BUDAPEST, HUNGARY
• CHAI WAH WU, IBM T. J. WATSON R. C., NY, U.S.A.
PROGRAM CHAIR
• FERNANDO CORINTO, POLITECNICO DI TORINO, ITALY
PROGRAM CO-CHAIRS
• GIOVANNI E. PAZIENZA, U. OF MEMPHIS / PPCU, BUDAPEST, HUNGARY
• ANGELA SLAVOVA, BULG. A. SCIENCES, SOFIA, BULGARIA
• ÁKOS ZARÁNDY, MTA-SZTAKI, BUDAPEST, HUNGARY
SPECIAL SESSION CHAIRS
• MARIO BIEY, POLITECNICO DI TORINO, ITALY
• VALERI MLADENOV, T. U. OF SOFIA, SOFIA, BULGARIA
• PÉTER SZOLGAY, MTA-SZTAKI / PPCU, BUDAPEST, HUNGARY
• RONALD TETZLAFF, TUD, DRESDEN, GERMANY
FINANCIAL CHAIR
• PAOLA MIRAGLIO, POLITECNICO DI TORINO, ITALY
PUBLICATION CHAIR
• GIOVANNI E. PAZIENZA, U. OF MEMPHIS / PPCU, BUDAPEST, HUNGARY
PUBLICITY CHAIR
• BERTRAM SHI, HKUST, KOWLOON, HONG KONG
EXHIBIT AND DEMO SESSION CHAIRS
• GYÖRGY CSEREY, PPCU, BUDAPEST, HUNGARY
• PIOTR DUDEK, U. OF MANCHESTER, U.K.
• RICARDO CARMONA GALÁN, CNM-CSIC, SEVILLA, SPAIN
INDUSTRY LIAISON CHAIR
• CSABA REKECZKY, EUTECUS INC., BERKELEY, U.S.A.
ASIA-PACIFIC LIAISON CHAIR
• CHIN-TENG LIN, N. CHIAO TUNG U., HSINCHU, TAIWAN
• HYONGSUK KIM, CHONBUK NATIONAL U., KOREA
SECRETARY, LOGISTICS, AND WEB
• MICHELE BONNIN, POLITECNICO DI TORINO, ITALY
• ANDRAS HORVATH, PPCU, BUDAPEST, HUNGARY
• MARCO BERTINO, POLITECNICO DI TORINO, ITALY
SCIENTIFIC COMMITTEE
PAOLO ARENA (UNIVERSITY OF CATANIA, ITALY)
GUANRONG CHEN (CITY UNIVERSITY OF HONG KONG, HK)
LEON O. CHUA (UC BERKELEY, USA)
FERNANDO CORINTO (POLITECNICO OF TURIN, ITALY)
PIOTR DUDEK (UNIVERSITY OF MANCHESTER, UK)
WAI-CHI FANG (NATIONAL CHIAO TUNG UNIVERSITY, TAIWAN)
LUIGI FORTUNA (UNIVERSITY OF CATANIA, ITALY)
MARCO GILLI (POLITECNICO OF TURIN, ITALY)
EDUARDO GOMEZ-RAMIREZ (UNIVERSIDAD LA SALLE, MEXICO D.F., MEXICO)
STEVE KANG (UC SANTA CRUZ, USA)
PEDRO JULIAN (UNIVERSIDAD NACIONAL DEL SUR, BAHIA BLANCA, ARGENTINA)
CHIN-TENG LIN (NAT. CHIAO TUNG UNIVERSITY, HSINCHU, TAIWAN)
JOSEF A. NOSSEK (TECHNICAL UNIVERSITY OF MUNICH)
MACIEJ OGORZALEK (AGH UNIV. OF SCIENCE AND TECH. OF KRAKOW, POLAND)
ARI PAASIO (UNIVERSITY OF TURKU, FINLAND)
GIOVANNI E. PAZIENZA (MTA-SZTAKI, BUDAPEST, HUNGARY)
WOLFGANG POROD (UNIVERSITY OF NOTRE DAME, USA)
CSABA REKECZKY (EUTECUS INC., BERKELEY, USA)
ÁNGEL RODRÍGUEZ-VÁZQUEZ (UNIVERSITY OF SEVILLE, SPAIN)
TAMÁS ROSKA (MTA-SZTAKI & PAZMANY UNIVERSITY, BUDAPEST, HUNGARY)
BING J. SHEU (UNIVERSITY OF SOUTHERN CALIFORNIA, LOS ANGELES, USA)
BERTRAM SHI (HONG KONG UNIVERSITY SCI. & TECH., HK)
PÉTER SZOLGAY (MTA-SZTAKI, BUDAPEST, HUNGARY)
MAMORU TANAKA (SOPHIA UNIVERSITY, TOKYO, JAPAN)
VEDAT TAVSANOGLU (YILDIZ TECHNICAL UNIVERSITY, ISTANBUL, TURKEY)
RONALD TETZLAFF (TECHNICAL UNIVERSITY DRESDEN, GERMANY)
JOOS VANDEWALLE (CATHOLIC UNIVERSITY OF LEUVEN, BELGIUM)
XAVIER VILASÍS-CARDONA (UNIVERSITAT RAMON LLULL, BARCELONA, SPAIN)
CHAI WAH WU (IBM, USA)
ÁKOS ZARÁNDY (MTA-SZTAKI, BUDAPEST, HUNGARY)
Program at a Glance
CNNA 2012 – 13th International Workshop on Cellular Nanoscale Networks and their Applications
Tuesday, Aug. 28
3rd Memristor and Memristive Systems Symposium

Wednesday, Aug. 29
8:30-9:00 Registration
Opening and Plenary Sessions (Chair: M. Gilli)
9:00-9:50 Prof. Tamas Roska
9:50-10:40 Dr. Daniel Hammerstrom (DARPA)
10:40-11:00 Coffee break
11:00-12:40 Parallel Sessions: SSM1 (Room A), RS1 (Room B)
12:40-14:00 Lunch
14:00-15:40 Parallel Sessions: SSM2 (Room A), RS2 (Room B), DS (Room C)
15:40-16:00 Coffee break
16:00-17:40 Parallel Sessions: SSM3 (Room A), SS7 (Room B), DS (Room C)
Evening: Welcome cocktail (Start at 7:00 pm)

Thursday, Aug. 30
8:30-9:00 Registration
Plenary Sessions (Chair: W. Porod)
9:00-9:50 Dr. George I. Bourianoff (INTEL)
9:50-10:40 Dr. Chagaan Baatar (ONR)
10:40-11:00 Coffee break
Plenary Sessions (Chair: P. Szolgay)
11:00-11:40 Dr. Ruud A. Haring (IBM)
11:40-12:10 FIAT/SELEX/STM
12:10-12:40 FIAT/SELEX/STM
12:40-14:00 Lunch
14:00-16:00 Parallel Sessions: SS1 (Room A), SS2 (Room B)
16:00-16:20 Coffee break
16:20-17:20 Parallel Sessions: SS3 (Room A), SS4 (Room B)
Evening: Banquet (Start at 7:00 pm)

Friday, Aug. 31
8:30-9:00 Registration
Plenary Sessions (Chair: T. Roska)
9:00-9:50 Prof. Angel Rodriguez Vázquez
9:50-10:40 Dr. Atul Yoshi
10:40-11:00 Coffee break
Plenary Sessions (Chair: R. Carmona Galán)
11:00-11:40 Dr. Csaba Rekeczky
11:40-12:10 Dr. Maria Ercsey-Ravasz
12:10-12:40 FIAT/SELEX/STM
12:40-14:00 Lunch
14:00-15:40 Parallel Sessions: RS3 (Room A), SS5 (Room B)
15:40-17:20 Parallel Sessions: RS4 (Room A), SS6 (Room B)
Evening: Closing Ceremony (Start at 6:00 pm)
Keynote speakers
Dr. George I. Bourianoff, “Towards a Bayesian processor implemented with
oscillatory nanoelectronic arrays”
Components Research Intel Corporation
Dr. Ruud A. Haring, “The Design of the BlueGene/Q Compute Chips”
IBM T. J. Watson Research Center
Prof. Angel Rodriguez Vázquez, “Progress on CMOS Smart Imagers and Vision
Systems”
University of Seville, Spain
Dr. Atul Yoshi, “Advances in Electro-Optical and Infrared Imaging Sensors”
Teledyne Imaging Group
Dr. Csaba Rekeczky, “Sparse Space-time Computing for Embedded Video Analytics
Systems”
CTO and President, Eutecus, Inc.
The strong industry involvement is also highlighted by plenary speakers from
FIAT
SELEX-GALILEO
STMicroelectronics
Position Papers
Prof. Tamas Roska, “Physical and Virtual Cellular Machines for Nanoscale Chips
and Systems”
Pázmány Peter Catholic University, Budapest
Dr. Daniel Hammerstrom, “Unconventional Computing”
DARPA Program Manager
Dr. Chagaan Baatar, “An Overview of ONR Nanoelectronics Program”
ONR Program Officer
Dr. Maria Ercsey-Ravasz, “Solving constraint satisfaction problems via transiently
chaotic analog systems and CNN dynamics”
Physics Department of the Babes-Bolyai University, Romania
Memristor Theory [SSM1]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Weiran Cai
Time: Wednesday 29, August - 11:00-12:40
Room A
________________________________________________________________________
11:00-11:20
Advanced Memristive Model of Synapses with Adaptive Thresholds
Weiran Cai, Ronald Tetzlaff
Abstract—In this paper, we propose a memristive STDP model realizing the principle of
suppression of Froemke and Dan for triplet spikes. The proposed model claims compatibility
with both the pair and triplet STDP rules, going beyond the limit of the basic memristive
STDP model. The compatibility is realized by assuming a mechanism of variable thresholds
adapting to synaptic potentiation (LTP) and depression (LTD): the preceding LTP has a
negative influence on the following LTD. The corresponding dynamical process is governed
by a set of ordinary differential equations. It is an equivalent model of the original
suppression STDP model. A relation of the adaptive thresholds to short-term plasticity is
addressed.
11:20-11:40
Mathematical models and circuit implementations of memristive systems
Fernando Corinto, Alon Ascoli, Marco Gilli
Abstract—In this paper we first present a novel, simple and general boundary condition-based model for nano-scale switching resistances with memory. The boundary conditions
are embedded into a switching function modulating the rate of ionic transport, and, on the
basis of the memristor under modeling, may be suitably chosen through an optimization
procedure minimizing some reference parameter such as the mean squared error between
observed and modeled data. The versatile nature of the switching function enables the
model to detect complex dynamics from a number of memristive nano-structures, including
the Hewlett-Packard memristor. In the second part of the manuscript, we explain how to use
the switching dynamics of appropriate nonlinear two-ports to synthesize simple memristive
electronic circuits employing purely-passive already-existing components.
11:40-12:00
Neuronal Spike Event Generation by Memristors
Sangho Shin, Davide Sacchetto, Yusuf Leblebici, Sung-Mo Kang
Abstract—A new memristor-based neuronal spike event generator is introduced. By using
the dynamic properties of conditional resistance switching of a practical bistable memristive
device, the neuronal action potential is generated, describing both the integrate-and-fire
spiking events and the sufficiently long refractory period of nerve membrane cells. The
memristor offers dual time constants, which model the unbalanced charging and
discharging periods of the spike signals. With a Pt/TiO2/Pt memristive device having a
ROFF/RON resistance ratio of 3000, the memristor-based spike generator offers spike trains
with a duty cycle of about 0.03%.
12:00-12:20 Fast Computation with Memory Circuit Elements
Massimiliano Di Ventra, Yuriy Pershin
Abstract—Memory circuit elements – resistors, capacitors and inductors with memory – are
electronic components with great potential in a wide range of applications. In particular, they
are ideally suited to enhance all three major computing paradigms: binary, analog and
quantum. Here, we consider how to achieve a faster computation with these elements.
Specifically, we will show that a binary logic architecture combining memristive and
memcapacitive elements requires considerably fewer steps to process information compared
to architectures employing only memristive elements. In addition, we demonstrate that a
network of memristive – as well as memcapacitive or meminductive – systems can solve a
complex optimization problem – the maze problem – with unprecedented speed due to the
analog parallelism afforded by these elements.
12:20-12:40 FPGA–Based Generation of Autowaves in Memristive Cellular Neural
Networks
Viet Thanh Pham, Arturo Buscarino, Mattia Frasca, Luigi Fortuna, Thang Manh
Hoang
Abstract—Cellular Neural/Nonlinear Networks (CNNs) constitute an effective approach for
studying complex phenomena like autowaves, spiral waves or pattern formation either by
providing a computationally efficient environment for numerical simulations or by allowing
the possibility of hardware emulators of the system under study. In this work, we focus on a
CNN made of memristor–based cells, namely a Memristive Cellular Neural/Nonlinear
Network (MCNN). This has been recently shown to be capable of generating complex
phenomena such as autowave propagation. In this work, we implement such an MCNN
using a Field Programmable Gate Array (FPGA). Our system, consisting of an FPGA
development board connected to a monitor, allows us to emulate autowave propagation in an
efficient way. Experimental results show the feasibility of the FPGA–based approach to
implementing MCNNs.
New Spatial-temporal Algorithms [RS 1]
Chair: Vedat Tavsanoglu
Time: Wednesday 29, August - 11:00-12:40
Room B
________________________________________________________________________
11:00-11:20
CNN Based Dark Signal Non-Uniformity Estimation
Marc Geese, Paul Ruhnau, Bernd Jähne
Abstract—Image sensors come with a spatial inhomogeneity, known as Fixed Pattern Noise (FPN),
that degrades the image quality. In particular, the dark signal non-uniformity (DSNU)
component of the FPN drifts with time and depends highly on temperature and exposure
time. In this paper we introduce a cellular neural network (CNN) to estimate the DSNU from
a given set of recorded images. To this end, the foundations of a previously presented
maximum likelihood estimation method are used. A rigorous mathematical derivation exploits
the available sensor statistics and uses only well-motivated statistical models to calculate
the CNN's synaptic weights. The advantages of the resulting CNN method are continuous
DSNU updates and a reduction of the computational complexity. Furthermore, a comparison
based on ground truth correction patterns shows a significant performance increase over
related methods.
11:20-11:40
Continuous-Time Neural Networks Without Local Traps for Solving Boolean
Satisfiability
Botond Molnár, Zoltán Toroczkai, Mária Ercsey-Ravasz
Abstract—We present a deterministic continuous-time recurrent neural network similar to
CNN models, which can solve Boolean satisfiability (k-SAT) problems without getting
trapped in non-solution fixed points. The model can be implemented by analog circuits, in
which case the algorithm would take a single operation: the template (connection weights) is
set by the k-SAT instance and starting from any initial condition the system converges to a
solution. We prove that there is a one-to-one correspondence between the stable fixed
points of the model and the k-SAT solutions and present numerical evidence that limit cycles
may also be avoided by appropriately choosing the parameters of the model. As this study
opens potentially novel technical avenues to tackle hard optimization problems, we also
discuss some of the arising questions that need to be investigated in future studies.
11:40-12:00
Coarse Grain Mapping Method for Image Processing on Fine Grain Cellular
Processor Arrays
Bin Wang, Piotr Dudek
Abstract—This paper introduces a mapping method for adding a coarse grain (multiple
pixels per processor) processing mode to massively parallel cellular processor arrays. The
main motivation is to provide the fine grain pixel-parallel processor array with the ability of
processing images with higher resolution than the array itself, in a way that is transparent to
the programmer. The proposed method accomplishes the mapping work entirely during the
code compilation process, which has four main advantages. Firstly, there is no extra
overhead during processing. Secondly, the source code for fine grain mode can be used in
coarse grain mode without modification. Thirdly, the proposed method does not introduce
any restrictions on the number of pixels stored in a processing element. Finally, the proposed
method is easy to implement, as it does not require any modifications to the hardware
design of the pixel-parallel processor array or its controller, but only to the software compiler.
The mapping method and its software implementation are presented in this paper.
12:00-12:20 2nd Order 2-D Spatial Filters and Cellular Neural Network Implementations
Vedat Tavsanoglu, Nergis Tural Polat
Abstract— In this paper 2-D discrete-space filters are generated from their analog
counterparts and implemented by Cellular Neural Networks (CNN). To this end, first 2-D
analog transfer functions are obtained from their 1-D counterparts. Then, the corresponding
difference equations are obtained by discretization of 2-D analog filter differential equations,
which are then implemented by CNN. Simulation results are presented.
12:20-12:40 CNN Modeling of Tsunami Waves
Angela Slavova, Pietro Zecca
Abstract— In this paper CNN modeling of tsunami waves is presented. Two models are
studied: a two-component Camassa-Holm type equation and a generalized KdV
equation. For these cases CNN models are constructed and traveling wave solutions are
obtained theoretically and via simulations. A new type of traveling wave solution is
introduced: the peaked type, called a peakon. A discussion and an example of tsunami waves are
provided at the end of the paper.
Memristor Devices [SSM2]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Qiangfei Xia
Time: Wednesday 29, August - 14:00-15:40
Room A
________________________________________________________________________
14:00-14:20 Memristor Crossbar Arrays with Junction Areas towards
sub-10x10 nm^2
Shuang Pi, Peng Lin, Qiangfei Xia
Abstract—We used diluted hydrofluoric acid to shrink the feature size of a silicon dioxide
nanoimprint mold to the sub-10 nm regime. Using this mold, we have fabricated memristor
crossbar arrays using nanoimprint lithography. We demonstrated that memristor devices
with small junction areas exhibited bipolar non-volatile switching behavior with high ON/OFF
ratio and low operational current.
14:20-14:40 Modeling and Implementation of Oxide Memristors for Neuromorphic
Applications
Ting Chang, Patrick Sheridan, Wei Lu
Abstract—We report the fabrication, modeling and implementation of nanoscale tungsten oxide (WOx) memristive (memristor) devices for neuromorphic applications. The device
behaviors can be predicted accurately by considering both ion drift and diffusion. Short-term
memory and memory enhancement phenomena, and the effects of spike rate, timing and
associativity have been demonstrated. SPICE modeling has been achieved that allows
circuit-level implementations.
14:40-15:00 Cost-effective Printed Memristor Fabrication and Analysis
Kyung Hyun Choi, Muhammad Naeem Awais, Hyung Chan Kim, Yang Hui Doh
Abstract—The fabrication of printed memristors and their memristive behavior are
presented for different metal-insulator-metal (MIM) structures. The printing techniques
studied in the current work include electrohydrodynamic printing (EHDP) and roll-to-plate.
The materials used for the electrode deposition are silver (Ag) and indium tin oxide
(ITO), while zirconium oxide (ZrO2) and graphene oxide (GO) have been used for the
sandwich layer between two electrodes on a polyimide (PI) substrate. Electrically stable
bipolar resistive switching behavior of all the MIM structures with a significant Off/On ratio has
been observed. The analysis of the device dimensions and current-voltage (I-V)
behavior with respect to the employed printed electronics techniques confirms their feasibility
for cost-effective memristive device fabrication.
15:00-15:20 Selector Devices for Cross-point ReRAM
Hyunsang Hwang
Abstract— Both a varistor-type bidirectional selector (VBS) and an ultrathin NbO2 device with
threshold switching (TS) characteristics were investigated. A highly non-linear VBS showed
superior performance, including high current density (>3x10^7 A/cm^2) and high selectivity
(~10^4). Ultrathin NbO2 exhibits excellent TS characteristics such as high temperature
stability (~160 °C), good switching uniformity, and extreme scalability.
15:20-15:40 Applications and Limitations of Memristive Implication Logic
Eero Lehtonen, Jussi Poikonen, Mika Laiho
Abstract—In its elementary form, memristive implication logic suffers from multiple
disadvantages such as the lengths of the computational sequences required to synthesize a
Boolean function, the lack of fan-out, and the requirement of complex control signals. In this
paper we present a new stateful logic operation available for rectifying memristors which
corresponds to the logical operation known as the converse nonimplication, and show that it
solves the fan-out problem. Moreover, we show how parallel stateful logic can be performed
within a CMOL memory architecture, and how it can be used to shorten the computational
sequences. We also discuss applications where stateful logic could be advantageous when
compared to more conventional solutions.
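To illustrate why sequence length matters in implication logic, the sketch below models memristor states as Booleans and builds NAND from a reset followed by two material-implication steps. This is the commonly cited textbook sequence, not necessarily the one used by the authors. The function names `imp`, `converse_nonimp` and `nand_via_imp` are hypothetical, and treating each operation as an ideal stateful update is an assumption.

```python
from itertools import product
from typing import Tuple


def imp(p: bool, q: bool) -> bool:
    """Material implication p -> q; in stateful logic the result overwrites q."""
    return (not p) or q


def converse_nonimp(p: bool, q: bool) -> bool:
    """Converse nonimplication: true only when p is false and q is true."""
    return (not p) and q


def nand_via_imp(p: bool, q: bool) -> Tuple[bool, int]:
    """Compute NAND(p, q) with a work memristor s; returns (result, step count)."""
    s = False      # step 1: reset the work memristor to logic 0
    s = imp(p, s)  # step 2: s becomes NOT p
    s = imp(q, s)  # step 3: s becomes (NOT q) OR (NOT p), i.e. NAND(p, q)
    return s, 3


if __name__ == "__main__":
    for p, q in product([False, True], repeat=2):
        result, steps = nand_via_imp(p, q)
        print(p, q, "NAND:", result, "steps:", steps,
              "converse nonimplication:", converse_nonimp(p, q))
```

Even this single gate costs three sequential operations on one work device, which hints at how quickly sequences grow for larger Boolean functions.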
Cellular Architectures & Algorithms [RS 2]
Chair: Tadashi Shibata
Time: Wednesday 29, August - 14:00-15:40
Room B
________________________________________________________________________
14:00-14:20 Multi-Feature Detection for Quality Assessment in Laser Beam Welding:
Experimental Results
Leonardo Nicolosi, Ronald Tetzlaff, Felix Abt, Andreas Blug, Heinrich Höfler
Abstract—Laser beam welding (LBW) has been largely used in manufacturing processes
ranging from automobile production to precision mechanics. The complexity of LBW requires
the development of strategies for the real-time control of the process. Most of the available
feedback systems lack temporal and/or spatial
resolution and, therefore, hardly allow observing more than one characteristic of the
process. In recent years, we proposed some high-speed visual algorithms for
image feature extraction from process images. The detection of the full penetration hole
(FPH) allowed controlling the laser power at rates of up to 14 kHz. Another strategy enables
observing the occurrence of spatters at monitoring rates of 15 kHz. The achievement of
these results was made possible by the adoption of a visual system including a focal plane
processor programmable by typical Cellular Neural Network (CNN) operations. This paper is
focused on a new visual algorithm for the simultaneous detection of FPH and spatters, which
led to real-time control rates of about 8 kHz. Besides the algorithm description, some
interesting experimental results will be presented.
14:20-14:40 On the Phase Space Decomposition for Weakly Connected Oscillatory
Networks with 2nd Order Cells
Michele Bonnin, Fernando Corinto, Marco Gilli
Abstract—Oscillatory nonlinear networks represent a circuit architecture for image and
information processing. It has been shown that they can be exploited to implement
associative and dynamic memories. It has also been shown that phase noise plays an
important role as a key limiting factor for the performance of oscillatory cells. Phase models
are a tool of paramount importance for the design of oscillatory networks and the analysis of
phase noise. These models require treating the noise and the couplings among the
cells as perturbations, and identifying the proper directions along which to project the
perturbations. In this paper we discuss the proper decomposition of the phase space for
second order cells of oscillatory nonlinear networks, and we derive analytical formulas for the
vectors spanning the directions for the proper phase space decomposition. We also discuss
the implications of this decomposition in control theory and to what extent a simple
orthogonal projection is correct.
14:40-15:00 Cellular Neural Networks with Dynamic Cell Activity Control for Hausdorff
Distance Estimation
Maria Janczyk, Krzysztof Slot
Abstract—A concept of Cellular Neural Networks with dynamic cell activity control is
proposed in the paper. The concept is an extension to the Fixed State Map mechanism and it
assumes that cells can be disabled or enabled for processing based on assessment of
current distributions of their neighboring signals. A particular case, where this assessment is
made by thresholding the result of a cross-correlation between the feedback template and
the neighborhood outputs, is shown to provide a simple means for efficient min/max problem
handling. This idea requires introducing only minor modifications to a cell structure. As an
example, application of the proposed network for fast estimation of Hausdorff distance
between two sets has been considered.
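For reference, the Hausdorff distance that the network is meant to estimate is a nested max/min over pairwise point distances. The sketch below is a plain Python evaluation of that definition, not the CNN method of the paper. The function names `directed_hausdorff` and `hausdorff_distance` and the choice of the Euclidean metric are assumptions made for illustration.

```python
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    set_a = [(0.0, 0.0), (1.0, 0.0)]
    set_b = [(0.0, 0.0), (4.0, 0.0)]
    print(hausdorff_distance(set_a, set_b))
```

The inner `min` and outer `max` are the min/max operations that the abstract says the modified cell handles efficiently; here they are simply evaluated by brute force.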
15:00-15:20 A VLSI Hardware Implementation Study of SVDD Algorithm Using Analog
Gaussian-Cell Array for on-Chip Learning
Renyuan Zhang, Tadashi Shibata
Abstract—A feasibility study of VLSI hardware implementation of support vector domain
description (SVDD) has been done in this work. The on-chip learning operation of SVDD
algorithm was implemented by an analog Gaussian-cell array. By using a compact analog
Gaussian-generation circuit, the center, height and width of the generated Gaussian kernel
function feature can be programmed. Based on this Gaussian-generation circuit, a fully
parallel architecture is developed to implement the on-chip learning operation, which is
carried out by the proposed method. In this manner, the learning operation autonomously
proceeds without any clock-based iteration, and self-converges at high speed. A proof-of-concept processor is designed for sixteen learning sample vectors. From the circuit
simulation results, the entire learning operation is accomplished within 0.6 μs, and the
domain of sample space is described by a reduced number of sample vectors. In addition,
the various forms of domain description can be realized by tuning the kernel function feature
dynamically.
15:20-15:40 Analysis of Sperm Motility with CNN Architecture
Levent Savkay, Mustak E. Yalcin
Abstract—In this paper, we propose a CNN model-based spermatozoa motility analysis,
which is an important part of complete semen analysis. Sperm motility analysis is a good
example of a multiple object tracking and video surveillance problem when viewed from
an engineering viewpoint. Our proposed system takes the video and images from a CCD
camera and applies front-end preprocessing tasks that use CNN algorithms for
spatial enhancement and preparation of image frames, combined with an appropriately
designed cost function and a greedy assignment algorithm that determines the objects (spermatozoa), traces their trajectories and classifies the obtained information for the use of
biologists. The system is composed of a digital CCD camera connected to the evaluation
system. Here we show the results obtained with simulation software running on a PC.
For the detection of sperm cells and the tracking of their trajectories, we utilized
heuristic rules deduced from the dynamics of spermatozoa and from investigation of the video
obtained from real samples.
Memristor Systems [SSM3]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Yusuf Leblebici
Time: Wednesday 29, August - 16:00-17:40
Room A
________________________________________________________________________
16:00-16:20 MRL - Memristor Ratioed Logic
Shahar Kvatinsky, Nimrod Wald, Guy Satat, Eby Friedman, Avinoam Kolodny, Uri C.
Weiser
Abstract— Memristive devices are novel structures, developed primarily as memory. Another
interesting application for memristive devices is logic circuits. In this paper, MRL (Memristor
Ratioed Logic) - a hybrid CMOS-memristive logic family - is described. In this logic family,
OR and AND logic gates are based on memristive devices, and CMOS inverters are added
to provide a complete logic structure and signal restoration. Unlike previously published
memristive-based logic families, the MRL family is compatible with standard CMOS logic. A
case study of an eight-bit full adder is presented and related design considerations are
discussed.
16:20-16:40 Pattern Matching and Classification based on Complementary Resistive
Switch (CRS) Architecture
Kyoungrok Cho, Sang-Jin Lee, Kwang-Seok Oh, Omid Kavehei, Kamran
Eshraghian
Abstract—The emergence of new materials, and in particular the recent progress in memristor
and related memory technologies, has encouraged the research community to take a renewed
approach to the formulation of architectures, such as those that depend upon associative
memory constructs, to exploit the advantages offered within this new design domain. In
this paper we address a key issue in the pattern matching and classification process and
suggest an alternative approach for image vector matching combining a Complementary
Resistive Switch (CRS) array and bump circuits. We emulated experimental pattern
matching with two approaches, based on the Hamming distance and the threshold level of
the image: the former finds an exact image with a bump circuit and the latter finds similar
patterns from the stored images by combining comparators. The proposed hardware-oriented
architecture offers high speed and a smaller size, and is easier to implement in conventional
CMOS technology.
16:40-17:00 Reaction-Diffusion Media with Excitable Oregonators coupled by Memristors
Xiyuan Gong, Tetsuya Asai, Masato Motomura
Abstract—We numerically investigated the dynamics of a new reaction-diffusion-type
excitable medium where the diffusion coefficient is represented by memristive dynamics.
This type of a medium consists of an array of excitable Oregonators, and each Oregonator
is locally coupled with other Oregonators via memristors, which were claimed to be the
fourth circuit element exhibiting a relationship between flux φ and charge q. Through
extensive numerical simulations, we found that the memristor conductances were modulated
by the excitable waves and controlled the velocity of the waves, depending on the
memristorʼs polarity. Further, different nonuniform spatial patterns were generated
depending on the initial condition of Oregonatorʼs state, memristor polarity and stimulation.
17:00-17:20 SPICE Simulator for Hybrid CMOS Memristor Circuit and System
Yuhao Wang, Wei Fei, Hao Yu
Abstract—The memristor is a two-terminal non-linear passive electrical device. Since its recent
successful fabrication, a variety of memristor-based applications have been explored,
such as non-volatile memory, reconfigurable computing and neural networks. However, one
major challenge when designing hybrid CMOS-memristor integrated circuits is the lack of a
SPICE-like simulator for design validation. The current approach is to describe the memristor device
with an equivalent circuit, which is, however, extremely time-consuming for large-scale design
simulation due to the additional modeling components. In this paper, a memristor SPICE
simulator is introduced based on a recent modified nodal analysis (MNA) framework,
which can effectively support non-conventional state variables such as the doping ratio of
the memristor. As such, the memristor device can be stamped into the state matrix in the same way as a
BSIM MOSFET. Compared with the equivalent circuit simulation approach, our new MNA-based
approach exhibits 40x less simulation time for a 32x32 memristor crossbar circuit. A hybrid
CMOS-memristor circuit for classical conditioning training has also been studied with the
developed SPICE simulator.
17:20-17:40 CNN Cell with Memcapacitive Synapses and Threshold Control Circuit
Jacek Flak
Abstract—This paper presents a concept of a solid-state memcapacitor based on a
combination of memristor and capacitor, as well as its applications to cellular nanoscale
networks. In addition to ultra-dense memories, memcapacitors can also be used for synaptic
connections and threshold control in arrays with capacitively coupled processing units. In
principle, the proposed CNN cell structure implements the basic McCulloch-Pitts neuron
model. Although the cell relies on the binary programmability scheme with single-bit
template coefficients, the proposed memcapacitive synapses allow for asynchronous
processing of tasks, for which the traditional cloning templates contain both positive and
negative values.
Memristor-based Cellular and Neural
Synaptic Circuits [SS 7]
Chair: Hyongsuk Kim
Time: Wednesday 29, August - 16:00-17:40
Room B
________________________________________________________________________
16:00-16:20 Memristor Bridge Circuit for Neural Synaptic Weighting
Maheshwar Pd. Sah, Changju Yang, Hyongsuk Kim, Tamás Roska, Leon O. Chua
Abstract—A simple and compact memristor-based bridge circuit which is able to perform
signed synaptic weighting in neuron cells is proposed. The proposed memristor-based
synapse is composed of four memristors which makes a bridge type configuration. By
programming different values on each memristor of the memristor bridge circuit, weighting
values can be set on the memristor bridge synapses. Various simulation results are
included.
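As a rough illustration of how a four-memristor bridge can yield a signed weight, the sketch below treats the bridge as two voltage dividers and takes the differential output between their midpoints. The divider topology, the formula and the names `bridge_weight`, `synapse_output` and `m1`..`m4` are assumptions made for illustration; they are not taken from the paper.

```python
def bridge_weight(m1, m2, m3, m4):
    """Differential gain of a bridge built from two memristive voltage dividers.

    Assumed topology (illustrative): m1 and m2 form the left divider,
    m3 and m4 the right one; the output is taken between the two midpoints.
    """
    return m2 / (m1 + m2) - m4 / (m3 + m4)


def synapse_output(v_in, m1, m2, m3, m4):
    # The synaptic output is the input scaled by the signed bridge weight.
    return bridge_weight(m1, m2, m3, m4) * v_in
```

Under this assumption a balanced bridge gives zero weight, and swapping the programmed memristances between the two dividers flips the sign of the weight.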
16:20-16:40 Synaptic Weighting Circuits for Cellular Neural Networks
Young-Su Kim, Kyeong-Sik Min
Abstract— A Cellular Neural Network (CNN), which can provide massively parallel processing,
is known to be suitable for neuromorphic applications such as vision systems. In this paper,
we propose a new synaptic weighting circuit that can perform analog multiplication for CNN
applications. Common-mode feedback is used in the new weighting circuit to minimize
the output offset. The multiplication accuracy can be degraded by the finite High Resistance
State (HRS) and non-zero Low Resistance State (LRS) of real memristors. To improve the
multiplication accuracy, we added two MOSFET switches to the memristor weighting circuit
and chose the weighting memristance carefully, taking the leakage current into account.
Variations in memristance are analyzed to estimate how much they can affect the accuracy
of analog multiplication. Finally, the Average and Laplacian templates were tested and verified
by circuit simulation using the proposed weighting circuit.
16:40-17:00 Memristance and Memcapacitance Modeling of Thin Film Devices Showing
Memristive Behavior
Mohamed G. Ahmed Mohamed, Kyoungrok Cho, Tae-Won Cho
Abstract— In 2008, the fourth passive element, the “memristor”, was implemented as a device
having both passivity and nonvolatile properties, opening the way to new possibilities in the
design and fabrication of innovative memory, arithmetic and logic architectures. The
nano-features and ionic transport mechanism inherent in the memristor device introduce new
challenges into modeling, characterization and, in particular, the related circuit simulation
needs with system constructs. Therefore, in this paper, we analyze the memristor device
fundamentally to characterize the memristance, paying particular attention to the hidden
memcapacitance effect. Our proposed macro-model takes into account some of the
non-ideal effects, such as tunneling current and the hidden memcapacitor formed across
non-conducting materials. The model provides insight for building a device as either a
memristive or a memcapacitive system. The simulation results have been compared with
published HP data and show good agreement.
17:00-17:20 Memristor Emulator Design with Off-the-shelf Solid State Components for
Memristor Circuit Applications
Changju Yang, Maheshwar Pd. Sah, Jae-Bung Kim, Seongik Cho, Hyongsuk Kim
Abstract— A memristor emulator circuit designed with off-the-shelf solid state
components is presented. As memristors are not yet commercially available,
circuit replacements that behave like memristors are needed to develop application
circuits. In this paper, the variable resistance of a memristor is built using the input
resistance of the closed-loop circuit of an op amp. The memristor emulator circuit has been
implemented on a breadboard with off-the-shelf solid state components. The experimental
results of the proposed memristor emulator circuit show memristor behavior that can be
used as an alternative to the HP TiO2 memristor model.
17:20-17:40 Analysis of a Serial Circuit with Two Memristors and Voltage Source at Sine
and Impulse Regime
Valeri Mladenov, Stoyan Kirilov
Abstract — This paper describes the structure and operating principle of Williamsʼs
memristor. Its basic parameters are presented and the basic physical
dependencies are confirmed. The analysis considers the linear drift model of
Williamsʼs memristor. A SIMULINK model of a circuit with two memristors is built from the
derived formulae and Kirchhoffʼs voltage law. The main results of the simulations, carried
out in the MATLAB and SIMULINK environment, are given in graphical form. These results
concern the distortion of impulse plateaus at different ratios between the resistances of the
“opened” and “closed” states of Williamsʼs memristor, ROFF and RON. An interpretation of
the results is also given, which confirms that a memristor with a high ratio r is better than a
memristor with a small value of r. Finally, the main conclusions and
perspectives for future applications of memristor circuits are given.
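A minimal sketch of the kind of simulation described above, written in Python rather than SIMULINK: two linear-drift memristors in series with a sine source, integrated with forward Euler. The parameter values (`R_ON`, `R_OFF`, `K`), the hard clipping of the state and the choice of equal polarity for both devices are illustrative assumptions, not values or choices taken from the paper.

```python
import math

R_ON, R_OFF = 100.0, 16e3      # ohms, illustrative values
K = 1e4                        # mu_v * R_ON / D**2 in 1/(A*s), illustrative


def memristance(x):
    # Linear drift model: series mix of doped (R_ON) and undoped (R_OFF) regions.
    return R_ON * x + R_OFF * (1.0 - x)


def simulate(amplitude=1.0, freq=1.0, x1=0.5, x2=0.5, dt=1e-4, steps=10000):
    """Forward-Euler simulation of two memristors in series with a sine source."""
    currents = []
    for n in range(steps):
        v = amplitude * math.sin(2.0 * math.pi * freq * n * dt)
        i = v / (memristance(x1) + memristance(x2))   # Kirchhoff's voltage law
        # State update, clipped so the state variable stays within [0, 1].
        x1 = min(1.0, max(0.0, x1 + K * i * dt))
        x2 = min(1.0, max(0.0, x2 + K * i * dt))
        currents.append(i)
    return x1, x2, currents
```

Changing `R_OFF` relative to `R_ON` in this sketch varies the ratio r discussed in the abstract; connecting the devices with opposite polarity would correspond to flipping the sign of one state update.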
DEMO SESSION –
Applications of CNN Technology [DS]
Chairs: György Cserey, Piotr Dudek, Ricardo Carmona Galán
Time: Wednesday 29, August - 14:00-17:40
Room C
________________________________________________________________________
Stand 1
Low Power Multiple Object Tracking and Counting Using a Scamp Cellular
Processor Array
David Barr, Stephen Carey, Piotr Dudek
Abstract - A low-power demonstration system using a SCAMP-3 vision chip to track and
count multiple objects with unpredictable trajectories is presented. The system can track as
many discrete objects as can fit into its visual field. The compact, self-contained hardware
consists of a battery, an ARM Cortex-M3 coprocessor, and the sensor/processor array
device. The tracking algorithm is performed entirely by the processor array and the complete
system draws 7.3mA during operation.
Stand 2
Locating High Speed Multiple Objects Using a Scamp-5 Vision-Chip
Stephen Carey, David Barr, Bin Wang, Alexey Lopich, Piotr Dudek
Abstract - Presented in this paper is a demonstration system that uses a low-power
SCAMP-5 256x256 vision-chip to locate and count multiple objects moving at high speed
along arbitrary trajectories. The hardware consists of a SCAMP-5 IC, its power supply
system and a Xilinx Spartan3 controller. At 100,000fps, the SCAMP-5 chip can locate and
read out the coordinates of a single closed-shaped object amongst clutter. At 25,000fps, the
IC can read out the coordinates of 5 objects.
Stand 3
Realization of a Fully Configurable Complex Network of Non Linear Chuaʼs
Oscillators
Marco Colandra, Massimiliano de Magistris, Carlo Petrarca, Mario di Bernardo,
Sabato Manfredi
Abstract— We describe the realization of a new experimental setup for the analysis and
characterization of complex networks of Chuaʼs circuits. It is characterized by full
configurability of the nodeʼs parameters and the network structure (topology and link
impedances), and designed for easy scalability to a high number of nodes. The set-up is
automated in terms of control of the network and data acquisition by means of USB
interfaced boards. A portable version of the set-up with 8 nodes is realized for demonstration
purposes.
Stand 4
Real-Time Remote Reporting of Motion Analysis with Wi-Flip
Jorge Fernández-Berni, Ricardo Carmona-Galán, Ángel Rodríguez-Vázquez
Abstract—This paper describes a real-time application programmed into Wi-FLIP, a wireless
smart camera resulting from the integration of FLIP-Q, a prototype mixed-signal focal-plane
array processor, and Imote2, a commercial WSN platform. The application consists in
scanning the whole scene by sequentially analyzing small regions. Within each region,
motion is detected by background subtraction. Subsequently, information related to that
motion — intensity and location — is radio-propagated in order to remotely account for it. By
aggregating this information over time, a motion map of the scene is built. This map makes it
possible to visualize the different activity patterns taking place. It also provides an elaborated
representation of the scene for further remote analysis, preventing raw images from being
transmitted. In particular, the scene inspected in this demo corresponds to vehicular traffic in
a motorway. The remote representation progressively built enables the assessment of the
traffic density.
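The processing chain described above (region-by-region scan, background subtraction, reporting of intensity and location, aggregation over time) can be sketched in a few lines. This is a software illustration only, assuming plain 2-D lists of grey levels; the function names, the region size and the threshold are hypothetical and do not describe the Wi-FLIP implementation.

```python
def motion_report(frame, background, region=2, threshold=10):
    """Scan the frame region by region and report where motion is detected.

    frame, background: equally sized 2-D lists of grey levels.
    Returns a list of (row, col, intensity) tuples, one per active region,
    where intensity is the mean absolute difference inside that region.
    """
    reports = []
    rows, cols = len(frame), len(frame[0])
    for r0 in range(0, rows, region):
        for c0 in range(0, cols, region):
            diffs = [abs(frame[r][c] - background[r][c])
                     for r in range(r0, min(r0 + region, rows))
                     for c in range(c0, min(c0 + region, cols))]
            intensity = sum(diffs) / len(diffs)
            if intensity > threshold:
                reports.append((r0, c0, intensity))
    return reports


def accumulate(motion_map, reports):
    # Aggregate reports over time into a per-region activity count.
    for r0, c0, _ in reports:
        motion_map[(r0, c0)] = motion_map.get((r0, c0), 0) + 1
    return motion_map
```

Only the short report tuples would need to be transmitted, which is the point made in the abstract about avoiding the transmission of raw images.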
Stand 5
Demonstration of the Second Generation Real-Time Cellular Neural Network
Processor: RTCNNP-V2
Nerhun Yildiz, Evren Cesur, Vedat Tavsanoglu
Abstract—This paper is compiled from our previous works, in which the architecture of the
Second-Generation Real-Time Cellular Neural Network (CNN) Processor (RTCNNP-v2)
was proposed. The system is designed for applications where high resolution and high
speed are desired. The structure is fully pipelined and the processing is real-time. The
proposed structure is coded in VHDL and realized on two FPGA devices: one high-end and
one low-budget. To date, the system is the only reported CNN implementation supporting
real-time Full-HD video image processing.
Stand 6
An Improved FPGA Implementation of CNN Gabor-Type Filters
Evren Cesur, Nerhun Yildiz, Vedat Tavsanoglu
Abstract— In this paper, a new Cellular Neural Network (CNN) structure for
implementing two-dimensional Gabor-type filters is proposed, improving on our previous
design. The structure is coded in VHDL and realized on a state-of-the-art Altera
Stratix IV 230 FPGA. The prototype supports Full-HD 1080p resolution and a 60 Hz
frame rate. One dedicated processor is used for each Euler iteration, where the time
step is set equal to the optimum step size, and 50 iterations are implemented.
The input/output, control, RAM and communication blocks of the
realization are taken from our second-generation real-time CNN emulator (RTCNNP-v2).
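To make the idea of "one processor per Euler iteration" concrete, the following sketch runs a fixed number of forward-Euler steps of a linear CNN with 3x3 templates, where each loop pass stands in for one pipeline stage. The state equation dx/dt = A*x + B*u (with the self-feedback term folded into the centre of A), the zero boundary condition and all function names are assumptions for illustration; actual Gabor-type templates are complex-valued and are not reproduced here.

```python
def conv3x3(img, t):
    """Apply a 3x3 template t to img with zero padding outside the image."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += t[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out


def euler_step(x, u, A, B, h):
    # One forward-Euler step of the linear CNN  dx/dt = A*x + B*u.
    ax, bu = conv3x3(x, A), conv3x3(u, B)
    return [[x[r][c] + h * (ax[r][c] + bu[r][c]) for c in range(len(x[0]))]
            for r in range(len(x))]


def run_pipeline(u, A, B, h, iterations=50):
    # Each loop pass stands in for one dedicated processor stage in hardware.
    x = [[0.0] * len(u[0]) for _ in range(len(u))]
    for _ in range(iterations):
        x = euler_step(x, u, A, B, h)
    return x
```

In hardware the 50 stages would operate concurrently on successive pixels or frames, whereas this sketch simply executes them one after another.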
Stand 7
Cellular Processor Array Based UAV Safety System
Ákos Zarándy, Tamás Zsedrovits, András Kiss, Péter Szolgay, Tamás Roska
Abstract—An embedded sensor-processor system is being developed for on-board UAV
(Unmanned Aerial Vehicle) safety applications. The role of the device is to detect intruder
airplanes which are on or close to a collision course. Due to weight, size, and cost
requirements, only the visual approach leads to a feasible solution. In our design, 5 cameras
are used to collect visual data from a large field of view. The image flows are processed
by 3 different virtual cellular processor arrays, which are implemented in an FPGA.
Non-Boolean Architectures –
Computing by Physics via Device Arrays
[SS 1]
Chair: Wolfgang Porod and Tamas Roska
Time: Thursday 30, August - 14:00-16:00
Room A
________________________________________________________________________
14:00-14:20 Spin Torque Oscillator Models for Applications in Associative Memories
Gyorgy Csaba, Matt Pufall, Dmitri Nikonov, George Bourianoff, Andras Horvath,
Tamas Roska, Wolfgang Porod
Abstract— We present physics-based models for both individual and coupled spin torque
nano oscillators (STNOs). Such STNOs may serve as building blocks for CNN-like
dynamic computing architectures. We discuss a hierarchy of models, extending from
micromagnetic models, which include the detailed geometry and physics, to compact
models which are based on parameters extracted from the underlying physical description.
These simulations also include coupling between individual STNOs, both via spin waves and
via electrical interconnects. Using this modeling approach we demonstrate frequency
entrainment and phase synchronization between STOs in the array, which enable computing
functions.
14:20-14:40 Synchronization in Cellular Spin Torque Oscillator Arrays
Andras Horvath, Fernando Corinto, Gyorgy Csaba, Wolfgang Porod, Tamas Roska
Abstract—Spin torque nanodevices could provide a platform for computation beyond
Mooreʼs law. The network of spin oscillators can have only local, cellular interconnections
because of the underlying physics: the interaction between the oscillators happens through
the magnetic field. In this paper we describe the dynamics of weakly coupled spin-torque
oscillator networks and how the dynamics of these cellular arrays can be used for problem
solving. We will describe how the phase shift in a synchronized array can be calculated
between the elements and we will also show a simple example how the dynamics of a
cellular array can be used to solve simple tasks.
14:40-15:00 An Associative Memory with Oscillatory CNN Arrays Using Spin Torque
Oscillator Cells and Spin-Wave Interactions Architecture and End-to-End
Simulator
Tamas Roska, Andras Horvath, Attila Stubendek, Fernando Corinto, Gyorgy Csaba,
Wolfgang Porod, Tadashi Shibata, George Bourianoff
Abstract—An Associative Memory is built from three consecutive components: (1) a CMOS
preprocessing unit generating input feature vectors from picture inputs, (2) an AM cluster
generating signature outputs, composed of spintronic oscillator (STO) cells and local
spin-wave interactions, as an oscillatory CNN (O-CNN) array unit, applied several times arranged
in space, and (3) a classification unit (CMOS). The end-to-end design of the preprocessing
unit, the interacting O-CNN arrays, and the classification unit is embedded in a learning and
optimization procedure where the geometric distances between the STOs in the O-CNN
arrays play a crucial role. The O-CNN array has an input vector as a 1D array of oscillator
frequencies, and the synchronized O-CNN array codes the output as the phases of the
output 1D array. The typical O-CNN array has 1-3 rows of STOs. Simplified STO and
interaction macro models are used. A typical example is shown using an End-to-end
Simulator.
15:00-15:20 CMOS Supporting Circuitries for Nano-Oscillator-Based Associative
Memories
Tadashi Shibata, Renyuan Zhang, Steven Levitan, Dmitri Nikonov, George
Bourianoff
Abstract—“Let physics do computing” is a promising approach to new-paradigm computing
in the beyond CMOS era. Building associative memories based on the physics of nano
oscillators, in particular, presents a lot of potential for intelligent information processing. In
this paper, we discuss how CMOS supporting circuitries can interface the fabric of nano
oscillators with the digital computing world. Using CMOS ring oscillators to emulate the
nano-oscillator behavior, we demonstrate by HSPICE simulation how to produce the
associative memory function and use it for image recognition.
15:20-15:40 Non-Boolean Associative Architectures Based on Nano-Oscillators
Steven Levitan, Yan Fang, Denver Dash, Tadashi Shibata, Dmitri Nikonov, George
Bourianoff
Abstract— Many of the proposed and emerging nano-scale technologies simply cannot
compete with CMOS in terms of energy efficiency for performing Boolean operations.
However, the potential for these technologies to perform useful non-Boolean computations
remains an opportunity to be explored. In this talk we examine the use of the resonance of
coupled nanoscale oscillators as a primitive computational operator for associative
processing and develop the architectural structures that could enable such devices to be
integrated into mainstream applications.
15:40-16:00 Boolean and Non-Boolean Nearest Neighbor Architectures for Out-of-Plane
Nanomagnet Logic
Xueming Ju, Michael Niemier, György Csaba, Aaron Dingler, Xiaobo Sharon Hu,
Wolfgang Porod, Markus Becherer, Doris Schmitt-Landsiedel, Paolo
Lugli
Abstract—We present the design and simulation of information processing hardware that is
comprised of single domain, Co/Pt magnets (i.e., out-of-plane nanomagnet logic – or oNML).
We first describe the design and evaluation of oNML hardware that can identify instances of
a preprogrammed bit sequence in streaming data. Systolic arrays (that process information
using Boolean logic gates) are employed as a system-level architecture which can (i)
mitigate less desirable features of the oNML device architecture (nearest neighbor dataflow
and longer device switching times when compared to a CMOS transistor), and (ii) exploit
unique features of the device architecture (non-volatility and inherently pipelined logic with
no overhead). We conclude the paper with a discussion as to how oNML might be employed
for non-Boolean information processing. A simple image processing function is used as an
initial case study.
Problems and Solutions on Hybrid Kilo/
Mega Core Architectures [SS 2]
Chair: Péter Szolgay
Time: Thursday 30, August - 14:00-16:00
Room B
________________________________________________________________________
14:00-14:20 Memory Access Optimization for Computations on Unstructured Meshes
Antal Hiba, Zoltan Nagy, Miklos Ruszinko
Abstract—Many real-life applications of processor-arrays suffer from memory bandwidth
limitations. In many cases an unstructured mesh is given (computation on sensor data,
simulations of physical systems - PDEs), where the vertices represent computations with
dependencies represented by the edges. Utilization of processing elements (PEs) during
these computations depends mainly on the node indexing of the mesh. If adjacent
nodes are stored close to each other in main memory, the reloading of node data can be
significantly decreased. In the case of an FPGA, the memory accesses can be fully determined
by the designer. The mesh and an ordering of its nodes define the graph bandwidth, which
determines the minimum size of on-chip memory to avoid reloading of the nodes from the
off-chip memory. If the required on-chip memory size is higher than the available resources,
the mesh must be divided into parts. In this paper a novel geometry based method is
presented, which constructs reordered parts from a given unstructured mesh, where each
part meets some predefined constraints on graph bandwidth.
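The graph bandwidth mentioned above follows directly from its definition: for a given node ordering, it is the largest distance in storage order between any two nodes joined by an edge. The sketch below computes it; the function name and the edge-list representation are illustrative choices, and the paper's geometry-based reordering method itself is not reproduced.

```python
def graph_bandwidth(edges, order):
    """Bandwidth of a mesh graph under a given node ordering.

    edges: iterable of (u, v) node pairs; order: list of nodes in storage order.
    The bandwidth is the largest index distance between two adjacent nodes.
    """
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)
```

For a simple path a-b-c-d, storing the nodes in path order gives a bandwidth of 1, while a poor ordering such as a, c, d, b raises it to 3, which is why the ordering determines how much on-chip memory is needed.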
14:20-14:40 Examining the Accuracy and the Precision of PDEs for FPGA Computations
András Kiss, Zoltán Nagy, Árpád Csík, Péter Szolgay
Abstract—A large number of problems can be accelerated by using
architectures on Field Programmable Gate Arrays (FPGAs). However, sometimes the
complexity of a problem does not allow it to be mapped onto a specific FPGA. In that case,
analyzing the precision of the arithmetic unit that solves the computational problem can be a
good way to fit the architecture and accelerate its computation. A numerical algorithm can be
implemented using fixed-point or floating-point arithmetic (or a mix of both) with different
precision. The aim of the article is not to optimize the numerical algorithm but to find a
smaller arithmetic unit precision that yields sufficient accuracy and fits smaller FPGAs.
In the paper, one particular problem type is investigated, namely the accuracy of the solution
of a simple Partial Differential Equation (PDE). The accuracy measurement is done on an
FPGA with different bit widths. The solution of the advection equation is analyzed
using first- and second-order discretization methods. As a result, we managed to find an
optimal bit width for the solution on a specific FPGA.
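The trade-off between bit width and accuracy can be explored in software before committing to hardware. The sketch below applies a first-order upwind scheme to the advection equation and rounds every result to a chosen number of fractional bits, then compares against a double-precision run. The grid size, Courant number, initial pulse and function names are illustrative assumptions and do not correspond to the authors' test setup.

```python
def quantize(value, frac_bits):
    # Round to the nearest multiple of 2**-frac_bits (fixed-point grid).
    scale = 1 << frac_bits
    return round(value * scale) / scale


def upwind_step(u, c, frac_bits=None):
    """One first-order upwind step of u_t + a*u_x = 0 on a periodic grid.

    c is the Courant number a*dt/dx (0 < c <= 1 for stability).
    If frac_bits is given, every result is rounded to that many fractional bits.
    """
    new = [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]
    if frac_bits is not None:
        new = [quantize(v, frac_bits) for v in new]
    return new


def max_error(frac_bits, steps=100, n=64, c=0.5):
    # Compare a fixed-point run against a double-precision reference run.
    ref = [1.0 if n // 4 <= i < n // 2 else 0.0 for i in range(n)]
    fix = list(ref)
    for _ in range(steps):
        ref = upwind_step(ref, c)
        fix = upwind_step(fix, c, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, fix))
```

Calling `max_error` for a range of `frac_bits` values gives a simple picture of how many fractional bits are needed before rounding error drops below the discretization error one is willing to accept.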
14:40-15:00 Automatic Generation of Locally Controlled Arithmetic Unit via Floorplan
Based Partitioning
Csaba Nemes, Zoltán Nagy, Péter Szolgay
Abstract—In the paper a framework for generating a locally controlled arithmetic unit is
presented including graph generation from a mathematical expression, graph partitioning to
determine locally controlled parts of the design and VHDL generation. The output of the
framework is a pipelined architecture containing locally controlled groups of floating point
units. It is demonstrated that both partitioning and placement aspects of the design have to
be considered to obtain a high-speed circuit. In a well-placeable design locally controlled
groups can be mapped to FPGA in such a way that only neighboring groups communicate
with each other. In the presented algorithm an initial floorplan of the floating point units is
produced and a novel graph partitioning representation is used for partitioning the floating
point units to obtain a well-placeable design. The framework is demonstrated during the
automatic circuit generation of a complex mathematical expression related to Computational
Fluid Dynamics (CFD). The framework produces a 15-27% faster design than the
unpartitioned, globally controlled one at the price of a modest area increase. The framework
automatically produces well-placeable deadlock-free partitions for complex expressions as
well, while in case of traditional partitioners these objectives cannot be targeted.
15:00-15:20 Analysis of a GPU Based CNN Implementation
Endre László, Péter Szolgay, Zoltán Nagy
Abstract—The CNN (Cellular Neural Network) is a powerful image processing architecture
whose hardware implementation is extremely fast. The lack of such hardware device in a
development process can be substituted by using an efficient simulator implementation.
Commercially available graphics cards with high computing capabilities make this simulator
feasible. The aim of this work is to present a GPU based implementation of a CNN simulator
using nVidiaʼs Fermi architecture. Different implementation approaches are considered and
compared to a multi-core, multi-threaded CPU and some earlier GPU implementations. A
detailed analysis of the introduced GPU implementation is presented.
15:20-15:40 Investigation of Area and Speed Trade-Offs in FPGA Implementation of an
Image Correlation Algorithm
Zoltán Kincses, Zsolt Vörösházi, Zoltán Nagy, Péter Szolgay, Tepelea Laviniu,
Alexandru Gacsádi
Abstract—In this paper an image correlation algorithm is implemented on FPGA architecture
for assisted movements of visually impaired persons or automotive driving systems. Taking
into account the limitations of FPGA devices and the special requirements of the correlation
based image matching algorithm a semi-parallel approach is proposed. This provides an
optimal tradeoff between area and speed of the implemented algorithm. Several key issues
are investigated and discussed related to the speed and area.
15:40-16:00 Sound Propagation Cellular Processors Architectures, Comparisons and
Performances
Radu Dogaru, Ioana Dogaru, Narcis Zamfir, Dorel Aiordachioaie
Abstract—The aim of this paper is to discuss and compare several architectural possibilities
for implementing a simulator for (ultra) sound propagation in a controlled environment (e.g.
using specified obstacles and signal sources). Although initially such sound propagation
simulators were designed to assist the design of robotic "ears" of autonomous agents trying
to reconstruct an image of the environment, their use expands beyond these initial goals. We
are particularly interested here in defining the limits and the constraints for kilo-processor
architectures capable of implementing such systems at reasonable cost. Our results for
various implementations (software, FPGA, GPU with CUDA) are considered, together with
some proposals for suitable kilo-processor architectures.
GPUs and Multicore Systems in High
Energy Physics [SS 3]
Chair: Niko Neufeld and Xavier Vilasis Cardona
Time: Thursday 30, August - 16:20-17:20
Room A
________________________________________________________________________
16:20-16:40 Many-Core Processors and GPU Opportunities in Particle Detectors
Niko Neufeld, Xavier Vilasis-Cardona
Abstract—High energy physics particle detectors are large and complex devices with very
demanding requirements at the level of signal to noise ratios, processing times and data
throughput. The first stages of the data acquisition are hardware based while the last ones
depend rather on software. Among the solutions to the problems posed by these requirements
are multi-core processors and possibly GPUs. We review the
points at which these techniques could be of use, and the actual proposals.
16:40-17:00 Real-Time Use of GPUs in NA62 Experiment
Gianmaria Collazuol, Vincenzo Innocente, Gianluca Lamanna, Felice Pantaleo,
Marco Sozzi
Abstract—We describe a pilot project for the use of GPUs in a real-time triggering
application in the early trigger stages at the CERN NA62 experiment, and the results of the
first field tests together with a prototype data acquisition (DAQ) system. This pilot project
within NA62 aims at integrating GPUs into the central L0 trigger processor, and also at using
them as fast online processors for computing trigger primitives. Several TDC-equipped
sub-detectors with sub-nanosecond time resolution will participate in the first-level NA62
trigger (L0), fully integrated with the data-acquisition system, to reduce the readout rate of all
sub-detectors to 1 MHz, using multiplicity information asynchronously computed over time
frames of a few ns, both for positive sub-detectors and for vetoes. The online use of GPUs
would allow the computation of more complex trigger primitives already at this first trigger
level. We describe the architectures of the proposed systems, focusing on measuring the
performance (both throughput and latency) of various approaches meant to solve these high
energy physics problems. The challenges and the prospects of this promising idea are
discussed.
17:00-17:20 ALICE TPC Online Tracker on GPU for Heavy-Ion Events
David Rohr
Abstract—The online event reconstruction for the ALICE experiment at CERN requires
processing capabilities to process central Pb-Pb collisions at a rate of more than 200 Hz,
corresponding to an input data rate of about 25 GB/s. The reconstruction of particle
trajectories in the Time Projection Chamber (TPC) is the most compute intensive step. The
TPC online tracker implementation combines the principle of the cellular automaton and the
Kalman filter. It has been accelerated using graphics cards (GPUs). Pipelined
processing allows the tracking on the GPU, the data transfer, and the
preprocessing on the CPU to be performed in parallel. In order to exploit data locality, the
tracking is split into
multiple phases. At first, track segments are searched in local sectors of the detector,
independently and in parallel. These segments are then merged at a global level. A
shortcoming of this approach is that if a track contains only a very short segment in one
particular sector, the local search may not find this short part. The fast GPU
processing made it possible to add an additional step: all found tracks are extrapolated to
neighboring sectors and the unassigned clusters which constitute the missing track segment
are collected. For running QA, it is important that the output of the CPU and the GPU tracker
is as consistent as possible. One major challenge was to implement the tracker such that the
output is not affected by concurrency, while maintaining peak performance and efficiency.
For instance, a naive implementation depended on the order of the tracks which is
nondeterministic when they are created in parallel. Still, due to non-associative floating point
arithmetic a direct binary comparison of the CPU and the GPU tracker output is impossible.
Thus, the approach chosen for evaluating the GPU tracker efficiency is to compare the
cluster to track assignment of the CPU and the GPU tracker cluster by cluster. With the
above comparison scheme, the output of the CPU and the GPU tracker differ by
0.00024. Compared to the offline tracker, the HLT tracker is orders of magnitude faster while
delivering good results. The GPU version outperforms its CPU analog by another factor of
three. Recently, the ALICE HLT cluster was upgraded with new GPUs and is able to process
central heavy ion events at a rate of approximately 200 Hz.
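The cluster-by-cluster comparison described above can be illustrated with a small helper that reports the fraction of clusters assigned differently by two trackers. It assumes that track identifiers have already been made comparable between the two outputs, which is a simplification; the function name and the dictionary representation are hypothetical and not part of the ALICE software.

```python
def assignment_mismatch(cpu_assign, gpu_assign):
    """Fraction of clusters whose track assignment differs between two trackers.

    Each argument maps a cluster id to a track id (or None if unassigned).
    """
    clusters = set(cpu_assign) | set(gpu_assign)
    if not clusters:
        return 0.0
    differing = sum(1 for c in clusters
                    if cpu_assign.get(c) != gpu_assign.get(c))
    return differing / len(clusters)


# Floating-point addition is not associative, so summation order matters
# and a bit-for-bit comparison of track parameters is not meaningful.
order_matters = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

Comparing assignments rather than floating-point track parameters sidesteps the rounding differences that arise when the same sums are evaluated in a different order on CPU and GPU.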
Silicon Implementation [SS 4]
Chair: Peter Foldesy
Time: Thursday 30, August - 16:20-17:20
Room B
________________________________________________________________________
16:20-16:40 On Challenges for Implementing Pixelwise DA Converter in 3D
Ari Paasio, Henri Ansio
Abstract—Vision chips are natural candidates for being among the first areas that are able to
utilize the emerging 3D integration possibilities. In some 2D vision chip architectures there
are pixel level AD and/or DA converters that are used for various purposes. This article
covers the challenges and needs when targeting a megapixel architecture within a 1 cm²
chip area. The Through-Silicon-Vias (TSVs) on one hand allow the 3D integration, but on the
other hand pose strict challenges for the design. The TSVs occupy a certain area, and in an
area-restricted design the number of TSVs should be minimized. The associated
Keep-Out-Zone (KOZ) for each TSV should also be taken into account.
16:40-17:00 A Compact FPGA Implementation of a Bit-Serial SIMD Cellular Processor
Array
Declan Walsh, Piotr Dudek
Abstract— An FPGA implementation of a fine grain general purpose SIMD processor array
is presented. The processor architecture has a compact processing element which is
encapsulated into two configurable logic blocks (CLBs) and is then replicated to form an
array. A 32 × 32 processing element array is implemented on a low-cost Xilinx XC5VLX50
FPGA using four-neighbour connectivity with the possibility to scale up using a larger FPGA.
The processor array operates at a frequency of 150 MHz and executes a peak of 153.6
GOPS (bit-serial operations). Binary and 8-bit greyscale image processing is performed and
demonstrated.
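The quoted peak figure is consistent with each processing element completing one bit-serial operation per clock cycle. That one-operation-per-cycle rate is an assumption made here to reproduce the arithmetic, and the function name is illustrative.

```python
def peak_ops_per_second(rows, cols, clock_hz, ops_per_pe_per_cycle=1):
    # Peak SIMD throughput: every processing element completes its operations each cycle.
    return rows * cols * clock_hz * ops_per_pe_per_cycle


gops = peak_ops_per_second(32, 32, 150e6) / 1e9   # 32 x 32 array at 150 MHz
```

With 1024 processing elements at 150 MHz this evaluates to 153.6 GOPS, matching the abstract.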
17:00-17:20 Integrated CMOS Sub-THz Imager Array
Péter Földesy, Ákos Zarándy
Abstract— This paper describes a 90 nm CMOS sub-THz detector array ASIC. The
sub-THz detector array is an integrated system composed of silicon field effect plasma wave
sensors, various integrated antennas, pre-amplifiers, ADCs, and a digital domain lock-in
amplifier detector. The peak responsivity is found to be 185 kV/W at 365 GHz and 52 kV/W at
470 GHz, and at the detectivity maximum NEP ~ 20 pW/√Hz.
Applications on FPGAs & GPUs [RS 3]
Chair: Mustak Yalcin
Time: Friday 31, August - 14:00-15:40
Room A
________________________________________________________________________
14:00-14:20 Implementing Dynamic Reconfigurable CNN-Based Full-Adder
Yanyi Liu, Wenbo Liu, Xiaozheng Yuan, Guanrong Chen
Abstract—This paper presents a new approach to implementing dynamically reconfigurable
logical systems based on Cellular Neural Networks (CNN). Compared with chaos
computing systems, it is easier to implement in engineering applications and more
stable. We present and experimentally demonstrate the basic principle for obtaining a
full-adder using uncoupled CNN cells. The actual circuit implementing the full-adder, and
its transformation from adder to subtractor, is also presented.
14:20-14:40 Cesar: Emulating Cellular Networks on FPGA
Jens Müller, Ralf Becker, Jan Müller, Ronald Tetzlaff
Abstract—Complex dynamical systems offer entirely new possibilities for the
development of groundbreaking data processing methods. In the domains of image and
video processing, locally coupled cellular array computers, based on Cellular Nonlinear
Networks (CNN), accelerate the computation of large amounts of data in real-time, due to
their inherent concept of massive parallelism. Current VLSI implementations, however, are
accompanied by several distinct drawbacks. The computational accuracy of most currently
available systems is limited to 8 bit, and the volatile, capacitively stored state values of
analogue realisations often lead to errors when multiple tasks are processed sequentially.
Moreover, these systems hardly allow a CNN program to be run in a way that provides the full
functionality of a CNN-UM. In this contribution, the novel CESAR architecture is proposed
for the digital emulation of a time-discrete CNN-UM. The programmable array computer
facilitates the powerful computation of consecutive CNN operations and the cost-efficient
implementation of several application-specific configurations with variable network size and
data representation. The presented architecture retains the inherent parallel paradigm of
CNN, and assigns one processing element to each cell of the network. The cell outputs are
coupled and stored locally, thus minimising data exchange with external structures and
maximising the computation speed. The internal fixed-point multiplications are accelerated
by using on-chip DSP resources provided by current FPGAs. By this means, a CNN-based
embedded system with 128 cells, a 3 × 3 neighbourhood and 18 bit data representation was
implemented on a Xilinx Virtex-5 FPGA.
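A digital, time-discrete CNN update with fixed-point arithmetic can be sketched as below. This is only an illustration of the general technique, not the CESAR design: the 18-bit word length is taken from the abstract, while the split into 12 fractional bits, the zero boundary condition, the piecewise-linear output and every function name are assumptions.

```python
WORD_BITS = 18                       # word length quoted in the abstract
FRAC_BITS = 12                       # assumed number of fractional bits
ONE = 1 << FRAC_BITS                 # fixed-point representation of 1.0
MAX_VAL = (1 << (WORD_BITS - 1)) - 1
MIN_VAL = -(1 << (WORD_BITS - 1))


def to_fixed(x):
    """Convert a float to the assumed fixed-point format."""
    return int(round(x * ONE))


def clamp_word(v):
    """Saturate a value to the signed 18-bit range."""
    return max(MIN_VAL, min(MAX_VAL, v))


def output(x):
    """Piecewise-linear cell output, saturating at +/-1.0."""
    return max(-ONE, min(ONE, x))


def dtcnn_step(state, inp, A, B, z):
    """One synchronous update of all cells with 3x3 templates A and B.

    state, inp: 2-D lists of fixed-point integers; z: fixed-point bias.
    Cells outside the array are treated as zero (assumed boundary condition).
    """
    rows, cols = len(state), len(state[0])
    y = [[output(v) for v in row] for row in state]
    new_state = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = z * ONE            # bias scaled to the product format
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += A[di + 1][dj + 1] * y[ii][jj]
                        acc += B[di + 1][dj + 1] * inp[ii][jj]
            # Drop the extra fractional bits, then saturate to the word length.
            new_state[i][j] = clamp_word(acc >> FRAC_BITS)
    return new_state
```

Each multiply-accumulate in the inner loop corresponds to the kind of fixed-point product that an FPGA would map onto its DSP resources; in hardware the loops over cells would run in parallel rather than sequentially.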
14:40-15:00 Implementing Time-Derivative CNNs on a Xilinx Spartan FPGA
Jordi Albo-Canals, Giovanni Pazienza
Abstract—Time-Derivative CNNs (TDCNNs) have been recently proposed as a novel
paradigm realizing spatiotemporal transfer functions for linear filtering. Their dynamics is
usually simulated with SIMULINK because VLSI chips are still in the preliminary phase. In
order to make TDCNNs available to a larger audience, we present here their implementation
on a Xilinx Spartan-6 FPGA. The results concerning an 8×8 network are promising and
consistent with the software simulations.
15:00-15:20 Nonlinear Spatio-Temporal Wave Computing for Real-Time Applications on
GPU
Mehmet Tükel, Ramazan Yeniçeri, Mustak Yalcin
Abstract—In this work, an active wave simulation on a Cellular Nonlinear Network was
computed for path planning on the GPU of an NVIDIA GTX275 video card. On the software
side, QtOpenCL, a wrapper library for OpenCL, was used to make the code portable across
systems with different GPUs. We achieved promising results compared with those obtained
on both CPU and FPGA. We have implemented different hardware and software solutions to
the path planning problem for 2-D media in real time. These were close to the limit of
real-time requirements because of bottlenecks such as low communication bandwidth and low
network resolution. In this work, by utilizing GPUs, we performed 60000 iterations per
second for the simulation of a 128×128 node network, whereas we achieved at most 35
iterations per second with software on an Intel Core 2 Duo P8700 processor. We also achieved
36 iterations per second for a 3-D active wave simulation of a 256×256×256 network on the GPU.
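The quoted iteration rates can be put on a common footing by converting them to node updates per second. The short calculation below uses only the figures from the abstract; the helper name is hypothetical, and the speed-up assumes that the CPU figure of 35 iterations per second refers to the same 128×128 network.

```python
def node_updates_per_second(shape, iterations_per_second):
    """Number of node updates per second for a network of the given shape."""
    nodes = 1
    for dim in shape:
        nodes *= dim
    return nodes * iterations_per_second


gpu_2d = node_updates_per_second((128, 128), 60000)      # 2-D case on the GPU
gpu_3d = node_updates_per_second((256, 256, 256), 36)    # 3-D case on the GPU
speedup_2d = 60000 / 35                                  # GPU vs CPU, assumed same network

if __name__ == "__main__":
    print("2-D GPU node updates per second:", gpu_2d)
    print("3-D GPU node updates per second:", gpu_3d)
    print("2-D GPU/CPU iteration-rate ratio:", round(speedup_2d))
```

On these numbers the 2-D and 3-D GPU runs both land in the range of several hundred million to about one billion node updates per second, and the iteration-rate ratio against the CPU is roughly three orders of magnitude.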
15:20-15:40 Visual Learning with Cellular Neural Networks
Alexey Badalov, Xavier Vilasís-Cardona, Jordi Albo-Canals
Abstract—Reinforcement learning is a powerful tool for teaching robotic agents to perform
tasks in real environments. Visual information provided by a camera could be a cheap and
rich source of information about an agentʼs surroundings, if this information were
represented in a compact and generalizable form. We turn to cellular neural networks as the
means of transforming visual input to a representation suitable for reinforcement learning.
We investigate a CNN-based image processing algorithm and describe a method for
efficiently computing CNNs using the DirectX 10 API.
Visual Navigation and Collision Avoidance [SS 5]
Chair: Ákos Zarándy
Time: Friday 31, August - 14:00-15:40
Room B
________________________________________________________________________
14:00-14:20 A New CNN Based Path Planning Algorithm Improved by the Doppler Effect
Ramazan Yeniceri, Mustak Erhan Yalcin
Abstract—Many path planning and navigation papers using Cellular Neural/Nonlinear
Networks (CNN) are found in the literature. A high proportion of these works originate from
the wave processing feature of CNNs. This paper proposes a special condition of a known
Cellular Nonlinear Network model which makes the network well suited to generating nested
and repetitive travelling waves. The Doppler effect appears as a corollary of this special
condition. The main contribution of the Doppler effect to path planning applications that
use CNNs is the opportunity to adjust the trackerʼs speed, or to change the route
completely, depending on the targetʼs motion. In this way, the paper adds a new
capability to CNN-based wave computing techniques by putting the wave sourceʼs
motion to use.
14:20-14:40 Azimuth Estimation of Distant, Approaching Airplane in See-and-Avoid
Systems
Tamas Zsedrovits, Akos Zarandy, Balint Vanek, Tamas Peni, Jozsef Bokor, Tamas
Roska
Abstract—The visual-detection-based sense-and-avoid problem is increasingly important
as UAVs, whether remotely piloted or autonomous, come closer to entering the
airspace. It is critical to gain as much information as possible from the silhouettes of
distant aircraft. In our paper, we investigate the achievable accuracy of the orientation
information of remote planes under different geometrical conditions, by identifying their
wing lines from their detected wingtips. Under the assumption that the remote airplane is on
a straight course, the error due to spatial discretization (pixelization) and the automatic
detection error are calculated.
14:40-15:00 Visual Sense-and-Avoid System for UAVs
Akos Zarandy, Tamas Zsedrovits, Zoltan Nagy, Andras Kiss, Tamas Roska
Abstract—A visual sense-and-avoid system is introduced in this paper. The system is
designed to operate on small and medium sized UAVs, and to be able to detect and avoid
small manned and unmanned aircraft. The intruder detection is done on a 4650×1280
video stream, which is processed in real time by a many-core cellular processor array.
15:00-15:20 Bio-Inspired Looming Direction Detection Method
Tamás Fülöp, Ákos Zarándy
Abstract—The retina-inspired approaching-object detection algorithm – based on the
recently identified Pvlab-5 ganglion cell – is a computationally simple, segmentation-free
method. The original method can detect only dark looming objects against a bright
background. This paper shows a modified algorithm, which can detect any looming and
receding objects against a dark or bright background. Moreover, we show a post-processing
evaluation method, which can measure the lateral motion direction using the
spatial-temporal activities of the ganglion cells without introducing any heavy computation.
15:20-15:40 On the Potential of Current CNN Cameras for Industrial Surface Inspection
Andreas Blug, Peter Strohm, Daniel Carl, Heinrich Höfler, Bernhard Blug, Andreas
Kailer
Abstract— An important issue in industrial quality control is the inspection of rapidly moving
surfaces for small defects such as scratches, dents, grooves, or chatter marks. This paper
investigates the potential of the EyeRIS 1.3 camera as a state-of-the-art camera based on
“cellular neural networks” (CNN) for this application in comparison to conventional image
processing systems. Based on experimental data from an aluminum wire drawing process
where defects with a lateral size of 100 μm have to be detected at feeding rates of 10 m/s,
the potential specifications for other surface inspection applications are estimated. Using the
relation between the lateral defect size and the feeding rate as a figure of merit, the
CNN-based system outperforms conventional image processing systems by an order of
magnitude in this particular application. In general, the lighting system limits the
performance at lower defect sizes and the computational power at larger defect sizes and
fields of view.
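One way to read the relation between defect size and feeding rate is as a timing budget: a defect of lateral size d moving at speed v passes a fixed point in d/v seconds. The sketch below works this out for the quoted 100 μm and 10 m/s; the function names and the samples_per_defect parameter are hypothetical and are not taken from the paper.

```python
def dwell_time_s(defect_size_m, feed_rate_m_s):
    """Time a defect of the given lateral size needs to pass a fixed point."""
    return defect_size_m / feed_rate_m_s


def min_sampling_rate_hz(defect_size_m, feed_rate_m_s, samples_per_defect=1):
    """Sampling rate needed to observe each defect samples_per_defect times.

    samples_per_defect is an assumed design parameter for illustration only.
    """
    return samples_per_defect / dwell_time_s(defect_size_m, feed_rate_m_s)


if __name__ == "__main__":
    size_m, speed_m_s = 100e-6, 10.0   # figures quoted in the abstract
    print("dwell time [s]:", dwell_time_s(size_m, speed_m_s))
    print("minimum sampling rate [Hz]:", min_sampling_rate_hz(size_m, speed_m_s))
```

For the quoted values the dwell time is about 10 μs, so a system that samples each defect-sized stretch of surface at least once needs a rate on the order of 100 kHz along the feed direction.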
Theoretical Advances of CNNs [RS 4]
Chair: Mauro Forti
Time: Friday 31, August - 15:40-17:20
Room A
________________________________________________________________________
15:40-16:00 Monotonicity of Semiflows Generated by Cooperative Delayed Full-Range
CNNs
Mauro Di Marco, Mauro Forti, Massimo Grazzini, Luca Pancioni
Abstract—The paper considers the full-range (FR) model of cellular neural networks (CNNs)
with ideal hard-limiter nonlinearities that limit the allowable range of the neuron state
variables. It is also supposed that there is a concentrated delay (D) in the neuron
interconnections. Due to the presence of multivalued nonlinearities the D-FRCNN model is
mathematically described by a retarded differential inclusion. The main result is a rigorous
proof that, in the case of nonsymmetric cooperative (nonnegative) interconnections, and
delayed interconnections, the semiflow generated by D-FRCNNs is monotone, and that
monotonicity implies some basic restrictions on the long-term behavior of the solutions. The
result is compared with recent results in the literature on semiflows generated by
cooperative standard CNNs, with and without delays.
16:00-16:20 An Experimental Study on Long Transient Oscillations in Cooperative CNN
Rings
Mauro Forti, Barnabas Garay, Miklos Koller, Luca Pancioni
Abstract—The paper considers a class of one-dimensional circular standard cellular neural
network (CNN) arrays with a typical three-segment piecewise linear activation and two-sided
cooperative (positive) interactions (a cooperative CNN ring). Numerical simulations show
that in a wide range of interconnection parameters, and for a wide set of initial conditions,
the solutions of a cooperative CNN ring display unexpectedly long oscillations, lasting even
hundreds of cycles, before they eventually converge toward an equilibrium point. The goal of
this paper is to confirm the presence of such long-transient oscillations through laboratory
experiments on a simple discrete-component prototype of a cooperative CNN ring with 16
cells and to analyze some of their salient features. Analytical results are also provided
to support the numerical and experimental findings.
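A cooperative CNN ring of this kind can be explored numerically with a few lines of code. The sketch below uses the standard CNN cell equation with the three-segment piecewise-linear output and a forward-Euler integrator; the presence of a self-feedback term, the parameter values, the step size and the function names are assumptions, not the settings used in the paper.

```python
import math


def sat(x):
    """Standard three-segment piecewise-linear CNN activation."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def ring_step(x, alpha, a, beta, dt):
    """One forward-Euler step of dx_i/dt = -x_i + alpha*y_{i-1} + a*y_i + beta*y_{i+1}.

    Indices wrap around, so the cells form a ring. alpha and beta are the
    (positive) left and right coupling weights, a is the self-feedback.
    """
    n = len(x)
    y = [sat(v) for v in x]
    return [
        x[i] + dt * (-x[i] + alpha * y[(i - 1) % n] + a * y[i] + beta * y[(i + 1) % n])
        for i in range(n)
    ]


def simulate(x0, alpha, a, beta, dt=0.01, steps=10000):
    """Integrate the ring from x0 and return the trajectory of cell 0."""
    x = list(x0)
    trace = [x[0]]
    for _ in range(steps):
        x = ring_step(x, alpha, a, beta, dt)
        trace.append(x[0])
    return x, trace


if __name__ == "__main__":
    n_cells = 16   # ring size quoted in the abstract
    x0 = [0.1 * math.sin(2.0 * math.pi * i / n_cells) for i in range(n_cells)]
    final, trace = simulate(x0, alpha=0.2, a=1.5, beta=0.9)  # assumed parameters
    print("final outputs:", [round(sat(v), 3) for v in final])
```

Plotting the returned trace for different coupling weights and initial conditions is a simple way to look for oscillations that persist for many cycles before the state settles.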
16:20-16:40 Image Representation by Means of CNN Dynamics
Tang Tang, Ronald Tetzlaff
Abstract—Because of their nonlinear dynamics, Cellular Nonlinear Networks (CNN) are
considered to be powerful tools for many image processing applications. In this
paper we investigate the feasibility of image representation using CNN
dynamics.
16:40-17:00 Phase Model Reduction for Oscillatory Networks Subject to Stochastic Inputs
Michele Bonnin, Fernando Corinto, Valentina Lanza
Abstract—Oscillatory networks represent a circuit architecture for image and information
processing that can be used to realize associative and dynamic memories. Phase noise is
often a key limiting factor for the performance of oscillatory networks. Phase models are
the ideal framework for investigating phase noise effects in nonlinear oscillators. Classical
phase models lead to the conclusion that, in the presence of random disturbances such as
white noise, the phase noise problem is simply a diffusion process. In this paper we develop a reduced
order model for phase noise analysis in nonlinear oscillators. We derive a reduced Fokker–
Planck equation for the phase variable and the corresponding reduced phase equations. We
show that the phase noise problem is a convection–diffusion process, proving that white
noise produces both phase diffusion and frequency shift.
17:00-17:20 Two Neuron CNN for Hypothesis Testing
Mireia Vinyoles-Serra, Xavier Vilasis-Cardona
Abstract—The two-neuron continuous-time cellular neural network is used to define a statistic
in the classical hypothesis testing problem. The proposal is based on a generalisation of the
linear Fisher discriminant. The procedure to set the cellular neural network parameters is
described and the performance is shown on two examples with Gaussian-distributed
hypotheses. This technique might also be applied to probabilistic classification problems or
pattern recognition.
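For orientation, the classical linear Fisher discriminant that the proposal generalises can be sketched for two-dimensional data as follows. This is the textbook linear discriminant only, not the two-neuron CNN statistic; the function names and the toy data in the usage example are assumptions.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = float(len(points))
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """2x2 scatter matrix of the points around the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return ((sxx, sxy), (sxy, syy))


def fisher_direction(class0, class1):
    """Direction w = Sw^-1 (m1 - m0) of the linear Fisher discriminant."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    a = s0[0][0] + s1[0][0]
    b = s0[0][1] + s1[0][1]
    d = s0[1][1] + s1[1][1]
    det = a * d - b * b
    if det == 0.0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)


def statistic(w, x):
    """Test statistic: projection of the observation x onto w."""
    return w[0] * x[0] + w[1] * x[1]


if __name__ == "__main__":
    h0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # toy samples
    h1 = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
    w = fisher_direction(h0, h1)
    print("direction:", w, "statistic at (2, 2):", statistic(w, (2.0, 2.0)))
```

A hypothesis test then compares the statistic with a threshold chosen for the desired error rates.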
Volumetric Imaging Using Numerical Optical Sensing and Imaging Techniques [SS 6]
Chair: Szabolcs Tokes
Time: Friday 31, August - 15:40-17:00
Room B
________________________________________________________________________
15:40-16:00 Advanced Background Elimination in Digital Holographic Microscopy
Laszlo Orzo, Andras Feher, Szabolcs Tokes
Abstract—Background estimation and elimination is an indispensable step of hologram
processing. It ensures that the fixed-pattern noise caused by deposits, dirt and
other impurities of the measuring chamber and the optical system does not contaminate the
reconstructed holograms, and it improves the efficiency of object segmentation. It is
conventionally solved by averaging a large number of holograms with varying objects within
the flow-through cell. Because of possible illumination changes, the background should be
updated continuously during the hologram measuring process. Here we introduce an
improved background estimation method where the holographic contributions of the
segmented and reconstructed objects are excluded from the running average. The applied
segmentation is based on the 3D positions of the objects within the flow-through measuring
chamber. Therefore the objects can be distinguished from the impurities and deposits, which
are customarily located at the walls of the measuring chamber. In this way, a faster,
more adaptive background estimation with reduced noise becomes achievable. The applied
object segmentation and hologram subtraction methods are also presented. To accelerate
the processing of the measured holograms, a parallel computing
implementation seems essential. Using stream processors (GPU) we were able to increase
the algorithm speed considerably, without perceptible loss of reconstruction accuracy.
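The principle of a running-average background from which object contributions are removed can be sketched as below. It is a simplified illustration rather than the authors' method: the object contribution is assumed to be already estimated and is treated as purely additive, and the update rate and function name are hypothetical.

```python
def update_background(background, hologram, object_contribution, rate=0.05):
    """Exponential running average of the hologram with objects subtracted.

    background, hologram, object_contribution: 2-D lists of equal shape.
    object_contribution is the estimated part of the hologram that stems
    from segmented objects (assumed to be given and additive).
    rate is an assumed adaptation rate between 0 and 1.
    """
    updated = []
    for bg_row, h_row, obj_row in zip(background, hologram, object_contribution):
        updated.append([
            (1.0 - rate) * bg + rate * (h - obj)
            for bg, h, obj in zip(bg_row, h_row, obj_row)
        ])
    return updated


if __name__ == "__main__":
    background = [[10.0, 10.0]]
    hologram = [[14.0, 10.0]]          # first pixel is brightened by an object
    objects = [[4.0, 0.0]]             # estimated object contribution
    print(update_background(background, hologram, objects, rate=0.5))
```

Because the object contribution is removed before averaging, a larger adaptation rate can be used without objects leaking into the background estimate, which is what makes the estimate both faster and more adaptive in this simplified picture.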
16:00-16:20 Afocal Digital Holographic Microscopy and its Advantages
Szabolcs Tokes, Laszlo Orzo
Abstract—Applying afocal optical systems in microscopy, especially in digital holographic
microscopy (DHM), has several advantages. We have investigated some possible
implementations both theoretically and experimentally. The space-bandwidth product of an
afocal system can exceed that of conventional ones. Afocal systems provide higher
resolution and much less distortion. Furthermore, the computational cost of the numerical
reconstruction and correction phase is also lower in the case of an afocal optical setup, as it
ensures constant lateral magnification within the whole measured volume. We show that the
advantage of low distortion is especially enhanced in the case of color DHM. A GPU
implementation of the reconstruction software is demonstrated.
16:20-16:40 Study on Application of Reference Conjugated Hologram for Aberration
Correction of Multiple Object Planes
Benedek Nagy, Szabolcs Tokes
Abstract—Aberration correction using the Reference Conjugated Hologram (RCH) method is
investigated. We apply it not to a single reconstructed object plane but to a number of
them in Digital Holographic Microscopy (DHM). We build an off-axis DHM for testing the
performance of the method. The limits of this method have been studied. We compare
in-line DHM with aberration-compensated off-axis DHM. The in-line DHM compensates
physically much the same aberrations as the RCH method compensates numerically.
16:40-17:00 Self-Referenced Digital Holographic Microscopy
Márton Kiss, Zoltán Göröcs, Szabolcs Tőkés
Abstract—By developing a self-referenced digital holographic microscope it becomes
possible to record holograms and numerically reconstruct volumetric images of
low-coherence fluorescent objects such as (auto)fluorescent biological samples (e.g. algae).
Our goal was to develop and construct a simple, compact, portable device. In contrast to the
common holographic approaches, where there is a conventional reference beam, here the
reference beam must be produced together with the object beam from the same
fluorescent source by imaging it through two separate optical paths (with near-zero path
length differences) to obtain interference fringes. This interference forms separate
holograms of all the point sources. The waves coming from the separate sources are mutually
incoherent but have an inherently short coherence length. Initially we tested the
self-referenced digital holographic microscope setup with test objects illuminated by an LED
light source that has a spectral bandwidth similar to that of fluorescence sources such as
chlorophyll. Digital reconstruction of the measured holograms needs considerable processing.
To accelerate the hologram processing, a parallel implementation seems essential. Using
GPUs we were able to increase the algorithmʼs speed considerably, without loss of
reconstruction accuracy.