DEMO SESSION - CNNA 2012
WELCOME MESSAGE

On behalf of the 2012 CNNA Organising Committee, it is our great pleasure to welcome you to Torino for the International Workshop on Cellular Nanoscale Networks and their Applications (August 29th-31st, 2012). CNNA 2012 is the 13th event in the series of IEEE CNNA biennial international workshops, which started in Budapest in 1990. In addition, the 3rd Memristor and Memristive Symposium will be held on August 28th-29th, before the beginning of CNNA 2012. This year, we are delighted to host these two conferences for the first time at the Politecnico di Torino.

The world of computational devices and architectures has witnessed dramatic changes in the last few years with the emergence of many-core processors and memristive systems. Their long-term significance lies in their potential to enable the design of nano CNNs and intelligent machines with learning and adaptive capabilities. Even more fundamental is their nonlinear dynamics, which underpins the biological basis of life itself. CNNA 2012 covers a wide range of topics and technical challenges, in view of the growing interest in mega-processor nanoscale computing. The 3rd Memristor and Memristive Symposium will be a multidisciplinary forum for researchers to grasp the latest advances in the field of memristors and memristive circuits and their latest breakthrough applications.

The total number of submissions was 83. The technical programme includes a rich presentation of the latest technology breakthroughs in CNNs and memristors and is organised into 4 regular sessions, 10 special sessions, and 1 demo session. Four special sessions are devoted to memristor theory, devices and architectures. The programme comprises fourteen plenary lectures, given by distinguished invited speakers, with strong industry involvement – including IBM, Intel, FIAT, STMicroelectronics – and startup companies, mainly focused on technologies beyond CMOS.
In particular, the 3rd Memristor and Memristive Symposium and the CNNA 2012 Workshop will be opened by keynote lectures by Leon O. Chua, Tamas Roska and Daniel Hammerstrom.

We would like to thank all members of the organising and scientific committees for their constant support and valuable work, and all institutions and companies that sponsor both conferences for their generous support, in particular the “Cassa di Risparmio di Torino” (CRT) Foundation, the Chamber of Commerce of Torino and the “Compagnia di San Paolo” Foundation.

We also hope that, in addition to appreciating the technical programme, you will enjoy your stay in Torino and find the time to visit the campus of the Politecnico di Torino and our beautiful city. Torino was the first capital of Italy and, on the occasion of the 150th anniversary of the unification of Italy, which we celebrated last year, many historical buildings were completely restored. Among them is the Valentino Castle, a historical residence of the Royal family, donated to the Politecnico di Torino, where we will have the welcome reception. We hope your stay here will be both rewarding and memorable.

Marco Gilli, General Chair
Fernando Corinto, Technical Program Chair

CONFERENCE VENUE
POLITECNICO DI TORINO
CORSO DUCA DEGLI ABRUZZI, 24
Room A is Aula Magna
Room B is Sala Consiglio di Facoltà

ORGANIZING COMMITTEE

HONORARY CHAIRS
PIER PAOLO CIVALLERI, POLITECNICO DI TORINO, ITALY
LEON O. CHUA, U. OF CALIF., BERKELEY, U.S.A.

GENERAL CHAIRS
MARCO GILLI, POLITECNICO DI TORINO, ITALY
TAMÁS ROSKA, MTA-SZTAKI / PPCU, BUDAPEST, HUNGARY
CHAI WAH WU, IBM T. J. WATSON R. C., NY, U.S.A.

PROGRAM CHAIR
FERNANDO CORINTO, POLITECNICO DI TORINO, ITALY

PROGRAM CO-CHAIRS
GIOVANNI E. PAZIENZA, U. OF MEMPHIS / PPCU, BUDAPEST, HUNGARY
ANGELA SLAVOVA, BULG. A. SCIENCES, SOFIA, BULGARIA
ÁKOS ZARÁNDY, MTA-SZTAKI, BUDAPEST, HUNGARY

SPECIAL SESSION CHAIRS
MARIO BIEY, POLITECNICO DI TORINO, ITALY
VALERI MLADENOV, T. U. OF SOFIA, SOFIA, BULGARIA
PÉTER SZOLGAY, MTA-SZTAKI / PPCU, BUDAPEST, HUNGARY
RONALD TETZLAFF, TUD, DRESDEN, GERMANY

FINANCIAL CHAIR
PAOLA MIRAGLIO, POLITECNICO DI TORINO, ITALY

PUBLICATION CHAIR
GIOVANNI E. PAZIENZA, U. OF MEMPHIS / PPCU, BUDAPEST, HUNGARY

PUBLICITY CHAIR
BERTRAM SHI, HKUST, KOWLOON, HONG KONG

EXHIBIT AND DEMO SESSION CHAIRS
GYÖRGY CSEREY, PPCU, BUDAPEST, HUNGARY
PIOTR DUDEK, U. OF MANCHESTER, U.K.
RICARDO CARMONA GALÁN, CNM-CSIC, SEVILLA, SPAIN

INDUSTRY LIAISON CHAIR
CSABA REKECZKY, EUTECUS INC., BERKELEY, U.S.A.

ASIA-PACIFIC LIAISON CHAIR
CHIN-TENG LIN, N. CHIAO TUNG U., HSINCHU, TAIWAN
HYONGSUK KIM, CHONBUK NATIONAL U., KOREA

SECRETARY, LOGISTICS, AND WEB
MICHELE BONNIN, POLITECNICO DI TORINO, ITALY
ANDRAS HORVATH, PPCU, BUDAPEST, HUNGARY
MARCO BERTINO, POLITECNICO DI TORINO, ITALY

SCIENTIFIC COMMITTEE
PAOLO ARENA (UNIVERSITY OF CATANIA, ITALY)
GUANRONG CHEN (CITY UNIVERSITY OF HONG KONG, HK)
LEON O. CHUA (UC BERKELEY, USA)
FERNANDO CORINTO (POLITECNICO OF TURIN, ITALY)
PIOTR DUDEK (UNIVERSITY OF MANCHESTER, UK)
WAI-CHI FANG (NATIONAL CHIAO TUNG UNIVERSITY, TAIWAN)
LUIGI FORTUNA (UNIVERSITY OF CATANIA, ITALY)
MARCO GILLI (POLITECNICO OF TURIN, ITALY)
EDUARDO GOMEZ-RAMIREZ (UNIVERSIDAD LA SALLE, MEXICO OF, MEXICO)
STEVE KANG (UC SANTA CRUZ, USA)
PEDRO JULIAN (UNIVERSIDAD NACIONAL DEL SUR, BAHIA BLANCA, ARGENTINA)
CHIN-TENG LIN (NAT. CHIAO TUNG UNIVERSITY, HSINCHU, TAIWAN)
JOSEF A. NOSSEK (TECHNICAL UNIVERSITY OF MUNICH)
MACIEJ OGORZALEK (AGH UNIV. OF SCIENCE AND TECH. OF KRAKOW, POLAND)
ARI PAASIO (UNIVERSITY OF TURKU, FINLAND)
GIOVANNI E. PAZIENZA (MTA-SZTAKI, BUDAPEST, HUNGARY)
WOLFGANG POROD (UNIVERSITY OF NOTRE DAME, USA)
CSABA REKECZKY (EUTECUS INC., BERKELEY, USA)
ÁNGEL RODRÍGUEZ-VÁZQUEZ (UNIVERSITY OF SEVILLE, SPAIN)
TAMÁS ROSKA (MTA-SZTAKI & PAZMANY UNIVERSITY, BUDAPEST, HUNGARY)
BING J. SHEU (UNIVERSITY OF SOUTHERN CALIFORNIA, LOS ANGELES, USA)
BERTRAM SHI (HONG KONG UNIVERSITY SCI.
& TECH., HK)
PÉTER SZOLGAY (MTA-SZTAKI, BUDAPEST, HUNGARY)
MAMORU TANAKA (SOPHIA UNIVERSITY, TOKYO, JAPAN)
VEDAT TAVSANOGLU (YILDIZ TECHNICAL UNIVERSITY, ISTANBUL, TURKEY)
RONALD TETZLAFF (TECHNICAL UNIVERSITY DRESDEN, GERMANY)
JOOS VANDEWALLE (CATHOLIC UNIVERSITY OF LEUVEN, BELGIUM)
XAVIER VILASÍS-CARDONA (UNIVERSITAT RAMON LLULL, BARCELONA, SPAIN)
CHAI WAH WU (IBM, USA)
ÁKOS ZARÁNDY (MTA-SZTAKI, BUDAPEST, HUNGARY)

Program at a Glance
CNNA 2012 – 13th International Workshop on Cellular Nanoscale Networks and their Applications

Days: Tuesday, Aug. 28; Wednesday, Aug. 29; Thursday, Aug. 30; Friday, Aug. 31
3rd Memristor and Memristive Systems Symposium

8:30-9:00 Registration (Wed; Thu; Fri)
Morning plenaries: Wed: Opening and Plenary Sessions, Chair: M. Gilli; Thu: Plenary Sessions, Chair: W. Porod; Fri: Plenary Sessions, Chair: T. Roska
9:00-9:50 Wed: Prof. Tamas Roska; Thu: Dr. George I. Bourianoff (INTEL); Fri: Prof. Angel Rodriguez Vázquez
9:50-10:40 Wed: Dr. Daniel Hammerstrom (DARPA); Thu: Dr. Chagaan Baatar (ONR); Fri: Dr. Atul Yoshi
10:40-11:00 Coffee break (Wed; Thu; Fri)
11:00-12:40 Wed: Parallel Sessions (SSM1, RS1); Thu: Plenary Sessions, Chair: P. Szolgay; Fri: Plenary Sessions, Chair: R. Carmona Galán
Plenary slots 11:00-11:40, 11:40-12:10, 12:10-12:40 (as listed): Dr. Ruud A. Haring (IBM); Dr. Csaba Rekeczky; FIAT/SELEX/STM; Dr. Maria Ercsey-Ravasz; FIAT/SELEX/STM; FIAT/SELEX/STM
12:40-14:00 Lunch (Wed; Thu; Fri)
Afternoon, first block: Wed: Parallel Sessions 14:00-15:40 (SSM2, RS2, DS); Thu: Parallel Sessions 14:00-16:00 (SS1, SS2); Fri: Parallel Sessions (RS3, SS5)
Coffee break: 15:40-16:00; Coffee break (16:00-16:20)
Afternoon, second block: Wed: Parallel Sessions 16:00-17:40 (SSM3, SS7, DS); Thu: Parallel Sessions 16:20-17:20 (SS3, SS4); Fri: Parallel Sessions 15:40-17:20 (RS4, SS6)
Evening: Welcome cocktail (start at 7:00 pm); Banquet (start at 7:00 pm); Closing Ceremony (start at 6:00 pm)

Keynote speakers

Dr. George I.
Bourianoff, “Towards a Bayesian processor implemented with oscillatory nanoelectronic arrays”
Components Research, Intel Corporation

Dr. Ruud A. Haring, “The Design of the BlueGene/Q Compute Chips”
IBM T. J. Watson Research Center

Prof. Angel Rodriguez Vázquez, “Progress on CMOS Smart Imagers and Vision Systems”
University of Seville, Spain

Dr. Atul Yoshi, “Advances in Electro-Optical and Infrared Imaging Sensors”
Teledyne Imaging Group

Dr. Csaba Rekeczky, “Sparse Space-time Computing for Embedded Video Analytics Systems”
CTO and President, Eutecus, Inc.

The strong industry involvement is also highlighted by plenary speakers from FIAT, SELEX-GALILEO and STMicroelectronics.

Position Papers

Prof. Tamas Roska, “Physical and Virtual Cellular Machines for Nanoscale Chips and Systems”
Pázmány Peter Catholic University, Budapest

Dr. Daniel Hammerstrom, “Unconventional Computing”
DARPA Program Manager

Dr. Chagaan Baatar, “An Overview of ONR Nanoelectronics Program”
ONR Program Officer

Dr. Maria Ercsey-Ravasz, “Solving constraint satisfaction problems via transiently chaotic analog systems and CNN dynamics”
Physics Department of the Babes-Bolyai University, Romania

Memristor Theory [SSM1]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Weiran Cai
Time: Wednesday, 29 August - 11:00-12:40, Room A
________________________________________________________________________

11:00-11:20 Advanced Memristive Model of Synapses with Adaptive Thresholds
Weiran Cai, Ronald Tetzlaff
Abstract—In this paper, we propose a memristive STDP model realizing the principle of suppression of Froemke and Dan for triplet spikes. The proposed model claims compatibility with both the pair and triplet STDP rules, going beyond the limit of the basic memristive STDP model. The compatibility is realized by assuming a mechanism of variable thresholds adapting to synaptic potentiation (LTP) and depression (LTD): the preceding LTP has a negative influence on the following LTD.
The corresponding dynamical process is governed by a set of ordinary differential equations. It is an equivalent model of the original suppression STDP model. A relation of the adaptive thresholds to short-term plasticity is addressed.

11:20-11:40 Mathematical models and circuit implementations of memristive systems
Fernando Corinto, Alon Ascoli, Marco Gilli
Abstract—In this paper we first present a novel, simple and general boundary condition-based model for nano-scale switching resistances with memory. The boundary conditions are embedded into a switching function modulating the rate of ionic transport and, depending on the memristor being modeled, may be suitably chosen through an optimization procedure minimizing some reference parameter such as the mean squared error between observed and modeled data. The versatile nature of the switching function enables the model to detect complex dynamics from a number of memristive nano-structures, including the Hewlett-Packard memristor. In the second part of the manuscript, we explain how to use the switching dynamics of appropriate nonlinear two-ports to synthesize simple memristive electronic circuits employing purely passive, already existing components.

11:40-12:00 Neuronal Spike Event Generation by Memristors
Sangho Shin, Davide Sacchetto, Yusuf Leblebici, Sung-Mo Kang
Abstract—A new memristor-based neuronal spike event generator is introduced. By using the dynamic properties of conditional resistance switching of a practical bistable memristive device, the neuronal action potential is generated, describing both the integrate-and-fire spiking events and the sufficiently long refractory period of nerve membrane cells. The memristor offers dual time constants, which model the unbalanced charging and discharging periods of the spike signals. With a Pt/TiO2/Pt memristive device having an ROFF/RON resistance ratio of 3000, the memristor-based spike generator offers spike trains with about 0.03% duty.
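
As a rough plausibility check on the duty figure quoted in the abstract above, the sketch below assumes (this is an illustrative assumption, not the authors' circuit model) that the spike width scales with RON, the quiet interval scales with ROFF, and both share the same capacitance and threshold. Under that assumption the duty cycle depends only on the ROFF/RON ratio. The function name `duty_cycle` is hypothetical.

```python
def duty_cycle(r_off_over_r_on: float) -> float:
    """Fraction of each period spent in the spike.

    Assumes t_spike is proportional to R_ON and t_quiet is proportional
    to R_OFF with the same proportionality constant, so that
    duty = R_ON / (R_ON + R_OFF) = 1 / (1 + R_OFF/R_ON).
    """
    return 1.0 / (1.0 + r_off_over_r_on)


ratio = 3000.0  # ROFF/RON ratio quoted in the abstract
duty_percent = 100.0 * duty_cycle(ratio)
print(f"duty = {duty_percent:.4f} %")
```

With a ratio of 3000 this simple estimate gives roughly 0.033%, which is in line with the "about 0.03%" stated in the abstract.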
12:00-12:20 Fast Computation with Memory Circuit Elements
Massimiliano Di Ventra, Yuriy Pershin
Abstract—Memory circuit elements – resistors, capacitors and inductors with memory – are electronic components with great potential in a wide range of applications. In particular, they are ideally suited to enhance all three major computing paradigms: binary, analog and quantum. Here, we consider how to achieve a faster computation with these elements. Specifically, we will show that a binary logic architecture combining memristive and memcapacitive elements requires considerably fewer steps to process information compared to architectures employing only memristive elements. In addition, we demonstrate that a network of memristive – as well as memcapacitive or meminductive – systems can solve a complex optimization problem – the maze problem – with unprecedented speed due to the analog parallelism afforded by these elements.

12:20-12:40 FPGA–Based Generation of Autowaves in Memristive Cellular Neural Networks
Viet Thanh Pham, Arturo Buscarino, Mattia Frasca, Luigi Fortuna, Thang Manh Hoang
Abstract—Cellular Neural/Nonlinear Networks (CNNs) constitute an effective approach for studying complex phenomena like autowaves, spiral waves or pattern formation, either by providing a computationally efficient environment for numerical simulations or by allowing hardware emulation of the system under study. In this work, we focus on a CNN made of memristor–based cells, namely a Memristive Cellular Neural/Nonlinear Network (MCNN). This has recently been shown to be capable of generating complex phenomena such as autowave propagation. In this work, we implement such an MCNN by using a Field Programmable Gate Array (FPGA). Our system, consisting of an FPGA development board connected to a monitor, allows us to emulate autowave propagation in an efficient way. Experimental results show the feasibility of the FPGA–based approach to implementing an MCNN.
New Spatial-temporal Algorithms [RS 1]
Chair: Vedat Tavsanoglu
Time: Wednesday, 29 August - 11:00-12:40, Room B
________________________________________________________________________

11:00-11:20 CNN Based Dark Signal Non-Uniformity Estimation
Marc Geese, Paul Ruhnau, Bernd Jähne
Abstract—Image sensors come with a spatial inhomogeneity, known as Fixed Pattern Noise (FPN), that degrades the image quality. In particular, the dark signal non-uniformity (DSNU) component of the FPN drifts with time and depends strongly on temperature and exposure time. In this paper we introduce a cellular neural network (CNN) to estimate the DSNU from a given set of recorded images. To this end, the foundations of a previously presented maximum likelihood estimation method are used. A rigorous mathematical derivation exploits the available sensor statistics and uses only well-motivated statistical models to calculate the CNN's synaptic weights. The advantages of the resulting CNN method are continuous DSNU updates and a reduction of the computational complexity. Furthermore, a comparison based on ground truth correction patterns shows a significant performance increase over related methods.

11:20-11:40 Continuous-Time Neural Networks Without Local Traps for Solving Boolean Satisfiability
Botond Molnár, Zoltán Toroczkai, Mária Ercsey-Ravasz
Abstract—We present a deterministic continuous-time recurrent neural network similar to CNN models, which can solve Boolean satisfiability (k-SAT) problems without getting trapped in non-solution fixed points. The model can be implemented by analog circuits, in which case the algorithm would take a single operation: the template (connection weights) is set by the k-SAT instance and, starting from any initial condition, the system converges to a solution.
We prove that there is a one-to-one correspondence between the stable fixed points of the model and the k-SAT solutions and present numerical evidence that limit cycles may also be avoided by appropriately choosing the parameters of the model. As this study opens potentially novel technical avenues to tackle hard optimization problems, we also discuss some of the arising questions that need to be investigated in future studies.

11:40-12:00 Coarse Grain Mapping Method for Image Processing on Fine Grain Cellular Processor Arrays
Bin Wang, Piotr Dudek
Abstract—This paper introduces a mapping method for adding a coarse grain (multiple pixels per processor) processing mode to massively parallel cellular processor arrays. The main motivation is to provide the fine grain pixel-parallel processor array with the ability to process images with higher resolution than the array itself, in a way that is transparent to the programmer. The proposed method accomplishes the mapping work entirely during the code compilation process, which has four main advantages. Firstly, there is no extra overhead during processing. Secondly, the source code for fine grain mode can be used in coarse grain mode without modification. Thirdly, the proposed method does not introduce any restrictions on the number of pixels stored in a processing element. Finally, the proposed method is easy to implement, as it does not require any modifications to the hardware design of the pixel-parallel processor array or its controller, but only to the software compiler. The mapping method and its software implementation are presented in this paper.

12:00-12:20 2nd Order 2-D Spatial Filters and Cellular Neural Network Implementations
Vedat Tavsanoglu, Nergis Tural Polat
Abstract—In this paper 2-D discrete-space filters are generated from their analog counterparts and implemented by Cellular Neural Networks (CNN). To this end, first 2-D analog transfer functions are obtained from their 1-D counterparts.
Then, the corresponding difference equations are obtained by discretization of 2-D analog filter differential equations, which are then implemented by CNN. Simulation results are presented.

12:20-12:40 CNN Modeling of Tsunami Waves
Angela Slavova, Pietro Zecca
Abstract—In this paper CNN modeling of tsunami waves is presented. Two models are studied: a two-component Camassa-Holm type equation and a generalized KdV equation. For these cases CNN models are constructed and traveling wave solutions are obtained theoretically and via simulations. A new type of traveling wave solution is introduced: the peak type, called a peakon. A discussion and an example of tsunami waves are provided at the end of the paper.

Memristor Devices [SSM2]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Qiangfei Xia
Time: Wednesday, 29 August - 14:00-15:40, Room A
________________________________________________________________________

14:00-14:20 Memristor Crossbar Arrays with Junction Areas towards sub-10x10 nm^2
Shuang Pi, Peng Lin, Qiangfei Xia
Abstract—We used diluted hydrofluoric acid to shrink the feature size of a silicon dioxide nanoimprint mold to the sub-10 nm regime. Using this mold, we have fabricated memristor crossbar arrays using nanoimprint lithography. We demonstrated that memristor devices with small junction areas exhibited bipolar non-volatile switching behavior with high ON/OFF ratio and low operational current.

14:20-14:40 Modeling and Implementation of Oxide Memristors for Neuromorphic Applications
Ting Chang, Patrick Sheridan, Wei Lu
Abstract—We report the fabrication, modeling and implementation of nanoscale tungsten oxide (WOx) memristive (memristor) devices for neuromorphic applications. The device behaviors can be predicted accurately by considering both ion drift and diffusion. Short-term memory and memory enhancement phenomena, and the effects of spike rate, timing and associativity have been demonstrated.
SPICE modeling has been achieved that allows circuit-level implementations.

14:40-15:00 Cost-effective Printed Memristor Fabrication and Analysis
Kyung Hyun Choi, Muhammad Naeem Awais, Hyung Chan Kim, Yang Hui Doh
Abstract—The fabrication of printed memristors and their memristive behavior are presented for different metal-insulator-metal (MIM) structures. The printing techniques studied in the current work include electrohydrodynamic printing (EHDP) and roll-to-plate. The materials used for the electrode deposition are silver (Ag) and indium titanium oxide (ITO), while zirconium oxide (ZrO2) and graphene oxide (GO) have been used for the sandwich layer between two electrodes on a polyimide (PI) substrate. Electrically stable bipolar resistive switching behavior of all the MIM structures with significant Off/On ratio has been observed. The analysis of the device dimensions and current-voltage (I-V) behavior with respect to the employed printed electronic techniques confirms their feasibility for cost-effective memristive device fabrication.

15:00-15:20 Selector Devices for Cross-point ReRAM
Hyunsang Hwang
Abstract—Both a varistor-type bidirectional selector (VBS) and an ultrathin NbO2 device with threshold switching (TS) characteristics were investigated. A highly non-linear VBS showed superior performance, including high current density (>3x10^7 A/cm^2) and high selectivity (~10^4). Ultrathin NbO2 exhibits excellent TS characteristics such as high temperature stability (~160 °C), good switching uniformity, and extreme scalability.

15:20-15:40 Applications and Limitations of Memristive Implication Logic
Eero Lehtonen, Jussi Poikonen, Mika Laiho
Abstract—In its elementary form, memristive implication logic suffers from multiple disadvantages such as the lengths of the computational sequences required to synthesize a Boolean function, the lack of fan-out, and the requirement of complex control signals.
In this paper we present a new stateful logic operation available for rectifying memristors which corresponds to the logical operation known as the converse nonimplication, and show that it solves the fan-out problem. Moreover, we show how parallel stateful logic can be performed within a CMOL memory architecture, and how it can be used to shorten the computational sequences. We also discuss applications where stateful logic could be advantageous when compared to more conventional solutions.

Cellular Architectures & Algorithms [RS 2]
Chair: Tadashi Shibata
Time: Wednesday, 29 August - 14:00-15:40, Room B
________________________________________________________________________

14:00-14:20 Multi-Feature Detection for Quality Assessment in Laser Beam Welding: Experimental Results
Leonardo Nicolosi, Ronald Tetzlaff, Felix Abt, Andreas Blug, Heinrich Höfler
Abstract—Laser beam welding (LBW) has been widely used in manufacturing processes ranging from automobile production to precision mechanics. The complexity of LBW requires the development of strategies for the real-time control of the process. Most of the available feedback systems lack temporal and/or spatial resolution and, therefore, hardly allow observing more than one characteristic of the process. In recent years, we proposed some high-speed visual algorithms for feature extraction from process images. The detection of the full penetration hole (FPH) allowed controlling the laser power at rates of up to 14 kHz. Another strategy enables observing the occurrence of spatters at monitoring rates of 15 kHz. The achievement of these results was made possible by the adoption of a visual system including a focal plane processor programmable by typical Cellular Neural Network (CNN) operations. This paper is focused on a new visual algorithm for the simultaneous detection of FPH and spatters, which led to real-time control rates of about 8 kHz.
Besides the algorithm description, some interesting experimental results will be presented.

14:20-14:40 On the Phase Space Decomposition for Weakly Connected Oscillatory Networks with 2nd Order Cells
Michele Bonnin, Fernando Corinto, Marco Gilli
Abstract—Oscillatory nonlinear networks represent a circuit architecture for image and information processing. It has been shown that they can be exploited to implement associative and dynamic memories. It has also been shown that phase noise plays an important role as a key limiting factor for the performance of oscillatory cells. Phase models are a tool of paramount importance for the design of oscillatory networks and the analysis of phase noise. These models require treating the noise and the couplings among the cells as perturbations, and identifying the proper directions along which to project the perturbations. In this paper we discuss the proper decomposition of the phase space for second order cells of oscillatory nonlinear networks, and we derive analytical formulas for the vectors spanning the directions for the proper phase space decomposition. We also discuss the implications of this decomposition in control theory and to what extent a simple orthogonal projection is correct.

14:40-15:00 Cellular Neural Networks with Dynamic Cell Activity Control for Hausdorff Distance Estimation
Maria Janczyk, Krzysztof Slot
Abstract—A concept of Cellular Neural Networks with dynamic cell activity control is proposed in the paper. The concept is an extension of the Fixed State Map mechanism and assumes that cells can be disabled or enabled for processing based on an assessment of the current distributions of their neighboring signals. A particular case, where this assessment is made by thresholding the result of a cross-correlation between the feedback template and the neighborhood outputs, is shown to provide a simple means for efficient min/max problem handling. This idea requires introducing only minor modifications to the cell structure.
As an example, the application of the proposed network to fast estimation of the Hausdorff distance between two sets has been considered.

15:00-15:20 A VLSI Hardware Implementation Study of SVDD Algorithm Using Analog Gaussian-Cell Array for on-Chip Learning
Renyuan Zhang, Tadashi Shibata
Abstract—A feasibility study of a VLSI hardware implementation of support vector domain description (SVDD) has been done in this work. The on-chip learning operation of the SVDD algorithm was implemented by an analog Gaussian-cell array. By using a compact analog Gaussian-generation circuit, the center, height and width of the generated Gaussian kernel function feature can be programmed. Based on this Gaussian-generation circuit, a fully parallel architecture is developed to implement the on-chip learning operation, which is carried out by the proposed method. In this manner, the learning operation proceeds autonomously without any clock-based iteration, and self-converges at high speed. A proof-of-concept processor is designed for sixteen learning sample vectors. From the circuit simulation results, the entire learning operation is accomplished within 0.6 μs, and the domain of the sample space is described by a reduced number of sample vectors. In addition, various forms of domain description can be realized by tuning the kernel function feature dynamically.

15:20-15:40 Analysis of Sperm Motility with CNN Architecture
Levent Savkay, Mustak E. Yalcin
Abstract—In this paper, we propose a CNN model based spermatozoa motility analysis, which is an important part of complete semen analysis. Sperm motility analysis is a good example of a multiple object tracking and video surveillance problem when viewed from an engineering viewpoint.
Our proposed system takes the video and images from a CCD camera and applies front edge preprocessing tasks that use CNN algorithms for spatial enhancement and preparation of image frames, combined with an appropriately designed cost function and a greedy assignment algorithm, which determines the objects (spermatozoa), traces their trajectories and classifies the obtained information for the use of biologists. The system is composed of a digital CCD camera connected to the evaluation system. Here we show the results obtained with simulation software running on a PC system. For the determination of sperm cells and the tracking of their trajectories, we utilized heuristic rules deduced from the dynamics of spermatozoa and from investigation of the video obtained from real samples.

Memristor Systems [SSM3]
(in conjunction with the 3rd Memristor and Memristive Systems Symposium)
Chair: Yusuf Leblebici
Time: Wednesday, 29 August - 16:00-17:40, Room A
________________________________________________________________________

16:00-16:20 MRL - Memristor Ratioed Logic
Shahar Kvatinsky, Nimrod Wald, Guy Satat, Eby Friedman, Avinoam Kolodny, Uri C. Weiser
Abstract—Memristive devices are novel structures, developed primarily as memory. Another interesting application for memristive devices is logic circuits. In this paper, MRL (Memristor Ratioed Logic) - a hybrid CMOS-memristive logic family - is described. In this logic family, OR and AND logic gates are based on memristive devices, and CMOS inverters are added to provide a complete logic structure and signal restoration. Unlike previously published memristive-based logic families, the MRL family is compatible with standard CMOS logic. A case study of an eight-bit full adder is presented and related design considerations are discussed.
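
The MRL abstract above does not give circuit details, so the following is only a minimal behavioural sketch under stated assumptions: each two-input gate is treated as two memristors sharing an output node; when the inputs differ, one device is assumed to settle at RON and the other at ROFF, so the output is a resistive divider; and an ideal threshold stands in for the CMOS restoration stage. The resistance values, `VDD`, and the function names `mrl_gate_voltage` and `restore` are hypothetical and chosen only for illustration.

```python
# Hypothetical device and supply values, chosen only for illustration.
R_ON = 1e3    # low-resistance state, ohms
R_OFF = 1e6   # high-resistance state, ohms
VDD = 1.0     # logic-high voltage, volts


def mrl_gate_voltage(a: int, b: int, kind: str) -> float:
    """Assumed steady-state output voltage of a two-input ratioed gate."""
    if kind not in ("or", "and"):
        raise ValueError("kind must be 'or' or 'and'")
    if a == b:
        # Equal inputs: no current flows, the output follows the inputs.
        return a * VDD
    # Unequal inputs: divider between the high input (VDD) and the low input (0 V).
    if kind == "or":
        r_to_high, r_to_low = R_ON, R_OFF   # output pulled towards VDD
    else:
        r_to_high, r_to_low = R_OFF, R_ON   # output pulled towards 0 V
    return VDD * r_to_low / (r_to_high + r_to_low)


def restore(v: float) -> int:
    """Ideal threshold, standing in for CMOS signal restoration."""
    return 1 if v > VDD / 2 else 0


for kind in ("or", "and"):
    table = {(a, b): restore(mrl_gate_voltage(a, b, kind))
             for a in (0, 1) for b in (0, 1)}
    print(kind, table)
```

In this simplified picture the output level for unequal inputs is set by the ratio of ROFF to RON, and a larger ratio leaves more margin for the restoration stage.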
16:20-16:40 Pattern Matching and Classification based on Complementary Resistive Switch (CRS) Architecture
Kyoungrok Cho, Sang-Jin Lee, Kwang-Seok Oh, Omid Kavehei, Kamran Eshraghian
Abstract—The emergence of new materials, and in particular the recent progress in memristor and related memory technologies, has encouraged the research community to take a renewed approach towards the formulation of architectures, such as those that depend upon associative memory constructs, to take advantage of what is being offered within this new design domain. In this paper we address a key issue in the pattern matching and classification process and suggest an alternative approach for image vector matching combining a Complementary Resistive Switch (CRS) array and bump circuits. We emulated experimental pattern matching with two approaches, which are based on the Hamming distance and the threshold level of the image: the former finds an exact image with a bump circuit and the latter finds similar patterns from the stored images by combining comparators. The proposed hardware-oriented architecture offers high speed and smaller size, and is easier to implement in conventional CMOS technology.

16:40-17:00 Reaction-Diffusion Media with Excitable Oregonators coupled by Memristors
Xiyuan Gong, Tetsuya Asai, Masato Motomura
Abstract—We numerically investigated the dynamics of a new reaction-diffusion-type excitable medium where the diffusion coefficient is represented by memristive dynamics. This type of medium consists of an array of excitable Oregonators, and each Oregonator is locally coupled with other Oregonators via memristors, which were claimed to be the fourth circuit element exhibiting a relationship between flux φ and charge q. Through extensive numerical simulations, we found that the memristor conductances were modulated by the excitable waves and controlled the velocity of the waves, depending on the memristor's polarity.
Further, different nonuniform spatial patterns were generated depending on the initial condition of the Oregonator's state, memristor polarity and stimulation.

17:00-17:20 SPICE Simulator for Hybrid CMOS Memristor Circuit and System
Yuhao Wang, Wei Fei, Hao Yu
Abstract—The memristor is a two-terminal non-linear passive electrical device. After its recent successful fabrication, a variety of applications based on memristors have been explored, such as non-volatile memory, reconfigurable computing and neural networks. However, one major challenge when designing hybrid CMOS memristor integrated circuits is the lack of a SPICE-like simulator for design validation. The current approach is to describe the memristor device with an equivalent circuit, which is however extremely time-consuming for large scale design simulation due to the additional modeling components. In this paper, a memristor SPICE simulator is introduced based on the recent new modified nodal analysis (MNA) framework, which can effectively support non-conventional state variables such as the doping ratio of the memristor. As such, the memristor device can be stamped into the state matrix similarly to a BSIM MOSFET. Compared with the equivalent circuit simulation approach, our new MNA based approach exhibits 40x less simulation time for a 32x32 memristor crossbar circuit. A hybrid CMOS memristor circuit for classic conditioning training has also been studied with the developed SPICE simulator.

17:20-17:40 CNN Cell with Memcapacitive Synapses and Threshold Control Circuit
Jacek Flak
Abstract—This paper presents a concept of a solid-state memcapacitor based on a combination of memristor and capacitor, as well as its applications to cellular nanoscale networks. In addition to ultra-dense memories, memcapacitors can also be used for synaptic connections and threshold control in arrays with capacitively coupled processing units. In principle, the proposed CNN cell structure implements the basic McCulloch-Pitts neuron model.
Although the cell relies on the binary programmability scheme with single-bit template coefficients, the proposed memcapacitive synapses allow for asynchronous processing of tasks, for which the traditional cloning templates contain both positive and negative values. Memristor-based Cellular and Neural Synaptic Circuits [SS 7] Chair: Hyongsuk Kim Time: Wednesday 29, August - 16:00-17:40 Room B ________________________________________________________________________ 16:00-16:20 Memristor Bridge Circuit for Neural Synaptic Weighting Maheshwar Pd. Sah, Changju Yang, Hyongsuk Kim, Tamás Roska, Leon O. Chua Abstract—A simple and compact memristor-based bridge circuit which is able to perform signed synaptic weighting in neuron cells is proposed. The proposed memristor-based synapse is composed of four memristors which form a bridge-type configuration. By programming different values on each memristor of the memristor bridge circuit, weighting values can be set on the memristor bridge synapses. Various simulation results are included. 16:20-16:40 Synaptic Weighting Circuits for Cellular Neural Networks Young-Su Kim, Kyeong-Sik Min Abstract— The Cellular Neural Network (CNN), which can provide parallel processing on a massive scale, is known to be suitable for neuromorphic applications such as vision systems. In this paper, we propose a new synaptic weighting circuit that can perform analog multiplication for CNN applications. Common-mode feedback is used in the new weighting circuit to minimize the output offset. The multiplication accuracy can be degraded by the finite High Resistance State (HRS) and non-zero Low Resistance State (LRS) of real memristors. To improve the multiplication accuracy, we added two MOSFET switches to the memristor weighting circuit and chose the weighting memristance very carefully, taking the leakage current into account. Variations in memristance are analyzed to estimate how much they can affect the accuracy of analog multiplication.
Finally, the Average and Laplacian templates were tested and verified by circuit simulation using the proposed weighting circuit. 16:40-17:00 Memristance and Memcapacitance Modeling of Thin Film Devices Showing Memristive Behavior Mohamed G. Ahmed Mohamed, Kyoungrok Cho, Tae-Won Cho Abstract— In 2008, the fourth passive element, the “Memristor”, was implemented as a device having both passivity and nonvolatile properties, opening the way to new possibilities in the design and fabrication of innovative memory, arithmetic and logic architectures. The nanofeatures and ionic transport mechanism inherent in memristor devices introduce new challenges into modeling, characterization and, in particular, the related circuit simulation needs with system constructs. Therefore, in this paper, we analyze the memristor device fundamentally to characterize the memristance, paying particular attention to the hidden memcapacitance effect. Our proposed macro-model takes into account some of the non-ideal effects, such as tunneling current and the hidden memcapacitor constructed across non-conducting materials. The model provides insight for building a device as either a memristive or a memcapacitive system. The simulation results have been compared with HP published data and show good agreement. 17:00-17:20 Memristor Emulator Design with Off-the-shelf Solid State Components for Memristor Circuit Applications Changju Yang, Maheshwar Pd. Sah, Jae-Bung Kim, Seongik Cho, Hyongsuk Kim Abstract— A memristor emulator circuit designed with off-the-shelf solid state components is presented. As memristors are not commercially available so far, circuit replacements which behave like memristors are needed to develop application circuits. In this paper, the variable resistance of a memristor is built utilizing the input resistance of the closed loop circuit of an op amp.
The memristor emulator circuit has been implemented on a breadboard with off-the-shelf solid state components. The experimental results of the proposed memristor emulator circuit show memristor behavior that can be utilized as an alternative to the HP TiO2 memristor model. 17:20-17:40 Analysis of a Serial Circuit with Two Memristors and Voltage Source at Sine and Impulse Regime Valeri Mladenov, Stoyan Kirilov Abstract — In the present paper the structure and principle of operation of Williamsʼs memristor are described. Its basic parameters are presented and the basic physical dependencies are confirmed. The analysis described here considers the linear drift model of Williamsʼs memristor. A SIMULINK model of a circuit with two memristors is built from the obtained formulae and Kirchhoffʼs voltage law. The basic results of the simulations carried out in the MATLAB and SIMULINK environment are given in graphical form. These results are associated with distortions of the plateaus of impulses at different ratios between the resistances of the “opened” and “closed” states of Williamsʼs memristor, ROFF and RON. An interpretation of the results is also given, which confirms that a memristor with a high ratio r is better than a memristor with a small value of r. In conclusion, basic deductions and perspectives for future applications of memristor circuits are given. DEMO SESSION – Applications of CNN Technology [DS] Chairs: György Cserey, Piotr Dudek, Ricardo Carmona Galán Time: Wednesday 29, August - 14:00-17:40 Room C ________________________________________________________________________ Stand 1 Low Power Multiple Object Tracking and Counting Using a Scamp Cellular Processor Array David Barr, Stephen Carey, Piotr Dudek Abstract - A low-power demonstration system using a SCAMP-3 vision chip to track and count multiple objects with unpredictable trajectories is presented. The system can track as many discrete objects as can fit into its visual field.
The compact, self contained hardware consists of a battery, an ARM Cortex-M3 coprocessor, and the sensor/processor array device. The tracking algorithm is performed entirely by the processor array and the complete system draws 7.3mA during operation. Stand 2 Locating High Speed Multiple Objects Using a Scamp-5 Vision-Chip Stephen Carey, David Barr, Bin Wang, Alexey Lopich, Piotr Dudek Abstract - Presented in this paper is a demonstration system that uses a low-power SCAMP-5 256x256 vision-chip to locate and count multiple objects moving at high speed along arbitrary trajectories. The hardware consists of a SCAMP-5 IC, its power supply system and a Xilinx Spartan3 controller. At 100,000fps, the SCAMP-5 chip can locate and readout the coordinates of a single closed-shaped object amongst clutter. At 25,000fps, the IC can readout the coordinates of 5 objects. Stand 3 Realization of a Fully Configurable Complex Network of Non Linear Chuaʼs Oscillators Marco Colandra, Massimiliano de Magistris, Carlo Petrarca, Mario di Bernardo, Sabato Manfredi Abstract— We describe the realization of a new experimental setup for the analysis and characterization of complex networks of Chuaʼs circuits. It is characterized by full configurability of the nodeʼs parameters and the network structure (topology and link impedances), and designed for easy scalability to high number of nodes. The set-up is automated in terms of control of the network and data acquisition by means of USB interfaced boards. A portable version of the set-up with 8 nodes is realized for demonstration purposes. Stand 4 Real-Time Remote Reporting of Motion Analysis with Wi-Flip Jorge Fernández-Berni, Ricardo Carmona-Galán, Ángel Rodríguez-Vázquez Abstract—This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a prototype mixed-signal focal-plane array processor, and Imote2, a commercial WSN platform. 
The application consists in scanning the whole scene by sequentially analyzing small regions. Within each region, motion is detected by background subtraction. Subsequently, information related to that motion (intensity and location) is radio-propagated in order to remotely account for it. By aggregating this information over time, a motion map of the scene is built. This map makes it possible to visualize the different activity patterns taking place. It also provides an elaborated representation of the scene for further remote analysis, preventing raw images from being transmitted. In particular, the scene inspected in this demo corresponds to vehicular traffic on a motorway. The remote representation progressively built enables the assessment of the traffic density. Stand 5 Demonstration of the Second Generation Real-Time Cellular Neural Network Processor: RTCNNP-V2 Nerhun Yildiz, Evren Cesur, Vedat Tavsanoglu Abstract—This proceeding is compiled from our previous works, where the architecture of the Second-Generation Real-Time Cellular Neural Network (CNN) Processor (RTCNNP-v2) was proposed. The system is designed for applications where high resolution and high speed are desired. The structure is fully pipelined and the processing is real-time. The proposed structure is coded in VHDL and realized on two FPGA devices: one high-end and one low-budget. The system is the only reported CNN implementation supporting real-time Full-HD video image processing to date. Stand 6 An Improved FPGA Implementation of CNN Gabor-Type Filters Evren Cesur, Nerhun Yildiz, Vedat Tavsanoglu Abstract— In this paper, a new Cellular Neural Network (CNN) structure for implementing two dimensional Gabor-type filters is proposed, improving on our previous design. The structure is coded in VHDL and realized on a state of the art Altera Stratix IV 230 FPGA. The prototype supports Full-HD 1080p resolution and a 60 Hz frame rate.
One dedicated processor is used for each Euler iteration, where the time step is taken to be the same as the optimum step size, and 50 iterations are implemented. The input/output, control, RAM and communication blocks of the realization are taken from our second generation real time CNN emulator (RTCNNP-v2). Stand 7 Cellular Processor Array Based UAV Safety System Ákos Zarándy, Tamás Zsedrovits, András Kiss, Péter Szolgay, Tamás Roska Abstract—An embedded sensor-processor system is being developed for on-board UAV (Unmanned Aerial Vehicle) safety applications. The role of the device is to detect intruder airplanes which are on or close to a collision course. Due to weight, size, and cost requirements, only the visual approach leads to a feasible solution. In our design, 5 cameras are used to collect visual data from a large field of view. The image flows are processed by 3 different virtual cellular processor arrays, which are implemented in FPGA. Non-Boolean Architectures – Computing by Physics via Device Arrays [SS 1] Chair: Wolfgang Porod and Tamas Roska Time: Thursday 30, August - 14:00-16:00 Room A ________________________________________________________________________ 14:00-14:20 Spin Torque Oscillator Models for Applications in Associative Memories Gyorgy Csaba, Matt Pufall, Dmitri Nikonov, George Bourianoff, Andras Horvath, Tamas Roska, Wolfgang Porod Abstract— We present physics-based models for both individual and coupled spin torque nano oscillators (STNOs). Such STNOs may become building blocks for CNN-like dynamic computing architectures. We discuss a hierarchy of models, extending from micromagnetic models which include the detailed geometry and physics, to compact models which are based on parameters extracted from the underlying physical description. These simulations also include coupling between individual STNOs, both via spin waves and via electrical interconnects.
Using this modeling approach we demonstrate frequency entrainment and phase synchronization between STOs in the array, which enable computing functions. 14:20-14:40 Synchronization in Cellular Spin Torque Oscillator Arrays Andras Horvath, Fernando Corinto, Gyorgy Csaba, Wolfgang Porod, Tamas Roska Abstract—Spin torque nanodevices could provide a platform for computation beyond Mooreʼs law. The network of spin oscillators can have only local, cellular interconnections because of the underlying physics: the interaction between the oscillators happens through the magnetic field. In this paper we describe the dynamics of weakly coupled spin-torque oscillator networks and how the dynamics of these cellular arrays can be used for problem solving. We will describe how the phase shift in a synchronized array can be calculated between the elements and we will also show a simple example how the dynamics of a cellular array can be used to solve simple tasks. 14:40-15:00 An Associative Memory with Oscillatory CNN Arrays Using Spin Torque Oscillator Cells and Spin-Wave Interactions Architecture and End-to-End Simulator Tamas Roska, Andras Horvath, Attila Stubendek, Fernando Corinto, Gyorgy Csaba, Wolfgang Porod, Tadashi Shibata, George Bourianoff Abstract—An Associative Memory is built by three consecutive components: (1) a CMOS preprocessing unit generating input feature vectors from picture inputs, (2) an AM cluster generating signature outputs composed of spintronic oscillator (STO) cells and local spinwave interactions, as an oscillatory CNN (OCNN) array unit, applied several times arranged in space, and (3) a classification unit (CMOS). The end to end design of the preprocessing unit, the interacting O-CNN arrays, and the classification unit is embedded in a learning and optimization procedure where the geometric distances between the STOs in the O-CNN arrays play a crucial role. 
The O-CNN array has an input vector as a 1D array of oscillator frequencies, and the synchronized O-CNN array codes the output as the phases of the output 1D array. The typical O-CNN array has 1-3 rows of STOs. Simplified STO and interaction macro models are used. A typical example is shown using an End-to-end Simulator. 15:00-15:20 CMOS Supporting Circuitries for Nano-Oscillator-Based Associative Memories Tadashi Shibata, Renyuan Zhang, Steven Levitan, Dmitri Nikonov, George Bourianoff Abstract—“Let physics do computing” is a promising approach to new-paradigm computing in the beyond-CMOS era. Building associative memories based on the physics of nano oscillators, in particular, presents a lot of potential for intelligent information processing. In this paper, we discuss how CMOS supporting circuitries can interface the fabric of nano oscillators with the digital computing world. Using CMOS ring oscillators to emulate the nano oscillator behavior, we demonstrate by HSPICE simulation how to produce the associative memory function and use it for image recognition. 15:20-15:40 Non-Boolean Associative Architectures Based on Nano-Oscillators Steven Levitan, Yan Fang, Denver Dash, Tadashi Shibata, Dmitri Nikonov, George Bourianoff Abstract— Many of the proposed and emerging nano-scale technologies simply cannot compete with CMOS in terms of energy efficiency for performing Boolean operations. However, the potential for these technologies to perform useful non-Boolean computations remains an opportunity to be explored. In this talk we examine the use of the resonance of coupled nanoscale oscillators as a primitive computational operator for associative processing and develop the architectural structures that could enable such devices to be integrated into mainstream applications.
15:40-16:00 Boolean and Non-Boolean Nearest Neighbor Architectures for Out-of-Plane Nanomagnet Logic Xueming Ju, Michael Niemier, György Csaba, Aaron Dingler, Xiaobo Sharon Hu, Wolfgang Porod, Xueming Ju, Markus Becherer, Doris Schmitt-Landsiedel, Paolo Lugli Abstract—We present the design and simulation of information processing hardware that is comprised of single domain, Co/Pt magnets (i.e., out-of-plane nanomagnet logic, or oNML). We first describe the design and evaluation of oNML hardware that can identify instances of a preprogrammed bit sequence in streaming data. Systolic arrays (that process information using Boolean logic gates) are employed as a system-level architecture which can (i) mitigate less desirable features of the oNML device architecture (nearest neighbor dataflow and longer device switching times when compared to a CMOS transistor), and (ii) exploit unique features of the device architecture (non-volatility and inherently pipelined logic with no overhead). We conclude the paper with a discussion as to how oNML might be employed for non-Boolean information processing. A simple image processing function is used as an initial case study. Problems and Solutions on Hybrid Kilo/Mega Core Architectures [SS 2] Chair: Péter Szolgay Time: Thursday 30, August - 14:00-16:00 Room B ________________________________________________________________________ 14:00-14:20 Memory Access Optimization for Computations on Unstructured Meshes Antal Hiba, Zoltan Nagy, Miklos Ruszinko Abstract—Many real-life applications of processor arrays suffer from memory bandwidth limitations. In many cases an unstructured mesh is given (computation on sensor data, simulations of physical systems - PDEs), where the vertices represent computations with dependencies represented by the edges. Utilization of the processing elements (PEs) during these computations mainly depends on the node indexing of the mesh.
If adjacent nodes are stored close to each other in main memory, the reloading of node data can be significantly decreased. In the case of an FPGA, the memory accesses can be fully determined by the designer. The mesh and an ordering of its nodes define the graph bandwidth, which determines the minimum size of on-chip memory needed to avoid reloading nodes from the off-chip memory. If the required on-chip memory size is higher than the available resources, the mesh must be divided into parts. In this paper a novel geometry based method is presented, which constructs reordered parts from a given unstructured mesh, where each part meets some predefined constraints on graph bandwidth. 14:20-14:40 Examining the Accuracy and the Precision of PDEs for FPGA Computations András Kiss, Zoltán Nagy, Árpád Csík, Péter Szolgay Abstract—There are a large number of problems which can be accelerated by using architectures on Field Programmable Gate Arrays (FPGAs). However, sometimes the complexity of a problem does not allow it to be mapped onto a specific FPGA. In that case, analysing the precision of the arithmetic unit which solves the computational problem can be a good way to fit the architecture and to accelerate its computation. A numerical algorithm can be implemented using fixed-point or floating point arithmetic (or a mixture of both) with different precision. The aim of the article is not to optimize the numerical algorithm but to find a smaller arithmetic unit precision which yields sufficient accuracy and fits onto smaller FPGAs. In the paper, one particular problem type is investigated, namely the accuracy of the solution of a simple Partial Differential Equation (PDE). The accuracy measurement is done on an FPGA with different bit widths. The solution of the advection equation is analyzed using first and second order discretization methods. As a result we managed to find an optimal bit width for the solution on a specific FPGA.
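As an illustration of the kind of precision study described in the abstract above, the following is a minimal sketch and not the authors' method. It assumes a first-order upwind discretization of the advection equation u_t + a u_x = 0 on a periodic grid and mimics a narrow arithmetic unit by rounding every stored value to a given number of fractional bits. The function names, the grid size, the Courant number and the tested bit widths are all hypothetical choices made for the example.

```python
import math

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def upwind_advection(u0, courant, steps, frac_bits=None):
    """First-order upwind scheme for u_t + a*u_x = 0 (a > 0), periodic domain.

    courant = a*dt/dx. If frac_bits is given, every stored value is rounded
    to that fixed-point precision (an assumed stand-in for a narrow FPGA
    arithmetic unit); otherwise double precision is used as the reference.
    """
    if frac_bits is None:
        q = lambda v: v
    else:
        q = lambda v: quantize(v, frac_bits)
    u = [q(v) for v in u0]
    n = len(u)
    for _ in range(steps):
        # u[i - 1] wraps around for i == 0, which gives the periodic boundary.
        u = [q(u[i] - courant * (u[i] - u[i - 1])) for i in range(n)]
    return u

def max_error(a, b):
    """Largest pointwise difference between two solutions."""
    return max(abs(x - y) for x, y in zip(a, b))

n = 64
u0 = [math.sin(2 * math.pi * i / n) for i in range(n)]
reference = upwind_advection(u0, 0.5, 100)
errors = {
    bits: max_error(upwind_advection(u0, 0.5, 100, bits), reference)
    for bits in (8, 12, 16, 24)
}
```

Inspecting `errors` shows how far each fixed-point run drifts from the double-precision reference, so the smallest bit width that stays under a chosen tolerance can be read off. A second-order scheme, which the abstract also mentions, is not covered by this sketch.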
14:40-15:00 Automatic Generation of Locally Controlled Arithmetic Unit via Floorplan Based Partitioning Csaba Nemes, Zoltán Nagy, Péter Szolgay Abstract—In the paper a framework for generating a locally controlled arithmetic unit is presented, including graph generation from a mathematical expression, graph partitioning to determine locally controlled parts of the design, and VHDL generation. The output of the framework is a pipelined architecture containing locally controlled groups of floating point units. It is demonstrated that both partitioning and placement aspects of the design have to be considered to obtain a high-speed circuit. In a well-placeable design, locally controlled groups can be mapped to the FPGA in such a way that only neighboring groups communicate with each other. In the presented algorithm an initial floorplan of the floating point units is produced and a novel graph partitioning representation is used for partitioning the floating point units to obtain a well-placeable design. The framework is demonstrated on the automatic circuit generation of a complex mathematical expression related to Computational Fluid Dynamics (CFD). The framework produces a 15-27% faster design than the unpartitioned, globally controlled one at the price of a modest area increase. The framework also automatically produces well-placeable, deadlock-free partitions for complex expressions, whereas traditional partitioners cannot target these objectives. 15:00-15:20 Analysis of a GPU Based CNN Implementation Endre László, Péter Szolgay, Zoltán Nagy Abstract—The CNN (Cellular Neural Network) is a powerful image processing architecture whose hardware implementation is extremely fast. The lack of such a hardware device in a development process can be compensated for by using an efficient simulator implementation. Commercially available graphics cards with high computing capabilities make this simulator feasible.
The aim of this work is to present a GPU based implementation of a CNN simulator using nVidiaʼs Fermi architecture. Different implementation approaches are considered and compared to a multi-core, multi-threaded CPU and some earlier GPU implementations. A detailed analysis of the introduced GPU implementation is presented. 15:20-15:40 Investigation of Area and Speed Trade-Offs in FPGA Implementation of an Image Correlation Algorithm Zoltán Kincses, Zsolt Vörösházi, Zoltán Nagy, Péter Szolgay, Tepelea Laviniu, Alexandru Gacsádi Abstract—In this paper an image correlation algorithm is implemented on FPGA architecture for assisted movements of visually impaired persons or automotive driving systems. Taking into account the limitations of FPGA devices and the special requirements of the correlation based image matching algorithm a semi-parallel approach is proposed. This provides an optimal tradeoff between area and speed of the implemented algorithm. Several key issues are investigated and discussed related to the speed and area. 15:40-16:00 Sound Propagation Cellular Processors Architectures, Comparisons and Performances Radu Dogaru, Ioana Dogaru, Narcis Zamfir, Dorel Aiordachioaie Abstract—The aim of this paper is to discuss and compare several architectural possibilities for implementing a simulator for (ultra) sound propagation in a controlled environment (e.g. using specified obstacles and signal sources). Although initially such sound propagation simulators were designed to assist the design of robotic "ears" of autonomous agents trying to reconstruct an image of the environment, its use expands beyond its initial goals. We are particularly interested here to define the limits and the constraints for kilo-processor architectures capable to implement such systems at reasonable costs. Our results for various implementations (software, FPGA, GPU/with CUDA) are considered with some proposals for suitable kiloprocessor architectures. 
GPUs and Multicore Systems in High Energy Physics [SS 3] Chair: Niko Neufeld and Xavier Vilasis Cardona Time: Thursday 30, August - 16:20-17:20 Room A ________________________________________________________________________ 16:20-16:40 Many-Core Processors and GPU Opportunities in Particle Detectors Niko Neufeld, Xavier Vilasis-Cardona Abstract—High energy physics particle detectors are large and complex devices with very demanding requirements at the level of signal to noise ratios, processing times and data throughput. The first stages of the data acquisition are hardware based while the last ones depend rather on software. Among the solutions to the problems posed by the requirements we may find the use of multi-core processors or maybe GPUʼs. We shall review what are the points in which these techniques could be of use and the actual proposals. 16:40-17:00 Real-Time Use of GPUs in NA62 Experiment Gianmaria Collazuol, Vincenzo Innocente, Gianluca Lamanna, Felice Pantaleo, Marco Sozzi Abstract—We describe a pilot project for the use of GPUs in a real-time triggering application in the early trigger stages at the CERN NA62 experiment, and the results of the first field tests together with a prototype data acquisition (DAQ) system. This pilot project within NA62 aims at integrating GPUs into the central L0 trigger processor, and also to use them as fast online processors for computing trigger primitives. Several TDC equipped subdetectors with sub-nanosecond time resolution will participate in the first-level NA62 trigger (L0), fully integrated with the data-acquisition system, to reduce the readout rate of all subdetectors to 1 MHz, using multiplicity information asynchronously computed over time frames of a few ns, both for positive sub-detectors and for vetos. The online use of GPUs would allow the computation of more complex trigger primitives already at this first trigger level. 
We describe the architectures of the proposed systems, focusing on measuring the performance (both throughput and latency) of various approaches meant to solve these high energy physics problems. The challenges and the prospects of this promising idea are discussed. 17:00-17:20 ALICE TPC Online Tracker on GPU for Heavy-Ion Events David Rohr Abstract—The online event reconstruction for the ALICE experiment at CERN requires the capability to process central Pb-Pb collisions at a rate of more than 200 Hz, corresponding to an input data rate of about 25 GB/s. The reconstruction of particle trajectories in the Time Projection Chamber (TPC) is the most compute intensive step. The TPC online tracker implementation combines the principle of the cellular automaton and the Kalman filter. It has been accelerated by the use of graphics cards (GPUs). Pipelined processing allows the tracking on the GPU, the data transfer, and the preprocessing on the CPU to be performed in parallel. In order to exploit data locality, the tracking is split into multiple phases. At first, track segments are searched for in local sectors of the detector, independently and in parallel. These segments are then merged at a global level. A shortcoming of this approach is that if a track contains only a very short segment in one particular sector, the local search may not find this short part. The fast GPU processing made it possible to add an additional step: all found tracks are extrapolated to neighboring sectors and the unassigned clusters which constitute the missing track segment are collected. For running QA, it is important that the output of the CPU and the GPU tracker is as consistent as possible. One major challenge was to implement the tracker such that the output is not affected by concurrency, while maintaining peak performance and efficiency. For instance, a naive implementation depended on the order of the tracks, which is nondeterministic when they are created in parallel.
Still, due to non-associative floating point arithmetic, a direct binary comparison of the CPU and the GPU tracker output is impossible. Thus, the approach chosen for evaluating the GPU tracker efficiency is to compare the cluster to track assignment of the CPU and the GPU tracker cluster by cluster. With the above comparison scheme, the outputs of the CPU and the GPU tracker differ by 0.00024. Compared to the offline tracker, the HLT tracker is orders of magnitude faster while delivering good results. The GPU version outperforms its CPU analog by another factor of three. Recently, the ALICE HLT cluster was upgraded with new GPUs and is able to process central heavy ion events at a rate of approximately 200 Hz. Silicon Implementation [SS 4] Chair: Peter Foldesy Time: Thursday 30, August - 16:20-17:20 Room B ________________________________________________________________________ 16:20-16:40 On Challenges for Implementing Pixelwise DA Converter in 3D Ari Paasio, Henri Ansio Abstract—Vision chips are natural candidates for being among the first areas that are able to utilize the emerging 3D integration possibilities. In some 2D vision chip architectures there are pixel level AD and/or DA converters that are used for various purposes. This article covers the challenges and needs when targeting a megapixel architecture within a 1 cm² chip area. The Through-Silicon-Vias (TSVs) on one hand allow the 3D integration, but on the other hand pose strict challenges for the design. The TSVs occupy a certain area, and in an area restricted design the number of TSVs should be minimized. The associated Keep-Out-Zone (KOZ) for each TSV should also be taken into account. 16:40-17:00 A Compact FPGA Implementation of a Bit-Serial SIMD Cellular Processor Array Declan Walsh, Piotr Dudek Abstract— An FPGA implementation of a fine grain general purpose SIMD processor array is presented.
The processor architecture has a compact processing element which is encapsulated into two configurable logic blocks (CLBs) and is then replicated to form an array. A 32 × 32 processing element array is implemented on a low-cost Xilinx XC5VLX50 FPGA using four-neighbour connectivity, with the possibility to scale up using a larger FPGA. The processor array operates at a frequency of 150 MHz and executes a peak of 153.6 GOPS (bit-serial operations). Binary and 8-bit greyscale image processing is performed and demonstrated. 17:00-17:20 Integrated CMOS Sub-THz Imager Array Péter Földesy, Ákos Zarándy Abstract— This paper describes a 90 nm CMOS sub-THz detector array ASIC. The sub-THz detector array is an integrated system composed of silicon field effect plasma wave sensors, various integrated antennas, pre-amplifiers, ADCs, and a digital domain lock-in amplifier detector. The peak responsivity is found to be 185 kV/W at 365 GHz and 52 kV/W at 470 GHz, and at the detectivity maximum NEP ~ 20 pW/Hz^-1. Applications on FPGAs & GPUs [RS 3] Chair: Mustak Yalcin Time: Friday 31, August - 14:00-15:40 Room A ________________________________________________________________________ 14:00-14:20 Implementing Dynamic Reconfigurable CNN-Based Full-Adder Yanyi Liu, Wenbo Liu, Xiaozheng Yuan, Guanrong Chen Abstract—This paper presents a new approach to implementing dynamic reconfigurable logical systems based on Cellular Neural Networks (CNN). Compared with approaches that use a chaos computing system, it is easier to implement in engineering applications and more stable. We provide and experimentally demonstrate the basic principle for obtaining a full-adder by using uncoupled CNN cells. The actual circuit implementing the full-adder, and its transformation from adder to subtractor, is also presented.
14:20-14:40 Cesar: Emulating Cellular Networks on FPGA Jens Müller, Ralf Becker, Jan Müller, Ronald Tetzlaff Abstract—Complex dynamical systems offer entirely new possibilities for the development of groundbreaking data processing methods. In the domains of image and video processing, locally coupled cellular array computers, based on Cellular Nonlinear Networks (CNN), accelerate the computation of large amounts of data in real-time, due to their inherent concept of massive parallelism. Current VLSI implementations, however, are accompanied by several distinct drawbacks. The computational accuracy of most currently available systems is limited to 8 bit, and the volatile, capacitively stored state values of analogue realisations often lead to errors when multiple tasks are processed sequentially. Moreover, the systems hardly allow CNN program code to be run so as to provide the full functionality of a CNN-UM. In this contribution, the novel CESAR architecture is proposed for the digital emulation of a time-discrete CNN-UM. The programmable array computer facilitates the powerful computation of consecutive CNN operations and the cost-efficient implementation of several application-specific configurations with variable network size and data representation. The presented architecture retains the inherent parallel paradigm of CNN, and assigns one processing element to each cell of the network. The cell outputs are coupled and stored locally, thus minimising data exchange with external structures and maximising the computation speed. The internal fixed-point multiplications are accelerated by using on-chip DSP resources provided by current FPGAs. By this means, a CNN-based embedded system with 128 cells, a 3 × 3 neighbourhood and 18 bit data representation was implemented on a Xilinx Virtex-5 FPGA.
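To make the idea of a digitally emulated, time-discrete CNN with a 3 × 3 neighbourhood and fixed-point data more concrete, here is a minimal software sketch. It is not the CESAR architecture. It assumes a forward-Euler discretization of the Chua-Yang cell equation with the piecewise-linear output, an 18-bit word split into 12 fractional bits (the split is a hypothetical choice), an Euler step of 2^-3 so that the step scaling becomes a shift, and a commonly cited edge-detection template whose values should be treated as an assumption. The grid is 8 × 16 = 128 cells, and all function names are illustrative.

```python
WORD_BITS = 18                      # word length quoted in the abstract
FRAC_BITS = 12                      # assumed fractional bits (hypothetical split)
ONE = 1 << FRAC_BITS                # fixed-point representation of 1.0
LIMIT = (1 << (WORD_BITS - 1)) - 1  # largest signed 18-bit value
STEP_SHIFT = 3                      # Euler step h = 2**-3, applied as a shift

# Assumed edge-detection template: feedback A, control B, bias Z.
A = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
Z = -1

def saturate(v):
    """Clamp a state value to the signed 18-bit range."""
    return max(-LIMIT - 1, min(LIMIT, v))

def output(x):
    """Piecewise-linear cell output y = clamp(x, -1, +1) in fixed point."""
    return max(-ONE, min(ONE, x))

def neighbourhood_sum(template, grid, r, c, boundary):
    """Weighted sum over the 3x3 neighbourhood; cells outside use `boundary`."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            value = grid[rr][cc] if inside else boundary
            total += template[dr + 1][dc + 1] * value
    return total

def run_cnn(u, steps=100):
    """Iterate x <- x + h*(-x + A*y + B*u + z) and return the +1/-1 outputs."""
    rows, cols = len(u), len(u[0])
    u_fx = [[v * ONE for v in row] for row in u]
    # The input contribution B*u + z is constant, so compute it once per cell.
    bias = [[neighbourhood_sum(B, u_fx, r, c, -ONE) + Z * ONE
             for c in range(cols)] for r in range(rows)]
    x = [[0] * cols for _ in range(rows)]
    for _ in range(steps):
        y = [[output(v) for v in row] for row in x]
        x = [[saturate(x[r][c] + ((-x[r][c]
                                   + neighbourhood_sum(A, y, r, c, -ONE)
                                   + bias[r][c]) >> STEP_SHIFT))
              for c in range(cols)] for r in range(rows)]
    return [[1 if output(v) > 0 else -1 for v in row] for row in x]

# 8 x 16 = 128 cells: a 3x3 black (+1) square on a white (-1) background.
image = [[1 if 2 <= r <= 4 and 5 <= c <= 7 else -1 for c in range(16)]
         for r in range(8)]
edges = run_cnn(image)
```

Because the template entries are integers, the products stay on the same fixed-point scale without rescaling; fractional coefficients would need an extra shift after each multiplication. In this example the interior pixel of the square turns white while its eight border pixels stay black.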
14:40-15:00 Implementing Time-Derivative CNNs on a Xilinx Spartan FPGA
Jordi Albo-Canals, Giovanni Pazienza
Abstract—Time-Derivative CNNs (TDCNNs) have recently been proposed as a novel paradigm realizing spatiotemporal transfer functions for linear filtering. Their dynamics are usually simulated with SIMULINK because VLSI chips are still in a preliminary phase. In order to make TDCNNs available to a larger audience, we present here their implementation on a Xilinx Spartan-6 FPGA. The results for an 8 × 8 network are promising and consistent with the software simulations.

15:00-15:20 Nonlinear Spatio-Temporal Wave Computing for Real-Time Applications on GPU
Mehmet Tükel, Ramazan Yeniçeri, Mustak Yalcin
Abstract—In this work, active wave simulation on a Cellular Nonlinear Network was computed for path planning on the GPU of an NVIDIA GTX275 video card. On the software side, QtOpenCL, a wrapper library for OpenCL, was used to make the code portable across systems with different GPUs. We achieved promising results compared with those obtained on both CPU and FPGA. We have implemented different hardware and software solutions to the path planning problem for 2-D media in real time. They were almost at the limit of real-time requirements because of bottlenecks such as low communication bandwidth and low network resolution. In this work, by utilizing GPUs, we performed 60000 iterations per second for the simulation of a 128 × 128 node network, whereas we achieved at most 35 iterations per second in software on an Intel Core 2 Duo P8700 processor. We also achieved 36 iterations per second for 3-D active wave simulation of a 256 × 256 × 256 network on the GPU.

15:20-15:40 Visual Learning with Cellular Neural Networks
Alexey Badalov, Xavier Vilasís-Cardona, Jordi Albo-Canals
Abstract—Reinforcement learning is a powerful tool for teaching robotic agents to perform tasks in real environments.
Visual information provided by a camera could be a cheap and rich source of information about an agent's surroundings, if this information were represented in a compact and generalizable form. We turn to cellular neural networks as the means of transforming visual input into a representation suitable for reinforcement learning. We investigate a CNN-based image processing algorithm and describe a method for efficiently computing CNNs using the DirectX 10 API.

Visual Navigation and Collision Avoidance [SS 5]
Chair: Ákos Zarándy
Time: Friday 31, August - 14:00-15:40 Room B
________________________________________________________________________

14:00-14:20 A New CNN Based Path Planning Algorithm Improved by the Doppler Effect
Ramazan Yeniceri, Mustak Erhan Yalcin
Abstract—Many papers on path planning and navigation using Cellular Neural/Nonlinear Networks (CNN) can be found in the literature. A high proportion of these works build on the wave processing feature of CNNs. This paper proposes a special condition of a known Cellular Nonlinear Network model which makes the network well suited to producing nested and repetitive travelling waves. The Doppler effect appears as a corollary of this special condition. The main contribution of the Doppler effect to CNN-based path planning applications is the opportunity to adjust the tracker's speed, or to change the route completely, depending on the target's motion. In this way, the paper adds a new capability to CNN-based wave computing techniques by putting the wave source's motion to use.

14:20-14:40 Azimuth Estimation of Distant, Approaching Airplane in See-and-Avoid Systems
Tamas Zsedrovits, Akos Zarandy, Balint Vanek, Tamas Peni, Jozsef Bokor, Tamas Roska
Abstract—The sense-and-avoid problem based on visual detection is increasingly important as UAVs get closer to entering the airspace, either remotely piloted or autonomously.
It is critical to gain as much information as possible from the silhouettes of distant aircraft. In our paper, we investigate the achievable accuracy of the orientation information of remote planes under different geometrical conditions, by identifying their wing lines from their detected wingtips. Under the assumption that the remote airplane is on a straight course, the error due to spatial discretization (pixelization) and the automatic detection error are calculated.

14:40-15:00 Visual Sense-and-Avoid System for UAVs
Akos Zarandy, Tamas Zsedrovits, Zoltan Nagy, Andras Kiss, Tamas Roska
Abstract—A visual sense-and-avoid system is introduced in this paper. The system is designed to operate on small and medium-sized UAVs, and to be able to detect and avoid small manned and unmanned aircraft. Intruder detection is performed on a 4650 × 1280 video stream, which is processed in real time by a many-core cellular processor array.

15:00-15:20 Bio-Inspired Looming Direction Detection Method
Tamás Fülöp, Ákos Zarándy
Abstract—The retina-inspired approaching-object detection algorithm, based on the recently identified Pvlab-5 ganglion cell, is a computationally cheap, segmentation-free method. The original method can detect only dark looming objects against a bright background. This paper presents a modified algorithm, which can detect any looming or receding object against a dark or bright background. Moreover, we present a post-processing evaluation method, which can measure the lateral motion direction using the spatiotemporal activities of the ganglion cells without any costly computation.

15:20-15:40 On the Potential of Current CNN Cameras for Industrial Surface Inspection
Andreas Blug, Peter Strohm, Daniel Carl, Heinrich Höfler, Bernhard Blug, Andreas Kailer
Abstract—An important issue in industrial quality control is the inspection of rapidly moving surfaces for small defects such as scratches, dents, grooves, or chatter marks.
This paper investigates the potential of the EyeRIS 1.3 camera, a state-of-the-art camera based on “cellular neural networks” (CNN), for this application in comparison with conventional image processing systems. Based on experimental data from an aluminum wire drawing process, where defects with a lateral size of 100 μm have to be detected at feeding rates of 10 m/s, the potential specifications for other surface inspection applications are estimated. Using the relation between the lateral defect size and the feeding rate as a figure of merit, the CNN-based system outperforms conventional image processing systems by an order of magnitude in this particular application. In general, the lighting system limits performance at smaller defect sizes, and computational power limits it at larger defect sizes and fields of view.

Theoretical Advances of CNNs [RS 4]
Chair: Mauro Forti
Time: Friday 31, August - 15:40-17:20 Room A
________________________________________________________________________

15:40-16:00 Monotonicity of Semiflows Generated by Cooperative Delayed Full-Range CNNs
Mauro Di Marco, Mauro Forti, Massimo Grazzini, Luca Pancioni
Abstract—The paper considers the full-range (FR) model of cellular neural networks (CNNs) with ideal hard-limiter nonlinearities that limit the allowable range of the neuron state variables. It is also supposed that there is a concentrated delay (D) in the neuron interconnections. Due to the presence of multivalued nonlinearities, the D-FRCNN model is mathematically described by a retarded differential inclusion. The main result is a rigorous proof that, in the case of nonsymmetric cooperative (nonnegative) interconnections and delayed interconnections, the semiflow generated by D-FRCNNs is monotone, and that monotonicity implies some basic restrictions on the long-term behavior of the solutions. The result is compared with recent results in the literature on semiflows generated by cooperative standard CNNs, with and without delays.
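For readers unfamiliar with the term, the monotonicity property named in the abstract above can be stated as follows. This is the usual textbook definition for systems with delay, written in assumed notation (delay τ, solution segment x_t, initial functions φ and ψ); the paper's own notation and state space may differ.

```latex
% Assumed notation: \tau > 0 is the delay, and x_t(\phi) denotes the solution
% segment at time t that starts from the initial function \phi.
\[
  \phi \le \psi
  \;\Longrightarrow\;
  x_t(\phi) \le x_t(\psi) \quad \text{for all } t \ge 0,
\]
\[
  \text{where } \phi \le \psi \text{ means } \phi_i(\theta) \le \psi_i(\theta)
  \text{ for every neuron } i \text{ and every } \theta \in [-\tau, 0].
\]
```

In words, solutions that start in order stay in order, which is what rules out many kinds of complicated long-term behavior.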
16:00-16:20 An Experimental Study on Long Transient Oscillations in Cooperative CNN Rings
Mauro Forti, Barnabas Garay, Miklos Koller, Luca Pancioni
Abstract—The paper considers a class of one-dimensional circular standard cellular neural network (CNN) arrays with a typical three-segment piecewise-linear activation and two-sided cooperative (positive) interactions (a cooperative CNN ring). Numerical simulations show that, in a wide range of interconnection parameters and for a wide set of initial conditions, the solutions of a cooperative CNN ring display unexpectedly long oscillations, lasting even hundreds of cycles, before they eventually converge toward an equilibrium point. The goal of this paper is to confirm the presence of such long-transient oscillations through laboratory experiments on a simple discrete-component prototype of a cooperative CNN ring with 16 cells, and to analyze some of their salient features. Analytical results are also provided to support the numerical and experimental findings.

16:20-16:40 Image Representation by Means of CNN Dynamics
Tang Tang, Ronald Tetzlaff
Abstract—Owing to their nonlinear dynamics, Cellular Nonlinear Networks (CNN) are considered to be powerful tools for many image processing applications. In this paper we investigate the feasibility of image representation using CNN dynamics.

16:40-17:00 Phase Model Reduction for Oscillatory Networks Subject to Stochastic Inputs
Michele Bonnin, Fernando Corinto, Valentina Lanza
Abstract—Oscillatory networks represent a circuit architecture for image and information processing that can be used to realize associative and dynamic memories. Phase noise is often a key limiting factor for the performance of oscillatory networks. Phase models are the ideal framework for investigating phase noise effects in nonlinear oscillators.
Classical phase models lead to the conclusion that, in the presence of random disturbances such as white noise, the phase noise problem is simply a diffusion process. In this paper we develop a reduced-order model for phase noise analysis in nonlinear oscillators. We derive a reduced Fokker–Planck equation for the phase variable and the corresponding reduced phase equations. We show that the phase noise problem is a convection–diffusion process, proving that white noise produces both phase diffusion and frequency shift.

17:00-17:20 Two Neuron CNN for Hypothesis Testing
Mireia Vinyoles-Serra, Xavier Vilasis-Cardona
Abstract—The two-neuron continuous-time cellular neural network is used to define a statistic in the classical hypothesis testing problem. The proposal is based on a generalisation of the linear Fisher discriminant. The procedure for setting the cellular neural network parameters is described, and the performance is shown on two examples with Gaussian-distributed hypotheses. This technique might also be applied to probabilistic classification problems or pattern recognition.

Volumetric Imaging Using Numerical Optical Sensing and Imaging Techniques [SS 6]
Chair: Szabolcs Tokes
Time: Friday 31, August - 15:40-17:00 Room B
________________________________________________________________________

15:40-16:00 Advanced Background Elimination in Digital Holographic Microscopy
Laszlo Orzo, Andras Feher, Szabolcs Tokes
Abstract—Background estimation and elimination is an indispensable step of hologram processing. Its application ensures that the fixed-pattern noise caused by deposits, dirt and other impurities in the measuring chamber and the optical system does not contaminate the reconstructed holograms, and it improves the efficiency of object segmentation. It is conventionally done by averaging a large number of holograms with varying objects within the flow-through cell.
Due to possible illumination changes, the background should be updated continuously during the hologram measuring process. Here we introduce an improved background estimation method in which the holographic contributions of the segmented and reconstructed objects are excluded from the running average. The applied segmentation is based on the 3D positions of the objects within the flow-through measuring chamber. The objects can therefore be distinguished from the impurities and deposits, which are customarily located on the walls of the measuring chamber. In this way, a faster, more adaptive background estimation with reduced noise becomes achievable. The applied object segmentation and hologram subtraction methods are also presented. To accelerate the processing of the measured holograms, a parallel computing implementation seems essential. Using stream processors (GPU), we were able to increase the algorithm speed considerably, without perceptible loss of reconstruction accuracy.

16:00-16:20 Afocal Digital Holographic Microscopy and its Advantages
Szabolcs Tokes, Laszlo Orzo
Abstract—Applying afocal optical systems in microscopy, especially in digital holographic microscopy (DHM), has several advantages. We have investigated some possible implementations both theoretically and experimentally. The space-bandwidth product of an afocal system can exceed that of conventional ones. Afocal systems provide higher resolution and much less distortion. Furthermore, the computational cost of the numerical reconstruction and correction phase is also lower with an afocal optical setup, as it ensures constant lateral magnification within the whole measured volume. We show that the advantage of low distortion is especially pronounced in the case of color DHM. A GPU implementation of the reconstruction software is demonstrated.
16:20-16:40 Study on Application of Reference Conjugated Hologram for Aberration Correction of Multiple Object Planes
Benedek Nagy, Szabolcs Tokes
Abstract—Aberration correction using the Reference Conjugated Hologram (RCH) method is investigated. We use it not for a single reconstructed object plane but for several in Digital Holographic Microscopy (DHM). We built an off-axis DHM to test the performance of the method, and its limits have been studied. We compare in-line DHM with aberration-compensated off-axis DHM. The in-line DHM physically compensates much the same aberrations as the RCH method compensates numerically.

16:40-17:00 Self-Referenced Digital Holographic Microscopy
Márton Kiss, Zoltán Göröcs, Szabolcs Tőkés
Abstract—By developing a self-referenced digital holographic microscope, it becomes possible to record holograms and numerically reconstruct volumetric images of low-coherence fluorescent objects such as (auto)fluorescent biological samples (e.g. algae). Our goal was to develop and construct a simple, compact, portable device. In contrast to common holographic approaches, where there is a conventional reference beam, here a reference beam must be produced together with the object beam from the same fluorescent source, by imaging it through two separate optical paths (with near-zero path length difference) to obtain interference fringes. This interference forms separate holograms of all the point sources. The waves coming from the separate sources are mutually incoherent but have an inherently short coherence length. Initially, we tested the self-referenced digital holographic microscope setup with test objects illuminated by an LED light source whose spectral bandwidth is similar to that of fluorescent sources such as chlorophyll. Digital reconstruction of the measured holograms requires considerable processing. To accelerate the hologram processing, a parallel implementation seems essential.
Using GPUs, we were able to increase the algorithm's speed considerably without loss of reconstruction accuracy.
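The holography abstracts in this session do not name their reconstruction algorithm. As an illustration of why numerical reconstruction is costly and parallelises well, the sketch below uses the angular spectrum method, a common choice for refocusing a sampled complex field to another plane. It is an assumption for illustration only, not the authors' method; it runs on the CPU with NumPy, and the function name and all parameter values are hypothetical.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pixel_pitch, distance):
    """Propagate a sampled complex field by `distance` along the optical axis.

    All lengths share one unit (metres here). Evanescent components are
    suppressed rather than amplified.
    """
    rows, cols = field.shape
    fy = np.fft.fftfreq(rows, d=pixel_pitch)
    fx = np.fft.fftfreq(cols, d=pixel_pitch)
    fxx, fyy = np.meshgrid(fx, fy)  # both of shape (rows, cols)
    # Squared axial spatial frequency; non-positive values do not propagate.
    fz_sq = 1.0 / wavelength**2 - fxx**2 - fyy**2
    propagating = fz_sq > 0
    fz = np.sqrt(np.where(propagating, fz_sq, 0.0))
    transfer = np.where(propagating, np.exp(2j * np.pi * distance * fz), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


# Example: refocus a smooth test field by 1 mm, then back again.
n = 64
coords = (np.arange(n) - n / 2) * 5e-6          # 5 um pixel pitch (assumed)
xx, yy = np.meshgrid(coords, coords)
field = np.exp(-(xx**2 + yy**2) / (50e-6) ** 2).astype(complex)
forward = angular_spectrum_propagate(field, 0.5e-6, 5e-6, 1e-3)
restored = angular_spectrum_propagate(forward, 0.5e-6, 5e-6, -1e-3)
```

Each reconstructed plane costs two 2-D FFTs and one elementwise multiplication, and a volumetric image needs one such pass per depth, which is the kind of workload that maps naturally onto GPUs.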