Enshu3 (Practice for 3rd Grades)
How to Use Xeon Phi on knf3-5

Background: Architecture
- Current configuration
  - Multi-core host with many-core co-processor
  - Connected through PCI Express
  - Co-processor (e.g., Xeon Phi):
    - Large number of power-efficient but lower-performance CPU cores
    - Limited on-board memory and smaller caches
- Future architecture direction
  - Standalone many-core unit, possibly with heterogeneous CPU cores

Interface for Heterogeneous Kernels (IHK)
- A low-level software layer for rapid OS prototyping
- Main components:
  - IHK-Host
  - IHK-Many-core
  - IHK-IKC (Inter-Kernel Communication)
  - mcKernel (Lightweight Kernel for Many-Cores)

IHK-Host
- Implemented as a Linux kernel module
  - Through ioctl() from userspace
- Provides interface to
  - Initialize/manage co-processor(s)
  - Create/destroy OS instances
  - Bind resources to OS instances
    - CPU cores
    - Physical memory
  - Bootstrap/shutdown OS instances
  - Map device memory to host / access device memory
  - Interrupt device CPUs
  - Drive DMA engines

IHK-Manycore
- Sort of a hardware abstraction layer
- Provides a standard interface to
  - Map host memory to device / access host memory
  - Interrupt host CPUs
  - Drive DMA engines
- Implemented as a library that is linked to the manycore kernel

Inter-Kernel Communication (IKC)
- Provides asynchronous messaging facility for kernels running over IHK
  - Listen/accept/connect semantics
  - Callback notification or poll-based message reception
- Implemented as a pair of send/receive queues
- IHK provides a master IKC channel that is used to
  - Establish and manage other IKC channels
- IKC is used for syscall offloading in mcKernel

mcKernel: Lightweight Kernel for Many Cores
- A lightweight kernel developed from scratch over IHK
- Goals:
  - Small memory and cache footprint
  - Scalable kernel data structures (e.g., partially separated page tables)
- Maintains Linux ABI
- Only necessary services:
  - Memory management
  - Processes / Threading
  - Simple system calls
    executed on the co-processor
    - clone(), mmap(), etc.
  - Complex system calls are offloaded to the host kernel
    - Such as file I/O
- Support for futex/pthreads/OpenMP
- Hierarchical memory management
- Infiniband MPI integration in progress

Set up ssh_config for gower.il.is.s.u-tokyo.ac.jp
- Add this to your /etc/ssh_config

    Host gower
        HostName %h.il.is.s.u-tokyo.ac.jp
        Port 10298

    Host kncclogin
        ProxyCommand ssh 133.11.233.10 -p 10298 -W 133.11.249.234:%p

    # add any host you'd like to use
    Host knf3 knf4 knf5
        # Use gower as a proxy (for OpenSSH 5.4p and later)
        ProxyCommand ssh 133.11.233.10 -p 10298 -W %h:%p
        # Use below for OpenSSH clients older than 5.4p
        # ('ssh -V' to check)
        # ProxyCommand ssh gower nc %h %p

  (http://www.il.is.s.u-tokyo.ac.jp/~bgerofi/ssh_config)
- We have 3 Xeon Phi machines for Enshu3:
  - knf3, knf4, knf5
- Now you can log in just by typing:
  - ssh knf3 (or knf4, knf5)

McKernel Git Setup
- Create a directory for your sources
- Clone the repositories from my (bgerofi) directory:
  - git clone /home/bgerofi/Code/enshu3/ihk/
  - git clone /home/bgerofi/Code/enshu3/mckernel/
  - git clone /home/bgerofi/Code/enshu3/glibc/
- Copy the rebuild and reload scripts:
  - cp /home/bgerofi/Code/enshu3/rebuild.sh ./
  - cp /home/bgerofi/Code/enshu3/reload_ihk_modules.sh ./

Compile IHK and McKernel
- Set up Intel ICC compiler environment:
  - . /opt/intel/bin/compilervars.sh intel64
  - There is a space after the dot!!
- Compile kernel sources:
  - ./rebuild.sh
- Compile userland programs:
  - icc -Wall -L/home/bgerofi/Code/libs-stat/ -mmic -static -pthread source.c -o objname

Reload and boot kernel
- Reload the IHK and boot the kernel
  - ./reload_ihk_modules.sh -w (press ENTER at the end)
- Display kernel log on KNC (McKernel's log)
  - ./ihk/linux/user/ihkostest 0 kmsg
  - (This is currently the best way to debug)
- Execute user app:
  - ./mckernel/executer/user/mcexec /path-to-app/app
- Hello world example:
  - ./mckernel/executer/user/mcexec /home/bgerofi/Code/hello/hellointel

Source tree (shown in editor)