MeerKAT Control and Monitoring (CAM)

Lize van den Heever, CAM Subsystem Manager, lize@ska.ac.za
Paul Swart, Senior Software Engineer, paul@ska.ac.za
MeerKAT Phases/Specifications
v  64 x 13.5m offset Gregorian dishes
o  1mm rms surface
o  15 arcsec pointing accuracy (with approx 5 arcsec tracking consistency)
v  Frequency range 0.59 – 14.5 GHz
v  65k freq channels (spread over 4 sub-bands)
v  L-band sensitivity: Ae/Tsys = 220 m^2/K
                              Phase 1      Phase 2
Est. completion               2016         2018
Frequency bands (GHz)         1.0 - 1.7    0.59 - 1.1, 8 - 14.5
RF bandwidth (MHz)            850          6500
Sampling frequency (Gsps)     5            30
Processed bandwidth (MHz)     850          6500
Max baseline (km)             8            8
Offset Gregorian Dish – prelim design
MeerKAT Project Status
u  3 centres:
v  JHB (some “business” functions, infrastructure and site bid)
v  Cape Town (operations control centre, engineering and science)
v  Karoo (telescope site)
u  About 100 people employed directly on the project currently (growing)
u  MeerKAT (and SKA SA) site operational after major infrastructure development
u  Site on grid power (with diesel backup) with 10Gb fibre connection to Cape Town
u  KAT-7 engineering and science test-bed fully deployed on site
(7 prime focus composite dishes)
u  KAT-7 turn-o program of commissioning operations
u  Continued strong political support
u  Good momentum!
KAT-7
KAT-7 – HartRAO Baseline
KAT-7
KAT-7 our playground
MeerKAT Political Support
Weekly flight to Karoo site
Cape Town office
Karoo infrastructure
KAT-7 Composite Dishes
Feeds, Receivers & Electronics
Cold Feed Installation
Digital Signal Processing
KAT-7 Correlator (16-element)
KAT-7
KAT-7 Results
KAT-7 Early Fringes (2009)
KAT-7 Cen A (2010)
4 dishes, warm feeds (figure annotation: Moon size)
KAT-7 Cen A (2011)
7 cold feeds
KAT-7 PKS1610-60.5 (2011)
MeerKAT
MeerKAT Schedule
u  MeerKAT System PDR
v  Very successful PDR completed in July 2011
v  Strong international panel approval for MeerKAT system design
u  MeerKAT CAM Requirements memo and interface guideline
v  First draft available
- was supporting documentation for MeerKAT System PDR
v  Guidelines for communication with Ethernet Devices (katcp protocol)
- MeerKAT version will be ready by Dec 2011
u  MeerKAT CAM Architecture description
v  To be ready for external review towards 2012
v  Including updated requirements and signed off interfaces
u  MeerKAT Critical and Major milestones
v  MeerKAT Science RFP Selection
v  MeerKAT System Concept Design Review (CoDR)
v  MeerKAT System PDR
v  MeerKAT Receptor 1 Qualification complete
v  Array release 1: Antenna 2 - 5 Array Commissioning
v  Array release 2: Antenna 6- 32 Start of early science
v  Array release 3: Antenna 33-64 Full MeerKAT Array
v  MeerKAT (Phase 1) Handover to Science Operations
Mar 2010 *
Jul 2010 *
July 2011 *
Jun 2014
Dec 2014
Dec 2015
Dec 2016
Jul 2017
MeerKAT CAM Scope
MeerKAT CAM Overview
MeerKAT CAM components
u  Configuration and management components
v  kat conf, kat controller, kat nodemanager
u  Communications framework - Device proxies & device controllers
v  Core CAM access layer for all hardware and devices on standard protocol (katcp)
v  Protect hardware from direct access and expose device monitoring points, commands and
logs
u  Monitoring components
v  kat store, kat monitors, kat aware, kat logger
u  Control components
v  kat scheduler, kat executor, kat subarray manager, kat controller
u  User Interface components (on site and in Cape Town)
v  kat core & ui libraries, kat portals, user interfaces, archive access tools
u  Planning Tools
v  Proposal Management Tool and Observation Planning Tool
MeerKAT CAM Numbers
u  KAT-7 monitoring (sensors only, excluding logs and alarms)
v  2 CAM servers with 10-20 processes each
v  ± 260 sensors per antenna (x 7 antennas)
v  ± 4500 total sensors
v  ± 300 sensors sampled at order ms rate
v  ± 100 sensors sampled at order second rate
v  Rest sampled with default rate of 10s or event (depending on sensor type)
but can be configured from ms up to minutes
v  ± 650 samples per second
v  ± 16GB per hour (compressed) initial storage, decimation over time being implemented
u  MeerKAT monitoring (estimates)
v  6 CAM servers with 10-30 processes each
v  ± 500 per antenna (x 64 antennas)
v  ± 100 000 total sensors
v  ± 2500 sensors sampled at order ms rate
v  ± 100 sensors sampled at order second rate
v  Rest sampled with default rate of 10s and event
but can be configured from ms up to minutes
v  ± 12000 samples per second
v  ± ¾ - 1 TB per hour (compressed) initial storage, decimated over time
MeerKAT CAM design principles
u  Some of the core design principles and implementation decisions:
v  Use of TCP/IP over Ethernet as a field-bus as far as possible
v  Standardized communications everywhere!
- standardizing device/component interfacing over a well-defined protocol
- standardizing device/component behaviour
- heterogeneous devices/components and diversity – handle this at as low a level as possible
v  Soft real-time as far as possible
- no time critical control loops in CAM software
- real-time control decentralized down to devices
v  Incremental development and continuous deployment
v  Verify technology decisions through pilot projects and prototyping
MeerKAT CAM concepts/lessons
u  Standardized communications protocol (katcp) is core!!!
- katcp-python publicly released on pypi & used by others like CASPER collaboration
- katcp controllers delivered by subcontractors or developed in-house
u  Standardizes
- startup behaviour and handshaking
- version, build state and serial nr reporting
- fault codes and failure reporting
- types, statuses and reporting behaviour of monitoring points
- types of commands and exception behaviour of commands
- logging platform and behaviour
u  Supports
- all levels support multiple connections and flexible sampling/update rates (strategies)
- low-level direct control and monitoring on all levels (including devices) over TCP/IP
- introspection for connected devices, their monitoring points and control commands
- direct telnet connection on any level, down to hardware devices, for troubleshooting
- even alarms, aggregate sensors, failure codes/messages and some parts of configuration are
exposed on katcp as sensors
u  Dynamic discovery / introspection
- fluid in-time detection of system through introspection of monitoring points and commands,
down to device level
- monitoring points include details like unit of measure, absolute ranges, min/max values
- introspection of commands includes help and examples
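To make the idea of a telnet-friendly text protocol concrete, here is a minimal sketch of parsing a katcp-style line. It assumes the convention in the public katcp guidelines that a message is one line of text starting with `?` (request), `!` (reply) or `#` (inform), followed by a name and whitespace-separated arguments. Escape sequences and message identifiers are deliberately left out, the `Message` class is not the katcp-python API, and the sensor name in the example is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Message type prefixes, as assumed from the public katcp guidelines.
TYPES = {"?": "request", "!": "reply", "#": "inform"}


@dataclass
class Message:
    mtype: str                      # "request", "reply" or "inform"
    name: str                       # e.g. "sensor-sampling"
    args: List[str] = field(default_factory=list)


def parse_line(line: str) -> Message:
    """Parse one katcp-style line (simplified: no escapes, no message ids)."""
    line = line.strip()
    if not line or line[0] not in TYPES:
        raise ValueError(f"not a katcp-style message: {line!r}")
    parts = line[1:].split()
    if not parts:
        raise ValueError("message has no name")
    return Message(TYPES[line[0]], parts[0], parts[1:])


def format_line(msg: Message) -> str:
    """Inverse of parse_line for messages without spaces in their arguments."""
    prefix = {kind: char for char, kind in TYPES.items()}[msg.mtype]
    return " ".join([prefix + msg.name] + msg.args)


# A client asking for periodic updates of a (hypothetical) sensor, and the
# kind of reply a device might send back.
request = parse_line("?sensor-sampling ant1.windspeed period 1000")
reply = parse_line("!sensor-sampling ok ant1.windspeed period 1000")

print(request)
print(reply.args[0])
```

Because every message is a readable line, the same exchange can be typed by hand over a telnet session, which is what makes troubleshooting at device level practical.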
MeerKAT CAM concepts/lessons
u  Device control through Proxy layer
v  protect access to devices
v  consistent M&C layer across all hardware devices and software components
v  katcp as close to the hardware as possible in all cases (wrap modbus, OPC, ganglia, etc
in katcp controllers
u  Low-level / command line control through libraries & scripting interface
v  Powerful core and client libraries in Python serve both interactive users and other system
processes with various access levels
v  Built-in support for exposing monitor points and commands found through introspection
and auto discovery
v  A powerful python package, interactive user shell through iPython, as well as interface to
other components in the system
v  Adapting to system configuration - connect to what is defined in the configuration
v  iPython scripting: flexible, powerful, scalable, expandable – interactive user shell and also
level and component interface
u  Remote operations
v  Designed in from the start with control room in Cape Town
v  Most GUIs web based, portal in Karoo and in Cape Town
MeerKAT CAM concepts/lessons
u  Fully simulated system
v  Fully simulated system up to hardware devices and device controllers
v  Concurrently running a mix of simulated and real hardware devices
v  Allow full software development, unit and integration testing without dependency on
availability of hardware
v  Regression testing and continuous build server use simulated system continuously –
forcing 100% alignment with real world at all times
u  Development process – incremental deployment, maturing over cycles
v  Agile development and continuous incremental deployment of functionality
v  Early initial simple implementations maturing into full fledged functionality
MeerKAT CAM concepts/lessons
u  Homogeneous node/server management
v  Automated deployment to update software / patch fixes and updates on all servers
v  Same suite of software deployed on each node (server) with one startup service, the
nodemanager
v  Single headnode identified as configuration server and controller
v  Headnode coordinates servers and controls all nodes by pushing subset of configured
system to each nodemanager for launching
v  Each nodemanager does consistent reporting on running processes
v  All node control (start/restart/stop/halt/powerdown) through nodemanagers
v  Looking at doing some of this through VMs in future.
u  Adaptive system configuration
v  Adaptive systems and flexible configuration is required to support integration and
incremental rollout.
v  Any combination of real and simulated devices supported in any configuration
v  Multiple configurations available for karoo, atp, lab, development, simulated systems, …
v  Powerful and flexible system configuration in human readable text files to support
integration and incremental rollout
v  System automatically adapts to current configuration (which antennas are available) adapt
connections, health displays, etc
v  Templated for multiples of antennas
MeerKAT CAM concepts/lessons
u  Scalability
v  Hierarchical and distributed monitoring for scalability
- Prevent bottlenecks through design
- Consistent monitoring and rolled-up reporting on all levels (e.g. comms.ok, sensors.ok,
unit.ok, all.ok), providing a single point to check and drill down on error
- Consistent failure codes and failure reporting on all levels (i.e. failure codes and msgs)
- Consistent logging on all levels
- Support for multiple clients with different sampling rates (sensor strategies)
- Aggregate sensors to collate information across multiple sensors into a single monitoring
point
v  Distributed monitoring
- Monitoring per node, multiple monitor components collecting distributed information
- Gathers monitoring points locally and stores them centrally
- KAT-7 and MeerKAT write to the central monitor store over a network mount
v  Avoid network traffic bottlenecks
- Antenna clusters
- Distributed components
- Hierarchical reporting rolled up from the bottom layers
v  Design for archive retrieval
- Adapt design to support optimized retrieval performance
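The rolled-up reporting described above (comms.ok, sensors.ok, unit.ok, all.ok) can be sketched as a small tree in which each level combines its own status with that of its children. The exact meaning of each flag is an assumption here: `unit_ok` is taken to be the health of one unit on its own, and `all_ok` the health of that unit plus everything below it. Component names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    comms_ok: bool = True       # can we talk to this unit?
    sensors_ok: bool = True     # are its own sensors in nominal state?
    children: List["Component"] = field(default_factory=list)

    def unit_ok(self) -> bool:
        """Health of this unit alone (assumed: comms and sensors both fine)."""
        return self.comms_ok and self.sensors_ok

    def all_ok(self) -> bool:
        """Rolled-up health: this unit and everything beneath it."""
        return self.unit_ok() and all(child.all_ok() for child in self.children)

    def failing(self) -> List[str]:
        """Drill down: names of the units whose own status is bad."""
        bad = [] if self.unit_ok() else [self.name]
        for child in self.children:
            bad.extend(child.failing())
        return bad


# A two-antenna array in which one receiver reports a sensor problem.
array = Component("array", children=[
    Component("ant1", children=[Component("ant1.rx"), Component("ant1.drive")]),
    Component("ant2", children=[Component("ant2.rx", sensors_ok=False),
                                Component("ant2.drive")]),
])

print("array all.ok:", array.all_ok())
print("failing units:", array.failing())
```

A client only needs to watch the single top-level flag and walks down the tree when it turns false, which keeps traffic low compared with subscribing to every sensor.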
MeerKAT CAM development approach
u  KAT-7 CAM
v  Support for hardware integration, commissioning and low-level direct control over all
components
v  “Productionized” the M&C architecture – reworking, rewriting, expanding, enhancing;
incorporated learning from XDM, PED & Fringe Finder
v  Built a solid robust framework for MeerKAT CAM and prove underlying design concepts of
the CAM architecture
v  Verified architectural components in terms of scalability, robustness, efficiency,
appropriateness for MeerKAT on KAT-7 (and simulated KAT-64)
v  Performed a KAT-64 simulation of current monitoring architecture
u  Towards MeerKAT CAM
v  Continue with agile approach and incremental deployment
v  No re-write or start of MeerKAT CAM
v  Expand KAT-7 CAM subsystem into MeerKAT CAM
MeerKAT SW Development Process
u  System Engineering
v  Not too diligent with SE process on KAT-7, struggling now because of it
v  MeerKAT following a much better (but still streamlined) SE process – already paying off
v  SE investment pays off later
v  Drive early ICDs and consistent requirements across subsystems / components
u  Light-weight iterative process
v  Iterative approach and incremental deployment
v  Initial early implementations with continuous incremental deployment to mature
functionality
v  On-line documentation – part of code base in subversion
v  Specification record (per component, functional area) – analyse and gather requirements,
describe understanding in text format, review in the team with SE, commissioners and
CAM and SP team - part of on-line documentation
v  Design record (per component) – a guide to the code, describe architecture, audience:
new team members, engineers, commissioners and operators wanting to know a bit more
– part of on-line documentation
v  Test driven development, continuous build server and integrated testing
v  Develop against the full simulated system (even before the hardware / component is
ready)
Towards the SKA
u  Summary – considerations for SKA:
v  Standardized device/component protocol – absolute must
v  Heterogeneous devices/components and diversity – handle this as low as possible
v  Fully simulated system – absolute must
v  Client library & scripting interface (with discovery and introspection) for full low-level
control to support early engineering
v  Development Process – initial early implementations with continuous incremental
deployment to mature functionality
v  Homogeneous node management and deployment
v  Adaptive system configuration – design for it
v  Scalability – hierarchical status reporting & distributed monitoring, synchronised
distributed control, archive retrieval
v  Get involved with ICALEPCS conference (the International Conference on Accelerator and
Large Experimental Physics Control Systems)
and tap into an impressive body of knowledge with a Monitoring and Control focus
www.icalepcs.org http://icalepcs2011.esrf.eu/
What we have: KAT-7 CAM
u  Covers core requirements of MeerKAT control and monitoring:
v  full low-level control and monitoring functionality with engineering interfaces
v  some operational control and monitoring functionality and interfaces
v  reasonable support for remote operations (including alarms and SMS notifications)
v  archive access and browsers
v  various engineering and commissioning displays, not yet developed GUIs for operators
v  full manual control & scripting, no scheduling or subarraying yet
u  Succeeded in establishing:
v  the benefits of a standardized communications protocol (katcp) on all levels
v  a flexible and adaptive system configuration through introspection to support engineering,
commissioning and incremental roll-out
v  a fully simulated system up to hardware devices and device controllers, running
concurrently with real hardware devices
v  an interactive user shell using iPython through a powerful command line user library
v  a solid CAM framework that is robust and tested; and ready for expansion
v  agile process with continuous incremental deployment
To do for MeerKAT CAM
u  Operational control of array
v  Features required by an operational instrument and better support for remote operations
v  Specification and implementation of the kat subarray manager
u  Specification and implementation of the observation framework
v  including a simple scheduler, task executor, authorization & authentication, noting data
products, operator logs and observation reports, etc
u  Proposal Management Tool and Observation Planning Tool
v  Hope to adopt/adapt existing tools used by other telescopes for these / or parts of these
u  Scalable User Interfaces
v  GUIs for operators and scientists
v  Enhanced client library, access levels and user interfaces
v  Working with HCI experts from universities for design inputs
Some M&C challenges for SKA
u  Standardized protocol
§  Specify it really early and get it right !!!
§  For hardware, devices, software components
§  Standardize interfacing AND behaviour
u  Scalability
§  Obviously in processes, architectural components, networking
§  But also things like user interfaces, archive retrieval
§  Managing heterogeneous devices/components and diversity - as low as possible
§  Hierarchical components - Design layers of aggregation
§  Each level does local monitoring and reports rolled-up status up the hierarchy
u  Identifying Scope
§  Specifying the scope of M&C carefully
§  Interfacing to other S&C components & feedback loops required from signal path
§  Can limit the numbers of M&C by specifying down to sub-components – e.g. allowed nr of
sensors, specify logging and exception behaviour, specify interfaces and common
behaviour
u  Distributed monitoring
§  Distributed monitor store
§  Central API for accessing historical data
§  Monitoring points – how many, how often, how to split storage, careful design of access to
distributed storage
u  Distributed control
§  Design for synchronized distributed execution
§  Common API for control and state feedback
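The pairing of a distributed monitor store with a central API for historical data can be sketched as several local stores behind one query function that merges their results in time order. This is only an illustration of the pattern: the class and function names are hypothetical, samples are assumed to be appended in time order, and how storage is actually split is exactly the open design question raised above.

```python
import heapq


class NodeStore:
    """Local store on one node: sensor name -> list of (timestamp, value)."""

    def __init__(self):
        self._data = {}

    def append(self, sensor, timestamp, value):
        # Assumes samples arrive in increasing timestamp order per sensor.
        self._data.setdefault(sensor, []).append((timestamp, value))

    def query(self, sensor, start, end):
        """Samples with start <= timestamp < end, in time order."""
        return [(t, v) for t, v in self._data.get(sensor, []) if start <= t < end]


def query_all(stores, sensor, start, end):
    """Central API: ask every store and merge the sorted partial results."""
    partials = [store.query(sensor, start, end) for store in stores]
    return list(heapq.merge(*partials, key=lambda sample: sample[0]))


# Two stores that each hold part of the history of the same sensor.
store_a, store_b = NodeStore(), NodeStore()
store_a.append("wind.speed", 1, 10.0)
store_a.append("wind.speed", 3, 30.0)
store_b.append("wind.speed", 2, 20.0)
store_b.append("wind.speed", 4, 40.0)

history = query_all([store_a, store_b], "wind.speed", start=1, end=4)
print(history)
```

The caller never learns which store held which sample, so the split of storage can change without touching archive browsers or analysis scripts.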
Some M&C challenges for SKA
u  Access to archived monitoring data
§  For fault finding, debugging, post mortem and trend analysis
§  Needs careful design
u  No-one has built a telescope the scale of the SKA before
§  Don’t expect to get the requirements right the first time
§  Allow and plan for it
u  Multiple views and M&C flexibility
§  Design generic mechanisms so pockets of M&C can group and roll-up into different views
e.g. health of a subarray, health of a region, health of a specific station
§  Design generic mechanisms, e.g. for rolled-up reporting, drill-down & interrogation
§  Each M&C component to be flexible in slotting into various views/roles, knowing how to
report for different parents, even knowing how to present itself, etc
u  Incremental implementation and roll-out
§  No-one has built a telescope the scale of the SKA before – don’t expect to get the
requirements right the first time
§  Coordinating chunks of functionality between teams
§  Identifying scope and boundaries to fit
§  Timing of delivery between subsystems
§  Feedback loops for user/commissioner inputs to mature initial implementations to prevent
continuous scope creep and uncontrolled refactoring
u  Continuously changing system configuration
§  New receptors rolled out continuously
§  Adapt layers of aggregation
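The "multiple views" point, where the same pockets of M&C roll up into the health of a subarray, a region or a station, can be sketched with one generic grouping function. The component records, label names and health flags below are hypothetical; the point is only that one mechanism serves every view.

```python
from collections import defaultdict

# Hypothetical component records: each carries its own health flag plus the
# label it has in every view it can be grouped into.
COMPONENTS = [
    {"name": "ant1", "ok": True,  "subarray": "sub1", "region": "core"},
    {"name": "ant2", "ok": False, "subarray": "sub1", "region": "core"},
    {"name": "ant3", "ok": True,  "subarray": "sub2", "region": "core"},
    {"name": "ant4", "ok": True,  "subarray": "sub2", "region": "outer"},
]


def view_health(components, view):
    """Roll component health up by the given view (e.g. 'subarray')."""
    groups = defaultdict(list)
    for component in components:
        groups[component[view]].append(component)
    return {
        label: {
            "ok": all(member["ok"] for member in members),
            "failing": [m["name"] for m in members if not m["ok"]],
        }
        for label, members in groups.items()
    }


by_subarray = view_health(COMPONENTS, "subarray")
by_region = view_health(COMPONENTS, "region")
print(by_subarray)
print(by_region)
```

Adding a new view is then a matter of adding a label to each component, and newly rolled-out receptors join existing views as soon as they appear in the list.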
Thank you!
Questions?
lize@ska.ac.za
http://www.ska.ac.za