ExoGENI Federated Private NIaaS Infrastructure
Transcription
ExoGENI Federated Private NIaaS Infrastructure
Chris Heermann
ckh@renci.org

Overview
• ExoGENI architecture and implementation
• ExoGENI science use cases
  – Urgent Computing: Storm Surge Predictions on GENI
  – Science DMZ as a Service: Creating Science Super-Facilities with GENI
• Support for SDN in ExoGENI

IaaS: Clouds and Network Virtualization
• Virtual compute and storage infrastructure: cloud APIs (Amazon EC2, ...)
• Virtual network infrastructure: dynamic circuit APIs (NLR Sherpa, DOE OSCARS, I2 ION, OGF NSI, ...)
• Cloud providers and transport network providers, linked by the Breakable Experimental Network (BEN)

Open Resource Control Architecture (ORCA)
• ORCA is a "wrapper" for off-the-shelf clouds, circuit networks, etc., enabling federated orchestration:
  – Resource brokering
  – VM image distribution
  – Topology embedding
  – Stitching
  – Federated authorization
• Supported by GENI, DOE, NSF SDCI+TC
• http://geni-orca.renci.org
• http://networkedclouds.org
• [Diagram: coordinator/broker (B), service manager (SM), controller, aggregate manager (AM), aggregate]

The APIs
• Simple API, complex description language:
  – createSlice(sliceName, Term, SliceTopology, Credentials), deleteSlice(sliceName) — topology management
  – sliceStatus(sliceName) — debugging
  – modifySlice(sliceName, TopologyUpdate) — elasticity
  – extendSlice(sliceName, NewTerm) — agility
• Description language: NDL-OWL, an OWL-based ontology that describes
  – User side: resource requests
  – Provider side: resource descriptions, public resource advertisements, manifests
• Participating in a US-EU effort to standardize the IaaS ontology
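The slice lifecycle above maps naturally onto code. A minimal sketch, assuming a hypothetical in-memory `OrcaClient` stand-in: only the five call names come from the slide; the class, its storage, and the argument values are invented for illustration.

```python
# Toy stand-in for an ORCA service manager endpoint. Only the method
# names (createSlice, sliceStatus, modifySlice, extendSlice, deleteSlice)
# come from the slide; everything else is illustrative.
class OrcaClient:
    def __init__(self):
        self.slices = {}

    def createSlice(self, sliceName, term, sliceTopology, credentials):
        # Real ORCA takes an NDL-OWL request document here.
        self.slices[sliceName] = {"term": term,
                                  "topology": dict(sliceTopology),
                                  "state": "active"}
        return sliceName

    def sliceStatus(self, sliceName):
        return self.slices[sliceName]["state"]

    def modifySlice(self, sliceName, topologyUpdate):
        self.slices[sliceName]["topology"].update(topologyUpdate)

    def extendSlice(self, sliceName, newTerm):
        self.slices[sliceName]["term"] = newTerm

    def deleteSlice(self, sliceName):
        self.slices[sliceName]["state"] = "closed"

orca = OrcaClient()
orca.createSlice("demo", term=86400,
                 sliceTopology={"vms": 2, "vlans": 1},
                 credentials="x509-cert")
orca.modifySlice("demo", {"vms": 4})     # elasticity
orca.extendSlice("demo", 2 * 86400)      # agility
print(orca.sliceStatus("demo"))          # debugging; prints "active"
orca.deleteSlice("demo")                 # topology management
```

The point of the simple-API/complex-description-language split is that all topology detail lives in the description document, not in the call signatures.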
GENI Federation
• Federated identity
  – InCommon
  – X.509 identity certificates
• Common APIs
  – Aggregate Manager: ExoGENI has a compatibility API layer supporting AM API v2
  – Clearinghouse
• Federated access policies
  – ABAC
• Agreed-upon resource description language
  – RSpec: ExoGENI translates relevant portions from NDL-OWL to RSpec and back as needed
• Several major portions: ExoGENI, InstaGENI, WiMAX, Internet2 AL2S
• Federation with the EU
  – The Amsterdam XO rack was part of an SDX demo at GEC21 with iMinds

Building Network Topologies
• A slice owner may deploy an IP network (e.g., OSPF) into a slice
• OpenFlow-enabled L2 topologies; cloud hosts with network control
• Computed embedding, virtual network exchanges, and virtual colocation: campus networks stitched to the circuit fabric

ExoGENI
• Every Infrastructure as a Service, all connected
  – Substrate may be volunteered or rented
  – E.g., public or private clouds, HPC, instruments, and transport providers
  – Contribution size is dynamically adjustable
• ExoGENI principles:
  – Open substrate
  – Off-the-shelf back-ends (OSCARS, NSI, EC2, etc.)
  – Provider autonomy
  – Federated coordination
  – Dynamic contracts
  – Resource visibility

Current Topology
• [Figure: current ExoGENI deployment topology]

An ExoGENI Cloud "Rack Site"
• 10 worker nodes, each with 2x10 Gbps dataplane links and 4x1 Gbps bonded management/iSCSI storage links
• Management node and sliverable storage behind an OpenFlow-enabled L2 switch
• Management switch uplinks to the campus Layer 3 network
• Dataplane uplink to the dynamic-circuit backbone:
  – Option 1: tunnels — static VLAN tunnels provisioned to the backbone
  – Option 2: fiber uplink — direct L2 peering with the backbone (10/40/100 Gbps)
• Optional dataplane to the campus network for stitchable VLANs

ExoGENI Software Structure
• [Figure: ExoGENI software structure]
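The NDL-OWL-to-RSpec translation mentioned under GENI Federation can be illustrated with a toy request generator. The element names follow the GENI RSpec v3 schema, but the input layout, the helper function, and the `xo.small` sliver-type name are assumptions made for this sketch.

```python
# Toy renderer: a simple node/link topology request emitted as a
# GENI RSpec v3-style request document. Illustrative only; a real
# translator works from an NDL-OWL graph, not a pair of lists.
import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"

def request_to_rspec(nodes, links):
    rspec = ET.Element("rspec", {"type": "request", "xmlns": RSPEC_NS})
    for name in nodes:
        node = ET.SubElement(rspec, "node", {"client_id": name})
        # Sliver type name is an assumption for illustration.
        ET.SubElement(node, "sliver_type", {"name": "xo.small"})
    for a, b in links:
        link = ET.SubElement(rspec, "link", {"client_id": f"{a}-{b}"})
        ET.SubElement(link, "interface_ref", {"client_id": f"{a}:if0"})
        ET.SubElement(link, "interface_ref", {"client_id": f"{b}:if0"})
    return ET.tostring(rspec, encoding="unicode")

xml_doc = request_to_rspec(["vm0", "vm1"], [("vm0", "vm1")])
print(xml_doc)
```

Federation tools consume documents of this shape, while ORCA-native tools work directly with the richer NDL-OWL description; the compatibility layer bridges the two.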
Current Deployments
• xCAT
  – Operator node provisioning
  – User-initiated bare-metal provisioning
• OpenStack Essex++ (RedHat/CentOS version)
  – Custom Quantum plugin to support multiple dataplanes
  – Working on a Juno port
• iSCSI user slivering
  – IBM DS3512 appliance (NetApp iSCSI support in the works)
  – Linux iSCSI stack
• Backend support for LVM, Gluster, ZFS

Tools
• ORCA-native tools (native APIs and resource descriptions)
  – Flukes
  – More flexibility
• Federation tools (federation APIs and resource descriptions)
  – Jacks, omni, jFed
  – Compatibility

Tools (continued)
• [Figure: tool screenshots]

ExoGENI: A Federation of Private Clouds
• Each site is a micro-cloud
  – Adding support for HPC batch schedulers
• Owners decide what portion of their resources to contribute
• Sites are free to continue using native IaaS interfaces
• Sites can take advantage of federated identity and inter-provider orchestration mechanisms
• What is it good for?
  – A foundation for future institutional collaborative science cyberinfrastructure

ExoGENI Science Use Cases

Computing Storm Surge
• ADCIRC storm surge model
  – FEMA-approved for coastal flood insurance studies
  – Very high spatial resolution (millions of triangles)
  – Typically uses 256-1024 cores to run one simulation in real time
• [Figure: ADCIRC grid for coastal North Carolina]

Tackling Uncertainty
• One simulation is NOT enough
• Probabilistic assessment of hurricanes
  – Research ensemble, NSF Hazards SEES project
  – 22 members, Hurricane Floyd (1999)
  – A "few" likely hurricanes
  – Fully dynamic atmosphere (WRF)
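The 22-member ensemble above has to be fanned out across federated compute sites. A toy sketch of that scheduling pattern, assuming a simple round-robin policy; the production workflow uses Pegasus, and all names here are illustrative.

```python
# Round-robin placement of ensemble members onto compute sites, plus a
# collector view that gathers results back. Illustrative only; the real
# ensemble is managed by the Pegasus workflow management system.
from collections import defaultdict

def schedule(members, sites):
    placement = defaultdict(list)
    for i, member in enumerate(members):
        placement[sites[i % len(sites)]].append(member)
    return placement

members = [f"member-{i:02d}" for i in range(22)]   # 22 ensemble members
sites = [f"site-{i}" for i in range(10)]           # 10 compute sites

placement = schedule(members, sites)
# Collector gathers every member's result across all sites.
collected = [m for site in sites for m in placement[site]]

print(len(placement["site-0"]))  # 3: members 00, 10, 20
```

With 22 members over 10 sites, two sites run 3 members and the rest run 2, which matches the slice-topology numbers on the next slides (10 compute sites plus a separate ensemble manager).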
Why GENI?
• Current limitation: real-time demand for compute resources
  – Large demand for real-time compute during storms
  – Not enough demand to dedicate a cluster year-round
• GENI enables
  – Federation of resources
  – Urgent, on-demand cloud bursting
  – High-speed data transfers to, from, and between remote resources
  – Replication of data and compute across geographic areas, for resiliency and performance

Storm Surge Workflow
• The whole workflow comprises 22 ensemble members
• Pegasus workflow management system
• An ensemble scheduler dispatches members to compute sites; a collector gathers the results

Slice Topology
• 11 GENI sites (1 ensemble manager, 10 compute sites)
• Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB of iSCSI storage
• HPC compute nodes: 80 compute nodes (320 cores) from 10 sites

Representative Science DMZ
• [Figure: representative Science DMZ architecture]

Dedicated vs. Virtual Resources
• GENI provides a distributed software-defined infrastructure (SDI)
  – Compute + storage + network

Emerging Trend: Super-Facilities Coupled by Networks
• Experimental facilities are being transformed by new detectors, advanced mathematics, robotics, automation, and advanced networks

Today's Demonstration: Real-Time Data Processing and Visualization Workflow
• Data from an ALS experiment; SPADE instance on a server at Argonne; ExoGENI SPADE VMs at Starlight (Chicago) and Oakland, California; compute cluster at NERSC, LBL; connected over AL2S and ESnet
• WAN-optimized data transfer nodes and a network slice created programmatically (Science DMZ as a service)
• Application workflow instantiated to stage data at the GENI rack on the Science DMZ slice
• Data is moved optimally across the WAN (earlier work, such as Phoebus, demonstrated the value of this approach)
• http://portal.nersc.gov/project/als/sc14/
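The demonstration above provisions its data infrastructure programmatically. A toy sketch of that "Science DMZ as a service" pattern, assuming made-up site names and a deliberately naive bandwidth model; the real demo used SPADE transfer nodes over AL2S/ESnet.

```python
# Provision a transient Science DMZ slice (transfer nodes + VLAN),
# move a dataset, then release it. All names and the timing model
# are illustrative assumptions.
def provision_dmz(sites, bandwidth_gbps):
    """Return a description of a transient Science DMZ slice."""
    return {
        "transfer_nodes": [f"dtn@{s}" for s in sites],
        "vlan": {"endpoints": sites, "bandwidth_gbps": bandwidth_gbps},
        "state": "provisioned",
    }

def stage_and_transfer(dmz, dataset_gb):
    # Idealized transfer time over the dedicated VLAN,
    # ignoring protocol overhead.
    seconds = dataset_gb * 8 / dmz["vlan"]["bandwidth_gbps"]
    dmz["state"] = "transfer-complete"
    return seconds

dmz = provision_dmz(["starlight", "oakland", "nersc"], bandwidth_gbps=10)
elapsed = stage_and_transfer(dmz, dataset_gb=500)  # 500 GB over 10 Gb/s
dmz["state"] = "released"                          # tear down when done
print(round(elapsed))  # 400 (seconds)
```

The point of the pattern is the lifecycle: the dedicated, friction-free path exists only for the duration of the campaign, rather than as permanently provisioned campus hardware.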
Dedicated vs. Virtual Resources
• GENI provides a distributed software-defined infrastructure (SDI)
  – Compute + storage + network
• GENI racks may be deployed on campus or in provider networks close to the campus
• "Science DMZ as a service"
  – Applications can provision a virtual Science DMZ as and when needed
• Programmable infrastructure lets end users create dynamic, "friction-free" infrastructures without advance knowledge or training

Microtomography of High-Temperature Materials under Stress
• [Figure: data set collected by materials scientist Rob Ritchie, LBNL/UCB]

What Constitutes Programmable Network Behavior? (i.e., What Is SDN?)
• Control over virtual topology
  – A link in one layer is represented by a path in another
  – Examples: Layer 1/2/3 VPNs via explicit signaling (MPLS, GMPLS); bandwidth-on-demand services (OSCARS, NSI)
• Control over packet forwarding
  – Deciding which interface a packet or frame should be placed on
  – Examples: FlowVisor; OpenFlow 1.0; Nicira Open vSwitch; Cisco ONE; OpenDaylight; Juniper Contrail
• Queue management and arbitration
  – Defining packet queues and their associated service and scheduling policies
  – Examples: numerous vendor-proprietary APIs; OpenFlow 1.3

ExoGENI and OpenFlow (Now)
• OpenFlow experiments using embedded topologies with OVS spanning one or more sites
  – E.g., HotSDN '14, "A Resource Delegation Framework for Software Defined Networks," Baldin, Huang, Gopidi
• Experiments with OF 1.0 in rack switches
  – Described in the ExoBlog (www.exogeni.net)

ExoGENI and OpenFlow (Near Future)
• OpenFlow service on BEN (ben.renci.org)
  – 40G wave using Juniper EX switches
  – FSFW, OF 1.0, multiple controllers
  – Topology embedding/VNE for ExoGENI; a path service for other projects
• Slice on AL2S with its own controller
  – Topology embedding for ExoGENI; value-added experimenter services with ExoGENI resources
• Application-specific topology embedding
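The "control over packet forwarding" idea above can be made concrete with a minimal OpenFlow-style match/action lookup: the controller installs prioritized rules, and the switch forwards each frame out the interface named by the highest-priority matching rule. Field names and the rule encoding are simplified for illustration.

```python
# Minimal flow-table lookup in the OpenFlow style: exact-match fields,
# highest priority wins, table-miss drops. Illustrative only; a real
# switch also supports wildcards, masks, and multi-table pipelines.
def lookup(flow_table, packet):
    """Return the action of the highest-priority matching rule."""
    best = None
    for rule in flow_table:
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            if best is None or rule["priority"] > best["priority"]:
                best = rule
    return best["action"] if best else "drop"

flow_table = [
    {"priority": 10, "match": {"vlan": 1001}, "action": "output:2"},
    {"priority": 20, "match": {"vlan": 1001, "dst": "10.0.0.2"},
     "action": "output:3"},
]

print(lookup(flow_table, {"vlan": 1001, "dst": "10.0.0.2"}))  # output:3
print(lookup(flow_table, {"vlan": 1001, "dst": "10.0.0.9"}))  # output:2
print(lookup(flow_table, {"vlan": 2002}))                     # drop
```

Slicing tools like FlowVisor sit between controllers and switches and enforce that each experimenter's rules only ever match that experimenter's share of the flowspace (here, their VLANs).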
Where Are We Going?
• More sites
  – Georgia Tech (Atlanta, GA), PUCP (Lima, Peru), Ciena (Hanover, MD)
• Updated OpenStack
• Better compute isolation
  – Take NUMA into account in placement decisions
• Better storage isolation
  – Provision storage VLANs/channels with QoS properties to provide predictable performance
• Better network isolation and performance
  – Enable SR-IOV
• More complex topology management and embedding
  – Fully dynamic slices
• More diverse substrates
  – Integration with batch schedulers (SLURM)
  – VMware and other cloud stacks
  – Public clouds

Thank You!
• http://www.exogeni.net