VNT: Virtual Networks over TRILL

Transcription

VNT: Virtual Networks over TRILL
Ahmed Amamou, ahmed@gandi.net
Benoît Ganne, bganne@kalray.eu
Accelerate networking innovation
through programmable data plane
Removing switches from datacenters with
TRILL/VNT and smartNIC
Who is Gandi?
• Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
• We provide both:
– IaaS: Infrastructure as a Service
– PaaS: Platform as a Service
• We support the open source community:
– We publish open source code: https://github.com/Gandi
– We support open source projects: VLC, Debian, … *
* Check http://www.gandi.net/supports/ for an exhaustive list
2
IaaS new network’s challenges
• Cisco forecast report*:
– Cloud traffic was about 3.3 zettabytes (10²¹ bytes) in 2013
– Cloud traffic will reach 6.6 zettabytes in 2016
– 76% of cloud traffic is East-West (within the same datacenter)
→ A high density of links within the datacenter is needed
• Customers need full network access
– Traffic should be isolated
– VM network configuration should not be restricted
→ Overlaying tenant traffic should be considered
* Cisco Global Cloud Index: Forecast and Methodology, 2011-2016.
3
Why OpenCompute?
• New protocols have been proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT, …) but:
– Hardware integration is slow
– Protocol extensions are hard to integrate
• We believe the OpenCompute community can help us:
– Define an open, vendor-neutral API for the programmable data plane
– Bring open hardware fulfilling those needs
4
New datacenter architecture
• Switch from the classic datacenter architecture to a full-mesh one
• Upgrade hardware to improve performance
5
TRILL @Gandi
• Gandi has used commodity hardware as TRILL RBridges since 2013
• We have not yet found hardware that suits our needs
6
TRILL: TRansparent Interconnection of Lots of Links
• Layer 2 routing protocol
• Uses a control plane and a data plane
• Control plane: based on IS-IS, which computes all routing information
• Data plane: forwards packets using the information provided by the control plane
• Uses MAC-in-MAC encapsulation
[Figure: TRILL header prepended to the original payload]
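The MAC-in-MAC encapsulation above can be sketched in C. The 6-byte header layout follows RFC 6325; the helper names are illustrative, not from the Gandi code base:

```c
/* Sketch of the TRILL header (RFC 6325) carried after the outer
 * Ethernet header (Ethertype 0x22F3). Helper names are illustrative. */
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

#define ETH_P_TRILL 0x22F3          /* Ethertype of the outer frame */

struct trill_hdr {                  /* 6 bytes, network byte order */
    uint16_t flags;                 /* V(2) R(2) M(1) op-len(5) hop-count(6) */
    uint16_t egress_nick;           /* egress RBridge nickname */
    uint16_t ingress_nick;          /* ingress RBridge nickname */
};

static uint16_t trill_hop_count(const struct trill_hdr *h)
{
    return ntohs(h->flags) & 0x3F;  /* hop count: low 6 bits */
}

static int trill_is_multicast(const struct trill_hdr *h)
{
    return (ntohs(h->flags) >> 11) & 0x1;   /* M bit set for multicast */
}
```

The nicknames are what the data plane routes on: the ingress RBridge fills them in, and transit RBridges only look at the egress nickname and hop count.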
7
TRILL benefits
               Switching (L2)   Routing (L3)   TRILL
Configuration  Minimal          Intense        Minimal
Plug & play    Yes              No             Yes
Discovery      Automatic        Configured     Automatic
Learning       Automatic        Configured     Automatic
Multipath      No               Yes            Yes
Convergence    Slow             Fast           Fast
Connectivity   Inflexible       Flexible       Flexible
Scale          Limited          High           High
8
Control Plane: Forwarding database
9
Multitenancy: Virtual Network over TRILL (VNT)
• New cloud architectures have to take multitenancy into consideration
• TRILL does not provide multitenancy handling mechanisms
→ We need to extend it
10
VNT vs TRILL
• Update both the control and data planes
– Control plane: prune the multicast tree to limit multicast traffic
– Data plane: forwarding is conditioned by VNI support
[Frame layout — VNT encapsulation of the original Ethernet frame:
  Outer destination MAC address | outer source MAC address | optional outer IEEE 802.1Q tag
  TRILL header: egress RBridge nickname, ingress RBridge nickname (L2 routing information)
  VNT header (extension): options description TLV, 24-bit VNI tag (tenant identification)
  Original packet payload]
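The 24-bit VNI tag is the tenant identifier the data plane filters on. A minimal extraction sketch follows; the slide gives the field sizes but not the exact bit layout, so the assumption that the VNI occupies the low 24 bits of a 32-bit network-order word is illustrative:

```c
/* Sketch: extracting the 24-bit VNI tag from a VNT header word.
 * Bit placement is an illustrative assumption, not the spec. */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define VNI_MASK 0x00FFFFFFu

static uint32_t vnt_get_vni(const uint8_t *vnt_hdr_word)
{
    uint32_t w;
    memcpy(&w, vnt_hdr_word, sizeof w);   /* avoid unaligned access */
    return ntohl(w) & VNI_MASK;           /* keep the 24-bit VNI */
}
```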
Publication:
Amamou, A., Haddadou, K., & Pujolle, G. (2014).
A TRILL-based multi-tenant data center network. Computer Networks.
11
VNT: Multicast tree pruning
[Figure: example topology of RBridges n1–n8 (panels "Topology" and "Multicast tree"); hosts A and B sit on VNI 1, and the multicast distribution tree is pruned so that it only reaches the RBridges serving VNI 1]
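The pruning shown in the figure above can be sketched bottom-up: a branch of the distribution tree is kept only if some RBridge below it serves the VNI. The parent-array encoding and node numbering are illustrative assumptions:

```c
/* Sketch of per-VNI multicast tree pruning: keep a branch only if
 * an RBridge in its subtree serves the VNI. Encoding is illustrative. */
#include <assert.h>
#include <stdbool.h>

#define NNODES 8

/* Nodes are listed root-first so that parent[i] < i; root's parent is -1. */
static void prune_tree(const int parent[NNODES],
                       const bool serves_vni[NNODES],
                       bool keep[NNODES])
{
    for (int i = 0; i < NNODES; i++)
        keep[i] = serves_vni[i];          /* seed: nodes serving the VNI */
    for (int i = NNODES - 1; i > 0; i--)  /* bubble "keep" up to the root */
        if (keep[i])
            keep[parent[i]] = true;
}
```

Edges (i, parent[i]) with keep[i] set form the pruned tree; everything else carries no traffic for that VNI.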
12
Current VNT implementation on Linux
• Control plane: Quagga daemon
• Data plane: Linux bridge module
• Source code: https://github.com/Gandi/
13
Data plane: performance
[Plots: throughput and delay measurements]
• Throughput is affected by the additional processing operations
• Per-packet processing delay is not affected
15
Improving performance
• Shift the data plane from the host to the smartNIC
– Increase performance
– Offload the x86 for other usages, e.g. customer workloads
[Diagram: with a plain NIC, both control and data planes run on the host; with a smartNIC, the data plane moves onto the NIC and only the control plane stays on the host]
16
KALRAY: deterministic supercomputing on a chip
• Founded in 2008; fabless semiconductor company
• Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture
– First MPPA®-256 chips in TSMC 28nm CMOS
– Leading performance/energy ratio worldwide
– Time predictability and low latency
– Heterogeneous applications on the same chip
– High programmability
• Working with industry-leading partners and customers
• 55 employees
• Offices in France and the US
17
MPPA®-256 Bostan: Networking Strengths
• High throughput / line rate
– 80 Gbps full-duplex line rate (2 x 120 Mpps)
– 3400 instructions per packet @ 64 B
– AES, SHA-1, SHA-2, CRC accelerators
– 2 x PCIe Gen3 8-lane
• Software-defined NIC
– Smart packet classification/dispatching
– 256 cores for packet processing
– Standard C/C++ with GCC 4.9
– Advanced debugging and profiling
• Low latency
– Zero-copy Ethernet ↔ PCIe
– < 1 µs port-to-port in transparent mode
– < 1 µs port to system memory
• System integration
– Linux support
– Virtualization support
– Low power
18
MPPA®-256 Bostan
• 64-bit processor
• Up to 800 MHz
• High performance
– 845 GFLOPS SP / 422 GFLOPS DP
– 1 TOPS
• High-bandwidth network-on-chip
– 2 x 12.8 GB/s
• High-speed Ethernet
– Up to 2 x 40 Gbps / 2 x 120 Mpps @ 64 B
• DDR3 memory interfaces
– 2 x 64-bit + ECC @ 2133 MT/s / 2 x 17 GB/s
• PCIe Gen3 interface
– 2 x 8-lane / 2 x 8 GB/s full duplex
– Endpoint / root complex
• NoCX extension
– 2 x 40 Gbps + 2 x 80 Gbps ILK
• Flash controller, GPIOs, …
19
MPPA®-256 Processor Hierarchical Architecture
256 processing engine cores + 32 resource management cores
• VLIW core → instruction-level parallelism
• Compute cluster → thread-level parallelism
• Manycore processor → process-level parallelism
20
High Speed Ethernet Packet processing
• Ethernet Rx dispatcher
– 8 classification tables: classify, extract fields
– Smart dispatch
• Round-robin
• Flexible core allocation: round-robin vs. classification, per 10G port
• Ethernet Tx
– 64 Tx FIFOs
– QoS between the FIFOs
– Flow control between clusters and Tx FIFOs
Patent pending
21
VNT on a programmable data plane
Multicast forwarding example
• Kalray Bostan smartNIC (8 x 10GbE, IO Ethernet driver)
• On-going work between Gandi and Kalray
– Explore programmable data plane opportunities
– Study the feasibility and architecture of a VNT smartNIC
• Multicast forwarding puts a high load on each node
[Diagram: x86 host running the hypervisor, the MPPA Linux Ethernet driver and the Linux networking stack; the TRILL controller runs as a userspace application]
22
VNT on a programmable data plane
Multicast forwarding example
• Incoming frame: <Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>
• Dispatch the packet based on the egress RBridge
– In case of multicast, the egress RBridge is set to the tree root
– Each cluster "owns" a subset of the possible egress RBridges (i.e. a FIB subset)

if (Packet[Ethertype] == TRILL) {
    send to cluster #HASH(Egress RBridge)
}
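The dispatch rule above can be sketched in C: TRILL frames go to the compute cluster chosen by hashing the egress RBridge nickname, so each cluster only needs its own FIB subset. The hash constant and cluster count are illustrative assumptions:

```c
/* Sketch of egress-nickname dispatch to compute clusters.
 * The hash function and NCLUSTERS value are illustrative. */
#include <assert.h>
#include <stdint.h>

#define NCLUSTERS 16u   /* the MPPA-256 has 16 compute clusters */

static unsigned dispatch_cluster(uint16_t egress_nick)
{
    /* Any stable hash works: the same nickname must always land on
     * the same cluster, since that cluster owns its FIB subset. */
    return (egress_nick * 2654435761u) % NCLUSTERS;
}
```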
23
VNT on a programmable data plane
Multicast forwarding example
• Lookup the list of next-hop RBridges for this multicast tree
– RBridge owner clusters can be local or remote
• Lookup the LIB for local ports, if any

FIB[Egress RBridge] = {
    Egress RBridge MAC;
    Egress RBridge Interface;
    MCTree = [ RBx, RBy, … ];
    VNI = [ VNI-1, VNI-2, … ];
}
LIB = {
    (Local MACx, Local Portx, VNI-1);
    …
}
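The FIB/LIB pseudocode above could be rendered as C structures along the following lines; field types and array bounds are illustrative assumptions, not the actual smartNIC tables:

```c
/* Illustrative C rendition of the FIB and LIB entries from the slide. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_MCTREE 8
#define MAX_VNI    8

struct fib_entry {
    uint8_t  egress_mac[6];       /* Egress RBridge MAC */
    int      egress_ifindex;      /* Egress RBridge interface */
    uint16_t mctree[MAX_MCTREE];  /* next-hop RBridges of the multicast tree */
    int      n_mctree;
    uint32_t vni[MAX_VNI];        /* VNIs this RBridge supports */
    int      n_vni;
};

struct lib_entry {                /* local port behind this RBridge */
    uint8_t  local_mac[6];
    int      local_port;
    uint32_t vni;
};
```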
25
VNT on a programmable data plane
Multicast forwarding example
• Forward the frame
– Remote: forward to the clusters owning the next-hop RBridges
– Local: decapsulate the inner frame and forward it to the local VM
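Local delivery amounts to stripping the outer headers to recover the tenant frame. A sketch follows; the fixed 4-byte VNT header size is an illustrative assumption (the real options TLV is variable-length):

```c
/* Sketch of local decapsulation: drop outer Ethernet + TRILL + VNT
 * headers and return the inner tenant frame. VNT_HLEN is assumed. */
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN   14  /* outer Ethernet header */
#define TRILL_HLEN  6  /* TRILL base header */
#define VNT_HLEN    4  /* illustrative fixed-size VNT header */

static const unsigned char *vnt_decap(const unsigned char *frame,
                                      size_t len, size_t *inner_len)
{
    size_t hdrs = ETH_HLEN + TRILL_HLEN + VNT_HLEN;
    if (len <= hdrs)
        return NULL;              /* runt frame: drop */
    *inner_len = len - hdrs;
    return frame + hdrs;          /* start of the inner Ethernet frame */
}
```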
26
VNT on a programmable data plane
Multicast forwarding example
• Check whether the RBridge supports the appropriate VNI
– If yes, forward to the RBridge
– If not, stop here

FIB[Egress RBridge] = {
    Egress RBridge MAC;
    Egress RBridge Interface;
    MCTree = [ RBx, RBy, … ];
    VNI = [ VNI-1, VNI-2, … ];
}
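The VNI check above is a membership test against the FIB entry's VNI list; a minimal sketch, with the linear scan being an illustrative choice:

```c
/* Sketch of the per-branch VNI filter: forward only if the RBridge's
 * FIB entry lists the frame's VNI, otherwise prune the branch. */
#include <assert.h>
#include <stdint.h>

static int rbridge_supports_vni(const uint32_t *vni_list, int n,
                                uint32_t vni)
{
    for (int i = 0; i < n; i++)
        if (vni_list[i] == vni)
            return 1;             /* forward to this RBridge */
    return 0;                     /* prune: stop here */
}
```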
27
Innovation and efficiency
• Solving SDN and network virtualization challenges requires new protocols
– e.g. VXLAN, NVGRE, TRILL/VNT, …
• Efficiency generally means hardware support
…but hardware development cannot keep up with software, and this slows down innovation
• Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation
…but we need open ecosystems, standards and APIs
29
Ahmed Amamou, ahmed@gandi.net
Benoît Ganne, bganne@kalray.eu
Thank you for your attention!
Questions?