SGI 10-Gigabit Ethernet Network Adapter User Manual

Browse online or download the User Manual for the SGI 10-Gigabit Ethernet Network Adapter (category: Networking). InfiniBand and 10-Gigabit Ethernet for Dummies User Guide

Designing Cloud and Grid Computing Systems
with InfiniBand and High-Speed Ethernet
Dhabaleswar K. (DK) Panda
The Ohio State University
http://www.cse.ohio-state.edu/~panda
A Tutorial at CCGrid ’11
by
Sayantan Sur
The Ohio State University
E-mail: sur[email protected]-state.edu
http://www.cse.ohio-state.edu/~surs

Summary of contents

Page 1 - A Tutorial at CCGrid ’11

Designing Cloud and Grid Computing Systems with InfiniBand and High-Speed Ethernet. Dhabaleswar K. (DK) Panda, The Ohio State University. E-mail: panda@cse.

Page 2

Hadoop Architecture • Underlying Hadoop Distributed File System (HDFS) • Fault-tolerance by replicating data blocks • NameNode: stores information on dat

Page 3 - Computing Systems

OpenFabrics Stack with Unified Verbs Interface. Diagram labels: Verbs Interface (libibverbs); Mellanox (libmthca); QLogic (libipathverbs); IBM (libehca); Chelsio (li

Page 4 - Cluster Computing Environment

• For IBoE and RoCE, the upper-level stacks remain completely unchanged • Within the hardware: – Transport and network layers remain completely unchange

Page 5 - (http://www.top500.org)

OpenFabrics Software Stack. Abbreviations: SA = Subnet Administrator; MAD = Management Datagram; SMA = Subnet Manager Agent; PMA = Performance Manager Agent; IPoIB = IP o

Page 6 - Grid Computing Environment

InfiniBand in the Top500: Percentage share of InfiniBand is steadily increasing

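Page 7

Pie chart: Number of Systems by interconnect family. Shares as extracted: 45%, 43%, 6%, 1%, 0%, 0%, 1%, 0%, 0%, 0%, 4%. Legend as extracted: Gigabit Ethernet, InfiniBand, Proprietary, Myrinet, Quadrics, Mixed, NUMAlink, SP Switch, Cray Interconnect, Fat Tree, Cu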

Page 8 - Compute cluster

InfiniBand System Efficiency in the Top500 List. Plot: Efficiency (%) versus Top 500 S

Page 9 - Cloud Computing Environments

• 214 IB Clusters (42.8%) in the Nov ’10 Top500 list (http://www.top500.org) • Installations in the Top 30 (13 systems): Slide title: Large-scale Infi

Page 10 - Hadoop Architecture

• HSE compute systems with ranking in the Nov 2010 Top500 list – 8,856-core installation in Purdue with ConnectX-EN 10GigE (#126) – 7,944-core installat

Page 11 - Memcached Architecture

• HSE has most of its popularity in enterprise computing and other non-scientific markets including Wide-area networking • Example Enterprise Computing

Page 12

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 13 - • Software components

Memcached Architecture • Distributed Caching Layer – Allows aggregating spare memory from multiple nodes – General purpose • Typically used to cache data

Page 14 - • Ex: TCP/IP, UDP/IP

Modern Interconnects and Protocols. Diagram labels: Application; Verbs; Sockets; Application Interface; TCP/IP; Hardware Offload; TCP/IP; Ethernet Driver; Kernel Space; Protocol Implementa

Page 15 - – Not scalable:

• Low-level Network Performance • Clusters with Message Passing Interface (MPI) • Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB) • Inf

Page 16 - Myrinet (1993 -) 1 Gbit/sec

Low-level Latency Measurements. Plot: panel Small Messages; series VPI-IB, Native IB, VPI-Eth, RoCE; axes Latency (us) versus Message Size (bytes)

Page 17

Low-level Uni-directional Bandwidth Measurements. Plot: panel Uni-directional Bandwidth; series VPI-IB, Native IB, VPI-Eth, RoCE; axis label as extracted: Band

Page 18

• Low-level Network Performance • Clusters with Message Passing Interface (MPI) • Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB) • Inf

Page 19 - IB Trade Association

• High Performance MPI Library for IB and HSE – MVAPICH (MPI-1) and MVAPICH2 (MPI-2.2) – Used by more than 1,550 organizations in 60 countries – More tha

Page 20

One-way Latency: MPI over IB. Plot: panel Small Message Latency; axes Latency (us) versus Message Size (bytes); data labels 1.96, 1.54, 1.60, 2.17; legend entry as extracted: MVAP

Page 21 - • I/O interface bottlenecks

Bandwidth: MPI over IB. Plot: panel Unidirectional Bandwidth; axes MillionBytes/sec versus Message Size (bytes); data labels as extracted: 2665.6, 3023.7, 1901.1, 1553

Page 22

One-way Latency: MPI over iWARP. Plot: series Chelsio (TCP/IP), Chelsio (iWARP), Intel-NetEffect (TCP/IP), Intel-NetEffect (iWARP); axis label as extracted: Mess

Page 23

Bandwidth: MPI over iWARP. Plot: panel Unidirectional Bandwidth; axes MillionBytes/sec versus Message Size (bytes); data labels 839.8, 1169.7, 373.3, 1245.0

Page 24 - (not shown)

• Good System Area Networks with excellent performance (low latency, high bandwidth and low CPU utilization) for inter-processor communication (IPC) a

Page 25

Convergent Technologies: MPI Latency. Plot: panel Small Messages; axes Latency (us) versus Message Size (bytes)

Page 26

Convergent Technologies: MPI Uni- and Bi-directional Bandwidth. Plot: series Native IB, VPI-IB, VPI-Eth, RoCE; panel label as extracted: Uni-directional

Page 27 - • Myricom GM

• Low-level Network Performance • Clusters with Message Passing Interface (MPI) • Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB) • Inf

Page 28 - IB Hardware Acceleration

IPoIB vs. SDP Architectural Models. Diagram labels: Traditional Model; Possible SDP Model; Sockets App; Sockets API; Sockets Application; Sockets API; Kernel; TCP/I

Page 29 - • Hardware Checksum Engines

SDP vs. IPoIB (IB QDR). Plots: Bandwidth (MBps) and Latency; series IPoIB-RC, IPoIB-UD, SDP

Page 30 - TOE and iWARP Accelerators

• Low-level Network Performance • Clusters with Message Passing Interface (MPI) • Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB) • Inf

Page 31

• Option 1: Layer-1 Optical networks – IB standard specifies link, network and transport layers – Can use any layer-1 (though the standard says copper a

Page 32

Features • End-to-end guaranteed bandwidth channels • Dynamic, in-advance, reservation and provisioning of fractional/full lambdas • Secure control-plane

Page 33

• Supports SONET OC-192 or 10GE LAN-PHY/WAN-PHY • Idea is to make remote storage “appear” local • IB-WAN switch does frame conversion – IB standard allow

Page 34 - 2003 (Gen1), 2007 (Gen2)

InfiniBand Over SONET: Obsidian Longbows RDMA throughput measurements over USN. Diagram labels: Linux host; ORNL; 700 miles; Linux host; Chicago CDCI; Seattle CDCI; Su

Page 35

• Hardware components – Processing cores and memory subsystem – I/O bus or links – Network adapters/switches • Software components – Communication stack • B

Page 36 - IB, HSE and their Convergence

IB over 10GE LAN-PHY and WAN-PHY. Diagram labels: Linux host; ORNL; 700 miles; Linux host; Seattle CDCI; ORNL CDCI; longbow IB/S; longbow IB/S; 3300 miles; 4300 miles; ORNL lo

Page 37 - Traditional Ethernet

MPI over IB-WAN: Obsidian Routers. Delay (us) / Distance (km): 10 / 2; 100 / 20; 1000 / 200; 10000 / 2000. Diagram labels: Cluster A; Cluster B; WAN Link; Obsidian WAN Router; Obsidian WAN Router
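
The delay and distance pairs on this slide (10 us and 2 km up to 10000 us and 2000 km) all share one ratio, 5 us of delay per km of distance. A minimal sketch of that arithmetic follows; the constant is read off the table rather than stated by the slide, and the function names are hypothetical.

```python
# Ratio inferred from the slide's table (10 us <-> 2 km); an assumption
# drawn from those four data points, not a vendor-specified figure.
US_PER_KM = 5.0


def delay_us_for_distance(distance_km: float) -> float:
    """Delay in microseconds that corresponds to a WAN distance in km."""
    return distance_km * US_PER_KM


def distance_km_for_delay(delay_us: float) -> float:
    """WAN distance in km that corresponds to a delay in microseconds."""
    return delay_us / US_PER_KM


if __name__ == "__main__":
    # Reproduce the four rows of the table.
    for delay in (10, 100, 1000, 10000):
        print(f"{delay:>6} us ~ {distance_km_for_delay(delay):>6.0f} km")
```

The same two helpers can be used to estimate the delay setting for any intermediate distance, under the assumption that the linear ratio holds between the tabulated points.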

Page 38 - IB Overview

Communication Options in Grid • Multiple options exist to perform data transfer on Grid • Globus-XIO framework currently does not support IB natively • W

Page 39 - Components: Channel Adapters

Globus-XIO Framework with ADTS Driver. Diagram labels: Globus XIO Driver #n; Data Connection Management; Persistent Session Management; Buffer & File Management; Data Transport I

Page 40 - • Switches: intra-subnet

Performance of Memory Based Data Transfer • Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files • ADTS

Page 41 - – Not directly addressable

Performance of Disk Based Data Transfer • Performance numbers obtained while transferring 128 GB of aggregate data in chunks of 256 MB files • Predic

Page 42

Application Level Performance. Plot: Bandwidth (MBps) for Target Applications CCSM and Ultra-Viz; series ADTS, IPoIB • Application performance for FTP get opera

Page 43 - IB Communication Model

• Low-level Network Performance • Clusters with Message Passing Interface (MPI) • Datacenters with Sockets Direct Protocol (SDP) and TCP/IP (IPoIB) • Inf

Page 44 - Queue Pair Model

A New Approach towards OFA in Cloud. Diagram labels: Current Approach Towards OFA in Cloud; Application; Accelerated Sockets; 10 GigE or InfiniBand; Verbs / Hardware Offload; Curr

Page 45 - Memory Registration

Memcached Design Using Verbs • Server and client perform a negotiation protocol – Master thread assigns clients to appropriate worker thread • Once a cli

Page 46 - Memory Protection

• Ex: TCP/IP, UDP/IP • Generic architecture for all networks • Host processor handles almost all aspects of communication – Data buffering (copies on sen

Page 47 - (Send/Receive Model)

Memcached Get Latency • Memcached Get latency – 4 bytes – DDR: 6 us; QDR: 5 us – 4K bytes – DDR: 20 us; QDR: 12 us • Almost factor of four improvement ove

Page 48 - Hardware ACK

Memcached Get TPS • Memcached Get transactions per second for 4 bytes – On IB DDR about 600K/s for 16 clients – On IB QDR 1.9M/s for 16 clients • Almost

Page 49

Hadoop: Java Communication Benchmark • Sockets level ping-pong bandwidth test • Java performance depends on usage of NIO (allocateDirect) • C and Java ve

Page 50

Hadoop: DFS IO Write Performance • DFS IO included in Hadoop, measures sequential access throughput • We have two map tasks each writing to a file of in

Page 51 - Hardware Protocol Offload

Hadoop: RandomWriter Performance • Each map generates 1GB of random binary data and writes to HDFS • SSD improves execution time by 50% with 1GigE for t

Page 52 - • Switching and Multicast

Hadoop Sort Benchmark • Sort: baseline benchmark for Hadoop • Sort phase: I/O bound; Reduce phase: communication bound • SSD improves performance by 28%

Page 53 - Buffering and Flow Control

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 54 - Virtual Lanes

• Presented network architectures & trends for Clusters, Grid, Multi-tier Datacenters and Cloud Computing Systems • Presented background and detail

Page 55 - Service Levels and QoS

Funding Acknowledgments: Funding Support by; Equipment Support by

Page 56 - Traffic Segregation Benefits

Personnel Acknowledgments. Current Students – N. Dandapanthula (M.S.) – R. Darbha (M.S.) – V. Dhanraj (M.S.) – J. Huang (Ph.D.) – J. Jose (P

Page 57 - Identifiers)

• Traditionally relied on bus-based technologies (last mile bottleneck) – E.g., PCI, PCI-X – One bit per wire – Performance increase through: • Increasing

Page 58 - Switch Complex

Web Pointers: http://www.cse.ohio-state.edu/~panda ; http://www.cse.ohio-state.edu/~surs ; http://nowlab.cse.ohio-state.edu ; MVAPICH Web Page: http

Page 59 - – 3D Torus (Sandia Red Sky)

• Network speeds saturated at around 1Gbps – Features provided were limited – Commodity networks were not considered scalable enough for very large-scal

Page 60 - More on Multipathing

• Industry Networking Standards • InfiniBand and High-speed Ethernet were introduced into the market to address these bottlenecks • InfiniBand aimed at

Page 61 - IB Multicast Example

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 62

• IB Trade Association was formed with seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun) • Goal: To design a scalable and high

Page 63 - IB Transport Services

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 64 - Reliability

• 10GE Alliance formed by several industry leaders to take the Ethernet family to the next speed step • Goal: To achieve a scalable and high performanc

Page 65 - Transport Layer Capabilities

• Network speed bottlenecks • Protocol processing bottlenecks • I/O interface bottlenecks. Slide title: Tackling Communication Bottlenecks with IB and

Page 66 - Data Segmentation

• Bit serial differential signaling – Independent pairs of wires to transmit independent data (called a lane) – Scalable to any number of lanes – Easy to

Page 67 - Transaction Ordering

Network Speed Acceleration with IB and HSE: Ethernet (1979 - ) 10 Mbit/sec; Fast Ethernet (1993 -) 100 Mbit/sec; Gigabit Ethernet (1995 -) 10

Page 68 - Message-level Flow-Control

Plot: Bandwidth per direction (Gbps) over 2005 - 2011; labels as extracted: 32G-IB-DDR, 48G-IB-DDR, 96G-IB-QDR, 48G-IB-QDR, 200G-IB-EDR, 112G-IB-FDR, 300G-IB-EDR, 1

Page 69

• Network speed bottlenecks • Protocol processing bottlenecks • I/O interface bottlenecks. Slide title: Tackling Communication Bottlenecks with IB and

Page 70

• Intelligent Network Interface Cards • Support entire protocol processing completely in hardware (hardware protocol offload engines) • Provide a rich c

Page 71 - Concepts in IB Management

• Fast Messages (FM) – Developed by UIUC • Myricom GM – Proprietary protocol stack from Myricom • These network stacks set the trend for high-performance

Page 72 - Subnet Manager

• Some IB models have multiple hardware accelerators – E.g., Mellanox IB adapters • Protocol Offload Engines – Completely implement ISO/OSI layers 2-4 (l

Page 73

• Interrupt Coalescing – Improves throughput, but degrades latency • Jumbo Frames – No latency impact; Incompatible with existing switches • Hardware Chec

Page 74 - HSE Overview

Current and Next Generation Applications and Computing Systems • Diverse Range of Applications – Processing and dataset characteristics

Page 75 - Differences

• TCP Offload Engines (TOE) – Hardware Acceleration for the entire TCP/IP stack – Initially patented by Tehuti Networks – Actually refers to the IC on th

Page 76 - – Multi Stream Semantics

• Also known as “Datacenter Ethernet” or “Lossless Ethernet” – Combines a number of optional Ethernet standards into one umbrella as mandatory requirem

Page 77

• Network speed bottlenecks • Protocol processing bottlenecks • I/O interface bottlenecks. Slide title: Tackling Communication Bottlenecks with IB and

Page 78

• InfiniBand initially intended to replace I/O bus technologies with networking-like technology – That is, bit serial differential signaling – With enha

Page 79

• Recent trends in I/O interfaces show that they are nearly matching head-to-head with network speeds (though they still lag a little bit)

Page 80

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 81

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics – Novel Features – Subnet Management and Services • High-spee

Page 82

Comparing InfiniBand with Traditional Networking Stack. Diagram labels: Application Layer: MPI, PGAS, File Systems; Transport Layer: OpenFabrics Verbs, RC (rel

Page 83 - Offloaded TCP

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics • Communication Model • Memory registration and protection •

Page 84

• Used by processing and I/O units to connect to fabric • Consume & generate IB packets • Programmable DMA engines with protection features • May hav

Page 85 - Myrinet Express (MX)

Cluster Computing Environment. Diagram labels: Compute cluster; LAN; Frontend; Meta-Data Manager; I/O Server Node; Meta Data; Data; Compute Node; Compute Node; I/O Server Node; Da

Page 86 - Datagram Bypass Layer (DBL)

• Relay packets from a link to another • Switches: intra-subnet • Routers: inter-subnet • May support multicast. Slide title: Components: Switches and Ro

Page 87 - • Solarflare approach:

• Network Links – Copper, Optical, Printed Circuit wiring on Back Plane – Not directly addressable • Traditional adapters built for copper cabling – Restr

Page 88

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics • Communication Model • Memory registration and protection •

Page 89

IB Communication Model: Basic InfiniBand Communication Semantics

Page 90 - Hardware

• Each QP has two queues – Send Queue (SQ) – Receive Queue (RQ) – Work requests are queued to the QP (WQEs: “Wookies”) • QP to be linked to a Complete Que
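
The queue pair structure described on this slide can be pictured with a small toy model: each QP owns a send queue and a receive queue of work requests, and finished work is reported on an associated completion queue. This is only an illustrative sketch in plain Python. It is not the verbs API, and every class and function name here is hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CompletionQueue:
    """Holds completion entries for finished work requests."""
    entries: deque = field(default_factory=deque)


@dataclass
class QueuePair:
    """Toy QP: a send queue and a receive queue tied to one CQ."""
    cq: CompletionQueue
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def post_send(self, payload: bytes) -> None:
        # Queue a send work request (a WQE in the slide's terms).
        self.send_queue.append(payload)

    def post_recv(self, size: int) -> None:
        # Queue a receive work request: an empty buffer waiting for data.
        self.recv_queue.append(bytearray(size))


def process_one(sender: QueuePair, receiver: QueuePair) -> None:
    """Stand-in for the adapters: match one send WQE with one receive WQE."""
    payload = sender.send_queue.popleft()
    buf = receiver.recv_queue.popleft()
    if len(payload) > len(buf):
        raise ValueError("posted receive buffer is too small")
    buf[: len(payload)] = payload
    # Both sides learn about the finished work through their CQs.
    sender.cq.entries.append(("send_done", len(payload)))
    receiver.cq.entries.append(("recv_done", bytes(buf[: len(payload)])))


if __name__ == "__main__":
    a, b = QueuePair(CompletionQueue()), QueuePair(CompletionQueue())
    b.post_recv(64)          # receiver posts a buffer first
    a.post_send(b"hello")    # sender posts the message
    process_one(a, b)
    print(a.cq.entries.popleft(), b.cq.entries.popleft())
```

The point of the model is the ordering: the receiver must have a buffer posted before the send can be matched, and neither side touches the data path again until it polls its completion queue.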

Page 91 - IB Transport

1. Registration Request • Send virtual address and length 2. Kernel handles virtual->physical mapping and pins region into physical memory • Process

Page 92 - IB iWARP/HSE RoE RoCE

• To send or receive data the l_key must be provided to the HCA • HCA verifies access to local memory • For RDMA, initiator must have the r_key for the r

Page 93

Communication in the Channel Semantics (Send/Receive Model). Diagram labels: InfiniBand Device; Memory; Memory; InfiniBand Device; CQ; QP; Send; Recv; Memory Segment; Send

Page 94 - IB Hardware Products

Communication in the Memory Semantics (RDMA Model). Diagram labels: InfiniBand Device; Memory; Memory; InfiniBand Device; CQ; QP; Send; Recv; Memory Segment; Send WQE cont

Page 95 - Tyan Thunder S2935 Board

Communication in the Memory Semantics (Atomics). Diagram labels: InfiniBand Device; Memory; Memory; InfiniBand Device; CQ; QP; Send; Recv; Memory Segment; Send WQE contain

Page 96 - IB Hardware Products (contd.)

Trends for Computing Clusters in the Top 500 List (http://www.top500.org): Nov. 1996: 0/500 (0%); Nov. 2001: 43/500 (8.6%); Nov. 2006: 361

Page 97 - – Nortel Networks

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics • Communication Model • Memory registration and protection •

Page 98 - • Support for VPI and RoCE

Hardware Protocol Offload: Complete Hardware Implementations Exist

Page 99 - – OFED 1.6 is underway

• Buffering and Flow Control • Virtual Lanes, Service Levels and QoS • Switching and Multicast. Slide title: Link/Network Layer Capabilities

Page 100 - (libibverbs)

• IB provides three-levels of communication throttling/control mechanisms – Link-level flow control (link layer feature) – Message-level flow control (t

Page 101 - • Within the hardware:

• Multiple virtual links within same physical link – Between 2 and 16 • Separate buffers and flow control – Avoids Head-of-Line Blocking • VL15: reserved

Page 102 - OpenFabrics Software Stack

• Service Level (SL): – Packets may operate at one of 16 different SLs – Meaning not defined by IB • SL to VL mapping: – SL determines which VL on the nex
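
An SL to VL mapping can be thought of as a 16-entry table, indexed by SL, that yields the virtual lane a packet uses on the next link. The sketch below fills such a table with a round-robin policy. That policy and all names are assumptions made for illustration: the slide notes that IB leaves the meaning of an SL undefined, and real tables are configured through subnet management.

```python
NUM_SLS = 16        # from the slide: a packet uses one of 16 SLs
RESERVED_VL = 15    # the Virtual Lanes slide lists VL15 as reserved


def build_sl2vl(num_data_vls: int) -> list[int]:
    """Hypothetical policy: spread the 16 SLs round-robin over the data VLs."""
    if not 1 <= num_data_vls <= 15:
        raise ValueError("number of data VLs must be between 1 and 15")
    return [sl % num_data_vls for sl in range(NUM_SLS)]


def vl_for_packet(sl2vl: list[int], sl: int) -> int:
    """Look up which VL a packet with the given SL takes on the next link."""
    if not 0 <= sl < NUM_SLS:
        raise ValueError("SL must be in the range 0..15")
    return sl2vl[sl]


if __name__ == "__main__":
    table = build_sl2vl(num_data_vls=4)
    for sl in (0, 5, 15):
        print(f"SL {sl:>2} -> VL {vl_for_packet(table, sl)}")
```

Because the data VLs are numbered from 0 and there are at most 15 of them, the reserved lane never appears in the table.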

Page 103 - InfiniBand in the Top500

• InfiniBand Virtual Lanes allow the multiplexing of multiple independent logical traffic flows on the same physical link • Providing the benefits of i

Page 104 - SP Switch

• Each port has one or more associated LIDs (Local Identifiers) – Switches look up which port to forward a packet to based on its destination LID (DLID
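
Destination-based forwarding as described here amounts to a table lookup keyed by DLID. A minimal sketch follows, assuming a plain dictionary as the forwarding table; the LID and port values are made up for illustration.

```python
# Hypothetical forwarding table for one switch: destination LID -> output port.
# Two LIDs mapping to the same port is allowed, since a port may own several LIDs.
FORWARDING_TABLE = {1: 3, 2: 3, 7: 5}


def output_port(table: dict[int, int], dlid: int) -> int:
    """Return the switch port a packet with this DLID is forwarded to."""
    try:
        return table[dlid]
    except KeyError:
        raise LookupError(f"no forwarding entry for DLID {dlid}") from None


if __name__ == "__main__":
    for dlid in (1, 2, 7):
        print(f"DLID {dlid} -> port {output_port(FORWARDING_TABLE, dlid)}")
```

The switch makes no routing decision of its own in this model: whatever populated the table has already chosen the path for each DLID.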

Page 105

• Basic unit of switching is a crossbar – Current InfiniBand products use either 24-port (DDR) or 36-port (QDR) crossbars • Switches available in the ma

Page 106 - CCGrid '11

• Someone has to set up the forwarding tables and give every port an LID – “Subnet Manager” does this work • Different routing algorithms give different

Page 107 - • Integrated Systems

Grid Computing Environment. Diagram labels: Compute cluster; LAN; Frontend; Meta-Data Manager; I/O Server Node; Meta Data; Data; Compute Node; Compute Node; I/O Server Node; Data

Page 108 - Other HSE Installations

• Similar to basic switching, except… – … sender can utilize multiple LIDs associated to the same destination port • Packets sent to one DLID take a fix

Page 109 - Presentation Overview

IB Multicast Example

Page 110 - InfiniBand

Hardware Protocol Offload: Complete Hardware Implementations Exist

Page 111 - Case Studies

• Each transport service can have zero or more QPs associated with it – E.g., you can have four QPs based on RC and one QP based on UD. Slide title: IB

Page 112 - Message Size (bytes)

Trade-offs in Different Transport Types. Table header: Attribute; Reliable Connection; Reliable Datagram; eXtended Reliable Connection; Unreliable Connection; Unrel

Page 113 - Bandwidth (MBps)

• Data Segmentation • Transaction Ordering • Message-level Flow Control • Static Rate Control and Auto-negotiation. Slide title: Transport Layer Capabili

Page 114

• IB transport layer provides a message-level communication granularity, not byte-level (unlike TCP) • Application can hand over a large message – Netwo

Page 115 - MVAPICH/MVAPICH2 Software

• IB follows a strong transaction ordering for RC • Sender network adapter transmits messages in the order in which WQEs were posted • Each QP utilizes

Page 116 - One-way Latency: MPI over IB

• Also called End-to-end Flow-control – Does not depend on the number of network hops • Separate from Link-level Flow-Control – Link-level flow-contro

Page 117 - Bandwidth: MPI over IB

• IB allows link rates to be statically changed – On a 4X link, we can set data to be sent at 1X – For heterogeneous links, rate can be set to the lowes

Page 118

Multi-Tier Datacenters and Enterprise Computing. Diagram labels: ...; Enterprise Multi-tier Datacenter; Tier1; Tier3; Routers/Servers; Switch; Database Server; Appli

Page 119 - Bandwidth: MPI over iWARP

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics • Communication Model • Memory registration and protection •

Page 120

• Agents – Processes or hardware units running on each adapter, switch, router (everything on the network) – Provide capability to query and set paramet

Page 121 - Convergent Technologies:

Subnet Manager. Diagram labels: Inactive Links; Active Links; Compute Node; Switch; Subnet Manager; Inactive Link; Multicast Join; Multicast Setup; Multicast Join; Multica

Page 122

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics – Novel Features – Subnet Management and Services • High-spee

Page 123 - InfiniBand CA

• High-speed Ethernet Family – Internet Wide-Area RDMA Protocol (iWARP) • Architecture and Components • Features – Out-of-order data placement – Dynamic an

Page 124 - SDP vs. IPoIB (IB QDR)

IB and HSE RDMA Models: Commonalities and Differences. Columns: Feature | IB | iWARP/HSE. Hardware Acceleration | Supported | Supported. RDMA | Supported | Supported. Atomi

Page 125

• RDMA Protocol (RDMAP) – Feature-rich interface – Security Management • Remote Direct Data Placement (RDDP) – Data Placement and Delivery – Multi Stream S

Page 126 - IB on the WAN

• High-speed Ethernet Family – Internet Wide-Area RDMA Protocol (iWARP) • Architecture and Components • Features – Out-of-order data placement – Dynamic an

Page 127 - Features

• Place data as it arrives, whether in or out-of-order • If data is out-of-order, place it at the appropriate offset • Issues from the application’s per
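
Out-of-order placement can be illustrated with a receive buffer that each arriving segment is copied into at its own offset, so no reordering queue is needed. This is a toy sketch under the assumption that every segment carries its byte offset into the message; the function name and the sample data are hypothetical.

```python
def place_segment(buffer: bytearray, offset: int, data: bytes) -> None:
    """Copy one arriving segment straight to its final position in the buffer."""
    end = offset + len(data)
    if offset < 0 or end > len(buffer):
        raise ValueError("segment does not fit in the destination buffer")
    buffer[offset:end] = data


if __name__ == "__main__":
    message = b"out-of-order placement"
    # Segments arrive in a scrambled order, each tagged with its offset.
    arrivals = [(7, message[7:12]), (0, message[:7]), (12, message[12:])]
    receive_buffer = bytearray(len(message))
    for offset, data in arrivals:
        place_segment(receive_buffer, offset, data)
    print(bytes(receive_buffer))
```

Placement and delivery are separate concerns in this picture: the bytes land in the right place immediately, but the sketch does not track when the whole message is complete, which a real receiver would have to do before handing it to the application.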

Page 128 - “appear” local

• Part of the Ethernet standard, not iWARP – Network vendors use a separate interface to support it • Dynamic bandwidth allocation to flows based on int

Page 129 - Sunnyvale

Integrated High-End Computing Environments. Diagram labels: Compute cluster; Meta-Data Manager; I/O Server Node; Meta Data; Data; Compute Node; Compute Node; I/O Server Node

Page 130 - 3300 miles 4300 miles

• Can allow for simple prioritization: – E.g., connection 1 performs better than connection 2 – 8 classes provided (a connection can be in any class) • S

Page 131 - Cluster B

• High-speed Ethernet Family – Internet Wide-Area RDMA Protocol (iWARP) • Architecture and Components • Features – Out-of-order data placement – Dynamic an

Page 132 - Communication Options in Grid

• Regular Ethernet adapters and TOEs are fully compatible • Compatibility with iWARP required • Software iWARP emulates the functionality of iWARP on th

Page 133 - Modern WAN

Different iWARP Implementations. Diagram labels: Regular Ethernet Adapters; Application; High Performance Sockets; Sockets; Network Adapter; TCP; IP; Device Driver; Offl

Page 134 - Data Transfer

• High-speed Ethernet Family – Internet Wide-Area RDMA Protocol (iWARP) • Architecture and Components • Features – Out-of-order data placement – Dynamic an

Page 135 - IPoIB-64MB

• Proprietary communication layer developed by Myricom for their Myrinet adapters – Third generation communication layer (after FM and GM) – Supports My

Page 136 - Application Level Performance

• Another proprietary communication layer developed by Myricom – Compatible with regular UDP sockets (embraces and extends) – Idea is to bypass the kern

Page 137

Solarflare Communications: OpenOnload Stack. Diagram labels: Typical HPC Networking Stack; Typical Commodity Networking Stack • HPC Networking Stack provi

Page 138

• InfiniBand – Architecture and Basic Hardware Components – Communication Model and Semantics – Novel Features – Subnet Management and Services • High-spee

Page 139 - Memcached Design Using Verbs

• Single network firmware to support both IB and Ethernet • Autosensing of layer-2 protocol – Can be configured to automatically work with either IB or

Page 140 - Memcached Get Latency

Cloud Computing Environments. Diagram labels: LAN; Physical Machine (VM, VM); Physical Machine (VM, VM); Physical Machine (VM, VM); Virtual FS; Meta-Data; Meta Data; I/O Server; Data; I/

Page 141 - Memcached Get TPS

• Native convergence of IB network and transport layers with Ethernet link layer • IB packets encapsulated in Ethernet frames • IB network layer already

Page 142 - Bandwidth with C version

• Very similar to IB over Ethernet – Often used interchangeably with IBoE – Can be used to explicitly specify link layer is Converged (Enhanced) Etherne

Page 143

IB and HSE: Feature Comparison. Columns: Feature | IB | iWARP/HSE | RoE | RoCE. Hardware Acceleration | Yes | Yes | Yes | Yes. RDMA | Yes | Yes | Yes | Yes. Congestion Control | Yes | Opti

Page 144

• Introduction • Why InfiniBand and High-speed Ethernet? • Overview of IB, HSE, their Convergence and Features • IB and HSE HW/SW Products and Installati

Page 145 - Hadoop Sort Benchmark

• Many IB vendors: Mellanox+Voltaire and Qlogic – Aligned with many server vendors: Intel, IBM, SUN, Dell – And many integrators: Appro, Advanced Cluste

Page 146

Tyan Thunder S2935 Board (Courtesy Tyan). Similar boards from Supermicro with LOM features are also available

Page 147 - Concluding Remarks

• Customized adapters to work with IB switches – Cray XD1 (formerly by Octigabay), Cray CX1 • Switches: – 4X SDR and DDR (8-288 ports); 12X SDR (small si

Page 148 - Funding Acknowledgments

• 10GE adapters: Intel, Myricom, Mellanox (ConnectX) • 10GE/iWARP adapters: Chelsio, NetEffect (now owned by Intel) • 40GE adapters: Mellanox ConnectX2-

Page 149 - Personnel Acknowledgments

• Mellanox ConnectX Adapter • Supports IB and HSE convergence • Ports can be configured to support IB or HSE • Support for VPI and RoCE – 8 Gbps (SDR), 16

Page 150 - Web Pointers

• Open source organization (formerly OpenIB) – www.openfabrics.org • Incorporates both IB and iWARP in a unified manner – Support for Linux and Windows –
