Tolly Report: HP Flex-10 vs Cisco UCS (Network Bandwidth Scalability Comparison)

Tolly.com announced on 2/25/2010 a new test report that compares network bandwidth scalability between the HP BladeSystem c7000 with BL460c G6 servers and the Cisco UCS 5100 with B200 servers, and the results were interesting. The report simply tested 6 HP blades with a single Flex-10 module vs. 6 Cisco blades using their Fabric Extender plus a single Fabric Interconnect. I’m not going to try to restate what the report says (for that, you can download it directly); instead, I’m going to highlight the results. It is important to note that the report was “commissioned by Hewlett-Packard Dev. Co, L.P.”

Result #1: HP BladeSystem C7000 with a Flex-10 Module Tested to have More Aggregate Server Throughput (Gbps) than the Cisco UCS with a Fabric Extender connected to a Fabric Interconnect in a Physical-to-Physical Comparison
>The test shows that when 4 physical servers were tested, Cisco achieved an aggregate throughput of 36.59 Gbps vs. HP achieving 35.83 Gbps (WINNER: Cisco)

>When 6 physical servers were tested, Cisco achieved an aggregate throughput of 27.37 Gbps vs. HP achieving 53.65 Gbps – a difference of 26.28 Gbps (WINNER: HP)

Result #2: HP BladeSystem C7000 with a Flex-10 Module Tested to have More Aggregate Server Throughput (Gbps) than the Cisco UCS with a Fabric Extender connected to a Fabric Interconnect in a Virtual-to-Virtual Comparison
>Testing 2 servers, each running 8 VMware-hosted Red Hat Linux virtual machines, showed that HP achieved an aggregate throughput of 16.42 Gbps vs. Cisco UCS achieving 16.70 Gbps (WINNER: Cisco).

The results above were achieved with the 2 x Cisco B200 blade servers each mapped to a dedicated 10Gb uplink port on the Fabric Extender (FEX). When the 2 x Cisco B200 blade servers were configured to share the same 10Gb uplink port on the FEX, the achieved aggregate throughput on the Cisco UCS decreased to 9.10 Gbps.

A few points to note about these findings:
a) the HP Flex-10 Module has 8 x 10Gb uplinks whereas the Cisco Fabric Extender (FEX) has 4 x 10Gb uplinks

b) Cisco’s FEX design extends the 8 blade servers out through the 4 external FEX ports at a 2:1 ratio (2 blades per external FEX port). The current Cisco UCS design requires the servers to be “pinned”, or permanently assigned, to their respective FEX uplink. This works well when there are 4 blade servers, but once you have more than 4 blade servers, an uplink’s bandwidth is shared between two servers, which could cause bandwidth contention.

Furthermore, it’s important to understand that the design of the UCS blade infrastructure does not allow communication to go from Server 1 to Server 2 without leaving the FEX, connecting to the upstream Fabric Interconnect, and then returning through the FEX to the destination server. This design is the potential cause of the decrease in aggregate throughput from 16.70 Gbps to 9.10 Gbps shown above.
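
To make the contention point concrete, here is a rough back-of-the-envelope sketch in Python (my own simplification for illustration; it is not from the report and it ignores the hairpin through the Fabric Interconnect). It only computes the naive ceiling that pinning blades to shared 10Gb FEX uplinks places on aggregate throughput:

    # Crude model: each blade has one 10Gb port and is statically pinned to one
    # FEX uplink, so all blades sharing an uplink are capped by that uplink's 10Gb.
    UPLINK_GBPS = 10.0
    PORT_GBPS = 10.0

    def naive_ceiling(pinning):
        """pinning: dict of blade number -> FEX uplink the blade is pinned to."""
        ceiling = 0.0
        for uplink in set(pinning.values()):
            blades_on_uplink = sum(1 for u in pinning.values() if u == uplink)
            ceiling += min(UPLINK_GBPS, blades_on_uplink * PORT_GBPS)
        return ceiling

    # 2 blades on dedicated uplinks vs. 2 blades sharing one uplink
    print(naive_ceiling({1: "A", 2: "B"}))  # 20.0 (report measured 16.70 Gbps)
    print(naive_ceiling({1: "A", 2: "A"}))  # 10.0 (report measured 9.10 Gbps)
    # 6 blades pinned across the FEX's 4 uplinks, so two pairs share
    print(naive_ceiling({1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "D"}))  # 40.0 (report measured 27.37 Gbps)

The measured numbers in the report land below these ceilings, but the drop from a dedicated uplink to a shared one follows the same pattern; the Flex-10 numbers don’t take the same hit because the server-to-server traffic is switched inside the enclosure.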


One of the “Bottom Line” conclusions from this report states, “throughput degradation on the Cisco UCS caused by bandwidth contention is a cause of concern for customers considering the use of UCS in a virtual server environment”; however, I encourage you to take a few minutes, download the full report from the Tolly.com website and draw your own conclusions about this report.

Let me know your thoughts about this report – leave a comment below.

Disclaimer: This report was brought to my attention while attending the HP Tech Day event, where airfare, accommodations and meals were provided by HP; however, the content being blogged is solely my opinion and does not in any way express the opinions of HP.

53 thoughts on “Tolly Report: HP Flex-10 vs Cisco UCS (Network Bandwidth Scalability Comparison)”

  2. Aaron Delp

    Hey Kevin – I haven't seen the report, but here are some thoughts after sitting here for 30 seconds and taking a look.

    The report doesn't seem fair because it of course stressed Cisco oversubscription up to 8 blades (the max for UCS). Guess what: if you tested the max number of HP blades, I bet you'd see the same thing. Here's why.

    The UCS FEX has 4 uplinks for 8 blades, a 2:1 oversubscription.
    The C7000 switch has 8 uplinks for 16 blades, a 2:1 oversubscription.

    An 8-blade test isn't valid because in that case the C7000 blades each use a dedicated link (a 1:1 ratio) vs. Cisco using a 2:1 ratio. I'm also wondering (I don't know this answer offhand) whether putting in the 2nd FEX gives you a total of 8 uplinks, and since you are active/active on the data path, that would even out the oversubscription.

    To make it fair, the C7000 would need to be tested for bandwidth on 16 blades to test throughput at the 2:1 oversubscription level.

    This is another case of somebody dragging out “lab queens” (systems that aren't real-world configurations) and running some test. No one would ever use a single 10Gb Flex-10 switch and no one would use a single FEX.

  3. Kevin Houston

    Aaron – You should download the report – it's only 6 pages and takes about 2 minutes to register. I looked up the Cisco UCS 5108 user's manual and it states you would never hook up 2 x FEXs to a single Fabric Interconnect, so I'm not sure why that was even included in this test. I agree with your comments about the report appearing not to test equally from an oversubscription perspective, but I read the report as focusing on bandwidth contention caused by the design requiring traffic between servers to leave the chassis to reach another server. I would like to see something from Cisco to counter this report. It is also important to note that the report says Cisco was asked to participate in this test but reportedly declined. Thanks for your thoughts and for reading!

  4. ewan

    I was under the impression that part of the reason why the Cisco blades don't talk directly to each other inside the blade chassis is to ease security management – all traffic goes via the central fabric switch.

    The use of a fabric extender inside the blade chassis rather than a switch does introduce that bandwidth limitation, but it also eases management significantly – there are 2 Cisco fabric switches to maintain across a number of UCS chassis, while each HP blade chassis would have its own 2 switches to manage, same as an IBM blade chassis?

  5. M. Sean McGee

    (Disclaimer: I currently work for Cisco as a data center architect. Prior to that, I was a network architect for the HP BladeSystem BU for many years.)

    This Tolly Group report is fundamentally flawed. One of its main premises is that only a single remote line card (FEX module) can be active at one time. Unfortunately, they don't know how to use the product. All UCS remote line cards and all uplinks on all of these UCS remote line cards can be used at the same time.

    In addition, this HP-funded report ‘needs’ to discredit Cisco's new architecture because HP can't do anything even close with their product. The blade designs from the big three (HP, IBM and Dell) all rely on the antiquated concept of a “mini rack” (blade enclosure). Their mini-racks have miniature versions of the devices you normally find at the top of the rack (eth switches, FC switches, rack manager). When you add two or three of these mini-racks…er, I mean, HP blade enclosures…to your real rack, you've got all the management overhead of multiple little racks inside your big rack. When an HP blade customer buys their 17th/33rd/49th/etc. server, they have to buy a whole new mini-rack and add a whole new management domain to their datacenter.

    Well, Cisco changed all that. Cisco doesn't put in little mini switches in every blade chassis. Cisco puts in remote line cards (FEX modules). In other words, Cisco extends the pair of external switches INTO multiple blade chassis (yes, they can both be active at the same time). Cisco doesn’t require the customer to deploy lots of little switches like HP (yes, HP’s Virtual Connect counts as little switches, but I’ll leave that for another discussion). Cisco also doesn't put in 2x little mini rack managers (Onboard Administrator) in every blade chassis either. Cisco's chassis is a simple “block” of 8 servers that share power, cooling, and remote line cards. Take several of these “blocks” and plug them into the Fabric Interconnects and you have one logical chassis. So, customers can have multiple UCS blade chassis in multiple racks all managed as a single logical blade chassis. I'd be very worried if I was one of the big three.

    It's in HP's best interest to discredit this brand new and improved blade architecture from Cisco that does away with the antiquated mini-rack concept that HP continues to ship to their customers. And that is the biggest problem with this report: it’s funded by HP. In other words, Tolly only gets paid if HP likes what they hear – apparently, even if it’s factually incorrect.

    Unfortunately for HP, this report can be immediately discredited because its primary assumption about the UCS architecture is false.

  6. Kevin Houston

    RE: The Tolly Report – thanks for the disclaimer that you work for Cisco and thanks for the comments. I'm not debating whether the report is false or accurate; that's for the Tolly Group to handle. However, your statement that “one of its main premises is that only a single remote line card (FEX module) can be active at one time” is accurate, to an extent. In this testing environment, they only chose 1 x Cisco UCS 6120XP Fabric Interconnect, and based on the Cisco UCS 5108 User's Manual at http://www.cisco.com/en/US/docs/unified_computi… it appears to me that you must have a 2nd UCS 6120XP Fabric Interconnect if a 2nd FEX is used. Maybe I'm misreading it, but if that's the case, then the Tolly Group didn't do anything wrong except spend money on a 2nd FEX that wasn't needed. Since I didn't list what hardware was included, I encourage you, if you haven't already, to download the full 6-page PDF from the Tolly Group. Thanks for your comments and opinions. Appreciate you reading!

  7. M. Sean McGee

    Hi Kevin,
    It is correct that each remote line card can only be connected to a single Fabric Interconnect. It is also correct that the Tolly Group would have to have two Fabric Interconnects to have two active FEX modules. However, the report makes the reader believe that the only configuration supported by UCS is a single active FEX and that UCS can only achieve 40 Gbps of bandwidth at any one time. That premise is completely false.

    Very few customers would run UCS, or any other blade solution, with a single upstream switch. No one wants a single point of failure. As a result, the test is not based on a real-world example. Instead, it's based on an incomplete configuration. It's like the Tolly Group testing an HP chassis with only one I/O module in the blade enclosure and then complaining that they can only get link on 1 NIC in every server (HP requires a separate I/O module for every physical NIC port). That wouldn't be a real-world example of an HP blade deployment and no one would pay attention to the test results. Same with this report on UCS.

    In my opinion, the report is equivalent to testing the performance of a V8 engine with 4 out of the 8 spark plugs removed and then complaining that you can only get four pistons to work.

  8. BladeGuy

    (Disclaimer: My name is Ken Henault. I currently work for HP as an Infrastructure Architect.)

    The report doesn't directly state the number of uplinks for either configuration. You can infer from the slot-to-uplink mapping that they used all 4 uplinks on the UCS2104. The Virtual Connect will give the performance described in this report with a single uplink.

    That's the whole point of this report. Virtual Connect handles all this traffic within the enclosure, while UCS has to send it to the UCS6120 top of rack.

    Before this report I would have assumed that most UCS installations would only use 1 or 2 uplinks per FEX. This makes a good argument that all 4 might not be enough.

    (These opinions are mine, and do not necessarily reflect the views of my employer)

  9. BladeGuy

    (Disclaimer: My name is Ken Henault. I currently work for HP as an Infrastructure Architect.)

    There are many things in the reply I personally disagree with; I’ll just take a few to start.

    “Cisco doesn't put in little mini switches in every blade chassis.” The UCS2100 FEX is a mini switch much like Virtual Connect. UCS will allow you to multiplex a single 10Gb uplink from the UCS2104 FEX to all 8 blades in an enclosure. Any time a connection is multiplexed, there's some sort of switching going on. Both Cisco and HP would rather you don't think of these devices as switches, because switches have a bad connotation (points of management, spanning tree convergence, etc.). Cisco likes to complain that Virtual Connect is too much of a black box, but their FEX is even more of a black box. To make matters worse, Cisco doesn't take advantage of the mini switch and forwards all the traffic to the top-of-rack UCS 6100.

    “Cisco also doesn't put in 2x little mini rack managers” Each UCS2104 contains a management chip. Just because Cisco hides theirs in the UCS6100 interface doesn't mean the management chip is not in there. In addition, UCS requires $60,000 worth of top-of-rack UCS 6100s before you even buy your first blade.

    “HP requires a separate I/O module for every physical NIC port” – well, so does Cisco. If you only have one FEX, only half your CNAs will have link.

    “Very few customers would run UCS, or any other blade solution, with a single upstream switch. No one wants a single point of failure.” So you agree that you don't put two of something in a system because you need two to run properly; you put in two so it will run properly when one fails. But your closing statement implies you need all components in place to work properly. If all components are required to make it work properly, UCS is missing some key redundancy.

    Just because Cisco hides the switching and management capabilities within their enclosure doesn't mean they don't exist. It might seem simpler to manage, but in hiding it, they force all traffic to the top of the rack. Forcing traffic to leave the enclosure is what's causing the difference in performance in these two tests. The hardware is there; Cisco could allow local switching within the enclosure, but then they'd lose the ability to say “we're different”. Different doesn't mean it's better.

    (These opinions are mine, and do not necessarily reflect the views of my employer)

  10. Kevin Tolly

    I would like to clear up a misconception in some of the postings about the test. The test was not utilizing the uplinks on the HP Virtual Connect for anything but a small amount (less than 100Kbps) of management traffic.

    The test compares intra-chassis communications: in the case of HP, this traffic stays within the c7000 chassis, vs. a Cisco UCS (a logical system created by a single blade chassis + Fabric Extenders (FEX) + Fabric Interconnect).

    For the Cisco UCS, the wire from the FEX to the Fabric Interconnect is probably better described as a server downlink (see: https://supportforums.cisco.com/docs/DOC-5950;j… “The uplink ports are connected to the external infrastructure and the downlink ports are connected to the fabric extenders in the UCS chassis”)

    So what we see in the test is simple physics: two Cisco UCS blade servers share a wire, and therefore the bandwidth is shared between the two and is less than what it would have been if not shared. Hence the oversubscription issue. HP intra-chassis traffic has no such constraint. Thus, we were able to run the HP tests from any-to-any blade within the chassis without seeing any difference in throughput. Such was not the case with the Cisco solution.

    This issue is a caveat for the UCS approach and would be even more pronounced if, for example, a user were trying to scale a logical UCS implementation to more than a single chassis. For example, in a 320-server logical system with 40 enclosures, each connected to a single port on a 40-port Fabric Interconnect, the amount of “downlink” sharing is significantly greater than even the test pointed out.

  11. Kevin Houston

    I appreciate a representative from the Tolly Group jumping into the conversation RE: #HP Flex-10 vs #Cisco UCS. I appreciate your clarifications and hope you'll keep following this discussion in case representatives from Cisco or HP have further thoughts. Thanks for reading!

  12. Aaron Delp

    Kevin – Thank you for your post, it helps clear up a lot.

    I will simply ask this: how real-world is anything in this report? How often does line-rate blade-to-blade traffic inside the chassis come into play in the real world? Maybe I'm in the minority, but I don't see any of the statistics presented here coming into play in most production systems today.

    We can talk about intra-chassis traffic and blade pinning all day, but if it doesn't represent the real world, then who cares? If customers can't take the data and compare it to the situations in their business, what is the value of the report?

    At the end of the day it doesn't matter if the vendor is HP or Cisco; this report is about oversubscription and basic physics. The best HP can do out of the chassis is 2:1 oversubscription (16 blades, 8 uplinks) and the best Cisco can do out of the chassis is 2:1 (8 blades, 4 uplinks).

    Let's test something that is real world. Where are dual Flex-10s or dual FEXes? Until that happens, the tests are just skewed to whoever is paying for the study. Oversubscription is a fact of life in the networking world, and HP needs to not muddy the waters with FUD that oversubscription is a bad thing.

    I like both the HP and UCS platforms. I can get the job done with either of them. I'm not happy with studies that bring out “lab queens” that aren't anywhere near real world to sling mud one way or the other.

  13. Dan Hanson

    Disclaimer: I am a Data Center Architect at Cisco and have been designing E-Commerce sites for 10 years.

    This exactly shows the issue with the testing. Cisco documentation clearly states these fundamental operating modes, which were missed. As you point out:

    “So what we see in the test is simple physics, two Cisco UCS blades servers share a wire and therefore the bandwidth is shared between the two but is less than what would have been if it is not shared”

    The missing element? That is a PER-FABRIC operation, but both fabrics can be active – hence no oversubscription. You can define this when creating your profiles. This is a simple Google search, and why it was missed/ignored can only be answered by the testers.

    Is this an issue with HA? No – customers hate reserving unused links for bandwidth backup. How have they designed around this? By load-sharing per VLAN (802.1s+w) on alternate links. If a link fails – voilà – you drop the bandwidth in half but everything stays up. This is the situation the Tolly test showed. If there is a failure along the flow, all paths/adapters to the OS/etc. remain – only a potential bandwidth drop during an outage situation that QoS handles very nicely (oh wait – there was no mention of any of this in the report).

    Racetrack numbers, 100% traffic inside a chassis, lack of understanding of the basic setup of the UCS = YAWN. Interesting to see that the properly engineered solution outperformed Flex-10 – even without the $40k-$60k of per-chassis switching/extra management points that are required (oh yes – then add in the management costs).

  14. Brad Hedlund

    So it appears that only (1) FlexNIC was used on the HP server for this test. That detail is missing but easily assumed given the results.

    HP: Is that the configuration you recommend to your customers for a VMware environment? Only (1) FlexNIC configured for 10Gbps?

    The answer, of course, is no. Since Flex-10 has no intelligent QoS capabilities, your customers need to carve out additional FlexNICs for important management traffic, leaving something less than 10Gbps available for actual production data traffic.

    So, HP, why don't you rerun this test using a configuration on the HP server that your customers would actually implement in a production environment?

    Since Cisco does have intelligent QoS capabilities, a full 10G of production data throughput is available to the server, while also guaranteeing bandwidth to management traffic, so at least you got that part of the test right.

    Cheers,
    Brad
    (Cisco)

  15. M. Sean McGee

    (Disclaimer: I currently work for Cisco as a data center architect. Prior to that, I was a network architect for the HP BladeSystem BU for many years. All posts represent my thoughts and opinions and not necessarily those of my past or present employers.)

    Hi Ken!
    Long time no talk, my friend. I really wish I could have hung out last week in The Woodlands. I was looking forward to a shuffleboard rematch… :-) The other guys also asked me to tell you “Hi” for them.

    I sincerely appreciate your willingness to openly discuss the differences in our companies’ respective blade architectures. There might still be some blade customers that have misperceptions about Cisco’s new UCS blade architecture, and these types of open discussions really help clear those up. I know we are both passionate about our products and want to make sure the facts are clearly understood so customers can make an informed decision.

    You brought up several good questions in your post. I’d like to take the time to individually respond. I apologize in advance for the lengthy reply. However, there is a lot of information covered in our discussion.

    #1.
    Sean said: “Cisco doesn't put in little mini switches in every blade chassis”

    <Ken’s reply>
    The UCS2100 FEX is a mini switch much like Virtual Connect. UCS will allow you to multiplex a single 10Gb uplink from the UCS2104 FEX to all 8 blades in an enclosure. Any time a connection is multiplexed, there's some sort of switching going on. Both Cisco and HP would rather you don't think of these devices as switches, because switches have a bad connotation (points of management, spanning tree convergence, etc.). Cisco likes to complain that Virtual Connect is too much of a black box, but their FEX is even more of a black box. To make matters worse, Cisco doesn't take advantage of the mini switch and forwards all the traffic to the top-of-rack UCS 6100.

    <Sean’s reply>
    I’m glad to see you agree that Virtual Connect is a mini-switch. ;-) Not all multiplexing has to be plain old switching, like what a mini-switch does. Cisco UCS introduces the concept of a “remote line card” for multiplexing. That’s just one of the innovations in Cisco’s approach to blade switching vs. the legacy blade architecture of the big three blade server vendors.

    Calling a UCS FEX a “mini switch” is like describing a Cisco Catalyst 6500 switch as “a chassis with a bunch of mini switches”. We all know the Catalyst 6500 is a single switch that is expandable by adding line cards. The UCS Fabric Interconnect is also expandable by adding line cards. The difference is the UCS Fabric Interconnect line cards have been physically moved from the central switch to the server blade chassis. Hence, they are remote line cards, not mini-switches. They don’t represent “black holes” any more than line cards on a Catalyst 6500 represent black holes. In both cases, the line cards are fully managed and monitored as a part of the top of rack switch, not managed individually like bladed mini-switches. The UCS Fabric Interconnect and FEX architecture is a fundamentally new approach to server blade design and does not perpetuate the concept of individual-decision-making bladed mini-switches.

    Unlike a remote line card, with bladed mini-switches (like Virtual Connect, as you said) the blade customer has lots of little devices making their own decisions about how to switch Ethernet frames. Each mini-switch represents an independent data plane and each of these data planes has to be configured, monitored, troubleshot, secured, etc. because they are each independently deciding what to do with a frame when they receive it. The customer may have a single UI per enclosure to configure the mini-switches, but the mini-switches are still making their own decisions about moving frames. A remote line card doesn’t do that. A remote line card is an extension of the top of rack switch. That means the top of rack switch (Fabric Interconnect) decides whether or not frames are allowed between two servers in the same (or different) chassis, what the QoS policy is for a particular flow across multiple chassis, how much bandwidth different NICs across all chassis are allowed to consume, etc.

    In regards to “Cisco…forwards all the traffic to the top of rack” – that’s how we did it with rack servers…all traffic went to the top of rack switch (or end of row) and that wasn’t considered a bad thing. Why is it a bad thing now? The simplest design in switching is the design with the fewest layers. Why do I want to add a mini-switching-layer for every blade enclosure I deploy? Cisco UCS architecture provides all the infrastructure benefits of blades (shared power and cooling, reduced foot print, reduced cabling) yet UCS’s network architecture provides the simplicity of a rack server deployment – no additional switching layers and clear visibility to each individual server NIC (including Virtual Machine vNIC visibility with the Palo CNA and VMware integration with VN-Link).

    The new blade architecture introduced by Cisco UCS is a win-win-win for the server, network, and storage teams. The server team can deploy an architecture with all the benefits of blades while the network and storage teams retain the simplicity of a top of rack (or top of multi-rack) switching design.

    #2.
    Sean said: “Cisco also doesn't put in 2x little mini rack managers”

    <Ken’s reply>
    Each UCS2104 contains a management chip. Just because Cisco hides there's in the UCS6100 interface doesn't mean management chip is not in there. In addition UCS requires $60,000 worth of top-of-rack UCS 6100s before you even buy your first blade.

    <Sean’s reply>
    “Having an ASIC that performs a function” is very different than “having an ASIC that performs a function AND has to be individually managed by the server team”. The point is not that UCS doesn’t have the ASICs. The point is that Cisco blade chassis ASICs work together with the Fabric Interconnect to form a single logical chassis, made up of several physical chassis, that is managed as one blade chassis. Cisco’s approach to blade chassis management is very similar to their approach to blade switching – it’s centralized, simpler, easier, etc. The Cisco UCS chassis management controller is like a “remote line card” for the top of rack blade chassis manager (Fabric Interconnect). To use a geeky analogy, it’s like the Borg in Star Trek. :-) (No, I’m not a Trekkie.) This fundamental change in blade chassis management radically reduces management overhead for the server admin. Engineering ASICs to work together to perform a centralized function is not the same as “hiding” the ASIC.

    As an example, let’s use management IP addresses to demonstrate the architectural differences in points of management. Here are two 80-blade scenarios that use the minimum number of infrastructure management IP addresses:

    **Scenario A: 5 HP BladeSystem Enclosures with 2 x VC Enet and 2 x VC FC (80 server blades)
    – 10 x IPs for Onboard Administrators
    – 10 x IPs for Virtual Connect Ethernet (Flex-10) modules
    – 5 x IPs for Virtual Connect Manager cluster address (optional but typical)*
    – 10 x IPs for Virtual Connect Fiber Channel modules
    Total: 35 IP addresses for 80 blades
    (*use of Virtual Connect daisy chaining for enclosures 1-4 reduces IP count by 3 – so, 32 total)

    **Scenario B: 10 Cisco UCS Chassis with 2 x Fabric Interconnects (80 server blades)
    – 2 x IPs for Fabric Interconnects
    – 1 x IP for Fabric Interconnect cluster address
    Total: 3 IP addresses for 80 server blades

    HP’s 35 Management IP addresses vs. Cisco’s 3 Management IP addresses demonstrates the fundamental architectural differences in management philosophy between the two approaches.
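
    As a quick back-of-the-envelope in Python (my assumptions: 16 blades per c7000 enclosure, 8 blades per UCS chassis, the per-module IP counts above, a single Fabric Interconnect pair, and no Virtual Connect daisy chaining):

        import math

        def hp_mgmt_ips(blades, blades_per_enclosure=16):
            enclosures = math.ceil(blades / blades_per_enclosure)
            # 2 OAs + 2 VC Ethernet + 1 VC Manager cluster + 2 VC Fiber Channel per enclosure
            return enclosures * (2 + 2 + 1 + 2)

        def ucs_mgmt_ips(blades):
            # 2 Fabric Interconnect IPs + 1 cluster IP, independent of chassis count
            return 2 + 1

        for n in (80, 160, 320):
            print(n, "blades:", hp_mgmt_ips(n), "HP IPs vs.", ucs_mgmt_ips(n), "UCS IPs")

    Within a single Fabric Interconnect pair, the gap only widens as the blade count grows, which is the point of the comparison.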

    #3.
    <Ken said>
    In addition UCS requires $60,000 worth of top-of-rack UCS 6100s before you even buy your first blade.

    <Sean’s reply>
    Don’t tighten that sales hat too tight, bro! ;-) We all know the 6100’s ports are licensed on an as-needed basis so that customers don’t have to buy the entire infrastructure until they need it – just like many Fiber Channel switches. In other words, the customer doesn’t have to license all UCS Fabric Interconnect ports when they buy the first UCS blade server.

    On the flip side, the mini-rack model means the customer has to re-buy the mini-rack infrastructure for every 17th blade – additional mini Ethernet switches, additional mini Fiber Channel modules, and additional mini-rack managers (OA).

    In other words, the pricing comparison between the architectures is more than just lobbing a single number up in the air.

    #4.
    Sean said: “HP requires a separate I/O module for every physical NIC port”

    <Ken’s reply>
    Well so does Cisco. If you only have one FEX, only half your CNAs will have link.

    <Sean’s reply>
    I think you missed my point. My point was that my HP scenario example would not be a real world scenario – just like the UCS scenario tested by Tolly wasn't real world.

    #5.
    Sean said: “Very few customers would run UCS, or any other blade solution, with a single upstream switch. No one wants a single point of failure.”

    <Ken’s reply>
    So you agree that you don't put two of something in a system because you need two to run properly; you put in two so it will run properly when one fails. But your closing statement implies you need all components in place to work properly. If all components are required to make it work properly, UCS is missing some key redundancy.

    <Sean’s reply>
    The UCS Fabric Interconnect + FEX, just like HP’s Virtual Connect, have to be physically installed to provide physical NIC connectivity. Like Kevin Tolly said, it’s plain physics. The report is based on an incomplete UCS system that has “external switch + remote line card” and “<missing external switch> + remote line card”. It’s clear that the second remote line card (FEX) will not work without its associated switch (Fabric Interconnect).

    The point here is that HP and the Tolly Group are testing the new blade architecture (UCS) using a legacy blade architecture mindset (mini-racks and lots of bladed mini-switches) to test a non-real world configuration using half of the required parts (only 40Gbps instead of the 80Gbps possible) and then are reporting that the sky is falling. UCS, when configured correctly, provides 8 active 10 Gbps uplinks for up to 8 blade servers. That's 10GE per blade server slot.

    #6.
    <Ken said>
    Just because Cisco hides the switching and management capabilities within their enclosure doesn't mean they don't exist. It might seem simpler to manage, but in hiding it, they force all traffic to the top of the rack. Forcing traffic to leave the enclosure is what's causing the difference in performance in these two tests. The hardware is there; Cisco could allow local switching within the enclosure, but then they'd lose the ability to say “we're different”. Different doesn't mean it's better.

    <Sean’s reply>
    Cisco is not “hiding” devices. Cisco built a new architecture that doesn’t require every ASIC to be individually managed – they work together as a system. That requires a new mindset, not a mini-rack mindset. Trying to apply the legacy mini-rack mindset to the new Cisco UCS blade architecture is like trying to apply the constraints of the legacy horse-and-buggy transportation method to the brand new Ford Model T transportation method. It’s true – UCS users can’t put hay in the gas tank or nail horseshoes to the tires. ;-)

    “Forcing” traffic to the top of rack is not a bad thing – just like it isn’t a bad thing for rack servers to all communicate through top of rack switches. It means you have FEWER layers of switching. You have the simplicity of multi-chassis management from a single interface. You have enhanced traffic control and security across multiple chassis. Those are all good things for customers.

    Lastly (back to the original purpose of the blog), UCS delivers up to a full 10 Gbps of traffic PER half slot blade server to top of rack or up to a full 20 Gbps PER full slot server to top of rack – contrary to the Tolly report. That’s A LOT of bandwidth per server. The sky is not falling as some might suggest. All UCS blade chassis uplinks can be used at the same time – as in Active/Active – contrary to the Tolly report. In other words, UCS provides a full 80 Gbps for an 8 blade server chassis. Just to be clear – all 8 x 10 GE uplinks are active at the same time. In my opinion, the Tolly report should be renamed to “An HP Best Case Scenario Compared to a Cisco Worst Case Scenario”

    Again, thanks so much for the open discussion, Ken. I think it’s great for potential customers that are evaluating our products to gain the insight from technical discussions like this.

    Your friend,
    -sean

  17. M. Sean McGee

    In regards to “UCS has to send it to the UCS6120 top of rack” – that’s how we did it with rack servers…all traffic went to the top of rack switch (or end of row) and that wasn’t considered a bad thing. Why is it a bad thing now? The simplest design in switching is the design with the fewest layers. Why do I want to add a mini-switching-layer for every HP blade enclosure I deploy?

    Cisco UCS architecture provides all the infrastructure benefits of blades (shared power and cooling, reduced foot print, reduced cabling) yet UCS’s network architecture provides the simplicity of a rack server deployment – no additional switching layers and clear visibility to each individual server NIC (including Virtual Machine vNIC visibility with the Palo CNA and VMware integration with VN-Link). UCS provides 10GE of bandwidth PER blade server, contrary to the Tolly report

    I'm still confused why you think multiple little layers of switching inside a single rack are better than a single, intelligent switching layer per rack. Blades aside, is HP recommending that customers deploy little rack switches for every 16 rack servers and then connect those up to a top-of-rack switch? What's the difference?

  18. BladeGuy

    (disclaimer: Ken Henault, HP yada yada yada)

    Yeah Sean, it would have been nice to catch up with you last week. How old is Kian now, 4 months? You getting any sleep yet?

    I believe a fair and honest exchange is best for everyone, but I also believe it's you that has your sales hat on a little too tight. That FEX or remote line card or whatever it is you want to call it has one of them fancy Nuova ASICs on it. It looks at the MAC addresses of the incoming packets and decides where to send them. So despite what your marketing folks might want you to say, this thing walks like a duck and quacks like a duck; I'm thinking we have a duck here.

    While we're on the subject of sales hats on too tight, a pair of 6100s with power supplies and I/O modules does have a list price of about $60,000, without any port licenses last time I checked. As you and your friends have spent so much time pointing out, you really do need a pair of them before you buy your first server blade. Now HP only used one when testing your blades, but they only used one VC module as well, so it's pretty much even there.

    As to whether this test represents the real world or not, VMware was very proud to demonstrate how a real-world vSphere 4 server could flood a 10Gb NIC at VMworld last year. Now the traffic might have been generated differently than how VMware did it, but from a network perspective, one flooded 10Gb link is as good as another.

    Well it's late, and I'm too lazy to write a book tonight, so I'll just leave it at that. Take care of Kian, and say howdy to Heather for me. Stop by my blog when you have a chance, http://bladeguy.blogspot.com .

    Take care bud,
    Ken

  19. BladeGuy

    Most of my customers have been using VC to stack 2 or 3 enclosures, then just using home runs to the core. I don't know anyone that uses TOR with VC.

  20. M. Sean McGee

    Hey Ken,
    Kian is doing well. Yes, 4 months. Yes, I'm sleep deprived. But it's worth it as soon as he can mow. :-)

    Quick responses:
    – How did you determine that the FEX “looks at MAC addresses of the incoming packets and decides where to send them”? Your logic analyzer must be broken. ;-) Anytime you are near Austin, TX, feel free to stop by and I'll show you in person.

    – I guess by your same argument, your first rack server requires you to buy two top-of-rack/end-of-row switches. Since when does a customer buy a single rack server or a single blade server? I truly don't understand this argument…

    – If a customer has a VMware environment that needs a full 10GE per server, UCS can provide it. A properly configured UCS system provides 80Gbps of traffic to the blade chassis. That's 10 Gbps per server.

    Yes, I stopped by your site and posted a comment. I'm looking forward to the response. ;-)

    Have a good night, sir.
    -sean

  21. M. Sean McGee

    So you'd recommend a pair of stacked rack switches for every 16 rack servers (4-6 switches per rack – the VC model) rather than a single pair of switches serving two whole racks (the UCS model)?

  22. BladeGuy

    Now you're getting rather childish. By that standard UCS would have 10-14 switches per rack for the same configuration.

  23. Dan Hanson

    I agree that we have a difference in philosophy – do we move the network end-points out to the servers, or do we move switching out closer to them? If we move switching out closer – who will manage and field any 3am calls? Who manages this new access layer of data switching? How many devices do they have to manage? All of these questions relate to $$ that are not readily apparent on a parts list.

    I don't know that anyone on this list can say – this will all be market driven.

    Back to the main point of this topic – the tests that were run. This discussion is going in multiple directions – but it was a basic configuration issue.

  24. M. Sean McGee

    <Ken said: Now you're getting rather childish. By that standard UCS would have 10-14 switches per rack for the same configuration.>

    Hey Ken,
    You and I have worked together for many years building a relationship of mutual respect. I really want to maintain that.

    It seems the confusion here centers on the UCS Fabric Extender (FEX) – a.k.a. remote line card – and your belief that it’s “a mini-switch like Virtual Connect” (from your earlier post). Your team trusted me over the years to deliver technically accurate training on blade switching and Virtual Connect, and I’d respectfully ask you not only to trust me that the FEX is not a mini-switch like Virtual Connect, but I also encourage you to read just a few opinions from non-Cisco sources:
    Colin McNamara: http://tinyurl.com/yl62d8a
    Information Week: http://tinyurl.com/yz4tz3k
    Network World: http://tinyurl.com/by9463

    Hopefully with it clear that the FEX is not a mini-switch, I would like to re-ask my earlier question to better understand your position:
    Given two designs that both provide at least 10Gbps of bandwidth per server, would you recommend:
    A) a pair of stacked rack switches for every 16 rack servers (4-6 mini-switches per rack – the VC model)
    -or-
    B) a single pair of switches serving multiple racks (the UCS model)?

    Respectfully,
    Sean

  25. Bob

    This report is getting hammered on the internet.
    Just google keywords like: hp, cisco, ucs, tolly.
    HP's tactic seems to be backfiring.

  26. OmarSultan

    Gosh – there is such a great conversation going on that I didn't think there was much to add, but here you go – here is the official take on things: http://bit.ly/cXQv6w

    Omar

    Data Center Solutions
    Cisco
    blogs.cisco.com/datacenter

  27. Big D

    Ok Kevin Tolly… First, everyone knows that Tolly is pay-per-view… As a vendor, you pay Tolly to run a test with a predetermined plan that is stacked against the competition. Tolly invites the competition, letting them know we are going to run a test “against” you, formatted by your competitor.

    Tell me this: what friggin’ moron would implement a UCS with one Fabric Interconnect?

    Would you drive your car with 3 tires on it?

  28. Kevin Houston

    RE: the Tolly Report – I “think” the result was from a misunderstanding of the UCS design. HP's design is an Active/Passive design, where if Module 1 fails, Module 2 takes over, so naturally the assumption would be that Cisco's design is the same; however, that apparently is not the case. You can read Cisco's response here: http://blogs.cisco.com/datacenter/comments/the_…. Thanks for reading!

  29. ANetworkGuy

    Disclaimer – I work for Cisco

    HP misrepresents a fundamental fact of how UCS works, active/active bandwidth (which is clearly referenced in the data sheet and in configuration guides), bases a very expensive “Tolly report” on it, and it's a misunderstanding of the design?

    From the UCS data sheet – “two fabric extenders provide up to 80 Gbps of I/O to the chassis”

    I wonder what else HP 'misunderstands' about networking? Like claiming HP Virtual Connect isn't a switch (it is), that Flex-10 provides fair interface QoS (it doesn't), and that Flex-10 modules aren't managed (they are; they have IP addresses). The list goes on and on…

  31. Kevin Houston

    Tolly Report Comment: The UCS data sheet says you use 2 FEXes for 80Gbps of I/O, but what happens if 1 dies – don't you have 40Gbps of I/O for 8 servers? I think that's what the Tolly Report was working toward.

    Thanks for your comment and for reading.

  46. SomeOne

    Well with the new dual 40-Gb virtual interface cards for servers, and the doubling of bandwidth from the chassis to the FEXes (80Gbps) with the UCS 2208XP, the situation certainly has evolved.

Comments are closed.