Paul Edlund
    June 19

    Third Party Vendors and Exchange 2007

    I’m really interested in what I’m learning about how companies protect their cash cows.  Instead of competing on merit, some sales people and consultants seem to fall back on fear as a means of showing a value proposition.  IMHO, this is not only a bad business tactic, it’s downright frustrating for your customers.

    Case in point, I’ve been working with a customer on an Exchange 2007 deployment.  We’ve had many discussions around Exchange’s Local Continuous Replication (LCR), Cluster Continuous Replication (CCR) and Standby Continuous Replication (SCR).  Before Exchange 2007, LCR, CCR and SCR were not available.  Instead, 2-node (Active/Passive) or N+1 clusters, now called Single Copy Cluster (SCC), traditionally relied on a shared storage solution such as a SAN.  I’m not going to cover LCR in this blog entry but you can read about it here.  So now, vendors are trying to tell my customer that they are making a “grave mistake” by utilizing CCR and SCR in an enterprise scenario.  When you peek under the covers at this argument, it reads like a script that someone in the marketing department created to protect their investments.  Before I get into this rat’s nest, I don’t mean to imply that Microsoft doesn't use market intelligence when competing.  However, as sales people and consultants you have to back up your value proposition with hard evidence.  Let me tell you this: in the scenario above with my customer, the vendors were not able to provide such evidence.  If I find out that they have provided evidence, I promise that I will modify this blog post to represent their findings so long as they are valid.

    First let’s cover the basics:

    Instead of trying to recreate a complete primer on the differences between LCR, CCR and SCR, I’ll just provide you with a link here.   

    Basically, the value of CCR is that it allows you to have two local servers clustered together without having common storage.  The functionality is similar to SCC but without the shared disk.  As a matter of fact, the two servers could even be from different vendors with different types of disk in them.  The caveat is that the drive letters should match on both boxes.  In theory the secondary server could even be virtualized, but that raises a support issue that would need to be elaborated on in a separate post.  Here’s an image of a CCR cluster taken from TechNet.

    Cluster Continuous Replication Architecture

    SCR, which was introduced with Exchange 2007 SP1, is similar to what is called “stretch clustering” or “geo-clustering”.  It is NOT exactly a cluster, because the word “cluster” implies that nodes are acting together as a virtual server.  Even so, it provides failover that spans the wide area network and allows you to fail over between two datacenters.  With SCR, Exchange is also not reliant upon a shared disk solution.  Nor is SCR reliant upon 3rd party storage solutions to do disk-to-disk replication.  This is where storage vendors have made significant money in the past, namely by overcoming the shortcomings of Exchange 2003 and earlier.  Here’s an image of an SCR configuration taken from TechNet.

    SCR from one stand-alone server to another

    So here we are in the present and we find some of these vendors spreading FUD (Fear, Uncertainty and Doubt) telling customers things like the following:

    • FUD #1 – Exchange 2007 could replicate corrupt data to the target CCR or SCR node.  So you need “our” solution because we check the content before it’s replicated to the secondary storage.
      • Response – Not true.  While it is possible to get corrupt data in a source or target database, Exchange does an integrity check as well as a checksum to make sure the log file is consistent and complete.  Databases are only replicated ONCE, and that is during the pairing of CCR or SCR nodes.  The only thing that is shipped after that is log files.  Truncation happens on each side of the cluster.  In addition, a checksum is performed before the log is shipped from the source node and again before the log is committed on the target node.
    • FUD #2 – Databases could need to be “reseeded” because of long lag times in replication or WAN issues. 
      • Response – This is very much a fear statement that needs some explanation.  First, reseeding is when the entire database needs to be copied from the source node to the target.  If the databases are 200 GB, going over the WAN could take a long time.  If this happened frequently, it would generate so many product support calls that I’m sure Microsoft would remove the feature.  Under almost all conditions, reseeding is something that is triggered by a person and not by disaster.  You should know that Microsoft internally uses CCR and SCR without SAN.  All disk is locally attached storage.  Here are the scenarios when reseeding could happen:
        •   Seeding is required under the following conditions (for SCR):
          • When an SCR target is created. (You expect this and plan for it.)
          • After a failure occurs in which data is lost and an SCR target has become diverged or unrecoverable.  (This is when you have completely lost the secondary server… once again… you would plan when to recreate the pair.  This is not a disaster, this is a failure at the DR site equivalent to losing SAN at the secondary site.  This is where a backup comes in handy.)
          • When the system has detected a corrupted log file that cannot be replayed into the SCR target database.  (Since we perform a checksum before the log is shipped from the source and before it is committed at the target, very unlikely.)
          • After an offline defragmentation of the SCR source or target database occurs.  (Microsoft flat out states that offline defrags are ONLY to be performed when you must recover a corrupted source database.  In my 12 years of working with Exchange, I have only been witness to disastrous database corruption 3 times across hundreds of customers.) 
          • After a page scrubbing of the SCR source database occurs, and you want to propagate the changes to the SCR target database.  (Page scrubbing is the overwriting of zeroed out pages in the database making them unrecoverable.  This is something you have to turn on in Exchange 2007.  It doesn’t happen by default.  Also, with SP1, page scrubbing is handled gracefully by zeroing out the pages within the logfile.  This is shipped to the target server so it doesn't result in a reseeding event.)
          • After the log generation sequence for the storage group has been reset to 1.  (Once again, you would do this on purpose so it would be a self-inflicted wound to do so and would definitely reseed the database.) 
        • OK… so that was a long-winded response to disaster.  Now let’s talk about network latency.
          • First, check out http://technet.microsoft.com/en-us/library/bb676465(EXCHG.80).aspx  and look at the section called “SCR and Log Truncation”
          • Some 3rd party vendors will say that if you lose your WAN connection (they won't say for how long) then the database needs to be reseeded.  FUD!  In an SCR environment, an SCR target that is disabled and then enabled again may not need to be reseeded if all of the required log files are available, based on the following:
            • If circular logging is enabled for the storage group, log deletion will result in the enabled SCR target requiring a reseed due to gaps in the log sequence.  (Circular logging is disabled by default and usually not recommended under most circumstances because it gives you no mechanism in a database disaster to replay log files into the database.  So you have to go back to your last full backup to recover the database.)
            • If a backup is taken that includes log file truncation, log deletion will result in the enabled SCR target requiring a reseed due to gaps in the log sequence.  (Did you get that?!?!?!?!  That means you have to reseed the database only if you have truncated the log by either doing it manually or by completing a backup while the target node was unavailable.  So this is quite a disaster!!  You have to read that link directly above to fully understand this concept.  Namely, that SCR databases are truncated continually and SCR sources do not truncate logs (even if the backup is successful) until all SCR nodes are available.  So this type of reseeding should not be happening!) 
            • If log files are not truncated via either of the preceding means, disabling and then enabling SCR should not require a reseed. In this case, log files at the SCR target will need to be deleted, but they will be replicated again from the SCR source.
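
    To make the two checks above concrete, here is a minimal toy sketch in Python.  It is NOT how Exchange implements log shipping; it only models the idea that each log is checksummed on the source and re-verified on the target, and that a target can catch up by replay as long as there is no gap in the log sequence.  The names `ShippedLog`, `ship`, `is_intact` and `needs_reseed` are hypothetical, and CRC32 is just a stand-in for whatever checksum Exchange actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ShippedLog:
    """Toy stand-in for one transaction log file (hypothetical type)."""
    generation: int  # position in the log sequence
    payload: bytes   # log contents
    checksum: int    # CRC32 computed on the source before shipping


def ship(generation: int, payload: bytes) -> ShippedLog:
    """Source side: checksum the log before it leaves the server."""
    return ShippedLog(generation, payload, zlib.crc32(payload))


def is_intact(log: ShippedLog) -> bool:
    """Target side: recompute the checksum before committing the log."""
    return zlib.crc32(log.payload) == log.checksum


def needs_reseed(last_replayed: int, available: Iterable[ShippedLog]) -> bool:
    """Return True if the target cannot catch up by log replay alone.

    The target needs every generation after `last_replayed`, in order.
    A missing generation (for example, one truncated on the source while
    the target was offline) or a log that fails its checksum breaks the
    chain. Simplification: a real system might re-copy a damaged log
    from the source before giving up.
    """
    by_generation = {log.generation: log for log in available}
    if not by_generation:
        return False  # nothing new to replay, so nothing is missing
    newest = max(by_generation)
    for generation in range(last_replayed + 1, newest + 1):
        log = by_generation.get(generation)
        if log is None or not is_intact(log):
            return True
    return False


if __name__ == "__main__":
    logs = [ship(g, f"log {g}".encode()) for g in (5, 6, 7)]
    print(needs_reseed(4, logs))  # False: generations 5-7 are all present
    print(needs_reseed(2, logs))  # True: generations 3 and 4 are gone
```

    In this toy model, a WAN outage by itself never forces a reseed.  Only a hole in the log sequence (or an unreplayable log) does, which is the point of the bullets above.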

    You should take serious notice that Microsoft doesn’t use Single Copy Clusters for their production environment anymore.  Microsoft has 80,000 employees and including contractors it’s closer to 121,000 mailboxes in Exchange 2007.  Here’s a link to the Microsoft IT deployment of Exchange internally.  So we’re not talking about CCR and SCR as a SMB solution.  This is FUD talking!

    So now that you have read all this, you should understand that the fear some people are spreading is unfounded.  Of course there is always some “worst case disaster” that will result in something bad happening.  I mean, let’s be real.. stuff happens and usually it’s due to a human error.  So it is possible that what these sales reps say “could” happen but it’s certainly not very likely once you understand the facts. 

    Let me make this clear - There is value in SAN.  I’m a big believer that the only way to get the right amount of disk IO per second (IOPS) in large Exchange installations is by using SAN.  Otherwise you need more servers to house the internal storage, like an HP DL580 G5 which comes with 16 SAS drives like this one.  What I’m not saying is that you HAVE to use SAN replication schemes in order to achieve High Availability and Disaster Recovery.  THIS IS A FALLACY in Exchange 2007.  SAN is almost always more expensive, short-term and long-term, than using LCR, CCR and SCR.

    So now let’s come back to cash cows.  Microsoft has their own cash cows… namely Office, SharePoint and the client OS.  Microsoft has to continually beat down FUD around those products as well.  Microsoft sales people and consultants should also be mindful of not spreading their own FUD.  Consultants should always stick to the facts.  However, those facts change.  So before you spread FUD (3rd party vendors or anyone else :) do your research and read a few TechNet articles! 

    Windows Live Mesh will change your life

    Not that I had many religious readers over the years... however, since I took on my new job, I haven't blogged much.  Mostly because I've been heads down learning some new tricks of the trade.  Plus I feel like if I don’t have anything original to say, that I should just keep my trap shut.  Well I'm opening the trap again I guess. :)

    Let me start by asking you a question... How do you share pictures with friends and family?  (Maybe flickr or Shutterfly?) How do you share other files with them?  (email?)  If you maintain a family tree with a piece of software like Family Tree Maker, how do you share that file with other people?  (you probably don’t) If you tag photos with things like people, places and date taken, how do others get those updates?  (you probably don’t so they are always out of sync) If you run web servers for a living, how do you share configuration across many web servers?  If you have customers, how do you send them big files? (Why would you use email?  They will hate you for it! :)

    I can bet that 9 times out of 10, you are probably saying "email".  For the question about web server configuration, my guess is that you use something very basic like XCOPY or Robocopy on the Windows platform or something like rsync on everything else.  None of these methods are simple and they are all fairly labor intensive.

    If you haven't signed up for a Windows Live Mesh account (www.mesh.com), you need to get one ASAP!  It will change the way you work on multiple computers.  Namely, all of the problems above are solved quickly and easily with Mesh.  Imagine taking a picture with your cell phone and instantly having that picture show up in your PC's Pictures folder.  Imagine tagging a picture with a name or a place and your family getting an update to the picture in their own Pictures folder!  This is what Mesh does.  It will support Macs and cell phones.  It will put a copy in the cloud if you let it so you can access it anywhere.  It will let you take remote control of any PC you own... even behind a firewall or NAT.

    For those of you more familiar with other Microsoft technologies like FolderShare or SkyDrive, Live Mesh will look like it overlaps those technologies… and it does.  However, Live Mesh isn't as constrained as those other two solutions.  First, Live Mesh has a 5GB limit when you copy stuff up to the cloud but it isn't a hard limit.  You can exceed the 5GB but it’s more of a “first-in-first-out” kind of transaction (currently).  Unlike FolderShare, Mesh does not limit the file sizes or number of items you can replicate.  There are some theoretical limits but getting up to 100,000 items is not likely for most people. 

    In the enterprise, the options are endless.  Pushing out application databases and configuration files securely will be enabled by Mesh.  Emailing your customers or co-workers large media files or presentations will be a thing of the past with Mesh.  Mesh even has a rich API so you can use Mesh as a way to communicate with your users and customers.  I can't say it enough... you need to see Live Mesh!  (www.mesh.com)