Author: Kyle Hughes

Quest for VM Backup Solution

This goes along with my last post from around the time of VMworld 2011, when I was in dire need of a new backup solution for my virtual machines. I figured I would share some of the good, the bad and the ugly as my journey to find a new backup vendor continues.

First off, I will say I have been using PHD (formerly esXpress) for about four years now. Back in the day I had great relations with the staff there, great support, and overall no issues with the backup software. Peace of mind is key for me: knowing that backups will run fine lets me keep my weekends free from unnecessary work. Over the past few years, esXpress was bought out, members of the staff have changed and, from my viewpoint, their support and customer service have dwindled. We have been stuck on an outdated version for a good couple of years now to keep the Full/Delta model running with encryption. That worked for a while, but the last version was pretty buggy, they didn’t support that version much longer, and support calls were met with various “well, you can disable this feature and it should work fine” answers, when the feature they wanted us to disable was simultaneous backups, a crucial part of staying within the backup window. All those factors caused us to rethink our backup solution.

The first criterion was that the solution could encrypt the backups for us, allowing us to send tapes offsite without encrypting the tape as a whole. The only solution I saw that fit was Acronis. We currently use them for our desktop imaging and, despite a few bugs and hiccups here and there, it works pretty well. On top of that, being able to integrate the bulk backup solution into the same management window made me think it was a perfect fit. At VMworld 2011 they released vmProtect to go along with their Backup and Recovery 11 product.

Acronis vmProtect 6

Good – Easy to set up, allows sandbox restores to test backups, encryption built in (AES 256), easy creation of jobs.

Bad – No centralized jobs / management. Can’t back up VMs simultaneously per appliance; you would have to create multiple jobs, each with certain VMs, and run them at the same time to get “simultaneous” backups. I will be honest: after hearing about the limit of a single VM at a time per job, I stopped testing right then and there, as it wouldn’t cut it for me.

Overall – In my mind, Acronis actually put together a pretty good backup solution with vmProtect. It was pretty easy to set up, configure and get going with your backups. The web interface is very GUI oriented, which made it easy to see what was going on and what configuration changes had been made. While it lacks larger-scale features, and that gap could drive an administrator insane (no simultaneous backups), for SMBs with a few hosts or a small SoHo it could be a great fit.
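
To put the simultaneous backup problem in rough numbers, here is a minimal Python sketch of how the backup window changes with the number of concurrent streams. Everything in it is an assumption for illustration: the function name, the VM sizes and the per-stream throughput are made up, and it assumes each stream keeps the same throughput no matter how many run at once, which real storage and networks will not always do.

```python
import heapq


def estimate_backup_window(vm_sizes_gb, gb_per_hour_per_stream, concurrent_streams):
    """Estimate wall-clock hours to back up every VM.

    Hypothetical model: VMs are handed out largest first to whichever
    stream is least busy, and every stream runs at the same fixed rate.
    """
    if concurrent_streams < 1:
        raise ValueError("need at least one backup stream")
    if gb_per_hour_per_stream <= 0:
        raise ValueError("throughput must be positive")

    # Each heap entry is the number of hours already queued on one stream.
    streams = [0.0] * concurrent_streams
    heapq.heapify(streams)

    for size_gb in sorted(vm_sizes_gb, reverse=True):
        least_busy = heapq.heappop(streams)
        heapq.heappush(streams, least_busy + size_gb / gb_per_hour_per_stream)

    # The window ends when the busiest stream finishes.
    return max(streams)


if __name__ == "__main__":
    # Made-up VM sizes (GB) and throughput, purely for illustration.
    sizes = [100, 80, 60, 40, 40, 20, 20, 20]
    for streams in (1, 4):
        hours = estimate_backup_window(sizes, 50, streams)
        print(f"{streams} stream(s): about {hours:.1f} hours")
```

With one VM at a time the window is simply the total data divided by the throughput, which is why a single-stream product falls over once the VM count grows.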

Acronis Backup and Recovery 11

This is the product I was testing before vmProtect came out, and again after VMworld 2011 following a couple of long discussions with other attendees and staff at the Acronis booth. At VMworld, I was pretty set that Acronis was going to be my solution based on what I had seen on their page, my initial testing and my experience with their workstation side of things. After that point, things just went downhill for me and ABR 11 Virtual Edition.

Good – Encryption built into the backups, nice centralized management, multiple VMs backing up per appliance (up to ten I’ve been told).

Bad – This list may get a little long, as my experiences over the past three weeks haven’t been the best. The setup seems pretty easy, but in the end it isn’t easily upgradable, and they have released a patch or two recently. A red flag went off in my head at VMworld when even one of the technical guys said it was a pain to set up, but that once it was up and running it worked fine. Software glitches: if you search for Acronis errors there are reviews of past versions of the product, and their own forum boards are littered with negative comments about them adding features but not fixing major flaws in the system. The licensing server is pretty cumbersome, especially since they only release a 15 day trial, which makes you continually add licenses and gets cluttered pretty fast. There is no clear documentation on access rights and what is needed where. I can’t tell you how many times I would test a backup, move it to a larger backup job, or even clone the backup job to keep the settings, only to have it error out with “access is denied” messages. Connecting directly to the appliances and running the backup there would work, but running the same job from the management console would not. I was working with an account rep, who I have no gripes with, but the process made it hard to get support when I needed it: they wanted me to write the problem to the account rep, who would forward it to the support person, then the support person would write back to the account rep, who would forward it to me. In the few WebEx sessions we did, the support person would troubleshoot the error messages by going directly to the appliances and the backups would work there, but later that night I would try to run the job from the management console and it would fail.

Overall – In the end this solution just didn’t fit our needs. We were looking for a newer solution that was reliable and well known. I felt I would be swapping one unstable product for another, and that is something no one should do when it comes to disaster recovery. There are just too many downsides, and too many negative comments on their own forum boards and other blogs, to keep trying to make a square peg fit in a round hole. While I know there are a good handful of people who have it up and running successfully in their environment, after three weekends of trying to switch over and never having a successful full backup I’m choosing to look elsewhere for a new solution.

Being prepared for the unexpected, and the expected

I will start off by saying I tend to consider myself a prepared person for most situations, and I tend to think in flow chart logic: if the result of A is this, then move here; if not, move there. Recently I found myself scrambling after our old SAN that stores our bulk backups took a dump and we lost the controller.

I can’t say this wasn’t expected, but I found myself without an appropriate plan of action, which made for a week and a half of long days and nights with my director on my case to get things back up and running. While all I needed was a simple storage solution for the bulk backups to push to, there was also the counterpart of getting these backups to tape for offsite storage. When this all happened there wasn’t a huge amount of panic, as the best thing you can do is stay cool and collected and logically think, “what is the best option to move forward with?”

For me this was using our little Iomega ix4 NAS box, which allowed me to just create an SMB share and point the backups there. It didn’t quite have the performance of our old Fibre Channel box (an IBM DS400), but it did well enough for me to think, hey, this isn’t that bad. I was able to rerun all the Full backups from the weekend and was on my way to having my temporary solution in place. This is when part two of our backup process came into play: moving the backup files to tape. I had a rude awakening when I found I was not able to get Backup Exec to attach to that SMB share and back up the full files. While I know I probably could’ve eventually figured out how to get Veritas to see those files, it didn’t seem like the best use of time given how soon we needed a new storage solution in place.

I then went in the direction of creating an iSCSI disk from the ix4 to the backup server running Backup Exec. This in itself was a small challenge: we’re an FC shop and I have had next to no iSCSI experience, but a bit of time on Google helped me muddle through the process of setting it up. I copied the files off the SMB share onto a small USB storage drive, attached the iSCSI drive to the server and copied those files back up. I thought, well, this is good, now I can see the drive through Veritas and I’ve always heard about how good iSCSI is, so sweet, here is my solution! This time around, performance on the iSCSI drive just tanked. I know iSCSI works and the performance is there, or companies like NetApp wouldn’t be in business anymore, so obviously my configuration wasn’t up to snuff. So attempt number two was in the books and I was no closer to finding a solid solution that would hold off my management from making a rash purchase that I would have to live with for the next five to seven years. I started to panic a little.

After a week-long process of trying to get this up and running, staying up to check on performance during the backup window, I was starting to get really frustrated and knew my time was running short. The golden rule of IT kept playing over in my head: lose data, lose your job, and I was going on a week of no bulk backups. I was about to go to bed when my girlfriend reassured me, “you’ll find something soon,” and that is when the light went on in my head. Earlier that day, while looking at our production SAN to work out how much storage we would need to replace it (and move our current production SAN to backups), I had noticed a 700GB chunk of storage not being used. Funny thing was, the DS400 had 700GB of storage we were using for backups… a perfect match. There should be no performance issue going onto a more powerful SAN, and it’s on the fiber network. The biggest obstacle would be convincing my director to allow me to carve this logical drive out for backups, since he had always put his foot down that he wanted backups and production strictly separated. That night I sent an email stating all the reasons why we should use this as our temporary solution, he agreed to my idea and off I went.

Right now that is the solution we’re using and it seems to be holding up, at least for the time being. The moral of the story is to think about what equipment you have, the performance of that equipment and how prepared you are for the unexpected, or in my case the expected. I had been using the Iomega ix4 for my R/D lab and it had worked well for that, but it couldn’t handle 15-20 concurrent connections all dumping backup files. If we didn’t have that 700GB on our production storage, I would probably still be in the weeds. While you can’t always have the equipment onsite to counter any problem, it doesn’t hurt to think about where you are vulnerable or have single points of failure, and what you might have to do to counter those issues if they ever arise.

VMworld 2010 Day 3, 4 and Closing Thoughts

While everything is starting to close down, the solutions exchange is now closed and the final sessions are starting up, here are a few closing thoughts combining day three and day four. Day three included a few sessions, but it was mainly my day to really browse around the solutions exchange to see if I could fill some needs at my company. One of the better sessions I attended during day three was “Storage Best Practices, Performance Tuning and Troubleshooting”. While it was given by an IBM employee, he kept away from really selling their products, besides a few jokes here and there. Being someone who is getting more into our storage systems, I’m learning a little bit here and there about calculating IOPS, what does and doesn’t matter and better options to go with. Of course there was the VMworld party Wednesday evening, which is the primary reason there was no day three write-up; the adult beverages were flowing quite freely… I will say that while I enjoyed last year’s main band, Foreigner, more, the overall party was better than in 2009. They brought in more games and more things to do, and expanded into the Metreon and its arcades.

Day four for me, as some may know, was unfortunately a very long morning just trying to get to the conference. I live about 45 minutes outside of San Francisco, so I had been using public transportation (BART rail) to get into the city, which hasn’t been too much of an issue since Moscone Center is only a few blocks from a station. This morning started off with a small earthquake (3.3) in the Bay Area, in a region crossed by BART’s rail lines, which caused an immediate halt of all BART trains for a 5-10 minute period, followed by an inspection of the rails by the trains already out on the tracks, which meant the train I was on went at under half the speed it would normally go. I was finally nearing my stop and got up to realize, “Where is my VMworld badge….?” I worked backwards and figured I had left it at the station where I got on the train. I headed an hour back to that station, where luckily someone had turned it in to lost and found, and then got on the next train for another hour’s ride to San Francisco… By the time I got to the Moscone Center it was close to 12:30pm and I had missed the two sessions I was really looking forward to this morning. I guess what’s nice about VMworld is that you can always just watch them from your couch at home later on. I was able to catch my last scheduled session, which was TA8133 – Best Practices to increase Availability and Throughput for VMware, hosted by Vaughn Stewart of NetApp and Chad Sakac of EMC. These two storage powerhouses put on a great session and, for a reason I can’t explain, I didn’t pull out my Flip video to record it, which is a shame because it was quite an informative session. I would say that was the best one for me all week. If you get the sessions from VMworld 2010, I would definitely say check out that session for great storage tips.

Overall 2010 was another great VMworld. The weather has been pretty weird in California all year, but this week really showed the true “California weather” and how nice it can be here. I even broke out the shorts today, something that you may never get to do all year here. I find myself very lucky to have attended the last three conferences even with the down economy. While there were some changes implemented that I didn’t quite agree with or understand, next year will be in Vegas, which I think can handle such a large conference a lot better. This year there were over 17,000 attendees, I think, which was up from last year, and I’m sure there will be even more next year as virtualization continues to grow.

On a personal note, while I wasn’t too upset earlier this year when I wasn’t reselected as a vExpert for 2010 after getting it in 2009, I found myself really missing it, especially when they had the vExpert session before the party and everyone I knew disappeared to the “not-so-secret” location. Hopefully this conference gave me the motivation to get my participation up in the VMTN forums and here on the blog side. Also, coming from an SMB environment and not finding as much material for my type of builds, after bouncing ideas off some friends I’m looking forward to going for a session in 2011.

That’s it from VMworld 2010, looking forward to 2011 and another great conference.  Maybe next year Tom can make it from across the pond to share some slush puppies in Vegas!  Now it’s off to settle a score with Brian Atkinson (VMroyale from VMTN) from VMworld 2008 at the go-cart track!

VMworld Day Two

As day two starts to calm down and the last sessions start to end, a lot of news has been presented to the VMworld attendees. Some of the highlights of the keynote this morning were the release of vCloud Director (formerly known as Project Redwood), the release of security suites from edge to endpoint, the acquisition of Integrien, and the introduction of the “cloud stack”. A lot of this was anticipated, but it was still a lot of information to take in during the keynote. A big thing I noticed from Paul Maritz was his point that a lot of these changes toward the cloud are going to happen with or without VMware, but they want to be the one pushing the issue. The replay of the keynote this morning can be found at www.vmworld.com.

Another busy day at the sessions around the Moscone Center, with sessions at capacity again. For the most part things seemed to run a bit smoother today as people calmed down and realized they were eventually going to get into all the classes they wanted. I spent the day trying to adjust to this “cloud” idea of moving away from our cluster setup. I decided to switch a session from vSphere best practices to moving your cluster to a private cloud. Unfortunately I ran into a session where the speaker didn’t speak on anything that was in the subject of the session. There were quite a few angry people, myself included, leaving that session. Besides that, most of the other sessions I was in were informative and useful, and I was able to take away a few things here and there that I could implement.

After a few years of VMworld I’ve learned a few things about not burning yourself out on sessions early in the week and allowing for networking time, where you learn quite a bit. I made sure today was a shorter day, as tomorrow will be long with the VMworld party scheduled and one more day after that. Hopefully tomorrow will be a quieter day in the solutions exchange, as I have mapped out some vendors I would like to sit down with to really learn more about their products. Tomorrow at the party I’ll have a beer or alcoholic slush puppy for everyone who wasn’t able to make it down to VMworld 2010.

VMworld 2010 Day 1

This being the third VMworld I have attended, I had a pretty good idea what to expect when I arrived. I knew there were some changes to the setup of sessions and labs, going from a sign-up format to one where you pick your sessions and it is first-come, first-served to get in. With this format change also came many repeat sessions for those who couldn’t get in on their first try. I have mixed feelings about the change: I feel the scheduling has fallen into the hands of the lazy attendees who didn’t bother to sign up for sessions before they filled up, but on the other hand it gave everyone a fair chance to get into sessions regardless of when they signed up. What came of the change were extremely long lines for almost every session, making those at the back wonder if they were wasting their time trying to get into a popular session. Along with the long lines, you had no idea if you were in the correct line for your session. That said, I was able to get into every session I lined up for. Another byproduct of this new setup, I think, was improved attendance per session: in past years sessions that showed as “full” caused people to not even consider showing up, whereas now everyone is showing up and filling the rooms. All the sessions I was in today were nearly completely full at the start. While having full sessions is a good thing, it felt more crowded than a middle seat in a middle row at the back of a 747.

Long lines for sessions made it extremely hard to move around the VMworld conference, since most lines were 100-500 people long, wrapping around corners and zigzagging through the grounds. Another crowded space was the eating area. I know it’s hard to compare this year’s conference to past VMworlds such as VMworld Vegas 2008, which had a giant eating area where everyone could find a seat. This year the lines were extremely long just to get into the eating area to get food, let alone to find a spare seat or two.

Once 4pm rolled around, the solutions exchange opened up for the first time and the area filled with attendees, vendors and guests, along with everyone asking to scan your badge. It was busier than usual, to the point where it was hard to just move around the area, let alone talk to any of the vendors, with them trying to give away free gear and attendees hoping to win money or another popular item, the iPad. After 15-30 minutes of wandering around uneventful booths and less than interesting “shwag”, it was time to get out of there and come back later in the week to get some decent information from the vendors. If I get the chance to win the iPad later in the week, so be it, but the less spam I get over the next year the better. According to sources there might be a couple of big announcements tomorrow at the keynote, which I’ll try to get a video of for the site. Until tomorrow and day two of VMworld…

Possible Issues with June Microsoft Security Updates

There have been reports that a couple of Microsoft security updates for the month of June are causing some issues with VMware vCenter. The main one is KB 980773, which is an update to .NET. While I am in the process of rebuilding my test lab, this is something I will definitely test out prior to updating our production vCenter server this month. For a little more information on the subject, there is a small discussion going on in the VMware forums regarding this issue, which should be a decent source of information until more is discovered later on. http://communities.vmware.com/message/1549225

esXpress runs into issues in 2010

Looks like the change to 2010 caused issues very similar to what most feared when we changed into the year 2000. When we moved into 2010, all versions of esXpress, 3.1.* and 3.6, stopped being able to run delta backups. Right now the scheduled backups are running as straight full backups, which obviously adds more time and needs more storage space. I don’t believe the dedup process is affected by this glitch.

It looks like they’re already testing a fix, and I have been told that they will be patching both the 3.1.* and 3.6 versions (for all of you still running 3.1.*, you won’t need to upgrade quite yet). For more updates please keep your eye on this thread – http://www.phdvirtual.com/forums?func=view&id=1898&catid=13

I’m hoping they get a patch out sometime today (1/5/10), but in the meantime make sure you have plenty of storage space to house all the full backups until they fix the issue.
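
If you want a rough idea of how much extra space that means, here is a minimal Python sketch comparing a normal Full/Delta schedule with one where every run falls back to a full. The function name, the 500GB full size and the 5% daily change rate are all assumptions for illustration, not numbers from esXpress, and it ignores compression and dedup entirely.

```python
def storage_needed_gb(full_size_gb, daily_change_rate, days, deltas_working):
    """Rough space needed to keep `days` of nightly backups.

    Hypothetical model: day one is always a full backup. Every later day
    is a delta sized at `daily_change_rate` of the full when deltas work,
    or another full-sized backup when they do not.
    """
    if days < 1:
        raise ValueError("need at least one day of backups")
    if not 0 <= daily_change_rate <= 1:
        raise ValueError("daily_change_rate must be between 0 and 1")

    later_days = days - 1
    if deltas_working:
        per_later_day_gb = full_size_gb * daily_change_rate
    else:
        per_later_day_gb = full_size_gb
    return full_size_gb + later_days * per_later_day_gb


if __name__ == "__main__":
    # Made-up numbers: 500GB of fulls, 5% daily change, one week kept.
    normal = storage_needed_gb(500, 0.05, 7, deltas_working=True)
    broken = storage_needed_gb(500, 0.05, 7, deltas_working=False)
    print(f"With deltas: about {normal:.0f} GB")
    print(f"Fulls only:  about {broken:.0f} GB")
```

Plug in your own full size and change rate to see how many nights of fulls your backup target can absorb before the patch arrives.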