ONTAP 9 Feature: Volume rehosting

Why Is The Internet Broken?


Clustered Data ONTAP (now known as NetApp ONTAP) is a clustered file system that leverages virtualized storage containers known as Storage Virtual Machines (SVMs) that act as “blades” to create a secure, multi-tenant environment with a unified namespace.

These SVMs own objects such as network interfaces and Flexible Volumes (FlexVols) and act as their own segmented storage systems on shared hardware. In previous releases, volumes were dedicated to their SVMs and could not easily be moved to another SVM in the cluster; you had to SnapMirror the volume over to the new SVM or copy the data. That process was time consuming and inefficient, so for years customers have asked for the ability to easily migrate volumes between SVMs.

In the 8.3.2 release, this functionality was added in limited fashion, for use with the new Copy-Free Transition feature. The volumes could only be migrated if they were…

View original post 1,410 more words
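
For a sense of what this looks like in practice, ONTAP 9 exposes the rehost operation as a single admin command. A minimal sketch with hypothetical SVM and volume names (the volume must meet the preconditions described in the original post):

Cluster::> volume rehost -vserver svm_source -volume vol1 -destination-vserver svm_dest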

NetApp TMP Volumes

As transitions from 7-Mode to clustered ONTAP continue to emerge, I have been dealing with TMP volumes lately, caused by migrations left unfinished for many reasons… mostly user error. This is not a new “thing,” but rather a procedure that is not well understood.

7MTT (the 7-Mode Transition Tool) allows migration from 7-Mode to clustered ONTAP by not only creating SnapMirror relationships between the source and destination systems, but also copying other important metadata such as shares, ACLs, exports, and much more.

During data migrations, projects that include volumes are created in 7MTT. Those volumes can later be removed from the project prior to cutover, which results in TMP volumes on the destination node. I’ve seen individuals remove volumes from 7MTT projects, delete the snapshot tied to the SnapMirror relationship, or simply choose to attempt to complete the migration manually… I’m not sure why!

The thing is… those volumes are pretty much useless outside of 7MTT. Even if you break the mirrors manually, the volumes will not become RW. And if you try to do anything with them from System Manager, you will get an error.

Regardless of your case, if you wish to turn those volumes into a normal (RW) type, you will need to disable transition protection on the volume. This requires diagnostic privilege, so run it at your own risk… ’cause I won’t be responsible if you mess up.

Cluster::> set diag

Cluster::*> volume transition-protect -vserver vserverA -volume volA -is-enabled off

Cluster::*> vol show

Cluster::*> set admin

As of ONTAP 8.3.x these commands work, but use ? (the CLI’s built-in help) in case they change in the future.
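
To confirm the change took effect, you can check the volume type before and after; a minimal sketch using the same example vserver and volume names as above (before the fix the Type column reports TMP, and afterwards it should report RW):

Cluster::> volume show -vserver vserverA -volume volA -fields type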

Again, this is not a new “thing”; it’s just that a lot of people have been asking about it lately, so I thought this might help.

Run at your own risk.

FlexGroups: An evolution of NAS

Another excellent write-up by JP…

Why Is The Internet Broken?

I’ve been the NFS TME at NetApp for 3 years now.

I also cover name services (LDAP, NIS, DNS, etc.) and occasionally answer the stray CIFS/SMB question. I look at NAS as a data utility, not unlike water or electricity in your home. You need it, you love it, but you don’t really think about it too much and it doesn’t really excite you.

However, once I heard that NetApp was creating a brand new distributed file system that could evolve how NAS works, I jumped at the opportunity to be a TME for it. So, now, I am the Technical Marketing Engineer for NFS, Name Services and FlexGroups (and sometimes CIFS/SMB). How’s that for a job title?

We covered FlexGroups in the NetApp Tech ONTAP Podcast the week of June 30, but I wanted to write up a blog post to expand upon the topic a little…

View original post 1,476 more words

Migrating to ONTAP – Ludicrous speed!

Cool stuff by JP…

Why Is The Internet Broken?

As many of those familiar with NetApp know, the era of clustered Data ONTAP (cDOT) is upon us. 7-Mode is going the way of the dodo, and we’re helping customers (both legacy and new) move to our scale-out storage solution.

There are a variety of ways people have been moving to cDOT.

(Also, stay tuned for more transition goodness coming very, very soon!)

What’s unstructured NAS data?

If you’re not familiar with the term, unstructured NAS data is, more or less, just NAS data. But it’s really messy NAS data.

It’s home directories, file shares, etc. It…

View original post 1,080 more words

VSAN 6.2 Disk Format Upgrade Fails

I’ve been doing quite a bit of VSAN deployments and upgrades lately. When upgrading to version 6.1, I luckily did not encounter any issues, and upgrading a VSAN cluster (vSphere) to 6.2 was also very smooth. However, upgrading the disk format from version 2 or 2.5 to version 3 has been a daunting task so far; I have had an 80% failure rate on this type of upgrade. Here are some of the errors I came across.

The first issue was related to inaccessible objects in VSAN.

Cannot upgrade the cluster. Object(s) xxxxx are inaccessible in Virtual SAN.

This is actually not a new issue. These inaccessible objects are stranded vswap files that need to be removed. To correct the issue, you will need to connect to your vCenter using the Ruby vSphere Console (RVC). The RVC command to run is: vsan.purge_inaccessible_vswp_objects
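
As a rough sketch, the workflow looks like this; the vCenter address, datacenter, and cluster names below are hypothetical, so substitute your own paths:

rvc administrator@vsphere.local@vcenter.example.com

cd /vcenter.example.com/Datacenter/computers/VSAN-Cluster

vsan.purge_inaccessible_vswp_objects .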

The second issue I ran into was related to failed object realignment. Error:

Failed to realign following Virtual SAN objects…. due to being locked or lack of vmdk descriptor file, which requires manual fix.

VMware has acknowledged the issue and created a Python script to correct it. The script and instructions can be found in KB2144881.

The script needs to be run from the ESXi host shell with the command below, after you have copied it to a datastore that the host has access to. The script name is VsanRealign.py; if you rename the file, use that name instead. NOTE: The script takes quite a while to run, so just let it go until it finishes.

python VsanRealign.py precheck
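
For reference, the end-to-end sequence might look something like this: copy the script from a workstation to a datastore, then run it from the ESXi shell. The host and datastore names here are examples only:

scp VsanRealign.py root@esxi01.example.com:/vmfs/volumes/datastore1/

cd /vmfs/volumes/datastore1

python VsanRealign.py precheck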

Here the script takes care of the descriptor file issue once you answer yes. In this case, the object was not a disk but a vswap file missing its descriptor, so it is removed permanently. If the vswap file is actually associated with a VM, the VM will keep working normally (unless it is swapping, in which case you have bigger problems). The vswap file will be recreated once you reboot the VM.

OK, so time to move on. Ready to upgrade… Maybe not. I ran into another issue after running the same script with the precheck option. This time, the issue was related to disks stuck with CBT (Changed Block Tracking) objects. To fix this, simply run the same script with the fixcbt option instead of the precheck option.

python VsanRealign.py fixcbt



So at this point, everything looked healthy and ready to go. However, when I tried to do the disk format upgrade yet again, it gave me another error. This was the fourth error during the upgrade process; luckily, it was an easy fix and may not apply to all VSAN environments.

I ran into this with two small environments of three hosts each. The error stated that I could not upgrade the disk format because there were not enough resources to do so. That makes sense in hindsight: with only three hosts and the default FTT=1 policy, there is no spare host to evacuate a disk group’s data to while it is being reformatted.

A general system error occurred: Failed to evacuate data for disk uuid <XXXX> with error: Out of resources to complete the operation 

To be able to upgrade the disk format to V3, you will need to run the upgrade command from RVC using the option to allow reduced redundancy.

Log in to RVC and run the following command: vsan.ondisk_upgrade --allow-reduced-redundancy
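
For example, from the cluster’s context in RVC (the path below is hypothetical, matching the earlier sketch):

cd /vcenter.example.com/Datacenter/computers/VSAN-Cluster

vsan.ondisk_upgrade --allow-reduced-redundancy .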


The upgrade removes the VSAN disk group(s) from each host and re-adds them in the new format. DO NOT try to do this manually, as you will end up with format mismatches that VSAN cannot function properly with. Please follow the recommended procedures.

These steps allowed me to upgrade the VSAN disk format to V3. It did take quite a while (12 hours), but that was partly because I tested all these steps in my lab prior to doing it in production. Yes, the lab had some of the same issues.

After the upgrade was done, I checked the health of the VSAN cluster and noticed a new warning indicating the need for a rebalance. Manually running a rebalance job solved the issue.
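
If you would rather do this from the command line, I believe RVC can kick off the proactive rebalance as well; a sketch, reusing the hypothetical cluster path from above:

vsan.proactive_rebalance --start /vcenter.example.com/Datacenter/computers/VSAN-Cluster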

All good after that…


SIDE NOTE:

I did the proper troubleshooting to find out the root cause. The main issue was a firmware bug that caused the servers to stop recognizing the SD card that vSphere was installed on and eventually crash. The repeated crashes across all hosts caused all of these object issues within VSAN.

The bug was in the iLO firmware of the HP ProLiant DL380 Gen9 servers, which were running iLO version 2.20 at the time. The fix was to upgrade iLO to version 2.40 (December 2015), the latest version at the time.