Dealing With Problems Inherent To Expanded Environments
By Owen Cole, Technical Director, UK&I
Enterprises, faced with enormous and growing volumes of data washing over them, have a storage problem. IT departments, tasked with managing this data cost effectively, simply add more storage. What they often end up with is a storage infrastructure that does not distinguish between different types of data or applications: a costly set-up in which everything is treated to the highest, and most expensive, standard.
All data is not created equal.
Or at least, it shouldn’t be treated as such. When environments grow, more servers are added, more engineers, more network components…and ultimately you spend money and time backing up information you don’t need to back up. If an organization has 10TB of storage with a cost of $200,000 and experiences 100% growth, their storage-related purchases will soar from $200,000 in the first year to $400,000 in the second, $800,000 in the third, and so on.
This is untenable, and to address this wasteful spending, data needs to be looked at in a different way: not as some amorphous, indistinguishable blob, but from the standpoint that different values can be assigned to data depending on its characteristics. In short: a tiered approach.
Most stored data is not critical, and therefore does not warrant expensive tier 1 storage. Older data is generally less relevant and changes less often. In most companies, over two thirds of files have not been altered in the last three months. Personal music, photo or video files and e-mail archives take up large amounts of space but are far from critical to company profitability.
If not all data has equal relevance to the business, then not all data need reside on the same type of storage. Given the wildly varying performance, availability and cost points associated with different types of storage, the logical conclusion is that there are tremendous efficiencies and cost savings open to IT teams. If the organization described above purchased a different class of storage at half the cost, and was able to move 70% of their data to this second tier, their storage-related purchases would drop from $200,000 to $130,000 in the first year, and from $400,000 to only $260,000 in the second. Adding more tiers would result in further savings.
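The arithmetic is easy to check. Below is a minimal Python sketch of the example above; the cost per terabyte, growth pattern and 70/30 tier split are the illustrative assumptions already stated, not figures from any particular product:

```python
# Illustrative only: yearly storage purchases with and without a cheaper second tier,
# using the assumptions from the example above (10TB purchased in year 1 at $200,000,
# each year's new purchase doubling, tier 2 at half the cost, 70% of data on tier 2).

TIER1_COST_PER_TB = 20_000   # $200,000 for 10TB
TIER2_COST_PER_TB = 10_000   # half the cost of tier 1
TIER2_SHARE = 0.70           # portion of data that can live on tier 2

def yearly_purchases(years, new_tb=10):
    for year in range(1, years + 1):
        single_tier = new_tb * TIER1_COST_PER_TB
        two_tiers = (new_tb * (1 - TIER2_SHARE) * TIER1_COST_PER_TB
                     + new_tb * TIER2_SHARE * TIER2_COST_PER_TB)
        print(f"Year {year}: single tier ${single_tier:,.0f}, two tiers ${two_tiers:,.0f}")
        new_tb *= 2   # next year's new capacity to purchase

yearly_purchases(3)
# Year 1: single tier $200,000, two tiers $130,000
# Year 2: single tier $400,000, two tiers $260,000
# Year 3: single tier $800,000, two tiers $520,000
```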
Decoupling the data
The challenge in being able to realize these cost savings lies in the ability to:
1. identify different types of data,
2. place them on appropriate tiers of storage,
3. manage this relationship over time, and
4. do all this without
a. impacting client access to the data or
b. increasing management costs.
The technology that has these abilities is called file virtualisation. Simply put, it decouples client access to files from the physical location of the files. Clients are no longer required to possess explicit knowledge of the physical location of the data. In fact, clients are not aware that data is moving between different storage tiers and their access to data is never disrupted.
File virtualisation solutions classify data based on flexible criteria such as age or type, place the data on the correct storage tier and automate the movement of data throughout its lifecycle based on policies. For example, a company might create a policy that says:
- All new files are placed on tier 1.
- Files that have not changed in 90 days are moved to tier 2.
- Files that have not changed in 1 year are moved to tier 3.
- If any file that has been moved to tier 2 or 3 changes, return it to tier 1.
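File virtualisation products express such policies through their own management interfaces, so the sketch below is only a neutral illustration of how the example rules could be evaluated from a file's last-modified time. The thresholds mirror the bullet points above; the tier numbers and the /data path are placeholders, not any vendor's configuration syntax.

```python
import os
import time

DAY = 86_400  # seconds

def target_tier(path, now=None):
    """Return the tier (1, 2 or 3) the example policy would assign to a file."""
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / DAY

    if age_days >= 365:   # not changed in a year or more
        return 3
    if age_days >= 90:    # not changed in 90 days or more
        return 2
    # New or recently changed files belong on tier 1; a change to a file on
    # tier 2 or 3 resets its modification time, so it returns here naturally.
    return 1

# Example: report where each file under the placeholder /data directory would go.
if __name__ == "__main__":
    for name in os.listdir("/data"):
        full = os.path.join("/data", name)
        if os.path.isfile(full):
            print(f"{name}: tier {target_tier(full)}")
```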
Along with rising costs, burgeoning amounts of data increase backup pain. As the amount of data escalates, so too does the length of time it takes to complete a backup. When backups risk exceeding their allocated backup windows, IT departments are faced with an unwelcome choice: either fail to meet service level agreements to the business or, worse yet, fail to adequately protect the data.
But the fact is much of the data being backed up is the same data that has been captured in previous backups. This is true for almost every company. Since only a small fraction of data is actually changing, the vast majority does not need to be continually backed up. Non-essential content like personal music libraries or e-mail archives probably doesn’t need to be backed up either. The bottom line is that many companies are spending time and money backing up information they do not need to.
Because file virtualisation can track data that is not changing and manage it accordingly, the amount of redundant data that is backed up on a regular basis is dramatically reduced, thereby shortening backup times. Recently, a large media company used a file virtualisation solution to move data that had not changed in a month out of the primary backup data set, and saw their backup times drop from 36 hours to less than one hour.
Reducing the amount of data in the primary backup data set also cuts the costs associated with the backup infrastructure (which usually requires far more capacity than the primary data). Less data to backup means less tape, fewer tape libraries, less virtual tape, lower licensing costs, and reduced fees associated with offsite storage.
File virtualisation, then, enables an automated tiering solution that is seamless to clients and delivers dramatic cost and efficiency benefits. When evaluating the solutions on the market, elements to consider include:
- The ideal solution will work with the storage environment you have today as well as providing flexible options for the future. The solution should not lock you into a specific storage technology or force you to change the infrastructure you already have.
- Look for a solution that will not only meet your current scale needs but also accommodate your future growth. Solutions that require links or stub files can be difficult to remove, and often come with performance penalties and scale limitations.
- A solution that manages information at the file level rather than the directory level will give you greater flexibility and provide the greatest business value.
- The most effective tiering solutions will manage data in real time. Placing a file on the right storage device when it is created is more effective than having to search for it and move it after the fact.
Source: StoragePR