Feed aggregator
HDS at VMworld 2010
In my series on data center transformation I started with server transformation and the closer integration of server and storage virtualization through the use of VAAI, the vStorage APIs for Array Integration.
These APIs were introduced at VMworld in 2008 when VMware announced their vStorage initiatives.
When VMware released these APIs on July 13, 2010, Hitachi jointly released support for these APIs on our AMS 2000 storage arrays. A lot of effort went into this integration as it is a massive technology enhancement for the transformation of the data center. The testing that we have done with Hitachi Dynamic Provisioning volumes on an AMS 2300 with VAAI has shown the following results:
- Full copy - 18% performance improvement (speed to copy VMs)
- Write same - 85% performance improvement (speed to clone VMs)
- Hardware Assisted Locking - 25% to 35% performance improvement including the removal of SCSI reserves (powering on 1400 VMs on 4 servers simultaneously)
See what VMware CTO Steve Herrod says about these enhancements in his executive blog.
You can see demonstrations of these enhancements and more at VMworld 2010, which is being held this week in San Francisco. Hitachi Data Systems will be highlighting the integration of our AMS, USP V/VM, and HNAS line of products with VMware through a number of demonstrations and speaker sessions.
Hitachi has been a VMware partner since 2002 and a Premier Technology Partner (TAP) since that program was started in 2006. Besides being one of the first to implement support for VAAI, we were the first to certify a storage virtualization solution with VMware server virtualization. We are also a large user of VMware, with over 1,000 virtual machines on our Hitachi Blade Symphony with 32 blade servers. We use VMware Site Recovery Manager (SRM) to protect more than 6TB of data between our data centers in Indianapolis and Santa Clara.
What’s Being Highlighted at the Show?
Hitachi Data Systems will showcase a wide portfolio of our solutions and expertise at our booth #301:
- VAAI integration on a Blade Symphony and AMS 2300 which we announced in July of 2010
- Demonstration of Hitachi Dynamic Provisioning and Hitachi Tiered Storage Manager with Hitachi Universal Storage Platform V and VM, etc.
- The latest advancements in data protection for VMware environments enabled by vStorage APIs for Data Protection (VADP) with the Hitachi Data Protection Suite and Symantec NetBackup
- Integration of Hitachi NAS with VMware vSphere and vCenter with our storage replication adapter (SRA) for VMware SRM and our VSS support.
Speaker Sessions
Session: Using Storage to Efficiently Scale and Manage Enterprise VMware Workloads (Session ID SP9662)
Presenters: Michael Heffernan, global product manager for Server Virtualization, Hitachi Data Systems; Niel B. Harris, vice president of Hosting Services, Apollo Group
Day and Time: Monday, August 30, 3:00pm
Session: Next Generation VM Storage Solutions with VAAI (Session ID TA7121)
Presenters: Michael Heffernan, global product manager for Server Virtualization, Hitachi Data Systems; Lucas Nguyen, senior alliance storage technologist, VMware; Satyam Vaghani, senior staff engineer, VMware
Day and Time: Wednesday, September 1, 10:30am and Thursday, September 2, 3:00pm
Session: Optimizing Storage Performance with vSphere 4.1
Presenter: Michael Heffernan, global product manager for Server Virtualization, Hitachi Data Systems
Day and Time: Thursday, September 2, 12:45pm
(Exhibition Hall Solutions Exchange Theater)
Attend VMworld, visit our demos at Booth #301, and hear our speakers to see how VMware and Hitachi are working together to transform the data center.
Categories: HDS Blogs
Data Center Transformation, Part 7: Application Transparency
As I have stated in my previous posts on data center transformation, virtualization plays an important role in creating a dynamic pool of server and storage resources and masking the physical infrastructure from the application, so that the provisioning, movement, and refresh of the infrastructure can be done without disruption to the business. While we are masking the physical resources, however, we still need to provide the applications with transparency into the virtual infrastructure to ensure that their service level objectives are being met. Transparency requires openness, communication and accountability.
Hitachi’s Solution to Application Transparency
The Hitachi Storage Command suite provides an integrated set of management modules that culminates in a Hitachi Command Portal, which provides an application with a dashboard view of the storage infrastructure that sits behind the virtualization. This enables the application to communicate with the infrastructure managers, and provides accountability of the service level objectives to the application.
The Hitachi Storage Command Suite starts from the bottom up with a Device Manager, which provides a deep dive view into the Hitachi Storage products. There is also an optional Global Link Manager that provides monitoring and control of all the alternate paths to all the systems under Hitachi Storage Command management. This enables the management of alternate paths for multiple storage systems from one console.
Across the top of these two modules is a resource monitoring tool called Tuning Manager that works with the Device Manager and Link Manager to monitor the activity of the system and provide reports and alerts for any out-of-boundary conditions.
Tuning Manager can provide alerts and information to a tiering and migration manager (Hitachi Tiered Storage Manager), to a Policy Based Tiered Storage Management service, or to a policy manager for automating application backups with storage system-based replication (Hitachi Replication Manager).
Hitachi Command Portal
Up to this point all these modules provide information and management for the storage infrastructure. Across the top of all these modules, Hitachi provides a Command Portal which provides a business or application view into this infrastructure.
This portal presents a dashboard for a business unit or application which shows them four basic panels of information. The first is the status of its Service Level Objective during a selected period, whether it is red, yellow or green, based upon specified thresholds. Second is the actual storage allocation in terms of LUNs, Subsystem id, physical disks, RAID type, and actual allocation. Third is a panel that shows the health of its storage subsystem array groups and port processors over a selected period, whether it is red, yellow, green. The last panel shows the storage allocation over a specified time period for planning purposes.
This dashboard is a portal that enables the user to dive into the infrastructure modules that support it and produce any number of reports. The presentation is through Adobe Flex, which enables movement of screens and drill down.
Summary
While application users may be relieved to be isolated from the infrastructure turmoil that accompanies any form of transformation, they still have the responsibility to ensure that they are meeting their business objectives. In order for the application user to fulfill his obligations to the business, he or she must have transparency into the infrastructure, even though it is virtualized or moved behind a cloud. Management tools like the Hitachi Storage Command Suite will provide that transparency for openness, communication, and accountability.
Categories: HDS Blogs
Data Center Transformation Part 6: One platform all Data
There is a growing need for specialized storage servers to provide functions like Network Attached Storage (NAS) over Internet protocols, Content Archives, enterprise document management, Virtual Tape Libraries (VTL), deduplication, low cost modular storage, high availability enterprise storage, etc. While storage servers provide benefits for the management and preservation of certain types of data, they can create storage/server sprawl and increase the fragmentation of data center resources if these services are delivered as standalone storage and server bundles. This tight bundling of storage, server, and server application software limits the scalability of the service and the ability to migrate or refresh the technology without a major disruption to the service it performs. A data center may have a number of these storage servers, each with different management, different protection, and different search tools from different vendors. While some analysts will say that a “one vendor approach” is easier to manage, this is not the case if they need 5 or 6 different products to meet the data center’s different storage requirements. This piecemeal approach is not viable, as the costs of managing and maintaining these disparate products increase with the explosion of data. Hitachi’s answer to this dilemma is an approach that we call “One Platform for all Data.”
One Platform for all Data
This approach separates the storage services from the storage so that these services can be provided through gateways to a common enterprise storage platform. This storage platform can also attach modular storage on the back-end to satisfy the requirement for low cost modular storage without sacrificing the availability and scalability of enterprise storage. The common enterprise platform is the USP V/VM. While we sell our own modular storage on the back-end and our own gateway servers on the front-end, we can support nearly anyone’s Fibre Channel (FC) storage and anyone’s gateway servers that attach through FC. We sell our own high performance Hitachi NAS (HNAS) server powered by BlueArc, our content platform, Hitachi Content Platform (HCP), a backup and deduplication platform in Hitachi Data Protection Suite (HDPS) which is a partnership with CommVault, and a VTL with FalconStor. We also have customers who have installed gateways from NetApp, Data Domain, Diligent, and others.
Advantages of One Platform for all Data
The advantages of this approach are simplification and scalability. Simplicity comes from having one common management, data protection, and search across a consolidated pool of storage that is virtualized behind a USP V/VM. Simplicity comes from using standard protocols instead of proprietary APIs. Scalability comes from the multiple processors that are tightly coupled through the global cache of the USP V/VM. The Hitachi Content Platform enables content and archive servers to scale to PB by offloading the storage management, data ingestion, indexing and search functions to a common content storage platform.
“One platform all data” enables load balancing across different modalities of data. If you need 10TB of storage, you can probably find 10TB to provision to any of the servers out of the common pool of storage. If you use Dynamic Provisioning, you probably do not need the full 10TB upfront, but can thin provision it with a couple TB to start. In the case of the silo approach, you have to decide where you need the storage. There may be 50TB spread around the different server/storage bundles, but it is not in the right place, so you have to buy an additional 10TB. In most cases, that server/storage bundle will not have thin provisioning, so you would have to allocate the full 10TB even though only 2 or 3TB is actually going to be used initially. All that waste can be eliminated with a “one platform all data” approach.
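The comparison above can be sketched as a small calculation. This is only an illustration: the split of the 50TB of free capacity across three bundles and the bundle names are hypothetical, and `purchase_needed_tb` is a helper made up for this example.

```python
def purchase_needed_tb(request_tb: int, free_tb: int) -> int:
    """Capacity that must be bought when the reachable free space is too small."""
    return max(0, request_tb - free_tb)


# Hypothetical split of the 50TB of free capacity across siloed bundles.
# The bundle that needs the storage ("nas") happens to have none left.
silo_free_tb = {"nas": 0, "archive": 20, "vtl": 30}
request_tb = 10

# Silo approach: only the free space inside the target bundle counts.
silo_buy_tb = purchase_needed_tb(request_tb, silo_free_tb["nas"])

# Common pool: all free space is reachable from any server.
pool_buy_tb = purchase_needed_tb(request_tb, sum(silo_free_tb.values()))

print(f"silo approach must buy {silo_buy_tb} TB")
print(f"common pool must buy {pool_buy_tb} TB")
```

With a common pool, the same request is satisfied from capacity that already exists, and thin provisioning would further reduce how much of it has to be backed by physical disk on day one.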
Categories: HDS Blogs
Data Center Transformation Part 5: Leveraging Dynamic Provisioning with storage virtualization
After my last post on Data Center Transformation Part 4: Dynamic Provisioning, where I talked about the benefits of Hitachi Dynamic Provisioning, Lucas Mearian from Computerworld wrote an article titled “A waste of Space: Bulk of drive capacity still underutilized – Most companies can reclaim as much as 60% of their storage Capacity with Monitoring and thin provisioning tool.”
In this article, he observed that the low utilization of storage capacity is still a rampant problem even though tools are available today to monitor usage and thin provision storage. While a recent survey from TheInfoPro showed as many as 50% of Fortune 1000 companies were using thin provisioning or were planning to do so, other analysts like Forrester’s Andrew Reichman contend that companies are not using these tools and that utilization of storage is still between 20 and 40 percent. While he said that thin provisioning was not being used to recover unused capacity, he did observe that it was being used for striping performance, which is a result of writing data in chunks across many disk drives.
According to this article, thin provisioning can reclaim up to 60 percent of storage capacity. So why aren’t companies jumping at this, especially in the current economic environment? The article goes on to speculate on some of the reasons for this lack of adoption of thin provisioning. Hitachi Data Systems is having a great deal of success with our Dynamic Provisioning, which includes the ability to thin provision storage. In this post I will look at some of the inhibitors to the adoption of thin provisioning and how the Hitachi USP V/VM can leverage Dynamic Provisioning with storage virtualization to solve these problems.
It takes time to acquire more storage
Data Centers need to keep their businesses running and no one wants to be caught short because of a lack of disk capacity. Since these disk frames are getting larger with more applications per frame, they need a longer lead time to add or upgrade storage. A lead time buffer is required to satisfy their growth requirements while they go through the next acquisition and device migration cycle which may take 6 to 9 months. Since storage capacity is cheap and getting cheaper it seems easier to just buy more capacity today than wait until you need it. Unfortunately, all this extra unused capacity means higher total cost since the operational costs of that capacity will far exceed the acquisition cost.
Hitachi recognizes the need to have a lead time buffer. However, unlike other thin provisioning solutions, the Hitachi USP V/VM can leverage Dynamic Provisioning with storage virtualization to minimize this lead time. The USP V/VM can minimize that buffer by pooling different silos of storage capacity into a virtualized pool of capacity. If you had 3 separate 100TB frames at 20% utilization, each frame would have 80TB of buffer capacity which cannot be shared with other frames. If we virtualize the three frames behind a USP V, we could consolidate the excess capacity into a shared pool of 240TB. Then, perhaps, we could reduce the total buffer pool to 80TB to be shared across the three frames. If you need more capacity and you need it quickly, you can plug in another bank of drives or roll in another frame and attach it behind the USP V, where it can join the Dynamic Provisioning pool without disruption to the application. With storage virtualization you can add storage quickly when you need it and take advantage of the price erosion of storage capacity.
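The arithmetic in this example is easy to check with a few lines of code. This is only a sketch of the numbers quoted above; the function and variable names are made up for illustration, and the 160TB figure is simply the difference between the pooled excess and the reduced shared buffer.

```python
def free_capacity_tb(frame_tb: int, used_pct: int) -> int:
    """Unused capacity of one frame, using integer math to avoid rounding."""
    return frame_tb * (100 - used_pct) // 100


frames_tb = [100, 100, 100]   # three separate 100TB frames
used_pct = 20                 # each at 20% utilization

# Siloed: each frame carries its own buffer that the others cannot use.
siloed_free_tb = [free_capacity_tb(f, used_pct) for f in frames_tb]

# Virtualized behind one controller: the excess becomes one shared pool.
pooled_free_tb = sum(siloed_free_tb)

# If a shared buffer of 80TB is enough, the rest no longer has to sit idle.
shared_buffer_tb = 80
released_tb = pooled_free_tb - shared_buffer_tb

print(siloed_free_tb, pooled_free_tb, released_tb)
```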
The risks of over commitment
While this technology is called thin provisioning, meaning that less physical capacity is provisioned than the allocation calls for, it can also be used to over provision or over commit your usable capacity, which could cause problems under peak loading conditions. With over commitment, we promise some number of users that we will support their full allocation requests. Let’s say there are 10 users who asked for 5TB apiece. So we commit a total of 50TB when we only have 30TB of physical capacity, since we assume that everyone is overstating their allocation requirements. Thin provisioning software does have thresholds to warn about and/or limit new provisioning requests when the thresholds are reached. But there is the danger that a spike in demand will hit these limits. Again, Hitachi’s combination of storage virtualization and dynamic provisioning can quickly add additional resources to the dynamic provisioning pool. Virtualization can also non-disruptively migrate lower priority volumes out of the pool temporarily if a higher priority volume needs more capacity.
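The over commitment example can be sketched in code. This is a toy model, not how any particular product implements its thresholds: the `ThinPool` class, its method names, and the 70% warning and 90% limit values are all assumptions chosen for illustration.

```python
class ThinPool:
    """Toy thin provisioning pool with hypothetical warn and limit thresholds."""

    def __init__(self, physical_tb: int, warn_pct: int = 70, limit_pct: int = 90):
        self.physical_tb = physical_tb
        self.warn_pct = warn_pct      # assumed value, not from any product
        self.limit_pct = limit_pct    # assumed value, not from any product
        self.committed_tb = 0         # capacity promised to users
        self.used_tb = 0              # capacity actually written

    def used_pct(self) -> float:
        return 100 * self.used_tb / self.physical_tb

    def provision(self, size_tb: int) -> None:
        """Promise capacity; refuse new promises once the limit is reached."""
        if self.used_pct() >= self.limit_pct:
            raise RuntimeError("limit threshold reached, provisioning refused")
        self.committed_tb += size_tb

    def write(self, size_tb: int) -> str:
        """Consume physical capacity and report whether to warn."""
        if self.used_tb + size_tb > self.physical_tb:
            raise RuntimeError("pool exhausted")
        self.used_tb += size_tb
        return "warn" if self.used_pct() >= self.warn_pct else "ok"


pool = ThinPool(physical_tb=30)
for _ in range(10):          # 10 users ask for 5TB apiece
    pool.provision(5)

# 50TB is now committed against 30TB of physical capacity.
print(pool.committed_tb, pool.physical_tb)
```

In this model a demand spike first produces warnings and then blocks new provisioning, which is the point at which adding capacity to the pool quickly, or migrating lower priority volumes out of it, becomes important.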
Some of our Dynamic Provisioning customers choose not to overcommit the physical storage in the pool because of the risk of running out of storage during peak periods. Also, some file systems and databases are not thin friendly. They may write formatting information across the initial allocation, which negates the potential for thin provisioning. Some file systems start out thin friendly but get bloated very quickly as files are deleted and created. The storage system has no way of knowing if the file system has freed up some space unless the file system notifies the storage through an API or special SCSI command.
If a user asks for an allocation of 10TB, 10TB of physical capacity is reserved in the pool even if only a fraction of the capacity is actually used. While there are no savings on capacity, these customers use Dynamic Provisioning for its many other benefits. The first benefit is the ability to dynamically provision new storage in minutes. When Oracle ASM wants to expand a database, Dynamic Provisioning can provision that in minutes. There is no need to create RAID groups, carve out LUNs, format, concatenate, etc., which could take hours. The second major benefit is the performance improvement that comes from wide striping the data across the width of the pool. Just by going from 8 spindles to 32 spindles, tests have shown a 700% improvement in throughput for a random workload. As Andrew Reichman observed, the customers he contacted were using thin provisioning for striping performance and not for recovering unused capacity. The third benefit is that even though we reserve enough capacity for the total allocation, the provisioning is still done a page at a time, so any moves, copies, replications, and migrations only work on the provisioned pages, which can greatly reduce the time and operational costs for these activities. That being said, a number of our customers, like Qualcomm, are using the thin provisioning feature of Dynamic Provisioning.
While some thin provisioning vendors claim that you can drive utilization up to 80 or 90 percent, I personally do not recommend that if your application is critical and your business is dynamic. You need a contingency buffer for peak demand. I think 60 to 65% is a conservative objective.
The requirement to replace existing infrastructure
Gartner is referenced in this article as noting another reason for not embracing thin provisioning is the requirement to replace existing infrastructure which is difficult to justify in a recession. Adam Couture of Gartner is quoted as saying:
“if your array wasn’t built to take advantage of thin provisioning, there’s no way you can retrofit it.”
He is not aware of Hitachi Dynamic Provisioning and storage virtualization in the USP V/VM. Hitachi Dynamic Provisioning does not require replacing existing infrastructure as long as it can be virtualized behind a USP V/VM. Once it is virtualized behind a USP V/VM, the LUNs on external storage can be formatted into a dynamic provisioning pool, and from there it can have all the capability of Dynamic Provisioning. Once the storage is connected behind the USP V/VM, we can move the “fat” external volume into a dynamic provisioning pool creating a page at a time. When we are finished we check the pages for zero formats and return those pages back to the pool as empty pages, leaving the volume in a thin state. One of our large customers reclaimed 40% of existing capacity just by virtualizing behind a USP V and moving the volumes into a dynamic provisioning pool.
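The move-then-reclaim sequence described above can be sketched as follows. This is a simplified model under stated assumptions: the volume is a byte string, the page size is a tiny 4 bytes so the example stays readable, and the function names are hypothetical rather than anything from the USP V/VM.

```python
def migrate_and_reclaim(fat_volume: bytes, page_size: int):
    """Copy a fully allocated volume into a page map, then drop all-zero pages.

    Returns the surviving page map and the list of reclaimed page numbers.
    """
    # Step 1: move the "fat" volume into the pool, creating a page at a time.
    pages = {}
    for offset in range(0, len(fat_volume), page_size):
        pages[offset // page_size] = fat_volume[offset:offset + page_size]

    # Step 2: check each page for zeros and return empty pages to the pool.
    reclaimed = [idx for idx, data in pages.items() if not any(data)]
    for idx in reclaimed:
        del pages[idx]
    return pages, reclaimed


def read_page(pages: dict, idx: int, page_size: int) -> bytes:
    """Unmapped pages read back as zeros, so the host sees the same data."""
    return pages.get(idx, bytes(page_size))


PAGE_SIZE = 4  # bytes; deliberately tiny and hypothetical, for illustration only
volume = b"ABCD" + bytes(8) + b"EF\x00\x00"   # 4 pages, the middle two empty

thin_pages, freed = migrate_and_reclaim(volume, PAGE_SIZE)
print(sorted(thin_pages), freed)
```

The host-visible contents are unchanged after the reclaim, because a read of an unmapped page still returns zeros; only the physical pages backing the empty regions go back to the pool.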
Thin provisioning storage can not scale
Another inhibitor that was called out was the fact that many thin provisioning systems did not scale and required costly rip and replace. Some vendors required a large system to be installed before you could get thin provisioning, and other vendors started small but remained small, which required disruptive replacement as requirements grew. Hitachi can start with a USP VM, which is a modular USP. If there is existing capacity already available, the USP VM can be installed without any internal disks. Another approach is to start with an entry model of the USP V. A third option is to use the AMS 2000 for Dynamic Provisioning within the AMS 2000. It does not virtualize external storage but can scale up to 2048 host connections and 4096 LUNs in a modular architecture.
Summary
Unlike other thin provisioning systems that only do thin provisioning, Hitachi’s Dynamic Provisioning can be combined with the storage virtualization of the USP V/VM and leverage all the other capabilities of an enterprise controller, like migration, tiering, and distance replication. This eliminates the inhibitors that were mentioned above. Considering that Dynamic Provisioning is a relatively new product, its adoption rate is impressively high and climbing. The ability to reclaim unused capacity simply by attaching existing capacity to a USP V/VM and moving the volume into a dynamic provisioning pool can provide immediate payback on your investment. There are many immediate benefits to implementing Dynamic Provisioning and storage virtualization, and it does not require you to rip out your existing storage investments. Thin provisioning is only one of the benefits of Dynamic Provisioning. Not all files are thin provision friendly, but all files can benefit from being provisioned in a matter of minutes from a pool of preformatted pages and from automatic wide striping for high performance.
Categories: HDS Blogs
Monolithic versus modular storage is not an either/or question
Those of you who subscribe to Gartner reports may have seen their recent report:
“Choosing Between Monolithic Versus Modular Storage: Robustness, Scalability and Price Are the Tiebreakers”
While I agree with some of their definitions of monolithic and modular storage, it is no longer a question of one versus the other. With the Hitachi USP V/VM we combine the best of both worlds, by providing a “monolithic” or enterprise tier 1 front-end with lower cost modular back-end storage.
I agree with their description of monolithic storage as having many controllers that share direct access to a large, high performance, global cache, supporting a large number of host connections, including mainframes, and providing redundancy to ensure high availability and reliability.
I also agree with their definition of modular storage, which contains two variants, a dual controller architecture with separate cache memory and a scale out architecture that can have many nodes with separate caches in each node. I also agree that modular storage is easier to expand capacity by adding modules of storage trays, and that their acquisition costs are lower due to their simpler design (no global cache).
The differences between monolithic storage and modular storage
The key difference between monolithic storage and modular storage is the cache architecture. A dynamic global cache enables the tight coupling or pooling of all the storage resources in a monolithic storage system. As we add incremental resources like front-end port processors, cache modules, back-end array processors, disk modules, and program products like Hitachi Universal Replication, they are tightly coupled through the global cache so that they create a common pool of storage resources, which can be dynamically configured to scale up or to scale out to meet different host server requirements. Separate caches, in controllers or in nodes, create silos of storage resources. Host server volumes can only access the storage resources that are in the controller or node that it is attached to. The host server may access another volume in another controller or node, but it cannot have one volume extend across multiple controllers or nodes. Since this is not a common pool of storage resources, this leads to fragmentation and under-utilization of resources within the controllers or nodes. One node may be running at 90% utilization while other nodes are idling at 10% or 20%.
While most analysts like Gartner will acknowledge that dual controller systems, with limited amounts of cache and compute capacity, cannot match monolithic systems in performance and throughput, they make the assumption that “multinode scale-out architectures hold the promise of helping modular systems to asymptotically approach monolithic storage system levels of throughput.”
I disagree, since the throughput and performance that you get from a multinode scale-out architecture is limited to a distribution of the workload across multiple nodes. Unless the distribution is perfectly balanced across the nodes, you have the fragmentation that I mentioned earlier. Even if the cumulative total of cache and compute capacity is the same as what is in a monolithic storage system, it is not tightly coupled into a common pool of resources, and cannot match the performance and throughput of a monolithic system.
The Hitachi AMS 2000 family of modular storage is a dual controller storage system with separate caches. However, there is additional intelligence in the architecture that enables load balancing of LUN ownership between the two controller caches to ensure that one controller is not overworked while the other controller is idling. There are some single thread workloads where modular storage can outperform monolithic storage, but in multithreaded workloads the monolithic storage will have higher performance and throughput due to its larger cache, multiple compute processors, and load balancing across storage port processors.
So while there are important differences between monolithic and modular storage, the best way to use them is to use them in a tiered configuration. Since 60% to 80% of storage does not need tier 1 performance, it does not need to be on tier 1 storage. However, all your storage needs tier 1 protection and availability. You can achieve that by virtualizing modular storage as tier 2 or 3 storage behind a tier 1 monolithic storage front-end. The modular dual controller or multi-node scale out storage systems now sit behind a global cache, and become part of a pool of common resources that can be dynamically allocated based on business requirements. The advantages of modular storage around cost and ease of expansion are coupled with the advantages of monolithic (enterprise) tier 1 functionality and performance, with common management, protection, and search.
USP V/VM: the best of both worlds
One of the disadvantages cited for monolithic storage is the higher cost. That is only true in smaller configurations if all the storage capacity resides in the monolithic system. If most or even all of the storage capacity resides on external modular storage that is virtualized behind a USP V/VM, the cost of the combination will be even lower since all the storage is now efficiently managed as a common pool of storage resources, saving operational as well as capital costs. Since the USP V/VM does Dynamic Provisioning, it can save time and the costs for provisioning external modular storage, thin provision and reclaim unused capacity, and wide stripe the modular storage for higher performance. The data mobility provided by the USP V/VM will increase availability by non-disruptively moving the data off of the modular storage during scheduled down times or for technology refresh migrations, further reducing operational costs over stand alone modular storage.
Host servers are going through a massive consolidation with the availability of multi-core processors and virtual server platforms like VMware and Hyper-V. These virtual server platforms are driving 10 to 20 times the I/O workload of non-virtual servers, and virtual server clusters are driving as much as 100 times this load through one file system. This type of workload requires a monolithic storage system that can scale up through a tightly coupled, global cache on the front end while the majority of the storage capacity resides on lower cost modular storage that is virtualized behind it.
So I do agree with Gartner for the most part on the differences between monolithic and modular storage, but I do not think it has to be an either/or decision as to which storage you choose. I believe the best choice is a combination of modular storage that is virtualized behind monolithic storage, as we do with the USP V/VM. This way you can have the best of modular storage combined with the best of monolithic storage, at the lowest total cost.
Where do you fall on this issue?
Categories: HDS Blogs
Data Center Transformation Part 4: Dynamic Provisioning
This is the fourth part in my series on data center transformation. My last post was on storage transformation and the impact of storage virtualization on the data center. In this post I will address the impact of Dynamic Provisioning on the Data Center.
The history of storage provisioning
The provisioning of storage has been a major effort since the introduction of the random access disk drives in 1956. Prior to disk drives, data was stored on punched cards or magnetic tapes, and provisioning storage was a matter of throwing more cards in the reader or hanging another reel of tape. But disks were more difficult to provision since they were very expensive, and had limited capacity, so they were only used for applications that needed random access to data, and the cost was shared by allowing multiple users to consume parts of the capacity.
Originally, only mainframes could afford to use DASD (Direct Access Storage Devices), and IBM developed catalogs, naming conventions, and a Job Control Language that you used to specify your allocation of storage with a Data Definition statement. This specified a data set name, the volume serial number of the storage you wanted to use, primary and secondary units of capacity that you expected to use, and disposition of the extents after use, whether you wanted to keep the data set, release the capacity that you didn’t need, catalog it, or delete it. Later, this was simplified through the use of Systems Managed Storage which allowed the user to define data classes and storage groups and set policies for storage and data management. Then non-mainframe systems started to use external disk storage in the early 1990s, but without the disciplines that were developed by IBM for the mainframes.
At first it seemed simpler. You did not have to work with job control languages and worry about how much space you used since it was not shared in those days. But as open systems began to explode and Storage Area Networks (SANs) consolidated more servers to larger and larger storage systems, life became much more complicated, and storage administrators were required to provision storage on behalf of many applications that shared the same storage resources. Users estimated how much capacity they needed for an application, then doubled it to make sure they had enough, before they asked a storage administrator for an allocation of storage.
The storage administrator would look at his spreadsheets to see where capacity was available, format the disks, create a RAID group, carve out the LUNs, concatenate the LUNs for a specific volume size, and stripe the LUNs across multiple spindles, or limit the allocation to certain bands on a disk to minimize seek time between tracks. The storage administrator might even add some more buffer capacity to avoid getting a request for more capacity in the middle of the night on a weekend, or add replicas of the allocation for offline processing copies for backup, data mining, extract/translate/load, development test, etc. All these copies also replicated the over-allocation. Provisioning storage this way could take days or even weeks. If new storage had to be acquired, it could take months to requisition, acquire, and install the additional capacity.
The advantages of dynamic provisioning
Dynamic provisioning helps to solve these problems by creating a pool of preformatted 42 MB pages which can consist of hundreds of RAID groups or spindles. When a user asks for an allocation, the storage administrator can allocate a virtual volume in a matter of minutes. No storage is actually used until the user starts to write to the allocation. Storage is physically allocated a page at a time and the pages are striped across the width of the provisioning pool which automatically wide stripes the data and gets the maximum number of spindles working on each I/O request. By allocating storage by pages from a common pool of pages, we have the greatest flexibility and agility in provisioning storage for new application requirements. If one application happens to need more storage pages than it had originally requested we can borrow from the pool of pages that other applications may have requested but are not using. When copies are needed for backup or data mining, or data needs to be moved to another tier of storage, the copies and moves only use the pages that are actually used and not the whole allocation that the application had requested. This not only requires less storage but also reduces the operations time by doing “thin” copies and “thin” moves.
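The allocate-on-write behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how the array firmware works: the class names `ThinPool` and `VirtualVolume`, the round-robin placement, and the per-group page counts are all hypothetical. Only the 42 MB page size comes from the text.

```python
PAGE_SIZE_MB = 42  # page size quoted in the text


class ThinPool:
    """Toy pool of preformatted pages spread over RAID groups (hypothetical model)."""

    def __init__(self, raid_groups, pages_per_group):
        self.raid_groups = raid_groups
        self.free = {g: pages_per_group for g in range(raid_groups)}
        self.next_group = 0  # round-robin cursor, used here to imitate wide striping

    def allocate_page(self):
        # Try each RAID group once, starting at the cursor, and take the first free page.
        for _ in range(self.raid_groups):
            group = self.next_group
            self.next_group = (group + 1) % self.raid_groups
            if self.free[group] > 0:
                self.free[group] -= 1
                return group
        raise RuntimeError("pool exhausted")


class VirtualVolume:
    """Virtual volume that consumes a physical page only on the first write to it."""

    def __init__(self, pool, size_mb):
        self.pool = pool
        self.size_mb = size_mb
        self.page_map = {}  # virtual page index -> RAID group holding the page

    def write(self, offset_mb):
        if not 0 <= offset_mb < self.size_mb:
            raise ValueError("offset outside the volume")
        page = offset_mb // PAGE_SIZE_MB
        if page not in self.page_map:  # first touch: borrow a page from the pool
            self.page_map[page] = self.pool.allocate_page()
        return self.page_map[page]

    def used_mb(self):
        return len(self.page_map) * PAGE_SIZE_MB


pool = ThinPool(raid_groups=4, pages_per_group=10)
vol = VirtualVolume(pool, size_mb=PAGE_SIZE_MB * 100)
groups = [vol.write(off) for off in (0, 42, 84, 126)]
```

In this sketch the volume is created with a size of 100 pages, yet it occupies only the four pages that were actually written, and those four pages land on four different RAID groups.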
Thin provisioning is only one of the benefits of Dynamic Provisioning. Thin provisioning is not the Holy Grail that some vendors and analysts promote it to be. Not every file or storage application is thin provisioning friendly. Some write formatting information or metadata across the entire allocation, in which case thin provisioning from the storage side would not be effective. Some file systems may start out thin but become fat very quickly as they add and delete files, unless they tell the storage device where in the file share they deleted a file. Symantec does this with a SCSI “Write Same” command which describes the extent boundaries that were freed up for Dynamic Provisioning to recover.
Dynamic Provisioning with HDS USP V/VM
With storage virtualization and Dynamic Provisioning in the USP V/VM, any external storage that is virtualized can be included in the Dynamic Provisioning pool. Another feature of Dynamic Provisioning is the ability to do Zero Page Reclaim (ZPR). If external storage is virtualized behind the USP V/VM, we can move a “fat” volume from an external storage system into a Dynamic Provisioning pool of pages without disruption to the application. Once this is done we can check the pages to see if any contain only zeros and release those pages back to the pool. Some of our customers have recovered as much as 40% of their existing capacity simply by attaching to the USP V/VM and moving their fat volumes into a Dynamic Provisioning pool. I am not aware of any other thin provisioning solution that can convert fat volumes as easily as that.
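The idea behind Zero Page Reclaim can be illustrated with a short Python sketch. It assumes a volume represented as a dictionary of page index to page contents; the function name `zero_page_reclaim` and the tiny 8-byte pages are hypothetical and chosen only to keep the example readable. It says nothing about how the USP V/VM actually scans pages.

```python
def zero_page_reclaim(pages):
    """Split a volume's pages into those kept and those released to the pool.

    pages: dict mapping page index -> bytes (hypothetical in-memory stand-in
    for the pages of a volume that was migrated into the pool).
    Returns (kept_pages, reclaimed_count).
    """
    kept = {}
    reclaimed = 0
    for index, data in pages.items():
        if any(data):          # at least one non-zero byte: the page holds real data
            kept[index] = data
        else:                  # all zeros: the page can go back to the pool
            reclaimed += 1
    return kept, reclaimed


# A "fat" volume of three tiny pages, only one of which was ever written.
fat_volume = {
    0: bytes(8),
    1: b"\x00\x01" + bytes(6),
    2: bytes(8),
}
kept, reclaimed = zero_page_reclaim(fat_volume)
```

Here two of the three pages are returned to the pool and only page 1 stays mapped to the volume.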
Most storage vendors do not use pages for thin provisioning. They will use chunks and chunklets within the larger chunk as their unit of thin provisioning. Using smaller granularity chunks or pages has a cost. Nothing is free. The cost in this case is metadata that describes the pages or chunks and the processing of a mapping table to keep track of the configuration. In the USP V/VM, we keep this metadata in a separate control store so that it does not impact the performance of the data store. Storage systems that do not have a separate control store must process this in their data store or on external disk which creates a performance penalty. In order to reduce the impact of metadata, one method is to define a large chunk and then index into the chunklet for the unit of provisioning. Since our AMS 2000 does not have a separate control store like the USP V/VM, we also use that technique with the AMS 2000.
Another feature of Dynamic Provisioning is wide striping of pages for performance. As pages are created for a volume they are striped across the width of the Dynamic Provisioning pool, which may be hundreds of disk spindles. The performance gains that come from striping a volume across many spindles are not magic. Storage administrators have been doing that for years with software volume managers. The difference is that Dynamic Provisioning does this automatically. If an administrator decides to increase the stripe width for a volume, he must stop access to the volume, back it up, reformat and restripe the spindles, then reload the application. With Dynamic Provisioning, we can dynamically add disks to the pool and the data will be automatically restriped across the new pool width with no impact to the application. Although databases are not thin provisioning friendly, our customers find that wide striping performance is a major benefit. Users of databases like Oracle with ASM, which has the ability to expand databases, also find the ability to dynamically provision new capacity in minutes to be another major benefit.
Summary
The primary benefit of Dynamic Provisioning is agility and it is a perfect complement to the agility provided by virtual servers. When a sudden event occurs, triggered by other events across the globe, it may require a data center to spin up a dozen virtual servers in a matter of minutes. But servers are not very useful without storage. Dynamic provisioning can provide storage resources in a matter of minutes. When we combine this with Storage virtualization in the USP V/VM, the benefits of Dynamic Provisioning can be extended to external storage from other vendors and use existing assets to transform the data center. These benefits include:
- Dynamic Provisioning of Storage in a matter of minutes
- Thin Provisioning to eliminate the waste of over allocation
- No performance impact with USP V/VM Architecture
- Thin Copies and Thin Moves to reduce the operational costs of copies and moves
- Zero page reclaim to reclaim wasted allocation from existing storage
- Automatic wide striping for increased performance and automatic tuning
- Combined with storage virtualization, it can enhance existing storage assets with all of the above benefits
Categories: HDS Blogs
Data Center Transformation Part 3: Storage Transformation
This is the third part in my series on data center transformations. My last post was on server transformation and the impact of virtual servers on the data center. In this post I will address the impact of storage transformation on the data center.
Data is at the core of the Data center
Data is at the core of the data center, and any effort to transform the data center must involve the movement, provisioning, access, and protection of data which is provided by storage systems. Unlike processing power and network bandwidth which can be consumed like electricity or water, storage capacity is stateful and any transformation of the data center will require the transformation of that storage capacity with minimum disruption to the applications and the rest of the infrastructure. This transformation becomes increasingly difficult as more data is generated and more applications are dependent on access to that data storage. Also many applications are intertwined through the data and coordinated through scripting or large consistency groups which further complicates things. Storage could be the biggest inhibitor to data center transformation, unless it is addressed through the type of storage virtualization that can meet the requirements for transformation. I had the pleasure of discussing this last week with TMCnet’s Rich Tehrani.
Requirements for Storage Virtualization
The first requirement is that it should be easy to implement. It should be as easy to implement as virtual servers in a hypervisor. Storage virtualization solutions that require an extra layer of management to remap physical volumes to virtual volumes, maintain a mapping table which is a single point of failure and a vendor lock-in, violate security by cracking or proxying data packets, and require three layers of zoning in the heart of an already complex Storage Area Network will add more complexity than they are worth. An enterprise storage controller can simply connect external storage through standard FC interfaces, discover the LUNs or volumes that already exist on the external storage, and present them through its global cache to the host server as though they were native storage with all the existing enterprise functionality and performance of the storage controller. Virtualized storage should be as simple to de-virtualize as it is to virtualize, so there is no vendor lock-in. By not remapping the LUN and by writing the data back to the external LUN, the state of the data remains with the external storage and you can de-virtualize simply by disconnecting it and connecting it to a host server where the LUN can be discovered and mounted.
The second requirement is that the storage virtualization controller should be able to enhance whatever storage it virtualizes with tier 1 functionality, like tier 1 storage, global cache, replication, load balancing, dynamic tiered storage, dynamic provisioning, etc. An appliance with limited connectivity, limited cache, limited processing power, and no tier 1 storage, cannot enhance the storage it virtualizes, except to do some limited copies and moves between external storage systems.
A third requirement is scalability. The ability to scale up dynamically to meet the consolidation demands of increasingly powerful virtual server clusters. The ability to scale out dynamically across a pool of shared virtual storage resources instead of a cluster of standalone storage silos. And the ability to extend that scale up and scale out capability to externally attached storage.
A fourth requirement is security, in terms of partitioning for safe multi-tenancy, separate address spaces for virtual ports that share the same physical ports, separation of control data from user data for remote maintenance, FC CHAP for end-to-end authentication, and encryption for data at rest. These are features that have to be architected into the product and not added as an external afterthought.
A fifth requirement is transparency, the ability for applications to see into the virtual infrastructure and be able to see the health of the underlying physical components that support their virtual storage, monitor their service level objectives, and be able to track their usage trends. This requires an integrated set of software tools that can gather the data from the infrastructure, correlate it with an application or server, and present it through an easy-to-understand dashboard with drill-downs and report generation. Hitachi Data Systems provides this through the Hitachi Storage Command Portal.
In previous posts, I had listed a requirement for the management of storage virtualization to be independent of application, server and network management. What I meant is that storage virtualization needs to be done where the information for storage is available, and that is in the storage controller, which is the target to the host initiators (no need to proxy or crack FC packets), and where the information about cache slots and track tables for the data storage resides. I am modifying that somewhat because of the need for storage, servers, applications, and networks to work together. For instance, storage systems need to furnish providers to Microsoft VSS to provide synch points for snapshot copies, SRM adapters for VMware site recovery, reclamation providers for the Symantec write same command, etc. So while storage virtualization is best done in the storage controller, independent of other elements in the infrastructure, storage virtualization should have application and OS awareness to support interfaces which enable better coordination between the storage and the rest of the infrastructure.
How can Storage Virtualization help transform the Data Center
Storage virtualization separates the application and server view of data from the physical storage infrastructure so that we can change and transform the physical storage infrastructure without disruption to the application. The first thing it can do is transform your legacy storage infrastructure without the need to rip and replace. Once your FC storage is attached to the USP V or VM storage virtualization controllers and your applications are redirected to the virtual ports on the USP V/VM, your applications will be able to access their existing volumes through the high performance global cache of this virtualization engine and be able to leverage new capabilities like high speed distance replication for business continuity, dynamic tiering for life cycle management, wide striping and the latest high speed media for increased random performance, thin provisioning to recover the waste of over-allocation, VSS providers for synchronized snapshots, and SRM adapters for VMware site recovery or disaster recovery testing.
Most virtualized storage systems will see an increase in performance just by sitting behind the large global cache of the USP V/VM, but if you need more performance you can wide stripe your volumes, move them onto the tier one storage in the USP V/VM, or do both. If changing configurations on your existing storage is disruptive due to static .BIN file changes as on EMC Symmetrix storage, set the configuration once and let the USP V/VM dynamically manage the configuration changes from then on. If you still have several years of useful life in your existing storage, but the warranty is about to expire, use that storage as tier 3 where the expense of tier 1 maintenance is not required, and convert to time and materials. If it makes more sense to replace the older storage with greener, more cost effective storage, you can do the migration without stopping the application.
If you are converting servers or applications as part of this transformation, you can create non-disruptive clones of the data for conversion, extract/translate/load, development test on lower cost tiers of storage, or dynamically spin up new allocations of virtual storage to support virtual servers. Not only can storage virtualization protect your applications and servers from changes in the physical storage infrastructure, but it can enable your applications and servers to change and grow dynamically.
To conclude, the Hitachi Data Systems approach to data center transformation with storage virtualization allows customers to consolidate resources, technologies and applications, and reduce the complexity of years of ‘bolting on’. If the requirements discussed above are met, by design, you are building in future flexibility. This also greatly helps the bottom line.
Categories: HDS Blogs
Why FCoE will die a silent death
I've said it before, storage is not simple. There are numerous things you have to take into account when designing and managing a storage network. The collaboration between applications, IO stacks and storage networks has to be very stable in order to get something useful out of it, both in stability and in performance. If something goes wrong it's not just annoying but it might be disastrous for companies and people.
Now I've been involved in numerous positions in the storage business, from storage administrator to SAN architect and from pre-sales to customer support, and I know what administrators/users need to know in order to get things working and keep them that way. The complexity that comes to the administrators is increasing every year, as does the workload. A decade ago I used to manage just a little over a terabyte of data and that was pretty impressive in those days. Today some admins have to manage a petabyte of data (yes, a thousandfold more). Now going from a 32GB diskdrive to a 1TB diskdrive might look like their life just simplified but nothing is further from the truth. The impact it has when something goes wrong is immense. The complexity of applications, host/storage based virtualisation and so on have all added to the skills required to operate these environments.
So what does this have to do with FCoE? Think of it like this: you have two very complex environments (TCP/IP networking and FibreChannel storage) which by definition have no clue what the other is about. Now try to merge these two together to be able to transport packets through the same cable. How do we do that? We rip away the lower level of the ISO and FC layers, replace that with a new 10GbE CEE interface, create a new wrapper with new frame headers, addressing and protocol definitions on those layers and away we go.
Now this might look very simple but believe me, this was the same with fibre channel 10 years ago. Look how the protocol evolved. Not only in speeds and feeds but also tremendously in functionality. Examples are VSANs, Virtual Fabrics and FibreChannel Routing, to name a few. Next to that the density of the FC fabrics has increased, as has the functionality on storage arrays. I already wrote in a previous article that networking people in general are not interested in application behaviour. They don't care about IO profiles, response times and some packet loss since TCP/IP will solve that anyway. They just transport packets through a pipe and if the pipe isn't big enough they replace it with a bigger pipe or re-route some of the flow to another pipe. That is what they have done for years and they are extremely good at it. Storage people on the other hand need to know exactly what is hitting their arrays and disks. They have a much more vertical approach because each application has a different behaviour on storage. If you mix a large sequential load with a very random one hitting the same array ports and spindles you know you are in a bad position.
So here is where politics will collide. Who will manage the FCoE network? Will it be the networking people? (Hey, it's Ethernet right? So it belongs to us!). Normally I have no problem with that but they have to prove that they know how FibreChannel behaves, and what a FICON SBC code set looks like as well as an FCP SCSI CDB. (I see some question marks coming already)
Now FCoE doesn't work on your day-to-day ethernet or fibrechannel switch. You have to have specialized equipment like CEE and FCF switches to get things going. Most of them are not backwards compatible so they act more as a bridging device between a CEE and an FC network. This in turn adds significantly to the cost you were trying to save by knocking off a couple of HBAs and network cards.
FCoE looks great but the added complexity, in addition to an entire mindshift of networking and storage management plus the need for extremely well trained personnel, will make this technology sit in a closet for at least 5 years. There it will mature over time so true storage and networking convergence might be possible as a real business value add. At the time of this writing the standard is just a year old and will need some fixing up.
Businesses are looking for ways to save cost, reduce risk and simplify environments. FCoE currently delivers none of these.
Categories: Admins Blog
Data Center Transformation Part 2: Server Transformation
This is the second post in my series on data center transformation. In my first post, I offered up several warning signs that indicate why it is time to take action and transform your data center to be agile, sustainable, and business-oriented. We believe achieving a meaningful level of responsiveness to changing business and economic conditions requires a fundamental transformation of the data center that starts by designing flexibility and efficiency into the architecture.
In this post, I want to examine one of the major transforming trends in the data center — the movement to virtual servers, which is enhanced with the power of multi-core server technologies.
Virtual servers have solved the problem of server proliferation, which saw power hungry servers configured for peak demand, but idling at 10% utilization for most of the day. They also improved business agility by enabling data centers to spin up new servers on demand, load balance workload, and do site recovery across a pool of server resources. Today, there are more virtual servers deployed than there are physical servers.
Consolidating application servers used to be a difficult task since applications had hooks into the underlying operating systems and these hooks had to be undone in order to consolidate the applications. VMware solved that by taking the application as well as its operating system and stacking them up into a single host server. The operating system and the application storage were virtualized on to a VMDK or virtual machine disk, which is a file within a VMFS or virtual machine file system. This is a clustered file system that enables the attachment of virtual server clusters and supports vMotion or the movement of Virtual Machines across physical servers in the cluster. These virtual server clusters can drive hundreds of virtual machines through the VMFS. That means the storage arrays that service the VMFS must be able to scale to hundreds of times the workload that they normally would see when attached to stand alone servers.
Why storage arrays are needed to scale up with virtual servers
Hitachi Data Systems provides storage systems that can scale up to meet the increasing I/O demands of virtual servers. These storage systems include the USP V/VM with its global cache; the AMS 2000 with active/active controller; and High Performance NAS (HNAS) with its hardware based NAS engine, which front ends an AMS 2000 or USP V/VM.
While virtual servers are a major step forward in the transformation of the data center, they cannot do everything themselves. Some of the work, especially for I/O, needs to be offloaded to the storage arrays in order for virtual servers to scale beyond their current limitations and increase their ROI. VMware is aware of this and has provided APIs for array integration (VAAI). On Tuesday, we announced the first of our storage array support for vSphere 4.1 with our AMS 2000, where we use these APIs to support:
- Hardware-assisted Locking: Enables more efficient locking at the sector level rather than the LUN level between ESX hosts which share a VMFS volume
- Full Copy: Enables the storage arrays to make full copies of data within the array without the ESX Server reading and writing the data
- Block Zeroing: Enables storage arrays to zero out a large number of blocks to enhance the deployment of large-scale VMs.
Why Servers and storage must cooperate in the transformation of the Data Center
Hardware assisted locking is a good example of how the sharing of workload between server and storage can facilitate the transformation of the data center. Without this feature, virtual servers and ESX hosts would have to use a SCSI reserve to write to a shared VMFS volume. This locks the entire volume and impacts functions like creating virtual machines, creating templates, powering on virtual machines, growing files for snapshots, allocating space for thin virtual disks, and vMotion. By using an Atomic Test and Set command the storage array can lock at the sector level and leave the rest of the LUN available for access by other ESX hosts. This could improve performance by 4 times or enable 25% more virtual machine I/Os. This also means more scale up workload for the storage array.
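The difference between reserving the whole LUN and an atomic test and set on a single sector can be sketched as follows. This is a simplified Python model, not the SCSI command set or VMFS on-disk format: the `Lun` class, the use of one lock record per sector, and the host names are all assumptions made for illustration, and a `threading.Lock` stands in for the array executing each command atomically.

```python
import threading


class Lun:
    """Toy LUN whose sectors each hold a lock record (hypothetical model)."""

    def __init__(self, sectors):
        # None means the lock record is free; otherwise it names the owning host.
        self._records = [None] * sectors
        # Stands in for the array processing one command at a time, atomically.
        self._atomic = threading.Lock()

    def atomic_test_and_set(self, sector, expected, new):
        """Write `new` to the sector only if it currently holds `expected`."""
        with self._atomic:
            if self._records[sector] != expected:
                return False  # another host changed the record first
            self._records[sector] = new
            return True


lun = Lun(sectors=4)
first = lun.atomic_test_and_set(0, expected=None, new="esx1")   # esx1 takes sector 0
second = lun.atomic_test_and_set(0, expected=None, new="esx2")  # esx2 loses the race
other = lun.atomic_test_and_set(1, expected=None, new="esx2")   # sector 1 is still free
```

Only the contested sector is affected when `esx2` loses the race for it. The rest of the LUN stays available, which is the contrast with a SCSI reserve that the paragraph above describes.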
Are there other examples where sharing the workload is important?
Another example of where we need better cooperation is in the area of content data, which is estimated to be growing at over 121% per year. Here you would expect Enterprise Content Management (ECM) systems to be in high demand, but in fact ECM systems see less than 4% of the enterprise data today. This is mainly because ECM solutions do not scale. They try to do everything within their own proprietary stack, including ingestion, indexing, storage, refresh, retention policies, life cycle management, protection, retrieval, dissemination, etc. As a result, very few ECM solutions can scale beyond tens of TB, when they need to be scaling to Petabytes, and eventually Exabytes. The only way to solve this problem is to offload some of the storage and management functions to intelligent storage systems so that ECM can concentrate on the content and scale beyond its current limitations. The interface between ECM and storage must be based on open protocols and not proprietary APIs which limit the interface to a few chosen vendors. One popular content storage system uses a hash of the content as an address into their system, which locks the content to that vendor’s system. The Hitachi Data Systems content storage platform, HCP, provides the ability to ingest content across standard protocols and store multiple modalities of data, such as files, documents, email, PACS, etc., in a common repository with safe multi-tenancy. If ECM vendors could provide open APIs like VMware does and offload more of the workload to storage systems we could go a long way to address the explosion in content data.
Hopeful Signs of better cooperation
While VMware is owned by a storage company, they are to be commended for opening up their API’s to other storage vendors. This cooperation between applications, systems, and storage vendors in sharing the workload is required for data center transformation. We are beginning to see more of this cooperation from vendors like VMware, Microsoft, Symantec, and Oracle. There is also great progress working through SNIA and ANSI T10. Vendors are coming to the realization that no one can do everything themselves. Data center transformation will take all of us working together.
Categories: HDS Blogs
To BIN or not to BIN, that is the question
Hamlet was depressed when he posed the question, “to be or not to be”. There was no question in Barry Burke’s mind when StorageNerve asked Michael Hay “Where is the Hitachi BINfile” and Michael answered “Hitachi doesn’t have the concept of a ‘BINfile’.”
Barry’s immediate response was that “EVERY intelligent storage array has the equivalent of a Binfile”. Barry also points out that the correct name is .BIN file.
For those of you who may not know what a .BIN file is, StorageNerve provides a comprehensive description of it in EMC Symmetrix: BIN file.
A .BIN file is used only with the Symmetrix, to hold the configuration information for the Symmetrix. It requires EMC services for the initial installation and for hardware upgrades. It was used for the first Symmetrix in 1990 and is still used today with the VMax. I am not aware of any other storage array that has a .BIN file, not even the EMC Clariion. .BIN file changes are loaded into front end directors, back end directors, and global cache in a process called IML (Initial Memory Load).
Hitachi does not require a .BIN file to map configurations into our directors or cache. The mapping of our cache is dynamic. Starting in the mid 1990s with the introduction of the 7700 Freedom storage arrays, we store the configuration data in a mirrored control store which is directly accessible to the front end and back end directors on busses which are separate from the connections to the data cache. We can change the configuration of the data cache simply by changing bits in the control store. This enables non-disruptive configuration changes, upgrades, maintenance, tiering, dynamic provisioning, and mapping of cache to external storage arrays for storage virtualization. Keeping control data in a separate storage area has other advantages. Performance is increased by eliminating the cache contention between control data and user data. Privacy of user data is ensured for remote call home maintenance since remote maintenance can only access the control store, which is separate from the data cache.
There is boot information which is kept in flash storage on the front and back end directors of the USP V and VM, and our control and data stores are backed up by batteries. This enables us to offer a diskless version of the USP VM. Users can configure the USP V or VM through Device Manager or Storage Navigator software.
So for Hitachi storage arrays, there is no need for a .BIN file. I do not know of any other intelligent storage array that has a .BIN file. This only seems to be a requirement for the Symmetrix architecture, which is now over 20 years old.
Categories: HDS Blogs
Data Center Transformation Part 1
Andy Kyte from Gartner has been sounding the alarm for data center modernization for a number of years. He warns that the data center is headed for a train wreck, and he provides the following warning signs, which I am paraphrasing here:
Aging IT systems and infrastructures are creating an increasing burden to maintain and switching to new systems and infrastructure becomes more disruptive and resource intensive. The resources required for change are often not available and the only recourse is to pile on with more of the same. “Information becomes increasingly difficult to access and analyze as data structures age. The business is forced to work without the information it needs to make decisions.” The business world has become very spiky with rapid swings between extremes, triggered by unforeseen events around the world. On May 6, the New York Stock Exchange dropped 1000 points and then rebounded 600 points in the same day, triggered by events in Greece or a finger check at some brokerage house. Businesses must become more agile but that depends on the agility of IT.
New interfaces like web connections and faster multi-core processors drive up transaction volumes which require legacy storage systems to scale up. Since storage systems are capitalized or leased over 3 to 5 year cycles and migration to new storage systems takes six months or more, the only solution seems to be to acquire more of the same legacy system even though they are two or three generations behind. Even though data ages quickly, it continues to accumulate on expensive tier 1 storage systems and gets backed up over and over again even though 80 to 90 % of the data is static. This alarming and unstoppable growth of data forces us to re-examine whether it is placed on the “right” tier of storage – but do we even know what the “right” tier is? Have we established all the economic and performance benchmarks? How can we take advantage of new cost saving technologies without throwing out our storage systems which still have 2 or 3 years left on depreciation? In the meantime, operational costs become a greater part of the IT budget as the piling on increases and content data accumulates year after year.
Regulatory compliance issues and risk increase even for lightly regulated businesses. Encryption of data at rest is becoming a de facto requirement to ensure privacy, but encryption solutions seem difficult to implement and carry the risk of data loss with key corruption. Companies with legacy systems are at risk of compliance penalties or are excluded from certain markets as regulation and legislation increase around the world. Reducing risk, or at least managing it, is also at the heart of many business continuity initiatives across all industries. Are we using the right technologies to protect our data? At the right cost? Or is it about never “failing”?
As IT demands increase, the availability of power to drive IT is becoming a major concern. In many regions, affordable power is becoming scarce and companies are forced to relocate to continue operations. Substantial power savings can be achieved through replacement of power hungry legacy systems and more efficient utilization of current systems. However, these changes become increasingly disruptive as data continues to accumulate.
If these warning signs are familiar, then now is the time to take action and transform the data center into an agile, sustainable, business oriented data center of the future. The good news is that tools are available to transform your legacy systems through virtualization and systems management tools that link business objectives with infrastructure performance and provide an ROA, a return on your total assets. Some companies are doing this today, using virtualization to seamlessly migrate off of petabytes of legacy storage systems and recover 40% or more of existing storage capacity through Dynamic Provisioning.
In subsequent posts I will go into more of the specifics of Data Center Transformation, as well as how converged and unified infrastructure trends enable the next wave in Data Center Modernization.
Categories: HDS Blogs
What Storage Virtualization can not sacrifice
There is an increasing interest in storage virtualization, as seen in the increasing number of articles and blog posts on the topic. In the last few days Rick Vanover posted a very balanced overview of storage virtualization for Datamation where he reviewed some of the many options. Carol Sliwa posted a Storage Pro Guide to block-based storage virtualization for SearchStorage which cited some use cases. One of the use cases was the City of Coquitlam (Canada), which is one of the 2010 Computerworld Honors Laureate award winners in IDG’s Computerworld Honors Program and a customer of Hitachi.
While there is a lot of information on what storage virtualization can do and the many ways to accomplish the same result, there are some things that virtualization should not do to achieve those results. I think it is time to mention a few of these.
First, storage virtualization should not add complexity. It should not require you to tear apart the SAN to insert another layer of management complexity. For that reason, Hitachi’s storage virtualization solution resides in the storage controller and not in the SAN.
Second, it should not violate security practices. The secure connection between the initiator (server) and the target (storage) should be preserved. Virtualization approaches that sit outside the storage controller must either crack Fibre Channel data packets or proxy the I/O request to see if they have a read or a write. A storage controller is the target to the initiator and can support FC-SP, which provides CHAP authentication between the initiator and the target.
Third, it should not degrade performance for your critical data. The virtualization process must be able to improve rather than degrade the performance of your data. That can only be done if the virtualization engine is more powerful than the storage system it virtualizes. It should also have its own internal high performance cache and disks for tiering of critical data from lower performance storage systems.
Fourth, it should not have less functionality than the storage it virtualizes. If your external storage has advanced functions like distance replication or thin provisioning, the virtualization engine must provide the same functionality without additional complexity or performance overhead.
Fifth, it should not expose the privacy of your data or impact your quality of service. The value of storage virtualization is the ability to consolidate many storage users onto a common set of storage resources, but at the same time the virtualization engine must be able to protect each user from the bad or excessive behavior of other users who share the same resources. Our USP V/VM can partition the data users as soon as they enter the storage port. The USP V/VM assigns each host to a separate host storage domain. Even though different hosts may be accessing the same storage port, they can be assigned their own LUN address space with its own priority setting which can be changed dynamically. We can also partition the cache dynamically to ensure that a user does not dominate the cache at the expense of other users. These partitions ensure that there is no data leakage or escalation of management privileges.
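The host storage domain idea can be illustrated with a minimal Python sketch. The class names, WWN strings, and volume ids below are hypothetical and are not Hitachi interfaces; the sketch only shows how two hosts sharing one physical port can each see their own LUN address space and carry their own adjustable priority.

```python
from dataclasses import dataclass, field


@dataclass
class HostStorageDomain:
    """Hypothetical per-host view behind a shared port."""
    priority: int = 0                              # can be changed at run time
    lun_map: dict = field(default_factory=dict)    # host-visible LUN -> internal volume id


class StoragePort:
    """One physical port shared by several hosts (illustrative only)."""

    def __init__(self):
        self._domains = {}  # host WWN -> HostStorageDomain

    def add_host(self, wwn, priority=0):
        self._domains[wwn] = HostStorageDomain(priority=priority)

    def map_lun(self, wwn, lun, volume_id):
        self._domains[wwn].lun_map[lun] = volume_id

    def set_priority(self, wwn, priority):
        self._domains[wwn].priority = priority

    def resolve(self, wwn, lun):
        """Return the volume behind (host, LUN), or None if the host has no such mapping."""
        domain = self._domains.get(wwn)
        if domain is None:
            return None
        return domain.lun_map.get(lun)


port = StoragePort()
port.add_host("wwn-host-a", priority=1)
port.add_host("wwn-host-b", priority=5)
port.map_lun("wwn-host-a", 0, "vol-100")
port.map_lun("wwn-host-b", 0, "vol-200")  # same LUN number, different volume
```

Because every lookup is keyed by the host first, host A has no way to name a volume that was only mapped for host B, even though both arrive on the same port.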
So while different storage virtualization systems are available and may produce the same results, you also need to look at how they get their results. It may be at the sacrifice of simplicity, security, performance, functionality, or privacy.
Categories: HDS Blogs
Hitachi Data Systems Blogger Day
Last week Hitachi Data Systems held their first blogger day, which was attended by 10 bloggers who cover the IT space. It was an eclectic mix. Besides expert industry consultants like Chris Evans, Elias Knaser, Paul Miller, and Nigel Poulton, this group included a CTO, Devang Panchigar; an end user IT Infrastructure Manager, Rick Vanover; a SAP technical support consultant, Bas Raayman; a Unix Systems Administrator, Phil Jaenke; a reseller VP, Greg Knieriemen; and a web hosting VMware Engineer, Simon Long.
There were a lot of interesting discussions. It was great to meet some of the people behind the blogs. Greg, along with Devang and Nigel, recorded a podcast with me. The organizer had a live feed of the tweets that were being generated during the sessions, so feedback was immediate.
Two things stood out for me. First was the industry reach that these few people have through their blogs and tweets. At the first level they probably reach tens of thousands of other readers, and many of these readers are bloggers who in turn tweet and blog around the topics that are posted, like ripples in a pool.
The second thing was the need to be more proactive in informing the industry about our products and business directions. Even for this group of experts who are plugged into the storage industry, there were misconceptions, particularly around our software, which were due to lack of information.
The feedback and suggestions we received from this Blogger Day were fantastic. I would like to thank the blogger guests, Peter Gerr and Michael Hay for hosting this event, and Carli Gelfi, who was the primary organizer of this event. I look forward to repeating this again and again.
Categories: HDS Blogs
“Do More with Less”- is there any end in sight?
A new survey by Intercall shows that 48 percent of Americans who use technology in their everyday jobs say that they are now required to do more work with fewer resources due to the current economic climate. As an example, nearly one third (30 percent) feel that they need to stay connected to work 24/7, even during weekends, breaks or holidays.
For the last decade or more the directive for IT has been “Do more with Less”. This cry becomes even more strident with every downturn in the economy, and even more focussed on storage as data continues to accumulate through good times and bad. Although storage demand continues to grow at about 60% per year, most IT shops have not hired any storage administrators for the past 7 to 10 years. Are we reaching the limit? Is IT on a treadmill with increasing workload month after month with no end in sight?
In my post on the “mythical FTE per TB” I pointed out that some Data Centers are managing over 1000 TB per full time employee where 5 or 7 years ago each FTE was only managing about 10 TB. Although that sounds like a tremendous gain in productivity, or doing “more with less”, it hasn’t addressed the cost of IT, which continues to increase by 7 to 8% per year.
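To put those two trends side by side, here is a small arithmetic sketch in Python. It assumes the 7 to 8% cost increase compounds annually and uses a 7 year window taken from the "5 or 7 years ago" above; both are assumptions made for illustration.

```python
def compounded(rate, years):
    """Growth factor after `years` of increases at `rate` per year (0.07 means 7%)."""
    return (1 + rate) ** years


# Capacity managed per FTE, using the figures quoted above.
tb_per_fte_then, tb_per_fte_now = 10, 1000
capacity_gain = tb_per_fte_now / tb_per_fte_then

# IT cost growth over an assumed 7 year window.
low = compounded(0.07, 7)
high = compounded(0.08, 7)

print(f"capacity per FTE grew {capacity_gain:.0f}x")
print(f"IT cost grew between {low:.2f}x and {high:.2f}x")
```

Under these assumptions the capacity each administrator handles grows a hundredfold while total cost still rises by well over half, which is why capacity per FTE says little about cost.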
I think we are reaching the physical limits of the “do more with less” movement as it relates to storage administrators or FTE. Resource management tools and networking have helped with the consolidation of storage. However, storage is stateful and requires the physical movement of data when changes are made. While software can automate functions to be executed against storage, the execution has to wait until the data is read, written, moved, copied, replicated, compressed, deduped or formatted to spinning disk. The execution of a storage command is not just a matter of flipping a few bits in memory as you would with a processor command. It often entails some mechanical movement which takes time.
The only way that we can continue to solve the problem of productivity is to virtualize the storage and storage capacity, so that we can make the physical changes in the background while the applications work with a relevant subset of the total data. By storage and storage capacity virtualization, I mean the ability to attach heterogeneous storage and virtualize it behind a scalable storage system so that it looks like a common pool of pages which can be managed on a page basis rather than on a volume basis. In this way you can dynamically provision, move, copy, replicate and migrate data on a more granular basis, a page rather than a volume.
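A minimal sketch of the page pool idea, in Python. The class names and array labels are hypothetical, and the copy of page contents is reduced to a comment; it is meant only to show that provisioning and movement can happen one page at a time while the application's view stays the same.

```python
class PagePool:
    """Hypothetical pool of fixed-size pages contributed by several arrays."""

    def __init__(self):
        self.free = []  # list of (array_name, page_index) not yet in use

    def add_array(self, array_name, page_count):
        self.free.extend((array_name, i) for i in range(page_count))

    def allocate(self, preferred_array=None):
        """Take a free page, restricted to `preferred_array` when one is given."""
        for pos, (name, _) in enumerate(self.free):
            if preferred_array is None or name == preferred_array:
                return self.free.pop(pos)
        raise RuntimeError("no free page available")

    def release(self, page):
        self.free.append(page)


class VirtualVolume:
    """Maps virtual page numbers to physical pages; pages are bound on first write."""

    def __init__(self, pool):
        self.pool = pool
        self.page_map = {}  # virtual page number -> (array_name, page_index)

    def write(self, vpage):
        if vpage not in self.page_map:
            self.page_map[vpage] = self.pool.allocate()
        return self.page_map[vpage]

    def move_page(self, vpage, target_array):
        """Relocate one page; the virtual page number seen by the host does not change."""
        old = self.page_map[vpage]
        new = self.pool.allocate(preferred_array=target_array)
        # (a real system would copy the page contents from old to new here)
        self.page_map[vpage] = new
        self.pool.release(old)
        return new


pool = PagePool()
pool.add_array("tier1", 2)
pool.add_array("external", 2)
vol = VirtualVolume(pool)
first = vol.write(7)                    # page bound only when first written
moved = vol.move_page(7, "external")    # one page moves, not the whole volume
```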
With storage virtualization combined with capacity virtualization, we can make our storage administrators even more productive in reducing IT costs. This type of virtualization makes it possible for IT to do less while still doing more to reduce costs and increase productivity.
Categories: HDS Blogs
Computerworld Honors Laureate Award Winners
Since 1988, the Computerworld Honors Program has been recognizing and documenting the achievements of men, women, organizations and institutions around the world whose visionary use of information technology promotes positive social, economic and educational change.
We are pleased to announce that five Hitachi Data Systems customers have been selected as the 2010 Computerworld Honors Laureate award winners by IDG’s Computerworld Honors Program. These Hitachi Data Systems customers will be recognized during the 22nd Annual Laureates Medal Ceremony & Gala Awards Evening on June 7, 2010 at the Andrew W. Mellon Auditorium in Washington, D.C.
Darren Browett, Technical Services Manager for the City of Coquitlam, British Columbia, Canada, implemented a leading edge storage virtualization solution with USP VM and dynamic thin provisioning to improve the levels and types of services to their citizens, businesses, and city workers.
Gonzalo Bongolan, president of Home Guaranty Corporation, HGC, focuses on promoting home ownership to middle and low income families in the Philippines. When their efforts were being hampered by severe storage challenges and a decentralized data storage system, they turned to the Hitachi Adaptable Modular Storage 2000. The AMS 2000 provides modular storage solutions with symmetric active-active controllers which simplify operations with automated hardware-based front-to-back-end I/O load balancing. It is capable of providing consolidated, enterprise-level performance and availability to serve HGC’s data needs.
Darwin Gosal, IT Manager, Centre for Quantum Technologies, National University of Singapore, manages a variety of computing facilities for general research and general administration as well as advanced information services for state of the art projects which generate a vast amount of information. CQT built one of the most robust, scalable, and cost effective IT infrastructures using Hitachi High-Performance NAS and AMS storage. This storage infrastructure is one of the highest performing networked storage infrastructures in the academic community.
Andres Espeneira, Co-Founder, Chairman, and President of Products for Pixorial needed a very versatile storage environment to ensure optimal digital workflow while meeting exponential growth projections in a cost efficient way. Pixorial adopted the AMS 2500 and a cluster of Hitachi High Performance NAS to meet their storage needs.
Sander Ujzanovitch, Team Leader Digital Management for Stadsarchief Amsterdam, manages one of the world’s largest archives with 22 miles of archives. They implemented a Digital Storage Depot to provide the capacity required for Stadsarchief Amsterdam’s future, with the control mechanisms to continuously check the digital archive documents for authenticity. That Digital Storage Depot is being implemented on a Hitachi Content Platform for digitized as well as digitally created objects.
Congratulations to our Computerworld Honors Laureate Award winners.
Categories: HDS Blogs
The Mythical FTE per TB
Full Time Employee (FTE) per TB used to be a measure of productivity for storage managers. Some people still use that metric today. I submit that FTE per TB is no longer relevant today.
For the last 10 years the mantra for IT has been “do more with less”. Ten years ago I would visit Data Centers where 2 people would be managing 20 TB. When I visit that shop today I am likely to see the same two people managing 500TB. Some data center managers boast of having one person manage a peta byte or more of storage. Does that mean that people have become more productive? Have advancements in storage management tools reduced the need for storage managers? What has changed in the last 10 years?
One difference is that storage has become a lot denser and cheaper. You can buy a 2 TB SATA disk for less than you paid for a 9 GB FC disk ten years ago. The other difference is the introduction of Storage Area Networks which make it possible to network more servers to larger capacity storage frames than you could when storage was direct attached. Ten years ago 20 TB would have been spread across 20 different storage frames. Today 500 TB can be contained in two or three storage frames that are SAN connected to hundreds of servers.
Cheaper storage means that you can throw a lot more storage at an application and hope that you can reallocate the excess to other applications on the SAN. So a storage administrator might be responsible for a petabyte of storage, but how much of it is he really managing efficiently? One way to address storage requirements is to throw a lot of cheap capacity at it and delay the consequences.
Another thing that changed was the rapid adoption of the internet which has created a global marketplace. Most corporations must now be available 24 hours a day, seven days a week. We cannot afford to have any down time for our applications. As more and more applications are attached to a denser storage frame, the down time required to do a device migration becomes an increasing problem. Today there may be 100 applications on a storage frame. How do you migrate the data to a new storage frame without disrupting the applications? In order to minimize the down time we steal some time on weekends and migrate 5 or 6 applications at a time and end up taking 6 months to do the migration. You can do this with one FTE or 10 FTE and it will still take 6 months since it is no longer an FTE problem. It is a scheduling problem. The best solution for a scheduling problem is storage virtualization. By separating the application view of data from the physical storage, migrations, moves, and copies can be done without disruption to the applications.
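The scheduling argument can be checked with a few lines of Python. The function name is made up for illustration; the inputs are the figures from the paragraph above, and head count is deliberately not a parameter.

```python
import math


def migration_weekends(applications, per_weekend):
    """Weekend windows needed when only `per_weekend` applications can move per window."""
    if per_weekend <= 0:
        raise ValueError("per_weekend must be positive")
    return math.ceil(applications / per_weekend)


# Figures from the paragraph above: 100 applications, 5 or 6 per weekend.
slow = migration_weekends(100, 5)
fast = migration_weekends(100, 6)
print(slow, fast)
```

At 17 to 20 weekend windows the migration already takes four to five months of calendar time if every weekend is usable, and about six months once a few weekends are unavailable (an assumption of this sketch). Adding staff changes none of the inputs.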
FTE per TB is no longer a good measure of productivity and should not be used to measure the efficiency of your storage administration. Hitachi Data Systems offers a resource called storage economics which helps to identify the total cost of ownership. This is the only way to evaluate your efficiency. Storage economics can also help you map technologies like virtualization against those costs and quantify the cost savings of these technologies in your environment. For more information on storage economics, check out David Merrill’s blog at http://blogs.hds.com/david/
Categories: HDS Blogs
Content Aware Search
A Computerworld article by Bernard Golden highlights an IDC report that says that the “digital Universe” will grow by 1.2 zettabytes, or 1.2 million petabytes, in 2010. One of the biggest issues with this data growth is the ability to search for the particular piece of data that you need, especially when the data is mostly unstructured.
Our Hitachi Content Platform is designed to ingest data through standard protocols like NFS, CIFS, WebDAV, HTTP, etc. and provide a common search across different modalities of data: files, documents, email, etc. We can also stub out files from an HNAS file share into HCP where they can be indexed and archived for search through HCP. Hitachi Data Discovery Suite provides indexing and search capabilities across HNAS files, HCP archive, and NetApp files, enabling a content aware, federated search of NAS and archive data. The embedded search engine behind HCP and HDDS is an optimized version of FAST.
Search is going to be an increasing area of research and investment as the different types of data grow. Hitachi has many programs in this area. Here is a link to a similar image search engine that Hitachi America posted under GazoPa for your amusement. Upload your picture and see what similar images are on the web. You might be surprised.
Categories: HDS Blogs
Meet Ray
The downturn in the economy has been difficult for many IT shops as it has been for many of the storage vendors. Our strategy during this downturn has been to focus on the needs of our customers and help them increase the utilization of their storage assets and reduce their operational costs. When we entered this downturn in 2008, IT shops were typically running at about 30% to 40% utilization with most of their data on expensive tier 1 storage. Through offerings like Hitachi Dynamic Provisioning and Zero Page Reclaim we were able to help some customers reclaim as much as 40% of their allocated but unused storage and reduce their need to buy more capacity. With storage virtualization we helped them extend this savings across their existing heterogeneous storage assets. And with storage virtualization we helped them reduce their need for expensive tier 1 storage by non-disruptively moving the majority of their data to lower cost internal and external tiers of storage. We also helped reduce their working set of data by moving stale data to an active archive on our Hitachi Content Platform. To assist the overworked operations staff we provided remote managed services to offload some of the grunt work of monitoring, reporting, and provisioning that kept them from learning, planning, and implementing the tools they needed to be more efficient.
That means customers can better utilize what they already have. We increased our software and services to help our customers reduce their operational costs. And finally, we are gaining new customers by offering them ways to increase the utilization and the value of their existing assets.
Here is a short video on what we did for Ray during this downturn.
Categories: HDS Blogs
Hitachi’s Answer to Storage Virtualization Requirements
Five Requirements for Storage Virtualization
In my previous blog, I identified the following requirements for Storage Virtualization. The first two requirements were already identified by SNIA in 2001. The additional three requirements are addressed by Hitachi in our implementation of Storage virtualization in the USP V/VM. These five requirements are:
1. Application, server, and network independent management of storage infrastructure
2. Enhance existing storage assets with the latest enterprise storage functions
3. Safe multi-tenancy to leverage shared storage resources across multiple applications
4. Transparency to provide applications with the ability to track their Service level objectives
5. Scalability to meet growing peak demands
Application and Network Independent Management of Storage Infrastructure
The purpose of Storage virtualization is to abstract the management of storage so that changes can be made to storage systems without interruption to the application, server, or network that they are connected to. Hitachi is the only storage vendor that meets these criteria since it has implemented storage virtualization at the storage controller level in the USP V/VM storage system. External storage systems can connect to the USP V/VM through standard FC storage ports. Software in the USP V/VM accesses the existing LUNs in the external storage through these FC connections and presents them through the USP V/VM as though they were USP V/VM LUNs. From then on the host systems attached to the USP V/VM see only the volumes presented by the USP V/VM and the management of the external and internal storage is abstracted and managed independently of the network, the server, and the application.
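The mapping described above can be sketched in a few lines of Python. Everything here (class name, method names, array labels) is hypothetical and stands in for controller software that the post does not describe in detail; the point is only that hosts address the controller's LUN numbers while the backing storage can change underneath.

```python
class VirtualizationController:
    """Hypothetical controller that presents external LUNs as its own."""

    def __init__(self):
        self._next_lun = 0
        self._backing = {}  # presented LUN -> (array name, LUN on that array)

    def discover(self, array_name, external_luns):
        """Present each LUN of an external array under a new controller-side LUN number."""
        presented = []
        for ext_lun in external_luns:
            self._backing[self._next_lun] = (array_name, ext_lun)
            presented.append(self._next_lun)
            self._next_lun += 1
        return presented

    def host_view(self):
        """Hosts see only the controller's LUN numbers, never the backing arrays."""
        return sorted(self._backing)

    def backing_of(self, presented_lun):
        return self._backing[presented_lun]

    def migrate(self, presented_lun, array_name, ext_lun):
        """Change the backing storage; the presented LUN number stays the same."""
        self._backing[presented_lun] = (array_name, ext_lun)


ctrl = VirtualizationController()
ctrl.discover("old-array", [0, 1])
ctrl.discover("new-array", [0])
before = ctrl.host_view()
ctrl.migrate(0, "new-array", 5)   # move presented LUN 0 to different backing storage
after = ctrl.host_view()          # unchanged from the host's point of view
```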
Enhance Existing Storage Assets with the Latest Enterprise Storage Functions
The USP V/VM is an enterprise storage system with a full complement of tools to do non disruptive moves, copies, migrations, replication, tiering, and dynamic provisioning across internal and external storage. A commodity storage system can inherit these enterprise features simply by connecting behind a USP V/VM. Since the USP V/VM has a large global cache with multiple active paths for load balancing, modular two controller storage systems often see as much as a 30% improvement in throughput when attached to the USP V/VM. The virtualization provided by the USP V/VM can enhance existing storage assets with the latest enterprise capabilities.
Safe Multi-tenancy To Leverage Shared Storage Resources Across Multiple Applications
One of the benefits of storage virtualization is the ability to reduce costs by sharing common resources like storage ports, cache, and disk. However, sharing storage resources raises the risk of data leakage or contention between the multiple storage users or tenants of the virtualized storage systems. The USP V/VM addresses this at different levels. At the host port level, the ports are virtualized into 256 virtual ports and each virtual port can be set to a different server mode (Windows, HP, Solaris, AIX, etc). Each virtual port can also have its own LUN address space, which ensures safe multi-tenancy of the storage ports and LUN images. The next level is at the cache, where the cache can be partitioned into 32 dynamic cache partitions. This can prevent a very fast processor like a mainframe from dominating the cache over slower processors and impacting their quality of service. Dynamic tiers of storage and Dynamic Provisioning Pools can also be defined to enable different performance levels for different application requirements.
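As a rough illustration of why cache partitioning protects slower tenants, here is a minimal Python sketch. The class, the partition names, and the least-recently-used eviction policy are assumptions chosen for the example; the post does not say how the USP V/VM manages entries inside a partition.

```python
from collections import OrderedDict


class PartitionedCache:
    """Hypothetical cache split into partitions, each with its own LRU eviction."""

    def __init__(self, partition_sizes):
        self.sizes = dict(partition_sizes)   # partition name -> max cached blocks
        self.slots = {name: OrderedDict() for name in self.sizes}

    def put(self, partition, block, data):
        slots = self.slots[partition]
        if block in slots:
            slots.move_to_end(block)
        slots[block] = data
        # Evict only from the writer's own partition, never from a neighbour's.
        while len(slots) > self.sizes[partition]:
            slots.popitem(last=False)

    def get(self, partition, block):
        slots = self.slots[partition]
        if block not in slots:
            return None
        slots.move_to_end(block)
        return slots[block]

    def resize(self, partition, new_size):
        """Partitions are described as dynamic, so the quota can change at run time."""
        self.sizes[partition] = new_size
        slots = self.slots[partition]
        while len(slots) > new_size:
            slots.popitem(last=False)


cache = PartitionedCache({"mainframe": 2, "open": 2})
cache.put("open", "b1", "x")
for i in range(100):                 # a fast tenant floods its own partition
    cache.put("mainframe", i, i)
```

After the flood, the block cached for the slower "open" tenant is still present, because the fast tenant could only evict its own entries.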
Transparency to provide applications with the ability to track their Service level objectives
Applications may not trust their data to a virtual pool of shared storage resources without the ability to monitor their service level objective and the health of the physical resources that actually support their storage. This transparency can be provided through the Hitachi Command Portal, which provides a dashboard with a business view into the shared storage resources that are used by that business or application. The Hitachi Command Portal is part of the Hitachi Storage Command Suite of products, which includes Device Manager, Global Link Manager, Tuning Manager, Tiered Storage Manager, and Replication Manager. The Command Portal sits on top of these infrastructure management tools, gathers and correlates their information to the application, and presents a dashboard of this information through a portal.
One panel on the dashboard shows the status of the Service Level Objective, another shows the actual storage allocation, another shows the subsystem health, and the last panel shows the storage allocation trend. The above example provides information for the business unit “Exchange EMEA Corp-Production”.
Scalability to meet growing peak demands
Any storage virtualization solution must be able to scale to support the peak demands of the application users of the virtual storage. The virtualization engine must be able to scale up with multiple processors that can be tightly coupled through a large global cache. The Hitachi Virtualization engine is the USP V which can scale up to 128 processors which are tightly coupled through a large 512 GB cache, and support as much as 247 PB of virtual storage capacity. A loose coupling of single storage virtualization processors will not be able to scale to meet the increasing demand for storage resources that are being driven by large virtualization servers that currently run 20 or more server platforms on one powerful multi core server.
Summary
It is not enough to simply provide a virtual storage image that can enable the physical storage to be managed separately from the application, server, and network layers. A complete storage virtualization solution must be able to provide enhancements to external storage so that the external storage itself can be commoditized. It must provide a safe multi-tenant virtual environment where multiple users can share the common pool of virtualized storage resources without fear of data leakage or impact from the behavior of other applications in the pool. There should be a simple tool that enables a user to monitor the performance of his Service Level Objective, the health of the physical resources behind the virtualization, as well as his trend line for utilization. And finally the solution must be able to scale to meet the increasing performance and capacity demand of data hungry applications.
Categories: HDS Blogs