A quick Google search on the term “big data” turns up 37,900,000 results in 0.11 seconds. That’s a lot of data about a term that’s generating equal parts buzz and confusion these days.
For a basic definition, Wikipedia describes big data as “datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing...” That’s a good start, but consider this...
Who’s picking up the pieces of the data explosion?
The biggest big data challenges come from larger enterprises dealing with storing, accessing, managing and analysing petabytes of data. Web 2.0 companies are coping with very large distributed aggregations of loosely structured and often incomplete data. Healthcare providers are figuring out the best ways to handle the storage and archival of large amounts of medical image data. And media and entertainment companies are wrestling with creating, storing and distributing high-definition content. Increasingly huge data volumes also pose a challenge for any enterprise needing to store its data for compliance and regulatory purposes.
More than a “big data” problem—it’s a “big everything” problem.
In all these scenarios, the complexity and scalability limitations of legacy architectures can stunt the deployment of emerging applications and limit an organisation’s ability to harness the power of its corporate information.
Think about it: For today’s organisations, this really is a “big everything” problem—one brought on not only by the rapid growth of unstructured data and ineffective archive solutions, but also by general content proliferation, the analytical data explosion and the advent of massive content depots. This in turn creates big challenges—like how to:
Effectively scale infrastructure to meet new requirements?
Manage spiralling costs?
Move data to different parts of the infrastructure to better optimise cost and performance?
Manage refresh cycles and protect technology investments?
The solution lies in scalable, converged infrastructure
However you label it, you can tame the data explosion with a converged infrastructure that includes:
Scalable storage that lets you start small and grow, scaling out with hardware and advanced software that allow large content depots and archives to be addressed as a single global data resource
Thin provisioning to defer capacity purchases until capacity is actually written, rather than when it is allocated (see the first sketch after this list)
Automated storage tiering for non-disruptive data movement, so the right data sits on the right tier at the right cost
Data de-duplication to reduce the capacity consumed and deliver storage efficiency that keeps pace with data growth (see the second sketch after this list)
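To make the thin-provisioning idea concrete, here is a minimal sketch that uses a sparse file as a stand-in for a thin-provisioned volume. The file name and sizes are illustrative assumptions, and the allocation behaviour assumes a filesystem with sparse-file support (such as ext4 or XFS); it is not any vendor’s implementation.

```python
import os

# Minimal sketch of thin provisioning, using a sparse file as a stand-in
# for a thin-provisioned volume. (Assumes a filesystem with sparse-file
# support; names and sizes are illustrative.)
LOGICAL_SIZE = 10 * 1024**3  # 10 GiB of logical capacity presented to the application

with open("thin_volume.img", "wb") as f:
    f.truncate(LOGICAL_SIZE)   # logical capacity is "provisioned" up front
    f.seek(1024**2)
    f.write(b"data")           # physical blocks are allocated only when data is written

st = os.stat("thin_volume.img")
print("Logical size (bytes)  :", st.st_size)          # ~10 GiB
print("Physical usage (bytes):", st.st_blocks * 512)  # only a few KiB actually consumed
```

The point of the example: the application sees the full 10 GiB immediately, but real capacity (and real spend) is only committed as writes land.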
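And here is a minimal sketch of block-level de-duplication. The chunk size, hash choice and in-memory store are illustrative assumptions; real systems often use variable-size chunking and persistent indexes, but the principle is the same: identical chunks are stored once and referenced by their hash.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; illustrative only

def dedupe(data: bytes):
    """Split data into chunks, storing each unique chunk once, keyed by its hash."""
    store = {}    # chunk hash -> chunk bytes (each unique chunk stored once)
    recipe = []   # ordered list of hashes needed to rebuild the original data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # duplicate chunks are not stored again
        recipe.append(digest)
    return store, recipe

# Highly redundant sample data: many repeated chunks
data = b"A" * (10 * CHUNK_SIZE) + b"B" * (10 * CHUNK_SIZE) + b"A" * (10 * CHUNK_SIZE)
store, recipe = dedupe(data)
stored_bytes = sum(len(c) for c in store.values())
print(f"Original: {len(data)} bytes, stored after de-duplication: {stored_bytes} bytes")
```

With redundant data like backups or archives, the stored footprint can be a small fraction of the logical data, which is what lets capacity efficiency keep pace with data growth.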
This discussion is just getting started...