Real-world big data use cases have moved beyond "possibilities" to multiple established patterns, including analytics and low-cost archival storage of large datasets. In all cases, big data is thought of in an enterprise sense, even when the analytics performed by data scientists is highly focused on a particular entity or problem such as customer behavior, sensor data, or trade patterns. Today, data virtualization is playing a key role in exposing big data as abstracted data services, using RESTful interfaces and real-time query capabilities to democratize access to this intelligence. As such, it has enabled faster enterprise adoption of big data alongside traditional databases, operational systems, and BI systems.
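To make the idea of abstracted data services concrete, here is a minimal sketch of a client consuming a virtualized big data view over REST. The endpoint URL, query parameters, and response shape are hypothetical stand-ins; an actual data virtualization layer publishes its own service contract.

```python
# Minimal sketch: consuming a virtualized big data view as a RESTful data
# service. The endpoint and parameter names below are hypothetical.
import requests

# Hypothetical endpoint a data virtualization layer might publish for an
# abstracted "customer insight" view assembled from big data sources.
BASE_URL = "https://dv.example.com/rest/views/customer_insight"

def fetch_top_customers(limit=10):
    """Query the abstracted view like any other web resource; the joins and
    heavy processing happen behind the service, not in the client."""
    resp = requests.get(
        BASE_URL,
        params={"orderBy": "sentiment_score desc", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for row in fetch_top_customers():
        print(row)
```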
As more data sources feed big data stores and processing (e.g., social media, sentiment, clickstream, weblogs, transactional data, data.gov sites, web data, machine data), enabling a data services interface with security and governance through data virtualization will broaden the use of big data for both analytical and operational purposes.
After all, the typical business drivers for big data relate to everyday business processes and to decision makers who are trying to gain more customer insight, identify new market opportunities, stop revenue leakage or fraud, enable mass personalization of products, improve process efficiencies, reduce costs, and so on, all based on processing large amounts of available data. Much of this was previously out of reach because of the cost of the storage and processing power required. Big data technologies have momentum because they can now do this at very low cost relative to databases and data warehouses.
So, when companies think about big data initiatives at a technical level, they typically have two objectives in mind:
- Run analytics on large volumes, velocity, or variety of data using distributed storage and distributed scan-based processing, instead of queries against the typical consolidated data warehouse. Sometimes a big data system is used simply to store large volumes of data (log files, call archives, etc.) cheaply, without incurring database costs.
- Build a layer of abstraction on top of the (big) data infrastructure, which usually includes Hadoop or big file systems, to give developers a much easier and more flexible platform for creating new enterprise applications on big data (a minimal sketch follows this list).
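As a rough illustration of the second objective, the sketch below uses PySpark's SQL layer as a stand-in for such an abstraction: raw, semi-structured log files in a big file system are exposed as a relational-like view that developers can query with plain SQL. The HDFS path and field names are illustrative assumptions.

```python
# Illustrative sketch: a relational-like abstraction over raw files in a big
# file system, using PySpark. The HDFS path and fields are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-abstraction").getOrCreate()

# Raw JSON weblogs stored cheaply on the file system, no database required.
logs = spark.read.json("hdfs:///raw/weblogs/*.json")

# Register a relational-like view so application developers can write SQL
# instead of coding directly against files or MapReduce jobs.
logs.createOrReplaceTempView("weblogs")

daily_hits = spark.sql("""
    SELECT log_date, COUNT(*) AS hits
    FROM weblogs
    GROUP BY log_date
    ORDER BY hits DESC
""")
daily_hits.show()
```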
Currently, much of the focus is on the former: how to capture, store, and retain big data and improve its processing. In particular, attention centers on Hadoop, some NoSQL stores, and big data offerings in the cloud. While each model or technology offers advantages, each also has drawbacks in terms of enterprise-class features, security, and the limitations of the MapReduce paradigm in supporting real-time, query-based interactions. Vendors and technologies will compete to alleviate these limitations and to improve the storage and processing of big data over time. But that is not enough.
Refer back to the business drivers of big data and it is clear that the everyday business processes and decision makers involved in them need access to the results of big data analytics in a simple, integrated fashion. That is the second objective: to leverage big data enterprise-wide so that it does not become another data silo. Data virtualization is therefore a critical part of the big data solution. It facilitates and improves the use of big data in the enterprise by:
1. Abstracting semi- and unstructured big data into relational-like views
2. Integrating big data with other enterprise sources (see the sketch after this list)
3. Adding real-time query capabilities to big data
4. Providing full support for RESTful web services and linked data
5. Adding security and other governance capabilities to the big data infrastructure
6. Helping to solve the siloed data/applications problem through a unified data layer
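As a concrete sketch of points 1, 2, and 6, the snippet below federates the Hadoop-resident weblogs view from the earlier example with a traditional relational source in a single query, presenting one unified data layer to consumers. The JDBC URL, credentials, and column names are hypothetical, and a data virtualization product would perform this federation (plus security and governance) behind its published services.

```python
# Sketch of a unified data layer: one SQL query spanning a big-data-resident
# view and a traditional relational source. All names here are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified-layer").getOrCreate()

# Point 1: semi-structured big data abstracted into a relational-like view.
spark.read.json("hdfs:///raw/weblogs/*.json").createOrReplaceTempView("weblogs")

# Point 2: a traditional enterprise source pulled in over JDBC.
orders = spark.read.jdbc(
    url="jdbc:postgresql://dbhost:5432/sales",  # hypothetical warehouse
    table="orders",
    properties={"user": "report", "password": "secret"},
)
orders.createOrReplaceTempView("orders")

# Point 6: one query across both sources -- a unified data layer.
spark.sql("""
    SELECT o.customer_id,
           COUNT(w.session_id) AS web_sessions,
           SUM(o.amount)       AS revenue
    FROM orders o
    JOIN weblogs w ON w.customer_id = o.customer_id
    GROUP BY o.customer_id
""").show()
```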
There is no doubt that the big data trend provides a huge opportunity to gain new and valuable business insights from large volumes of data that were previously unavailable or uneconomical to capture. One part of the big data technology platform focuses on finding the most cost-efficient and scalable ways to store and process that data. The other part raises the level of abstraction of big data, enabling it to be easily discovered and queried as linked data services for use across enterprise-wide application development. Data virtualization, which provides this abstraction, real-time query, and data services capability, is thus an essential part of every big data platform.