- Rob Sobers, technical manager at Varonis
(www.varonis.com), says:
Organizations already have concerns about managing human-generated big data.
The fact that Web crawlers are now scouring the Web and indexing what they
find shows that companies need to treat their human-generated content more
carefully.
The Web is full of all kinds of data, some of which is being cleaned up and
interpreted by services such as Infochimps or Microsoft’s Windows Azure
Marketplace. Other services, such as Datafiniti and Factual, are reportedly
building entire businesses on scraping content from Web sites and creating
customized databases for clients.
Other companies are also taking advantage of human-generated data, such as
reviews and comments, to enrich themselves. Yelp, for example, went public
based on the content provided by its users, content that it then vigorously
defended from Google’s indexing.
More than anything, this highlights the fact that the power of the content
that humans create is only now being realized. And, of course, organizations
have a lot of human-generated content: some of it is on the Web, but most of
it resides inside the organization.
It is now clear that external entities can glean enough information from the
Internet to build a business, but the data held inside most organizations is,
thankfully, largely untapped by outsiders at present.
With a growing number of companies whose business is harvesting and
interpreting externally available data, it should be clear to any IT
professional that, as well as protecting internal data from prying external
eyes, they also need to harness its power to take advantage of new
opportunities.
This is where the idea of human-generated big data enters the frame. It is
defined as data sets generated by people (rather than machines) that grow so
large and change so quickly that their content, usage, and permissions become
difficult to capture, store, search and analyze.
The same kinds of big data technologies that crawl Web sites can also be
applied to human-generated content inside the organization: documents,
spreadsheets, emails, presentations, and audio and video files. This lets
corporations better manage and protect that content, analyze it, and benefit
from its “hidden” value. It is also extremely important, of course, to prevent
human-generated big data from being stolen by anyone, whether an insider or an
information-harvesting company.
This makes it all the more imperative that companies understand the need to
use analytics on their human-generated big data. Conventional data security
software rarely provides adequate coverage for this content, so IT
professionals need to take action to fully protect all of their digital data
assets.

