What is Dark Data?

Dark data is information that an organization collects but does not use, so it usually goes unanalyzed. Gartner’s IT Glossary defines the term as “information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”

Organizations often gather large volumes of data. Much of it, though related to the organization’s business goals, is an extraneous by-product of data generation or an outdated version of raw data, such as profiles of ex-employees, web blogs, email correspondence and old financial records. Dark data is also known as “dusky data.” Organizations generally store all of their unstructured data in repositories, even if they have no plans for it. As a result, according to IDC estimates, 90% of unstructured data is never analyzed. Dark data must still be stored properly, because it may contain sensitive information whose exposure can lead to data breaches and regulatory compliance issues that harm the organization.

Dark data is often just untapped data that could be valuable but goes unused because of a lack of resources, a shortage of skilled analysts and the sheer volume of dark data that exists. With big data and AI tools, machine learning and data mining techniques, organizations can now extract valuable insights from dark data and turn it into optimized data. Software such as RPA (Robotic Process Automation) automates and streamlines operations. Dark data needs to be tended to regularly and kept organized within the repository.

Data analytics is traditionally linked to structured data, whereas dark data analytics is the process of unearthing untapped data to find hidden opportunities.

What is Dark Data Analytics?

Dark data analytics sifts through three categories of information: traditional unstructured data that organizations have already stored (e.g. emails, documents); non-traditional unstructured data, usually media assets that cannot be processed through big data methods; and huge volumes of data found in the deep web, which is curated from a variety of sources (government agencies, third-party domains, etc.).

Organizations should monitor dark data for the following factors:

  • Future leverage. Historic data such as customer records or trends over time can provide insight for creating business plans with better targets and strategies.
  • Lost opportunities. If organizations take the time and make the effort to wade through dark data, they will find overlooked materials that can be used to build the business. The International Data Corporation (IDC) predicted that organizations that can analyze all relevant data and deliver actionable information could achieve an extra $430 billion in productivity gains over their peers by 2020.
  • Increased revenue. Unused data is wasted money. Dark data contains a wealth of previously unknown information that can be used to derive profits, decrease costs and increase ROI.
  • Minimized risks. Even if an organization’s dark data is unusable, it must be stored safely to avoid compliance and regulatory issues.
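
As a rough illustration of what monitoring a repository for dark data might look like, the Python sketch below flags files that have not been modified within a given period. The function name find_stale_files, the one-year default threshold and the use of last-modified time as a stand-in for “unused” are all assumptions made for this example. A real audit would also weigh access logs, ownership and the sensitivity of the content.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, age_in_days) pairs for files under `root` whose
    last-modified time is older than `max_age_days`.

    Hypothetical helper: "not modified recently" is only a proxy for
    "dark"; it says nothing about whether the data is valuable or sensitive.
    """
    now = time.time() if now is None else now
    cutoff_seconds = max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories and other non-file entries
        age_seconds = now - path.stat().st_mtime
        if age_seconds > cutoff_seconds:
            stale.append((str(path), int(age_seconds // SECONDS_PER_DAY)))
    # List the oldest files first so they can be reviewed first.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


if __name__ == "__main__":
    # Example: report candidate dark data under the current directory.
    for file_path, age_days in find_stale_files(".", max_age_days=365):
        print(f"{age_days:>6} days  {file_path}")
```

The resulting list is only a starting point for review: each flagged file still needs a decision on whether to analyze it, archive it securely or delete it.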