When I think about unstructured data, I see my colleague Rob Gerbrandt (an information governance genius) walking into a customer’s conference room where tubes of core samples line three walls. Each contains carefully extracted and preserved layers of planet Earth that differ in color and texture. While most of us would see dirt and rock, Rob sees unstructured data. His client gestures around the room and says, “This is mission-critical information. How can you help us with it?”
Unlike that energy company, many organizations have yet to feel an urgency to capitalize on the value of their vast reservoirs of unstructured data. After all, we in the information management and technology industry have talked at length about unstructured data since “Big Data” was big news more than a decade ago. What’s different now? Advances in AI, particularly generative AI, have made deriving value from unstructured data easier.
The evolution of AI and the use of structured and unstructured data
When discriminative AI rose to prominence in sectors such as banking, healthcare, retail, and manufacturing, it was primarily trained on and used to analyze, classify, or make predictions about structured data. Applications such as financial forecasting and customer relationship management brought tremendous benefits to early adopters, even though their capabilities were constrained by the structured nature of the data they processed. Structured data lacks the richness and depth that unstructured data (such as text, images, audio, and video) provides to enable more nuanced insights.
Since those early days, the ratio of structured to unstructured data has shifted as the internet, social media, digital cameras, smartphones, and digital communications have encouraged the creation of unstructured data. Over the past 20-odd years, unstructured data has grown in volume, making up 90% of the data created last year, according to IDC estimates. Yet IDC says that “master data and transactional data remain the highest percentages of data types processed for AI/ML solutions across geographies.” But that was before generative AI became a sensation in the form of ChatGPT. Generative AI thrives on unstructured data and, in a recent survey conducted by Vanson Bourne on behalf of Iron Mountain, 93% of IT and data decision-makers said their organization already uses generative AI.
Tapping into unstructured data reservoirs
While a growing volume of unstructured data exists in digital form (such as PDFs, JPEGs, and MP4s), much of it is still stored in physical, or analog, formats such as paper, tape, film, and microfiche. Digitizing relevant physical assets and objects, such as those core samples, IT equipment, and office equipment, and then applying and enriching metadata helps organizations take a big step toward innovating with generative AI.
For example, generative AI models are especially adept at making sense of diverse, unstructured datasets to create realistic content, enhance data for machine learning training, simulate and model complex scenarios and environments, and personalize algorithms for targeted marketing and product recommendations.
What’s hiding in your unstructured data?
While every enterprise has unique physical and digital assets, the following are examples of the power that may be hidden in your unstructured data. Generative AI uses:
- Natural language text, including customer reviews, support tickets, emails, and other documents, to create chatbots that generate automated responses, summarize large volumes of text, customize and personalize content, and assess risks associated with different contractual arrangements.
- Images and videos to analyze behaviors and create synthetic, realistic images and videos for training AI systems, which enhances privacy by avoiding the use of real imagery.
- Audio recordings to train AI models for speech recognition and sentiment analysis and to generate synthetic voices for virtual assistants and digital avatars.
- Social media content, such as tweets, posts, and other user-created data, to analyze trends and public sentiment and predict consumer behavior.
- Sensor and IoT data for predictive maintenance, supply chain optimization, and product design enhancement.
- User-generated content, such as blogs, forums, and customer feedback, to understand customer preferences, improve product recommendations, and tailor user experiences.
- Biometric data, such as fingerprints, facial recognition data, and DNA sequences, in sectors such as security and healthcare to train AI models for identification and diagnostic purposes.
While new applications of generative AI are continuously emerging, so are challenges related to unstructured data and generative models.
Balancing reward and risk
Once in the shadows, unstructured data is now pivotal in helping generative AI enable human creativity and problem-solving. Organizations are converting relevant paper documents, analog audio, and video tapes into digital formats while looking for advanced data cleaning, normalization, and enrichment tools to improve the quality of data fed into generative AI models. As enterprises collect and use more unstructured data, concerns about data privacy and the ethical use of AI are growing. Meanwhile, storing, managing, and processing large volumes of unstructured data present problems of scale and complexity, causing enterprise decision-makers to reconsider their asset management strategies.
Learn more about seizing AI opportunities while overcoming challenges with “AI in the Information-Rich Enterprise,” a paper and video podcast by Moor Insights and Strategy sponsored by Iron Mountain. These materials explore the evolution of AI, the challenges of “driving meaningful outcomes,” and the role of a unified asset strategy in helping organizations succeed with their AI initiatives.