The digital age has ushered in an era of unprecedented data growth. IDC forecasts that by 2025 the global datasphere will grow to 175 zettabytes, with unstructured data accounting for 80% of it. Unstructured data is data that doesn’t follow a predefined data model or isn’t organised in a specific manner, such as text, images, videos, or social media posts. But here’s the catch: a staggering 90% of this unstructured data remains unanalysed, largely because it is complex to extract and transform.
For many organisations, analysing such vast amounts of unstructured data has remained a Herculean challenge, often requiring a patchwork of tools and resources. However, Google Cloud’s recent advances in generative AI, including foundation models for text and vision, promise to change this by enabling data teams to tap into this latent potential.
In this blog post, we give an overview of what was announced at Google Cloud Next ’23 and share some insights from our team at CloudSmiths on what it means for the future of data analysis.
BigQuery’s Evolution
One of the most notable announcements is the integration of BigQuery with Vertex AI foundation models, which lets you analyse unstructured data directly within BigQuery. This approach offers several advantages:
- Eliminates the need to build and manage data pipelines between BigQuery and generative AI model APIs
- Streamlines governance and helps reduce the risk of data loss by avoiding data movement
- Reduces the need to write and manage custom Python code to call AI models
- Enables you to analyse data at petabyte scale without compromising on performance
- Can lower your total cost of ownership with a simplified architecture
The backbone of this integration is the BigQuery ML inference engine, which has seen strong year-on-year growth in query volume over the past two years, a sign of its growing adoption.
PaLM 2 and BigQuery: An Overview
Advanced text processing is now within reach thanks to the integration of the PaLM 2 model (text-bison) into BigQuery ML. With a few lines of SQL, data teams can perform sophisticated text-processing tasks such as summarisation and sentiment analysis. The process relies on the ML.GENERATE_TEXT function, which in turn calls the Vertex AI text-bison model from Model Garden.
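To make this flow more concrete, here is a minimal Python sketch that only builds the SQL text and does not call BigQuery itself. It assumes a remote model created over a BigQuery connection with the `CLOUD_AI_LARGE_LANGUAGE_MODEL_V1` service type, as described in the 2023 preview documentation, and relies on ML.GENERATE_TEXT reading its input from a column named `prompt`. Option names may have changed in later releases, so check the current documentation. The helper functions and every project, dataset, table, connection and column name are hypothetical.

```python
"""Build BigQuery SQL that calls ML.GENERATE_TEXT on a text column.

Sketch only: every project, dataset, table, connection and column name is a
hypothetical placeholder, and REMOTE_SERVICE_TYPE reflects the 2023 preview
syntax, which may differ from the current documentation.
"""


def quote_sql_string(value: str) -> str:
    """Return value as a single-quoted GoogleSQL string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def create_remote_model_sql(model: str, connection: str) -> str:
    """CREATE MODEL statement for a remote model backed by the text-bison LLM."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"  REMOTE WITH CONNECTION `{connection}`\n"
        "  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_LARGE_LANGUAGE_MODEL_V1');"
    )


def generate_text_sql(
    model: str,
    table: str,
    text_column: str,
    instruction: str,
    temperature: float = 0.2,
    max_output_tokens: int = 256,
) -> str:
    """SELECT statement that sends one prompt per table row to ML.GENERATE_TEXT."""
    # The input query must expose the text to send to the model as `prompt`;
    # here each row's text is prefixed with the same instruction.
    prompt_query = (
        f"SELECT CONCAT({quote_sql_string(instruction)}, {text_column}) AS prompt "
        f"FROM `{table}`"
    )
    # flatten_json_output = TRUE returns the generated text as a plain column
    # instead of the raw JSON response.
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        f"  ({prompt_query}),\n"
        f"  STRUCT({temperature} AS temperature,\n"
        f"         {max_output_tokens} AS max_output_tokens,\n"
        "         TRUE AS flatten_json_output));"
    )


if __name__ == "__main__":
    print(create_remote_model_sql("my_project.my_dataset.llm_model", "us.my_connection"))
    print(generate_text_sql(
        model="my_project.my_dataset.llm_model",
        table="my_project.my_dataset.reviews",
        text_column="review_text",
        instruction="Summarise this customer review in one sentence: ",
    ))
```

The generated strings could then be run in the BigQuery console or passed to a client such as `google.cloud.bigquery.Client().query(...)`. Keeping the instruction in a separate parameter makes it easy to reuse the same query shape for summarisation, sentiment analysis and other tasks.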
Unveiling Real-World Applications
Practical applications of integrating BigQuery with Vertex AI foundation models span numerous industries. For instance, ML.GENERATE_TEXT can simplify advanced data processing tasks such as:
- Content generation: Analyse customer feedback and generate personalised email content right inside BigQuery without the need for complex tools
- Summarisation: Summarise text stored in BigQuery columns such as online reviews or chat transcripts
- Data enhancement: Obtain a country name for a given city name
- Rephrasing: Correct spelling and grammar in textual content such as voice-to-text transcriptions
- Feature extraction: Extract key information or keywords from large volumes of text, such as online reviews and call transcripts
- Sentiment analysis: Understand human sentiment about specific subjects in a text
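To illustrate the list above, the sketch below pairs each use case with a hypothetical instruction prefix and builds the inner query that produces the `prompt` column ML.GENERATE_TEXT reads from. The task keys, prompt wording and table and column names are assumptions made for illustration, not templates published by Google. Real prompts usually need testing and tuning against your own data.

```python
"""Pair the ML.GENERATE_TEXT use cases with example prompt prefixes.

Illustrative only: the task keys, instruction wording, table and column
names are hypothetical and would need tuning against real data.
"""

# Hypothetical instruction prefixes, one per use case in the list above.
PROMPT_PREFIXES = {
    "content_generation": "Write a short, friendly email replying to this customer feedback: ",
    "summarisation": "Summarise the following text in two sentences: ",
    "data_enhancement": "Return only the name of the country this city is in: ",
    "rephrasing": "Correct the spelling and grammar of this transcript: ",
    "feature_extraction": "List the five most important keywords in this text, separated by commas: ",
    "sentiment_analysis": "Classify the sentiment of this text as positive, negative or neutral: ",
}


def prompt_query(task: str, table: str, text_column: str) -> str:
    """Build the inner SELECT that yields one `prompt` per row for a task."""
    if task not in PROMPT_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPT_PREFIXES)}")
    # Escape backslashes and single quotes so the prefix is a valid SQL string literal.
    prefix = PROMPT_PREFIXES[task].replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT CONCAT('{prefix}', {text_column}) AS prompt FROM `{table}`"


if __name__ == "__main__":
    print(prompt_query("sentiment_analysis", "my_project.my_dataset.reviews", "review_text"))
```

The resulting subquery would be passed as the second argument to ML.GENERATE_TEXT, after the `MODEL` reference. Changing the task key changes only the instruction, so one pipeline can serve several of the use cases listed.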
CloudSmiths Weighs In: Bridging the Gap Between Unstructured Data and Business Insights
Tom Fowler, CTO at CloudSmiths, shared his view on this development: “Harnessing the power of unstructured data has been one of the white whales of our industry, and Google’s integration of BigQuery with Vertex AI marks a significant step towards achieving this goal. The seamless synergy between these platforms can genuinely revolutionise how we understand and interpret data. It’s an exciting time for data enthusiasts like myself and businesses alike.”
Hardus Swanepoel, Head of Innovation at CloudSmiths, also shared his thoughts on the announcement. “Unstructured data has always been akin to an iceberg in the ocean of analytics, yet its true depth and potential remain hidden beneath the surface. Google Cloud’s latest advancements promise to change that narrative,” he said.
Emphasising the sheer volume of unstructured data that remains unused, he added, “Think of the countless insights lying dormant, just waiting to be unearthed. With Google’s recent strides in gen AI and BigQuery’s integrations, we’re closer than ever to turning this data into actionable business intelligence.”
Swanepoel was particularly enthusiastic about the possibilities this integration holds for real-world business applications. “The thought of seamlessly merging customer sentiment with first-party data, all within BigQuery, is nothing short of revolutionary. The potential use cases across industries are limitless. From tailoring marketing campaigns based on customer feedback to enhancing data quality in real-time, the sky’s the limit.”
He also touched upon the streamlined architecture that this integration promises. “The beauty of this advancement isn’t just in its capabilities but in its simplicity. Eliminating the need for intricate data pipelines and custom coding is a boon for data teams. It’s about making powerful analytics more accessible, more user-friendly, and ultimately more impactful.”
Concluding his thoughts, he looked towards the future: “As Head of Innovation at CloudSmiths, I’ve always championed technologies that bridge gaps and catalyse growth. Google’s move is a monumental step in that direction. It signals a shift in the industry, and I’m eager to see where we go from here. The horizon of data analytics has expanded, and we’re ready to explore what lies beyond.”
Final Thoughts
Google Cloud Next ’23 has reaffirmed Google’s commitment to innovation in data analysis, especially for unstructured data. Its ambitious strides in integrating BigQuery with Vertex AI foundation models are more than technical feats; they point to a future where data, regardless of format, becomes a comprehensible and actionable asset for businesses. The insights from experts at CloudSmiths underscore the transformative nature of these advancements.
While IDC’s prediction paints a picture of an overwhelming data landscape by 2025, technologies like these mean that we’re not just passive observers but active participants, capable of deriving meaning and insights from this data deluge. The future of data analytics, as presented by Google and perceived by us at CloudSmiths, is about not just volume but depth of understanding and breadth of application. The world of unstructured data may have remained a mystery for many years, but the tools to decipher it are now within our grasp. The voyage into this vast ocean of data has just begun, and the discoveries promise to be as transformative as they are numerous.
To learn more, visit the Google Cloud documentation page or try this tutorial to extract keywords from text.