Data Centric AI? Yes, but…
2024-11-06 17:00:08 Author: hackernoon.com

I used to build and sell data warehousing solutions. Like many in the field, I’ve had my share of successful and failed implementations. Deep inside I still occasionally wish for the utopia of multi-tier 3NF highly-integrated data architectures. If you happen to have one in your company, good for you. But most companies are not you. Many companies chased this utopia. Some chase it to this day.

With Generative AI at the peak of the hype cycle as of this writing, there’s an interesting re-emergence of the data warehouse gospel, from luminaries such as Andrew Ng. There’s an insistence that companies must first build a data warehouse before venturing into AI. For me this is a dangerous déjà vu. While the importance of clean, structured data can’t be overstated, many organizations have struggled for years to build data warehouses—not just due to cost but because their data is simply not ready to be warehoused. For these companies, forcing data into a rigid warehouse structure before pursuing AI is an exercise in futility.

Let’s take a contrarian view.

Andrew Ng, a strong advocate for data-centric AI, contends that quality data is paramount and that careful preparation of data can significantly reduce the volume of data required for successful AI. While it’s true that high-quality, “smart-sized” data can make AI more effective, the practical limitations many businesses face with data readiness are immense. AI’s strength is its ability to work with less-than-ideal data, providing insights without the prerequisite of extensive, curated datasets. Ng’s argument overlooks the fact that many organizations lack the time, resources, or data structure to meet the high bar of pristine data, especially at the onset of AI integration.

Here’s the reality: Generative AI is already useful even without a database or data warehouse. Today’s AI models are highly adaptive, capable of working with unstructured or semi-structured data. AI can sift through messy data and still find patterns that deliver actionable insights, making it a game-changer for businesses, large and small, that want to start benefiting from AI without waiting for the perfect data infrastructure to be in place.

Ng’s perspective is valuable but assumes a near-ideal data environment that is out of reach for many, especially smaller businesses or those without established data infrastructure. AI tools today offer flexibility, allowing companies to generate value from their existing, often imperfect, data rather than waiting for perfectly “consistent” datasets, as Ng suggests. By bypassing the strictures of curated data at the beginning, AI can provide significant returns on investment in a fraction of the time it takes to assemble a “perfect” dataset.

Let’s also talk about data warehousing itself. For years, it was heralded as the ultimate solution for business intelligence and analytics. Yet many companies that poured resources into building warehouses never reaped the promised rewards. They struggled with integration, maintenance, and scaling issues, leaving many wondering if the hype around data warehousing ever really delivered. In many cases, it didn’t. It was part of an old technology cycle that ran its course, and now, as AI takes center stage, the relevance of data warehousing is increasingly being questioned.

But let’s be clear: I am not arguing against data warehousing, only against the rigid sequence we keep hearing, in which the warehouse must come before any AI. Companies can use Generative AI now to achieve some business value, and later they can certainly use it to help with data warehousing efforts.

In fact, AI can actually help prepare data for warehousing. With machine learning models designed to automate data cleaning, transformation, and categorization, AI can assist in bringing order to chaotic data sources. In this sense, AI could come first and act as the very tool that lays the groundwork for a more structured data ecosystem later on. Instead of waiting for a flawless data warehouse, companies can use AI to build towards it—on their own terms and timelines.
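
To make the cleaning, transformation, and categorization idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the field names (`customer`, `amount`, `note`), the `CleanRecord` shape, and the `keyword_categorizer` function are all hypothetical. The categorizer is a trivial keyword lookup standing in for whatever model a company might actually plug in at that point.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CleanRecord:
    """Hypothetical target shape for a row that is ready to be warehoused."""
    customer: str
    amount: Optional[float]
    category: str


def keyword_categorizer(text: str) -> str:
    """Stand-in for a model call (assumption): a trivial keyword lookup."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund"
    if "invoice" in lowered or "payment" in lowered:
        return "billing"
    return "other"


def parse_amount(raw: str) -> Optional[float]:
    """Pull the first number out of a messy string such as 'PHP 1,250.50'."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def clean_rows(
    rows: List[Dict[str, str]],
    categorize: Callable[[str], str] = keyword_categorizer,
) -> List[CleanRecord]:
    """Normalize names, parse amounts, and tag each row with a category."""
    cleaned = []
    for row in rows:
        # Collapse stray whitespace and normalize capitalization.
        customer = " ".join(row.get("customer", "").split()).title()
        if not customer:
            continue  # Skip rows with no usable customer name.
        cleaned.append(
            CleanRecord(
                customer=customer,
                amount=parse_amount(row.get("amount", "")),
                category=categorize(row.get("note", "")),
            )
        )
    return cleaned


messy = [
    {"customer": "  acme   corp ", "amount": "PHP 1,250.50", "note": "Invoice overdue"},
    {"customer": "", "amount": "10", "note": "no name on file"},
    {"customer": "bob", "amount": "n/a", "note": "wants REFUND"},
]
records = clean_rows(messy)
```

Because `categorize` is just a function argument, the keyword lookup could later be swapped for a real model without touching the rest of the pipeline, which is the "AI first, structure later" sequence described above.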

AI is opening up industries by providing accessible, intuitive tools for a broad spectrum of users, whereas data warehousing often serves the narrower interests of IT departments. The old-school data engineers and architects who championed warehousing are still valuable, but they’re increasingly at risk of being left behind in the AI age unless they adapt. The promise of a centralized data warehouse serving the entire enterprise’s needs has been largely confined to niche applications, while AI is democratizing data access and insights across functions, from marketing and operations to HR and finance.

Generative AI is a force multiplier for organizations looking to unlock value from their data immediately. The tools are already available, and they don’t require years of building a perfect data warehouse. We need to stop framing data warehousing as a prerequisite for AI when, in many cases, it’s AI that can prepare organizations for effective data management down the road.

It’s time to flip the narrative. I think AI can, and now often should, come first—not just because it delivers immediate results but because it helps solve many of the very problems data warehousing was supposed to address but never fully did. So let’s move past this outdated mindset. AI’s flexibility, accessibility, and transformative power are here today, ready to be harnessed. Why wait for the cumbersome, expensive promise of data warehousing when AI can already deliver on its potential?

About Me: 25+ year IT veteran combining data, AI, risk management, strategy, and education. 4x global hackathon winner and advocate for social impact from data. Currently working to jumpstart the AI workforce in the Philippines. Learn more about me here: https://docligot.com


Source: https://hackernoon.com/data-centric-ai-yes-but?source=rss