Securing Google AI Data: Why Unstructured Data Poses a Risk and What to Do About It

08/14/2025 · 7 min read

Attending Google Cloud Next this year cemented one thing for me: the AI era is here, and it continues to transform how we work. As AI adoption accelerates – especially with Google’s AI tools such as Gemini, Agentspace, and NotebookLM – businesses face a new reality: none of these work without secure data.

Unstructured data, such as meeting notes, customer support tickets, and team chats, is both the fuel for AI and a fault line for risk. As a source of rich insights, it’s essential, but without the right controls, misconfigured permissions can lead to accidental exposure. This is the problem our customers face across their multi-cloud data, and it’s why we’re helping enterprises secure their data for confident, scalable AI adoption.

The Real Risk Lurking in Your Data

Traditionally, data security efforts have focused on tools that utilize structured data such as customer relationship management platforms (CRMs), databases, and spreadsheets. That made sense when most business logic was handled in apps with rigid schemas.

But AI changes the game. Gartner projects that by 2026, 75% of organizations with generative AI (GenAI) initiatives will focus their efforts and spending on unstructured data security initiatives. That’s a massive shift, and one that many teams are unprepared for.

To give context, unstructured data presents a dual challenge:

  • More valuable to AI: Large Language Models (LLMs) train on documents, summaries, and creative content, making this data the fuel for AI-driven insights.
  • More vulnerable to exposure: These files are casually shared, stored in forgotten folders, or mislabeled, and are rarely governed with the same rigor as structured datasets.

We’ve seen a common scenario play out across the industry: An employee used Google Gemini to draft a strategy document. Google AI, doing its job, surfaced relevant files, including a confidential budget sheet stored in the same shared drive. The issue is that the sheet was supposed to be restricted, but outdated permissions from years ago had left the file accessible to a wider group than it should have been. This kind of accidental overexposure happens more often than you’d think. It’s a sobering reminder that even well-intentioned AI use can lead to serious data risks if access controls aren’t properly managed. It’s not an AI problem; it’s a data problem. AI just exposes the existing vulnerabilities.

Google Workspace Has Strong Native Security, but Organizations Need More

Shane Donahue, one of our Senior Solutions Engineers, gave a great walkthrough of the security tools built into Google Workspace during our recent webinar. These tools provide a solid foundation for securing your data:

  • Admin controls for Gemini access: Admins can easily toggle Gemini features on or off for users, groups, or entire Organizational Units (OUs).
  • External sharing settings: These controls allow admins to define who can share files outside the organization and with which domains.
  • Classification labels: Also known as sensitivity labels, these can be applied to documents (e.g., low, medium, high) to support governance workflows and alert users to the sensitivity of data.
  • DLP rules and real-time enforcement: Google’s Data Loss Prevention (DLP) engine can detect sensitive data and block external sharing almost instantly, acting as a critical last line of defense.
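To make the DLP idea concrete, here is a minimal Python sketch of pattern-based detection gating a share action. The detector names and regexes are my own simplified assumptions for illustration; Google’s actual DLP engine uses far more sophisticated matchers and predefined detectors.

```python
import re

# Hypothetical detectors, loosely modeled on common DLP patterns.
# Real DLP engines use validation logic well beyond simple regexes.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def scan_for_sensitive_data(text: str) -> list[str]:
    """Return the names of detectors that matched the text."""
    return [name for name, pattern in DETECTORS.items() if pattern.search(text)]

def allow_external_share(text: str) -> bool:
    """Act as a last line of defense: block sharing when any detector fires."""
    return not scan_for_sensitive_data(text)
```

Even this toy version shows why real-time enforcement matters: the check runs at the moment of sharing, not after the fact.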

These tools are powerful, but as Shane pointed out, organizations need more visibility and control if they’re going to prevent data issues from occurring:

  • Audit logs only go back 180 days. This means older files that haven’t been touched recently can’t be remediated using the native investigation tool. A file with open permissions from two years ago is completely invisible to this process.
  • Internal sharing isn’t directly regulated. The controls are heavily focused on preventing external exposure, leaving a major blind spot for internal oversharing.
  • Access toggles are broad. As Shane mentioned, the Gemini app control turns Gemini on or off entirely, regardless of license. There’s no middle ground, and there’s no way to allow Gemini to access only specific folders or shared drives – it’s all or nothing.

I wholeheartedly agree with Shane that these are hammer controls, not scalpel controls. They’re designed to be broad and impactful, but they lack the granularity and long-term visibility needed for truly proactive data governance. This confirms that AI isn’t the root cause of these issues; it’s simply a spotlight on the underlying data problems that have been there all along.

Data Security Better Together: Proactive, Granular Risk Management and Governance

As we discussed these built-in Google tools, it became clear that while they offer a strong foundation, a complementary solution is required when it comes to internal visibility and long-term governance. The AvePoint Confidence Platform builds on Google’s native capabilities to give organizations the deeper visibility and control they need to anticipate risk before it becomes a problem you must investigate. Some of the core features are as follows:

  • Real-time scanning: We continuously scan the current state of your environment, so nothing gets missed due to inactivity or age. We can find those risky files from five years ago just as easily as we can find a new one.
  • A risk matrix that blends sensitivity and exposure: We don’t just flag sensitive files; we assess how widely they’re shared to help you prioritize what needs attention first. A confidential file shared with 50 people is a much higher risk than one shared with two.
  • Bulk remediation and lifecycle controls: Admins can act across multiple files at once, from revoking access to setting expiration dates, and even managing retention policies. This allows teams to clean up legacy data at scale and set new policies to prevent future issues.
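The risk-matrix idea above can be sketched in a few lines of Python. The labels, weights, and scoring formula here are illustrative assumptions, not the platform’s actual model; the point is simply that sensitivity and exposure multiply, so a confidential file shared with 50 people ranks far above one shared with two.

```python
from dataclasses import dataclass

# Assumed sensitivity labels, matching the low/medium/high classification
# labels mentioned earlier; weights are illustrative.
SENSITIVITY_WEIGHT = {"low": 1, "medium": 2, "high": 3}

@dataclass
class FileRecord:
    name: str
    sensitivity: str   # "low" | "medium" | "high"
    shared_with: int   # number of users with access
    anyone_link: bool  # "share with anyone" link enabled

def risk_score(f: FileRecord) -> int:
    """Blend sensitivity and exposure into one score for triage ordering."""
    exposure = f.shared_with + (100 if f.anyone_link else 0)
    return SENSITIVITY_WEIGHT[f.sensitivity] * exposure

files = [
    FileRecord("budget.xlsx", "high", 50, False),
    FileRecord("notes.docx", "high", 2, False),
]
# Highest-risk files first, so admins know what to remediate today.
ranked = sorted(files, key=risk_score, reverse=True)
```

Sorting by a blended score like this is what turns a flat list of "sensitive files" into a prioritized remediation queue.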

This approach helps organizations achieve two critical goals:

  1. Clean up legacy data. Files that haven’t been touched in years should be surfaced, assessed, and remediated, so old permissions don’t become new liabilities.
  2. Prevent breaches before they happen. Identify risky configurations early so teams can fix issues proactively before sensitive data ever gets exposed through an AI tool or an unauthorized user.

Agentic AI and the Future of Data Governance

While the AvePoint Confidence Platform helps organizations take control of their data today, we’re also closely monitoring where AI is headed. The future is agentic AI, where AI assistants become more like colleagues than tools. They won’t just automate tasks; they will redefine business processes by making decisions, coordinating workflows, and reshaping how teams operate.

Google’s Agentspace aims to unify siloed data across multi-cloud environments. As organizations expand across platforms, data fragmentation is re-emerging. Agentspace is Google’s response to that challenge: a platform designed to bring scattered data together. AvePoint supports this vision by enabling unified governance across these multi-cloud environments, ensuring that data remains secure and compliant as it becomes more integrated.

Getting Ahead of Risk: How to Secure Google AI Data

As we wrapped up the session, I thought about the teams just beginning to align their AI goals with data governance. If that sounds like you, here are some security strategies for Google AI in enterprise workflows I’d suggest based on what we’ve seen work:

  1. Get visibility first. Before you can secure anything, you need to know what’s out there. Start by surfacing all of your unstructured data, especially older files that may still be accessible but haven’t been touched in years.
  2. Don’t just look at sensitivity; look at exposure. A sensitive document shared with two relevant people might be fine. But a highly sensitive one with a “share with anyone” link? That’s a bigger risk. It’s the combination of sensitivity and exposure that tells the real story.
  3. Build policies that can grow with your AI use. AI is only going to become more embedded in how we work. Set up lifecycle policies now – including classification, retention, and access expiration – so your data stays clean and ready for whatever’s next.
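Step 1 above, getting visibility into stale but still-accessible files, can be sketched as a simple filter over a file inventory. The hardcoded rows and field names here are hypothetical; in practice this metadata would come from a Drive audit export or a scanning tool.

```python
from datetime import datetime, timedelta

# Hypothetical inventory rows standing in for real scan output.
inventory = [
    {"name": "2019-budget.xlsx", "last_modified": datetime(2019, 6, 1),
     "sensitivity": "high", "anyone_link": True},
    {"name": "team-notes.docx", "last_modified": datetime(2025, 5, 1),
     "sensitivity": "low", "anyone_link": False},
]

def stale_and_exposed(rows, now, max_age_days=730):
    """Surface files untouched for max_age_days that are still broadly shared.

    This is exactly the blind spot a 180-day audit log cannot cover:
    old files with open permissions that nobody has looked at in years.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [r["name"] for r in rows
            if r["last_modified"] < cutoff and r["anyone_link"]]

flagged = stale_and_exposed(inventory, now=datetime(2025, 8, 14))
```

A report like `flagged` is the starting point for step 2 (weighing exposure) and step 3 (setting expiration and retention policies on what you find).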

The need for clean, secure, and well-governed data remains constant amid the AI-driven advancement we’re experiencing today. Whether you’re working with Gemini today or preparing for agentic AI tomorrow, securing unstructured data is essential for safe, effective AI adoption. AI is only as good as the data behind it. Let’s make sure that data is secure, compliant, and ready to drive results.

Access the full recording on demand


Ava Ragonese

Ava Ragonese is a Product Marketing Manager at AvePoint, leading the GTM of data security solutions for Google Workspace and Cloud. She helps organizations understand how quality data and insights drive innovation, and how multi-cloud collaboration can impact businesses. Ava has an M.Eng. in Systems Analytics from Stevens Institute of Technology and enjoys bringing her technical acumen to complex business decisions such as AI adoption.