Report highlights growing risks in managing confidential data

The amount of sensitive data that companies store in non-production environments, such as development, testing, analytics, and AI/ML, is growing, according to a new report. Executives are also increasingly concerned about protecting it—and injecting it into new AI products isn’t helping.

The Delphix 2024 State of Data Compliance and Security Report found that 74% of organizations processing sensitive data have increased the volume stored in non-production environments, also known as lower-level environments, over the past year. What’s more, 91% are concerned about the increased scope of exposure, which puts them at risk of breaches and compliance penalties.

The amount of consumer data that companies store is growing overall, driven by the growth in online consumer activity and companies’ ongoing digital transformation efforts. IDC predicts that the global datasphere will grow to 163 zettabytes by 2025, ten times the 16.1 zettabytes of data generated in 2016.

As a result, the amount of sensitive data being stored, such as personal data, protected health information and financial data, is also increasing.

Sensitive data is often created and stored in production, or live, environments such as CRM or ERP systems, which have strict controls and limited access. However, standard IT operations often copy that data into multiple non-production environments, giving far more employees access to it and increasing the risk of a breach.

The report’s findings come from a survey of 250 senior executives at organizations with at least 5,000 employees that handle sensitive consumer data, conducted by software provider Perforce.

SEE: National Public Data Breach: 2.7 Billion Records Leaked on Dark Web

More than half of companies have already experienced a data breach

More than half of respondents said they had experienced a breach of confidential data stored in non-production environments.

Other evidence confirms that the problem is worsening: a study commissioned by Apple found that data breaches increased by 20% between 2022 and 2023. Indeed, 61% of Americans have learned that their personal data has been breached or compromised at some point.

The Perforce report found that 42% of respondent organizations have experienced ransomware attacks. This type of malware is a growing threat worldwide; a Malwarebytes study released this month found that global ransomware attacks increased by 33% over the past year.

Part of the problem is that global supply chains are becoming longer and more complex, increasing the number of potential entry points for attackers. A report by the Identity Theft Resource Center found that the number of organisations affected by supply chain attacks increased by more than 2,600 percentage points between 2018 and 2023. Ransomware payouts also exceeded $1 billion (£790 million) for the first time in 2023, making it an increasingly lucrative tactic for attackers.

The biggest culprit when it comes to consumer data insecurity is artificial intelligence

As companies now implement AI into business processes, it becomes increasingly difficult to control what data goes where.

AI systems often require the use of sensitive consumer data for training and maintenance, and the complexity of the algorithms and potential integration with external systems can create new attack vectors that are difficult to manage. In fact, the report found that AI and ML are the leading causes of the growth of sensitive data in non-production environments, as cited by 60% of respondents.

“AI environments may be less controlled and protected than production environments,” the report’s authors wrote. “As a result, they are more easily compromised.”

Business decision-makers are aware of this risk: 85% report concerns about non-compliance with regulations in AI environments. While much AI-specific regulation is in its infancy, GDPR requires that personal data used in AI systems be processed lawfully and transparently, and in the US, state laws vary.

SEE: Artificial Intelligence Executive Order: White House Releases 90-Day Progress Report

The EU’s Artificial Intelligence Act came into force in August, setting strict rules on the use of AI for facial recognition and imposing safeguards on general-purpose AI systems. Noncompliant companies face fines ranging from €7.5 million ($8.1 million) or 1.5% of global turnover up to €35 million ($38 million) or 7% of turnover, depending on the violation and the size of the company. Similar AI laws are expected in other regions in the near future.

Other concerns about sensitive data in AI environments, cited by over 80% of Perforce survey respondents, include the use of low-quality data as input to AI models, re-identification of personally identifiable information, and theft of model training data that may include intellectual property and trade secrets.

Businesses are concerned about the financial costs associated with lack of data security

Another major reason why big companies are so concerned about insecure data is the prospect of hefty fines for non-compliance. Consumer data is widely covered by expanding regulations like GDPR and HIPAA, which can be confusing and change frequently.

Many regulations, such as GDPR, calculate fines based on annual turnover, so larger businesses face larger penalties. The Perforce report found that 43% of respondents have already had to pay fines or remediate non-compliance, while 52% have experienced audit issues and failures related to non-production data.

But the cost of a data breach can exceed the fine, since some of the lost revenue comes from halted operations. A recent Splunk report found that the biggest cause of downtime incidents was cybersecurity-related human error, such as clicking on a phishing link.

Unplanned downtime costs the world’s largest companies $400 billion annually, including direct revenue loss, reduced shareholder value, stagnant productivity, and reputational damage. Ransomware damage costs are expected to exceed $265 billion by 2031.

According to IBM, the average cost of a data breach in 2024 is $4.88 million, up 10% from 2023. The tech giant’s report added that 40% of breaches involved data stored in multiple environments, such as the public cloud and on-premises, with an average cost of more than $5 million and the longest time to identify and contain. This shows that business leaders are right to worry about data sprawl.

SEE: Nearly 10 Billion Passwords Leaked in Biggest Compilation of All Time

Taking steps to secure data in non-production environments can be resource-intensive

There are ways to secure data stored in non-production environments, such as masking sensitive data. However, the Perforce report found that companies are reluctant to do so for several reasons: respondents find it difficult and time-consuming, and it can slow down the organization.

  • Nearly a third of respondents fear masking could slow software development, because safely replicating production databases into non-production environments can take weeks.
  • 36% believe that masked data may be unrealistic and therefore affect software quality.
  • 38% believe security protocols can make it difficult for a company to track and comply with regulations.

The report also found that 86% of organizations allow data compliance exceptions in non-production environments to avoid the hassle of storing data securely. These include using a limited set of data, data minimization, or obtaining consent from the data subject.

Recommendations for securing sensitive data in non-production environments

The Perforce team outlined four key ways organizations can secure their sensitive data in non-production environments:

  1. Static data masking: Permanently replaces sensitive values with fictitious but realistic counterparts.
  2. Data loss prevention (DLP): A perimeter-security approach that detects and attempts to prevent potential data breaches and theft.
  3. Data encryption: Temporarily converts data into code, allowing only authorized users to access it.
  4. Strict access control: Policies that categorize users by role and other attributes, and configure each user’s access to sets of data based on those categories.
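To make the first technique concrete, here is a minimal sketch of static data masking in Python. It is an illustration under simplifying assumptions (a single in-memory record with hypothetical `name` and `card` fields), not the method any specific masking tool uses; production tools operate on full database copies. Deterministic hashing keeps the same real value mapped to the same fictitious one across tables, which preserves referential integrity while the masked copy never contains the original data.

```python
import hashlib

# Hypothetical pool of realistic replacement names.
FAKE_NAMES = ["Alex Reed", "Sam Blake", "Jordan Lane", "Taylor Quinn", "Morgan Wells"]

def mask_name(real_name: str) -> str:
    """Deterministically replace a real name with a fictitious one.

    Hashing the input means the same real name always maps to the same
    fake name (consistent across copies), but the real value cannot be
    read back out of the masked data set.
    """
    digest = hashlib.sha256(real_name.encode("utf-8")).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]

def mask_record(record: dict) -> dict:
    """Return a copy of a record that is safe for non-production use."""
    masked = dict(record)
    masked["name"] = mask_name(record["name"])
    # Preserve the field's format but destroy the real value:
    # keep only the last four digits of the card number.
    masked["card"] = "**** **** **** " + record["card"][-4:]
    return masked

prod_row = {"name": "Jane Smith", "card": "4111 1111 1111 1234"}
print(mask_record(prod_row))
```

Because the replacement is permanent in the masked copy, development and test teams can work with realistic-looking data without ever holding the sensitive originals.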

The authors wrote: “Protecting sensitive data in general is not easy. AI/ML increases this complexity.

“Tools that specialize in protecting sensitive data in other non-production environments—such as development, test, and analytics—are well-positioned to help protect your AI environment.”