Using Defensible Deletion and Auto-classification to Control the Data Explosion

By: Benjamin Kennedy, Director, Epiq

In today’s technology-driven climate, the word “data” is used everywhere. It has become a buzzword of sorts, and it means many things to many people. As a society, we create, use, and store vast amounts of data. Like individuals, organisations create an enormous amount of data every day. There are emails, text messages, instant messages, Word documents, social media posts, and IP addresses, just to name a few.

The increased prevalence of mandatory breach notifications, the General Data Protection Regulation (GDPR) and other data privacy regulations, and increased regulatory and legal dispute activity are forcing organisations to have more control over their data, which in turn has led to increased focus and investment in information governance. Information governance is about understanding the data created within your organisation and where it is located. Knowledge of the data lifecycle within an organisation enables defensible data disposition, an increasingly important function for companies of all shapes and sizes to manage their data.

Defensible Data Disposition

Most organisations are drowning in the large volume of data they create. With data growth at an all-time high, and no signs of it slowing down, dealing with massive amounts of data inside your organisation can be overwhelming. Not only does this generate higher storage costs, but keeping unnecessary data can have other severe consequences. For example, data that you could have defensibly deleted may later turn into a smoking gun against you in a case. Another risk is that if your organisation is ever the victim of a breach, more data than necessary will be exposed, which could lead to fines and reputational harm.

Keeping unnecessary data is both risky and expensive. Deleting data that has no regulatory, business, or legal purpose, in accordance with a reasonable and enforced retention policy and schedule and with any duties to preserve documents, has a number of benefits:

  1. Confidentiality: Deleting data helps maintain and protect proprietary information and trade secrets, and protects against data breaches and identity theft.
  2. Accessibility: An organisation can improve its ability to access preserved records more efficiently and effectively, thus making retained records more usable and valuable over time.
  3. Cost Savings: Regular disposal of unnecessary records and unstructured data, in accordance with a reasonable and enforced retention policy and schedule, and in accordance with any extant litigation holds, will substantially reduce the storage and related costs that result from unnecessary records and data retention. In addition, data reduction leads to more efficient systemic operations by permitting systems and backups to run faster, and also reduces the amount of time and cost required to search for and review records and data, e.g. in response to litigation and investigatory requirements.

Email, shared drives, and user shares get clogged with redundant content, personal multimedia files, and aged data. Determining what information has ongoing business value and what can be deleted is a complex process. Because removing data without value is so difficult, most organisations end up hoarding this content and expanding their storage capacity year after year, eating away at restricted IT budgets.

Organisations can often recoup significant storage capacity by following a few simple policies. The following checks can quickly surface large volumes of unnecessary data:

Redundant Content: Find duplicate files based on a hash value of the document content, then keep one version and eliminate the duplicate copies. A hash value is like a digital fingerprint: two documents are considered duplicates only when they are byte-for-byte identical.
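As a rough illustration of the hash comparison described above, the sketch below groups files by a SHA-256 digest of their content. The choice of SHA-256 and the function names `file_sha256` and `find_duplicates` are assumptions made for this example; the approach itself does not depend on any particular tool or algorithm.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash.

    Returns only the groups that contain more than one file, i.e. the
    sets of byte-for-byte identical copies.
    """
    by_hash = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_hash[file_sha256(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Each group returned is a candidate set from which one copy would be kept. The sketch only reports duplicates and deliberately deletes nothing, so that any removal can still be checked against retention and preservation obligations.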

Aged Data: Use the last accessed and last modified dates in file metadata to find data that has not been touched in more than a specified number of years. The exact time period will vary with each business unit's jurisdiction and the associated regulations. Outside counsel may assist with navigating the legal obligations to keep data. Searches for aged data that is not subject to preservation obligations and does not contain intellectual property will isolate documents that are candidates for deletion. If review confirms that this set of documents has aged and no longer has value, it can be purged.
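A minimal sketch of such a metadata search is shown below. It assumes the data sits on a file system that exposes last accessed and last modified timestamps; the function name `find_aged_files` and the age threshold are illustrative. Some systems are configured not to update access times, so results should be treated as candidates for review rather than a final list.

```python
import os
import time

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # approximation, ignores leap years


def find_aged_files(root, max_age_years, now=None):
    """List files under `root` not accessed or modified within `max_age_years`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_years * SECONDS_PER_YEAR
    aged = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            # Use the more recent of the two timestamps, so a file counts
            # as aged only if it has been neither read nor changed.
            if max(info.st_atime, info.st_mtime) < cutoff:
                aged.append(path)
    return sorted(aged)
```

The list would still need to be filtered for preservation obligations and intellectual property before anything is purged.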

Abandoned Content: IT systems may store information from users who are no longer with the organisation. Electronic documents typically record the original author and the last author of the content, and this authorship information can help identify documents that are candidates for deletion.
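The idea can be sketched as a comparison between document authorship metadata and a list of current users. The `DocumentRecord` structure and `find_abandoned` function below are hypothetical; in practice the authorship fields would be extracted from the documents by whatever indexing tool is in use, and the list of active users would come from a directory service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical metadata record for one stored document."""
    path: str
    original_author: str
    last_author: str


def find_abandoned(records, active_users):
    """Return records whose original and last authors have both left."""
    active = {user.lower() for user in active_users}
    return [
        record
        for record in records
        if record.original_author.lower() not in active
        and record.last_author.lower() not in active
    ]
```

Requiring both authors to have left is a conservative choice for this sketch: a document last edited by a current employee is treated as still in use.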

Multimedia: Music and video files require much more storage than written content such as email and business documentation. Searching for large multimedia files can identify users who store non-work-related content that has no purpose on the network.
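A search of this kind might look like the sketch below. The extension list and the function name `find_large_media` are assumptions for illustration, and the size threshold would be chosen to suit the environment.

```python
import os

# Illustrative list only; extend it to match the formats seen on your network.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".mp4", ".mov", ".avi", ".mkv"}


def find_large_media(root, min_size_bytes):
    """Return (path, size) pairs for media files at or above the size threshold.

    Results are sorted with the largest files first.
    """
    hits = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            extension = os.path.splitext(name)[1].lower()
            if extension not in MEDIA_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            if size >= min_size_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Matching on file extension is a simplification. Legitimate business recordings, such as training videos, would also be caught and need to be excluded on review.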

Classification

After deletion, the next step in controlling your data is to organise what remains. Organising data properly requires an understanding of what each document is and why it was created, a process known as classification. This can require the data owner to manually tag each document with its purpose and determine whether it should be kept or deleted. Manual classification is a challenge because it is extremely labour intensive. However, today there are tools that can auto-classify data. Auto-classification is a process that eliminates the need for humans to create or edit individual document metadata. The manual process is replaced with an automated solution, which analyses the text of the document and applies rules to systematise the classification process.
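To make the idea of rule-based classification concrete, the sketch below applies a small set of text patterns to a document and returns the matching labels. The labels and patterns are invented for this example, and commercial auto-classification tools use far richer rules and models than simple keyword matching.

```python
import re

# Hypothetical rules: each label is paired with a pattern that triggers it.
RULES = [
    ("privacy", re.compile(r"\b(passport|date of birth|tax file number)\b", re.IGNORECASE)),
    ("legal", re.compile(r"\b(privileged|litigation|subpoena)\b", re.IGNORECASE)),
    ("finance", re.compile(r"\b(invoice|purchase order)\b", re.IGNORECASE)),
]


def classify(text):
    """Return every label whose pattern appears in the text.

    Documents that match no rule are labelled "unclassified" so that
    they can be routed to manual review.
    """
    labels = [label for label, pattern in RULES if pattern.search(text)]
    return labels or ["unclassified"]
```

For example, `classify("Please find the invoice attached")` returns `["finance"]`, while a message that matches none of the patterns returns `["unclassified"]`.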

Auto-classification can accomplish the following:

  • Migration and Remediation: Auto-classification can move large amounts of data from one location to another for either archival or disposal. Rules can be set up to determine whether a document meets a business or legal purpose. If it does, it is moved to an appropriate archive; if it does not, it can be automatically deleted.
  • Compliance, Regulatory, and Records Management: Auto-classification can flag data that contains specific topics, patterns, or phrases in order to ensure compliance with internal policies and procedures.
  • Security and Privacy: Auto-classification can identify documents that are sensitive in nature from a business perspective, such as company intellectual property, and ensure these documents are located in a secure storage area out of reach of unauthorised employees and the public. If sensitive data is found in unsecured locations, classification aids in identifying it and moving it to avoid any undue risk.
  • Litigation and Investigation Readiness: Auto-classification allows organisations to categorise data for any potential litigation and to identify potentially relevant documents based on patterns of behaviour that may become relevant during internal or external investigations.

Once a document is properly classified, a business decision can be made to either archive the document or have it deleted.
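The archive-or-delete decision can be sketched as a simple check against a retention schedule. Everything specific here is an assumption for illustration: the retention periods are invented and are not legal guidance, and the extra outcomes "preserve" (for documents under a litigation hold) and "review" (for labels with no schedule entry) are added so that the sketch never deletes by default.

```python
from datetime import date

# Hypothetical retention schedule in years; real periods depend on
# jurisdiction and regulation and should be set with legal advice.
RETENTION_YEARS = {"finance": 7, "legal": 10, "unclassified": 2}


def add_years(start, years):
    """Return `start` shifted forward by `years`, handling 29 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return start.replace(year=start.year + years, day=28)


def disposition(label, last_modified, on_legal_hold, today):
    """Decide what to do with a classified document."""
    if on_legal_hold:
        return "preserve"  # a duty to preserve overrides the schedule
    years = RETENTION_YEARS.get(label)
    if years is None:
        return "review"  # no schedule entry: leave the decision to a person
    expiry = add_years(last_modified, years)
    return "delete" if today >= expiry else "archive"
```

Checking the hold first reflects the point made earlier that deletion must respect any duties to preserve documents.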

Together, defensible disposition and auto-classification policies are a highly effective way to control your organisation’s data and maximise its value. First, unnecessary data is eliminated, freeing up storage. Next, classification allows for better organisation and management of the data that needs to be kept for business, regulatory, or legal reasons. When organisations know where their data is located and have minimised the amount of data, they can take proactive steps to protect sensitive data. With data breaches continuing to make headlines, some with significant financial consequences, efforts to protect a company’s information assets can help to reduce negative public exposure and associated monetary losses.

About the author: Benjamin Kennedy brings fifteen years' experience consulting within government and private sectors on litigation support. His technical skills, passion for new technology, and legal knowledge are valued by clients when developing cost-effective solutions for information review exercises of any size. Kennedy's primary focus is overseeing eDiscovery in Australia and New Zealand. He and his team assist in a variety of matters, including construction and commercial disputes, class actions, and forensic investigations that involve all manner of information sources. Kennedy regularly provides educational seminars for lawyers on eDiscovery and speaks on advancements in the field.