Understanding Personal Data in Data Processing

Personal data (also referred to as PII – Personal Identifiable Information) identifies an individual. In that personal data, a name by itself is not enough to identify an individual, however, a name including the address would. For example, when you are called by an operator in a call centre, you will often be asked for your name, address and date of birth. That is an example of personal data. What they do with your personal information goes into data processing, that is, to store, sort, workflow (further processing) and retain till archive (disposal) that data. The controller of that data (that is, the owner of that call centre) is responsible for ensuring that data is stored securely on their system.

However, no matter how secure the system is, or in fact, any data processing system is, social engineering attacks is a key threat vector. The weakest part of an system is the people, since they are the easiest to deceive and manipulate. One of the weakest aspects is through impersonation attacks. Impersonation attacks are where someone plays the role of someone who is likely to be trusted (whose personal information was obtained through data processing). In other words, one of the ways to get enough personal information of an individual to impersonate is through getting their personal information in the first place, and, to a large degree, ‘data processing’ is where that information can be obtained.

Through the data processing, there are three key elements:

The Subject, who owns the data.
The Controller, who is responsible determining how the data is processed.
The Processor, who is responsible for the data processing of the subject data.

This can be represented in virtually any process, let’s take a basic SharePoint example:

The Subject uploads data into a SharePoint site.
The Controller manages the site where the data is uploaded and determines how the data is processed (also known as a data owner on the site)
The Processor holds that data from the point where it has been uploaded, to the point where the data is disposed of (this is generally some technical processes in the repository where the content has been uploaded, e.g. storage, workflow, disposition, security).

There are some assumptions to the above example in terms of the management of Personal information:

The Subject assumes that the data being uploaded is securely managed (it can be security classified, can be readily accessed, and remain intact).
The Controller assumes they are accountable to that data being serviced by making sure the processor is sufficiently set to securely that content and marked with an identifier of the subject. A basic example in SharePoint is that data uploaded gets marked with the name of the individual uploading the data, the time when that occurred and history of working with that data is recorded.
The Processor assumes that it is designed in such a way that personal information is secure; that any alterations, changes, modifications, workflow is known to the subject.

A conundrum is the sheer collaboration aspect; that is, who has access to that data, and, unfortunately, that some controllers misunderstanding that security is somehow, the product. It is not. Any system can be bypassed using social engineering if that data is compromised. The personal information of the Subject can be accessed by anyone who has the view access to that data. For example, in the case of SharePoint Online, when a Subjects name is clicked in a repository, a list of other documentation recently uploaded / accessed is visible, along with a link to getting more information through Delve. Additionally, on the Delve screen is a further link to the Subjects’ OneDrive, and from there visibility of the Subjects’ team. In terms of technology this example works slightly different in SharePoint On-Premise, however, the outcome is the same.

The challenge is that irrespective of the tools used to provide the nature of storage, availability, integrity of data that there needs to be some thinking in managing the security of the data as well as personal information. Lets’ look at some of the requirements on those three elements:

From the Subject, there needs to be understanding of how much information they enter about themselves using Delve (About Me, Projects, Skills and Expertise, Schools and Education, Interests and Hobbies) as this affects PII.
From the Controller, they need to provide awareness to the subject that the data they provide is secure, and the level of accessibility to that data, and where that data is located.
From Processing, the design of repositories needs to include the management of storage of data, classification, workflow relevant to how personal information is passed. For example, the use of a form to capture information relevant to content uploaded may require the subject to ensure personal information, maybe even beyond what is already supplied in Delve. If this is the case, the Controller should make the subject aware of the kind of information gathered, and what the Processing element will do with that data, and how long that data will exist in the repository. Another example is where the data uploaded includes Personal Information and through machine learning metadata is extracted and posted as viewable column data in a repository. And this does stop at metadata. The very design of code used to carry out further processing through workflow should be scrutinised since the level of Personal Information recorded needs to meet legislative laws; in particular, the Data Protection Act 1998 UK law covers this including sensitive personal information to which you should have the Subjects consent beforehand to process.

Summary.

Managing Personal information is a crucial aspect data security; the key aspect to understanding how to prevent breaches of personal data through impersonation or masquerading.

This short article is designed to get you thinking of the actual service delivery in managing personal information not just for SharePoint (either Online or On-Premise); even beyond to endpoints connected to it; and then even beyond into the roadmap surrounding Office365. In a data processing system, the fundamental requirement is to secure the content through its lifecycle. The challenge is ensuring that the features of the tools involved are fully understood when it comes to the storage of personal information, and that security awareness, policies, and training is provided to subjects and data controllers. Legislation is important to understand in this area and a good checklist to start from is here.

The key element, the Subject, can be made more aware of storing personal information in content they upload. A good article for working say through a Word document and removing personal identifiable information is here.

A distinct point to make is that Microsoft cloud applications are by their very nature tools used to upload and transmit information by customers through their relevant tenants. For example, whilst Azure services covers tools for system maintenance, infrastructure, security, record retention, information management, system development, there are data processing activities which the tenants manage. They must ensure that they are responsible for ensuring that information stored or transmitted through the services is securely managed, in that they at the very least should have carried out a security risk assessment against any data processing / controlling surrounding managing personal Information. The key thing to remember, is that security is not a product – the responsibility of securing data through its processing lies with the Controller of that data – the tenant owner, the SharePoint site owner, and therefore the organisation responsible for providing those sites.