Purview Information Protection Deep Dive Pt2 - Automatic Labelling

Part two of a deep dive series on Purview Sensitivity Labels:

Part 1 - Manual Labelling
Part 2 - Automatic Labelling
Part 3 - Recommendations and Limitations

Automatic Labelling

Manual Labelling is often the introduction to Sensitivity Labels, but eventually you will want to look at automation. In particular, automation is needed to cover data at rest, because manual labelling, even if mandatory, only applies when an existing file is re-saved.

The main blocker for many companies is that Automatic Labelling is not available with an E3 license. It requires either Microsoft 365 E5, or E3 plus Microsoft 365 E5 Compliance or Microsoft 365 E5 Information Protection and Governance.

There are two options for automatically applying Sensitivity Labels:

Client-side auto-labelling for files and emails
Service-side auto-labelling for SharePoint Online, Exchange Online and OneDrive

Client-side Auto-labelling

Auto-labelling for files and emails is a client option, configured per-label, that either automatically applies the label, or recommends it to the user based on pattern matching in the content.

Client-side auto-labelling only occurs when an item is being created or edited; it doesn’t apply to data at rest or in-transit. The auto-label policy is based on matching one or more Sensitive Info Types or Trainable Classifiers, for example content that contains a passport number.

Service-side auto-labelling

Service-side auto-labelling is a separate option in the Information Protection blade of the Purview Compliance Center. This option is not dependent on supported client apps or reliant on user adoption. Labels are applied by the back-end M365 services.

Create an Auto-labelling policy to apply one of the pre-defined labels to unlabelled data at rest in supported locations.
The policy wizard has pre-defined Templates for matching content based on well-known regulatory and enterprise requirements. For example, the UK Financial Template includes the Sensitive Info Types that match credit card numbers, debit card numbers and SWIFT bank codes in the scanned content.

Alternatively, the Custom Template option creates a policy with bespoke pattern matching, as follows…

First select one or more of the supported storage locations to apply the automatic labelling:

Exchange Online
SharePoint Online
OneDrive for Business

SharePoint and OneDrive both support auto-labelling of data at rest. Exchange Online only supports auto-labelling of data in-transit.

Next create rules containing include and exclude Conditions to identify items in-scope. A rule can contain multiple conditions and it can apply to all three storage locations, or be specific to a particular one.

Conditions available for SharePoint Online and OneDrive for Business:

Condition	Description
Content is shared	Inside or Outside the organisation
Content Contains	Sensitive Info Types and/or Trainable Classifiers

Exchange Online has many additional Conditions:

Condition	Description
Content is shared	Inside or Outside the organisation
Recipient Domain is	List of email domains
Recipient is	Specific email addresses
Sender IP Address	Specific IPv4 address or a range
Sender domain is	List of email domains
Sender is	Specific email addresses
Attachment file extension is	List of file extensions
Attachment is password protected	Detects email attachments with password protection
Attachment’s content could not be scanned / Attachment’s content didn’t complete scanning	Detects emails with attachments that can’t be scanned
Header matches pattern	Regex match on portion of email header
Subject matches pattern	Regex match on email subject
Recipient address contains words / Recipient address matches pattern	List of words in recipient address or match to regex pattern
Sender address contains words / Sender address matches pattern	List of words in sender address or match to regex pattern
Content Contains	Sensitive Info Types and/or Trainable Classifiers

The additional conditions are only available for Exchange when creating a per-location rule.
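As a rough illustration of how include and exclude Conditions combine, here is a minimal Python sketch. It is not how the Purview service is implemented. The `Email` and `Rule` classes and the helper functions are hypothetical names made up for this example, and it assumes that every include condition must match (AND) while any single exclude condition takes the item out of scope.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Email:
    # Hypothetical, simplified view of a message in transit.
    sender: str
    recipients: List[str]
    subject: str
    attachment_names: List[str] = field(default_factory=list)


# A condition is any function that takes an Email and returns True or False.
Condition = Callable[[Email], bool]


def recipient_domain_is(domains: List[str]) -> Condition:
    wanted = {d.lower() for d in domains}
    return lambda m: any(r.rsplit("@", 1)[-1].lower() in wanted for r in m.recipients)


def subject_matches(pattern: str) -> Condition:
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda m: rx.search(m.subject) is not None


def attachment_extension_is(extensions: List[str]) -> Condition:
    wanted = {e.lower().lstrip(".") for e in extensions}
    return lambda m: any(
        n.rsplit(".", 1)[-1].lower() in wanted for n in m.attachment_names if "." in n
    )


@dataclass
class Rule:
    include: List[Condition]
    exclude: List[Condition] = field(default_factory=list)

    def matches(self, mail: Email) -> bool:
        # Assumption: in scope when ALL include conditions match
        # and NO exclude condition matches.
        return all(c(mail) for c in self.include) and not any(c(mail) for c in self.exclude)


rule = Rule(
    include=[subject_matches(r"\binvoice\b"), attachment_extension_is(["pdf", "xlsx"])],
    exclude=[recipient_domain_is(["contoso.com"])],
)
external = Email("a@contoso.com", ["b@fabrikam.com"], "Invoice 1234", ["inv.pdf"])
internal = Email("a@contoso.com", ["c@contoso.com"], "Invoice 1234", ["inv.pdf"])
print(rule.matches(external))  # True
print(rule.matches(internal))  # False
```

Modelling each condition as a small function keeps the rule itself to a single line of logic, which is roughly how the wizard presents it: a list of conditions plus a list of exceptions.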

Simulation mode is a mandatory step in creating an auto-labelling policy. Content discovery takes place and the administrator can review the matched items to ensure the rules are correct. The policy can then be turned on.

Sensitive Information Types

Sensitive Information Types are search patterns for named data types. They are used to automatically classify content that matches the pattern, for example:

US / UK Passport number
SWIFT code
Japan Social Insurance Number (SIN)
Credit Card Number
Azure AD client secret

Sensitive Info Types can be used in Client-side and Service-side auto-labelling.
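To give a feel for what a Sensitive Info Type does under the hood, the sketch below detects candidate credit card numbers with a regular expression and then validates them with the Luhn checksum. The built-in Credit Card Number type also uses a checksum, but Microsoft's actual definition is considerably more elaborate (keywords, confidence levels, issuer-specific patterns), so treat this as an illustration only. The function names and the regular expression are assumptions made for this example.

```python
import re

# Assumption: a candidate is 13-19 digits, optionally separated by
# single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return the digit strings in text that look like valid card numbers."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "Order 12345 paid with 4111 1111 1111 1111, ref 4111 1111 1111 1112"
print(find_card_numbers(sample))  # ['4111111111111111']
```

The checksum step is what keeps the false-positive rate down: the second number in the sample has the right shape but fails validation, so it is ignored.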

There is a long list of built-in Sensitive Information Types (SIT) provided by Microsoft. You can also create your own in the Data Classification blade of the Purview Compliance Center. Custom SITs are one of the following:

Pattern-based SIT

When you create a pattern-based Sensitive Info Type, you can specify a Primary element and Supporting elements. The elements consist of the following options:

Regular Expression
Keyword list
Keyword dictionary (longer list of keywords)
Existing Sensitive Info type

The Supporting Elements can be a list of the above, grouped by Any, All or None. The Primary Element and Supporting Elements can be anywhere in the document, or required to be near each other within a specified number of characters.
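The Primary and Supporting element idea can be sketched in a few lines of Python. This is a simplified model, not the Purview matching engine: it assumes the Primary element is a regular expression, the Supporting elements are a keyword list grouped by Any, and proximity is measured in characters on either side of the primary match. The function name and the `EMP-` identifier format are hypothetical.

```python
import re


def match_pattern_sit(text, primary_regex, supporting_keywords, proximity=300):
    """Return primary matches that have at least one supporting keyword nearby."""
    keywords = [k.lower() for k in supporting_keywords]
    matches = []
    for m in re.finditer(primary_regex, text):
        # Look at a window of `proximity` characters either side of the match.
        window_start = max(0, m.start() - proximity)
        window_end = min(len(text), m.end() + proximity)
        window = text[window_start:window_end].lower()
        if any(k in window for k in keywords):   # "Any" grouping
            matches.append(m.group())
    return matches


pattern = r"\bEMP-\d{6}\b"               # hypothetical employee ID format
keywords = ["employee id", "staff number"]

print(match_pattern_sit("Employee ID: EMP-123456 joined in May.", pattern, keywords, 30))
# ['EMP-123456']
print(match_pattern_sit("Part EMP-654321 is in stock.", pattern, keywords, 30))
# []
```

The second call shows why Supporting elements matter: the pattern alone matches a part number, but without a nearby keyword it is not treated as sensitive.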

Fingerprint-based SIT

A Fingerprint-based Sensitive Info Type is created by uploading an example document. It works best with Forms and Templates.
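The reason fingerprinting suits Forms and Templates can be shown with a toy example. The sketch below is not Microsoft's fingerprinting algorithm. It is a generic word-shingle comparison, chosen only to illustrate the principle that a filled-in form still shares most of its fixed text with the blank template. The function names and the sample text are assumptions.

```python
import hashlib
import re


def shingles(text, k=3):
    """Hash every run of k consecutive words into a set of fingerprints."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def fingerprint_score(template, document, k=3):
    """Fraction of the template's shingles that also appear in the document."""
    template_prints = shingles(template, k)
    if not template_prints:
        return 0.0
    return len(template_prints & shingles(document, k)) / len(template_prints)


template = "Patent application form. Applicant name: Invention title: Date of filing:"
filled = (
    "Patent application form. Applicant name: Jane Doe "
    "Invention title: Widget Date of filing: 2024-01-01"
)
unrelated = "Minutes of the weekly team meeting held on Monday"

print(fingerprint_score(template, filled) > fingerprint_score(template, unrelated))  # True
```

A free-form document has no fixed text to compare against, which is why this approach works poorly outside of forms and templates.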

Trainable Classifiers

This method of automatic-labelling is based on machine learning. A Classifier is trained to recognise a document through examples. The classifier must first be fed a selection of documents that are in-scope and out-of-scope of the required label. The administrator must confirm or reject the automatic classification to train the learning model. It can then be applied to a bulk repository such as a document library.

Microsoft provides many “ready-to-use” classifiers, including ones to detect profanity, threats, discrimination and even resumes (CVs). Creating a custom Trainable Classifier requires at least 50 sample documents. It can take up to 2 weeks to scan your environment using these classifiers.
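The idea of training on in-scope and out-of-scope examples can be illustrated with a very small text classifier. This is a toy naive Bayes model written for this article. It says nothing about the model Microsoft actually uses, and four training sentences are obviously no substitute for the 50 or more real documents mentioned above. The class name, method names and sample texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


class TinyClassifier:
    """Toy naive Bayes classifier: in-scope (True) versus out-of-scope (False)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text, in_scope):
        self.counts[in_scope].update(tokens(text))
        self.docs[in_scope] += 1

    def score(self, text):
        """Return the log-odds that text is in scope (positive means in scope)."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        log_odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for word in tokens(text):
            # Add-one smoothing; the extra 1 reserves weight for unseen words.
            p_in = (self.counts[True][word] + 1) / (totals[True] + len(vocab) + 1)
            p_out = (self.counts[False][word] + 1) / (totals[False] + len(vocab) + 1)
            log_odds += math.log(p_in / p_out)
        return log_odds


clf = TinyClassifier()
clf.train("Curriculum vitae: work experience, education, skills and references", True)
clf.train("Resume of a software engineer with skills, education and employment history", True)
clf.train("Invoice for consulting services, payment due in 30 days", False)
clf.train("Meeting agenda and minutes for the quarterly budget review", False)

print(clf.score("My skills and education history") > 0)   # True
print(clf.score("Payment due for the invoice") > 0)        # False
```

Confirming or rejecting the classifier's predictions, as described above, amounts to feeding it further labelled examples through something like `train`.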

Trainable Classifiers can be used in Client-side and Service-side auto-labelling. They can also be used to apply Retention Labels.

Exchange Mail Flow Rules

Encryption can be applied using Labels in Exchange Mail Flow rules. This option may be an attractive alternative if you don’t have an E5 license that supports Automatic Label Policies. The Label and associated encryption are applied to messages in transit (not messages at rest in mailboxes).

Create a Rule as follows…
In the Exchange Admin Portal, select Mail Flow > Rules > Add a rule

In the drop-down list select the option to Apply Office 365 Message Encryption and rights protection to messages

Complete the Rule Conditions and select Rights protect message with [SELECTED LABEL]

SharePoint Default Sensitivity Label

You can set a default Sensitivity Label for SharePoint content and it will apply to documents uploaded or re-saved in a Library. The setting can be applied at the Site Level (Settings > Site Information) or the Library Level (Settings > Library Settings). It requires Site Admin permissions.

A manually-applied label will always win over a SPO Library default. The Library default can override a label applied by an auto-labelling policy or by a label policy default setting, but only if the Library default label has a higher priority.
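The precedence rules in the previous paragraph can be written out as a small decision function. This is only a model of the behaviour described here, not the service's actual logic. The `AppliedLabel` class, the `source` values and the convention that a higher number means a higher priority are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppliedLabel:
    name: str
    priority: int   # assumption: a higher number means a higher priority
    source: str     # "manual", "auto_policy" or "policy_default"


def resolve_on_save(existing: Optional[AppliedLabel],
                    library_default: str,
                    library_default_priority: int) -> str:
    """Return the label name a file ends up with when saved to the Library."""
    if existing is None:
        return library_default              # unlabelled file gets the default
    if existing.source == "manual":
        return existing.name                # a manual label always wins
    if library_default_priority > existing.priority:
        return library_default              # higher-priority default overrides
    return existing.name


auto_general = AppliedLabel("General", 1, "auto_policy")
manual_general = AppliedLabel("General", 1, "manual")

print(resolve_on_save(None, "Confidential", 2))            # Confidential
print(resolve_on_save(auto_general, "Confidential", 2))    # Confidential
print(resolve_on_save(manual_general, "Confidential", 2))  # General
```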

A SharePoint default Label only applies to new document uploads. Existing files in the Library only receive the Label when they are re-saved.

Labels with the following settings can’t be used as a SharePoint default label:

A Label with encryption set to Let users assign permissions when they apply the label
A Label set to In Word, PowerPoint and Excel, prompt users to specify permissions

If a user manually removes encryption from a labelled document in a SharePoint Library, the encryption will be restored the next time it is accessed or downloaded.

The automatic Labelling process does not affect the Last Modified date of files.

NOTE that an SPO Default Label is different to Label Policies that target SharePoint. Label Policies apply container settings such as external access.

SharePoint Sensitivity column

Don’t forget to update the All Documents view to show a column for Sensitivity.

In a Library > Add column > Show or hide columns > Select Sensitivity > Apply. Then click the All Documents drop-down and select Save view as, leave the default “All Documents” and click Save.

The Library will then always show the Sensitivity Label alongside the File Name and Modified Date.

Protecting Teams Meetings and Chat

A Label can be applied to Teams meetings and Chat if the following are true:

Organization has a Teams Premium license (included in E5 but not E3)
The Label is scoped to both Files and Emails
The meeting owner is using M365 Apps for Enterprise or OWA on a Desktop computer (not supported on mobile apps)

When encryption is turned on, the following controls can be configured by the Label:

Who can bypass the lobby
Who can present and record
Automatic recording
Prevent copy of meeting chat
Add a watermark during screen sharing and camera streams

What is Azure Information Protection?

Azure Information Protection (AIP) extends what is available in Purview Information Protection and also provides some of the services used by Purview Information Protection. In its original form AIP used separate Sensitivity Labels managed in the Azure Portal. Since 2019, AIP has been updated with a Unified Labelling Client that uses Purview Sensitivity Labels.

The following are some reasons to extend the Data Classification Framework with AIP:

Bulk labelling and protection of on-prem file shares and SharePoint libraries
Bulk decryption for data recovery
Apply labels directly from File Explorer or PowerShell
Supports additional file types for classification and protection
Supports Office 2003-2007 file formats
Logging to the Windows event log

To use AIP you will need to deploy the unified labelling client to Windows computers. You may see references in documentation to the AIP Classic Client. This is the older version of AIP that is now deprecated.

If the AIP client is installed, the built-in labelling interfaces in Office are disabled (e.g. the Ribbon Sensitivity button) and an Office add-in displays an information bar instead. It’s a supported scenario to use AIP for some features while still using the built-in labelling for Office. To achieve this, just disable the AIP Office add-ins (MSIP.WordAddin, MSIP.ExcelAddin, MSIP.PowerPointAddin, MSIP.OutlookAddin) using a Group policy / CSP setting.

AIP Unified Labelling Scanner

The Scanner works by running jobs that crawl the specified data stores to label and protect the documents. This can be a one-off exercise to ensure wide coverage (then relying on native labelling options), or it could be a repeat scheduled process.

The Scanner runs as a service on a Windows Server (2016 or later for long path support) and also requires a SQL Server backend. It requires an application registration in Azure AD that provides a token for the service to authenticate with AIP. The scan jobs are configured in the AIP blade of Azure rather than on the Server.

The process of scanning large data stores can be time consuming, so the Scanner supports inclusion and exclusion by file type (file extension) to pre-filter the file list. It can also be clustered with multiple servers working in parallel. It uses the same built-in iFilters used by Windows Search to access document content and look for matches to automatic label patterns.

The Scanner can be run first in Discovery Mode to create a report of labels and protection that would be applied through automatic classification rules.
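The two ideas above, pre-filtering by file extension and a report-only Discovery Mode, can be sketched as follows. This is not the Scanner's code or configuration. The function names are made up, and `classify` is a hypothetical callback standing in for the automatic label rules.

```python
from pathlib import PureWindowsPath


def prefilter(paths, include=None, exclude=()):
    """Keep only the paths whose extension passes the include/exclude lists."""
    include_set = {e.lower() for e in include} if include else None
    exclude_set = {e.lower() for e in exclude}
    kept = []
    for p in paths:
        ext = PureWindowsPath(p).suffix.lower()
        if ext in exclude_set:
            continue
        if include_set is not None and ext not in include_set:
            continue
        kept.append(p)
    return kept


def discovery_report(paths, classify):
    """List what WOULD be labelled, without changing any file."""
    report = []
    for p in paths:
        label = classify(p)      # hypothetical: returns a label name or None
        if label is not None:
            report.append((p, label))
    return report


paths = [
    r"\\fs01\finance\q1.xlsx",
    r"\\fs01\finance\setup.exe",
    r"\\fs01\hr\notes.txt",
    r"\\fs01\finance\old.tmp",
]
candidates = prefilter(paths, include=[".xlsx", ".txt"], exclude=[".tmp"])
report = discovery_report(
    candidates,
    classify=lambda p: "Confidential" if "finance" in p.lower() else None,
)
print(len(candidates))  # 2
print(len(report))      # 1
```

Filtering before classification matters because opening and inspecting content is the expensive part of a scan, whereas checking an extension costs almost nothing.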

AIP Supported file types for scanner inspection

.doc, .docx, .docm, .dot, .dotx, .xls, .xlt, .xlsx, .xlsm, .xlsb, .ppt, .pps, .pot, .pptx, .pdf, .txt, .xml, .csv

This article was originally posted on Write-Verbose.com
