Over the past few weeks, Microsoft, and more specifically the Office 365 Network team, has seen a large influx of questions from customers about how best to optimize their Office 365 connectivity as they work diligently to plan for a large part of their userbase suddenly working from home. We’ve also seen similar queries from customers looking for best practice whilst rapidly enabling their Office 365 benefits, Free Teams plans or the recently announced free 6 month E1 trial, so that they can roll out Teams quickly, keep their business functioning and allow users to collaborate effectively without being in the office.
The recent COVID-19/Coronavirus outbreak has caused many customers to rapidly enable, or proactively plan for, the bulk of their employees working from home. This sudden switch of connectivity model for the majority of users typically has a significant impact on the corporate network infrastructure, which may have been scaled and designed before any major cloud service was rolled out and, in some cases, was not designed for a situation where it is required simultaneously by all users.
Network elements such as VPN concentrators, central network egress equipment such as proxies, DLP etc, central internet bandwidth, backhaul MPLS circuits, NAT capability and so on are suddenly put under enormous strain due to the load of the entire business using them, with the end result being poor performance and productivity coupled with a poor user experience for those users forced to adapt to working from home.
A simple diagram of a traditional network model can be seen below, where remote users’ connectivity is forced in and back out of the corporate network to reach critical resources, and branch offices use MPLS circuits to reach the services offered at head office. It is an incredibly common network model for businesses around the world, but it was designed to be effective for a pre-cloud world.
This model made perfect sense and worked very well when the bulk of applications, data and services resided within the corporate network (the dotted line in the diagram), but as enterprises shift to the cloud, it rapidly becomes a cumbersome environment which doesn’t scale well or provide the organization with any agility to react to situations such as that we face today. Many customers report to Microsoft that they have seen a very rapid shift of network traffic which used to be contained within the corpnet now almost exclusively connecting to some external cloud-based source.
Fortunately, Microsoft has been working closely with customers and the wider industry for many years to provide effective, modern solutions to these problems from within our own services, aligned to industry best practice. These solutions apply as simply and effectively to remote workers as they do to branch offices. Microsoft has designed the connectivity requirements for the Office 365 service to work efficiently for remote users whilst still allowing an organization to maintain security and control over their connectivity.
Below we will outline the simple steps an organization can take to drastically reduce the impact Office 365 traffic has on the traditional corporate infrastructure when a large percentage of users are working remotely all at once. The solution will also significantly improve performance for users and free up corporate resources for the elements which still have to rely on them.
Most remote users who are not using a virtualized desktop will use a VPN solution of some sort to route all connectivity back into the corporate environment where it is then routed out to Office 365, often through an on premises security stack which is generally designed for web browsing.
The key to this solution is separating out the critical Office 365 traffic, which is both latency sensitive and puts enormous load on the traditional network architecture. We then treat this traffic differently and use the user’s local internet connection to route the connectivity directly to the service. To do this we need to follow a simple set of actions:
1. Identify the endpoints we need to Optimize
Microsoft has already identified these endpoints and marks them very clearly for reference. In the URL/IP list for the service these endpoints are marked as “Optimize”. There are just four URLs which need to be optimized and nineteen IP subnets. This small group of endpoints accounts for around 80% of the volume of traffic to the service and also includes the latency sensitive endpoints such as those for Teams media. Essentially this is the traffic that we need to take special care of, and it is also the traffic which will put incredible pressure on traditional network paths.
URLs in this category have the following characteristics:
- Are Microsoft owned and managed endpoints hosted on Microsoft infrastructure
- Have IPs provided
- Have a low rate of change to URLs/IPs compared to the other two categories
- Are expected to remain low in number of URLs
- Are high volume and/or latency sensitive
You can also query the REST API Web Service for this information, and a PowerShell example script which does this and outputs the URLs/IPs/Ports for all three endpoint categories can be found using the link above.
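As a rough illustration of the same idea in Python, the sketch below filters a downloaded endpoint list down to one category. It assumes the web service lives at https://endpoints.office.com/endpoints/worldwide, expects a clientrequestid GUID, and returns JSON entries with category, urls, ips, tcpPorts and udpPorts fields; none of these details are stated in this article, so check them against the URL/IP page before relying on them. The function names are made up for this example.

```python
import json
import uuid
from urllib.request import urlopen

# Assumed address of the endpoint web service; confirm it on the URL/IP page.
ENDPOINTS_URL = "https://endpoints.office.com/endpoints/worldwide"


def fetch_endpoint_sets(base_url=ENDPOINTS_URL):
    """Download the full endpoint list (needs internet access)."""
    # The service is assumed to require a client request ID in GUID form.
    url = f"{base_url}?clientrequestid={uuid.uuid4()}"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def split_ports(value):
    """Turn a comma-separated port string such as "80,443" into a list."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_category(endpoint_sets, category="Optimize"):
    """Collect the URLs, IPs and ports of every entry in one category."""
    urls, ips, tcp_ports, udp_ports = set(), set(), set(), set()
    for entry in endpoint_sets:
        if entry.get("category") != category:
            continue
        # Any of these fields may be absent on a given entry.
        urls.update(entry.get("urls", []))
        ips.update(entry.get("ips", []))
        tcp_ports.update(split_ports(entry.get("tcpPorts", "")))
        udp_ports.update(split_ports(entry.get("udpPorts", "")))
    return {
        "urls": sorted(urls),
        "ips": sorted(ips),
        "tcp_ports": sorted(tcp_ports),
        "udp_ports": sorted(udp_ports),
    }
```

Calling select_category(fetch_endpoint_sets()) would then give the Optimize URLs, IPs and ports in one dictionary; passing "Allow" or "Default" as the category selects the other two groups.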
Endpoint to Optimize
This is one of the core URLs Outlook uses to connect to its Exchange Online server and has a high volume of bandwidth usage and connection count. Low network latency is required for online features including: instant search, other mailbox calendars, free/busy lookup, manage rules & alerts, Exchange Online archive, and emails departing the outbox.
This is used by Outlook Online web access to connect to its Exchange Online server and is sensitive to network latency. Connectivity is particularly required for large file upload and download with SharePoint Online.
This is the primary URL for SharePoint Online and has a high volume of bandwidth usage.
This is the primary URL for OneDrive for Business and has a high volume of bandwidth usage and possibly a high connection count from the OneDrive for Business Sync tool.
Teams Media IPs (no URL)
UDP 3478, 3479, 3480, and 3481
Relay Discovery allocation and real time traffic (3478), Audio (3479), Video (3480), and Video Screen Sharing (3481). These are the endpoints used for Skype for Business and Microsoft Teams Media traffic (Calls, meetings etc). Most endpoints are provided when the Microsoft Teams client establishes a call (and are contained within the required IPs listed for the service).
UDP is required for optimal media quality.
<tenant> should be replaced with your Office 365 tenant name. For example, contoso.onmicrosoft.com would use contoso.sharepoint.com and contoso-my.sharepoint.com.
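The tenant substitution can be sketched in a few lines of Python. The helper name tenant_urls is made up for this example, and it assumes the tenant is supplied either as a bare name or in its onmicrosoft.com form.

```python
def tenant_urls(tenant_domain):
    """Derive the SharePoint Online and OneDrive for Business hosts for a tenant."""
    suffix = ".onmicrosoft.com"
    name = tenant_domain.strip().lower()
    # Accept either "contoso" or "contoso.onmicrosoft.com".
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    if not name or "." in name:
        raise ValueError(f"not a tenant name: {tenant_domain!r}")
    return {
        "sharepoint": f"{name}.sharepoint.com",
        "onedrive": f"{name}-my.sharepoint.com",
    }
```

For contoso.onmicrosoft.com this yields the two host names given in the example above.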
At the time of writing the IP ranges which these endpoints correspond to are as follows. It is strongly advised you use the script referenced previously or the URL/IP page to check for any updates when applying the policy, and do so on a regular basis.
- TCP ports 80/443
- UDP ports 3478, 3479, 3480, 3481
IPv6 endpoints can be ignored if not currently required, i.e. the service will currently operate successfully on IPv4 only (but not the other way round). This will likely change in future, but IPv4 only is possible for the time being.
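If the published list mixes IPv4 and IPv6 ranges, dropping the IPv6 entries is straightforward with Python's standard ipaddress module. This is only a sketch: the function name is made up, and the ranges shown in the usage note are reserved documentation prefixes, not real Office 365 subnets.

```python
import ipaddress


def ipv4_only(cidrs):
    """Keep only the IPv4 networks from a mixed list of CIDR strings."""
    networks = (ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)
    return [str(network) for network in networks if network.version == 4]
```

For example, ipv4_only(["192.0.2.0/24", "2001:db8::/32"]) keeps only the first entry.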
2. Optimize access to these endpoints via the VPN
Now that we have identified these critical endpoints, we need to divert them away from the VPN tunnel and allow them to use the user’s internet connection to connect directly to the service. The vast majority of VPN solutions allow split tunnelling, where identified traffic is not sent down the VPN tunnel to the corporate network but is instead sent directly out of the user’s local internet connection. The VPN client should be configured so that traffic to the Optimize marked URLs/IPs/Ports above is routed in this way. This allows the traffic to utilize local Microsoft resources such as Office 365 Service Front Doors (AFD being one example), which deliver Office 365 services & connectivity points as close to your users as possible. This allows us to deliver extremely high performance levels to users wherever they are in the world. The traffic also benefits from Microsoft’s world class global network, which is very likely within a small number of milliseconds of your users’ direct egress and is designed to take your traffic securely to Microsoft resources wherever they may be in the world, as efficiently as possible.
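The routing decision a split tunnel makes can be modelled roughly as follows. This is a conceptual sketch in Python rather than the configuration of any particular VPN product: the subnets are placeholder documentation ranges (RFC 5737), not the real Optimize ranges, and many VPN clients match on destination IP only rather than on IP and port together.

```python
import ipaddress

# Placeholder subnets from the documentation ranges, NOT real Office 365 ranges.
OPTIMIZE_SUBNETS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]
# Ports taken from the Optimize list above.
OPTIMIZE_PORTS = {"tcp": {80, 443}, "udp": {3478, 3479, 3480, 3481}}


def route_for(dest_ip, protocol, port):
    """Return "direct" for Optimize traffic and "vpn" for everything else."""
    address = ipaddress.ip_address(dest_ip)
    in_subnet = any(address in network for network in OPTIMIZE_SUBNETS)
    port_ok = port in OPTIMIZE_PORTS.get(protocol.lower(), set())
    return "direct" if in_subnet and port_ok else "vpn"
```

Anything that does not match both an Optimize subnet and an Optimize port stays in the tunnel, so existing on-premises policy continues to apply to all other traffic.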
The solution would look something like that below.
Sounds simple? It is in most cases, but for an enterprise, this shift in connectivity invariably raises questions about security. In the traditional network approach security is often applied inline to network traffic as it egresses to the internet. Proxies and firewalls perform inspection on the traffic to check for data exfiltration, viruses and so on. By bypassing this we are removing this layer of protection we have come to rely on when connecting to the internet. The good news is, for the highlighted endpoints above, Microsoft has numerous features in place which means your security with the modern approach may well be higher than available previously. We will run through some of the common solutions below, not all will be relevant or necessary to all customers, but we will cover the majority of common concerns that come up when implementing modern network connectivity.
3. Common questions when implementing local breakout and split tunnelling for Office 365
It should be noted that the two steps above are all that is necessary to solve the performance/scalability issues if you need to move very quickly given the current situation. The elements below can be added as needed and as time allows or you may have them in place already.
Q1. How do I stop users accessing other tenants I do not trust where they could exfiltrate data?
A: The answer is a feature called tenant restrictions. Authentication traffic is not high volume nor especially latency sensitive so can be sent through the VPN solution to the on-premises proxy where the feature is applied. An allow list of trusted tenants is maintained here and if the client attempts to obtain a token to a tenant which is not trusted, the proxy simply denies the request. If the tenant is trusted, then a token is accessible if the user has the right credentials and rights.
So even though a user can make a TCP/UDP connection to the Optimize marked endpoints above, without a valid token to access the tenant in question they simply cannot log in and access/move any data.
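The logic described in this answer can be expressed as a toy model. This Python sketch only illustrates the reasoning; it is not how tenant restrictions is configured or implemented, and the allow list and function names are hypothetical.

```python
# Hypothetical allow list; a real deployment maintains this on the proxy.
TRUSTED_TENANTS = {"contoso.onmicrosoft.com"}


def token_request_allowed(requested_tenant, trusted_tenants=TRUSTED_TENANTS):
    """Proxy check: only token requests for trusted tenants pass."""
    return requested_tenant.strip().lower() in trusted_tenants


def can_access_data(requested_tenant, credentials_valid, trusted_tenants=TRUSTED_TENANTS):
    """Data is reachable only with a token, which needs both checks to pass."""
    return token_request_allowed(requested_tenant, trusted_tenants) and credentials_valid
```

The point of the model is that network reachability of the Optimize endpoints never appears in the decision: access depends only on whether a token can be obtained.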
Q2. Does this model allow access to consumer services such as personal OneDrive accounts?
A: No, it does not. The Office 365 endpoints are not the same as the consumer services (Onedrive.live.com as an example), so the split tunnel will not allow a user to directly access consumer services. Traffic to consumer endpoints will continue to use the VPN tunnel and existing policies will continue to apply.
Q3. How do I apply DLP and protect my sensitive data when the traffic no longer flows through my on-premises solution?
A: If required, endpoints can be protected with Office DLP, and it’s much more efficient to provide this feature in the service itself rather than try to do it in line at the network edge. Azure Information Protection can also be used to provide a high level of information protection.
Q4. How do I evaluate and maintain control of the user’s authentication when they are connecting directly?
A: In addition to the tenant restrictions feature noted in Q1, conditional access policies can be applied to dynamically assess the risk of an authentication request and react appropriately. Microsoft recommends the Zero Trust model is implemented over time and we can use Azure AD conditional access policies to maintain control in a mobile & cloud first world. Conditional access policies can be used to make a real-time decision on whether an authentication request is successful based on numerous factors such as:
- Device – is the device known/trusted/domain joined?
- IP – is the authentication request coming from a known corporate IP address? Or from a country we do not trust?
- Application – Is the user authorized to use this application?
We can then trigger policy such as approve, trigger MFA or block authentication based on these policies.
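As a purely illustrative model of that kind of decision, the Python sketch below combines the three factors into one of the three outcomes. It is not the Azure AD API or its evaluation logic; the field names, the blocked-country set and the specific rules are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SignIn:
    device_trusted: bool      # known/trusted/domain joined device
    from_corporate_ip: bool   # request comes from a known corporate IP
    country: str              # country code of the source IP
    app_authorized: bool      # user is allowed to use this application


# Hypothetical set of countries the organization does not trust.
BLOCKED_COUNTRIES = {"XX"}


def evaluate(sign_in, blocked_countries=BLOCKED_COUNTRIES):
    """Return "approve", "mfa" or "block" for one authentication request."""
    if not sign_in.app_authorized or sign_in.country in blocked_countries:
        return "block"
    if sign_in.device_trusted and sign_in.from_corporate_ip:
        return "approve"
    # Anything in between is allowed only after a second factor.
    return "mfa"
```

A real policy set would weigh many more signals, but the shape is the same: each request is assessed at sign-in time and mapped to an action.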
Q5. How do I protect against viruses and malware?
A: Again, Office 365 provides protection for the Optimize marked endpoints in various layers in the service itself, outlined in this document. As noted, it is vastly more efficient to provide these security elements in the service itself rather than try and do it in line with devices which may not fully understand the protocols/traffic.
Q6. Can I send more than just the Optimize traffic direct?
A. Priority should be given to the Optimize marked endpoints as these will give maximum benefit for a low level of work. However, if you wish, the Allow marked endpoints can also be sent direct: they are required for the service to work and have IPs provided which can be used if required.
There are also various vendors who offer cloud based proxy/security solutions called secure web gateways which provide central security, control and corporate policy application for general web browsing. If they are highly available, performant, and provisioned close to your users, these solutions can work well in a cloud first world by allowing secure internet access to be delivered from a cloud based location close to the user. This removes the need for a hairpin through the VPN/corporate network for general browsing traffic, whilst still allowing central security control.
Even with these solutions in place however, Microsoft still strongly recommends the Optimize marked Office 365 traffic is sent direct to the service.
Q7. Why is port 80 required? Is traffic sent in the clear?
A. Port 80 is only used for things like redirecting to a port 443 session; no customer data is sent or is accessible over port 80. This article outlines encryption for data in transit and at rest for Office 365, and this article outlines how we use SRTP to protect Teams media traffic.
Q8. Does this advice apply to users in China using a worldwide instance of Office 365?
A. No it does not. The one caveat to the above advice is users in the PRC who are connecting to a worldwide instance of Office 365. Due to the common occurrence of cross border network congestion in the region, direct internet egress performance can be variable. Most customers in the region operate using a VPN to bring the traffic into the corporate network and utilize their authorized MPLS circuit or similar to egress outside the country via an optimized path. This is outlined further in this article https://docs.microsoft.com/en-us/office365/enterprise/office-365-networking-china
Finally, please ask any questions you may have in the comments section below and we will do our best to answer as quickly as possible.
4. Further reading
General best practice for Office 365 connectivity:
Recorded Ignite sessions
Office 365 Partner Program
Network Connectivity performance testing
This tool runs some tests against Office 365 endpoints, including the Optimize marked ones, and gives you some clear feedback on how connectivity looks for those endpoints and anything you can do to improve it.
This tool is one mechanism you can use to monitor users’ Office 365 network traffic volumes to get a clear figure for bandwidth requirements for the wider business.