ATLASSIAN CLOUD
Atlassian Cloud architecture and operational practices
Learn more about the Atlassian Cloud architecture and the operational practices we use
Introduction
Atlassian cloud products and data are hosted on industry-leading cloud provider Amazon Web Services (AWS). Our products run on a platform as a service (PaaS) environment that is split into two main sets of infrastructure that we refer to as Micros and non-Micros. Jira, Confluence, Statuspage, Access, and Bitbucket run on the Micros platform, while Opsgenie and Trello run on the non-Micros platform.
Cloud infrastructure
Atlassian Cloud hosting architecture
We use Amazon Web Services (AWS) as a cloud service provider and its highly available data center facilities in multiple regions worldwide. Each AWS region is a separate geographical location with multiple, isolated, and physically-separated groups of data centers known as Availability Zones (AZs).
We leverage AWS’ compute, storage, network, and data services to build our products and platform components, which enables us to utilize redundancy capabilities offered by AWS, such as availability zones and regions.
Availability zones
Each Availability Zone is designed to be isolated from failures in the other zones and to provide inexpensive, low-latency network connectivity to other AZs in the same region. This multi-zone high availability is the first line of defense for geographic and environmental risks and means that services running in multi-AZ deployments should be able to withstand AZ failure.
Jira and Confluence use the multi-AZ deployment mode for Amazon RDS (Amazon Relational Database Service). In a multi-AZ deployment, Amazon RDS provisions and maintains a synchronous standby replica in a different AZ within the same region to provide redundancy and failover capability. The AZ failover is automated and typically takes 60-120 seconds, so that database operations can resume as quickly as possible without administrative intervention. Opsgenie, Statuspage, Trello, and Jira Align use similar deployment strategies, with small variances in replication and failover timing.
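As a rough illustration of what that 60-120 second failover window means for clients, here is a minimal retry sketch. The `connect` factory and the timings are hypothetical stand-ins, not Atlassian's actual database drivers:

```python
import time

FAILOVER_WINDOW_S = 120  # typical upper bound for an automated multi-AZ failover

def query_with_retry(connect, sql, max_wait_s=FAILOVER_WINDOW_S, backoff_s=2.0):
    """Retry a query while an automated AZ failover completes.

    `connect` is a hypothetical factory returning a DB connection; during a
    failover it raises ConnectionError until the standby is promoted.
    """
    deadline = time.monotonic() + max_wait_s
    delay = backoff_s
    while True:
        try:
            conn = connect()
            return conn.execute(sql)
        except ConnectionError:
            if time.monotonic() >= deadline:
                raise  # failover took longer than we are willing to wait
            time.sleep(delay)
            delay = min(delay * 2, 30.0)  # capped exponential backoff
```

The point of the sketch is that well-behaved clients ride out the failover without administrative intervention, matching the behavior described above.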
Data location
Jira and Confluence data is located in the region closest to where the majority of your users are located upon sign-up. However, we know that some of you may require that your data stay in a particular location, so we do offer data residency. Currently, we offer data residency in the US and EU regions, as well as Australia, with plans to add additional regions. For information and to sign up for updates, see our data residency page.
Data for Bitbucket is located in the US-East region and we leverage US-West for disaster recovery.
Data backups
We operate a comprehensive backup program at Atlassian. This includes our internal systems, where our backup measures are designed in line with system recovery requirements. With respect to our cloud products, and specifically referring to you and your application data, we also have extensive backup measures in place. We use the snapshot feature of Amazon RDS (Relational Database Service) to create automated daily backups of each RDS instance.
Amazon RDS snapshots are retained for 30 days with support for point-in-time recovery and are encrypted using AES-256 encryption. Backup data is not stored offsite but is replicated to multiple data centers within a particular AWS region. We also perform quarterly testing of our backups.
For Bitbucket, data is replicated to a different AWS region and independent backups are taken daily within each region.
We don’t use these backups to revert customer-initiated destructive changes, such as fields overwritten using scripts, or deleted issues, projects, or sites. To avoid data loss, we recommend making regular backups. Learn more about creating backups in the support documentation for your product.
Data center security
AWS maintains multiple certifications for the protection of their data centers. These certifications address physical and environmental security, system availability, network and IP backbone access, customer provisioning and problem management. Access to the data centers is limited to authorized personnel only, as verified by biometric identity verification measures. Physical security measures include: on-premises security guards, closed circuit video monitoring, man traps, and additional intrusion protection measures.
Bitbucket uses NetApp CVS for the file storage of repositories. CVS is a file system managed by NetApp and hosted from a NetApp data center. NetApp follows the requirements of global data security laws that require reasonable security measures for storing, transmitting, and processing data. Additionally, NetApp leverages both self-assessments and third-party auditors to ensure that compliance requirements are met. For more information, see NetApp's security practices and compliance certifications.
Cloud platform architecture
Distributed services architecture
With this AWS architecture, we host a number of platform and product services that are used across our solutions. This includes platform capabilities that are shared and consumed across multiple Atlassian products, such as Media, Identity, and Commerce, experiences such as our Editor, and product-specific capabilities, like Jira Issue service and Confluence Analytics.
Figure 1
Atlassian developers provision these services through an internally developed platform-as-a-service (PaaS), called Micros, which automatically orchestrates the deployment of shared services, infrastructure, data stores, and their management capabilities, including security and compliance control requirements (see figure 1 above). Typically, an Atlassian product consists of multiple "containerized" services that are deployed on AWS using Micros. Atlassian products use core platform capabilities (see figure 2 below) that range from request routing to binary object stores, authentication/authorization, transactional user-generated content (UGC) and entity relationships stores, data lakes, common logging, request tracing, observability, and analytical services. These microservices are built using approved technical stacks standardized at the platform level:
Figure 2
Multi-tenant architecture
On top of our cloud infrastructure, we built and operate a multi-tenant microservice architecture along with a shared platform that supports our products. In a multi-tenant architecture, a single service serves multiple customers, including the databases and compute instances required to run our cloud products. Each shard (essentially a container – see figure 3 below) contains the data for multiple tenants, but each tenant's data is isolated and inaccessible to other tenants. It is important to note that we do not offer a single-tenant architecture.
Figure 3
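The per-tenant isolation within a shard described above can be sketched as follows. The class and field names are illustrative assumptions, not Atlassian's actual schema:

```python
class Shard:
    """A shard holds rows for many tenants; every read and write is tenant-scoped.

    Illustrative only: a real shard is a database, not an in-memory dict.
    """
    def __init__(self):
        self._rows = {}  # tenant_id -> {key: value}

    def put(self, tenant_id, key, value):
        self._rows.setdefault(tenant_id, {})[key] = value

    def get(self, tenant_id, key):
        # A tenant can only ever see its own partition of the shard.
        return self._rows.get(tenant_id, {}).get(key)

shard = Shard()
shard.put("tenant-a", "issue-1", "Fix login bug")
shard.put("tenant-b", "issue-1", "Update billing page")
```

Because every access path takes a tenant ID, two tenants can share the same shard (and even the same keys) without ever seeing each other's data.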
Our microservices are built with least privilege in mind and designed to minimize the scope of any zero-day exploitation and to reduce the likelihood of lateral movement within our cloud environment. Each microservice has its own data store that can only be accessed through the authentication protocol for that specific service, which means that no other service has read or write access to that data.
We’ve focused on isolating microservices and data, rather than providing dedicated per-tenant infrastructure, because it narrows each system’s access to the slice of data its service requires, even though that slice spans many customers. Because the logic has been decoupled and data authentication and authorization occur at the application layer, this acts as an additional security check as requests are sent to these services. Thus, if a microservice is compromised, the attacker gains only limited access to the data that particular service requires.
Tenant provisioning and lifecycle
When a new customer is provisioned, a series of events trigger the orchestration of distributed services and provisioning of data stores. These events can be generally mapped to one of seven steps in the lifecycle:
1. Commerce systems are immediately updated with the latest metadata and access control information for that customer, and then a provisioning orchestration system aligns the "state of the provisioned resources" with the license state through a series of tenant and product events.
Tenant events
These events affect the tenant as a whole and can either be:
- Creation: a tenant is created and used for brand new sites
- Destruction: an entire tenant is deleted
Product events
- Activation: after the activation of licensed products or third-party apps
- Deactivation: after the de-activation of certain products or apps
- Suspension: after the suspension of a given existing product, disabling the customer's access to a site that they own
- Un-suspension: after the un-suspension of a given existing product, re-enabling the customer's access to a site that they own
- License update: contains information regarding the number of license seats for a given product as well as its status (active/inactive)
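A minimal sketch of how an orchestrator might map these tenant and product events to state changes. Event names follow the lists above; the data structures and function are hypothetical, not Atlassian's provisioning system:

```python
from dataclasses import dataclass, field

TENANT_EVENTS = {"creation", "destruction"}
PRODUCT_EVENTS = {"activation", "deactivation", "suspension", "un-suspension"}

@dataclass
class Tenant:
    # product name -> "active" | "suspended"
    products: dict = field(default_factory=dict)

def apply_event(tenants, tenant_id, event, product=None):
    """Apply one lifecycle event to the provisioned-resource state."""
    if event == "creation":
        tenants[tenant_id] = Tenant()          # brand new site
    elif event == "destruction":
        tenants.pop(tenant_id, None)           # entire tenant deleted
    elif event == "activation":
        tenants[tenant_id].products[product] = "active"
    elif event == "deactivation":
        tenants[tenant_id].products.pop(product, None)
    elif event == "suspension":
        tenants[tenant_id].products[product] = "suspended"
    elif event == "un-suspension":
        tenants[tenant_id].products[product] = "active"
    else:
        raise ValueError(f"unknown event: {event}")
```

The orchestrator's job, as described above, is to keep this state aligned with the license state as events arrive.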
2. Creation of the customer site and activation of the correct set of products for the customer. A site is the container for the multiple products licensed to a particular customer (e.g., Confluence and Jira Software for <site-name>.atlassian.net).
Figure 4
3. Provisioning of products within the customer site in the designated region.
When a product is provisioned it will have the majority of its content hosted close to where users are accessing it. To optimize product performance, we don't limit data movement when it's hosted globally and we may move data between regions as needed.
For some of our products, we also offer data residency. Data residency allows customers to choose whether product data is globally distributed or held in place in one of our defined geographic locations.
4. Creation and storage of the customer site and product(s) core metadata and configuration.
5. Creation and storage of the site and product(s) identity data, such as users, groups, permissions, etc.
6. Provisioning of product databases within a site (e.g., the Jira family of products, Confluence, Compass, Atlas).
7. Provisioning of the product(s) licensed apps.
Figure 5
Figure 5 above demonstrates how a customer's site is deployed across our distributed architecture, not just in a single database or store. This includes multiple physical and logical locations that store metadata, configuration data, product data, platform data, and other related site information.
Tenant separation
While our customers share a common cloud-based infrastructure when using our cloud products, we have measures in place to ensure they are logically separated so that the actions of one customer cannot compromise the data or service of other customers.
Atlassian’s approach to achieving this varies across our applications. In the case of Jira and Confluence Cloud, we use a concept we refer to as the “tenant context” to achieve logical isolation of our customers. This is implemented both in the application code, and managed by something we have built called the tenant context service (TCS). This concept ensures that:
- Each customer’s data is kept logically segregated from other tenants when at-rest
- Any requests that are processed by Jira or Confluence have a tenant-specific view so other tenants are not impacted
In broad terms, the TCS works by storing a context for individual customer tenants. The context for each tenant is associated with a unique ID stored centrally by the TCS, and includes a range of metadata associated with that tenant, such as which databases the tenant is in, what licenses the tenant has, what features they can access, and a range of other configuration information. When a customer accesses Jira or Confluence cloud, the TCS uses the tenant ID to collate that metadata, which is then linked with any operations the tenant undertakes in the application throughout their session.
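In rough terms, the TCS lookup described above might look like the following sketch. The tenant ID and metadata fields are invented for illustration and do not reflect the TCS's actual schema:

```python
# Hypothetical tenant context store: unique tenant ID -> metadata used to
# scope every operation in that tenant's session.
TENANT_CONTEXTS = {
    "a1b2-c3d4": {
        "databases": ["jira-shard-7"],
        "licenses": ["jira-software"],
        "features": {"advanced-roadmaps": False},
        "region": "us-east-1",
    },
}

def resolve_context(tenant_id):
    """Collate the metadata the TCS links to a tenant's operations."""
    ctx = TENANT_CONTEXTS.get(tenant_id)
    if ctx is None:
        raise LookupError(f"unknown tenant: {tenant_id}")
    return ctx
```

Every operation the tenant performs during a session carries this context, which is what makes the tenant-specific view possible.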
Atlassian edges
Your data is also safeguarded through what we call edges - virtual walls that we build around our software. When a request comes in, it is sent to the nearest edge. Through a series of validations, the request is either allowed or denied.
- Requests land on the Atlassian edge closest to the user. The edge verifies the user’s session and identity through your identity system.
- The edge determines where your product data is located, based on data in the TCS information.
- The edge forwards the request to the target region, where it lands on a compute node.
- The node uses the tenant configuration system to determine information, such as the license and database location, and calls out to various other data stores and services (e.g. the Media platform that hosts images and attachments) to retrieve the information required to service the request.
- The node services the original user request with the information assembled from its calls to those other services.
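The edge flow above can be sketched as follows. The `sessions` and `tcs` arguments are hypothetical stand-ins for the identity system and the tenant context service, and the real routing is far more involved:

```python
def handle_at_edge(request, sessions, tcs):
    """Sketch of the edge flow: verify the session, locate the data, forward."""
    user = sessions.get(request["session_token"])
    if user is None:
        return {"status": 403, "body": "denied"}       # session not verified
    region = tcs[request["tenant_id"]]["region"]       # where the product data lives
    return forward_to_region(region, request, user)    # hand off to a compute node

def forward_to_region(region, request, user):
    # In production this crosses the network to a compute node in the target
    # region, which then calls out to other data stores and services.
    return {"status": 200, "region": region, "user": user}
```

The key property is that validation happens before any forwarding: a request with an invalid session is denied at the edge and never reaches the target region.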
Security controls
Because our cloud products leverage a multi-tenant architecture, we can layer additional security controls into the decoupled application logic. For example, we can add further authorization checks and rate limiting on high volumes of queries or exports, which a per-tenant monolithic application wouldn’t typically introduce. The impact of a single zero-day is dramatically reduced because the scope of each service is narrowed.
In addition, we’ve built additional preventative controls into our products that are fully hosted on our Atlassian platform. The primary preventative controls include:
- Service authentication and authorization
- Tenant context service
- Key management
- Data encryption
Service authentication and authorization
Our platform uses a least privilege model for accessing data. This means that all data is restricted to only the service responsible for saving, processing, or retrieving it. For example, the media services, which give you a consistent file upload and download experience across our cloud products, have dedicated storage provisioned that no other service at Atlassian can access. Any service that requires access to media content needs to interact with the media services API. As a result, strong authentication and authorization at the service layer also enforces strong separation of duties and least privilege access to data.
We use JSON web tokens (JWTs) to ensure signing authority outside of the application, so our identity systems and tenant context are the source of truth. Tokens can’t be used for anything other than what they are authorized for. When you or someone on your team makes a call to a microservice or shard, the tokens are passed to your identity system and validated against it. This process ensures that the token is current and signed before sharing the appropriate data. When combined with the authorization and authentication required to access these microservices, if a service is compromised, it’s limited in scope.
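To illustrate the principle above - a token whose signing authority lives outside the application - here is a minimal HMAC-signed token sketch using only Python's standard library. This is not Atlassian's JWT implementation; it simply shows the sign-and-verify idea where the key holder, not the application, is the source of truth:

```python
import base64, hashlib, hmac, json

def sign(payload: dict, key: bytes) -> str:
    """Issue a token; only the holder of `key` (the authority) can do this."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    mac = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + mac

def verify(token: str, key: bytes) -> dict:
    """Reject any token that wasn't signed by the trusted authority."""
    body, mac = token.rsplit(".", 1)
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise PermissionError("invalid signature")
    return json.loads(base64.urlsafe_b64decode(body))
```

A compromised service that lacks the signing key cannot mint tokens for data outside its scope, which is the limiting property described above.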
However, we know that sometimes identity systems can be compromised. To mitigate this risk, we use two mechanisms. First, TCS and the identity proxies are highly replicated. We have a TCS sidecar for almost every microservice and we use proxy sidecars that offshoot to the identity authority, so there are thousands of these services running at all times. If there is anomalous behavior in one or more, we can pick up on that quickly and remediate the issue.
In addition, we don’t wait for someone to find a vulnerability in our products or platform. We’re actively identifying these scenarios so there is minimal impact to you and we run a number of security programs to identify, detect, and respond to security threats.
Tenant context service
We ensure that requests to any microservices contain metadata about the customer - or tenant - that is requesting access. This is called the tenant context service. It’s populated directly from our provisioning systems. When a request is started, the context is read and internalized in the running service code, which is used to authorize the user. All service access, and thus data access, in Jira and Confluence requires this tenant context, or the request will be rejected.
Service authentication and authorization is applied through Atlassian service authentication protocol (ASAP). An explicit allowlist determines which services may communicate, and authorization details specify which commands and paths are available. This limits potential lateral movement of a compromised service.
Service authentication and authorization, as well as egress, are controlled by a set of dedicated proxies. This removes the ability for application code vulnerabilities to impact these controls. Remote code execution would require compromising the underlying host and bypassing the Docker container boundaries - not just the ability to modify application logic - and our host-level intrusion detection flags any such discrepancies.
These proxies constrain egress based on each service’s intended behavior. Services that do not need to emit webhooks or communicate with other microservices are prohibited from doing so.
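The allowlist idea can be sketched as follows. The service names and paths are invented, and the real proxies operate at the network layer rather than as application code:

```python
# Hypothetical allowlist: caller -> set of (target service, path) it may reach.
ALLOWLIST = {
    "jira-issue-service": {("media", "/download"), ("tcs", "/context")},
    "confluence-analytics": {("tcs", "/context")},
}

def proxy_call(caller, target, path):
    """Dedicated egress proxy: only explicitly allowlisted pairs get through."""
    if (target, path) not in ALLOWLIST.get(caller, set()):
        raise PermissionError(f"{caller} may not call {target}{path}")
    return {"caller": caller, "target": target, "path": path}  # forward the call
```

A compromised service can only reach the handful of destinations it was always meant to reach, which is what limits lateral movement.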
Data encryption
Customer data in our Atlassian cloud products is encrypted in transit over public networks using TLS 1.2+ with perfect forward secrecy (PFS) to protect it from unauthorized disclosure or modification. Our implementation of TLS enforces the use of strong ciphers and key-lengths where supported by the browser.
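As an illustration of enforcing the same TLS floor from the client side, here is a standard-library Python sketch. It mirrors the stated policy; the exact cipher suites Atlassian enforces server-side are not specified in this document:

```python
import ssl

def strict_client_context() -> ssl.SSLContext:
    """Build a client TLS context that refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()            # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject SSLv3, TLS 1.0, and TLS 1.1
    return ctx
```

A context like this can be passed to `http.client.HTTPSConnection` or `urllib.request` so that connections to endpoints offering only legacy protocol versions fail rather than silently downgrade.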
Data drives on servers holding customer data and attachments in Jira Software Cloud, Jira Service Management Cloud, Jira Work Management, Bitbucket Cloud, Confluence Cloud, Statuspage, Opsgenie, and Trello use full disk, industry-standard AES-256 encryption at rest.
PII transmitted over a data-transmission network is subject to appropriate controls designed to ensure that data reaches its intended destination. Atlassian's internal Cryptography & Encryption Policy sets out the general principles for Atlassian's implementation of encryption & cryptography mechanisms to mitigate the risks involved in storing PII and transmitting it over networks. The type of encryption algorithm used to encrypt PII must take into account the classification level of the PII in accordance with Atlassian's internal Data Security & Information Lifecycle Management. To learn more about how we collect, share, and use customer data, refer to our privacy page.
To keep up to date on additional data encryption capabilities, see our cloud roadmap.
Key management
At Atlassian, we use the AWS Key Management Service (KMS) for key management. To further ensure the privacy of your data, KMS is the originator and secret store for these keys. The encryption, decryption, and key management process is inspected and verified internally by AWS on a regular basis as part of their existing internal validation processes. An owner is assigned for each key and is responsible for ensuring the appropriate level of security controls is enforced on keys. Atlassian-managed keys are rotated upon relevant changes of roles or employment status.
We also leverage envelope encryption. AWS holds the master key, which we can never see, and any key encryption or decryption request requires the right AWS roles and permissions. When we use envelope encryption to generate keys for individual customers, we use different data keys for different types of data across our data stores. Additionally, our encryption approach at the internal application layer provides backup data keys in other AWS regions. Keys are automatically rotated annually, and the same data key isn’t used for more than 100,000 data elements.
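The envelope pattern itself can be sketched as follows. The toy XOR keystream stands in for AES-256 and must never be used in production; with KMS, the master-key operations happen inside AWS rather than in application code:

```python
import hashlib, secrets

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher standing in for AES-256 (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

def envelope_encrypt(master_key: bytes, plaintext: bytes):
    """Generate a fresh data key, encrypt the data with it, then wrap the
    data key under the master key (which, with KMS, never leaves AWS)."""
    data_key = secrets.token_bytes(32)
    ciphertext = _keystream_xor(data_key, plaintext)
    wrapped_key = _keystream_xor(master_key, data_key)
    return wrapped_key, ciphertext

def envelope_decrypt(master_key: bytes, wrapped_key: bytes, ciphertext: bytes):
    data_key = _keystream_xor(master_key, wrapped_key)  # unwrap first
    return _keystream_xor(data_key, ciphertext)
```

Because each piece of data gets its own data key and only the wrapped key is stored alongside the ciphertext, rotating or revoking the master key controls access to everything without re-encrypting the data itself.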
Soon, we will offer bring your own key (BYOK) encryption, giving you the ability to encrypt your cloud product data with self-managed keys in AWS KMS. With BYOK, you will have complete control over the management of your keys and will be able to grant or revoke access at any time, both for your own end-users and Atlassian systems.
AWS KMS can be integrated with AWS CloudTrail in your AWS account in order to provide you with logs of all key usage. This solution enables encryption of your data at different layers throughout the applications, such as databases, file storage, as well as our internal caches and event queuing. Through the whole process, there will be no product usability impact.