Onboarding is straightforward and quick.
flowchart TD;
Start["Start"] --> Slack["Say Hi in DfE Slack channel #teacher-services-infra"];
Slack --> GetCoding["Start generating the code using our template"];
Slack --> FillForm["Read /documentation/onboard-form-template.md <br> and make sure you have all the info"];
FillForm --> OnboardCall["Arrange a quick intro call via DfE Slack channel #teacher-services-infra <br> to get things explained and meet the team"];
OnboardCall --> SetUpCall["We will help you request access, and set you up to self serve"];
SetUpCall --> SetUpComplete["Secure access to platform complete"];
SetUpComplete --> ProdReady["Complete the steps in /documentation/production-checklist"]
GetCoding --> SetUpComplete;
ProdReady --> Done["Start deploying"] ;
Before starting, capture the required information upfront using the Onboarding form, even if it is subject to change in the future. It's also really important to check the Production Checklist. Your code should then be ready to roll.
Most services use the same code to deploy to AKS. It has been made into a template that will evolve over time to capture the best practices from working across multiple services. It is used both to dramatically reduce the time required to onboard a new service and as a point of reference to align standards across repositories.
Run the make new_service command with the required environment variables. Example:
make new_service SERVICE_NAME=calculate-teacher-pay SERVICE_SHORT=ctp SERVICE_PRETTY="Calculate teacher pay" DOCKER_REPOSITORY=ghcr.io/dfe-digital/teacher-pay-calculator NAMESPACE_PREFIX=srtl DNS_ZONE_NAME=calculate-teacher-pay.education.gov.uk
The command can be built up iteratively, since the script stops and reports the first variable that is missing. Example:
% make new_service
bash templates/new_service.sh
templates/new_service.sh: line 26: SERVICE_NAME: unbound variable
make: *** [new_service] Error 1
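To see everything that is still missing in one pass, rather than one variable per run, a small check can be done before calling make. This is only a sketch: it assumes the six variables from the example above are the complete required set, and missing_variables is a hypothetical helper that is not part of the template.

```shell
# Hypothetical helper, not part of templates/new_service.sh.
# Assumes the variables shown in the example above are the full required set.
missing_variables() {
  for name in SERVICE_NAME SERVICE_SHORT SERVICE_PRETTY \
              DOCKER_REPOSITORY NAMESPACE_PREFIX DNS_ZONE_NAME; do
    # Look up the value of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    # Print the name when the variable is unset or empty
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing_variables
```

An empty output means every variable is set. If the variables are set in the shell rather than on the make command line, they need to be exported for make to pass them on to the script.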
Make sure to copy all the files, including invisible files. Example:
cp -r teacher-services-cloud/new_service/. teacher-pay-calculator
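The trailing /. on the source path is what makes cp include hidden entries such as a .github directory. The difference can be checked in a temporary directory; the file and directory names below are made up for the demonstration.

```shell
# Demonstration only: the names below are illustrative, not the real template files
workdir=$(mktemp -d)
mkdir -p "$workdir/new_service/.github" "$workdir/with_dot" "$workdir/with_glob"
touch "$workdir/new_service/Makefile" "$workdir/new_service/.github/workflow.yml"

# "source/." copies the contents of the directory, hidden entries included
cp -r "$workdir/new_service/." "$workdir/with_dot"

# "source/*" relies on shell globbing, which skips names starting with a dot
cp -r "$workdir/new_service/"* "$workdir/with_glob"

ls -A "$workdir/with_dot"   # shows both .github and Makefile
ls -A "$workdir/with_glob"  # shows Makefile only
```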
The code covers most common use cases, but it may be necessary to amend it. Examples:
- By default the code deploys a postgres database, but the service may not need it
- The only environment configurations are development and production. The service may need more or use different names.
- The web application uses /healthcheck as health probe. It can be changed to another path or disabled by passing null.
These steps must be done by the infra team.
Raise a PIM request to either:
- s189-teacher-services-cloud-test for non production environments
- s189-teacher-services-cloud-production for production environments. This must be approved by a third party.
Then log in to Azure using az login.
This creates the minimum Azure resources required to run terraform, i.e. a storage account and keyvaults.
- Validate: make <environment config> validate-arm-resources. Example: make development validate-arm-resources
- Deploy: make <environment config> deploy-arm-resources. Example: make development deploy-arm-resources
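The two commands can be chained so that the deployment only runs when validation succeeds. A possible wrapper, where deploy_arm_resources is a hypothetical function name and not something provided by the Makefile:

```shell
# Hypothetical wrapper around the two make targets described above
deploy_arm_resources() {
  # $1: environment config, for example development
  # && stops the chain if validation fails, so nothing is deployed
  make "$1" validate-arm-resources && make "$1" deploy-arm-resources
}
```

It would be called as deploy_arm_resources development from the root of the service repository.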
Amend the AD group of the area:
- Add the namespaces and resource groups to the AD groups spreadsheet. For instance if the service is in BAT, edit the BAT groups (delivery team and production PIM).
- Raise CIP requests to amend the 2 groups
- The developers should now have access to continue with the set-up
In the service repository, run the Makefile commands.
If Statuscake is not required at this stage, comment out resources in terraform/application/statuscake.tf and the provider in terraform/application/terraform.tf.
If it is:
- Request a user account and an API key
- Create a secret "STATUSCAKE-API-TOKEN" in the "inf" keyvault, with the API key as value
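Once logged in with az login, the secret can be created with the Azure CLI. A minimal sketch: store_statuscake_token is a hypothetical helper, and the keyvault name passed to it is a placeholder for the service's actual "inf" keyvault.

```shell
# Hypothetical helper; the keyvault name and API key are supplied by the caller
store_statuscake_token() {
  # $1: name of the "inf" keyvault, $2: StatusCake API key
  # --output none avoids echoing the secret value back to the terminal
  az keyvault secret set \
    --vault-name "$1" \
    --name STATUSCAKE-API-TOKEN \
    --value "$2" \
    --output none
}
```

For example: store_statuscake_token <inf-keyvault-name> "$STATUSCAKE_API_KEY", where STATUSCAKE_API_KEY is an assumed variable holding the key. Passing the key through a variable keeps the literal value off the command line that ends up in the shell history.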
Deploy the application, ingress, database...
- Plan: make <environment config> terraform-plan. Example: make development terraform-plan
- Apply: make <environment config> terraform-apply. Example: make development terraform-apply
The new application uses a default domain in test.teacherservices.cloud in test and teacherservices.cloud in production. Usually a custom domain is required to present a familiar domain to end users, ending in either education.gov.uk or service.gov.uk.
The code deploys an integrated DNS zone and Azure front door in the production subscription.
Before proceeding, ensure the following:
- domain.sh exists in global config and there's a make command for it
- The files present in terraform/domains/infrastructure match the sample structure in templates/new_service/terraform/domains/infrastructure, with the appropriate configuration
- The files present in terraform/domains/environment_domains match the sample structure in templates/new_service/terraform/domains/environment_domains, with the appropriate configuration
- Validate: make domains validate-arm-resources
- Deploy: make domains deploy-arm-resources
This deploys one DNS zone and one front door which will be used subsequently by all the domains.
- Plan: make domains-infra-plan
- Apply: make domains-infra-apply
Take note of the name servers of the newly created DNS zone and request delegation from either:
- Infrastructure and networks for the education.gov.uk zone
- GDS for the service.gov.uk zone
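After the delegation request has been actioned, it can be confirmed by comparing the name servers Azure assigned to the zone with the ones public DNS returns. The sketch below only does the comparison; delegation_matches and normalise_ns are hypothetical helper names, and the commands in the trailing comments assume dig is installed and use placeholders for the resource group and zone.

```shell
# Normalise a list of name servers (one per line): lower-case,
# strip the trailing dot, and sort so that order does not matter
normalise_ns() {
  tr 'A-Z' 'a-z' | sed 's/\.$//' | sort
}

# Succeeds when both lists contain the same name servers
# $1: name servers reported by Azure, $2: name servers returned by public DNS
delegation_matches() {
  [ "$(printf '%s\n' "$1" | normalise_ns)" = "$(printf '%s\n' "$2" | normalise_ns)" ]
}

# Possible usage, with placeholders to fill in:
#   azure_ns=$(az network dns zone show --resource-group <resource-group> \
#     --name <zone-name> --query nameServers --output tsv)
#   public_ns=$(dig +short NS <zone-name>)
#   delegation_matches "$azure_ns" "$public_ns" && echo "delegation complete"
```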
This configures the custom domain for a particular environment in the previously created DNS zone and front door.
- Plan: make <environment config> domains-plan
- Apply: make <environment config> domains-apply
When running the make commands for deployment, terraform uses the Azure credentials provided by az login.
When running in a GitHub Actions workflow, it uses a service principal key. Follow the process in the documentation to create AZURE_CREDENTIALS per environment.
- Use the s189t01-tsc-contributor service principal for non production environments to deploy to the test cluster
- Use s189p01-tsc-contributor for production environments to deploy to the production cluster
Example workflow:
- name: Checkout code
  uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
  with:
    terraform_version: 1.5.1
    terraform_wrapper: false
- uses: DFE-Digital/github-actions/set-arm-environment-variables@master
  with:
    azure-credentials: ${{ secrets.AZURE_CREDENTIALS }}
- name: Apply Terraform
  run: make ci ${{ matrix.environment }} terraform-apply
  env:
    DOCKER_IMAGE_TAG: ${{ needs.build.outputs.docker-image-tag }}

Follow the production checklist to make sure the service is ready for end users.
It may be that the container fails to start. By default, Kubernetes probes the web container on the /healthcheck endpoint and expects a successful response with a 200 HTTP code. If the application doesn't have this endpoint or returns an error, it will keep trying until it times out.
Solutions:
- Investigate the pod using kubectl logs
- If using rails, it may be that config.force_ssl = true. Kubernetes uses HTTP, so the endpoint must be excluded. Ref: rails documentation
- If your app uses a different endpoint, change it using: probe_path = "/endpoint"
- If your app doesn't have a healthcheck endpoint yet, disable the check temporarily using: probe_path = null
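When investigating, the pod events are often as useful as the logs, since failed probes are recorded there. A small sketch that gathers both; inspect_pod is a hypothetical helper, and the namespace and pod name are placeholders to be supplied by the caller.

```shell
# Hypothetical helper for looking at a pod that fails to start
inspect_pod() {
  # $1: namespace, $2: pod name
  # describe shows the pod events, including failed probe attempts
  kubectl --namespace "$1" describe pod "$2" &&
  # logs shows what the application printed while starting
  kubectl --namespace "$1" logs "$2"
}
```

For example: inspect_pod <namespace> <pod-name>, using a pod name taken from kubectl --namespace <namespace> get pods.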