How i migrated a web service in AWS ECS

27th Jan 2024 devops research

A. Championship & Tournament mobile application's web service, that i developed on Python Django framework between 2020 and 2021, was served on PythonAnywhere, hosting platform dedicated to Python.

During the course of 2023 i received a warning from the platform that was notifying an upcoming update of the virtual machine environment, that exposed how it was hard, with the current system, to replicate the execution environment and handle dependencies and isolation. So I decided to "dockerize" web service for, then, serve it on a production environment compliant to this purpose.

On that time I had just performed the lift-and-shift of my personal website on Kamatera (based on Grav CMS), taking advantage of Docker Compose to configure the environment. However for this project I considered AWS to be more suitable because it would have allowed me to arrange the infrastructure for a possible future "scaling"; moreover the migration would have contributed to the training and the deepening of Amazon services that I was carrying on concurrently.

It's correct to consider this migration as an exercise that together with others contributes to the formation I undertaken with the goal of achieving the AWS Certified Developer - Associate certification.

Guided by my professional sensitivity and principled by "Well Architected Framework" guidelines, it was primary for me to make sure that server infrastructure was as automated, replicable, parametric, reconfigurable and self-descriptive as possible; therefore I designed and implemented the architecture via IaC, describing it in an AWS Cloud Formation stack equipped with several parameters and in line with the aforementioned principles. Furthermore I implemented several bash commands in the userData of the Launch Configuration aimed at keeping necessary setups always consistent, in case of downtime and hosting instance reboot.

Moreover I generated a symmetric key through EC2 Key Paris dedicated to the project to be associated with various resources for authentication.

Follows the list of AWS services and tools I involved in the making of this architecture: EC2, Auto Scaling, EBS, Cloud Formation, Elastic Container Service, Elastic Container Registry, Route 53, IAM, Secret Manager, VPC, AWS CLI, Lambda, Cloud Watch, S3, SDK Boto3 per Python e Simple Notification Service.

Docker porting
Preliminary analysis of services
- Elastic Container Service
- RDS Database
Calculating cost estimate
- EC2 instance
- Summary of costs
EBS persistent volume
Hosted Zone in Route 53
- SSL certificate
Task Execution Role
Configuration of the infrastructure with Cloud Formation
ECS resources
Task Definition
Launch Template
"Generic" Secret
Network infrastructure
Security Group
Budget management
Account security

Docker porting

I developed A. Championship & Tournament web service based on Django framework. Originally developments and tests used to be executed in a Python virtual environment; deployment was then performed on a respective virtual environment. This environment structuring (typical of the Python web applications), if properly handled, allowed a good level of replication of the production environment on the development machine.

Following the needings described in the previous paragraph, I executed a "dockerization" of the environment, by creating a Dockerfile. In order to orchestrate the web service and the database I wrote down a docker-compose.yml file; in this way I'm able to test the image locally and in production without having to worry of managing possible configuration differences between the environments. Moreover the docker-compose file was useful as starting point of the Task Definition of AWS Elastic Container Service.

In addition, through the .dockerignore, I ruled out from the image's build all the unnecessary files stored in the project structure, to prevent an useless oversizing of the image.

Follows the main instructions of the Dockerfile:

FROM python:3.9
ENV DOCKERHOME=/home/app/webapp
RUN mkdir -p DOCKERHOME
WORKDIR $DOCKERHOME
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
RUN pip install --upgrade pip
COPY . $DockerHOME
RUN pip install -r requirements.txt
EXPOSE 8000

Docker Compose file is similar to the Task Definition I have described in a following paragraph: application is served with gunicorn on port 8000 (over which the reverse proxy manages requests), by executing collectstatic and migrate commands and mounting necessary volumes. The container is generated from the image achampionship-webservice:xxxx.xx.xx produced by the build. Through the .env file I configured environment variables of the container. In the same Docker Compose I configured the MySQL container mapping the port and the data persistence volume, so I let Docker Compose generate an internal network so that the two containers were able to communicate.

Lastly I prepared AWS CLI commands to be executed for publishing a new version of the image on AWS Elastic Container Registry:

AWS CLI authentication:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <id account>.dkr.ecr.us-east-1.amazonaws.com

image tagging with the ARN of the ECR repository:

docker tag "$(docker images -q achampionship-webservice:xxxx.xx.xx)" <id account>.dkr.ecr.us-east-1.amazonaws.com/achampionship-webservice:xxxx.xx.xx

push of the new version on ECR:

docker push <id account>.dkr.ecr.us-east-1.amazonaws.com/achampionship-webservice:xxxx.xx.xx

Preliminary analysis of services

Elastic Container Service

ECS fees are related to the used model: Fargate or EC2. By doing a research on the documentation it came out that Fargate is not suitable for data persistence (I think about static files of the webserver and, of course, the data of database), therefore I focused on finding a cost-effective setup of EC2 through AWS Pricing Calculator.

RDS Database

For the datastore layer the most immediate choice would have been AWS RDS for MySQL, but RDS is a very expensive service due to backups and the other management systems it makes available (https://www.askhandle.com/blog/why-the-rds-is-so-expensive).

Usage time cost is very high: only the instance db.t1.micro charges 0,025USD/hour, that means more than 18,25 USD monthly.

Possibly I would have gotten a lower price by using the Reserved Instance, even if its estimate is not available in the Pricing Calculator. In addition I wouldn't have had superuser privileges, nor the access to the OS on which the database is installed. RDS has been conceived to simplify DBAs workload, but achampionship-webservice's database is very basic and this kind of support is not required.

I decided to store data on an additional volume to be associated with the EC2 instance; this volume shall survive the server infrastructure: it keeps database files and project initialization configurations (such as nginx.conf).

Calculating cost estimate

Considering the nature of the project and the related traffic (circa 2000 users and ten daily active users) I limited the budget to 10/15 USD monthly. Domain expenses (since long ago registered on Aruba) and parallel hosting on PythonAnywhere (that has been up-and-running until the consolidating of the new infrastructure) are excluded.

I leveraged AWS Pricing Calculator to estimate several combinations of costs, based on the US North Virginia Region.

EC2 instance

Initially, concerning memory, I hadn't particular expectations; sticking to Stack Overflow:

Normally for a typical Django Application it would take 60 - 80 MB for a Django app with database connections, for a Django app which only requires a little bit of database connections, only takes up about 18 MB memory. For a more sophisticated Django app which requires queueing up tasks, sending emails, database connections, user logins, etc, it would require about 130 MB.

So I chose the t4g.nano, equipped with 2 vCPU, 0.5 GiB of memory and up to 5B of Network Performance.

I selected the Compute Saving Plan payment options with an advance of 15 USD, that lowered the cost to 1,31 USD / monthly versus 3,8 required by using on-demand instances for 100% of the time.

Afterwards, though, I found some limitations with this kind of instance: the first is the incompatibility with x68_64 architecture, the second is the memory: 0.5GB turned out to be insufficient to run my webservice.

I searched for EC2 instances with the characteristics cited above making use of the following command:

aws ec2 describe-instance-types --filters Name=processor-info.supported-architecture,Values=x86_64 Name=memory-info.size-in-mib,Values=1024

In conclusion the most convenient instance equipped with the needed characteristics is the t3a.micro, where the monthly cost of 6,86 can be cut down to 4,96 USD with 1-year Saving Plans (without any advance).

Summary of costs

I ended the preliminary analysis by deciding to keep both the database and the server in the same instance, since probably I will never need to scale them one-by-one. I will consider to increase the maximum scaling if necessary, or to vertically scale the EC2 instance setup by switching to a more performant typology.

The infrastructure as-it-is-thought is anyway arranged to further possible future scalings, for instance the migration of the database from the EC2 instance to RDS is manageable also through some AWS provided services.

Calculated monthly pricing (after taxes, USD) follows:	Cost	Service
4,96	EC2	*t3a.micro* instance, 1-year option ofSavings Plan (otherwise it would be 6,86)
3,20	EBS	30 GB of float root volume, in addition to 2 GB for the "persistent" volume, 32 GB in total
0,11	ECR	private registry on which web service image is hosted
0,80	SM	for secrets registration
0,50	Route 53	fixed for the hosted zone
0,00	Cloud Watch / SNS	alert messaging through SNS and Cloud Watch. Cloud Watch alarms are free (cloudwatch/pricing) while SNS cost is supposed to never exceed a cent.

Total: 9,57 USD (11,68 USD gross)

EBS persistent volume

Beside the infrastructure, I manually configured, from AWS console, an EBS volume to be dynamically mounted on the EC2 instance, in order to store the container's persistent files. The volume was created separately from the stack to make sure that data was preserved by the application in the case of a rollback of the Cloud-Formation-generated infrastructure.

On this volume I periodically carry out back-ups that I download through FTP and store on physical machines or disks.

The connection of the container and of the EC2 instance to this volume is ensured by bash command in the Launch Configuration's userData (described in the next chapter): they, on instance startup, perform the volume attachment and then mount it in the /mnt/achampionship directory. In this directory, then, I created two subfolders:

mysql, on which MySQL Docker container in turn performs the mount of the persistent volume;
nginx/conf.d, on which nginx setup file is and on which the related container mounts its persistent volume.

Hosted Zone in Route 53

In order to handle DNS I generated in Route 53 a Hosted Zone dedicated to championshiptorunament.com. That domain has been purchased on Aruba in 2018 and I'm not intending to transfer it to AWS so far. Nevertheless it needed to be associated with the IP address of the instance exposing the web service through HTTP and HTTPS protocols, so during this structuring I also migrated name server pointings of Aruba to Route 53. Until the day of the migration, the subdomain "service.championshiptournament" (address requested by the production mobile application) pointed to the CNAME of the old server, however I configured a low TTL in order to make the transfer to the new IP address, on the day of the official migration, as fast as possible.

Despite the migration of the DNS server, the other host references have been keeped, for example the several configurations of the e-mail service of Aruba and the subdomain "www.championshiptournament.com" that points to the server of the web site.

At a certain time I thought of managing the routing of Hosted Zone towards the service through a Load Balancer DNS. However the Load Balancer service is too expensive related to the budget of this project, therefore I managed the Hosted Zone - IP association through the Launch Configuration's userData commands. These commands, at the instance startup, edit the Record Set related to subdomain of the service, making it pointing to the IP address of the EC2 instance that the Auto Scaling Group is starting up, and on which the ECS Cluster Task is putting in run the web service containers.

SSL certificate

Even before the migration, I generated an SSL wildcard certificate, through certbot, related to all the subdomains of championshiptournament.com. To do so, on Route 53, I changed the DNS record _acme-challenge setting it with the code emitted by Let's Encrypt to validate the certificate. This "anticipated" operation saved me from being forced to transfer the certificate (that moreover is not recoverable on PythonAnywhere) on the official migration day, making the migration faster and reducing unavailability risks. Indeed, at the time of migration, the subdomain service.championshiptournament turned into an alias of web.championshiptournament, where, before the official migration, the "new" application had been served. After more than 72 hours, when the DNS propagation worldwide was sure, service.championshiptournament has been routed directly on the server IP address and web.championshiptournament.com has been decommissioned after as many hours.

This mechanism allowed to perform domain migration without having to force users to update the app; reachability continuity of the service through the subdomain service.championshiptorunament has been ensured.

Task Execution Role

Concerning resources' roles, I created an IAM Role to be set up as executionRoleArn in the Task Definition. The documentation explains why is necessary to set up this role:

Your tasks are hosted on either AWS Fargate or Amazon EC2 instances and...

is using private registry authentication. For more information, see Required IAM permissions for private registry authentication.

the task definition is referencing sensitive data using Secrets Manager secrets or AWS Systems Manager Parameter Store parameters. For more information, see Required IAM permissions for Amazon ECS secret.

Once created with the name ECSTaskExecutionRole through console (as pointed in the documentation link above) it turns out the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ecs-tasks.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Configuration of the infrastructure with Cloud Formation

I designed and configured most of the architecture through the drafting, in yml format, of a CloudFormation stack template (Infrastructure as Code tool by AWS) where I defined the setup of the network infrastructure, the ECS resources, the containers, the EC2 instance and the secrets, all of this with the goal of facilitating, centralizing, debugging and describing the whole of resources needed by web service functioning.

The stack I wrote involves a series of parametrizations such as the environment's name, the web service ECR image's ARN, the additional EBS volume's ID, some environment variables, passwords, the IAM roles to be associated and so on. This allow it to be reused in case it would be necessary to set up an analogous infrastructure with different parameterizations. Consequently I took care of maintain the stack constantly updated with any infrastructure change through ChangeSets.

The drafting of the template has been carried out leveraging the AWS CLI validation command: aws cloudformation validate-template:

aws cloudformation validate-template --template-body file://percorso/file

Once defined the .yml template, I uploaded the file into a specific S3 bucket in order to link it to Cloud Formation for the stack creation.

ECS resources

In first instance I configured the ECS Cluster which contains the Service and so the Task executing on the Container Instance. It hosts the Docker containers of the web service, of the ECS Agent and of the MySQL database. To link the Cluster to the Auto Scaling Group I configured a Capacity Provider, activating the Manage Scaling on it and linking it in turn to the group.

So I associated the Auto Scaling Group to the Launch Template (described in a following paragraph) and set up Max Size, Min Size and Desired Capacity to 1 (project needs don't require more capacity).

Lastly I configured the ECS Service too, in order to always ensure at least one execution Task inside the Container. Values are always the minimum ones, Desired Count and weight of the Capacity Provider both set to 1.

Task Definition

Task Definition, which describes the execution of Tasks inside the ECS Cluster, is linked to the Service and describes the three containers that has to be run on the EC2 instance: the nginx reverse proxy, the championship web service and the MySQL database. I linked to it the task execution Role described previously, setting it up compliant to EC2, to the X86_64 CPU architecture and to the Linux operating systems family. I didn't set up any generic hard limit on the CPU usage, preferring to set them up at the level of each single Container Definition; while I did it for the RAM memory usage, by setting it to 922 MB (the instance in use has 1 GB).

At the Task Definition level I then configured following volumes:

a Docker volume locally defined to map static files (for "local" I mean the "floating" root EBS volume); this is shared by web service and nginx;
a mysql-volume "host" volume having the source directory /mnt/achampionship/mysql. The path /mnt/achampionship is the one on which the instance, on startup, mounts the EBS volume;
a nginx-volume "host" volume having the source directory /mnt/achampionship/nginx/conf.d.

Container Definitions: reverse proxy

Sticking to the guidelines I configured a reverse proxy with nginx. Reverse proxy allows to enhance performance, reliability and security, in fact is a bad practice to delegate client requests directly to the web server gunicorn. Furthermore, through nginx, it was possible to serve static files in the Django production environment.

From Stack Overflow (but there is a lot of more literature about):

[...] in production we use gunicorn/uwsgi to boot the django app, so the question can be "Do we need Nginx if we have configured gunicorn/uwsgi for Django?". The answer is YES, because compared with gunicorn/uwsgi, Nginx has following advantages:

security, it can be configured to deny or allow certain ip addresses

load balancing

handle static files

cache

...

nginx configuration is written inside the nginx.conf file (linked to the container through host volume). I wrote the file separately during the persistent EBS volume configuration phase: it contains the necessary configurations to route all the requests on the achampionship-webservice container's port 8000 and to serve static files from /static/ URL. Inside it are also linked the SSL navigation certificates.

Then I configured nginx container by exposing ports 80 and 443 (respectively for HTTP and HTTPS navigation) and by mapping the following volumes:

nginx-volume "host" volume, read-only mapped on /etc/nginx/conf.d and containing the nginx.conf file;
statics Docker volume, read-only mapped on /home/app/webapp/staticfiles, where the web service container copies static files. The directory staticfiles is then referenced by nginx.conf to route the requests to static files.

Container Definitions: web service

achampionship web service's container is essential, with CPU hard limit to 512 and generated by the image achampionship-webservice:xxx.xx.xx, stored on a private registry. For each new version I build the image on my development PC with a new versioning, then I publish it, using AWS CLI and docker pull, on the arranged repository in Elastic Container Registry.

Port 8080 of the container (standard port of the Python Django applications) is mapped on 8000 of the server, in order to be available to nginx for the management of the requests.

Then I mounted the statics local volume on the directory /home/app/webapp/staticfiles. Here, on startup, the application writes the static files.

I configured the container execution command by concatenating the following sh instructions:

python manage.py collectstatic --noinput to collect the new static files of the project to the static folder from which they could be served via reverse proxy;
python manage.py migrate to execute possible database migrations required by updates;
gunicorn achampionship.wsgi:application --bind 0.0.0.0:${APP_PORT} to start the web server with gunicorn listening on port 8000 (defined via environment variable APP_PORT).

Lastly I mapped the function-essential environment variables (such as DB_USER, DB_NAME, DJANGO_DEBUG, APP_PORT, ecc...) and the secret environment variable, which value extracted from Secrets is not directly exposed in the stack parametrization. They are:

DJANGO_SECRET: retained by the "generic" secret which holds several credentials in an encrypted key/value map;
GOOGLE_CLIENT_SECRET: Google services authentication secret for SSO, retained in the "generic" secret too;
EMAIL_ARUBA_PASSWORD: Aruba password for sending emails from the application (retained by the "generic" secret too);
DB_PASSWORD: retained by the secret created outside the stack (it has to be persistent because MySQL container stores it in the EBS volume of the database) of which the value is automatically generated by Secret Manager based on the following CLoud Formation configuration:
```
"ActMySqlPasswordSecret": {  
"Type": "AWS::SecretsManager::Secret",  
"Properties": {
[...]
"GenerateSecretString": {  
  "PasswordLength": 51,  
  "ExcludeCharacters": "\"@/\\"  
},  
"SecretString": String,  
[...]
}  
}
```

Container Definitions: database

The configuration of the Container Definition for the MySQL database was less elaborated: I set CPU limit to 512, mapped the port 3306 and mounted the volume on the data source of MySQL. Clearly I mapped is the "host" volume, the one related to the path /mnt/achampionship/mysql, on which the EBS persistent volume is mounted.

Furthermore, obviously, I mapped necessary environment variables, getting the password value from the secret generated by Secret Manager and described in the previous paragraph.

Launch Template

Il template di lancio è stato un punto chiave della strutturazione del server, soprattutto a causa dei numerosi comandi bash che ho dovuto integrare nel "userdata" per gestire automaticamente configurazioni e comunicazioni dell'istanza con ECS, EBS e Route 53.

I configured the Launch Template to execute EC2 instances on typology t3a.micro (1GB of memory, 2 vCPU, I explained this choice in a previous paragraph) using Amazon Linux AMI compatible with ECS as operating system, because it installs by itself Docker and the ECS Agent without requiring further set ups.

Then I linked the Launch Template to the Security Group, configured on the network infrastructure set up stage, and I assigned an Instance Profile responsible for assuming the "ecsInstanceRole" on which I enabled the following policies:

policies to manage communication with ECS;
getSecretPolicy to extract the secrets required in the environment variables (e.g. passwords) from the Secret Manager
policies to pull images from ECR
policies to associate the instance to the persistent EBS volume (ec2:AttachVolume, ec2:DetachVolume ed ec2:DescribeVolume);
policies to set up the Record Set's IP address of the referring hosted zone (route53:ChangeResourceRecordSets);
policies to read the secret containing the database password (secretsmanager:GetSecretValue), needed by the scheduled backup script.

I wrote bash commands in the userData (in order to execute them only at the first start up). These commands ensure necessary configurations of the interaction between EC2 instance and the related resources without requiring a manual intervention to reset them; therefore they ensure functioning and reachability of the application in the case of a termination and a reboot of new EC2 instances. This kind of phenomena can take place for several causes, such as an hosting machine shutdown or just due to the autoscaling operations. userData are made up of the following steps:

writing a custom file on which the afterwards commands will be logged:
```
#!/bin/bash
exec >>/var/log/userdata-setup.log
```

attachment of the EBS volume dedicated to persistent files:

# Comandi per l'installazione e l'autenticazione in AWS CLI
# [...]
# Attachment del volume
aws ec2 wait volume-available --volume-ids <id volume>
aws ec2 attach-volume --volume-id <id volume> --instance-id $(ec2-metadata --instance-id | cut -d " " -f 2) --device /dev/sdf --region us-east-1

waiting of attachment completion, otherwise the afterward mount command would fail:

aws ec2 wait volume-in-use --volume-ids <id volume> --filters Name=attachment.status,Values=attached Name=attachment.instance-id,Values=$INSTANCE_ID

mount of the volume on the directory responsible for data mapping (/mnt/achampionship):
```
mkdir /mnt/achampionship
mount /dev/nvme1n1 /mnt/achampionship
```

set up of the IP address of the instance as IP address of the Record Set in the Hosted Zone of Route 53; this ensures that domain redirect always routes towards the running EC2 instance:

[...]
hostnamectl set-hostname hostname."${!DOMAIN_NAME}"
[...]
aws route53 change-resource-record-sets --hosted-zone-id "${!HOSTED_ZONE_ID}" --change-batch '{"Changes": [{"Action": "UPSERT","ResourceRecordSet": {"Name": "'"web.championshiptournament.com"'","Type": "A","TTL": 60,"ResourceRecords": [{"Value": "'"${!PUBLIC_IP}"'"}]}}]}'
[...]

configuring with crontab the cron job that daily invokes the Python script that sends application-usage statistics:

touch /home/ec2-user/achampionship-cronjobs.log
(crontab -l 2>/dev/null; echo ""; echo "0 12 * * * echo 'EXECUTED AT' $(date) && sudo docker exec -t ${EnvironmentName} sh -c \"python /home/app/webapp/manage.py shell < /home/app/webapp/webservice/scripts/analisi_iscrizioni.py\" >> /home/ec2-user/achampionship-cronjobs.log") | crontab -

configuring the cron job that daily invokes tthe SH script I wrote to manage database backups (it's set at italian midnight because corresponds with evening or night in most of the world):
```
(crontab -l 2>/dev/null; echo ""; echo "0 0 * * * sh /mnt/achampionship/scripts/db-backup.sh &>> /home/ec2-user/achampionship-cronjobs.log") | crontab -
```
setup of the ECS Cluster on which the instance's ECS Agent should listen to run the containers described by the execution task in the Cluster itself:
```
echo ECS_CLUSTER=${EnvironmentName}-cluster >> /etc/ecs/ecs.config
```

"Generic" Secret

Always through Cloud Formation stack, I created a Secret "generic" (as I define it) on which store several known secret keys including the Google Client Secret (for the authentication SSO of the app) and the Aruba password (for the emails). This was done to save the cost of possible further Secrets taking advantage of the key-value structuring supplied by Secret Manager. Through the specific Cloud Formation syntax, then, I made resources point the values they needed, for example in the Task Definition of the web service:

Secrets:
    [...]
    - Name: GOOGLE_CLIENT_SECRET
      ValueFrom: !Sub  "${ActGeneralSecret}:googleClientSecret::"

It's almost useless to tell that the real value of the secret was not exposed unencrypted inside the template, indeed it was parameterized via the String attribute with the option NoEcho: true, that conceals its characters when inserting the parameter during the stack creation.

Network infrastructure

For managing network infrastructure I thought to isolate all the application related resources inside a VPC built on purpose. It's configured with DNS Hostnames and DNS support enabled, CIDR block set up to 10.0.0.0/27, that is 32 IP addresses (27 actually available). I did it because I glimpsed the necessity to have a larger amount of addresses at disposal, since project size may increase.

Inside the VPC I implemented two subnets:

a public subnet, linked through Route Table to an Internet Gateway, with the first 16 addresses (CIDR 10.0.0.0/28) and on which the Launch Configuration and all the necessary resources are linked;
a private subnet (CIDR 10.0.0.16/28), currently unused, that I arranged anyway for eventual future usages (for example to host the web service into a private network making it available either via NAT Gateway or through a Load Balancer on the public one; or to host other resources that might be used in future).

On the ACL network interfaces, unlike I planned initially, I didn't add any configuration other than the default ones, delegating the ports' and the addresses' protection to the inbound/outbound rules of the Security Group.

I configured the VPC, the subnets, the Internet Gateway, the Route Tables and the Route Table Associations exclusively via Cloud Formation template. Therefore the network infrastructure, even if not that much articulated, is entirely parametrized inside the stack.

Security Group

In the scope of network infrastructure I configured a Security Group responsible for management of the project-related resources traffic. I linked the Security Group to the Launch Configuration to make sure Auto Scaling's instances "inherited" it.

The Security Group is configured to allow any type of traffic towards any outbound destination. Inbound traffic, on the other hand, is ruled in the following way:

port 22 for the SSH access to the instance and to the database, open only to my development device's IP address;
port 8000, for a matter of debugging, open only to my IP address;
port 443 and 80 open to the web traffic (CidrIp: 0.0.0.0/0) in order to make them available to the mobile app on any device.

Budget management

Budget alerts on Cloud Watch services

I sat up some per-service-alarms with Cloud Watch, after having configured notification at my email address with Simple Notification Service:

EC2 Budget ACT with condition: EstimatedCharges >= 9 for 1 data point within 6 hours;
ECR Budget ACT with condition: EstimatedCharges >= 0,15 for 1 data point within 6 hours;
Secret Manager Budget ACT with condition: EstimatedCharges > 1 for 1 data point within 6 hours;
Route 53 Budget ACT with condition: EstimatedCharges >= 0,7 for 1 data point within 6 hours.

Global budget overruns alert with AWS Budget

Afterwards I made use of AWS Budget service, accessing as root, to set up a monthly costs cap based on my estimates described in the summary. Therefore I set up a monthly budget model for A. Championship & Tournament with an estimated amount of 12 USD and supplying my email address as recipient of the notification.

The console tells:

All the AWS services are included in the scope of this budget.

You will receive a notification when 1) the effective expense will hit 85%; 2) the estimated expense will hit 100% 3) the effective expense will hit 100%.

Resources shutdown on critical budget overruns with AWS Lambda

A. Championship & Tournament is a demo training and self-produced project, therefore it doesn't bring significant revenues that may be worth a certain kind of investment, except for possible future surprises. So, in order to safeguard my credit card from unexpected situations, I wrote a Lambda function that shutdowns the EC2 instance of the web service. That function, when invoked, set Min Size and Max Size of the Auto Scaling Group to 0, therefore brings the Desired Capacity to 0 automatically, shutting down the running instance. It's triggered by a further Cloud Watch alarm set slightly higher than the budget threshold (e.g. at 25 USD).

The Lambda function that is triggered by the alarm uses the Python Boto3 SDK to update properties of the Auto Scaling Group:

import boto3

def lambda_handler(event, context):
    client_as = boto3.client('autoscaling')
    response = client_as.update_auto_scaling_group(
        AutoScalingGroupName='achampionship-webservice',
        MinSize=0,
        MaxSize=0,
    )

    print(response)

Naturally I had to equip the function's Execution Role with the policy that allows editing the Auto Scaling Group's attributes (autoscaling:UpdateAutoScalingGroup).

On CloudWatch I created the alarm by setting it to 25 USD of global billing, then I configured an Event Bus in AWS Event Bridge reserved to the application, and then a rule that triggers the Lambda function with the following Event pattern:

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "resources": ["arn:aws:cloudwatch:us-east-1:492450522567:alarm:critical-budget"]
}

Account security

I created a user responsible for the web service management that I associated with a new User Group to which are assigned the policies that permit access only to the project related resources.

In fact, on each resource of the infrastructure, I added the tag "Project" valuing it with the name of the project. This configuration has been managed, through parameter, directly in the Cloud Formation stack. To this new User group "achampionship-webservice" I assigned the policies to restrict operations on the resources so tagged.

Based on this documentation, I created a policy (achampionship-webservice-access) that restricts access to resources "by tag":

{
    "Version": "2012-10-17", 
    "Statement": [ 
    {  
        "Effect": "Allow", 
        "Action": ["ec2:*"],
        "Resource": "arn:aws:ec2:us-east-1:492450522567:instance/*", 
        "Condition": {  
            "StringEquals": {"aws:ResourceTag/Project": "achampionship-webservice"} 
        } 
    },
    {  
        "Effect": "Allow", 
        "Action": ["secretsmanager:*"], 
        "Resource": "arn:aws:secretsmanager:us-east-1:492450522567:secret/*", 
        "Condition": {  
            "StringEquals": {"aws:ResourceTag/Project": "achampionship-webservice"} 
        } 
    },
    {  
        "Effect": "Allow", 
        "Action": ["ecs:*"], 
        "Resource": [
            "arn:aws:ecs:us-east-1:492450522567:cluster/*",
            "arn:aws:ecs:us-east-1:492450522567:container-instance/*",
            "arn:aws:ecs:us-east-1:492450522567:service/*",
            "arn:aws:ecs:us-east-1:492450522567:task/*",
            "arn:aws:ecs:us-east-1:492450522567:task-definition/*",
            "arn:aws:ecs:us-east-1:492450522567:capacity-provider/*"
        ],
        "Condition": {  
            "StringEquals": {"aws:ResourceTag/Project": "achampionship-webservice"} 
        } 
    },
    {  
        "Effect": "Allow", 
        "Action": ["ecr:*"], 
        "Resource": "arn:aws:ecr:us-east-1:492450522567:repository/*",
        "Condition": {  
            "StringEquals": {"aws:ResourceTag/Project": "achampionship-webservice"} 
        } 
    },
    {  
        "Effect": "Allow", 
        "Action": ["cloudwatch:*"], 
        "Resource": "arn:aws:cloudwatch:us-east-1:492450522567:alarm/*"
    }
    ] 
}

Previous Post Next Post