Wednesday, January 31, 2018

Getting started with AWS Free Tier

Introduction

Please note that I am a beginner with AWS; this blog reflects my current understanding and may not be totally accurate.

Knowledge of AWS has become more or less mandatory these days, so I created an account. Some facilities are free for a year (within usage limits), while others (the "Non-expiring Offers") are free for life (again, within limits). See https://aws.amazon.com/free/ for details.



For example, the EC2 free tier gives us 750 hours per month on a t2.micro instance. That is enough for one instance to run continuously for the whole month; if we want to run multiple instances, the hours are divided between them.
So it's good practice to stop your instances once you have finished a practice session with them, to save on hours.
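For example, with the command-line tools installed (more on these below), a practice instance can be stopped and resumed like this; the instance-id is a made-up placeholder:
--------------
# Stop the instance at the end of a practice session (no hours are consumed while stopped):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Start it again for the next session:
aws ec2 start-instances --instance-ids i-0123456789abcdef0
--------------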

Billing

Billing with AWS has quite complex clauses, e.g. transfer of data, transfer outside a region, transfer over a public IP, non-usage of Elastic IPs, number of I/O reads, etc. Also, the policy is never to stop execution when you exceed a limit, but to charge you. So it's better to subscribe to billing alerts at https://console.aws.amazon.com/billing/home?region=us-east-1#/preference
(region=us-east-1 will change as per your settings) and get notified in time.
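A similar alarm can also be created from the command line; the threshold, alarm name, and SNS topic ARN below are made-up examples, and billing metrics must first be enabled on the preferences page above:
--------------
# Alarm when the estimated monthly charge exceeds 5 USD.
# Billing metrics live only in us-east-1.
aws cloudwatch put-metric-alarm \
    --region us-east-1 \
    --alarm-name my-billing-alarm \
    --namespace AWS/Billing \
    --metric-name EstimatedCharges \
    --dimensions Name=Currency,Value=USD \
    --statistic Maximum \
    --period 21600 \
    --evaluation-periods 1 \
    --threshold 5 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:111122223333:my-billing-topic
--------------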

Programmatic access with APIs

In order to access your account programmatically using the Amazon APIs, or tools that use the API, like Boto, Fog, Terraform, etc., you will need to get the access keys from the Security Credentials page: https://console.aws.amazon.com/iam/home?region=us-east-1#/security_credential (region=us-east-1 will change as per your settings). These keys are for the Amazon account as a whole, not per instance or service. It's possible to install the AWS command-line tools on your local machine and use them to work with your AWS services; e.g. aws s3/s3api can be used from your local machine to download files from S3. The s3api variant offers some extra options, like byte ranges to download part of a file.
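For instance, after running "aws configure" once with those access keys, S3 can be used from the local machine; the bucket and file names here are hypothetical:
--------------
# One-time setup: prompts for access key id, secret key, default region and output format
aws configure

# High-level download of a whole object:
aws s3 cp s3://my-test-bucket/report.csv ./report.csv

# s3api exposes extra options, e.g. fetch only the first kilobyte of the object:
aws s3api get-object --bucket my-test-bucket --key report.csv --range bytes=0-1023 report-head.csv
--------------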

About Terraform: it is a tool to provision/destroy instances, not for installing/updating/running software. Tools like Ansible, Chef, or Puppet can be installed as part of the Terraform provisioning, and used later as needed. Also, Terraform saves and reads its state from files, so it may not be as suitable as Boto/Fog for running on-the-fly configurations without a file.
When we start/stop/terminate instances, we do not communicate with the instance itself, but rather with its region-level handler. This is obvious for start/create, since the instance does not exist yet. Each instance has an instance-id, which is all that is needed to stop or terminate it; we do not need the DNS name or IP address, as the sketch below shows.
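A sketch with hypothetical ids; note that only the region and the instance-id identify the target:
--------------
# List instance-ids known to the region-level handler:
aws ec2 describe-instances --region us-east-1 \
    --query 'Reservations[].Instances[].InstanceId'

# Terminate by instance-id alone; no DNS name or IP address is involved:
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
--------------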

Storage services

All of these are free only for the trial period (within usage limits). Charges usually apply to the amount of data stored, as well as the amount transferred.

S3 (Simple Storage Service)

Grow/shrink-as-needed storage; not meant for heavy writes.
It is not a file-system with files, inodes, permissions, etc.; it is accessible through its API.
S3 stores objects (files) of up to 5 TB, and each object can carry 2 KB of metadata. Each object has a key and is stored in a bucket, which itself has a name/id, so it is rather like a nested hashmap. Buckets and objects can be created, listed, and retrieved using either REST or SOAP. Objects can be indexed and queried by their metadata/tags, and even queried with SQL using the Athena service. Objects can be downloaded using HTTP or BitTorrent. Bucket names and keys are chosen so that objects are addressable using HTTP URLs:
  • http://s3.amazonaws.com/bucket/key
  • http://bucket.s3.amazonaws.com/key
  • http://bucket/key (where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com)
Because objects/files are accessible via HTTP, S3 can be used to host static websites. Some dynamic scripting could be provided by Lambda.
S3 can be used as a file-system for Hadoop.
Amazon Machine Images (AMIs) which are used in the Elastic Compute Cloud (EC2) can be exported to S3 as bundles.
https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/
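A quick sketch of the bucket/key model using the CLI; the bucket name is hypothetical (bucket names are globally unique, so yours must differ):
--------------
# Create a bucket, upload an object under a key, and list by key prefix:
aws s3 mb s3://my-demo-bucket-12345
aws s3 cp ./index.html s3://my-demo-bucket-12345/site/index.html
aws s3 ls s3://my-demo-bucket-12345/site/

# If the object is public, a plain HTTP GET works too:
curl https://s3.amazonaws.com/my-demo-bucket-12345/site/index.html
--------------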

I just discovered that creating tags on S3 objects incurs costs, though quite small! AWS billing is really tricky. Fortunately, thanks to the billing alarm set up earlier, I got notified in time.

EBS (Elastic Block Store)

A fixed amount of block storage with high throughput (fast reads/writes), e.g. for storing DB files. Multiple such volumes can be allocated. A volume needs to be attached to an EC2 instance and formatted with a file-system before use. It is attached to, and accessible from, only a single EC2 instance at a time.
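A sketch of the life-cycle, with hypothetical ids; the first two commands run from the local machine, the rest on the instance itself:
--------------
# Create a 1 GiB volume in the same availability zone as the instance, and attach it:
aws ec2 create-volume --availability-zone us-east-1a --size 1 --volume-type gp2
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf

# On the instance: format the raw block device and mount it:
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /mnt/data && sudo mount /dev/xvdf /mnt/data
--------------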

EFS (Elastic File System)

A grow/shrink-as-needed managed file-system that can be shared among multiple EC2 instances. It is not HTTP-accessible and has no metadata querying like S3.
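Since EFS is exposed over NFS, mounting it on an instance looks roughly like this (the file-system id is hypothetical, and the security group must allow NFS traffic):
--------------
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
--------------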

Glacier

Cheap, long-term archival storage, meant for data that is written once and rarely read; retrievals are not immediate and can take hours.

Serverless services (Lambda)

Serverless means one does not have to set up servers or load-balancers. We just write a function that does the required processing, e.g. store price updates into a DB; the servers and the scaling are all handled by AWS. Charging is per use, not for uptime, so if the functionality is not called frequently, one could use Lambda instead of an always-on EC2 instance and be billed less. The function can be triggered from various sources, like S3 object modifications, logging events, or an API Gateway interface that accepts HTTP calls. Using these sources may incur separate charges.
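For example, once such a function exists, it can be called directly from the CLI for testing; the function name and payload here are made up:
--------------
# Invoke the function synchronously and capture its result:
aws lambda invoke --function-name storePriceUpdate \
    --payload '{"symbol": "XYZ", "price": 42.5}' out.json
cat out.json
--------------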

Working with EC2 (Elastic Compute Cloud) instances

It's quite easy to launch an EC2 instance using the management console. The options that are eligible for the free tier are marked as such, e.g. "t2.micro" instances. Sometimes options that may incur additional charges are marked too.
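The same launch can be scripted; a minimal sketch with placeholder ids (the AMI id, key name and security group have to be your own):
--------------
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.micro \
    --key-name my-aws-test-key \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1
--------------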

Regions

The region is important, since the console usually lists instances/services for the current region only. Also, communication between different regions may incur additional charges. The APIs, too, usually query by region. So for testing, it's better to keep everything under a single region. In real life, distributing your application across different regions will provide better fail-safety.
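For example, the same query against two regions returns two different sets of instances:
--------------
aws ec2 describe-instances --region us-east-1 --query 'Reservations[].Instances[].InstanceId'
aws ec2 describe-instances --region eu-west-1 --query 'Reservations[].Instances[].InstanceId'
--------------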

Tags

It's possible to add tags when configuring an instance, e.g. type=DB. These tags can then be used through the API, e.g. to filter out only the DB servers and work on them.
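For example, assuming instances tagged type=DB as above, a filtered query picks out just those:
--------------
# List only the instances tagged type=DB:
aws ec2 describe-instances \
    --filters "Name=tag:type,Values=DB" \
    --query 'Reservations[].Instances[].InstanceId'
--------------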

User-data

Here we specify a set of commands to run when the instance starts, e.g. set env variables, start a DB daemon, etc. If config/script files are used, where will the files come from? Probably from an S3 bucket or EFS storage that has to be set up first. This option is available under "Configure Instance Data -> Advanced Details". The user-data runs only the first time the instance is started; if we want it to run on every restart, we use the "#cloud-boothook" directive at the start of the script.
Here is an example of setting env-variables, and copying and running a file from S3:
--------------
#cloud-boothook
#!/bin/bash
# User-data runs as root, so the profile script can be written directly:
echo "export MYTYPE=APPSRV" > /etc/profile.d/myconfiguration.sh
chmod +x /etc/profile.d/myconfiguration.sh

# Fetch a script from S3 (needs credentials, e.g. via an IAM role) and run it:
aws s3api get-object --bucket <bucketname> --key test.sh /home/ec2-user/test.sh
sh /home/ec2-user/test.sh
----------------
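If launching from the CLI instead of the console, the same script can be passed with the --user-data flag (file name hypothetical):
--------------
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro \
    --key-name my-aws-test-key --user-data file://boothook.sh
--------------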

The EC2 Systems Manager (SSM) Parameter Store seems to be another way to store parameters, with better security.
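A sketch of storing and reading back a parameter via the CLI; the name and value are made up, and SecureString means the value is encrypted at rest:
--------------
aws ssm put-parameter --name /myapp/db-password --value 's3cret' --type SecureString
aws ssm get-parameter --name /myapp/db-password --with-decryption \
    --query 'Parameter.Value' --output text
--------------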

Addresses

When an instance is created, it is assigned a public and a private IP, as well as a public domain name of the form
ec2-<public I.P>.compute-1.amazonaws.com
(the exact form varies by region). If we use the domain name from outside AWS, e.g. from our local machine, it resolves to the public IP of our instance; if used from within AWS, i.e. from another EC2 instance, it resolves to the private IP. The domain name embeds the public IP address, and changes if the public IP changes. The public IP is not constant, as it is allocated from a pool. A reboot does not change the IPs; however, a stop and start will change the public IP, though not the private IP. One solution for a fixed public IP is an Elastic IP, though these can incur charges in certain cases, e.g. while allocated but not in use. A terminate and create-new will of course change both IPs.
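Allocating an Elastic IP and binding it to an instance looks like this (ids hypothetical); remember that an allocated-but-unassociated address is one of the cases that incurs charges:
--------------
# Allocate a fixed public IP, then associate it with the instance:
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 \
    --allocation-id eipalloc-0123456789abcdef0
--------------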

For better security, it should also be possible to give some EC2 instances only a private IP, and access them via SSH from an EC2 instance that does have a public IP. This is probably the "Auto-assign Public IP" option, which is enabled by default.
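One way to do the hop without copying the private key onto the public instance is SSH agent forwarding; a sketch, with placeholder addresses:
--------------
# Load the key into the local agent, then forward the agent on the first hop:
ssh-add my-aws-test-key.pem
ssh -A ec2-user@<public instance DNS>
# ...and from that shell, reach the private-only instance:
ssh ec2-user@<private I.P>
--------------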

Key-Pairs

We usually work on an instance using SSH, with public/private keys. (These are different from the API access keys for the account.) Key-pairs can be generated from the console and associated with an instance; this has to be done when creating the instance. (Advanced: a pair can also be generated on your local machine, with the public key copied to the proper directory on the instance.) If you use the Launch wizard, you will be prompted to create a key-pair or to reuse an existing one. A key-pair can be shared among multiple EC2 instances. Make sure to use a name that keeps the file unique on your local file system, e.g. include "aws" and the account name in the name.
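A key-pair can also be created from the CLI; the name is a made-up example:
--------------
# Create the pair and save the private key locally, with restricted permissions:
aws ec2 create-key-pair --key-name my-aws-test-key \
    --query 'KeyMaterial' --output text > my-aws-test-key.pem
chmod 400 my-aws-test-key.pem

# Later, log in to an instance launched with this key:
ssh -i my-aws-test-key.pem ec2-user@<public instance DNS>
--------------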

Security Groups

Each EC2 instance is associated with a security group, which is like a firewall. It controls what protocols and ports are available for inbound and outbound calls.
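A sketch of creating a group that allows SSH from a single address only (group name and IP are made up):
--------------
aws ec2 create-security-group --group-name my-ssh-only \
    --description "Allow SSH from my machine only"
aws ec2 authorize-security-group-ingress --group-name my-ssh-only \
    --protocol tcp --port 22 --cidr 203.0.113.5/32
--------------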

IAM Roles

These are for internal use within AWS services. E.g. accessing S3 from EC2 normally requires the account's secret keys for auth; instead, one can create an IAM Role with S3 permissions and grant it to the EC2 instance. This is not very flexible though: a role is normally specified when launching the instance (AWS has since added a way to attach one to a running instance), and a combination of roles cannot be granted; an instance gets at most one role.
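Assuming an instance profile named my-s3-read-role has already been created in IAM (the name is made up), granting it at launch looks like this:
--------------
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro \
    --iam-instance-profile Name=my-s3-read-role
--------------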

Storage

The instance launch wizard will by default create an 8 GB EBS root volume for the instance. In addition, there is an option to attach more volumes. For the free tier, only EBS seems to be supported. There is a "Delete on termination" option which, if checked, deletes the EBS volume when the instance is terminated. Stopping an instance won't affect the EBS volume though; I checked that some files I had added to the volume were intact after a stop and start.
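The root volume size and the "Delete on termination" flag can also be set at launch from the CLI; the device name below is the Amazon Linux default, but treat this as a sketch:
--------------
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro \
    --block-device-mappings \
    '[{"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 8, "DeleteOnTermination": true}}]'
--------------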