Elastic Beanstalk Deployment Automation

We are going to talk about a setup where all you need to do is commit your code, and everything else, from unit tests to deployment, is taken care of by an externally hosted continuous integration platform. I will use Shippable as the example, but you can use almost any hosted CI service, such as Travis CI or Codeship.

Setup

Here is the setup we will be looking at:

Architecture (diagram)

Shippable for commits

We will use shippable for the following:

  • Unit Tests
  • Regression Tests
  • Localized DB Tests
  • Tagging of the source code if the commit passes all tests
  • Deployment of the source code on beanstalk running docker

Log in to Shippable and set up your project to be built. Shippable registers webhooks with your repository host, such as GitHub or Bitbucket, which are called on every commit. You then create a shippable.yml file in your project, and it is executed on every commit. If you have used Docker before, this might look familiar: Shippable spins up a Docker container and runs the steps from shippable.yml inside it.

Here is one of my sample files from the project that powers this blog:

language: python

install:
  - pip install Pygments 

before_script:
  - git config --global push.default matching
  - mkdir -p themes
  - git clone git@github.com:abhi1010/abhi1010.github.io.git
  - cd themes
  - git clone https://github.com/abhi1010/hyde-a.git
  - cd ../

script:
  - hugo -d abhi1010.github.io/

after_success:
  - echo `pwd`
  - export NOW_HOUR=$(date +%d-%b-%H_%M) 
  - git config --global user.email "abhi.pandey@gmail.com"
  - git config --global user.name "abhi1010"
  - git config --get remote.origin.url
  - git remote set-url origin git@github.com:abhi1010.github.io.git
  - cd abhi1010.github.io
  - git status -s
  - echo -e "a\n*\nq\n"|git add -i
  - git commit -am 'Automated build from Shippable - '$NOW_HOUR && git push


notifications:
     email:
         recipients:
             - abhi@boun.cr
         on_success: change
         on_failure: always

cache: true

This script does the following tasks:

  • Sets Python as the default language for the scripts to use
  • Installs Pygments using pip
  • My blog is built from three repos, so it does a git clone for each
  • Calls hugo to create the static site
  • Commits the changes made to the static content back to GitHub

Once it works, you will see the following on the Shippable site:

Shippable Build Status (screenshot)

Unit Tests

You may also want to set up unit tests and regression tests as part of your scripts. Just add the following:

script:
  -  py.test-3.4 tests/test.py --maxfail=3 -s --full-trace --tb=long --junitxml=../shippable/testresults/pytests.xml
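
If you are wondering what goes inside tests/test.py, it is ordinary pytest code: pytest collects and runs any functions whose names start with test_. Here is a minimal, purely illustrative sketch (the slugify helper below is hypothetical and not part of any project mentioned here):

    # tests/test.py - hypothetical example of a pytest test module
    def slugify(title):
        """Toy helper under test: turn a post title into a URL slug."""
        return title.strip().lower().replace(' ', '-')

    def test_slugify_basic():
        assert slugify('Hello World') == 'hello-world'

    def test_slugify_strips_whitespace():
        assert slugify('  Trailing Space ') == 'trailing-space'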

Git Tagging

Only if the tests in the script section pass does Shippable move on to the after_success section. There, you might want to tag your source code so that Docker will pull only the tagged and approved commits, not every commit - which is very important.

Here’s how to do that:

after_success:
  - git tag -f recon_prod master
  - git push -f --tags

Deployment

Once your code commit has been approved, it is time to deploy it to Docker on Beanstalk. I like to keep the deployment steps in a separate bash script, so that deployment can also be done in various other ways if needed.

after_success:
  - main/scripts/deploy.sh 
  - echo 'CODE DEPLOYED'

Or, you may choose to pull the “deployment” script from another project, if you wish. That allows you to maintain all the moving parts separately.

rm -rf eb-reconwise
echo 'Deploying New Dockers'
git clone https://github.com/abhi1010/deplyment_project.git
cd deplyment_project/
chmod +x deploy.sh
eb use beanstalk_env_name
./deploy.sh
echo 'Deployment Complete'

Dockerfile

The first thing we need is a Dockerfile to set up Docker on Beanstalk.

  FROM abhi1010/base_docker
  MAINTAINER abhi1010 <codersdigest@gmail.com>
  
  ENV DEBIAN_FRONTEND noninteractive
  
  ENV WS '/ws'
  ENV CURR_HOME '/root'
  WORKDIR $WS
  
  RUN git clone https://github.com/abhi1010/dockerprj.git \
   && . $WS/ve_envs/rwv2/bin/activate \
   && $WS/prj/rw/manage.py collectstatic --noinput
  
  COPY supervisor-app.conf $WS/
  
  RUN \
      cp $WS/supervisor-app.conf /etc/supervisor/supervisord.conf \
   && chown -R www-data:www-data /var/lib/nginx

  VOLUME ["/var/shared/", "/etc/nginx/sites-enabled", "/var/log/nginx", "/ws/"]
    
  EXPOSE 80
  CMD ["supervisord", "-n"]

This does the following tasks:

  • Builds from a custom base image, where all apps and project requirements have already been installed and configured. This saves me a lot of time during deployments.
  • Downloads the source code using RUN (which I later update using another method)
  • Copies the supervisord config as well
  • Sets the right user rights for nginx
  • Sets up folders to be shared using VOLUME
  • Exposes port 80 so that this Docker container can be used as a web container
  • Sets CMD so that supervisord is used to run the container

Beanstalk Configuration

Once we have the Dockerfile ready, we need to set up the Beanstalk configuration so that the remaining steps can be taken care of during deployment. Some things to keep in mind in the Beanstalk setup:

Tips

  • All Beanstalk configuration has to be kept in a folder called .ebextensions
  • The Beanstalk EC2 instance maintains internal folders of scripts that it runs while setting up Docker, so that the instance is ready for you
    • It is entirely possible to plug your own scripts into the Beanstalk initialization setup so that you can program a custom EC2 instance for yourself
    • The folders to place your scripts in are /opt/elasticbeanstalk/hooks/appdeploy/pre and /opt/elasticbeanstalk/hooks/appdeploy/post
    • Scripts placed in these folders are executed in alphabetical order
  • You can increase the timeout of your Docker initialization setup if it takes too long due to multiple steps
    option_settings:
        - namespace: aws:elasticbeanstalk:command
          option_name: Timeout
          value: 1800

User Accounts

  • You can also add any user accounts you’d like programmatically and make sure they are always part of the EC2 instances you create
    • Create a file called 00001.ftp.config
    • Use the pre folder to set up accounts
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/0001_create_users.sh":
    mode: "000755" 
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      echo 'About to Add user gh-user'
      adduser gh-user
      runuser -l gh-user -c 'mkdir ~/.ssh'
      runuser -l gh-user -c 'chmod 700 ~/.ssh'
      runuser -l gh-user -c 'touch ~/.ssh/authorized_keys'
      runuser -l gh-user -c 'chmod 600 ~/.ssh/authorized_keys'
      runuser -l gh-user -c 'echo "ssh-rsa r4CBI2cWQohEwBkGw9CcW0vWfnlAcKkrCnsJvwe/+kG5w9J8gJdnNQ8BFB+q+kQ6fWl+1kw7b+8jah5q0nNpOzLbed+Rzse1BoOIjsSXqN/L7AW8y61PVBULcVAVBKCrVy0U5zifv/e6a5+dsUD3WLiD3yXTgPDcZoqQqPYkurCx5ZzxLylKfXfL37k7sz00e+Tu/Y+J9xXdI9j3G5bU9rmIe4SH4mK4BCMRQ6zCHqAzAXZtnmN5U1XR3XrfMtuDLvVgcOlEpXIMl9q2kco0ZCdMkYoSzf3Yj" >> ~/.ssh/authorized_keys'

Pre-installing Packages

  • You can also pre-install any packages that you’d like on all EC2 instances
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/0003_packages.sh":
    mode: "000755" 
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash 
      yum install -y git
     

Sharing volumes between docker and EC2

  • Ensure that folder sharing is enabled between the EC2 instance and the Docker container
    • This goes into the file Dockerrun.aws.json
{
  "AWSEBDockerrunVersion": "1",
  "Volumes": [
    {
      "HostDirectory": "/var/shared",
      "ContainerDirectory": "/var/shared"
    }
  ],
  "Ports": [
    {
      "ContainerPort": "80"
    }
  ],
  "Logging": "/var/log/"
}

Folder and file structure

  • Finally, make sure that your folders and files are set up as follows:
$ (master) tree -a 
.
|-- .ebextensions
|   |-- 0000_users.config  # setup extra users
|   |-- 0002_packages.config  # install external packages
|   |-- 0004_bash.config  # I like to manage all 
|-- .elasticbeanstalk
|   |-- config.yml  # AWS connection details
|-- .gitignore 
|-- Dockerfile  # Docker instance
|-- Dockerrun.aws.json   # folder sharing
|-- updater.sh  # script to update any code

.elasticbeanstalk folder for aws configs

  • You might be wondering what the .elasticbeanstalk folder is. It is the folder that holds your AWS access key id and secret key for doing the actual deployment. If you don’t set it up, AWS will ask you for them every time.
    • To set it up, you just need to call eb config once; it creates the folder for you with all the details, including connection details. You can then make it part of your git commits
    • Make sure it is secure

And that’s it! Once you commit your code, shippable will run the tests, tag your code and finally download this deployment project and deploy it to beanstalk through docker containers.

Nginx Upload limits on Beanstalk Docker

If I am not wrong, nginx only allows uploads of up to 1 MB by default. If you are doing a Docker deployment on Beanstalk, you need to remember to change that not once but twice!

As you may know already, Beanstalk creates an EC2 instance to manage the Docker environment.
Since the EC2 host needs to manage the Docker environment and serve the web interface as well, it runs its own nginx instance in front of the nginx inside Docker. Hence, if you want to modify the nginx settings to allow bigger uploads, you have to modify them in both places - inside the Docker container as well as on the EC2 host.

    # max upload size
    client_max_body_size 10M;   # adjust to your liking

Also, if you don’t want to have any limit at all for uploads, then just change the client_max_body_size to 0.

    # max upload size
    client_max_body_size 0;

Updating Django Source with Docker Deployments

When deploying Docker multiple times, you may not want to copy your Django source code into the image on every deployment.

Setting up supervisord

Luckily there is an easy way to manage this. Since you are working with Django, there is a good chance that you are also managing the processes (like uwsgi) with supervisord.

Here are some of the steps that you can take with supervisord

  • Set up a new process in supervisord
  • Do not allow it to autorestart since it will be a one-shot process
  • Call another script in any format to update the source code
    • As an example, I use bash to update my source code through git

Here’s a sample code:

    [program:source-updater]
    redirect_stderr = true
    stdout_logfile = /shared/source_code_updater.log
    directory = /ws/
    command = /ws/source_code_updater.sh
    autorestart=False

Updating the source code

A few things are important to note in a Docker deployment:

  • Not every commit needs to be deployed
  • Filter your commits to only allow deployable code to be updated on docker
  • Include regression, unit and system tests to be part of your build process
  • Once everything has been confirmed to be working, tag your code so that you know it is worthy of going to docker
  • Another way would be to manage this process through branches and merge only if everything passes
  • docker deployments would build off this merged branch or tagged version
  • This way, even if you have made 10 commits while fixing a bug and are still in the process of fixing it, you know they won’t end up in a Docker deployment

With that idea, do a checkout and update the source code according to a specific tag:

    git checkout -f tags/your_tag_name
    git pull origin tags/your_tag_name

Telling uwsgi about the updated source code

Once you have updated your source code, you need to reload the project in uwsgi so that nginx or apache can pick it up. The simplest way to achieve this is the uwsgi config parameter --touch-reload, which reloads uWSGI whenever the specified file is modified or touched.

Just remember to set up supervisord in your Dockerfile with this config parameter.

[program:app-uwsgi]
redirect_stderr = true
stdout_logfile = /var/shared/_uwsgi.log
command = /ws/ve_envs/rwv2/bin/uwsgi --touch-reload=/ws/wsgi.ini --ini /ws/wsgi.ini

You can choose any file. I chose wsgi.ini because its contents never really need to change.

Multiple Virtual Environments in Docker

It may seem like a daunting task to have multiple Python projects running in their own virtual environments in Docker when you want to manage all the processes from a single place, say supervisord. However, all that is required is to know that Python automatically picks up the right virtual environment if you provide the full path to that virtual environment’s python binary.

For example, in my Docker environment, I have a virtual environment installed at the following location:

/ws/ve_envs/rwv1/

To run a project with this virtual environment, I can run the following:

/ws/ve_envs/rwv1/bin/python3.4 PYTHON_PROJECT_FILE_TO_RUN.py

Similarly, other projects can be set up in the same way.

For example, for running uwsgi I provide the full path for python as follows:

[program:appName]
stdout_logfile = /var/shared/_uwsgi.log
command = /ws/ve_envs/project/bin/uwsgi --touch-reload=/ws/wsgi.ini --ini /ws/wsgi.ini

You might want to read about --touch-reload in my other post.

Sharing folders on Beanstalk Docker

It is very easy to set up volume sharing in Docker. You ideally want the following folders to be shared when a new Docker container is initialized for you:

  • /var/log so that you can keep track of logs
  • nginx specific folders because you will have two instances of nginx running - one on docker and another on EC2. This allows you to share logs
  • your personal workspace or anything that you’d like to share

Here’s how you’d do it: the keyword is VOLUME in your Dockerfile.

VOLUME [ \
    "/var/shared/", \ 
    "/etc/nginx/sites-enabled", \ 
    "/var/log/nginx", \
    "/ws/" \
]

Convert GitHub Wiki to Static Site with themes

I recently wanted to set up a wiki so that I could convert it into a static HTML site with a proper theme. What could be a possible use case for such a requirement?

  • Manage the documentation of a product internally through git but publish it for clients/world through static site
  • Convert the uncolored wiki to a themed version
  • Allow serving of the wiki through web application frameworks like Django
    • It lets you put an authentication system in front, as a first hurdle, so that not everybody has access

Anyway, I went through the whole process and decided to jot down everything. Here I am taking the D3 wiki as an example, which I will convert into a static site. Let’s begin.

D3 Wiki using pelican (screenshot)

Setup and requirements

What do we need to get started?

  • We will need a static site generator
    • Let’s use pelican for this demo
  • An actual wiki
  • Python environment so that pelican and fabric can be installed

Virtual Environment with pelican

Setup the virtual environment

$ virtualenv ve_blog
$ source ve_blog/bin/activate

Install pelican

$ pip install pelican

Pelican Quickstart

Setup pelican using pelican-quickstart so that all files are setup correctly for creating a static site.

$ pelican-quickstart

Welcome to pelican-quickstart v3.6.3.

This script will help you create a new Pelican-based website.

Please answer the following questions so this script can generate the files
needed by Pelican.

    
> Where do you want to create your new web site? [.] 
> What will be the title of this web site? D3 WIKI
> Who will be the author of this web site? abhi1010
> What will be the default language of this web site? [en] 
> Do you want to specify a URL prefix? e.g., http://example.com   (Y/n) n
> Do you want to enable article pagination? (Y/n) Y
> How many articles per page do you want? [10] 
> What is your time zone? [Europe/Paris] Asia/Singapore
> Do you want to generate a Fabfile/Makefile to automate generation and publishing? (Y/n) Y
> Do you want an auto-reload & simpleHTTP script to assist with theme and site development? (Y/n) Y
> Do you want to upload your website using FTP? (y/N) N
> Do you want to upload your website using SSH? (y/N) N
> Do you want to upload your website using Dropbox? (y/N) N
> Do you want to upload your website using S3? (y/N) N
> Do you want to upload your website using Rackspace Cloud Files? (y/N) N
> Do you want to upload your website using GitHub Pages? (y/N) N
Done. Your new project is available at /Users/apandey/code/githubs/d3wiki

Get the wiki

$ git clone https://github.com/mbostock/d3.wiki.git

Cloning into 'd3.wiki'...
remote: Counting objects: 12026, done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 12026 (delta 607), reused 552 (delta 552), pack-reused 11407
Receiving objects: 100% (12026/12026), 9.92 MiB | 1.49 MiB/s, done.
Resolving deltas: 100% (7595/7595), done.
Checking connectivity... done.

Setting the wiki as content for pelican

$ rmdir content
$ ln -s d3.wiki content

Why simple pelican won’t work and what to do

If you simply try to call the pelican command to build the static site, you will notice a lot of errors like:

$ fab build

ERROR: Skipping ./请求.md: could not find information about 'NameError: title'
ERROR: Skipping ./过渡.md: could not find information about 'NameError: title'
ERROR: Skipping ./选择器.md: could not find information about 'NameError: title'
ERROR: Skipping ./选择集.md: could not find information about 'NameError: title'
Done: Processed 0 articles, 0 drafts, 0 pages and 0 hidden pages in 3.47 seconds.

The problem is that pelican expects some variables to be defined in each markdown file before it can build the static file. Some of the variables are:

  • Title
  • Slug
  • Date

You may add your own as well if you want. However, for our initial purposes, we will keep it simple and just add these.

Next, how do we achieve this automation? fab is our answer.

Let’s write a function in Python that will modify the markdown files, updating them to add Title, Slug and Date.

We will edit fabfile.py and add a new function create_wiki:

def create_wiki():
    files = []
    # Find all markdown files in content folder 
    for f in os.walk('./content/'):
        fpath = lambda x: os.path.join(f[0], x)
        for file in f[2]:
            fullpath = fpath(file)
            # print('f = {}'.format(fullpath))
            files.append(fullpath)
    filtered = [f for f in files if f.endswith('.md')]
    for file in filtered:
        with open(file, 'r+') as f:
            content = f.read()
            f.seek(0, 0)
            base = os.path.basename(file).replace('.md', '') 
            lines = ['Title: {}'.format(base.replace('-', ' ')),
                    'Slug: {}'.format(base),
                    'Date: 2015-08-07T14:59:18-04:00',
                    '', '']
            line = '\n'.join(lines)
            # Add the lines to the file
            f.write(line + '\n' + content)
        print(file)
    
    # build and serve the website
    build()
    serve()

Now you can call this function easily:

fab create_wiki

The website can now be viewed at http://localhost:8000

What happened to the menu?

There is a minor issue here, though: you will notice that the menu is not available - it is all empty. It is an easy fix. We need to add some lines to publishconf.py to say what the menu should contain.

For my example, I have chosen to show the following for D3:

  • API Reference
  • Tutorials
  • Plugins

    # We don't want all pages to show up in menu
    DISPLAY_PAGES_ON_MENU = False
    
    # Choose the specific pages that should be part of menu
    MENUITEMS = ( 
    ('HOME', '/home.html'),
    ('API Reference', '/API-Reference.html'),
    ('Tutorials', '/Tutorials.html'),
    ('Plugins', '/Plugins.html'),
    )

Choosing themes

By default, pelican uses its own theme for the static site, but the theme can be changed. Let’s choose pelican-bootstrap3 for our example here:

git clone https://github.com/DandyDev/pelican-bootstrap3.git

Now, add the full path to the theme at the end of the publishconf.py file:

THEME = "/Users/apandey/code/githubs/pelican_coders/all_themes/pelican-bootstrap3"

Finally, build your site again and serve:

fab build
fab serve

Pelican Bootstrap3 theme (screenshot)

Get all this code in github repo

I realize there may be a few things going on here. You can get this whole setup as a project from my GitHub repo.

You will find all the code and setup there to make your life easier. Just start with the d3 wiki along with a virtual environment and you will be fine.

Docker Container cleanup on Elastic Beanstalk

Sometimes you may notice that old containers are not cleaned up from the Beanstalk environment. This may be because containers are still running as ghosts in the background. One way to find out is to check whether your /var/lib/docker/vfs/dir directory has too many folders.

Next, find out what containers you have:

    [root@ip dir]# docker ps -a

You might see something like this:

    CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS              PORTS               NAMES
    1611e5ebe2c0        aws_beanstalk/staging-app:latest   "supervisord -n"    About an hour ago                                           boring_galileo
    e59d0dd8bba1        aws_beanstalk/staging-app:latest   "supervisord -n"    About an hour ago                                           desperate_yalow
    3844d0e18c47        aws_beanstalk/staging-app:latest   "supervisord -n"    2 hours ago         Up 8 minutes        80/tcp              pensive_jang

Ideally, we want to “forcibly remove” all images (and hence the folders from /var/lib/docker/vfs/dir directory) that are not in use anymore. Just run the following to test whether it works:

    docker rmi -f `docker images -aq`

You might run into trouble where Docker says that those images still have containers running them. This means those containers are orphaned, not killed as we thought. Let’s remove them, along with their shared volumes, if any.

    docker rm -fv `docker ps -aq` 

This will:

  • kill the containers
  • unlink their volumes

You should see a lot more space now on your beanstalk instance.

    [root@ip dir]# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1      7.8G  1.8G  5.9G  24% /
    devtmpfs        490M   96K  490M   1% /dev
    tmpfs           499M     0  499M   0% /dev/shm

Last Resort

If you feel that all this is not working, then you can try one of the scripts provided by Docker itself on GitHub.

It will delete the folders under /var/lib/docker and try to do it responsibly.

Partition Linked List around a Value X

How do you partition a list around a value x, such that all nodes less than x come before all nodes greater than or equal to x?

Well, several solutions are possible. The one I came up with is a bit convoluted, but let me explain the idea behind it. You want to track the following:

  • Two pointers to remember the beginning of the lower and higher series each

  • One pointer (current) to iterate through the Linked List

  • The list itself may start with a value higher or lower than the partition value (middleValue). Thus we need to remember the beginning of the lower series (lowerSeries), as this is what we will return

Now that we have this out of the way, let’s look at the code:

Code

As usual the code is available here:

https://github.com/abhi1010/Algorithms
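
To make the idea concrete, here is a minimal Python sketch of the approach described above. The Node class and variable names are illustrative; the version in the repository differs in its details:

    class Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def partition(head, x):
        """Rearrange the list so nodes with values below x come before the rest."""
        lower_series = lower_tail = None     # start/end of the series of values below x
        higher_series = higher_tail = None   # start/end of the series of values at or above x
        current = head
        while current:
            nxt = current.next
            current.next = None
            if current.value < x:
                if lower_series is None:
                    lower_series = lower_tail = current
                else:
                    lower_tail.next = current
                    lower_tail = current
            else:
                if higher_series is None:
                    higher_series = higher_tail = current
                else:
                    higher_tail.next = current
                    higher_tail = current
            current = nxt
        if lower_series is None:             # the list had no values below x
            return higher_series
        lower_tail.next = higher_series      # stitch the two series together
        return lower_series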

Find the Kth to Last Element of a Singly Linked List

A recursive solution is possible, but I will use simple runner (two-pointer) logic. Recursive solutions are usually less optimal.

Note that, in our logic, K=1 returns the last element in the linked list. Similarly, K=2 returns the second-to-last element.

The suggested solution here is to use two pointers:

  • One pointer will first travel K items into the list
  • Once that is done, both the pointers start travelling together, one item at a time
  • They keep travelling until the end of linked list is found
  • In that situation, the first pointer is at the end of the list, but the second pointer has only reached the Kth element from the end - which is exactly what you want

Let’s have a look at the code:

As usual the code is available here:

https://github.com/abhi1010/Algorithms
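
For illustration, here is a rough Python sketch of that runner logic, reusing the illustrative Node class from the previous section; the version in the repository may differ:

    def kth_to_last(head, k):
        """Return the k-th node from the end (k=1 is the last node)."""
        runner = head
        # Move the runner k nodes into the list first
        for _ in range(k):
            if runner is None:
                return None              # the list has fewer than k nodes
            runner = runner.next
        # Now advance both pointers together until the runner falls off the end
        trailer = head
        while runner is not None:
            runner = runner.next
            trailer = trailer.next
        return trailer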

Removing Duplicates from Linked List

Duplicates can be removed in many ways:

  • Create a new Linked List containing only unique items

  • Iterate through the Linked List and keep removing items that are being repeated

The internal structure for the algorithm can be either map based or set based. When using a map, the Node itself can be saved, which makes your life easier if you are creating a new Linked List. However, a set is very useful if we are just iterating through the Linked List and deleting the items that are repeated. It is also a great space saver, hence we decided to go down this path.

Code

As usual the code is available here:

https://github.com/abhi1010/Algorithms

Here’s a small sample as to how to do it:
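
The sketch below is illustrative only, showing the set-based, in-place approach described above; the full version lives in the repository linked earlier:

    def remove_duplicates(head):
        """Remove duplicate values in place, keeping the first occurrence of each."""
        seen = set()
        previous = None
        current = head
        while current:
            if current.value in seen:
                # Repeated value: unlink the current node
                previous.next = current.next
            else:
                seen.add(current.value)
                previous = current
            current = current.next
        return head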