DevOps community blog


#1

This blog is dedicated to the Coral Project. I will describe the various phases and challenges of the project from a DevOps perspective. It’s a Mozilla Foundation project, in collaboration with The New York Times and The Washington Post.

We are creating open source tools and practices for newsrooms of all sizes, to build better communities around their journalism. That means better comment boxes, better user-generated content modules, and a whole lot more.

As a Mozilla project, the user’s control over their privacy and identity will be paramount throughout.

My name is Eugene de Fikh. I am a DevOps engineer and have been with the Coral Project since mid-December 2015. I am in charge of infrastructure and automation using today’s DevOps principles. I have been a systems engineer for the past 15 years, and for the last 5 years I have been automating AWS infrastructure for various startups.

The current setup was done in the AWS console using VPC subnets. I assigned private and public subnets to create a three-tier architecture. Web servers are allowed to talk to the outside world and to the middle-tier servers. The middle tier is allowed to connect directly to the database. Database servers are not accessible from outside. In addition, we use an OpenVPN server to give developers access according to their allowed groups. We will be implementing OpenLDAP to manage SSH keys and rotate user logins on a regular schedule. The back end runs on a MongoDB 3.2 cluster with one primary and 2 secondary servers.
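
The tier-to-tier rules above can be sketched as a small allow-list. This is only an illustration: the tier names and the `is_allowed` helper are hypothetical, the VPN path is left out, and the real enforcement is done by the VPC subnets and their network rules, not by application code.

```python
# Hypothetical model of the three-tier rules described above.
# Tier names are illustrative; real enforcement lives in the VPC configuration.
ALLOWED_FLOWS = {
    "internet": {"web"},            # public subnet: web servers face the outside world
    "web": {"internet", "middle"},  # web servers talk outward and to the middle tier
    "middle": {"database"},         # middle tier connects directly to the database
    "database": set(),              # assumption: database servers initiate nothing
}


def is_allowed(source: str, destination: str) -> bool:
    """Return True if traffic from source to destination fits the rules above."""
    return destination in ALLOWED_FLOWS.get(source, set())
```

For example, `is_allowed("web", "middle")` is true, while `is_allowed("internet", "database")` is false because the database tier is not reachable from outside.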

The middle tier is the Pillar application and the front end is the Xenia web servers. We chose to set up Dev/Stage/Prod environments and to automate server and configuration management with a Puppet server. The Puppet server manages common server deployments and the setup of users, SSH keys, application configuration files, server partitioning, etc. We chose to create custom AMIs for 3 types of servers: web, middle tier, and database. Each type of server can reside in any of the 3 environments (Dev/Stage/Prod).

Our CI server is Jenkins. It resides in a private subnet and is allowed to deploy over SSH, using keys, to the middle-tier and web servers, for stage only at this time. We have 2 jobs in Jenkins: one creates the build and generates artifacts, and a second deploys those artifacts to each of the designated servers (web / middle tier). We automated the build process to check GitHub every 30 minutes for an updated master branch. If changes are detected, a new build is generated, automatically tested for errors, and deployed to the staging server. If a job fails, an email is sent to the team letting them know of the failure. In future posts I will describe deployment in more detail and will also talk about monitoring and log management.
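
One polling cycle of that flow could be sketched roughly as follows. This is not our Jenkins configuration, just the control flow in plain Python: `poll_and_deploy`, the stage names, and the callbacks are all hypothetical stand-ins for the Jenkins jobs and the failure email.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def poll_and_deploy(last_built: str, master_head: str,
                    stages: List[Stage],
                    notify_failure: Callable[[str], None]) -> str:
    """Run one polling cycle; return 'skipped', 'deployed', or 'failed:<stage>'."""
    if master_head == last_built:
        return "skipped"  # nothing new on master since the last build
    for name, action in stages:
        if not action():
            notify_failure(name)  # stands in for the email sent to the team
            return "failed:" + name
    return "deployed"
```

Calling it with stages for build, test, and deploy stops at the first stage that fails, so a failing test never reaches the staging server.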


#2

I am working on an all-in-one solution for our clients that will use a single YAML file and Docker Compose to spin up all of our applications on any platform.

I am using Docker Compose and Docker Hub to achieve this. Docker Hub automatically generates a Docker image from a Dockerfile hosted in our GitHub repo.

All I have to do is reference these images from a YAML file. Below is a sample compose.yml file that I created:

pillarapp:
  environment:
    - PILLAR_HOME=/opt/pillar
    - PILLAR_ADDRESS=:8080
    - MONGODB_URL=mongodb://xxxx:xxxx@xxxx:27017/databasename
  image: coralproject/pillar
  ports:
    - "8080:8080"

atollapp:
  image: coralproject/atoll
  ports:
    - "8181:8181"

xeniaapp:
  environment:
    - XENIA_MONGO_HOST=x.x.x.x:27017
    - XENIA_MONGO_USER=xxxxx
    - XENIA_MONGO_PASS=xxxxx
    - XENIA_MONGO_AUTHDB=xxxx
    - XENIA_MONGO_DB=xxxx
    - XENIA_LOGGING_LEVEL=1
    - XENIA_HOST=:4000
  image: coralproject/xenia
  ports:
    - "4000:4000"

cayapp:
  image: coralproject/cay
  ports:
    - "80:80"

Now, to spin up all 4 apps at the same time, I issue the following command on my laptop or any other node that has docker-compose installed:

/usr/local/bin/docker-compose -f compose.yml up -d

To see if the containers are running:

/usr/local/bin/docker-compose -f compose.yml ps

To see the logs:

/usr/local/bin/docker-compose -f compose.yml logs


#3

Today we finished creating our first working demo. It’s accessible at demo.coralproject.net

The demo instance was built on the AWS platform and uses Docker Hub images to start each application. With a few environment parameters, such as database and host connections, we were able to start up our demo in one click.

I installed Nginx as a reverse proxy on the demo host so we can give users a friendly URL. We also introduced a simple PHP page that allows us to control the login.

My next step is to evaluate AWS ECS container services, which would allow me to automate updates to the demo using GitHub hooks from ECS.

We are using New Relic to gather stats and monitor server uptime. We are also using Google Analytics and Site24x7 to do synthetic transaction monitoring.


#4

In this post I want to give an overview of our Jenkins-based code deployment system.

We have set up multiple Jenkins jobs to deploy our web apps to staging.

Xenia application:

  • We have 2 Jenkins jobs. The first builds the artifacts using go build and the environment parameters specified in the GitHub repo. The second copies the artifact into the /opt/xenia folder and launches the app by restarting its daemon. We log to rsyslog on the system.

  • We have a Puppet server that deploys configuration files for each application to the specified location and registers it.

  • The Puppet server also deploys common configuration such as the users for each system, sets the hostname, and registers the server with the monitoring system.

  • We run pre-build and post-build QA tests using go lint and some other tools to make sure the build is good before we deploy it.

  • If the build is deployed successfully, we post a notification to Slack and send an email to the admin users.

  • For the Pillar and Atoll apps we deploy using Docker Hub images. We also make sure that the Puppet agent checks GitHub for newer builds and starts a deployment job if an update to the master branch is detected.

  • We plan to keep a version history using S3 and allow users to deploy specific versions of the applications and Docker containers if desired.

  • Jenkins is monitored and regularly backed up to S3.


#5

Now that we have all of our applications dockerized, I decided to explore Amazon ECS. There are a few reasons to go this route.

  • We need the ability to version our releases and integrate builds directly from GitHub / Docker Hub, so that we can quickly deploy changes from a desired GitHub branch.

  • We need the ability to scale our applications elastically, automatically increasing and decreasing the instance count based on load. ECS allows the use of ELB and CloudFront, making it easy to introduce high availability and redundancy.

  • AWS ECS allows the use of clusters, making it easy to deploy new applications under the general umbrella of the Coral space.

  • Demo and staging environments can be kept closely in sync and tested against as needed, using the best features of Docker.

  • Docker Compose can still be used for smaller setups.

  • AWS ECS can be easily ported into production, allowing us to meet all security regulations and isolation requirements using a VPC.


#6

I was able to deploy our first ECS cluster using Amazon Web Services. Here are some highlights I found interesting:

  • AWS provides Docker runtime environments that allow Docker containers to
    run in ECS. Developers create Docker containers from their development
    environments. Therefore, the container possesses all the components and
    libraries it needs to run and to scale up and down in the ECS cluster.

  • ECS can be set up to start with 1 small instance and scale up and down when it observes specified metrics going above or below thresholds. It uses Auto Scaling groups and CloudWatch metrics to monitor cluster use.

  • I deployed Xenia/Pillar/Atoll using Docker Hub images and am currently working on letting Jenkins CI automatically update the builds once commits happen on the master branch on GitHub. This is the same mechanism we use to integrate GitHub with Docker Hub, allowing seamless auto-deployment of new images built from Dockerfiles. Having ECS do it for us eliminates the need to maintain a separate set of servers and patches for each application. I am still exploring what happens to instances that do not go away on auto scale and how the base AMI gets updated with security patches. I am also interested in how secure an ECS cluster is and whether there is a way to test it using today’s penetration tests.

  • Cay is a little trickier to deploy. It requires a config.json file that lives in the Nginx root and holds the connection and DB settings for each application. I am looking for a way to automate this part, possibly with Ansible, so that the config can be set up from variables. It would be great to add anvil.io as an OAuth mechanism, setting up a trusted auth ticket that can be used elsewhere in the system. Since this is a customer-facing module, we want to secure it with the best mechanisms, including SSL.

  • Once we have the ECS cluster fully configured, security penetration testing will be done by an outside audit firm to determine the level of security of our application and its vulnerabilities.

  • I envision having a template that can be deployed easily into an ECS cluster, with the ability to version each deployment so that we can QA against them.

  • Here is an overview of the ECS cluster we deployed: https://github.com/coralproject/reef/wiki/AWS-ECS-Container-management-services-and-Coral-Apps
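
The threshold-based scaling mentioned in the list above can be sketched in a few lines. This is a simplified illustration, not how ECS or Auto Scaling is actually implemented: `desired_instance_count`, the one-step adjustment, and the default limits are all assumptions.

```python
def desired_instance_count(current: int, metric: float,
                           scale_out_above: float, scale_in_below: float,
                           minimum: int = 1, maximum: int = 4) -> int:
    """Return the instance count after one evaluation of a monitored metric.

    Hypothetical policy: add one instance when the metric is above the upper
    threshold, remove one when it is below the lower threshold, and always
    stay within [minimum, maximum].
    """
    if metric > scale_out_above:
        return min(current + 1, maximum)
    if metric < scale_in_below:
        return max(current - 1, minimum)
    return current
```

Starting from 1 small instance, a metric reading above the upper threshold moves the cluster to 2 instances, and a later reading below the lower threshold brings it back to 1.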


#7

The next part of the solution to focus on is the security setup. Currently we need to secure communication on both the front end and the back end.

  • We have separated our database servers from the rest of the web farm, so that only web servers can communicate with the database. We are working on securing MongoDB to enterprise standards.

  • Network-based intrusion detection is used along with host-based intrusion detection. We are also encrypting data on AWS.

  • Our front end is going to be secured using an OAuth solution from anvil.io. We will use a client/server architecture to serve auth tokens via encrypted auth tickets that can be retained and used in the system. The solution requires setting up a server and clients. Once set up, a client application can request authentication using API forms.

  • We are also using SSL certificates to encrypt communication to the end user and between the nodes.

  • We plan to do penetration testing against our systems in the near future to determine their bottlenecks and weak points.


#8

This is great! I really appreciate how thorough you are explaining the rationale for each decision, and even touching on future decisions to be made. As someone who is interested in contributing to the project, it is encouraging to see such public-facing dialogue/documentation, even if I don’t understand some of it :wink:


#9

Hi @edefikh these posts are extremely helpful! A very thoughtful documentation of your lines of thinking & process.

I’m curious to hear more about where that compose.yml file stands — the one committed here doesn’t seem to be fully operational yet (though extremely promising and exciting to see things spin up quickly on my Mac!).

  • It seems like the plan is also for this docker-compose file to manage the creation of a demo mongo database as well? If so, is this something where it’d be helpful for me (or someone else from the community) to pitch in somehow?
  • The Docker setup for Cay is also somewhat unique, since unlike the other applications (where docker links will suffice to open up ports/connections between containers) Cay connects to the other services from the browser, meaning the hostnames/IPs its configuration requires can’t be private Docker IPs. I took a stab at parametrizing the Cay Dockerfile here so that it’s possible to generate that config.json from environment variables, but this doesn’t quite solve the issue of “How the docker-compose file is aware of the host machine”… curious to hear your thoughts on this!
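
As a rough idea of what generating config.json from environment variables could look like, here is a minimal sketch. The variable names (`CAY_XENIA_URL`, `CAY_PILLAR_URL`) and the JSON keys (`xeniaHost`, `pillarHost`) are made up for illustration and are not Cay’s actual configuration schema.

```python
import json
from typing import Mapping

# Hypothetical mapping from config.json keys to environment variable names.
REQUIRED = {
    "xeniaHost": "CAY_XENIA_URL",
    "pillarHost": "CAY_PILLAR_URL",
}


def render_config(env: Mapping[str, str]) -> str:
    """Build the config.json text from environment variables, failing loudly
    if a required variable is missing."""
    missing = [var for var in REQUIRED.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    config = {key: env[var] for key, var in REQUIRED.items()}
    return json.dumps(config, indent=2, sort_keys=True)
```

In a container entrypoint this would be called with `os.environ` and the result written into the web root. The values still have to be publicly reachable hostnames, which is the part this does not solve.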

Additionally, I’m curious how one might go about deploying these applications (as Docker images or otherwise) without ECS — though I’m definitely a fan, I can see that for an organization just getting started, it might be handy to have a path for folks who either use another container deployment solution (Kubernetes) or whose scale (“just experimenting!”) might merit something less fully-fledged.


#10

I am making progress on updating the docker-compose.yml file as described here. In this file I attempted to set some of the variables, like the URLs for Xenia and Pillar, as well as create the config.json file for the Cay application. This is version 2.0; I welcome your comments and suggestions.

  • I am working on a docker-compose.yml file that will deploy a complete demo setup of all apps, including MongoDB and a sample data set, using Dockerfiles. For Mongo I plan to use a reputable Docker Hub image and do a mongo restore of our sample data.

  • We welcome deployment of our apps as Docker images or as artifacts using a deployment system. You can deploy each app from Docker Hub or build it yourself using the Dockerfiles supplied in each of our public repos. You can run these Docker containers on any of today’s popular services, such as Kubernetes.

  • One thing to keep in mind is that our repos will be updated constantly, so you will need a way to keep your Docker containers up to date. You can choose among different options, for example Jenkins or the Puppet client. I will soon make a detailed post about the setup we are using in our dev environment (non-ECS).


#11

In this post I want to describe our Dev setup in more detail. It runs as follows:

  • Our Xenia app is currently built from source by Jenkins and then pushed out to the Xenia stage server using Jenkins CI.

  • Pillar is deployed using a Dockerfile. We have a Jenkins job that checks GitHub for commits to the master branch and calls the updates.sh script on the Pillar server to pull down the newest Pillar image from Docker Hub.

  • To make sure Pillar is using the latest config files, and to verify that the latest Docker Hub image is deployed on the Pillar server, we use a Puppet recipe that runs every 30 minutes on our Pillar server. This way we can keep our dockerized image up to date on any platform or server.

  • Cay is deployed the same way as Pillar, and we also support the same approach for Xenia.
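
The idea behind the 30-minute check is a simple convergence step: compare what is running with what is newest, and act only when they differ. A minimal sketch of that logic follows. It is not the Puppet recipe itself, and `converge_image` and its callback are hypothetical names.

```python
from typing import Callable


def converge_image(running_id: str, latest_id: str,
                   pull_and_restart: Callable[[str], None]) -> bool:
    """Bring the server to the latest image if needed.

    Returns True when an update was triggered and False when the server was
    already up to date, so repeated runs are harmless.
    """
    if running_id == latest_id:
        return False  # already converged, nothing to do on this run
    pull_and_restart(latest_id)  # stands in for pulling the image and restarting
    return True
```

Because the step does nothing when the image IDs match, it is safe to run on a fixed schedule.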


#12

It has been a little while since I updated the DevOps blog. Here are the latest highlights.

  • In the last few months we have been experimenting a lot with docker-compose, hoping to get it to run smoothly on all flavors of today’s popular operating systems, with mixed results. We like the portability and ease of use that docker-compose brings us, but it falls short on a few basic needs, such as defining correct relationships between containers, and it makes bringing SSL into the mix complex.

  • We had great success at SRCCON deploying our Ask suite of apps (Pillar/Cay/Mongo (sample data)/Elkhorn). People had no problem checking out the install git repo and running through the simple steps of setting up the non-SSL version of Ask.

  • Some challenges came up with the SSL setup. Our certificate was not from a well-known root CA (RapidSSL), so we had to hack around to get it to work. We hope to smooth out the details as we make progress. We are using Nginx as a reverse proxy / SSL offload engine, so having a good certificate from a well-known CA like GoDaddy can potentially save some time and effort.

  • As a result of these findings, we decided to try a Makefile process to generate artifacts from our Jenkins CI: either the binary itself or RPM/DEB setup files with default configs. I am working on making it happen soon.

  • AWS has been great for our setup, but we want to try Docker Cloud to get the benefit of being able to create a template of our setup. The same goes for Heroku and Google engine / Kubernetes.


#13

  • I started working with the Caddy web server today (Caddy & Let’s Encrypt). It’s a lightweight web server with Let’s Encrypt built in, allowing us to seamlessly integrate SSL with our products. It will replace Nginx as the main web server on all of our projects in the next few weeks, and it is well liked by the developer crowd.

  • I am also finishing automated creation of binaries / RPM / DEB / Docker images using a Jenkins job. Just like our other automated builds, we will create new artifacts automatically once we see a commit to the master or release branches of our repos. The difference is that we will build the Docker images on Jenkins and then push them to Docker Hub, bypassing the slow Docker Hub queue and speeding up our time to deploy. I will export the Jenkins job for users to test and use on their own CI servers.

  • Once our artifacts are properly marked with build numbers and creation dates, we will pull them from S3 or Docker Hub and use a Makefile to deploy our apps with a simple command like this: Make deploy Ask. This command will install Pillar/Cay/Elkhorn, with SSL optional, all in one step. The user can be instructed to git clone our install repo and then run Make deploy Ask to install the full suite.

  • We are also looking at Docker Cloud as a simple way to release our apps from a template with a few easy clicks. The same goes for Heroku, where we have a template that deploys our suites.
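
Marking artifacts with build numbers and dates, as mentioned in the list above, could follow a naming scheme like the one sketched here. The `<app>-<build>-<date>.<ext>` format and the helper names are assumptions for illustration, not the scheme we have settled on.

```python
from datetime import date
from typing import List


def artifact_name(app: str, build_number: int, built_on: date, extension: str) -> str:
    """Hypothetical naming scheme: <app>-<build>-<YYYYMMDD>.<ext>.
    The app name must not contain a hyphen for parsing below to work."""
    return f"{app}-{build_number}-{built_on:%Y%m%d}.{extension}"


def build_number_of(name: str) -> int:
    """Extract the build number from a name produced by artifact_name."""
    return int(name.split("-")[1])


def latest_artifact(names: List[str]) -> str:
    """Pick the artifact with the highest build number (numeric, not lexical)."""
    return max(names, key=build_number_of)
```

Comparing build numbers as integers matters: build 10 should win over build 9, which a plain string sort would get wrong.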


#14

As we test and improve our Jenkins CI process, I noticed that our reliance on Docker Hub for automated deploys was slowing down the team’s efforts. There is a wait queue for builds, and sometimes you have to wait in line for a few minutes for your build to finish.

As a result, we decided to create Docker images locally on our Jenkins CI and then push them out to Docker Hub. This way we can deploy our builds locally, without relying on Docker Hub, and then update the Docker Hub image as we please. This significantly speeds up our deploy time and removes the Docker Hub dependence.
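
The order of operations for a local build can be sketched as a list of Docker CLI commands. This only assembles the commands and does not run them. The helper name and the choice to tag images with the build number are assumptions, not our exact Jenkins job.

```python
from typing import List


def local_build_steps(repository: str, build_number: int,
                      context: str = ".") -> List[List[str]]:
    """Return the docker commands, in order, for building an image on the CI
    server and then publishing it. Tagging by build number is an assumption."""
    versioned = f"{repository}:{build_number}"
    latest = f"{repository}:latest"
    return [
        ["docker", "build", "-t", versioned, context],  # build on the CI host
        ["docker", "tag", versioned, latest],           # mark it as the newest image
        ["docker", "push", versioned],                  # publish to Docker Hub afterwards
        ["docker", "push", latest],
    ]
```

The image exists locally as soon as the first command finishes, so a local deploy can happen before either push completes.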


#15

Hi Eugene!
Thanks for all of this useful info.
Just curious if there are any recent learnings to add to this blog, i.e. further knowledge gained since Sep-16?

/Niclas


#16

Hi Niclas

Eugene moved on to other projects some time ago, and we also moved to different deployment methods.

You can read our technical docs here. We’ve been updating them a lot over the past few days, and will be adding more later this week.

Let us know if you have any specific questions!

Best

Andrew
Project Lead, The Coral Project


#17

All good Andrew!

I don’t have any specific questions right now, but will let you know if any come up.


#18

Hi. I had the time to come back to this and it was great information. Thanks.
At some point it would help to add a summary of the deployment changes and benefits.
Collectively it’s a good case study.


#19

While we’re not writing blogposts about it any more, we have detailed deployment instructions and architecture information in our docs:

You can find Ask’s docs here: https://docs.coralproject.net/ask
Talk Docs here: https://docs.coralproject.net/talk

Hope that helps!

Andrew


#20

Thanks @andrew_coral. I can say I am digging in to those docs and have downloaded the code at this point. This decision is based on the stack in use and the ability to scale hugely.

I will have a number of questions related to deploying for really small publishers. Where is the documentation that addresses implementation and resource requirements and procedures for both of these use cases? Really (really) small in particular; I can see you have BIG handled. Lol.