
[Repost] Zero Downtime Frontend Deploys with Vulcand on CoreOS
April 18, 2017 | PaaS


Zero Downtime Frontend Deploys with Vulcand on CoreOS

May 19, 2014 · By Rob Szumski

Update: Vulcand has been updated since this post was written. Jump to https://github.com/vulcand/vulcand for more info. You can follow the concepts of this post, but the commands are no longer up to date.

Running a distributed system across many machines requires a sophisticated load balancing mechanism that can reconfigure itself quickly and reliably.

The team over at Mailgun has built vulcand, an etcd-backed load balancer, to serve traffic to different parts of their systems. vulcand has many awesome features such as an HTTP API, a command line utility and support for complex routing rules. We're only going to look at a few simple examples in this post, but you can check out the readme for the complete details.

Today we’re going to deploy Vulcan on a CoreOS cluster and use it to facilitate two strategies for zero-downtime deployments. These examples were intentionally kept simple for easy comprehension and it’s up to you to decide what strategies you want to use for high availability, port management, etc.

The Set Up

This post is going to cover two common front-end deployment strategies: a rolling upgrade and a rapid switch to a new software version. We’re going to assume you’re familiar with CoreOS, docker and fleet, and also have a 3 node CoreOS cluster to run the examples on. This post was tested on CoreOS 317.0.0 with etcd 0.2 and fleet 0.3.2.

Our systemd units are going to use containers that are located on the public docker index as coreos/example and have been tagged as 1.0.0 and 2.0.0. Mailgun has set up a trusted build for vulcand as mailgun/vulcand.

Each unit contains a web server that serves a very simple webpage. To easily tell the difference between versions, 1.0.0 has a red background and 2.0.0 has a blue background.

Let’s get started.

Clone the Example Units Repository

The unit-examples repository contains all of the example units used in our blog posts. Clone it to save yourself from having to copy/paste everything:

git clone https://github.com/coreos/unit-examples.git
cd unit-examples
cd blog-vulcan-example

If you’re planning on submitting units to the cluster remotely, your local copy of fleetctl must match the version of fleet running on the CoreOS machines. You can find the versions with fleetctl -version and fleet -version. Browse the tagged releases on GitHub for both Linux and Mac versions of fleetctl.

Start Vulcan

First, start up an instance of vulcand and configure a DNS record to point to it. If you’re using Vagrant, you may need to forward ports from your laptop to the Vagrant VM.

You’ll need to modify the unit files in this example with the domain/subdomain you’re using in order for vulcand’s routing to work properly. The easiest way to do this is with sed after you’ve cloned the units repository:

find ./ -type f -exec sed -i -e 's/example.com/vulcan.test.company.com/g' {} \;
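If you want to sanity-check the substitution before running it over the real unit files, here’s a small self-contained demo on a scratch directory (the /tmp path and the file contents are placeholders of our own, not from the repository):

```shell
# Demo of the same find/sed substitution on a scratch directory;
# the real command runs over your cloned unit files.
mkdir -p /tmp/vulcan-sed-demo
echo 'Description=Register endpoint for example.com' > /tmp/vulcan-sed-demo/demo.service
find /tmp/vulcan-sed-demo -type f -exec sed -i -e 's/example.com/vulcan.test.company.com/g' {} \;
# The file should now reference vulcan.test.company.com instead of example.com.
cat /tmp/vulcan-sed-demo/demo.service
```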

vulcan.service

We’re going to use the “trusted build” of vulcand on the public index. The trusted build is a container that is built directly from the vulcand repository every time code is committed to master. This ensures that the code you’re running comes directly from Mailgun and is always up to date.

You can also clone the vulcand repository on your CoreOS machine and build the Dockerfile manually if you’d prefer. Here’s the unit we’re going to use:

[Unit]
Description=Vulcan
After=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill vulcan1
ExecStartPre=-/usr/bin/docker rm vulcan1
ExecStartPre=/usr/bin/docker pull mailgun/vulcand:v0.7.0
ExecStart=/usr/bin/docker run --rm --name vulcan1 -p 80:80 -p 8182:8182 mailgun/vulcand:v0.7.0 /go/bin/vulcand -apiInterface=0.0.0.0 -interface=0.0.0.0 -etcd=http://172.17.42.1:4001 -port=80 -apiPort=8182
ExecStop=/usr/bin/docker kill vulcan1

The -etcd flag tells vulcand that our etcd cluster can be reached over the docker0 bridge since we’re running vulcand in a container. On Vagrant machines, this IP can be different and you may need to edit the unit. Let’s start it:

$ fleetctl start vulcan.service
Job vulcan.service scheduled to f8ea00d4.../10.0.2.15

To test it out, load up the location in a browser. You should see the error "Bad Gateway" since we haven’t set up anything to receive traffic. If you don’t see anything, check whether the units have started successfully:

$ fleetctl list-units
UNIT            STATE     LOAD    ACTIVE  SUB      DESC    MACHINE
vulcan.service  launched  loaded  active  running  Vulcan  7f0e81ab.../10.0.2.15

If the unit shows activating (start-pre), it means that the docker pull from the ExecStartPre is still running. Check the unit’s journal to confirm this:

$ fleetctl journal vulcan.service
May 12 20:59:38 core-01 systemd[1]: Started Vulcan.
May 12 20:59:39 core-01 bash[3268]: 2014/05/12 20:59:39 Error: No such container: vulcan1
May 12 20:59:39 core-01 bash[3268]: Unable to find image 'coreos/vulcan' locally
May 12 20:59:39 core-01 bash[3268]: Pulling repository coreos/vulcan

If your terminal supports split panes, it’s useful to run watch -n 10 fleetctl list-units in a new pane so you can keep an eye on what’s happening. Feel free to read on while the container downloads.

How Vulcand Works

vulcand has two main concepts, locations and upstreams, that are connected together to serve traffic. A location is a combination of a hostname and a path that can be matched with a regular expression. The location is matched up with an upstream, which is a group of endpoints that are qualified to serve a subset of traffic. All of the examples for manipulating vulcand in this post will be shown using etcdctl, but an HTTP API and command line tool are also provided.
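The etcd keyspace that vulcand watches can be pictured like this — a sketch of the layout implied by the commands in the next section, with illustrative names:

```
/vulcand
├── hosts/
│   └── example.com/
│       └── locations/home/
│           ├── path      → /.*
│           └── upstream  → example
└── upstreams/
    └── example/
        └── endpoints/
            └── <unit-name> → http://<ip>:<port>
```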


Set Up the Location

Before we can start routing traffic, we need to set up the location. In the provided units, this is done in the registration sidekick in order to create the location if no units are already running. Here’s a breakdown of the commands contained in our registration unit. You don’t have to run any of these; they are just for illustration.

This will create a location named home under the hostname example.com, matched by the regular expression /.*, which matches all paths:

etcdctl set "/vulcand/hosts/example.com/locations/home/path" '/.*' 

Next, we need to register our container as a member of the upstream example. We’re using the unit name as the unique identifier:

etcdctl set "/vulcand/upstreams/example/endpoints/mixed-register-v1.0.0-A.service" http://10.10.10.10:8086 

The last step is to tell our location to use our upstream:

etcdctl set "/vulcand/hosts/example.com/locations/home/upstream" example 

When one of our containers is stopped we need to deregister it from the upstream:

etcdctl rm "/vulcand/upstreams/example/endpoints/mixed-register-v1.0.0-A.service" 

Let’s add all of this together into a working example.
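For experimenting at the console, these calls can also be bundled into a small helper — a sketch of our own, not part of the example repository (the register/deregister function names and the ETCDCTL override are ours):

```shell
# Hypothetical helper wrapping the etcdctl calls shown above.
# ETCDCTL can be overridden (e.g. for dry runs); defaults to etcdctl.
ETCDCTL="${ETCDCTL:-etcdctl}"

# register <unit-name> <endpoint-url>: add an endpoint and ensure the
# location exists and points at the "example" upstream.
register() {
  "$ETCDCTL" set "/vulcand/upstreams/example/endpoints/$1" "$2"
  "$ETCDCTL" set "/vulcand/hosts/example.com/locations/home/path" '/.*'
  "$ETCDCTL" set "/vulcand/hosts/example.com/locations/home/upstream" example
}

# deregister <unit-name>: remove the endpoint from the upstream.
deregister() {
  "$ETCDCTL" rm "/vulcand/upstreams/example/endpoints/$1"
}
```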

Scenario 1: Rolling Frontend Update

Our first deployment scenario is very common: a rolling upgrade. This strategy is common if the changes that you’re making are hidden through feature flags or aren’t going to harm a user’s experience if they get routed to mixed versions during a session.


Start Version 1.0.0

Before we start our containers, let’s take a look and see what they’re doing. We’re running our simple web server in example-v1.0.0-*.service and a sidekick registration service in mixed-register-v1.0.0-*.service:

example-v1.0.0-A.service

In the ExecStartPre we’re doing a docker pull in order to prevent the unit from reporting active until the container download is complete. If we didn’t do this, the registration unit would start before the download finishes. TimeoutStartSec=0 disables systemd’s built-in timeout when starting a unit, since our docker container is quite large and takes a few minutes to download.

Our unit conflicts with the other 1.0.0 example units so that they are spread across machines in the cluster.

[Unit]
Description=Example 1.0.0
After=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill example-1A
ExecStartPre=-/usr/bin/docker rm example-1A
ExecStartPre=/usr/bin/docker pull coreos/example:1.0.0
ExecStart=/usr/bin/docker run --rm --name example-1A -p 8086:80 coreos/example:1.0.0
ExecStop=/usr/bin/docker kill example-1A

[X-Fleet]
X-Conflicts=example-v1.0.0-*.service

mixed-register-v1.0.0-A.service

The registration unit executes the etcdctl commands covered earlier. Including EnvironmentFile=/etc/environment allows us to reference $COREOS_PUBLIC_IPV4 and $COREOS_PRIVATE_IPV4 in our unit. This file is populated on platforms where CoreOS can determine the network environment (cloud providers, Vagrant, etc). RemainAfterExit=yes (docs) will allow this unit to remain active even after its ExecStart commands have finished. If it didn’t, our ExecStop commands would run immediately afterwards.
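For reference, on a Vagrant cluster /etc/environment typically contains something along these lines (the addresses here are illustrative, not from the post):

```
COREOS_PUBLIC_IPV4=10.0.2.15
COREOS_PRIVATE_IPV4=10.0.2.15
```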

[Unit]
Description=Register for Example
BindsTo=example-v1.0.0-A.service
After=example-v1.0.0-A.service

[Service]
EnvironmentFile=/etc/environment
RemainAfterExit=yes
ExecStart=/bin/sh -c "/bin/etcdctl set \"/vulcand/upstreams/example/endpoints/mixed-register-v1.0.0-A.service\" http://$COREOS_PUBLIC_IPV4:8086; \
  /bin/etcdctl set \"/vulcand/hosts/example.com/locations/home/path\" '/.*'; \
  /bin/etcdctl set /vulcand/hosts/example.com/locations/home/upstream example"
ExecStop=/bin/sh -c "/bin/etcdctl rm /vulcand/upstreams/example/endpoints/mixed-register-v1.0.0-A.service"

[X-Fleet]
X-ConditionMachineOf=example-v1.0.0-A.service

Start the two 1.0.0 units and their sidekicks:

$ fleetctl start example-v1.0.0-{A,B}.service mixed-register-v1.0.0-{A,B}.service
Job example-v1.0.0-A.service launched on 7f0e81ab.../10.0.2.15
Job mixed-register-v1.0.0-A.service launched on 7f0e81ab.../10.0.2.15
Job example-v1.0.0-B.service launched on 62b05884.../10.0.2.15
Job mixed-register-v1.0.0-B.service launched on 7f0e81ab.../10.0.2.15

The registration unit will create the location and the upstream and vulcan will automatically start sending traffic to it.

When you load the vulcand address in a browser, you should see a red background and the text 1.0.0. As before, you might need to wait for the container to download.


Start Version 2.0.0

Suppose that we’ve just implemented a great new feature (a blue background!) and we’re ready to deploy it. First, build and push the updated image to the docker index. Since we’re using the public index this has already been done for you, but here are the commands used:

docker build -t coreos/example:2.0.0 .
docker push coreos/example:2.0.0

Next, we need to launch units that refer to the 2.0.0 tag:

example-v2.0.0-A.service

All references to 1.0.0 have been changed to 2.0.0 and the port has been incremented to allow 1.0.0 and 2.0.0 units to run on the same machine.

[Unit]
Description=Example 2.0.0
After=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill example-2A
ExecStartPre=-/usr/bin/docker rm example-2A
ExecStartPre=/usr/bin/docker pull coreos/example:2.0.0
ExecStart=/usr/bin/docker run --rm --name example-2A -p 8087:80 coreos/example:2.0.0
ExecStop=/usr/bin/docker kill example-2A

[X-Fleet]
X-Conflicts=example-v2.0.0-*.service

Now, start the 2.0.0 units with fleet:

$ fleetctl start example-v2.0.0-{A,B}.service mixed-register-v2.0.0-{A,B}.service
Job example-v2.0.0-A.service scheduled to 99375570.../54.81.79.219
Job example-v2.0.0-B.service scheduled to 5dae80fd.../184.73.78.150
Job mixed-register-v2.0.0-A.service scheduled to 99375570.../54.81.79.219
Job mixed-register-v2.0.0-B.service scheduled to 5dae80fd.../184.73.78.150

In your browser you should now see the background alternate between red and blue since we’re now running both versions of our container concurrently.


Stopping Version 1.0.0

To complete our deployment, destroy the old units with fleet:

$ fleetctl destroy example-v1.0.0-{A,B}.service mixed-register-v1.0.0-{A,B}.service
Destroyed Job example-v1.0.0-A.service
Destroyed Job example-v1.0.0-B.service
Destroyed Job mixed-register-v1.0.0-A.service
Destroyed Job mixed-register-v1.0.0-B.service

Load up the browser a few more times and you should only see the blue background. Our deployment from 1.0.0 → 2.0.0 is now complete. To clean up before our next example, destroy the remaining units and remove all of the state from etcd:

$ fleetctl destroy example* mixed-register*
Destroyed Job example-v1.0.0-A.service
Destroyed Job example-v1.0.0-B.service
Destroyed Job example-v2.0.0-A.service
Destroyed Job example-v2.0.0-B.service
Destroyed Job mixed-register-v1.0.0-A.service
Destroyed Job mixed-register-v1.0.0-B.service
Destroyed Job mixed-register-v2.0.0-A.service
Destroyed Job mixed-register-v2.0.0-B.service

$ etcdctl rm /vulcand --recursive
Cannot print key [/vulcand: Is a directory]

Scenario 2: Rapid Switch to New Version

Another common deployment scenario is the need to rapidly switch from one version of your frontend to another. For this scenario we’re going to use the same primary units, but different registration sidekicks for 2.0.0. Instead of registering to the same upstream, we’re going to create one with a different name, example2. Since we only want to switch upstreams after our new containers have been deployed, the upstream-switching line has been omitted from the 2.0.0 sidekicks.


switch-register-v1.0.0-A.service

[Unit]
Description=Register for Example
BindsTo=example-v1.0.0-A.service
After=example-v1.0.0-A.service

[Service]
EnvironmentFile=/etc/environment
RemainAfterExit=yes
ExecStart=/bin/sh -c "/bin/etcdctl set \"/vulcand/upstreams/example/endpoints/switch-register-v1.0.0-A.service\" http://$COREOS_PUBLIC_IPV4:8086; \
  /bin/etcdctl set \"/vulcand/hosts/example.com/locations/home/path\" '/.*'; \
  /bin/etcdctl set /vulcand/hosts/example.com/locations/home/upstream example"
ExecStop=/bin/sh -c "/bin/etcdctl rm /vulcand/upstreams/example/endpoints/switch-register-v1.0.0-A.service"

[X-Fleet]
X-ConditionMachineOf=example-v1.0.0-A.service

Start Version 1.0.0

Start the units with fleet:

$ fleetctl start example-v1.0.0-{A,B}.service switch-register-v1.0.0-{A,B}.service
Job example-v1.0.0-A.service scheduled to 99375570.../54.81.79.219
Job example-v1.0.0-B.service scheduled to 5dae80fd.../184.73.78.150
Job switch-register-v1.0.0-A.service scheduled to 99375570.../54.81.79.219
Job switch-register-v1.0.0-B.service scheduled to 5dae80fd.../184.73.78.150

You should see a red background in your browser as the requests are sent to each backend:


Start Version 2.0.0

Now start the 2.0.0 units:

$ fleetctl start example-v2.0.0-{A,B}.service switch-register-v2.0.0-{A,B}.service
Job example-v2.0.0-A.service scheduled to 99375570.../54.81.79.219
Job example-v2.0.0-B.service scheduled to 5dae80fd.../184.73.78.150
Job switch-register-v2.0.0-A.service scheduled to 99375570.../54.81.79.219
Job switch-register-v2.0.0-B.service scheduled to 5dae80fd.../184.73.78.150

Since we haven’t switched the upstream yet, you should still see the red background.

Switch the Upstream

We need to tell vulcand to reconfigure to use our set of 2.0.0 servers. We do this by changing the location’s upstream key in etcd. Remember to use your actual hostname:

$ etcdctl set /vulcand/hosts/vulcan.test.company.com/locations/home/upstream example2
example2

You should now see a blue background as all traffic is shifted to our new containers. The switch is now complete.


Clean up the 1.0.0 containers:

$ fleetctl destroy example-v1.0.0-{A,B}.service switch-register-v1.0.0-{A,B}.service
Destroyed Job example-v1.0.0-A.service
Destroyed Job example-v1.0.0-B.service
Destroyed Job switch-register-v1.0.0-A.service
Destroyed Job switch-register-v1.0.0-B.service

Next Steps

If you’re looking to put a system like this into production, it should be possible to achieve high availability with multiple vulcand containers behind a cloud load balancer, round-robin DNS or another common practice. More complex port management is another issue to tackle if you’re running many containers on your cluster.
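One hedged sketch of the multi-instance approach: a fleet template unit, mirroring the vulcan.service shown earlier, with a conflict rule so each instance lands on a different machine. The vulcan@.service name and %i naming are our own illustration, not part of the example repository:

```
[Unit]
Description=Vulcan %i
After=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill vulcan-%i
ExecStartPre=-/usr/bin/docker rm vulcan-%i
ExecStartPre=/usr/bin/docker pull mailgun/vulcand:v0.7.0
ExecStart=/usr/bin/docker run --rm --name vulcan-%i -p 80:80 -p 8182:8182 mailgun/vulcand:v0.7.0 /go/bin/vulcand -apiInterface=0.0.0.0 -interface=0.0.0.0 -etcd=http://172.17.42.1:4001 -port=80 -apiPort=8182
ExecStop=/usr/bin/docker kill vulcan-%i

[X-Fleet]
X-Conflicts=vulcan@*.service
```

Starting vulcan@1.service, vulcan@2.service, etc. would then give you one load balancer per machine to sit behind your cloud load balancer or round-robin DNS.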

More Information

To read the complete vulcand documentation, head over to the GitHub page. Be aware that the current status is “Moving fast, breaking things. Will be usable soon, though.” More information about advanced unit files and example fleet deployments can be found in the CoreOS docs.

If you have questions, concerns or improvements to this vulcand workflow, let us know on the CoreOS user mailing list.
