Looking for a nice way to find and spread IP-numbers with Vagrant

Mikael_Svahnberg

Context: I am setting up a lab exercise for a cloud computing course. I want the students to first create a local virtualised cloud (using VirtualBox and Vagrant, with a multi-machine setup), and then make it transparent so that they can just set the "provider" option to e.g. DigitalOcean.

Problem: It seems like it is a common enough requirement that you want your machines to find and be able to communicate with each other. For example, you may have one box running the db, and another the main application. It also seems like you would want to do this with a dynamic IP, especially if you are going cloudy with your solution. However, and sadly, it also seems like there is no support for doing this in a neat and tidy way from inside the Vagrantfile.

My Ugly Hack:

vagrant up
vagrant status, followed by some script-fu to get the machine names so that I can
vagrant ssh <each machine> -c "ifconfig-and-a-lot-of-grepping-to-get-the-ip-numbers"
Put all the thus acquired info in a file that is reachable through the /vagrant share.
vagrant push in order to populate e.g. databases and "stuff".
for each machine do vagrant ssh -c "start.sh <machine-role>" to fire up my application.

This would work until one of the machines reboots and DHCP, in its wisdom, gives it a new IP number. Then I would need some watchdog server with a static IP (or a DNS name) that my startup scripts can contact to flag the list of IP numbers as dirty, causing every machine to re-request and reload it.
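For illustration, the "script-fu" part of the hack (list the machines, ask each one for its addresses, write the result to a shared file) might look roughly like the sketch below. It assumes the comma-separated layout that `vagrant status --machine-readable` prints (timestamp, target, type, data) and a Linux guest that has `hostname -I`; the function names are made up for this example.

```python
import subprocess


def running_machines(status_output):
    """Return the names of machines reported as running.

    Expects the text of `vagrant status --machine-readable`, whose lines
    are comma-separated: timestamp,target,type,data...
    """
    names = []
    for line in status_output.splitlines():
        parts = line.strip().split(",")
        if len(parts) >= 4 and parts[2] == "state" and parts[3] == "running":
            names.append(parts[1])
    return names


def hosts_lines(addresses):
    """Render {machine: [ip, ...]} as hosts-file style lines, one per address."""
    return [f"{ip}\t{name}"
            for name, ips in sorted(addresses.items())
            for ip in ips]


def collect_addresses():
    """Ask every running machine for its addresses (needs vagrant on PATH)."""
    status = subprocess.run(
        ["vagrant", "status", "--machine-readable"],
        capture_output=True, text=True, check=True,
    ).stdout
    addresses = {}
    for name in running_machines(status):
        # Assumption: the guest is Linux and provides `hostname -I`.
        out = subprocess.run(
            ["vagrant", "ssh", name, "-c", "hostname -I"],
            capture_output=True, text=True, check=True,
        ).stdout
        addresses[name] = out.split()
    return addresses
```

Writing `"\n".join(hosts_lines(collect_addresses()))` to a file in the project directory would make it visible to the guests through the /vagrant share. Note that `hostname -I` lists every address of the guest, so with VirtualBox the NAT address will probably show up alongside the one you actually want.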

Please, please, please, tell me that there is a better way of doing this?

dkf

@Mikael_Svahnberg said:

Please, please, please, tell me that there is a better way of doing this?

Is there a way to guarantee that the machines are on the same network segment so that they can discover each other on boot? I guess not, but if there is, you can use some sort of zeroconf solution.

Otherwise, the usual way is to have a server at either a known IP address or a known hostname that acts as a directory service, in a general sense. When the other machines boot, they contact the directory service (which they can find because of its fixed location, or at least general findability in DNS) to let it know that they are up, and to get their task handed out to them. You could even deliver the directory service to them via DHCP, except you've probably not got control of that.

If the leaves regularly (once a minute?) update the directory service with their status, it can also act as a point which knows about the status. The directory service probably shouldn't do anything other than manage the other leaves, so that your single point of failure isn't exposed strongly to the outside world (and configuring the firewall to be restrictive on it is likely a good idea, provided you don't screw it up). Slapping nagios or something like that on the directory service is probably wise.
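A minimal sketch of the bookkeeping such a directory service would need, assuming the leaves report in roughly once a minute and an entry counts as stale after two minutes. The class name, the cutoff and the in-memory storage are all choices made for this example; how the heartbeats arrive (ssh, HTTP, whatever) is left out.

```python
import time


class Directory:
    """In-memory name -> (ip, last_seen) table with a staleness cutoff."""

    def __init__(self, max_age=120.0, clock=time.monotonic):
        self.max_age = max_age  # seconds before an entry counts as stale
        self.clock = clock      # injectable so the logic can be tested
        self.entries = {}

    def heartbeat(self, name, ip):
        """Record that `name` is alive at `ip` right now."""
        self.entries[name] = (ip, self.clock())

    def lookup(self, name):
        """Return the IP for `name`, or None if unknown or stale."""
        entry = self.entries.get(name)
        if entry is None:
            return None
        ip, seen = entry
        return ip if self.clock() - seen <= self.max_age else None

    def status(self):
        """Map every known name to 'up' or 'stale'."""
        now = self.clock()
        return {name: ("up" if now - seen <= self.max_age else "stale")
                for name, (ip, seen) in self.entries.items()}
```

Because a reboot with a new DHCP lease simply shows up as the next heartbeat with a different IP, the "dirty list" problem mostly goes away: the leaves look each other up when they need to instead of caching a file.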

Also, studying this sort of thing is an excellent academic topic for students of how clouds work. It's quite a bit more complex than anything students will probably have encountered so far. ;)

Mikael_Svahnberg

@dkf said:

Otherwise, the usual way is to have a server at either a known IP address or a known hostname that acts as a directory service, in a general sense.

Roughly in the direction I too was going. Are there any best practices/tools for this, or does everyone write their own server for this?

@dkf said:

Also, studying this sort of thing is an excellent academic topic for students of how clouds work. It's quite a bit more complex than anything students will probably have encountered so far.

To say nothing of their poor teacher...

dkf

@Mikael_Svahnberg said:

Are there any best practices/tools for this, or does everyone write their own server for this?

All the tools I've seen for this have tended to be ridiculously complex, given that notification can be done by just ssh directory.server notifyscript $myIP (well, with a bit of tweaking). It's not rocket science.

cartman82

Now that I got you to open up a thread, I'm sad to say I have zero experience with this.

I was looking for a similar thing for my home network and found the best solution to be just a localized DHCP/DNS server (dnsmasq). That probably won't work in your case, so a directory service is probably the best bet.

If I had to set something up myself (having done zero research into existing solutions), I would do this:

A container with redis server listening on a specific port
Each "child" server running a cron job every minute, where they nmap the local network, find redis and write their IP under a specific well-known key
Key has TTL of 2 minutes, so if they go down, they disappear from redis
If a "child" wants to find a different "child", they can look up its well-known-name in redis
Redis's location is cached, so you don't have to keep nmapping the LAN
If redis goes down, you get error, so run nmap again
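The register-and-look-up part of that list is small enough to sketch in Python rather than bash. This assumes a client object shaped like redis-py's `redis.Redis`, of which only `set(name, value, ex=seconds)` and `get(name)` are used; the `node:` key prefix and the function names are invented for the example, and the nmap discovery step is not shown.

```python
TTL_SECONDS = 120  # two minutes, as in the list above
PREFIX = "node:"   # hypothetical key namespace for this example


def announce(client, name, ip):
    """Write this node's IP under its well-known key, expiring after the TTL."""
    client.set(PREFIX + name, ip, ex=TTL_SECONDS)


def find(client, name):
    """Return the IP registered for `name`, or None if the key has expired."""
    value = client.get(PREFIX + name)
    if value is None:
        return None
    # redis-py returns bytes unless the client was created with
    # decode_responses=True, so handle both cases.
    return value.decode() if isinstance(value, bytes) else value
```

The cron job on each child would call `announce` once a minute; since the key expires after two minutes, a node that stops reporting drops out of the registry without anyone having to delete it.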

Altogether, about 200-300 lines of bash spaghetti and you're done. Couldn't be easier!

Yamikuronue

There's a plugin for Vagrant that edits your hosts file with the IP address of the Vagrant machine so you can use nice URLs. It seems like there ought to be one to do this...

dkf

While that's the sort of zero-config solution I had in mind, I'm not convinced that would work in a cloud; it's probably going to run into problems with firewalls and the fact that the different nodes may not be very close in networking terms. For example, they might be on different network segments within one datacenter. It would also scare me a lot; you'd need to leave things open to attack by the (large number of) nodes in the same datacenter that you're not using.

Unless you can get all your nodes put on a private vlan, but I don't know if the hosting provider will support that.

PJH

Can't you use dyndns or similar...

[pjh@sofa discourse (master)]$ dig thinkpad.dontexist.com

; <<>> DiG 9.9.3-P2 <<>> thinkpad.dontexist.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10668
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;thinkpad.dontexist.com.                IN      A

;; ANSWER SECTION:
thinkpad.dontexist.com. 59      IN      A       172.16.4.87

;; Query time: 61 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sat Feb 14 13:38:31 GMT 2015
;; MSG SIZE  rcvd: 67

[pjh@sofa discourse (master)]$

Or even run your own DNS server..

[pjh@thinkpad ~]$ dig dev axfr

; <<>> DiG 9.9.3-P2 <<>> dev axfr
;; global options: +cmd
dev.                    604800  IN      SOA     ns.dev. [me@work].com. 4 604800 86400 2419200 604800
dev.                    604800  IN      NS      ns.dev.
1.build.dev.            604800  IN      A       172.16.2.56
2.build.dev.            604800  IN      A       172.16.2.57
3.build.dev.            604800  IN      A       172.16.2.58
4.build.dev.            604800  IN      A       172.16.2.59
1.ha.dev.               604800  IN      A       172.16.103.100
2.ha.dev.               604800  IN      A       172.16.103.105
ns.dev.                 604800  IN      A       82.43.129.224
dev.                    604800  IN      SOA     ns.dev. [me@work].com. 4 604800 86400 2419200 604800
;; Query time: 191 msec
;; SERVER: 82.43.129.224#53(82.43.129.224)
;; WHEN: Sat Feb 14 13:40:25 GMT 2015
;; XFR size: 10 records (messages 1, bytes 269)

[pjh@thinkpad ~]$

Or isn't that the issue?

dkf

@PJH said:

run your own DNS server..

DNS is a good idea, but you still have to put that at a location that is known before the nodes boot. At that point, anything you do will work. For small values of “anything”. :D The fundamental issue is that you have to nominate one special host that knows about everything else, or have a very benign network environment so that machines can auto-discover via broadcast (or multicast, for the very keen) networking techniques. Since clouds tend to lock the network environment down heavily (for reasons of sanity on the providers' behalf) you're stuck with a special node.

IaaS providers often have a convenient way to get a constant IP address for a particular node. It's precisely for allowing this sort of thing that the capability exists.

@PJH said:

Can't you use dyndns or similar...

You almost certainly don't want to pollute dyndns with large numbers of short-lived nodes that general machines don't need access to. Anything that is external-facing might be more reasonable to put in dyndns, or you could get a name binding from elsewhere. Since @Mikael_Svahnberg is at a university, they've probably got their own DNS anyway which they might be able to leverage (subject to tight constraints and some sweet-talking)…

Mikael_Svahnberg

@PJH said:

Or isn't that the issue?

I'm not sure what my issue is. I just want a nice and generic way for my students to set up a collection of machines in a potentially public cloud so that they are able to address each other. I was hoping that there was a SOTA solution for this that I had not found.

At the moment I am leaning towards a variant of what @cartman82 suggested, but with the redis server set up on the host/startup machine, since that has a fixed IP (for the duration of this exercise, at least), and with the guests/cloud machines sending heartbeats to it. Also, have a process on the host that polls redis regularly to see that everything is still healthy.
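The polling process on the host could be as simple as the sketch below. It assumes some `lookup(name)` callable that returns None when a machine has no live entry (for instance a thin wrapper around a redis `GET` on that machine's key); the function names, the 30 second interval and the message text are placeholders.

```python
import time


def missing_nodes(lookup, expected):
    """Return the expected node names for which `lookup` finds no live entry."""
    return [name for name in expected if lookup(name) is None]


def watch(lookup, expected, interval=30.0, rounds=None,
          sleep=time.sleep, report=print):
    """Check the registry every `interval` seconds.

    Runs forever when `rounds` is None, otherwise for that many checks.
    """
    done = 0
    while rounds is None or done < rounds:
        missing = missing_nodes(lookup, expected)
        if missing:
            report("no heartbeat from: " + ", ".join(missing))
        done += 1
        # Skip the final sleep when a fixed number of rounds was requested.
        if rounds is None or done < rounds:
            sleep(interval)
```

With heartbeats expiring after two minutes, polling every 30 seconds means a dead machine is noticed within about two and a half minutes of its last heartbeat.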