Troubleshoot Microservice Networks
There are several advantages to using microservices. As one might expect, they also bring some additional challenges. For us, one of those challenges is increased network complexity. All of our microservices have to talk to each other, and they frequently do this via REST APIs. Sometimes, this communication doesn’t work as expected.
There are a few questions I ask myself when I suspect foul network play. Here they are with the tools I use to answer them. I have found these tools to be indispensable when troubleshooting our microservice deployment. It may be important to know that I am running these commands on CentOS 6.9.
Question 1
Is the service actively listening on the port I expect it to be? To answer this question, I use netstat with a few options.
$ sudo netstat -anp | grep <port number>
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 15241/puma 3.9.1
Note: if your user owns the process in question, there is no need for sudo.
If there is no output from the above command while the service is running, I’ve narrowed the search for my problem to the port number I expect traffic on. It’s probably misconfigured somewhere. If I do get the output I expect, I like to stop my service and then run the netstat command again to make sure the entry goes away. This sanity test proves it is, in fact, my service that is listening on the given port.
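One caveat with a plain grep: it matches the number anywhere on the line, so 3000 also matches port 30001 or a PID of 3000. A minimal sketch of a stricter filter follows. The function name listening_on and port 3000 are my own placeholders, not part of netstat, and the filter assumes the column layout shown in the output above.

```shell
# listening_on PORT: reads `netstat -anp`-style lines on stdin and prints only
# TCP sockets in LISTEN state whose local address ends in :PORT.
# (Hypothetical helper; assumes the local address is column 4 and the state is column 6.)
listening_on() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, parts, ":")   # e.g. 0.0.0.0:3000 or :::3000
      if (parts[n] == port) print
    }'
}

# Usage (port 3000 is only an example):
#   sudo netstat -anp | listening_on 3000
```

No output means nothing is listening on exactly that port, which is the same signal as the empty grep result described above.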
Question 2
Am I receiving any traffic on the port I expect? tcpdump is my tool of choice to answer this question.
$ sudo tcpdump -i any port <port number>
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
...
Leave that running while the requesting service makes its queries. When a request comes in, additional output similar to the example below should appear. If it doesn’t, my search for the problem is narrowed to the service making the request on the other end or the network itself. While I enjoy blaming the network as much as the next guy, it’s usually a misconfigured target IP address, hostname, or port in the other service.
...
21:28:08.257231 IP 192.168.1.2.56284 > 192.168.1.15.hbci: Flags [S], seq 3502080001, win 65535, options [mss 1460], length 0
21:28:08.257279 IP 192.168.1.15.hbci > 192.168.1.2.56284: Flags [S.], seq 1941830594, ack 3502080002, win 14600, options [mss 1460], length 0
...
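When there is a lot of traffic, it can help to summarize the capture rather than read it line by line. The sketch below counts connection attempts (Flags [S]) and replies from the listener (Flags [S.]) in tcpdump's text output. The function name syn_summary, port 3000, and the packet count of 20 are hypothetical choices of mine, and the parsing assumes the output format shown above.

```shell
# syn_summary: reads tcpdump text output on stdin and counts
# connection attempts ([S]) versus replies from the listener ([S.]).
# (Hypothetical helper; relies on the "Flags [...]," text tcpdump prints for TCP.)
syn_summary() {
  awk '
    /Flags \[S\],/   { syn++ }
    /Flags \[S\.\],/ { synack++ }
    END { printf "syn=%d synack=%d\n", syn, synack }'
}

# Usage: capture 20 packets, then print the summary.
# -nn keeps addresses and ports numeric instead of translating them to names.
#   sudo tcpdump -nn -c 20 -i any tcp port 3000 | syn_summary
```

Attempts with no replies would suggest that packets reach the host but nothing answers them, which points back at the listener or the firewall rather than the requesting service.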
Question 3
Is a firewall rule getting in the way? On our servers, the answer to this question comes from iptables.
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT tcp -- anywhere anywhere tcp dpt:XmlIpcRegSvc
ACCEPT icmp -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
ACCEPT udp -- anywhere anywhere multiport dports snmp,snmptrap
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:http
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:webcache
ACCEPT tcp -- anywhere anywhere tcp dpt:1234
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Chain FORWARD (policy ACCEPT)
target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- anywhere anywhere udp spt:10053
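iptables evaluates the rules in a chain from top to bottom, so order matters: an ACCEPT that sits below a catch-all REJECT never gets a chance. As a rough sketch, the helper below reads a numeric, numbered listing and prints the rules that name a given destination port together with any catch-all rules, in chain order. The name rules_for_port and port 3000 are hypothetical, and the helper does not understand multiport rules or rules restricted by source address.

```shell
# rules_for_port PORT: reads `iptables -L INPUT -n --line-numbers` output on
# stdin and prints, in order, rules that mention dpt:PORT plus rules whose
# protocol column is "all" (catch-alls such as the final REJECT).
# (Hypothetical helper; assumes columns: num target prot opt source destination ...)
rules_for_port() {
  awk -v port="$1" '
    $1 ~ /^[0-9]+$/ && ($0 ~ ("dpt:" port "( |$)") || $3 == "all") { print }'
}

# Usage (port 3000 is only an example):
#   sudo iptables -L INPUT -n --line-numbers | rules_for_port 3000
```

If a REJECT line is printed before the ACCEPT for the port, or no ACCEPT is printed at all, the firewall is a likely suspect.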
Finding the rule that is getting in my way can be a bigger task than I want to tackle immediately. The quickest (and most foolproof) way to find out if the firewall is causing me grief is to temporarily turn it off.
NOTICE: I only do this in non-public, staging or dev environments. I would not turn off my firewall on production servers or any other server publicly accessible, even temporarily. Please take the time to understand the risk when playing with your firewall settings.
With the disclaimer out of the way, this is how I do it on our slightly older versions of Red Hat:
$ sudo service iptables stop
Then I confirm it is no longer in my way:
$ sudo iptables -L
Table: filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
Chain FORWARD (policy ACCEPT)
num target prot opt source destination
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
An Interesting Note
The tcpdump tool shows traffic as it comes in, before it goes through the firewall. The upshot is that seeing incoming traffic in tcpdump does not mean it is reaching your service. It could still be getting stopped by the firewall.
Wrap Up
That’s it. A few simple commands to ensure the network side of your services is up and running correctly. It’s great when everything works. When Murphy comes to visit, however, it helps to have some good tools at your disposal. If you’ve got a network tool that regularly makes your life easier, please let me know with the contact form below.