Do You Know Where Your Logs Are?
Often times, when debugging applications, a number of problems show up simultaneously. Determining the order to attack each problem can be difficult to do. Despite the challenge, it is worth giving it some good thought. It could save hours of your valuable time. Here is a situation where I got it wrong.
The Setup
We use Supervisord to manage our processes. Our main reason for using this tool is that it runs in user space (no root access needed). This is great when we have to manage our microservices on servers that we don’t have sudo privileges on.
Another benefit of Supervisord is that it gives easy control over logs and
their rotation. The only caveat to our process is that we need to ensure that
the logs of our various services are written to stdout
and stderr
.
While deploying and testing several new microservices to our staging environment, I ran into a set of small problems. These problems took far more time to debug than they should have due to misunderstandings about what the logs were showing me, and an incomplete picture of what was going on.
The Journey
I got each of our services deployed, configured, and running on our staging
servers. With an expectation that everything might not run perfectly right out
of the gate, I started up each service and began tailing their respective logs:
tail -f logs/*.log
. When I ran my test, I noticed there were no queries
making it to the receiving service (as evident by the log).
=> Booting Puma
=> Rails 5.1.1 application starting in development on http://0.0.0.0:3000
=> Run `rails server -h` for more startup options
Puma starting in single mode...
* Version 3.9.1 (ruby 2.4.0-p0), codename: Private Caller
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://0.0.0.0:3000
Use Ctrl-C to stop
I began debugging the code of the first service in the chain. The settings appeared to be correct. The code was executing without exception. And the network calls were being made with the right data. All signs showed that the calling service was, in fact, working correctly. Time to look into the second service.
In the receiving service, the log showed the server startup and the debug print statements I littered throughout the code. I did not, however, see any requests being logged. Strange. I had already determined that the requests were making it here so, what gives? I was relatively confident that all requests were logged in our older Rails 4 applications (including hitting non-existent endpoints). But I wasn’t certain about these new Rails 5 API applications. I assumed that hadn’t changed, but I wasn’t sure. Oh well, I thought, I’ll tackle this problem once I get the services working.
We use Nix to manage and deploy our packaged services. Because the Nix store is read only, we had to jump through a few hoops to get our Rails API services to run. Some files had to be copied to a location that could be written to, the rest could be linked back to the Nix store. I began to suspect these linked files were causing me grief.
After spending some time experimenting, I learned that my suspicions were wrong. I also made some mistakes during the exploration that led me down fruitless paths (eg. querying wrong routes). Eventually, it became clear that I really needed to find the missing query logs to move forward.
The Eureka Moment
It finally occurred to me to look for another log that might not be getting
captured by Supervisord. I found it quickly in the Rails project folder. Within
it, lay all of the priceless information I needed to solve the world’s
problems. Well, at least my current tiny world of deploying these services. The
Puma server was logging to standard out and my puts
statements were making it
to standard out. But the Rails logs were being written to a file.
Armed with better knowledge, a quick search revealed the solution. Rails 5
needs to be configured to log to standard out in production. Within
config/environments/production.yml
there is a block of code that looks like
this:
if ENV["RAILS_LOG_TO_STDOUT"].present?
logger = ActiveSupport::Logger.new(STDOUT)
logger.formatter = config.log_formatter
config.logger = ActiveSupport::TaggedLogging.new(logger)
end
In our case, we didn’t want to use the environment variable, so we removed the
if
statement. All of those glorious logs were now where we needed them to be
and the problem with the service communication was quickly resolved.
The Lesson
I spent a lot of time debugging problems that could have been quick and easy. Without those missing logs, I followed time consuming paths of debugging. Ultimately, I focused on the wrong problems first. It is important to know where all logs are being written. If I had put that at the top of the debugging priority list, I would have finished the deployment to staging in half the time it ended up taking. Learn from my mistake, know where your logs are.