The problem we had
Our team recently encountered an error where an internal web application received a socket timeout when trying to call one of its internally hosted dependencies. Whilst investigating, we found that the application had made successful HTTP calls to the same service, immediately prior to the error.
It was puzzling, but I ruled out anything network-related in our investigation, given that:
- The application could make some requests absolutely fine.
- There was nothing seemingly different about the requests and responses. They were all GET requests that returned a small amount of JSON.
- There weren't any connection errors.
We then found out, through talking to Ops, that the firewall was blocking the request because its contents were deemed suspect.
We wasted a lot of time investigating completely incorrect theories based on seemingly sound but invalid assumptions.
The problem in general
How can we ever know if an error is firewall-related?
Our problem manifested itself as a socket timeout. How would other, non-HTTP protocols report a blocking of traffic? I can easily imagine going through the same long learning process for a database, FTP or SMTP service.
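To make the symptom concrete, here is a minimal Python sketch of a probe that separates "could not connect" from "connected, sent the request, then heard nothing". The function name `classify_probe` and its labels are my own, made up for illustration. It assumes a firewall that silently drops a suspect request on an established connection, which is how our case looked from the application side; a firewall that resets the connection instead would show up differently.

```python
import socket


def classify_probe(host, port, payload, timeout=2.0):
    """Return a coarse label for how a TCP request/response exchange ended."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer to the connection attempt at all.
        return "connect_timeout"
    except ConnectionRefusedError:
        # Something actively rejected the connection attempt.
        return "connect_refused"
    except OSError:
        return "connect_error"

    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendall(payload)
            data = sock.recv(4096)
        except socket.timeout:
            # Connected fine, but no response arrived in time.
            # This is the symptom we saw.
            return "read_timeout"
        except OSError:
            return "connection_reset"
        return "response" if data else "closed_without_response"
```

A `read_timeout` on its own doesn't prove the firewall is involved, since a slow or hung service looks exactly the same. That ambiguity is the heart of the problem.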
Confusion is introduced - even if it never fails again
Let's assume that these problems are addressed and the firewall's logic is updated to let the legitimate requests through. Let's also suppose that the firewall never blocks a genuine request again. The next time a socket timeout error is encountered, we may well point the finger at the firewall when we should be focusing on the application.
I'd advocate a simple firewall for internal traffic that whitelists IPs and ports only. If we can be certain that the firewall completely trusts traffic based on an established TCP/IP connection, things will become a lot simpler to debug.
The simpler things are to debug, the quicker you can fix your site in an emergency. Time is of the essence.
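As a rough illustration of what I mean by "simple", here is a sketch in Python of a decision that looks only at the source IP and destination port and never at the request contents. The rules, addresses and the `is_allowed` name are all hypothetical; a real firewall would be configured in its own rule language, not in Python.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules: (allowed source network, allowed destination port).
ALLOW_RULES = [
    (ip_network("10.0.1.0/24"), 443),
    (ip_network("10.0.2.0/24"), 5432),
]


def is_allowed(source_ip, dest_port, rules=ALLOW_RULES):
    """Allow traffic purely on source address and destination port."""
    addr = ip_address(source_ip)
    return any(addr in network and dest_port == port for network, port in rules)
```

Because the payload never enters the decision, two requests from the same host to the same port can't be treated differently, which is exactly the behaviour that confused us.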
If you must...
If there is an absolute requirement that these firewall rules are in place, confusion can be mitigated by performing the following:
- Ensure that all developers are aware of how firewall issues may present themselves.
- Provide a console where everyone can easily see whether traffic is being blocked by the firewall.