Intranet not working

Hello:

Is there any reason why the intranet might suddenly stop working while the rest of the sites keep working? This has been happening to me for a few days.

Background info: I have devilbox set up on a VPS running CentOS 7. I have two installations, both v1.6.1, one using PHP 5.4 and the other PHP 7.2. They (and their virtual hosts) are set up to listen on different ports.
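(For reference, the port separation is done in each installation’s .env file. The variable names below are devilbox’s; the values are just illustrative, and each devilbox gets its own pair:)

HOST_PORT_HTTPD=8042
HOST_PORT_HTTPD_SSL=8043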

Anyway: everything used to work perfectly until a few days ago, when suddenly the intranets stopped working. When trying to access them, I get the login screen, and after typing the password, it just times out until the browser shows a “Service unavailable” message. The Apache logs say:

[Mon Apr 06 13:13:05.852473 2020] [proxy_fcgi:error] [pid 810:tid 140647142995712] [client X.X.X.X:64591] AH01067: Failed to read FastCGI header
[Mon Apr 06 13:13:05.852613 2020] [proxy_fcgi:error] [pid 810:tid 140647142995712] (104)Connection reset by peer: [client X.X.X.X:64591] AH01075: Error dispatching request to :8043: 

And the PHP-FPM logs say:

[06-Apr-2020 13:13:05] WARNING: [pool www] child 17588, script '/var/www/default/htdocs/index.php' (request: "GET /index.php") execution timed out (124.014726 sec), terminating
[06-Apr-2020 13:13:05] WARNING: [pool www] child 17588 exited on signal 15 (SIGTERM) after 124.018236 seconds from start
[06-Apr-2020 13:13:05] NOTICE: [pool www] child 17837 started

Stopping and restarting the containers doesn’t help, and rebooting the VPS hasn’t helped either. This happens with both the PHP 5.4 and the PHP 7.2 devilboxes. The odd thing is that the virtual hosts still work perfectly fine, without any lag or anything.

Is there any other place where I can look to see what’s going on inside PHP-FPM? Any other logs?
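So far, the only other place I know to look is the live container output, e.g.:

docker-compose logs -f php
docker-compose logs -f httpd

in case that reveals anything beyond what I pasted above.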

Anyone? I just tried shutting down the containers and rebooting the server again, and it still doesn’t work.

I’ve just cloned a fresh devilbox repo (v1.7.0 - 2020-03-24), changed only the PHP version to 7.2 in the .env file, and started all containers. Everything works fine there. I am no network expert, but could it be something with the custom ports not being forwarded correctly? (The log does mention a proxy error.)
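(The only line I changed in the fresh .env was the PHP version; if I remember the variable name correctly, it was:

PHP_SERVER=7.2

with everything else left at its defaults.)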

The logs are in the log/ directory on your host system. They are also separated per version. See if you can find anything interesting there and post it here. I don’t have any other ideas at the moment.
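On a default setup you should see one subdirectory per configured service version, something like this (the exact names depend on what you picked in .env):

log/apache-2.4/
log/php-fpm-7.2/
log/mysql-5.7/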

Yep, those are the logs that I pasted above. I’ll look into PHP-FPM to see if there’s a way to run it in “debug mode” or something and make it print more detail about whatever is going on internally.
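From a quick look at the PHP-FPM docs, its global config supports a log_level directive, so in theory something like this in the master php-fpm.conf should make it more verbose (I still have to figure out where devilbox expects that override to live):

; global section of php-fpm.conf (untested on devilbox)
log_level = debug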

I understand that the “proxy” part of the log refers to how Apache interacts with PHP-FPM. As I mentioned before, the virtual hosts work perfectly. The only thing that doesn’t is the intranet control panel, and even that worked fine until a few days ago.

Maybe I’ll try nuking my installation and starting over, as a last resort.

Okay, I just tried upgrading one of my devilboxes to 1.7.0… and it still does the same thing. The virtual hosts work, but the intranet gets stuck.

At this point, I’m starting to think that there’s some firewall rule I’ve inadvertently added that is blocking the devilbox intranet from reaching the different containers to check whether they are up. Just so that I know, how does devilbox do this exactly? Which connections does it open and which ports does it use?
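In the meantime, I figured I can at least probe connectivity from inside the PHP container with a throwaway one-liner (the service name and port here are just my guess for MySQL):

docker-compose exec php php -r 'var_dump(@fsockopen("mysql", 3306, $errno, $errstr, 2));'

If that prints false instead of a resource, something is blocking the connection between the containers.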

Okay, I finally found out what was going on.

I wasn’t starting the entire devilbox stack; instead, I was starting just the services I needed:

docker-compose up -d httpd php mysql mailhog

I found that if I instead started all of them (docker-compose up -d), everything worked without problems. Apparently, the intranet was timing out while waiting for the nameserver to respond with the IP addresses of the services that hadn’t been started. The logs of the “bind” container say this when starting the devilbox:

[INFO] Debug level: 2
[INFO] BIND logging: to stderr via Docker logs
[INFO] Using default DNS TTL time: 3600 sec
[INFO] Using default DNS Refresh time: 1200 sec
[INFO] Using default DNS Retry time: 180 sec
[INFO] Using default DNS Expiry time: 1209600 sec
[INFO] Using default DNS Max Cache time: 10800 sec
[INFO] Adding wildcard DNS: *.(mysite) -> 127.0.0.1
zone (mysite)/IN: loaded serial 1586898389
OK
[INFO] Not adding any extra hosts
[INFO] DNSSEC Validation: no
[INFO] Adding custom DNS forwarder: 8.8.8.8,8.8.4.4
[INFO] Starting BIND 9.11.3
14-Apr-2020 21:06:29.405 managed-keys-zone: loaded serial 0
14-Apr-2020 21:06:29.412 zone 0.in-addr.arpa/IN: loaded serial 1
14-Apr-2020 21:06:29.421 zone 127.in-addr.arpa/IN: loaded serial 1
14-Apr-2020 21:06:29.422 zone 255.in-addr.arpa/IN: loaded serial 1
14-Apr-2020 21:06:29.423 zone (mysite)/IN: loaded serial 1586898389
14-Apr-2020 21:06:29.425 zone localhost/IN: loaded serial 2
14-Apr-2020 21:06:29.425 all zones loaded
14-Apr-2020 21:06:29.425 running
14-Apr-2020 21:06:29.426 zone (mysite)/IN: sending notifies (serial 1586898389)

Then, once I open my browser and try to log into the intranet, I get this:

14-Apr-2020 21:07:05.596 client @0x7efd680898b0 172.16.238.10#36742 (pgsql): query: pgsql IN AAAA + (172.16.238.100)
14-Apr-2020 21:07:05.597 client @0x7efd68056950 172.16.238.10#47779 (pgsql): query: pgsql IN A + (172.16.238.100)
14-Apr-2020 21:07:05.620 client @0x7efd6848c030 172.16.238.10#40849 (pgsql.openstacklocal): query: pgsql.openstacklocal IN AAAA + (172.16.238.100)
14-Apr-2020 21:07:05.621 client @0x7efd680898b0 172.16.238.10#59432 (pgsql.openstacklocal): query: pgsql.openstacklocal IN A + (172.16.238.100)
14-Apr-2020 21:07:05.621 resolver priming query complete
14-Apr-2020 21:07:05.653 client @0x7efd68056950 172.16.238.10#46031 (pgsql.ovh.net): query: pgsql.ovh.net IN AAAA + (172.16.238.100)
14-Apr-2020 21:07:05.653 client @0x7efd6848c030 172.16.238.10#43289 (pgsql.ovh.net): query: pgsql.ovh.net IN A + (172.16.238.100)

It looks to me as if it’s timing out waiting for the IP address of the “pgsql” host (the .openstacklocal and .ovh.net variants are presumably the VPS’s DNS search domains being appended after the bare name fails to resolve).

The questions I have now are:

  • It used to work. I used to start only the services I needed and never had any problems. Why did it suddenly start breaking?
  • I’d rather not start the entire stack. I have a small VPS, I need to run several devilboxes, and I don’t want containers that I won’t be using (Redis, etc.) taking up resources.

Is there any solution for this?
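One idea I want to test, assuming I’m reading the devilbox docs right about the EXTRA_HOSTS variable in .env (a comma-separated list of host=IP mappings that get added to the bind container): pinning the names of the services I don’t start, so that DNS answers immediately instead of timing out. Something like:

EXTRA_HOSTS=pgsql=127.0.0.1,redis=127.0.0.1

I haven’t verified this yet, nor whether the intranet would then just show those services as down instead of hanging.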