A Journey Across Virtual Private Servers Into Error-Freedom
Have you ever experienced a client meltdown? If not, let me tell you how it usually goes for a developer. The client from hell calls you up or sends a not-so-friendly message, and the next few hours of your day are focused only on them. At least, that is how this problem went down.
It was October 4th, 2020 at 11:54 PM when a text message came to my phone. It was from a client I had worked with only four months earlier, before the global pandemic.
The text message said there was an “error on the site”. At the time, I had no idea what I was getting into.
I specifically remembered setting this client up with Cloudflare because I wanted to use its threat detection and analysis (only a dev would want this stuff; patting myself on the back), so my first guess was that something on the Cloudflare side had been triggered.
Or possibly it was a user error of some kind, maybe relating to their content management system. (Front-end devs, you would understand that.)
To determine what the client's issue was, and frankly to understand what the text message was talking about, I booted up my workstation on October 5th, 2020 at 12:05 AM, without responding to the client's text message right away. (A big mistake; I should have calmed the client down first, before solving the technical problem.)
The client actually called my phone at
They said, “I have some bad news, the site is down!”
I replied, “… it's not down! Hold on, let me check!” (Really, most errors on the web look worse than they are, and I wanted to see for myself.)
This is the moment when any developer (level doesn't matter) receives bad news, and how they handle it shows whether they are competent or just in over their heads.
Personally, I was just confused about why a site would go down after four months of no issues. (There was a reason, which I found out at 3:08 AM.)
On October 5th, 2020 at 12:17 AM, I ran a curl request against the client's website and read the response. It returned Cloudflare Error 1033, Ray ID: 9618s2345xwrfgy5.
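A check like this one can also be scripted. What follows is a minimal Python sketch, not what I ran that night. The helper names (`fetch`, `cloudflare_error_code`, `diagnose`) are made up for illustration, and the pattern for pulling the error number out of the page is an assumption about how the error text is worded, so adjust it to whatever your own curl output shows.

```python
import re
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Assumed wording: "Error 1033" or "error code: 1033" somewhere in the body.
CF_ERROR_RE = re.compile(r"\berror(?:\s+code)?:?\s*(1\d{3})\b", re.IGNORECASE)


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, body text), even when the server answers with an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries status and body.
        return err.code, err.read().decode("utf-8", errors="replace")


def cloudflare_error_code(body: str) -> Optional[int]:
    """Extract a four-digit 1xxx error number from an error page, if present."""
    match = CF_ERROR_RE.search(body)
    return int(match.group(1)) if match else None


def diagnose(status: int, body: str) -> str:
    """Summarize a response as healthy, a Cloudflare error, or a plain HTTP error."""
    if status < 400:
        return f"ok (HTTP {status})"
    code = cloudflare_error_code(body)
    if code is not None:
        return f"Cloudflare error {code} (HTTP {status})"
    return f"HTTP error {status}"


# Usage (performs a real request, so point it at your own site):
#   print(diagnose(*fetch("https://example.com/")))
```

Keeping `diagnose` separate from `fetch` means the classification can be checked against saved error pages without touching the network.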
I said to myself, “OK, maybe it was a Cloudflare outage that will come back online on its own.” (Another mistake; when dealing with data and servers, nothing comes back online by itself other than errors.)
I kid you not: I did not want to say it out loud, but as I dug further into the issue, I realized how bad the situation was.
I was just getting into “the core issue” at 1:43 AM, and the client was about to receive the communication of all time: “We have lost four months' worth of work on your WordPress site. Do you happen to have a backup? I just checked my own system and only have one from four months ago.” (...hoping for better news)
For privacy reasons, I cannot disclose the conversation, but we determined that on October 4th, 2020 at 10:20 PM the client had uploaded a 12.3 GB video file and crashed the main process of the Apache server. The site was down because the memory limit was not properly configured on the server. When the PHP process went down, the Cloudflare Argo Tunnel lost its connection to the site, which resulted in the error page being displayed. Not only did the client's site have downtime, but the Argo Tunnel used to host the site stayed alive, and its error page revealed critical infrastructure details to visitors.
The memory limit for all running PHP processes on the server was only 256M, and the client had exceeded that limit with a single file.
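To put those two numbers side by side, here is a rough Python sketch of the arithmetic. It assumes “12.3GB” means binary gigabytes and that “256M” is PHP's shorthand notation, where K, M, and G are powers of 1024. The function names are hypothetical. Real upload handling also depends on directives such as `upload_max_filesize` and `post_max_size`, which are not covered here.

```python
def parse_php_size(value: str) -> int:
    """Convert PHP shorthand such as '256M' to bytes; '-1' stays -1 (no limit)."""
    value = value.strip()
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def exceeds_limit(size_bytes: int, limit: str) -> bool:
    """True when a payload of size_bytes is larger than the configured limit."""
    limit_bytes = parse_php_size(limit)
    if limit_bytes < 0:  # PHP treats -1 as "no limit"
        return False
    return size_bytes > limit_bytes


# Assumption: the 12.3 GB video measured in binary gigabytes.
UPLOAD_BYTES = int(12.3 * 1024**3)
LIMIT_BYTES = parse_php_size("256M")
RATIO = UPLOAD_BYTES / LIMIT_BYTES  # how many times larger the file is than the limit

print(f"limit: {LIMIT_BYTES} bytes, upload: {UPLOAD_BYTES} bytes, ratio: {RATIO:.1f}x")
print("exceeds limit:", exceeds_limit(UPLOAD_BYTES, "256M"))
```

Under those assumptions, the single video was roughly 49 times the size of the configured limit.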
A few points to make about this situation: pinning the blame on a developer or a client is unnecessary. As the developer, you should be able to provide your clients with a working solution and additional services such as maintenance.
In this case, I was the developer, and I had trusted running Cloudflare Argo Tunnel not as a service process but as a background job on the server. It was this choice that led to a simple PHP process taking the entire system down.
Developers, please take this as a lesson: only implement services that you have had years of success with, and always follow your checklist, because you could damage client property when using unfamiliar projects within your tech stack.
As the client, you should be able to trust the developer to be on time. However, be considerate of a developer working extra hours (in this case into the early morning) to resolve downtime or other issues. Clients, please make sure your developers and team members have the time they need to double- and triple-check their own projects. Rushing a project does not lead to a successful project.
After discovering these issues and resolving them, I ended up with a fully working site on October 5th, 2020 at 3:27 AM, and I continued monitoring it throughout the night.
Although it may be impressive that we successfully recovered a site from downtime, it is a sad reality that Virtual Private Servers carry risk for developers of all experience levels. Whether you choose shared hosting, a dedicated server, or a VPS, the tech stack is critical to the success of a project. Even though the issue was not directly related to the server but to PHP, the worrisome part is that the management of your server must match the needs of your client. Can a developer really promise no downtime? Realistically, no… This has led me to investigate alternative tech stacks, to stay ahead of the unwanted phone calls and text messages that come after you have finished a project with a client.
I hope this helps other developers and clients see the full picture, and how important a tech stack is when deciding to start and finish a project.