The time when Shopify and EC2 disliked each other

TheChetan1 pts0 comments

EC2 and Shopify dislike each other - by Chetan Vashisht

Terminal

SubscribeSign in

EC2 and Shopify dislike each other<br>Dev Tales #2: Thoughts of a junior engineer<br>May 25, 2026

Share

Back in 2019, I was a junior software engineer at Turmswear. One blissful morning while giving my updates in the standup, our packer, stormed in yelling that there were no orders placed in the last fifteen minutes.<br>To understand what the yelling was about we need to start with the setup:<br>There’s a monolith server (in ap-south-1 AWS EC2) that holds the business logic for dealing with courier partners (order fulfillment), loyalty program, return & exchange and taxation.

We use Shopify for product and order management. Shopify communicated with our backend via webhooks. We updated Shopify regularly via REST apis.

Our frontend was a regular Shopify Store.

The Issue

We began debugging immediately, starting with Shopify. Fortunately all the orders were being placed correctly. Next I moved on to the application logs. Something seemed off. We were getting updates from Shopify, but we were unable to update Shopify from our end.<br>More specifically:<br>Shopify notifies our monolith server about orders placed, order updates etc.

Monolith process orders with business logic.

Monolith updates Shopify order metadata. This update call to Shopify started failing. This call was critical for the correct functioning of our business.

Debug Logs

Here’s my logs from Jun 1, 2019 exactly as I posted it Stackoverflow:<br>Everything was working fine till yesterday/today midnight. But today we are unable to access shopify REST apis from our ec2 instance located in Bombay (ap-south-1). The dns resolves correctly to the shopify shop:<br>[ec2-user@ip-172-31-12-194 ~]$ dig turms.myshopify.com

; > DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.58.amzn1 > turms.myshopify.com<br>;; global options: +cmd<br>;; Got answer:<br>;; ->>HEADERHitting the shop for any REST apis doesn’t work:<br>[ec2-user@ip-172-31-12-194 ~]$ curl -vX GET https://turms.myshopify.com/admin/api/2019-04/orders/metafieldId/metafields.json -H 'Accept: */*' -H 'Authorization: Basic Auth'<br>* Trying 23.227.63.64...<br>* TCP_NODELAY set<br>* connect to 23.227.63.64 port 443 failed: Connection timed out<br>* Failed to connect to turms.myshopify.com port 443: Connection timed out<br>* Closing connection 0<br>curl: (7) Failed to connect to turms.myshopify.com port 443: Connection timed outWhy are shopify calls failing from inside the ec2 instance? Restarting the server, flushing cache and bringing up a new machine have give me no results so far. Any help is appreciated.

Reactions

I wanted to get to the bottom of this. I sshed to EC2 monolith server and looked through application the logs and searching for clues. I could see that calls to Shopify were always timing out.<br>One of the other junior engineers checked the status pages on Shopify and EC2, neither was reporting any downtime.<br>My manager created a new EC2 instance in another region to see if the failure was localized. They were able to successfully connect to Shopify and make REST api calls from there. My manager suggested we create a new proxy server in the EC2 instance to route our requests to Shopify.<br>Since he was the incident commander and the far more experienced engineer here, we went ahead with his suggestion. I remember at the time thinking, “the proxy server will work, but won’t debugging will get it done much faster?”. I was disappointed that he din’t explain the rational behind his decision. In hindsight though, it was clearly the right choice.<br>Resolution

We started a node proxy server on the new EC2 instance in another region.

We updated our monolith application to hit the new proxy server instead of calling Shopify and deployed the changes.

The whole change with debugging took us between 90-120 minutes from identification of the problem to deployment. Fortunately, the proxy server solution worked and we were able to mitigate the issue successfully.<br>The next day, I received this response from Shopify:<br>This is to let you know that early morning today there was a connectivity issue across Shopify's platform and this email is to inform you that your stores have been recovered, we understand that situations like this impact you, your business and your teams. The internet-wide network outage affected several services, including Shopify. Once the network has resorted please know our team worked to get your store online as soon as possible. In the coming days, we will work to fully understand how this widespread Internet infrastructure failure affected our platform.

I don’t recall ever finding out what Shopify’s failure was.<br>Learnings

This was the first time I learnt that the network is unreliable. It doesn't fail often, but when it does - it’s very difficult to prepare for. Although both Shopify and EC2 were independently working, their bridge wasn’t reliable.

This was a good fix to the problem because the primary role of a software engineer during an incident is to...

shopify server from monolith orders instance

Related Articles