Every Shopify app developer contends with one major issue: rate limits. Shopify's APIs are heavily rate-limited, so every app must invest significant effort in managing those limits before it can launch. At Gadget, our full-stack app platform handles this for you with a built-in Shopify integration. Our goal is to take care of the infrastructure and boilerplate, including the gnarly bits of rate limit management and data syncing, so you can build useful features instead of fighting APIs. Our main strategy for avoiding rate limit pain is to sync the data your app needs out of Shopify and into your app's database. This gives you unrestricted access to a full-fidelity, automatically maintained, extensible copy of the data, and you decide how much to sync and how often. Of course, this means the rate limit problem becomes ours instead of yours. We've spent years making syncs faster and faster, and recently shipped two big changes we'd like to share:
- an in-memory streaming system that pulls data from Shopify as fast as possible and buffers it for independent consumption
- a process-local, adaptive rate limiter based on the AIMD (Additive Increase, Multiplicative Decrease) algorithm from TCP

The end result is faster syncs that fill Shopify's API rate limits without compromising user-facing features or risking 429s. Here's how we did it.

The sync problem

Gadget syncs serve three purposes:

- Historical imports and backfills: when a shop first installs an app, for example, the sync pulls in every product, order, and customer to populate the database.
- Reconciliation: re-reading recently changed data to verify that no webhooks were missed, and to correct bugs.
- Models without webhooks: some Shopify resources have no webhook topics, so scheduled syncs are the only way to copy their data out.
In all these cases, developers care deeply about data latency: if the sync is slow, app users notice missing or mismatched data and complain. But there are a few obstacles to speedy syncing:

- Shopify's rate limits are very low. They just don't offer much capacity, so you must use what you do get very carefully.
- If you hit the limits too hard, Shopify will block your IP. Blindly retrying 429 errors as fast as possible can get your IP addresses banned outright, breaking your entire app for as long as the ban lasts. Gadget learned this the hard way early on.
- Foreground work competes with background work. Syncs run while the app is still online, doing whatever work user actions demand in the foreground. We want background syncs to go fast, but not so fast that they eat the entire rate limit and delay or break foreground actions.
The ideal sync would sustain nearly 100% utilization of the rate limit for its entire run, but no more.
Goldilocks zones
Say we're building a Gadget app that syncs product inventory counts to an external system like an ERP. A straightforward sync flow might be:

1. Fetch a page of products from the Shopify API.
2. For each product, run the app's actions in Gadget, which make an API call to the ERP.
3. Repeat.
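In code, the naive flow looks something like this sketch, where fetchProductPage and syncToERP are hypothetical stand-ins for the real Shopify and ERP calls:

```typescript
// Hypothetical helpers standing in for the real Shopify and ERP calls.
declare function fetchProductPage(
  cursor?: string
): Promise<{ products: object[]; nextCursor?: string }>;
declare function syncToERP(product: object): Promise<void>;

// Naive sequential sync: every ERP call must finish before the next
// Shopify request is made, so the slower system sets the pace.
async function naiveSync(): Promise<void> {
  let cursor: string | undefined;
  do {
    const page = await fetchProductPage(cursor);
    for (const product of page.products) {
      await syncToERP(product); // the sync stalls here on a slow ERP
    }
    cursor = page.nextCursor;
  } while (cursor);
}
```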
This approach has two major problems:
- If the ERP system is very slow, the sync runs very slowly: we must wait for it to respond for every product before we can move on to the next page of data.
- If the ERP system is very fast, the sync may run so quickly that it exceeds the Shopify rate limit, possibly to dangerous levels. If foreground work or syncs of other Shopify resources are hitting the API at the same time, we risk an IP ban.
So our sync strategy has two design criteria:

- The rate at which we read from Shopify is decoupled from the rate at which we write to external systems, so the sync can move fast without waiting on each iteration.
- The rate at which we read from Shopify is capped according to current conditions, so it never goes too fast.
We have a porridge situation on our hands: not too fast, not too slow, just right. Internally, we implement this by separating the data producer (which reads from Shopify) from the consumer (the Gadget app running business logic).

Backpressure streaming

To do this decoupling, we built a simple in-memory streaming approach that reads data from Shopify into a queue as fast as it can, then consumes from that buffer independently.
Here’s how it works:
- A while loop reads one page of data at a time from Shopify, as fast as possible, and adds it to a queue.
- Gadget's infrastructure dispatches each unit of work to your Gadget app to run business logic.
- If the consumer falls behind, say because an external system is slow, the queue grows. Once the queue hits its size limit, the producer blocks and can't add more data, so a slow consumer never drives excessive rate limit consumption.
- If the rate limit allows, the producer can fire off requests as fast as it likes, and the consumer can lean on Gadget's serverless autoscaling to process data as quickly as possible within the limits the app has set.
Is it really worth writing each record to a pub-sub queue system just for this decoupling property? At Gadget, our answer is no: we don't want or need the pain and expense of running Kafka or Pub/Sub for these gazillions of records. Instead, we use Temporal to orchestrate our syncs, and model the buffer as a simple p-queue in memory!
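Here's a minimal sketch of the pattern with p-queue; fetchPage and runAppLogic are hypothetical stand-ins for the real Shopify read and the dispatch into your app, and the real system is more involved:

```typescript
import PQueue from "p-queue";

// Hypothetical stand-ins for the real Shopify read and app dispatch.
declare function fetchPage(
  cursor?: string
): Promise<{ records: object[]; nextCursor?: string }>;
declare function runAppLogic(record: object): Promise<void>;

const MAX_BUFFERED = 50; // queue size limit that blocks the producer

async function sync(): Promise<void> {
  // Consumers drain the buffer with bounded concurrency.
  const queue = new PQueue({ concurrency: 10 });

  let cursor: string | undefined;
  do {
    // Backpressure: wait for room in the buffer before reading more.
    await queue.onSizeLessThan(MAX_BUFFERED);

    const page = await fetchPage(cursor); // producer reads as fast as allowed
    for (const record of page.records) {
      void queue.add(() => runAppLogic(record)); // errors handled elsewhere
    }
    cursor = page.nextCursor;
  } while (cursor);

  await queue.onIdle(); // drain whatever is left in the buffer
}
```

The producer only ever waits on `onSizeLessThan`, never on the consumer's external calls, which is exactly the decoupling we wanted.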
Enter Temporal: Durable syncs with checkpoints
Under the hood, we use Temporal to run all syncs as long-running, durable workflows. Each Shopify resource that needs syncing runs as an independent Temporal activity, which starts up and runs (and re-runs) until the resource has been fully synced. If an activity crashes, times out, or needs to move because we're deploying a new version of Gadget, Temporal guarantees it will be restarted somewhere else. We then use Temporal's durable heartbeat feature to track how far along the sync is, using the Shopify API's pagination cursor for a given resource as our sync cursor. When an activity starts back up, it continues reading from exactly where the last activity left off. And if we're careful to update this cursor in Temporal only after all the items in the queue have been processed, we can safely leave the queue in memory, knowing that if we crash, we'll rewind and replay from the most recently completed cursor.
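As a sketch of what that looks like with Temporal's TypeScript SDK (syncPages is a hypothetical helper; the real activity does much more):

```typescript
import { heartbeat, activityInfo } from "@temporalio/activity";

// Hypothetical helper: streams pages of one Shopify resource through the
// in-memory queue and invokes the callback once a page is fully processed.
declare function syncPages(
  resource: string,
  startCursor: string | undefined,
  onPageComplete: (cursor: string) => void
): Promise<void>;

// One Temporal activity per Shopify resource, re-run until fully synced.
export async function syncResource(resource: string): Promise<void> {
  // On a retry, resume from the last durably recorded cursor.
  const lastCursor = activityInfo().heartbeatDetails as string | undefined;

  await syncPages(resource, lastCursor, (cursor) => {
    // Heartbeat only after every item in the page has been processed, so a
    // crash rewinds to the most recently completed cursor.
    heartbeat(cursor);
  });
}
```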
Adaptive rate limiting (inspired by TCP)
So, we've decoupled producers from consumers. The question now is how fast the producer can safely go. The answer: it depends. Rather than trying to set a hard limit on the rate at which we make API calls, we built an adaptive rate limiter inspired by TCP congestion control.
We need to be adaptive for a few main reasons:

- Shopify's limits differ per store, and you don't really know them ahead of time. Shopify Plus merchants get much higher rate limits, and Enterprise merchants get even higher limits after that.
- Rate limit conditions can change mid-sync, if another unrelated sync starts, or if the app suddenly has high foreground rate limit demand.
- We run syncs in parallel (for example, products + orders + customers), and each synced resource contends for the same limit but takes a different amount of time.
Coordinating a global rate limiter across multiple independent processes in a distributed system is annoying and error-prone: you need some central state store to track who is asking for what and when, and it gets especially complicated when processes start, stop, and each want a fair slice of the available limit. We'd prefer something simpler and, ideally, process-local, so that each participant doesn't have to talk to everyone else every time it wants to make a call. Luckily, Shopify has implemented a state store for us, over the same communication channel we're already using: they return a 429 when we call to tell us whether we're over the limit. As long as we're careful not to spam them, we can use Shopify's own signal to decide whether to raise or lower the process-local rate at which we make requests.
This is very similar to the classic congestion control problem in computer networking, and our solution is copied wholesale from that world. Gadget's syncs now throttle their request rate using TCP's AIMD (Additive Increase, Multiplicative Decrease) algorithm:
- If things are going well (no 429s), we slowly ramp up request volume.
- If we get a 429, we cut back hard (usually by half).
- Over time, this converges on the real usable rate limit for this process.
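A stripped-down sketch of the idea (the names and constants here are illustrative, not the API of our open-source implementation linked below):

```typescript
// Illustrative AIMD limiter: raise the request rate additively while
// requests succeed, cut it multiplicatively on a 429.
class AIMDLimiter {
  private ratePerSec: number;

  constructor(
    initialRate = 1,
    private readonly increasePerSuccess = 0.5, // additive increase
    private readonly decreaseFactor = 0.5, // multiplicative decrease
    private readonly minRate = 0.1,
    private readonly maxRate = 100
  ) {
    this.ratePerSec = initialRate;
  }

  // Pace a sequential caller to the current rate before each request.
  async take(): Promise<void> {
    await new Promise((resolve) => setTimeout(resolve, 1000 / this.ratePerSec));
  }

  // No 429: things are going well, ramp up slowly.
  success(): void {
    this.ratePerSec = Math.min(this.maxRate, this.ratePerSec + this.increasePerSuccess);
  }

  // Got a 429: back off hard, usually by half.
  backoff(): void {
    this.ratePerSec = Math.max(this.minRate, this.ratePerSec * this.decreaseFactor);
  }
}
```

The producer awaits `take()` before each Shopify request, then calls `success()` on a 2xx or `backoff()` on a 429; over many requests, the rate oscillates around whatever the store will actually sustain.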
If the real usable rate changes, say because a new sync starts and consumes part of the budget, each process starts seeing more 429 errors and lowers its own process-local rate to make room for the newcomer. When that new process finishes, each remaining process starts seeing more successful requests and ramps its request volume back up to find a new equilibrium. The equilibrium is ever-changing, and that's the point.
Another great AIMD property is that the higher rate limits of Plus or Enterprise merchants are discovered automatically, without Gadget hardcoding anything, even when there's only a single participant in the system. For example, if an app is syncing just one resource against a single high-rate-limit store, AIMD keeps raising that one process's local rate limit until Shopify starts 429-ing, granting that process everything Shopify will offer.
Finally, because AIMD is tunable, we can target an effective rate limit slightly below the actual one, leaving headroom for foreground actions. Our AIMD implementation is open source at https://github.com/gadget-inc/aimd-bucket.

Putting it all together
With this new sync architecture, Gadget apps can:
- Ingest Shopify data at the fastest safe rate