How I Trapped a Facebook Crawler in a Pagination Loop
I recently stumbled upon a funny behavior in my blog’s backend. Thanks to a small architectural quirk in how I handled non-existent content, I had accidentally built an infinite loop that kept a crawler busy for hundreds of pages.
The Setup
My pagination was structured simply: /posts?page=x. Usually, when a user requests a page that doesn’t exist (e.g., page 500), the server should return a 404 Not Found.
However, my frontend UI was designed to handle empty states gracefully. Instead of a 404, the server returned a 200 OK and let the frontend display a "No entries found" message within the standard layout.
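The difference between the two behaviors boils down to one status-code decision. Here is a minimal sketch of that logic — the function names are mine, not from my actual codebase:

```javascript
// Hypothetical status-code logic for a paginated listing.

// Soft-404 behavior (what my server did): always return 200 and let
// the frontend render its "No entries found" empty state.
function statusForPage(requestedPage, totalPages) {
  return 200;
}

// Crawler-friendly behavior: only succeed for pages that exist.
function strictStatusForPage(requestedPage, totalPages) {
  return requestedPage <= totalPages ? 200 : 404;
}
```

With the soft-404 variant, a request for page 500 of a 42-page blog looks exactly like a request for page 1 — which is precisely what confused the crawler.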
The Result: An Infinite Loop
Because the crawler (in this case, Meta’s externalagent) received a success code every time it incremented the page number, it assumed there was more content to discover. It just kept going.
The Logs:
speculumx.at 57.141.20.40 - - [13/Mar/2026:13:27:01 +0000] "GET /blogpost/all?page=538 HTTP/1.1" 200 6463 "-" "meta-externalagent/1.1"
speculumx.at 57.141.20.50 - - [13/Mar/2026:13:27:19 +0000] "GET /blogpost/all?page=539 HTTP/1.1" 200 6463 "-" "meta-externalagent/1.1"
speculumx.at 57.141.20.49 - - [13/Mar/2026:13:27:39 +0000] "GET /blogpost/all?page=540 HTTP/1.1" 200 6463 "-" "meta-externalagent/1.1"
...
speculumx.at 57.141.20.28 - - [13/Mar/2026:13:28:50 +0000] "GET /blogpost/all?page=544 HTTP/1.1" 200 6463 "-" "meta-externalagent/1.1"
The crawler reached page 544 before I decided to "liberate" it by manually switching the logic to return a proper 404:
speculumx.at 57.141.20.8 - - [13/Mar/2026:13:29:09 +0000] "GET /blogpost/all?page=545 HTTP/1.1" 404 5234 "-" "meta-externalagent/1.1"
Turning a Bug into a Bot-Trapping Feature
This unintended behavior reveals a cool technique for tying up undesired scrapers. By feeding them "fake" success codes and suggesting further pagination in the HTML, you can keep a bot occupied indefinitely.
Create a hidden logic in your template like this:
<% for (let i = 1; i <= pagination.totalPages + 1; i++) { %>
  <li class="page-item">
    <a class="page-link" href="/posts?page=<%= i %>">Next</a>
  </li>
<% } %>
By always hinting that there is "one more page" than actually exists, and serving that page with a 200 status code, the bot enters a resource-draining cycle.
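On the server side, the trap only needs two ingredients: a 200 for every page number, and a link pointing one page further. A minimal sketch — `renderPosts` and the response shape are placeholders I made up for illustration:

```javascript
// Tarpit sketch: serve any page number with a 200 and always
// advertise one more page than actually exists.
function tarpitResponse(requestedPage, totalPages) {
  const body = requestedPage <= totalPages
    ? renderPosts(requestedPage)
    : '<p>No entries found</p>'; // the graceful empty state
  // The bait: a "Next" link exists even past the real last page.
  const nextLink = `<a href="/posts?page=${requestedPage + 1}">Next</a>`;
  return { status: 200, html: body + nextLink };
}

// Placeholder for the real template rendering.
function renderPosts(page) {
  return `<p>Posts for page ${page}</p>`;
}
```

A bot that follows "Next" links and treats 200 as "keep going" will walk this forever, exactly like the crawler in my logs.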
A Word of Caution
While "tarpitting" bots this way is satisfying, it’s a double-edged sword:
Crawl Budget: Legitimate search engines might penalize you for "Soft 404s" (200 OK on empty pages).
Server Load: Every request still hits your server. To truly punish a bot without hurting yourself, you should combine this with a time delay (deliberately slow responses) to maximize the bot's wait time while minimizing your CPU usage.
Content Variation: If every page has the exact same byte size (e.g., 6463 bytes in my logs), bots will eventually detect the pattern and stop. To prevent this, inject dynamic "noise"—randomly generated strings or changing UI elements—to ensure the checksum of each page is unique.
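The delay idea can be sketched like this — the bot-detection regex and helper names are my own assumptions, not a recommendation for production use:

```javascript
// Sketch: make suspected bots wait before getting their response.
// An async sleep keeps the event loop free, so the cost falls on
// the bot's connection time, not on your CPU.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function respondSlowly(userAgent, buildBody) {
  // Crude, assumed heuristic — real detection would be smarter.
  if (/meta-externalagent|crawler|bot/i.test(userAgent)) {
    await sleep(5000); // the bot holds a connection open for 5s
  }
  return buildBody();
}
```

In my logs the crawler already waited roughly 20 seconds between requests; adding your own delay on top stretches each cycle further at near-zero cost to you.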
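Injecting that noise can be as simple as appending a random hidden fragment, so both the byte size and the checksum vary per request. A sketch using Node's built-in crypto module (the hidden-span approach is just one option):

```javascript
// Sketch: vary each page's size and hash with random hidden noise.
const crypto = require('crypto');

function withNoise(html) {
  // Random length (16-62 hex chars) so the byte size also varies.
  const bytes = 8 + Math.floor(Math.random() * 24);
  const noise = crypto.randomBytes(bytes).toString('hex');
  // Invisible to humans, but changes the response's length and checksum.
  return html + `<span style="display:none">${noise}</span>`;
}
```

Had I done this, the suspicious constant 6463-byte responses in my logs would instead have jittered from request to request.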
I don't know if you'll find this useful or beneficial, but I always find these kinds of coincidences funny and interesting.