• colin@lemmy.uninsane.org
      English
      arrow-up
      14
      ·
      2 months ago

      from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.

      Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.
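
      for anyone who wants to check their own logs the same way, here's a minimal sketch (not what i actually ran). it assumes a combined-format access log, and the `BOT_TOKENS` list, the `LOG_LINE` regex, and the `count_violations` helper are all made up for illustration; swap in whatever user-agent strings show up in your own logs. it uses Python's standard `urllib.robotparser` to count requests from those bots to paths your robots.txt disallows.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# assumption: user-agent tokens to look for; adjust to match your own logs
BOT_TOKENS = ["ClaudeBot", "Amazonbot", "facebookexternalhit"]

# assumption: combined log format, i.e. a quoted request line near the start
# and the quoted user agent as the last field on the line
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" .* "(?P<agent>[^"]*)"$'
)


def count_violations(robots_txt, log_lines):
    """Count requests per bot token that hit a path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    violations = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # skip lines that don't look like the assumed format
        agent = match.group("agent").lower()
        path = match.group("path")
        for token in BOT_TOKENS:
            # robotparser matches on the bare token, not the full UA string
            if token.lower() in agent and not parser.can_fetch(token, path):
                violations[token] += 1
    return violations
```

      this only tells you who ignored the rules, not how hard they hit you; for the "dozen IPs at once" kind of crawl you'd want to group by timestamp and client address as well.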