Dear readers and customers,
Starting this month, we have added a new feature to all our web projects: we are blocking AI crawlers, or at least trying to.
There is no guarantee that this works, or that the crawlers will respect the rules we have added manually. Still, we are doing our best.
We do this for two main reasons:
1) You should be in control of your posts and thus your data. If your individual posts are reused, for example to train AI models, you should know beforehand. This is currently not the case with the way machine learning tools are trained: they simply use whatever they can find on the open web.
2) If your work helps individual companies make money in any way, you should get your share. Our idea of fair is 50/50: for every euro earned with your hard work, you should get at least 50 cents.
Here is the list of crawlers we are currently trying to block:
| Crawler | Purpose |
|---|---|
| AI2Bot | Explores sites for web content that is used to train open language models. |
| AmazonBot | Used by Amazon’s Alexa AI to provide AI answers. |
| AppleBot | Used by Apple for generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
| Bytespider | Used by TikTok for AI training. |
| Cohere | Used by Cohere to scrape data for AI training. |
| ChatGPT | Used by OpenAI to power ChatGPT. |
| ClaudeBot and Claude-Web | Used by Anthropic’s Claude. |
| CommonCrawl | Compiles datasets used to train AI models. |
| Diffbot | Used by Diffbot to scrape data for AI training. |
| FacebookBot | Used by Meta (Facebook) for their AI. |
| Friendly Crawler | Crawls websites to build datasets for machine learning experiments. |
| Google Extended | Used by Google to power Gemini (formerly known as Bard). |
| ImagesiftBot | Used by Hive’s Imagesift tool, which scrapes images. These may be used for the company’s generative AI product. |
| Kangaroo Bot | Used to power the Australia-focused Kangaroo LLM. |
| Meta-ExternalAgent / Meta-ExternalFetcher | Used by Meta (Facebook) to train AI products. |
| OAI-SearchBot | Used by OpenAI for their SearchGPT product. |
| Omgilibot | Used by Omgili to scrape data for AI training. |
| PerplexityBot | Used by Perplexity for their AI products. |
| Scrapy | The Scrapy bot, used for scraping websites. |
| SentiBot | Used by SentiOne for its AI-powered social media listening and analysis tools. |
| Timpibot | Used by Timpi, likely for their Wilson AI product. |
| Webzio | Used by Webz.io for their social listening and intelligence platforms. |
| Webzio-Extended | Used by Webz.io for AI training. |
| YouBot | Used by You.com to train AI products. |
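We have not described here exactly how our rules are implemented. One common mechanism is a robots.txt file, which well-behaved crawlers read before fetching pages. The following is a minimal sketch that assumes robots.txt is used. It builds one rule group for a subset of the crawlers listed above and then uses Python's standard `urllib.robotparser` to check which user agents would be refused. The user-agent tokens (for example `GPTBot` or `CCBot`) are our assumption of how these operators identify themselves, so please check each operator's documentation before reusing them. Keep in mind that robots.txt is only a request: a crawler that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of user-agent tokens for crawlers in the table above.
# The spellings are assumptions; verify them against each operator's documentation.
AI_CRAWLER_TOKENS = [
    "AI2Bot",
    "Amazonbot",
    "Applebot-Extended",
    "Bytespider",
    "CCBot",
    "ClaudeBot",
    "Google-Extended",
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
]


def build_robots_txt(tokens: list[str]) -> str:
    """Return robots.txt text that disallows every path for the given agents
    and allows everything for all other agents."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    lines.append("")  # a blank line ends the AI-crawler group
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/blog/post") -> bool:
    """Use the standard-library parser to decide whether user_agent is refused for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLER_TOKENS)
    print(robots)
    for agent in ["GPTBot", "ClaudeBot", "Mozilla"]:
        print(agent, "blocked" if is_blocked(robots, agent) else "allowed")
```

In this sketch, an agent that matches none of the listed tokens (here a generic "Mozilla" agent, or a regular search crawler such as Googlebot) falls through to the `User-agent: *` group and remains allowed.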
If you are already a customer (thank you!), we have activated the feature on your website automatically and for free. There is no additional cost, and there never will be.
If you would like to join us as a new customer, the feature will be added automatically when we set up your site. The product overview page is already up to date: https://aethyx.eu/eshop/
Sorry for the inconvenience. In an ideal world, none of this would be necessary. However, we are far from ideal at the moment. Let’s look to the future; things can only get better from here.
Enjoy the fall, and best wishes,
The aethyx staff