
No Robots

Dear readers and customers,

Starting this month, all our web projects have a new feature: we block all known AI crawlers. Or at least we try to.

There is no guarantee that it works or that the crawlers will respect the rules we added manually. Still, we did our best.

We do this for two main reasons:

1) You should be in control of your posts and thus your data. If your individual posts are going to be reused or altered, you should know beforehand. The way machine learning tools are currently trained does not guarantee this: they simply use whatever they can find on the open web.

2) If your work helps individual companies make money in any way, you should get your share. Our idea is to be fair: 50/50. For every euro earned with your hard work, you should get at least 50 cents.

Here is the list of crawlers we currently try to block:

AI2Bot

Explores sites for web content that is used to train open language models

AmazonBot

Used by Amazon’s Alexa AI to provide AI answers.

AppleBot

Used by Apple for generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.

Bytespider

Used by TikTok for AI training.

Cohere

Used by Cohere to scrape data for AI training.

ChatGPT

Used by OpenAI to power ChatGPT.

ClaudeBot and Claude-Web

Used by Anthropic’s Claude.

CommonCrawl

Compiles datasets used to train AI models.

Diffbot

Used by Diffbot to scrape data for AI training.

FacebookBot

Used by Meta (Facebook) for their AI.

Friendly Crawler

Crawls websites to build datasets for machine learning experiments.

Google Extended

Used by Google to power Gemini (formerly known as Bard).

ImagesiftBot

Used by Hive’s Imagesift tool that scrapes images. This may be used for the company’s generative AI product.

Kangaroo Bot

Used to power the Australia-focused Kangaroo LLM.

Meta-ExternalAgent / Meta-ExternalFetcher

Used by Meta (Facebook) to train AI products.

OAI-SearchBot

Used by OpenAI for their SearchGPT product.

Omgilibot

Used by Omgili to scrape data for AI training.

PerplexityBot

Used by Perplexity for their AI products.

Scrapy

The default bot of Scrapy, an open-source framework used for scraping websites.

SentiBot

Used by SentiOne for their AI-powered social media listening and analysis tools.

Timpibot

Used by Timpi, likely for their Wilson AI product.

Webzio

Used by Webz.io for their social listening and intelligence platforms.

Webzio-Extended

Used by Webz.io for AI training.

YouBot

Used by You.com to train AI products.

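For the technically curious, here is a minimal sketch of how a block list like this could be turned into rules. It assumes the rules live in a robots.txt file, which is only one possible approach and not necessarily how our setup works. The helper names (build_robots_txt, is_blocked), the example.com URL and the shortened crawler list are made up for illustration, and the user-agent tokens are assumed to match the names above, so check them against each operator's documentation. As said before, robots.txt is a polite request: the check below only shows how a well-behaved crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Shortened, illustrative subset of the list above. The tokens are assumed
# to match the names in this post; verify them with each crawler's operator.
AI_CRAWLERS = [
    "AI2Bot",
    "Bytespider",
    "ClaudeBot",
    "PerplexityBot",
    "YouBot",
]


def build_robots_txt(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Blank lines separate the per-crawler groups.
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, user_agent, url="https://example.com/some-post/"):
    """Check how a rule-respecting crawler would interpret the rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)
print("ClaudeBot blocked:", is_blocked(robots_txt, "ClaudeBot"))
print("Regular browser blocked:", is_blocked(robots_txt, "Mozilla/5.0"))
```

Crawlers that ignore robots.txt would need to be stopped on the server instead, for example by rejecting requests based on the User-Agent header.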

If you are already a customer (thank you!), we have activated it on your website automatically and for free. There is no additional cost, and there never will be.

If you want to join as a new happy customer, the feature is added automatically when we set up your site. The information is already up to date on the product overview page: https://aethyx.eu/eshop/.

Sorry for the inconvenience. In an ideal world this would never have been necessary. However, we are far from ideal at the moment. Let’s look to the future, as things can only get better from here.

Enjoy fall and best wishes,
the aethyx staff
