Question Details

No question body available.

Tags

javascript sitemap robots.txt

Answers (4)

November 2, 2025 Score: 1 Rep: 180,225 Quality: Low Completeness: 60%

It is flagged because it is not a valid robots.txt directive:

https://developers.google.com/search/docs/crawling-indexing/robots/robotstxt
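To illustrate why an audit might flag the line, here is a rough JavaScript sketch of a checker that reports any robots.txt line whose directive is not on an allowlist. The allowlist and the function name findUnknownDirectives are assumptions made for this example. They are not Lighthouse's actual list or code.

```javascript
// Hypothetical allowlist for illustration only; a real validator's list may differ.
const KNOWN_DIRECTIVES = new Set([
  'user-agent',
  'allow',
  'disallow',
  'sitemap',
  'crawl-delay',
]);

// Returns one entry per line whose directive name is not on the allowlist.
function findUnknownDirectives(robotsTxt) {
  const problems = [];
  robotsTxt.split(/\r?\n/).forEach((rawLine, index) => {
    // Everything after "#" is a comment, so it is dropped before checking.
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') return; // blank or comment-only line
    const colon = line.indexOf(':');
    const name = (colon === -1 ? line : line.slice(0, colon)).trim().toLowerCase();
    if (colon === -1 || !KNOWN_DIRECTIVES.has(name)) {
      problems.push({ line: index + 1, text: rawLine.trim() });
    }
  });
  return problems;
}

const sample = [
  'Content-signal: search=yes,ai-train=no',
  'User-agent: *',
  'Disallow:',
].join('\n');

console.log(findUnknownDirectives(sample));
```

With this assumed allowlist, only the Content-signal line is reported. The same line prefixed with # is skipped, because it is then a comment.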

If you want to keep it, comment it out:

# Content-signal: search=yes,ai-train=no

User-agent: *
Disallow:

or serve it as an HTTP header.
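The commenting-out suggestion above could be scripted roughly like this in JavaScript. It is a minimal sketch, and the function name commentOutContentSignals is made up for this example.

```javascript
// Prefixes every Content-signal line with "# " and leaves all other lines untouched.
// Lines that are already commented out start with "#" and therefore do not match.
function commentOutContentSignals(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => (/^\s*content-signal\s*:/i.test(line) ? `# ${line.trim()}` : line))
    .join('\n');
}

const original = [
  'Content-signal: search=yes,ai-train=no',
  'User-agent: *',
  'Disallow:',
].join('\n');

console.log(commentOutContentSignals(original));
```

Keep in mind that robots.txt parsers ignore comment lines, so once the line is commented out it is only a note for human readers and no crawler will act on it.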

December 28, 2025 Score: 1 Rep: 517 Quality: Low Completeness: 0%

Serving it as an HTTP header will do no good. The major AI crawlers look at robots.txt for rules, NOT at an HTTP header. See my full answer.

December 24, 2025 Score: 0 Rep: 678 Quality: Low Completeness: 40%

In my case I got it after I turned on the Cloudflare setting that disables AI crawlers on my site, but I don't know what the next step is.

which tells bots and crawlers how they should interact with your site.

Instruct AI bot traffic with robots.txt

December 28, 2025 Score: 0 Rep: 517 Quality: Low Completeness: 70%

@mplungjan suggests in his reply serving Content-signal as a header. That will do almost no good. The major AI companies have not said that they will read headers as rules and honor them, only the robots.txt file. See the Cloudflare article below.

This is the "best" solution for now. First, as most people know, AI crawlers, like search engines, do NOT have to respect your robots.txt file. No law requires them to. The major search engines do respect it, and the major AI companies have likewise agreed to respect robots.txt. Most robots.txt files are otherwise perfect. Having a Content-signal line in your robots.txt gives a 6 point reduction, bringing you down to 92, which is still in the green zone. It's a trade-off.

What should you do?

Cloudflare has a website dedicated to Content Signals: https://contentsignals.org/

On the home page, scroll down a bit to the large section titled "Get In Touch", which lists an email address for Content Signals. Email them (that is, Cloudflare) and ask them to work with Google on adding this directive to Lighthouse. The following is the email I sent:

I'd like to put in a request, if I may. I use Content Signals in my robots.txt. It's an amazing feature, and thank you for all your hard work on it. However, Google Lighthouse dings websites by 6 points on SEO when you run a report on a site that has a Content-signal: line in its robots.txt. As a single developer, my request to Lighthouse holds very little weight. You, being Cloudflare, have a lot more weight when it comes to Google. You're welcome to forward my request to Google. I wanted to ask if you could work with Google on requesting that they update their rules in Lighthouse to respect Content Signals?

Thank you!

On the official Lighthouse GitHub repo (https://github.com/GoogleChrome/lighthouse/) there is already a bug report for this. I just checked for updates, and it seems a fix has been written and is waiting for approval to be merged into the official code:
https://github.com/GoogleChrome/lighthouse/issues/16776

To learn more, Cloudflare has an article about Content Signals: https://blog.cloudflare.com/content-signals-policy/

  • Please excuse spelling and grammar mistakes. I have severe dyslexia, and at times I spell words backwards in ways that not all spell checkers can detect.