[#8132] Update actions and pages which set "noindex", "nofollow" crawler directives #8223

Open
gbp wants to merge 9 commits into develop from 8132-no-crawl-headers
@gbp gbp commented Apr 30, 2024

Relevant issue(s)

Fixes #8132

What does this do?

Update actions and pages which set "noindex", "nofollow" crawler directives

Why was this needed?

Snippets of request content often appear on list pages, and create a whack-a-mole situation when unhappy users find that external search engines have indexed a list page (e.g. /body/foo?page=12) that contains a cached snippet of PII that we've removed from the request page itself.

Implementation notes

@garethrees are you happy with changing the number of paginated pages which are indexed? I'm concerned this might impact search ranking due to newer request pages not being indexed at all.

@gbp gbp added this to the Reduce Admin Burden milestone Apr 30, 2024
@gbp gbp requested a review from garethrees April 30, 2024 08:22
@gbp gbp added the on-staging label Apr 30, 2024

@garethrees garethrees left a comment

Worth making the dependency on RobotsHeaders explicit in the concerns that use it:

  module ProminenceHeaders
    extend ActiveSupport::Concern
+    include RobotsHeaders

  module PublicTokenable
    extend ActiveSupport::Concern
+    include RobotsHeaders

https://api.rubyonrails.org/classes/ActiveSupport/Concern.html says you can do this (in the intro section of the documentation)
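
A minimal sketch of the effect of that suggestion, using plain Ruby modules rather than `ActiveSupport::Concern` so it runs without Rails. The method names (`robots_directives`, `prominence_header_value`) and `ExampleController` are hypothetical and not taken from the diff. With `ActiveSupport::Concern` the same `include` additionally lets the dependency's `included` block be resolved for the final including class, which this plain-module version does not show.

```ruby
# Stand-in for the shared concern; the method name is an assumption.
module RobotsHeaders
  def robots_directives
    %w[noindex nofollow]
  end
end

# Declaring the dependency here means callers only need to include
# ProminenceHeaders; RobotsHeaders comes along with it.
module ProminenceHeaders
  include RobotsHeaders

  def prominence_header_value
    robots_directives.join(', ')
  end
end

# Hypothetical controller that includes only the dependent module.
class ExampleController
  include ProminenceHeaders
end

puts ExampleController.new.prominence_header_value
puts ExampleController.ancestors.include?(RobotsHeaders)
```

The point is that `ExampleController` never mentions `RobotsHeaders`, yet the methods it relies on are guaranteed to be present.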

@garethrees

Reassigning as discussed to make a few tweaks.

gbp added 9 commits March 25, 2026 08:42
Extract out into a common concern included in `ApplicationController` so
this hook/helper is available to all controllers.
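
As a rough illustration of what such a shared helper might look like, the sketch below sets the standard `X-Robots-Tag` response header from a list of directives. The module name `RobotsHeadersSketch`, the method `set_no_crawl_headers`, and the fake controller are assumptions for illustration only; the PR's actual concern may be structured differently.

```ruby
# Hypothetical stand-in for the PR's shared concern; the method name and
# default directives are assumptions, not taken from the diff.
module RobotsHeadersSketch
  DEFAULT_DIRECTIVES = %w[noindex nofollow].freeze

  # Sets the X-Robots-Tag response header to the given crawler directives,
  # falling back to the defaults when none are passed.
  def set_no_crawl_headers(*directives)
    directives = DEFAULT_DIRECTIVES if directives.empty?
    headers['X-Robots-Tag'] = directives.join(', ')
  end
end

# Minimal fake controller exposing a response-header hash.
class FakeApplicationController
  include RobotsHeadersSketch

  attr_reader :headers

  def initialize
    @headers = {}
  end
end

controller = FakeApplicationController.new
controller.set_no_crawl_headers
puts controller.headers['X-Robots-Tag']
```

In a Rails app a method like this would typically be wired up with a `before_action` in the controllers that need it.
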
Don't allow indexing of:
- New citation page
- New request page
- Similar requests page
- Request details page
- User profile wall
- User profile if suspended
Even though we're setting the response header, ensure we set a consistent
value in the meta tag.
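
One way to keep the two in step is to derive both the header value and the meta tag from a single directive list. This is only a sketch: `robots_content` and `robots_meta_tag` are hypothetical helper names, not the ones used in the PR.

```ruby
require 'erb'

# Hypothetical helper: the single source of truth for the directive string.
def robots_content(directives)
  directives.join(', ')
end

# Hypothetical helper: builds the meta tag from the same string that would be
# sent in the X-Robots-Tag header, so the two cannot drift apart.
def robots_meta_tag(directives)
  content = ERB::Util.html_escape(robots_content(directives))
  %(<meta name="robots" content="#{content}">)
end

directives = %w[noindex nofollow]
puts robots_content(directives)   # value for the response header
puts robots_meta_tag(directives)  # matching tag for the page <head>
```
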
Pages after the first shouldn't be crawled; this will help with site
performance.
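
The pagination rule can be sketched as a small predicate. The name `crawlable_page?` and the handling of missing or non-numeric values are assumptions for illustration; the PR may decide this differently.

```ruby
# Hypothetical predicate: only the first page of a paginated list is crawlable.
# `page_param` mimics a raw query-string value such as "12" from ?page=12.
# A missing parameter counts as page 1; non-numeric strings parse to 0 and
# are likewise treated as the first page.
def crawlable_page?(page_param)
  page = page_param.to_s.empty? ? 1 : page_param.to_i
  page <= 1
end

[nil, '1', '2', '12'].each do |value|
  puts "#{value.inspect}: #{crawlable_page?(value)}"
end
```
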
These actions either require a user to be logged in or link to actions
which we don't allow to be indexed, so there is no reason for search
indexers to follow them.
This attribute is supported by Google and will prevent snippets from
being rendered in Google search results.

This will mean any cached PII is easier to remove without needing
to issue search engine removal requests.
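
The commit message does not name the attribute. Assuming it refers to Google's `data-nosnippet` HTML attribute (which Google documents for `span`, `div` and `section` elements), wrapping request text might look roughly like the sketch below. The helper name `nosnippet_span` is hypothetical.

```ruby
require 'erb'

# Hypothetical helper; assumes the attribute in question is `data-nosnippet`.
# The text is HTML-escaped before being wrapped so user content stays inert.
def nosnippet_span(text)
  %(<span data-nosnippet>#{ERB::Util.html_escape(text)}</span>)
end

puts nosnippet_span('Request about <records>')
```
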
@gbp gbp force-pushed the 8132-no-crawl-headers branch from a55f6c2 to f228d51 Compare March 25, 2026 09:43
@gbp gbp marked this pull request as ready for review March 25, 2026 12:02
@gbp gbp requested a review from garethrees March 25, 2026 12:02

gbp commented Mar 25, 2026

@garethrees This looks good to go to me. I've rebased over the current develop branch so I think we can review this and get it merged if we still want this.

@gbp gbp removed their assignment Mar 25, 2026

Development

Successfully merging this pull request may close these issues.

Reduce external search indexing of request list pages

2 participants