When Green Lies: Operational Blind Spots in the Ceph Dashboard #1004

Open
Asif-rehman012 wants to merge 1 commit into ceph:main from Asif-rehman012:main

@Asif-rehman012

This blog post explores operational blind spots in the Ceph Dashboard,
focusing on active releases in 2026 (Squid and Tentacle).

It explains why HEALTH_OK and green UI indicators can mask:

  • localized failures
  • upgrade-time inconsistencies
  • scheduler throttling
  • security context issues

The post emphasizes when operators must rely on CLI tools,
logs, and metrics instead of dashboard summaries.
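
As one illustration of the CLI-side checks mentioned above, here is a minimal sketch for spotting upgrade-time inconsistencies. It assumes the JSON printed by `ceph versions` maps each daemon type to a dictionary of version strings and daemon counts, with an extra `overall` key. The function name `mixed_version_daemons` and the sample data are hypothetical and are not taken from the blog post.

```python
import json


def mixed_version_daemons(versions: dict) -> dict:
    """Return daemon types that report more than one running version.

    `versions` is assumed to be the parsed JSON output of `ceph versions`.
    """
    mixed = {}
    for daemon_type, counts in versions.items():
        # "overall" aggregates every daemon type, so skip it here.
        if daemon_type == "overall":
            continue
        if isinstance(counts, dict) and len(counts) > 1:
            mixed[daemon_type] = sorted(counts)
    return mixed


# Made-up sample data for illustration only; real version strings differ.
sample = json.loads("""
{
  "mon": {"ceph version 20.x tentacle (sample)": 3},
  "mgr": {"ceph version 20.x tentacle (sample)": 2},
  "osd": {
    "ceph version 19.x squid (sample)": 4,
    "ceph version 20.x tentacle (sample)": 8
  },
  "overall": {
    "ceph version 19.x squid (sample)": 4,
    "ceph version 20.x tentacle (sample)": 13
  }
}
""")

print(mixed_version_daemons(sample))
```

On a live cluster the input would come from the `ceph versions` command rather than a literal string. An empty result only means every daemon type reports a single version; it says nothing about the other blind spots listed above.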

@ceph-jenkins

Can one of the admins verify this patch?

@Asif-rehman012
Author

Asif-rehman012 commented Feb 3, 2026

please review @dang @andrewschoen @Thingee

@ideepika requested review from afreen23 and nizamial09 on March 30, 2026, 14:45

@anthonyeleven
Contributor

@Asif-rehman012 Feel free to tag me for review too.

---
title: "When Green Lies: Operational Blind Spots in the Ceph Dashboard"
date: "2026-02-10"
author: "Aasaf Rehman"
Contributor

This is a different spelling compared to your GitHub handle, is that appropriate?

Contributor

@anthonyeleven left a comment

LGTM. My usual nits. Take a look at my suggestions then we're good to merge.


## Introduction

The Ceph Dashboard has become the default operational interface for most modern
Contributor

I would take out "most".


The Ceph Dashboard has become the default operational interface for most modern
Ceph clusters. With cephadm-managed deployments, OAuth 2.0 authentication,
integrated Prometheus and Grafana access, and multi-cluster visibility, the
Contributor

s/access/observability/


![Ceph Dashboard Overview](images/dashboard-landing-page.webp)

*The Ceph Dashboard presenting a healthy cluster view — a familiar starting point for many production incidents.*
Contributor

: instead of - please


This blog explains why a green dashboard does **not** guarantee a healthy or
operationally safe Ceph cluster. Focusing on **active releases in 2026 —
Squid (19.x) and Tentacle (20.x)** — we examine persistent dashboard blind spots,
Contributor

Focusing on Squid and Tentacle


## The Role of the Ceph Dashboard

The Ceph Dashboard is implemented as a `ceph-mgr` module. It does not generate
Contributor

Ceph Manager

- Old MGR assumptions
- Mixed daemon versions

In this phase, the dashboard may:
Contributor

Ceph Dashboard


## Blind Spot #4: Monitoring Without Diagnosis

With Tentacle, **Prometheus, Grafana, and Alertmanager** are fronted by the
Contributor

Suggest taking out the double asterisks.


## Blind Spot #5: Scheduler and Throttling Behavior

Ceph’s **mClock scheduler** dynamically balances resources between:
Contributor

ditto


## Conclusion

The Ceph Dashboard in Tentacle is powerful, polished, and indispensable.
Contributor

I'd s/in Tentacle//


The Ceph Dashboard in Tentacle is powerful, polished, and indispensable.

But no UI — regardless of sophistication — can fully represent the complexity
Contributor

I'd s/-/,/
