
module: add listener_max_connection_lifetime_ms option#5583

Open
dethi wants to merge 1 commit into emissary-ingress:master from aristanetworks:listener-max-conn-lifetime

Conversation

@dethi
Contributor

@dethi dethi commented Feb 26, 2024

Description

Add listener_max_connection_lifetime_ms, similar to how listener_idle_timeout_ms is implemented. This limits the maximum age of downstream connections by adding max_connection_duration to the listener's common_http_protocol_options.

This option could already be set on the clusters (upstream connections), but setting it on downstream connections could be useful in multiple scenarios:

  • to periodically rebalance load across multiple emissary-ingress instances, especially when most clients use HTTP/2/gRPC and connections stick around for a very long time.

  • when using mTLS (client certificates), to ensure that clients reconnect with their renewed certificates, as Envoy doesn't close the connection when the certificate expires.
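The mapping described above can be sketched as a small helper, mirroring how listener_idle_timeout_ms is handled. The function name and dict shapes here are illustrative assumptions, not the actual emissary-ingress source:

```python
# Illustrative sketch (not the real emissary-ingress code): build the
# listener's common_http_protocol_options from Module config, converting
# millisecond options into Envoy Duration strings (expressed in seconds).

def common_http_protocol_options(amod_config: dict) -> dict:
    opts = {}

    idle_ms = amod_config.get("listener_idle_timeout_ms")
    if idle_ms:
        opts["idle_timeout"] = "%0.3fs" % (idle_ms / 1000.0)

    # The new option from this PR: cap the total lifetime of a
    # downstream connection, not just its idle time.
    lifetime_ms = amod_config.get("listener_max_connection_lifetime_ms")
    if lifetime_ms:
        opts["max_connection_duration"] = "%0.3fs" % (lifetime_ms / 1000.0)

    return opts
```

When max_connection_duration elapses, Envoy drains the connection gracefully (sending GOAWAY on HTTP/2), which is what makes the rebalancing and certificate-renewal scenarios above work.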

Related Issues

Fix #2900

Testing

  • Added automated tests
  • Not yet deployed in a real cluster

Checklist

  • Does my change need to be backported to a previous release?

    • No, this is a new feature.
  • I made sure to update CHANGELOG.md.

    Remember, the CHANGELOG needs to mention:

    • Any new features
    • Any changes to our included version of Envoy
    • Any non-backward-compatible changes
    • Any deprecations
  • This is unlikely to impact how Ambassador performs at scale.

    Remember, things that might have an impact at scale include:

    • Any significant changes in memory use that might require adjusting the memory limits
    • Any significant changes in CPU use that might require adjusting the CPU limits
    • Anything that might change how many replicas users should use
    • Changes that impact data-plane latency/scalability
  • My change is adequately tested.

    Remember when considering testing:

    • Your change needs to be specifically covered by tests.
      • Tests need to cover all the states where your change is relevant: for example, if you add a behavior that can be enabled or disabled, you'll need tests that cover the enabled case and tests that cover the disabled case. It's not sufficient just to test with the behavior enabled.
    • You also need to make sure that the entire area being changed has adequate test coverage.
      • If existing tests don't actually cover the entire area being changed, add tests.
      • This applies even for aspects of the area that you're not changing – check the test coverage, and improve it if needed!
    • We should lean on the bulk of code being covered by unit tests, but...
    • ... an end-to-end test should cover the integration points
  • I updated DEVELOPING.md with any special dev tricks I had to use to work on this code efficiently.

  • The changes in this PR have been reviewed for security concerns and adherence to security best practices.

@dethi dethi force-pushed the listener-max-conn-lifetime branch from d102148 to fe369e6 Compare February 26, 2024 12:26
@dethi
Contributor Author

dethi commented Feb 26, 2024

Two questions:

  • any wording recommendation for the CHANGELOG?
  • how can I update the doc?

@rgs1

rgs1 commented Oct 8, 2024

@dethi any chance you could rebase this? It might be nice to support this on upstream connections too, since the same type of issues apply there as well (e.g.: stickiness, ensuring mTLS still works, etc). Thanks!

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 8, 2024
@rgs1

rgs1 commented Oct 9, 2024

@kflynn any chance we could get this one reviewed/merged? It solves a couple of issues we've seen in prod...

@dethi
Contributor Author

dethi commented Oct 9, 2024

The upstream option already exists today.

I can try to rebase, but I think I was also missing something in this change for the CRD side of things. My testing kind of stopped when there wasn't much maintainer activity on the PR.

But I can try to get back to it. It would be nice to have the CI tests running to see what's missing.

@rgs1

rgs1 commented Oct 9, 2024

The upstream option already exists today.

Oh nice, missed that (https://github.com/emissary-ingress/emissary/blob/master/python/ambassador/envoy/v3/v3cluster.py#L122 for others wondering where it is).
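For comparison, the cluster-side handling linked above applies the same milliseconds-to-Envoy-Duration conversion on upstream connections. The sketch below mirrors it in spirit only; the helper name is an assumption, not the actual v3cluster.py code:

```python
# Illustrative sketch of the existing upstream (cluster-side) option:
# the same conversion as the listener case, applied per cluster.

def cluster_max_connection_lifetime(amod_config: dict) -> dict:
    opts = {}
    lifetime_ms = amod_config.get("cluster_max_connection_lifetime_ms")
    if lifetime_ms:
        # Envoy Durations are expressed in seconds.
        opts["max_connection_duration"] = "%0.3fs" % (lifetime_ms / 1000.0)
    return opts
```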

I can try to rebase, but I think I was also missing something in this change for the CRD side of things. My testing kind of stopped when there wasn't much maintainer activity on the PR.

Yeah, maybe the commit that added upstream support is a good reference for what's needed on the CRD side:

69bdabf

But I can try to get back to it. It would be nice to have the CI tests running to see what's missing.

cc: @kflynn or @cindymullins-dw to kick CI 🙏

Add listener_max_connection_lifetime_ms, similar to how
listener_idle_timeout_ms is implemented. This limits the maximum age of
downstream connections by adding max_connection_duration to the
listener's common_http_protocol_options.

This option could already be set on the clusters (upstream connections),
but setting it on downstream connections could be useful in multiple
scenarios:

- to periodically rebalance load across multiple emissary-ingress
  instances, especially when most clients use HTTP/2/gRPC and
  connections stick around for a very long time.

- when using mTLS (client certificates), to ensure that clients
  reconnect with their renewed certificates, as Envoy doesn't close the
  connection when the certificate expires.

Fix emissary-ingress#2900

Signed-off-by: Thibault Deutsch <thibault@arista.com>
@dethi dethi force-pushed the listener-max-conn-lifetime branch from fe369e6 to e69102e Compare October 12, 2024 10:59
@dethi
Contributor Author

dethi commented Oct 12, 2024

Rebased the change on master. Thanks for the example implementation. I took some time to dig into my change again, and now I remember what confused me last time: listener_idle_timeout_ms, which I was using as inspiration (since it's in the same Envoy config section and also applies to listeners), doesn't show up at all in the CRD YAML. It's only present in the Python code.

Now, with the commit you shared, I took a look at the Go section, and from what I can see, the Module CRD accepts a config key that is completely untyped, which is why I couldn't find it last time. So based on this, I don't think I need to change anything else.
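Since the Module's config blob is untyped at the CRD level, the Python side has to validate whatever it finds. A minimal sketch of that defensive lookup (the helper name is hypothetical, not from the emissary codebase):

```python
from typing import Optional

def lookup_timeout_ms(config: dict, key: str) -> Optional[int]:
    """Fetch a millisecond timeout from the untyped Module config blob.

    Because the CRD doesn't type-check the `config` key, values may be
    missing or the wrong type; accept only positive integers.
    """
    value = config.get(key)
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(
            f"{key} must be a positive integer (milliseconds), got {value!r}"
        )
    return value
```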

I will get it running on one of our dev clusters to validate the config, and hopefully we can get the CI tests to run as well.


Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upper bound downstream connection lifetime in ambassador

2 participants