Add failpoint for nospace on puts #16018

tjungblu · 2023-06-06T09:30:01Z

This CR introduces a new failpoint that triggers a member to report no space.

This might need some more legwork; it's the first time I'm adding a failpoint here.

tjungblu · 2023-06-06T09:31:05Z

tests/robustness/failpoints.go

+ member := clus.Procs[rand.Int()%len(clus.Procs)]
+ for member.IsRunning() {
+ lg.Info("Setting up gofailpoint", zap.String("failpoint", f.Name()))
+ err := member.Failpoints().Setup(ctx, f.Name(), "return")


Over time, this might render the cluster entirely unusable. What's the best way to turn this off for a given member? After x minutes? Only if there's still quorum?

A robustness test iteration (create cluster, run traffic, inject failpoint, delete cluster) usually takes 5s, at most a minute.

Seems I was out of the loop for too long. Are the longer nightly linearizability tests not a thing anymore?

They are a thing, but we run 100 iterations of 5s each.

tjungblu · 2023-06-06T09:31:36Z

tests/robustness/failpoints.go

+ if err != nil {
+ panic(err)
+ }
+ if v.LessThan(version.V3_6) {


Is this sensible? I believe this should be easy to backport to 3.5 and 3.4, however.

This would not be needed if you reused the code from goPanicFailpoint that checks the list of failpoints exposed by gofail.

added and reused, thanks
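The idea of checking which failpoints a binary exposes, instead of comparing versions, can be sketched as below. This is not the `goPanicFailpoint` code; it assumes the listing arrives as one `name=term` pair per line, and the failpoint names shown are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseFailpointList parses a listing with one "name=term" pair per line.
// That shape is an assumption about the listing returned by the member.
func parseFailpointList(body string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// An inactive failpoint is listed with an empty term.
		name, term, _ := strings.Cut(line, "=")
		out[name] = term
	}
	return out
}

// failpointAvailable reports whether the listing contains name, which
// makes an explicit version comparison unnecessary.
func failpointAvailable(listing map[string]string, name string) bool {
	_, ok := listing[name]
	return ok
}

func main() {
	// Hypothetical listing; these names are illustrative only.
	listing := parseFailpointList("exampleBeforeCommit=\nexampleNoSpace=return\n")
	fmt.Println(failpointAvailable(listing, "exampleNoSpace"))
	fmt.Println(failpointAvailable(listing, "exampleMissing"))
}
```

With this approach a test can skip itself when the failpoint is absent, so the same test code would work against older release binaries that were built without it.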

serathius · 2023-06-06T09:59:05Z

tests/robustness/failpoints.go

+ lg.Info("Setting up gofailpoint", zap.String("failpoint", f.Name()))
+ err := member.Failpoints().Setup(ctx, f.Name(), "return")
+ if err != nil {
+ lg.Info("goFailpoint setup failed", zap.String("failpoint", f.Name()), zap.Error(err))


We need to assert that the failpoint was executed at least once. Please follow #14729 for how to do that.

done, it's much easier than sleep testing

serathius · 2023-06-06T10:02:00Z

Please give me more context on why you want to introduce this failpoint. I want to make sure that we not only code it, but also address the overall issues with no space alarms.

tjungblu · 2023-06-07T12:38:16Z

> Please give me more context on why you want to introduce this failpoint. I want to make sure that we not only code it, but also address the overall issues with no space alarms.

I was briefly seeing a panic after playing around with the authbackend implementation paths, which led me to run the linearizability test for longer while dropping out a node at random with this alarm. I'll add a more specific test once I've figured out how to properly repro this...

(setting to draft in the meantime)

Adding a new flag to retain e2e etcd process logs after stop and saving next to the visualized model. Spun out of etcd-io#16018 where I used it for easier local debugging on model violations. Signed-off-by: Thomas Jungblut <[email protected]>

Adding a new flag to retain e2e etcd process logs after stop and saving next to the visualized model. Spun out of etcd-io#16018 where I used it for easier local debugging on model violations. Fixes etcd-io#15079 partially. Signed-off-by: Thomas Jungblut <[email protected]>

Adding a set of functions which retain e2e etcd process logs after stop and saving next to the visualized model during robustness tests. Spun out of etcd-io#16018 where I used it for easier local debugging on model violations. Fixes etcd-io#15079 partially. Signed-off-by: Thomas Jungblut <[email protected]>

stale · 2023-09-17T01:16:33Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

jmhbnz · 2024-01-18T19:08:13Z

Discussed during sig-etcd triage meeting. @tjungblu do you have capacity to resolve conflicts and finish this off?

tjungblu · 2024-01-19T13:16:19Z

Sure, do you guys want to keep the failpoint? I can remove the remainder.

tjungblu · 2024-01-19T14:06:27Z

rebased, updated and removed the remainder of unrelated changes

This CR introduces a new failpoint that will trigger a member to report no space. Signed-off-by: Thomas Jungblut <[email protected]>

k8s-ci-robot · 2024-08-05T22:50:40Z

@tjungblu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-unit-test-amd64 | `2213f06` | link | true | `/test pull-etcd-unit-test-amd64` |
| pull-etcd-unit-test-arm64 | `2213f06` | link | true | `/test pull-etcd-unit-test-arm64` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

tjungblu marked this pull request as draft June 7, 2023 12:38

tjungblu mentioned this pull request Jun 14, 2023

Retain process logs on robustness tests #16077

Open

stale bot added the stale label Sep 17, 2023

tjungblu mentioned this pull request Sep 28, 2023

etcd client can report error after a successful write #16659

Closed

4 tasks

stale bot removed the stale label Jan 16, 2024

tjungblu force-pushed the failpoint_nospace branch from a030901 to d23cc9a Compare January 19, 2024 14:05

tjungblu marked this pull request as ready for review January 19, 2024 14:07

Add failpoint for nospace on puts

2213f06

This CR introduces a new failpoint that will trigger a member to report no space. Signed-off-by: Thomas Jungblut <[email protected]>

tjungblu force-pushed the failpoint_nospace branch from d23cc9a to 2213f06 Compare January 19, 2024 14:17

k8s-ci-robot added the needs-rebase label Jun 10, 2024
