Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: StoreV2 deprecation plan #12913

Open
17 of 21 tasks
ptabor opened this issue Apr 30, 2021 · 31 comments
Open
17 of 21 tasks

RFC: StoreV2 deprecation plan #12913

ptabor opened this issue Apr 30, 2021 · 31 comments
Labels
help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. stage/tracked
Milestone

Comments

@ptabor
Copy link
Contributor

ptabor commented Apr 30, 2021

[The content of this post is being edited, and is not yet approved plan of record ]

Background

In 3.4 storeV2 is still extensively used:

  • User can opt in --enable-v2 to unable V2 API that writes solely data to storeV2 files.

  • Membership information is both stored in V2 & V3 (backend) stores

  • Membership information is being read (recovered) from storeV2

  • Publishing membership happens through StoreV2 raft operation.

    • The new ClusterMemberAttrSet applier was implemented in etcd 3.5 so cannot be used by default in 3.5: to allow 3.4->3.5 rollback.
  • During startup etcd assumes each WAL log snapshot is accompanied by storev2 snapshot.

  • Note: etcdctl snapshot restore is not restoring V2 content (producing fake storeV2 with membership information)

Plan:

3.5 release

Both V2 and V3 state are supported, but V3 becomes the source of truth. TODO:

  • --experimental-enable-v2v3= promoted to --enable-v2v3=... or deprecated. [see https://github.com/3.5 decision making for alpha/beta/experimental feature stability #12905]
  • --enable-v2v3=... should allow mounting as root ?
  • --enable-v2 prints deprecation warning [optional: when not combined with --enable-v2v3=... ]
  • ClusterMemberAttrSet is implemented.
  • Membership is only read from backend v2 (No-storeV2: Read membership information from the backend (Part5) #12914)
  • --experimental-no-v2=PHASE_1_WRITE_ONLY mode that:
    • cannot be used when --enable-v2 is ON
    • if the storev2 snapshot is found accompanying the (used to be used) WAL log snapshot,
      its validated whether it has no used-data (apart of membership & version)
    • storev2 is content created on demand using 'etcdctl snapshot restore' logic based on the last WAL snapshot

3.6 release (updated by serathius@)

V3 is the only meaningful state. V2 state is generated from V3 state to maintained backward compability. TODO:

3.7 release (updated by serathius@)

V2 code is totally removed. TODO:

  • Discovery v2 is removed
  • All V2 store code is removed

Docs

@xiang90
Copy link
Contributor

xiang90 commented May 5, 2021

@ptabor

Thanks for helping with this! (I wanted to have a clean separation between v2 and v3 a couple of years ago but never got time actually to do it).

I am not sure if we are still running the failure injection tests. I would suggest running them to ensure the correctness of this change. Also, we might want more tests on the membership changes and the upgrade path.

/cc @gyuho

@ptabor
Copy link
Contributor Author

ptabor commented May 5, 2021

@xiang90
The functional tests (https://travis-ci.com/github/etcd-io/etcd/jobs/502908317) looks to me to be a failure injection tests in practice.
Do you mean these tests, or there are other tests ?

@xiang90
Copy link
Contributor

xiang90 commented May 5, 2021

@ptabor

The functional test randomly injects failures into the running servers. Due to the randomness, it might not be super effective to catch issues if it only runs one round in the CI.

What we did previously was to run this test 24*7 for weeks on a set of servers before major releases. I am not sure if it is still the practice since we have not really introduced significant changes to etcd in the recent releases I believe.

@gyuho
Copy link
Contributor

gyuho commented May 8, 2021

@ptabor This looks great.

Regarding

--experimental-enable-v2v3= promoted to --enable-v2v3=... or deprecated

as we discussed, let's mark it as deprecated for 3.5 and remove for 3.6.

Then, there's no need for

--enable-v2 is docomissioned or FAILs if not accompanied by --enable-v2v3=...

?

@gyuho
Copy link
Contributor

gyuho commented May 8, 2021

I am not sure if we are still running the failure injection tests.

We do run functional tests with failure injection in the CI but haven't implemented member add/remove

ref.

- SIGTERM_ONE_FOLLOWER
- SIGTERM_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT
- SIGTERM_LEADER
- SIGTERM_LEADER_UNTIL_TRIGGER_SNAPSHOT
- SIGTERM_QUORUM
- SIGTERM_ALL
- SIGQUIT_AND_REMOVE_ONE_FOLLOWER
- SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT
- BLACKHOLE_PEER_PORT_TX_RX_LEADER
- BLACKHOLE_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT
- BLACKHOLE_PEER_PORT_TX_RX_QUORUM
- BLACKHOLE_PEER_PORT_TX_RX_ALL
- DELAY_PEER_PORT_TX_RX_LEADER
- RANDOM_DELAY_PEER_PORT_TX_RX_LEADER
- DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT
- RANDOM_DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT
- DELAY_PEER_PORT_TX_RX_QUORUM
- RANDOM_DELAY_PEER_PORT_TX_RX_QUORUM
- DELAY_PEER_PORT_TX_RX_ALL
- RANDOM_DELAY_PEER_PORT_TX_RX_ALL
- NO_FAIL_WITH_STRESS
- NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS

ptabor added a commit to ptabor/etcd that referenced this issue May 10, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
ptabor added a commit to ptabor/etcd that referenced this issue May 10, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
ptabor added a commit to ptabor/etcd that referenced this issue May 10, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
ptabor added a commit to ptabor/etcd that referenced this issue May 10, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
ptabor added a commit to ptabor/etcd that referenced this issue May 10, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
@ptabor
Copy link
Contributor Author

ptabor commented May 14, 2021

I hope/think mandatory steps for v3.5 are done.

tangcong pushed a commit to tangcong/etcd that referenced this issue May 15, 2021
Flags `--experimental-enable-v2v3` and '-enable-v2' will raise a warning in 3.5,
in 3.6 they are schedule for decomissioning, such that v2store can stop be written in 3.7.

Deprecation plan in: etcd-io#12913
@stale
Copy link

stale bot commented Aug 12, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@geetasg
Copy link

geetasg commented Jun 13, 2023

@geetasg
Copy link

geetasg commented Jun 15, 2023

The PRs will be roughly in following order -

@ahrtr
Copy link
Member

ahrtr commented Jul 24, 2023

Related doc (for 3.6) - https://docs.google.com/document/d/19_7xBwZBNrM-75h5rr2PUWFM_eNjSXRVPZZpVaTS2qw/edit?usp=sharing

Nice summary. @geetasg

The PRs will be roughly in following order -

The detailed plan seems not in sync with the summary above, and isn't clear to me either.

I think roughly what we need to do in 3.6 are (on top of both @serathius 's summary and your summary in the above doc) :

  • V2 snapshots are generated from V3 data
    I think this is the most important left task in 3.6. Please provide a detailed plan and steps or draft PR. Let me know if you need assistance.
  • V2 apply code is removed
    Just as you mentioned in the doc, we can NOT directly remove V2 apply code in 3.6 because we still need to handle 3.5's publish request in mix-version scenario. But we can simplify the V2 apply code.
  • V2 no longer stores membership data
    To be clearer, we still need to store the membership data in the snapshot files (.snap), we just do not need the data in v2store in memory, because the data will come from the bbolt db.
  • Remove the etcdutl backup?
    It has some code related to v2store. I don't understand why do we need such command. It has already been deprecated in 3.5 and users are recommended to use etcdctl snapshot save command. So can we just remove it in 3.6? At the latest, we should remove it in 3.7. WDYT? @ptabor @serathius

@geetasg
Copy link

geetasg commented Jul 24, 2023

@ahrtr Thank you for the review.
Plan for "V2 snapshots are generated from V3 data" is listed below -
1.Provide a method to export membership in v2 format - Ref:#16132
2.Consume the method implemented in above step from triggerSnapshot workflow as in poc here - https://github.com/geetasg/etcd/blob/poc1/server/etcdserver/server.go#L2055
3.Similarly consume the same method from etcdutl backup as shown here - https://github.com/geetasg/etcd/blob/poc1/etcdutl/etcdutl/backup_command.go#L184
4.Consume the method from etcdutl snapshot
Ref:https://github.com/geetasg/etcd/blob/poc1/etcdutl/snapshot/v3_snapshot.go#L461
5.Consume the method from workflow to created merged snapshot Ref:https://github.com/geetasg/etcd/blob/poc1/server/etcdserver/snapshot_merge.go#L34

This poc is at branch poc1 in my repo.
Please let me know if you have questions/suggestions on this plan. I am currently at step 1 - PR is published for review.
/cc @serathius @ptabor

@geetasg
Copy link

geetasg commented Jul 24, 2023

Also listing the plan for other items -
V2 no longer stores membership data
1.Fix conf change entry validation - https://github.com/geetasg/etcd/tree/pr3
2.Remove v2store from RaftCluster - ref:8dcc34d - some of the earlier commits leading upto this removal will count too.
3.Remove v2store from bootstrap - Ref: 930ecd2
and 9b0dd97

Currently this work is on #1 - PR is being discussed..

@geetasg
Copy link

geetasg commented Jul 24, 2023

Plan for V2 apply code is removed -
1.This will involve introducing a new applier which can apply v2 req to v3 backend. Ref:733d109
2.Update tests - there are unit tests in server_test.go that issue v2 requests. Ref: TestDoLocalAction
3.Remove v2 applier - 9c02d76

v2v3 applier can be removed once the 3.5 publish node request is n/a.

@geetasg
Copy link

geetasg commented Jul 24, 2023

Remove the etcdutl backup?
makes sense to me. It can be removed if there are no objections.

@geetasg
Copy link

geetasg commented Jul 24, 2023

@ahrtr @serathius @ptabor please let me know if you have any questions/comments on the plan listed above. These changes are poced in #16000. Thanks

@ahrtr
Copy link
Member

ahrtr commented Jul 25, 2023

Overall looks good to me, let's do it step by step. StoreV2 deprecation is the top priority for etcd 3.6.0 per #16279

@ahrtr
Copy link
Member

ahrtr commented Aug 30, 2023

@geetasg Could you summarize all remaining tasks in one comment using a hierarchical structure to clearly show what has been completed and what has not? thx

@geetasg
Copy link

geetasg commented Sep 10, 2023

@ahrtr updated #12913 (comment) above to indicate latest status.

In terms of high level tasks as listed in Plan section above for 3.6 release -

  • V2 snapshots are generated from V3 data -- done
  • V2 apply code is removed
  • V2 no longer stores membership data -- WIP
  • Cleanup V2 code to leave only code for generating snapshots.

starlingx-github pushed a commit to starlingx/stx-puppet that referenced this issue Jan 10, 2024
Starting 3.4, default API for etcdctl commands is v3.
But v2 is still supported and extensively used in 3.4.
(source: etcd-io/etcd#12913)

Although it is a better idea to use v3 at as much places in
the code as possible, in case of cluster health API, v2 API gives
more clearer error about bad certificate usage than v3.
So it is better to stick to v2 than v3 at least in this case.

An environment variable ETCDCTL_API is used to specify
the usage of non-default version. A prefix for v2 API usage is added
to the only occurence of 'etcdctl' in the /etc/init.d/etcd script.

Test Plan:
PASS: Run '/etc/init.d/etcd status' with correct pair of certs.
      Etcd status is 'Running'.
PASS: Run '/etc/init.d/etcd status' with incorrect pair of certs.
      Etcd status is 'Running'.

Story: 2010878
Task: 48960

Change-Id: Idaaecfeec2c4851b4e33c21839df12cea5a65c2e
Signed-off-by: Kaustubh Dhokte <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. stage/tracked
Development

No branches or pull requests

7 participants