Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CRI metrics collection #7796

Merged
merged 12 commits into from
May 3, 2024

Conversation

Mo-Fatah
Copy link
Contributor

@Mo-Fatah Mo-Fatah commented Feb 19, 2024

What type of PR is this?

/kind api-change
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Related to https://issues.redhat.com/browse/OCPNODE-1008

Special notes for your reviewer:

This is a continuation of Sohan's effort in #7256 based on the recent cgroup manager changes in #7658 .

Does this PR introduce a user-facing change?

Implement CRI container and sandbox metrics collection 

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 19, 2024
Copy link
Contributor

openshift-ci bot commented Feb 19, 2024

Hi @Mo-Fatah. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

codecov bot commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 35.48387% with 40 lines in your changes are missing coverage. Please review.

Project coverage is 49.63%. Comparing base (b40648e) to head (67ba58c).
Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7796       /-   ##
==========================================
- Coverage   49.68%   49.63%   -0.06%     
==========================================
  Files         153      153              
  Lines       16826    16884       58     
==========================================
  Hits         8360     8380       20     
- Misses       7423     7459       36     
- Partials     1043     1045        2     

@Mo-Fatah Mo-Fatah marked this pull request as ready for review February 24, 2024 11:34
@openshift-ci openshift-ci bot requested review from klihub and kwilczynski February 24, 2024 11:34
@Mo-Fatah Mo-Fatah force-pushed the cri-stats-impl branch 3 times, most recently from c1d5799 to e8ac4da Compare February 24, 2024 12:08
@Mo-Fatah Mo-Fatah changed the title WIP: Implement CRI stats collection Implement CRI metrics collection Feb 27, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2024
@kwilczynski
Copy link
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 28, 2024
}

func sandboxBaseLabelValues(sb *sandbox.Sandbox) []string {
// TODO FIXME: image?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wrote this TODO) What does cadvisor report as the image for the POD container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look it up but couldn't find what they are using exactly as image name for POD containers. They seem to use the image name from the container info (here and here but not sure how they differentiate between a container and a pod, and what image name exactly they are using for pods.

my suggestion would be using the sandbox name defined in the sandbox metadata (e.g. here)

@haircommander
Copy link
Member

great work here @Mo-Fatah ! I am leaning towards dropping the optimzations where we replace a slice in place. I think there will be an advantage to doing so in the future, but let's keep it simple to start. I also would love to see some unit and integration tests here to verify contents. If you don't have the bandwidth to add those let us know!

@kwilczynski
Copy link
Member

/hold

So I can do a proper review.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 7, 2024
@Mo-Fatah
Copy link
Contributor Author

@haircommander I simplified the slices updates and added an integration test for the memory metrics as a starter. the tests mainly reads the cgroup value and compare it to the value returned from metricsp . let me know if you have a better testing approach or other comments in general.

Copy link
Contributor

openshift-ci bot commented Apr 30, 2024

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-e2e-evented-pleg

In response to this:

/override ci/prow/ci-e2e-evented-pleg
/lgtm

Thanks @Mo-Fatah ! Great work here

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander haircommander removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 1, 2024
@saschagrunert
Copy link
Member

/retest

@Mo-Fatah
Copy link
Contributor Author

Mo-Fatah commented May 2, 2024

/retest

Mo-Fatah and others added 12 commits May 2, 2024 19:02
- Add collection_period and included_pod_metrics to StatsConfig and
  shell completions

Signed-off-by: Mohamed Abdelfatah <[email protected]>
Co-authored-by: Sohan Kunkerkar <[email protected]>
Signed-off-by: Mohamed Abdelfatah <[email protected]>
Signed-off-by: Mohamed Abdelfatah <[email protected]>
Co-authored-by: Sohan Kunkerkar <[email protected]>
- Update StatsServer to collect sandbox metrics in the update cycle

Signed-off-by: Mohamed Abdelfatah <[email protected]>
Co-authored-by: Sohan Kunkerkar <[email protected]>
- typo fix

Signed-off-by: Mohamed Abdelfatah <[email protected]>
Signed-off-by: Mohamed Abdelfatah <[email protected]>
Signed-off-by: Mohamed Abdelfatah <[email protected]>
- Metrics tests improvement

- Clean up

Signed-off-by: Mohamed Abdelfatah <[email protected]>
- internal/stats: Use the internal logging package

- Tiny refactoring for better error handling

- lint, shfmt, shellcheck

Signed-off-by: Mohamed Abdelfatah <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 2, 2024
@haircommander
Copy link
Member

/lgtm
/approve

🎉

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 2, 2024
Copy link
Contributor

openshift-ci bot commented May 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kwilczynski, Mo-Fatah, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haircommander
Copy link
Member

/retest

1 similar comment
@haircommander
Copy link
Member

/retest

@kwilczynski
Copy link
Member

/test e2e-gcp-ovn

@Mo-Fatah
Copy link
Contributor Author

Mo-Fatah commented May 3, 2024

/retest

@haircommander
Copy link
Member

/override ci/prow/e2e-gcp-ovn

Copy link
Contributor

openshift-ci bot commented May 3, 2024

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 175cc34 into cri-o:main May 3, 2024
66 of 69 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants