LeaderWorkerSet integration when LeaderWorkerSet created with RestartPolicy should only recreate the deleted pod when policy is set to NoneRestartPolicy #9755
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
/assign
Please also describe what the problem causing the flakes was
b9a01a2 to 14393b0 Compare
Added an explanation to the description
… Policy should recreate pods when policy is set to RecreateGroupOnPodRestart kubernetes-sigs#9751
14393b0 to ca5d46b Compare
/retest-required
I still don't understand what the issue with the previous code was. Sure, I'm +1 on improving the test code to use sets rather than counters for ease of investigation, but I don't see where the bug was. Could you maybe expand the explanation by tracking the events and correlating them with the kubelet logs? That would be very helpful, I think.
@vladikkuzn Could you please also run an experiment with 200 repeats of this test to make sure it’s fixed?
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: vladikkuzn. The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
3ce656d to 7b996dd Compare
@mimowo There are no logs showing that the LWS controller created a replacement. If the LWS controller never created a replacement (so the test’s recreatedCount stayed 0), we cannot distinguish between those cases from the current artifact set. Hence we need to make the test robust by not relying on the pod name to detect “recreated” (e.g. use UID + worker index instead), so that a replacement with the same or a different name is still counted correctly.
Interesting, maybe it didn't create the replacement Pod because the previous one hadn't finished terminating in time?
Ah, I see, indeed we don't wait for the Pod's termination or deletion. Let's introduce this extra step. Sometimes termination of pods can take long.
7b996dd to 8a804fa Compare
@vladikkuzn: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
})

var podToDelete *corev1.Pod
originalPodUIDs := make(map[string]types.UID)
Could you switch to sets.New[types.UID]() here as well?
}, client.InNamespace(lws.Namespace))).Should(gomega.Succeed())
g.Expect(pods.Items).To(gomega.HaveLen(3))

remainingOriginalUIDs := sets.New[types.UID]()
The sets package already has Intersection and Difference. So you can replace the entire loop + counter with:
g.Expect(pods.Items).To(gomega.HaveLen(3))
currentUIDs := sets.New[types.UID]()
for _, pod := range pods.Items {
currentUIDs.Insert(pod.UID)
}
survived := originalPodUIDSet.Intersection(currentUIDs)
newUIDs := currentUIDs.Difference(originalPodUIDSet)
g.Expect(survived).To(gomega.HaveLen(2), "2 original pods should survive")
g.Expect(survived.Has(deletedPodUID)).To(gomega.BeFalse(), "deleted pod should not survive")
g.Expect(newUIDs).To(gomega.HaveLen(1), "exactly 1 replacement pod")
var replacement *corev1.Pod
for i := range pods.Items {
if newUIDs.Has(pods.Items[i].UID) {
replacement = &pods.Items[i]
break
}
}
g.Expect(replacement).NotTo(gomega.BeNil())
g.Expect(replacement.Labels[leaderworkersetv1.WorkerIndexLabelKey]).To(
gomega.Equal(deletedWorkerIndex),
"only the deleted worker slot should be recreated",
)

gomega.Eventually(func(g gomega.Gomega) {
	pod := &corev1.Pod{}
	err := k8sClient.Get(ctx, deletedPodKey, pod)
	g.Expect(apierrors.IsNotFound(err)).To(gomega.BeTrue(), "deleted pod should be removed from the API")
IsNotFound may not always work here — LWS uses a StatefulSet which recreates the pod with the same name, and the replacement could be created before we even check for deletion. So Get(ctx, deletedPodKey, pod) will always find a pod and never return NotFound.
Maybe checking for a UID change would be more reliable since that confirms recreation regardless of timing?
ginkgo.By("Wait for the deleted pod to be recreated with a new UID", func() {
gomega.Eventually(func(g gomega.Gomega) {
pod := &corev1.Pod{}
g.Expect(k8sClient.Get(ctx, deletedPodKey, pod)).To(gomega.Succeed())
g.Expect(pod.UID).NotTo(gomega.Equal(deletedPodUID),
"pod should be recreated with a new UID")
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
})
What type of PR is this?
/kind flake
What this PR does / why we need it:
The old assertion identified the recreated pod by comparing the current pods against an original map keyed by pod name. That is brittle because, after deletion and recreation, the replacement may not match the “same name, new UID” pattern at the moment the test observes it, so the check can temporarily see no recreated pod and incorrectly conclude that nothing was recreated.
I changed the test to assert the actual behavior more directly:
- exactly 3 pods exist again,
- exactly 1 pod has a new UID,
- 2 original pod UIDs are still present,
- the newly created pod corresponds to the deleted worker index.
So the problem was: the test was using an unstable way to detect pod replacement, which made it report 0 recreated pods even when the system behavior was valid.
Which issue(s) this PR fixes:
Fixes #9750
Special notes for your reviewer:
Does this PR introduce a user-facing change?