[RLC-10] Rebase Custom Changes to rlc-10/6.12.0-124.45.1.el10_1 by PlaidCat · Pull Request #1004 · ctrliq/kernel-src-tree

PlaidCat · 2026-03-23T22:01:28Z

https://ciqinc.atlassian.net/browse/KERNEL-756

Update process (this kernel uses the CentOS base for 6.12.0-124.45.1.el10_1)

Rolling Release Rebase Process
- Create rlc-10/6.12.0-124.45.1.el10_1 branch from rocky10_1
- Cherry-pick all code from previous branch rlc-10/6.12.0-124.40.1.el10_1 into new branch (skipping unneeded code)
  - Fix conflicts as they arise
- Build and Test
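
The steps above can be sketched as a small planner that emits the git commands in order. This is only an illustration, assuming plain `git checkout -b` and `git cherry-pick`; the function name `plan_rebase` and its `skip` parameter are hypothetical, and the tooling that actually performs the rebase is not shown in this PR.

```python
def plan_rebase(base_branch, new_branch, old_commits, skip=()):
    """Return the git commands (as argv lists) for a rolling-release rebase.

    Hypothetical helper: it only builds the command list and runs nothing.
    """
    skipped = set(skip)
    # Step 1: create the new rolling branch from the base branch.
    plan = [["git", "checkout", "-b", new_branch, base_branch]]
    # Step 2: cherry-pick each commit from the previous branch, oldest first,
    # leaving out anything flagged as unneeded.
    for sha in old_commits:
        if sha not in skipped:
            plan.append(["git", "cherry-pick", sha])
    return plan


plan = plan_rebase(
    "rocky10_1",
    "rlc-10/6.12.0-124.45.1.el10_1",
    ["c9c7cb9604a9", "35880816e479", "618761ae9ee6"],
)
for argv in plan:
    print(" ".join(argv))
```

Conflict resolution and the build/test step stay manual in this sketch; a cherry-pick that stops on a conflict would need to be resolved before the remaining commands are run.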

Rebase Log

Already on 'rlc-10/6.12.0-124.40.1.el10_1'
Already on 'jmaple_rlc-10/6.12.0-124.45.1.el10_1'
[rolling release update] Rolling Product:  rlc-10
[rolling release update] Checking out branch:  rlc-10/6.12.0-124.40.1.el10_1
[rolling release update] Gathering all the RESF kernel Tags
[rolling release update] Found 14 RESF kernel tags
[rolling release update] Checking out branch:  rocky10_1
[rolling release update] Gathering all the RESF kernel Tags
[rolling release update] Found 15 RESF kernel tags
[rolling release update] Common tag sha:  b'16ddd0825bd1'
"16ddd0825bd1a7237364f06fae356660b8837e07 Rebuild rocky10_1 with kernel-6.12.0-124.40.1.el10_1"
[rolling release update] Checking for FIPS protected changes between the common tag and HEAD
[rolling release update] Checking for FIPS protected changes
[rolling release update] Getting SHAS 16ddd0825bd1..HEAD
[rolling release update] Number of commits to check:  55
[rolling release update] Checking modifications of shas
[rolling release update] Checked 5 of 55 commits
[rolling release update] Checked 10 of 55 commits
[rolling release update] Checked 15 of 55 commits
[rolling release update] Checked 20 of 55 commits
[rolling release update] Checked 25 of 55 commits
[rolling release update] Checked 30 of 55 commits
[rolling release update] Checked 35 of 55 commits
[rolling release update] Checked 40 of 55 commits
[rolling release update] Checked 45 of 55 commits
[rolling release update] Checked 50 of 55 commits
[rolling release update] Checked 55 of 55 commits
[rolling release update] 0 of 55 commits have FIPS protected changes
[rolling release update] Checking out old rolling branch:  rlc-10/6.12.0-124.40.1.el10_1
[rolling release update] Finding the CIQ Kernel and Associated Upstream commits between the last resf tag and HEAD
[rolling release update] Getting SHAS 16ddd0825bd1..HEAD
[rolling release update] Last RESF tag sha:  b'16ddd0825bd1'
[rolling release update] Total commits in old branch: 26
[rolling release update] Checking out new base branch:  rocky10_1
[rolling release update] Finding the kernel version for the new rolling release
[rolling release update] New Branch to create: rlc-10/6.12.0-124.45.1.el10_1
[rolling release update] Creating new branch: rlc-10/6.12.0-124.45.1.el10_1
[rolling release update] Creating new branch for PR:  jmaple_rlc-10/6.12.0-124.45.1.el10_1
[rolling release update] Creating Map of all new commits from last rolling release fork
[rolling release update] Total commits in new branch: 54
[rolling release update] Checking if any of the commits from the old rolling release are already present in the new base branch
[rolling release update] Found 0 duplicate commits to remove
[rolling release update] Applying 26 remaining commits to the new branch
  [1/26] c9c7cb9604a9 tools: hv: Enable debug logs for hv_kvp_daemon
  [2/26] 35880816e479 RDMA/mana_ib: Add device statistics support
  [3/26] 618761ae9ee6 PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
  [4/26] 912141ce1036 PCI: hv: Allow dynamic MSI-X vector allocation
  [5/26] 3b6518a6584c net: mana: explain irq_setup() algorithm
  [6/26] 0040782e4895 net: mana: Allow irq_setup() to skip cpus for affinity
  [7/26] fdd3a51e6c87 net: mana: Allocate MSI-X vectors dynamically
  [8/26] 87f59f15f2a4 net: mana: Add support for net_shaper_ops
  [9/26] a6a687f694c3 net: mana: Add speed support in mana_get_link_ksettings
  [10/26] a9c61ac47f8c net: mana: Handle unsupported HWC commands
  [11/26] 03a2f21c06d1 net: mana: Fix build errors when CONFIG_NET_SHAPER is disabled
  [12/26] c63173a25b91 RDMA/mana_ib: add additional port counters
  [13/26] 0450c13c92f4 RDMA/mana_ib: Drain send wrs of GSI QP
  [14/26] b63664e6c8f1 net: hv_netvsc: fix loss of early receive events from host during channel open.
  [15/26] 80ea00d9c520 net: mana: Reduce waiting time if HWC not responding
  [16/26] 0c70bf6f0172 RDMA/mana_ib: Extend modify QP
  [17/26] ad4e9448861e scsi: storvsc: Prefer returning channel with the same CPU as on the I/O issuing CPU
  [18/26] 9e86095fda9f net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.
  [19/26] e4f84b30ecc2 dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()
  [20/26] 54bae277e28b idpf: fix a race in txq wakeup
  [21/26] f51c7be2c535 idpf: add support for Tx refillqs in flow scheduling mode
  [22/26] c6e0b6b0326e idpf: improve when to set RE bit logic
  [23/26] ca94a60dbe10 idpf: simplify and fix splitq Tx packet rollback error path
  [24/26] cbfcfd9b9f7e idpf: replace flow scheduling buffer ring with buffer pool
  [25/26] 561a71676f43 idpf: stop Tx if there are insufficient buffer resources
  [26/26] 5f9a41096abb idpf: remove obsolete stashing code
[rolling release update] Successfully applied all 26 commits
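
The log reports a check for old rolling-release commits that are already present in the new base branch, but not how the match is made. One plausible approach, shown here purely as an assumption, is to compare the upstream hash recorded in each commit's "(cherry picked from commit ...)" trailer. The names `upstream_ids` and `find_duplicates` are hypothetical.

```python
import re

# Matches the trailer that `git cherry-pick -x` appends to a commit message.
CHERRY_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def upstream_ids(message):
    """Return the set of upstream hashes referenced by cherry-pick trailers."""
    return set(CHERRY_RE.findall(message))


def find_duplicates(old_commits, base_messages):
    """Return the shas in old_commits whose upstream hash is already in base.

    old_commits: dict mapping local sha -> full commit message.
    base_messages: iterable of commit messages from the new base branch.
    Assumption: two commits are "the same" when they record the same upstream
    hash, written at the same abbreviation length.
    """
    in_base = set()
    for msg in base_messages:
        in_base |= upstream_ids(msg)
    return [sha for sha, msg in old_commits.items() if upstream_ids(msg) & in_base]


old = {
    "c9c7cb9604a9": "tools: hv: Enable debug logs\n(cherry picked from commit a9c0b33)",
    "35880816e479": "RDMA/mana_ib: Add stats\n(cherry picked from commit baa640d)",
}
print(find_duplicates(old, []))
```

With an empty base the result is an empty list, which corresponds to the "Found 0 duplicate commits to remove" case in the log.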

BUILD

$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree-build
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
  CLEAN   include/config include/generated
[TIMER]{MRPROPER}: 7s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky10_1_rebuild-20b01e9350fe"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  GEN     arch/x86/include/generated/asm/orc_hash.h
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
  WRAP    arch/x86/include/generated/uapi/asm/fcntl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctl.h
--
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] net/hsr/hsr.ko
  BTF [M] net/qrtr/qrtr-mhi.ko
  BTF [M] net/qrtr/qrtr.ko
  BTF [M] virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1991s
Making Modules
  SYMLINK /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/build
  INSTALL /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/modules.order
  INSTALL /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/modules.builtin
  INSTALL /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/modules.builtin.modinfo
--
  STRIP   /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/kernel/net/qrtr/qrtr-mhi.ko
  STRIP   /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+/kernel/net/qrtr/qrtr-mhi.ko
  DEPMOD  /lib/modules/6.12.0-rocky10_1_rebuild-20b01e9350fe+
[TIMER]{MODULES}: 11s
Making Install
  INSTALL /boot
[TIMER]{INSTALL}: 16s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-6.12.0-rocky10_1_rebuild-20b01e9350fe+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 7s
[TIMER]{BUILD}: 1991s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 16s
[TIMER]{TOTAL} 2030s
Rebooting in 10 seconds
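
For anyone post-processing these logs, here is a minimal sketch that extracts the `[TIMER]` summary lines. Note that the four stage timers add up to 2025s while the log reports a total of 2030s; the log does not say where the remaining 5s is spent. The names `TIMER_RE` and `parse_timers` are hypothetical.

```python
import re

# Accepts both "[TIMER]{BUILD}: 1991s" and "[TIMER]{TOTAL} 2030s" (no colon).
TIMER_RE = re.compile(r"^\[TIMER\]\{(\w+)\}:?\s+(\d+)s$")


def parse_timers(lines):
    """Map each timer name to its seconds; a repeated name keeps the last value."""
    timers = {}
    for line in lines:
        m = TIMER_RE.match(line.strip())
        if m:
            timers[m.group(1)] = int(m.group(2))
    return timers


summary = """\
[TIMER]{MRPROPER}: 7s
[TIMER]{BUILD}: 1991s
[TIMER]{MODULES}: 11s
[TIMER]{INSTALL}: 16s
[TIMER]{TOTAL} 2030s
""".splitlines()

timers = parse_timers(summary)
stage_sum = sum(v for k, v in timers.items() if k != "TOTAL")
print(stage_sum, timers["TOTAL"], timers["TOTAL"] - stage_sum)  # 2025 2030 5
```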

KSelfTest

$ ./kernel-tools/kernel_auto_rebuild/get_kselftest_diff.sh
selftest-6.12.0-io_uring_tests-c88e283d000a+-1.log: 476 passed
selftest-6.12.0-jmaple_rlc-10_6.12.0-124.40.1.el10_1-08ab2f7161e0+-1.log: 478 passed
selftest-6.12.0-jmaple_rlc-10_6.12.0-124.45.1.el10_1-9b5680ba2fb6+-1.log: 479 passed
selftest-6.12.0-jmaple_rlc-10_6.12.0-124.45.1.el10_1-3ec19c8b1979+-1.log: 479 passed

Before: selftest-6.12.0-jmaple_rlc-10_6.12.0-124.45.1.el10_1-9b5680ba2fb6+-1.log
After: selftest-6.12.0-jmaple_rlc-10_6.12.0-124.45.1.el10_1-3ec19c8b1979+-1.log
Diff:
-ok 7 selftests: timers: raw_skew # SKIP
+ok 7 selftests: timers: raw_skew
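
The comparison above can be approximated with a short status diff over the TAP-style result lines. This is a sketch only: it does not reproduce `get_kselftest_diff.sh`, whose contents are not part of this PR, and the names `parse_results` and `changed_tests` are hypothetical.

```python
import re

# Result line as it appears in the diff above, e.g.
# "ok 7 selftests: timers: raw_skew # SKIP"
RESULT_RE = re.compile(r"^(ok|not ok) \d+ (selftests: \S+ \S+)(?: # (\w+))?")


def parse_results(lines):
    """Map test name -> status ("ok", "not ok", or "skip")."""
    results = {}
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m:
            status, name, directive = m.groups()
            results[name] = "skip" if directive == "SKIP" else status
    return results


def changed_tests(before, after):
    """Return {name: (old_status, new_status)} for tests whose status differs."""
    old, new = parse_results(before), parse_results(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


before = ["ok 7 selftests: timers: raw_skew # SKIP"]
after = ["ok 7 selftests: timers: raw_skew"]
print(changed_tests(before, after))
```

For the two lines in the diff, this reports `raw_skew` moving from skipped to passing.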

jira LE-3207 feature tools_hv commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit a9c0b33 Allow the KVP daemon to log the KVP updates triggered in the VM with a new debug flag(-d). When the daemon is started with this flag, it logs updates and debug information in syslog with loglevel LOG_DEBUG. This information comes in handy for debugging issues where the key-value pairs for certain pools show mismatch/incorrect values. The distro-vendors can further consume these changes and modify the respective service files to redirect the logs to specific files as needed. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1744715978-8185-1-git-send-email-shradhagupta@linux.microsoft.com> (cherry picked from commit a9c0b33) Signed-off-by: Jonathan Maple <jmaple@ciq.com>

jira LE-4478 commit-author Shiraz Saleem <shirazsaleem@microsoft.com> commit baa640d Add support for mana device level statistics. Co-developed-by: Solom Tamawy <solom.tamawy@microsoft.com> Signed-off-by: Solom Tamawy <solom.tamawy@microsoft.com> Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1749559717-3424-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit baa640d) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4467 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 5da8a8b For supporting dynamic MSI-X vector allocation by PCI controllers, enabling the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() to prepare the MSI descriptor is also needed. Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic MSI-X vector allocation. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> (cherry picked from commit 5da8a8b) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4467 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit ad518f2 Allow dynamic MSI-X vector allocation for pci_hyperv PCI controller by adding support for the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and using pci_msix_prepare_desc() to prepare the MSI-X descriptors. Feature support added for both x86 and ARM64 Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> (cherry picked from commit ad518f2) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4467 commit-author Yury Norov <yury.norov@gmail.com> commit 4607617 Commit 91bfe21 ("net: mana: add a function to spread IRQs per CPUs") added the irq_setup() function that distributes IRQs on CPUs according to a tricky heuristic. The corresponding commit message explains the heuristic. Duplicate it in the source code to make available for readers without digging git in history. Also, add more detailed explanation about how the heuristics is implemented. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> (cherry picked from commit 4607617) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4467 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 845c62c In order to prepare the MANA driver to allocate the MSI-X IRQs dynamically, we need to enhance irq_setup() to allow skipping affinitizing IRQs to the first CPU sibling group. This would be for cases when the number of IRQs is less than or equal to the number of online CPUs. In such cases for dynamically added IRQs the first CPU sibling group would already be affinitized with HWC IRQ. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> (cherry picked from commit 845c62c) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4467 commit-author Shradha Gupta <shradhagupta@linux.microsoft.com> commit 7553911 upstream-diff There were conflicts seen when applying this patch due to following commit present in our tree before this patch. 590bcf1 ("net: mana: Add handler for hardware servicing events") Currently, the MANA driver allocates MSI-X vectors statically based on MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends up allocating more vectors than it needs. This is because, by this time we do not have a HW channel and do not know how many IRQs should be allocated. To avoid this, we allocate 1 MSI-X vector during the creation of HWC and after getting the value supported by hardware, dynamically add the remaining MSI-X vectors. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> (cherry picked from commit 7553911) Signed-off-by: Shreeya Patel <spatel@ciq.com> Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4473
commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com>
commit 75cabb4

Introduce support for net_shaper_ops in the MANA driver, enabling configuration of rate limiting on the MANA NIC.

To apply rate limiting, the driver issues a HWC command via mana_set_bw_clamp() and updates the corresponding shaper object in the net_shaper cache. If an error occurs during this process, the driver restores the previous speed by querying the current link configuration using mana_query_link_cfg().

The minimum supported bandwidth is 100 Mbps, and only values that are exact multiples of 100 Mbps are allowed. Any other values are rejected.

To remove a shaper, the driver resets the bandwidth to the maximum supported by the SKU using mana_set_bw_clamp() and clears the associated cache entry. If an error occurs during this process, the shaper details are retained.

On the hardware that does not support these APIs, the net-shaper calls to set speed would fail.

Set the speed:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do set --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }, "bw-max": 200000000 }'

Get the shaper details:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do get --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev", "id":'$ID' }}'

> {'bw-max': 200000000,
> 'handle': {'scope': 'netdev'},
> 'ifindex': $IFINDEX,
> 'metric': 'bps'}

Delete the shaper object:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do delete --json '{"ifindex":'$IFINDEX', "handle":{"scope": "netdev","id":'$ID' }}'

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 75cabb4)
Signed-off-by: Shreeya Patel <spatel@ciq.com>
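
The bandwidth rule stated in this commit message (minimum 100 Mbps, exact multiples of 100 Mbps only) can be written out as a check. This is a sketch of the rule as described, not the driver code; it assumes `bw-max` is expressed in bits per second, as the `'metric': 'bps'` field in the example output suggests, and the name `is_valid_bw_max` is hypothetical.

```python
MBPS = 1_000_000        # bits per second in one Mbps
STEP_BPS = 100 * MBPS   # minimum bandwidth and required granularity


def is_valid_bw_max(bw_max_bps):
    """True if bw_max_bps is at least 100 Mbps and an exact multiple of it."""
    return bw_max_bps >= STEP_BPS and bw_max_bps % STEP_BPS == 0


print(is_valid_bw_max(200_000_000))  # True: 200 Mbps, the value used in the example
print(is_valid_bw_max(150_000_000))  # False: not a multiple of 100 Mbps
print(is_valid_bw_max(50_000_000))   # False: below the 100 Mbps minimum
```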

jira LE-4473
commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com>
commit a6d5edf

Allow mana ethtool get_link_ksettings operation to report the maximum speed supported by the SKU in mbps.

The driver retrieves this information by issuing a HWC command to the hardware via mana_query_link_cfg(), which retrieves the SKU's maximum supported speed.

These APIs when invoked on hardware that are older/do not support these APIs, the speed would be reported as UNKNOWN.

Before:
$ethtool enP30832s1
> Settings for enP30832s1:
        Supported ports: [ ]
        Supported link modes: Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes: Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes

After:
$ethtool enP30832s1
> Settings for enP30832s1:
        Supported ports: [ ]
        Supported link modes: Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes: Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 16000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-4-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit a6d5edf)
Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4473 commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com> commit ca8ac48 upstream-diff There were conflicts seen when applying this patch due to the following patch being in our tree before this one. 7a3c235 ("net: mana: Handle Reset Request from MANA NIC") If any of the HWC commands are not recognized by the underlying hardware, the hardware returns the response header status of -1. Log the information using netdev_info_once to avoid multiple error logs in dmesg. Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/1750144656-2021-5-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit ca8ac48) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4473
commit-author Erni Sri Satya Vennela <ernis@linux.microsoft.com>
commit 11cd020

Fix build errors when CONFIG_NET_SHAPER is disabled, including:

drivers/net/ethernet/microsoft/mana/mana_en.c:804:10: error: 'const struct net_device_ops' has no member named 'net_shaper_ops'
     804 |         .net_shaper_ops = &mana_shaper_ops,

drivers/net/ethernet/microsoft/mana/mana_en.c:804:35: error: initialization of 'int (*)(struct net_device *, struct neigh_parms *)' from incompatible pointer type 'const struct net_shaper_ops *' [-Werror=incompatible-pointer-types]
     804 |         .net_shaper_ops = &mana_shaper_ops,

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Fixes: 75cabb4 ("net: mana: Add support for net_shaper_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506230625.bfUlqb8o-lkp@intel.com/
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750851355-8067-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 11cd020)
Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4527 commit-author Zhiyue Qiu <zhiyueqiu@microsoft.com> commit 084f35b Add packet and request port counters to mana_ib. Signed-off-by: Zhiyue Qiu <zhiyueqiu@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1752143395-5324-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 084f35b) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4524 commit-author Konstantin Taranov <kotaranov@microsoft.com> commit 44d69d3 Drain send WRs of the GSI QP on device removal. In rare servicing scenarios, the hardware may delete the state of the GSI QP, preventing it from generating CQEs for pending send WRs. Since WRs submitted to the GSI QP hold CM resources, the device cannot be removed until those WRs are completed. This patch marks all pending send WRs as failed, allowing the GSI QP to release the CM resources and enabling safe device removal. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1753779618-23629-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 44d69d3) Signed-off-by: Shreeya Patel <spatel@ciq.com>

…nnel open. jira LE-4494 commit-author Dipayaan Roy <dipayanroy@linux.microsoft.com> commit 9448ccd The hv_netvsc driver currently enables NAPI after opening the primary and subchannels. This ordering creates a race: if the Hyper-V host places data in the host -> guest ring buffer and signals the channel before napi_enable() has been called, the channel callback will run but napi_schedule_prep() will return false. As a result, the NAPI poller never gets scheduled, the data in the ring buffer is not consumed, and the receive queue may remain permanently stuck until another interrupt happens to arrive. Fix this by enabling NAPI and registering it with the RX/TX queues before vmbus channel is opened. This guarantees that any early host signal after open will correctly trigger NAPI scheduling and the ring buffer will be drained. Fixes: 76bb5db ("netvsc: fix use after free on module removal") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250825115627.GA32189@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 9448ccd) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4497 commit-author Haiyang Zhang <haiyangz@microsoft.com> commit c4deabb If HW Channel (HWC) is not responding, reduce the waiting time, so further steps will fail quickly. This will prevent getting stuck for a long time (30 minutes or more), for example, during unloading while HWC is not responding. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1757537841-5063-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit c4deabb) Signed-off-by: Shreeya Patel <spatel@ciq.com>

jira LE-4521 commit-author Shiraz Saleem <shirazsaleem@microsoft.com> commit 2bd7dd3 Extend modify QP to support further attributes: local_ack_timeout, UD qkey, rate_limit, qp_access_flags, flow_label, max_rd_atomic. Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1757923172-4475-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 2bd7dd3) Signed-off-by: Shreeya Patel <spatel@ciq.com>

…/O issuing CPU jira LE-4537 commit-author Long Li <longli@microsoft.com> commit b69ffea When selecting an outgoing channel for I/O, storvsc tries to select a channel with a returning CPU that is not the same as issuing CPU. This worked well in the past, however it doesn't work well when the Hyper-V exposes a large number of channels (up to the number of all CPUs). Use a different CPU for returning channel is not efficient on Hyper-V. Change this behavior by preferring to the channel with the same CPU as the current I/O issuing CPU whenever possible. Tests have shown improvements in newer Hyper-V/Azure environment, and no regression with older Hyper-V/Azure environments. Tested-by: Raheel Abdul Faizy <rabdulfaizy@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Message-Id: <1759381530-7414-1-git-send-email-longli@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit b69ffea) Signed-off-by: Shreeya Patel <spatel@ciq.com>

…es to improve memory efficiency.

jira LE-4490
commit-author Dipayaan Roy <dipayanroy@linux.microsoft.com>
commit 730ff06
upstream-diff This patch was causing build failures due to missing commit 0f92140 ("memory-provider: dmabuf devmem memory provider") To fix it, we have removed pprm.queue_idx parameter which seems redundant in this case.

This patch enhances RX buffer handling in the mana driver by allocating pages from a page pool and slicing them into MTU-sized fragments, rather than dedicating a full page per packet. This approach is especially beneficial on systems with large base page sizes like 64KB.

Key improvements:
- Proper integration of page pool for RX buffer allocations.
- MTU-sized buffer slicing to improve memory utilization.
- Reduce overall per Rx queue memory footprint.
- Automatic fallback to full-page buffers when:
   * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
   * The XDP path is active, to avoid complexities with fragment reuse.

Testing on VMs with 64KB pages shows around 200% throughput improvement. Memory efficiency is significantly improved due to reduced wastage in page allocations. Example: We are now able to fit 35 rx buffers in a single 64kb page for MTU size of 1500, instead of 1 rx buffer per page previously.

Tested:
- iperf3, iperf2, and nttcp benchmarks.
- Jumbo frames with MTU 9000.
- Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for testing the XDP path in driver.
- Memory leak detection (kmemleak).
- Driver load/unload, reboot, and stress scenarios.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit 730ff06) Signed-off-by: Shreeya Patel <spatel@ciq.com>
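
The fragment arithmetic in this commit message can be worked through as follows. The fallback conditions (XDP active, or MTU > PAGE_SIZE / 2) come from the message itself. The 334-byte per-buffer overhead and the 64-byte alignment are assumptions picked for illustration: they reproduce the quoted figure of 35 buffers per 64KB page at MTU 1500, but the driver's real constants are not stated here. The name `rx_buffers_per_page` is hypothetical.

```python
def rx_buffers_per_page(mtu, page_size, xdp_active=False, overhead=334, align=64):
    """Estimate how many RX buffers fit in one page.

    overhead and align are illustrative assumptions, not values taken from
    the commit message.
    """
    # Fallback described in the message: one full page per buffer when XDP
    # is active or when jumbo frames are enabled (MTU > PAGE_SIZE / 2).
    if xdp_active or mtu > page_size // 2:
        return 1
    # Round the per-buffer size up to the next multiple of `align`.
    buf_size = -(-(mtu + overhead) // align) * align
    return max(1, page_size // buf_size)


print(rx_buffers_per_page(1500, 64 * 1024))  # 35
print(rx_buffers_per_page(9000, 4096))       # 1 (jumbo-frame fallback)
```

Under these assumptions each MTU-1500 buffer takes 1856 bytes, so a 64KB page holds 65536 // 1856 = 35 of them.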

…nused() jira SECO-468 commit-author Luis Henriques <luis@igalia.com> commit 395b955 Add and export a new helper d_dispose_if_unused() which is simply a wrapper around to_shrink_list(), to add an entry to a dispose list if it's not used anymore. Also export shrink_dentry_list() to kill all dentries in a dispose list. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> (cherry picked from commit 395b955) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

jira KERNEL-170
commit-author Brian Vazquez <brianvv@google.com>
commit 7292af0

Add a helper function to correctly handle the lockless synchronization when the sender needs to block. The paradigm is

        if (no_resources()) {
                stop_queue();
                barrier();
                if (!no_resources())
                        restart_queue();
        }

netif_subqueue_maybe_stop already handles the paradigm correctly, but the code split the check for resources in three parts, the first one (descriptors) followed the protocol, but the other two (completions and tx_buf) were only doing the first part and so race prone.

Luckily netif_subqueue_maybe_stop macro already allows you to use a function to evaluate the start/stop conditions so the fix only requires the right helper function to evaluate all the conditions at once.

The patch removes idpf_tx_maybe_stop_common since it's no longer needed and instead adjusts separately the conditions for singleq and splitq.

Note that idpf_tx_buf_hw_update doesn't need to check for resources since that will be covered in idpf_tx_splitq_frame.

To reproduce:

Reduce the threshold for pending completions to increase the chances of hitting this pause by changing your kernel:

drivers/net/ethernet/intel/idpf/idpf_txrx.h

-#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq)   ((txcq)->desc_count >> 1)
+#define IDPF_TX_COMPLQ_OVERFLOW_THRESH(txcq)   ((txcq)->desc_count >> 4)

Use pktgen to force the host to push small pkts very aggressively:

./pktgen_sample02_multiqueue.sh -i eth1 -s 100 -6 -d $IP -m $MAC \
  -p 10000-10000 -t 16 -n 0 -v -x -c 64

Fixes: 6818c4d ("idpf: add splitq start_xmit")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 7292af0)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
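
The stop/recheck paradigm from this commit message, with all three resource checks folded into one helper, can be modelled in a few lines. This is a single-threaded toy, not the idpf code: `TxQueueSketch`, `maybe_stop`, and the `concurrent_cleanup` hook are hypothetical names, and the hook only simulates a completion arriving between the stop and the recheck.

```python
class TxQueueSketch:
    """Toy model of a Tx queue that stops when any resource runs short."""

    def __init__(self, descs, completions, bufs):
        self.free = {"descs": descs, "completions": completions, "bufs": bufs}
        self.stopped = False

    def no_resources(self, need=1):
        # One helper that evaluates every condition at once.
        return any(count < need for count in self.free.values())

    def maybe_stop(self, need=1, concurrent_cleanup=None):
        """Apply the stop/recheck paradigm; return True if the queue stays stopped."""
        if not self.no_resources(need):
            return False
        self.stopped = True                  # stop_queue()
        # The kernel code issues a barrier at this point; this toy has none.
        if concurrent_cleanup is not None:
            concurrent_cleanup(self)         # simulate the cleaner running "meanwhile"
        if not self.no_resources(need):
            self.stopped = False             # restart_queue()
        return self.stopped


def free_completions(queue):
    queue.free["completions"] += 8


q = TxQueueSketch(descs=16, completions=0, bufs=16)
print(q.maybe_stop())                                     # True: nothing was freed
print(q.maybe_stop(concurrent_cleanup=free_completions))  # False: recheck restarts the queue
```

Checking only one resource after the stop, which is what the message says the completions and tx_buf paths did, would skip the second `no_resources()` call and leave the queue stopped in the second case.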

jira KERNEL-170 commit-author Joshua Hay <joshua.a.hay@intel.com> commit cb83b55 upstream-diff | adjusted the number of bytes expected in libeth_cacheline_set_assert for struct idpf_tx_queue due to missing of some elements in the struct introduced in commit 1a49cf8 ("idpf: add Tx timestamp flows"). In certain production environments, it is possible for completion tags to collide, meaning N packets with the same completion tag are in flight at the same time. In this environment, any given Tx queue is effectively used to send both slower traffic and higher throughput traffic simultaneously. This is the result of a customer's specific configuration in the device pipeline, the details of which Intel cannot provide. This configuration results in a small number of out-of-order completions, i.e., a small number of packets in flight. The existing guardrails in the driver only protect against a large number of packets in flight. The slower flow completions are delayed which causes the out-of-order completions. The fast flow will continue sending traffic and generating tags. Because tags are generated on the fly, the fast flow eventually uses the same tag for a packet that is still in flight from the slower flow. The driver has no idea which packet it should clean when it processes the completion with that tag, but it will look for the packet on the buffer ring before the hash table. If the slower flow packet completion is processed first, it will end up cleaning the fast flow packet on the ring prematurely. This leaves the descriptor ring in a bad state resulting in a crash or Tx timeout. In summary, generating a tag when a packet is sent can lead to the same tag being associated with multiple packets. This can lead to resource leaks, crashes, and/or Tx timeouts. Before we can replace the tag generation, we need a new mechanism for the send path to know what tag to use next. 
The driver will allocate and initialize a refillq for each TxQ with all of the possible free tag values. During send, the driver grabs the next free tag from the refillq from next_to_clean. While cleaning the packet, the clean routine posts the tag back to the refillq's next_to_use to indicate that it is now free to use. This mechanism works exactly the same way as the existing Rx refill queues, which post the cleaned buffer IDs back to the buffer queue to be reposted to HW. Since we're using the refillqs for both Rx and Tx now, genericize some of the existing refillq support. Note: the refillqs will not be used yet. This is only demonstrating how they will be used to pass free tags back to the send path. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit cb83b55) Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
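
The refillq mechanism described above (take a free tag at next_to_clean on send, post it back at next_to_use on clean) can be sketched as a small ring. This is a simplified model, not the driver's data structure: the class name `TagRefillQueue` is hypothetical, and the `free_count` field is an assumption added here to detect an empty ring.

```python
class TagRefillQueue:
    """Ring of free completion tags, modelled on the description above."""

    def __init__(self, num_tags):
        self.ring = list(range(num_tags))  # initially every tag is free
        self.size = num_tags
        self.next_to_clean = 0   # where the send path takes the next free tag
        self.next_to_use = 0     # where the clean path posts a freed tag
        self.free_count = num_tags

    def get_tag(self):
        """Send path: take the next free tag, or None if all are in flight."""
        if self.free_count == 0:
            return None
        tag = self.ring[self.next_to_clean]
        self.next_to_clean = (self.next_to_clean + 1) % self.size
        self.free_count -= 1
        return tag

    def post_tag(self, tag):
        """Clean path: return a tag to the ring once its packet is completed."""
        self.ring[self.next_to_use] = tag
        self.next_to_use = (self.next_to_use + 1) % self.size
        self.free_count += 1


rq = TagRefillQueue(2)
first, second = rq.get_tag(), rq.get_tag()
print(rq.get_tag())   # None: both tags are in flight
rq.post_tag(second)   # an out-of-order completion frees the second tag first
print(rq.get_tag())   # 1
```

Because a tag only becomes available again after its packet has been cleaned, two in-flight packets cannot share a tag in this model, which is the collision the commit message sets out to prevent.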

jira KERNEL-170
commit-author Joshua Hay <joshua.a.hay@intel.com>
commit f2d18e1

Track the gap between next_to_use and the last RE index. Set RE again if the gap is large enough to ensure RE bit is set frequently. This is critical before removing the stashing mechanisms because the opportunistic descriptor ring cleaning from the out-of-order completions will go away. Previously the descriptors would be "cleaned" by both the descriptor (RE) completion and the out-of-order completions. Without the latter, we must ensure the RE bit is set more frequently. Otherwise, it's theoretically possible for the descriptor ring next_to_clean to never advance. The previous implementation was dependent on the start of a packet falling on a 64th index in the descriptor ring, which is not guaranteed with large packets.

Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit f2d18e1)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
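
A minimal sketch of the gap tracking described in this commit message, assuming a threshold of 64 descriptors. The struct, the function names, and the `RE_GAP` constant are hypothetical and only illustrate the idea; the driver's actual fields and threshold may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RE_GAP 64u /* assumed threshold, for illustration only */

/* Hypothetical per-queue state; not the driver's struct. */
struct txq_re_state {
	uint32_t desc_count;  /* descriptor ring size */
	uint32_t next_to_use; /* next free descriptor index */
	uint32_t last_re;     /* ring index at which RE was last requested */
};

/* Distance from last_re forward to next_to_use, accounting for wrap. */
uint32_t re_gap(const struct txq_re_state *q)
{
	return (q->next_to_use + q->desc_count - q->last_re) % q->desc_count;
}

/*
 * Account for a packet that consumed ndesc descriptors and report whether
 * its last descriptor should carry the RE bit. The decision depends only on
 * the gap since the last RE, not on where the packet happens to start.
 */
bool txq_should_set_re(struct txq_re_state *q, uint32_t ndesc)
{
	assert(ndesc > 0 && ndesc < q->desc_count);
	q->next_to_use = (q->next_to_use + ndesc) % q->desc_count;
	if (re_gap(q) >= RE_GAP) {
		q->last_re = q->next_to_use;
		return true;
	}
	return false;
}
```

With 17-descriptor packets on a 256-entry ring, this sketch requests RE on every fourth packet, even though no packet starts on a multiple of 64.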

jira KERNEL-170
commit-author Joshua Hay <joshua.a.hay@intel.com>
commit b61dfa9
upstream-diff | adjusted context in 2 places:
- when removing func idpf_tx_dma_map_error due to different memset call that uses the hardcoded struct type;
- in func idpf_tx_splitq_frame due to missing expected union idpf_flex_tx_ctx_desc *ctx_desc;
both differences were introduced in commit 1a49cf8 ("idpf: add Tx timestamp flows").

Move (and rename) the existing rollback logic to singleq.c since that will be the only consumer. Create a simplified splitq specific rollback function to loop through and unmap tx_bufs based on the completion tag. This is critical before replacing the Tx buffer ring with the buffer pool since the previous rollback indexing will not work to unmap the chained buffers from the pool.

Cache the next_to_use index before any portion of the packet is put on the descriptor ring. In case of an error, the rollback will bump tail to the correct next_to_use value. Because the splitq path now supports different types of context descriptors (and potentially multiple in the future), this will take care of rolling back any and all context descriptors encoded on the ring for the erroneous packet. The previous rollback logic was broken for PTP packets since it would not account for the PTP context descriptor.

Fixes: 1a49cf8 ("idpf: add Tx timestamp flows")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit b61dfa9)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
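
The cached-index rollback described in this commit message can be illustrated with a small sketch. The ring struct and function names are hypothetical, and the `map_ok` flag merely stands in for the outcome of DMA mapping; the point is only that restoring one cached index undoes context and data descriptors alike.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor ring state; only the index matters here. */
struct tx_desc_ring {
	uint32_t desc_count;
	uint32_t next_to_use;
};

/* Advance next_to_use by n descriptors, wrapping at the end of the ring. */
void ring_advance(struct tx_desc_ring *r, uint32_t n)
{
	r->next_to_use = (r->next_to_use + n) % r->desc_count;
}

/*
 * Place nctx context descriptors and ndata data descriptors for one packet.
 * On failure, restore the index cached before the packet touched the ring,
 * so any number of context descriptors is rolled back with the data ones.
 */
bool ring_place_packet(struct tx_desc_ring *r, uint32_t nctx,
		       uint32_t ndata, bool map_ok)
{
	uint32_t saved_ntu = r->next_to_use; /* cached before any ring write */

	assert(nctx + ndata < r->desc_count);
	ring_advance(r, nctx);
	ring_advance(r, ndata);
	if (!map_ok) {
		r->next_to_use = saved_ntu;
		return false;
	}
	return true;
}
```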

jira KERNEL-170
commit-author Joshua Hay <joshua.a.hay@intel.com>
commit 5f417d5
upstream-diff | adjusted context in:
- idpf_tx_splitq_frame and idpf_tx_clean_bufs;
- libeth_cacheline_set_assert for struct idpf_tx_queue, because some elements are missing from the struct;
all cases are due to missing commit 1a49cf8 ("idpf: add Tx timestamp flows").

Replace the TxQ buffer ring with one large pool/array of buffers (only for flow scheduling). This eliminates the tag generation and makes it impossible for a tag to be associated with more than one packet.

The completion tag passed to HW through the descriptor is the index into the array. That same completion tag is posted back to the driver in the completion descriptor, and used to index into the array to quickly retrieve the buffer during cleaning. In this way, the tags are treated as a fixed-size resource. If all tags are in use, no more packets can be sent on that particular queue (until some are freed up). The tag pool size is 64K since the completion tag width is 16 bits.

For each packet, the driver pulls a free tag from the refillq to get the next free buffer index. When cleaning is complete, the tag is posted back to the refillq. A multi-frag packet spans multiple buffers in the driver, therefore it uses multiple buffer indexes/tags from the pool. Each frag pulls from the refillq to get the next free buffer index. These are tracked in a next_buf field that replaces the completion tag field in the buffer struct. This chains the buffers together so that the packet can be cleaned from the starting completion tag taken from the completion descriptor, then from the next_buf field for each subsequent buffer.

If a dma_mapping_error occurs or the refillq runs out of free buf_ids, the packet will execute the rollback error path. This unmaps any buffers previously mapped for the packet. Since several free buf_ids could have already been pulled from the refillq, we need to restore its original state as well. Otherwise, the buf_ids/tags will be leaked and not used again until the queue is reallocated.

Descriptor completions only advance the descriptor ring index to "clean" the descriptors. The packet completions only clean the buffers associated with the given packet completion tag and do not update the descriptor ring index.

When operating in queue based scheduling mode, the array still acts as a ring and will only have TxQ descriptor count entries. The tx_bufs are still associated 1:1 with the descriptor ring entries and we can use the conventional indexing mechanisms.

Fixes: c2d548c ("idpf: add TX splitq napi poll support")
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 5f417d5)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
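
The buffer pool, the next_buf chaining, and the refillq rollback described in this commit message can be sketched as below. Everything here is an assumption made for illustration: the pool holds 8 entries instead of 64K, `BUF_NULL` is an invented end-of-chain marker, and the struct and function names are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 8u         /* tiny for illustration; the commit describes 64K */
#define BUF_NULL  UINT32_MAX /* hypothetical end-of-chain marker */

struct tx_buf {
	bool in_use;
	uint32_t next_buf; /* next buffer of the same packet, or BUF_NULL */
};

struct tx_pool {
	struct tx_buf bufs[POOL_SIZE];
	uint32_t free_ids[POOL_SIZE]; /* refillq contents */
	uint32_t ntc;   /* send path takes free ids here */
	uint32_t ntu;   /* clean path posts freed ids here */
	uint32_t nfree;
};

void pool_init(struct tx_pool *p)
{
	for (uint32_t i = 0; i < POOL_SIZE; i++) {
		p->bufs[i].in_use = false;
		p->bufs[i].next_buf = BUF_NULL;
		p->free_ids[i] = i;
	}
	p->ntc = 0;
	p->ntu = 0;
	p->nfree = POOL_SIZE;
}

/*
 * Take one buffer per frag and chain them through next_buf. Returns the
 * head index (the completion tag), or BUF_NULL after rolling back.
 */
uint32_t pool_send(struct tx_pool *p, uint32_t nfrags)
{
	uint32_t saved_ntc = p->ntc, saved_nfree = p->nfree;
	uint32_t head = BUF_NULL, prev = BUF_NULL;

	assert(nfrags > 0);
	for (uint32_t i = 0; i < nfrags; i++) {
		if (p->nfree == 0) {
			/* Rollback: release the chain built so far... */
			for (uint32_t id = head; id != BUF_NULL; ) {
				uint32_t next = p->bufs[id].next_buf;

				p->bufs[id].in_use = false;
				p->bufs[id].next_buf = BUF_NULL;
				id = next;
			}
			/* ...and restore the refillq so no ids are leaked. */
			p->ntc = saved_ntc;
			p->nfree = saved_nfree;
			return BUF_NULL;
		}
		uint32_t id = p->free_ids[p->ntc];

		p->ntc = (p->ntc + 1) % POOL_SIZE;
		p->nfree--;
		p->bufs[id].in_use = true;
		p->bufs[id].next_buf = BUF_NULL;
		if (prev == BUF_NULL)
			head = id;
		else
			p->bufs[prev].next_buf = id;
		prev = id;
	}
	return head;
}

/* Packet completion: walk the chain from the tag and post each id back. */
uint32_t pool_clean(struct tx_pool *p, uint32_t tag)
{
	uint32_t cleaned = 0;

	for (uint32_t id = tag; id != BUF_NULL; ) {
		uint32_t next = p->bufs[id].next_buf;

		assert(p->bufs[id].in_use);
		p->bufs[id].in_use = false;
		p->bufs[id].next_buf = BUF_NULL;
		p->free_ids[p->ntu] = id;
		p->ntu = (p->ntu + 1) % POOL_SIZE;
		p->nfree++;
		cleaned++;
		id = next;
	}
	return cleaned;
}
```

In this model the rollback only needs to rewind the read position, because the ids taken by the failed packet are still sitting in their refillq slots.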

jira KERNEL-170
commit-author Joshua Hay <joshua.a.hay@intel.com>
commit 0c3f135
upstream-diff | adjusted conflict in idpf_tx_splitq_frame func due to missing 1a49cf8 ("idpf: add Tx timestamp flows").

The Tx refillq logic will cause packets to be silently dropped if there are not enough buffer resources available to send a packet in flow scheduling mode. Instead, determine how many buffers are needed along with the number of descriptors. Make sure there are enough of both resources to send the packet, and stop the queue if not.

Fixes: 7292af0 ("idpf: fix a race in txq wakeup")
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 0c3f135)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>
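
The combined resource check described in this commit message might look roughly like the sketch below. The struct and function are hypothetical stand-ins; the real driver derives these counts from its descriptor ring and refillq state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what a Tx queue has left; not the driver's struct. */
struct txq_resources {
	uint32_t desc_unused; /* free descriptors on the ring */
	uint32_t bufs_free;   /* free buffer ids in the refillq */
	bool stopped;         /* set when the queue has been stopped */
};

/*
 * Stop the queue unless both descriptors and buffers are sufficient for the
 * packet. Returns true if the queue was stopped, so the caller can back off
 * instead of silently dropping the packet.
 */
bool txq_maybe_stop(struct txq_resources *q, uint32_t descs_needed,
		    uint32_t bufs_needed)
{
	assert(descs_needed > 0 && bufs_needed > 0);
	if (q->desc_unused < descs_needed || q->bufs_free < bufs_needed) {
		q->stopped = true;
		return true;
	}
	return false;
}
```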

jira KERNEL-170
commit-author Joshua Hay <joshua.a.hay@intel.com>
commit 6c4e684
upstream-diff |
- adjusted context due to missing idpf_tx_read_tstamp func;
- adjusted the number of bytes expected in libeth_cacheline_set_assert for struct idpf_tx_queue, because some elements are missing from the struct;
both are due to missing commit 1a49cf8 ("idpf: add Tx timestamp flows").

With the new Tx buffer management scheme, there is no need for all of the stashing mechanisms, the hash table, the reserve buffer stack, etc. Remove all of that.

Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 6c4e684)
Signed-off-by: Roxana Nicolescu <rnicolescu@ciq.com>

PlaidCat and others added 26 commits March 23, 2026 14:22

PlaidCat self-assigned this Mar 23, 2026

PlaidCat requested review from a team March 23, 2026 22:01

PlaidCat wants to merge 26 commits into rlc-10/6.12.0-124.45.1.el10_1 from {jmaple}_rlc-10/6.12.0-124.45.1.el10_1

4 participants