In the Linux kernel, the following vulnerability has been resolved: net/mlx5: Fix deadlock between devlink lock and esw->wq esw->work_queue executes esw_functions_changed_event_handler -> esw_vfs_changed_event_handler and acquires the devlink lock. .eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) -> mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked -> mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks when esw_vfs_changed_event_handler executes. Fix that by no longer flushing the work to avoid the deadlock, and using a generation counter to keep track of work relevance. This avoids an old handler manipulating an esw that has undergone one or more mode changes: - the counter is incremented in mlx5_eswitch_event_handler_unregister. - the counter is read and passed to the ephemeral mlx5_host_work struct. - the work handler takes the devlink lock and bails out if the current generation is different than the one it was scheduled to operate on. - mlx5_eswitch_cleanup does the final draining before destroying the wq. No longer flushing the workqueue has the side effect of maybe no longer cancelling pending vport_change_handler work items, but that's ok since those are disabled elsewhere: - mlx5_eswitch_disable_locked disables the vport eq notifier. - mlx5_esw_vport_disable disarms the HW EQ notification and marks vport->enabled under state_lock to false to prevent pending vport handler from doing anything. - mlx5_eswitch_cleanup destroys the workqueue and makes sure all events are disabled/finished.
https://git.kernel.org/stable/c/aed763abf0e905b4b8d747d1ba9e172961572f57
https://git.kernel.org/stable/c/957d2a58f7f8ebcbdd0a85935e0d2675134b890d
https://git.kernel.org/stable/c/90e7e5d14d0bd25ffd019a3aa39d9f1c05fedbe1
https://git.kernel.org/stable/c/4a7838bebc38374f74baaf88bf2cf8d439a92923
https://git.kernel.org/stable/c/3c7313cb41b1b427078440364d2f042c276a1c0b
https://git.kernel.org/stable/c/0de867f6e34eae6907b367fd152c55e61cb98608