LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 1/4] habanalabs: add "in device creation" status
@ 2021-08-20 15:07 Oded Gabbay
  2021-08-20 15:07 ` [PATCH 2/4] habanalabs: disable IRQ in user interrupts spinlock Oded Gabbay
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Oded Gabbay @ 2021-08-20 15:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Omer Shpigelman

From: Omer Shpigelman <oshpigelman@habana.ai>

On init, the disabled state is cleared right before hw_init and that
causes the device to report on "Operational" state before the device
initialization is finished. Although the char device is not yet exposed
to the user at this stage, the sysfs entries are exposed.

This can cause errors in monitoring applications that use the sysfs
entries.

In order to avoid this, a new state "in device creation" is introduced
to ne reported when the device is not disabled but is still in init
flow.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/device.c         |  3 +++
 drivers/misc/habanalabs/common/habanalabs.h     |  2 +-
 drivers/misc/habanalabs/common/habanalabs_drv.c |  8 ++++++--
 drivers/misc/habanalabs/common/sysfs.c          | 14 +++++---------
 include/uapi/misc/habanalabs.h                  |  4 +++-
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 3751c915f731..c2641030d9ff 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -23,6 +23,8 @@ enum hl_device_status hl_device_status(struct hl_device *hdev)
 		status = HL_DEVICE_STATUS_NEEDS_RESET;
 	else if (hdev->disabled)
 		status = HL_DEVICE_STATUS_MALFUNCTION;
+	else if (!hdev->init_done)
+		status = HL_DEVICE_STATUS_IN_DEVICE_CREATION;
 	else
 		status = HL_DEVICE_STATUS_OPERATIONAL;
 
@@ -44,6 +46,7 @@ bool hl_device_operational(struct hl_device *hdev,
 	case HL_DEVICE_STATUS_NEEDS_RESET:
 		return false;
 	case HL_DEVICE_STATUS_OPERATIONAL:
+	case HL_DEVICE_STATUS_IN_DEVICE_CREATION:
 	default:
 		return true;
 	}
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 465fd909a7b7..d22ad9e4d5e8 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2010,7 +2010,7 @@ struct hl_state_dump_specs {
 
 #define HL_STR_MAX	32
 
-#define HL_DEV_STS_MAX (HL_DEVICE_STATUS_NEEDS_RESET + 1)
+#define HL_DEV_STS_MAX (HL_DEVICE_STATUS_LAST + 1)
 
 /* Theoretical limit only. A single host can only contain up to 4 or 8 PCIe
  * x16 cards. In extreme cases, there are hosts that can accommodate 16 cards.
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index 3df4313d72cd..2ef59fd465ba 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -317,12 +317,16 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
 		hdev->asic_prop.fw_security_enabled = false;
 
 	/* Assign status description string */
-	strncpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION],
-					"disabled", HL_STR_MAX);
+	strncpy(hdev->status[HL_DEVICE_STATUS_OPERATIONAL],
+					"operational", HL_STR_MAX);
 	strncpy(hdev->status[HL_DEVICE_STATUS_IN_RESET],
 					"in reset", HL_STR_MAX);
+	strncpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION],
+					"disabled", HL_STR_MAX);
 	strncpy(hdev->status[HL_DEVICE_STATUS_NEEDS_RESET],
 					"needs reset", HL_STR_MAX);
+	strncpy(hdev->status[HL_DEVICE_STATUS_IN_DEVICE_CREATION],
+					"in device creation", HL_STR_MAX);
 
 	hdev->major = hl_major;
 	hdev->reset_on_lockup = reset_on_lockup;
diff --git a/drivers/misc/habanalabs/common/sysfs.c b/drivers/misc/habanalabs/common/sysfs.c
index db72df282ef8..3b590a0515fb 100644
--- a/drivers/misc/habanalabs/common/sysfs.c
+++ b/drivers/misc/habanalabs/common/sysfs.c
@@ -285,16 +285,12 @@ static ssize_t status_show(struct device *dev, struct device_attribute *attr,
 				char *buf)
 {
 	struct hl_device *hdev = dev_get_drvdata(dev);
-	char *str;
+	char str[HL_STR_MAX];
 
-	if (atomic_read(&hdev->in_reset))
-		str = "In reset";
-	else if (hdev->disabled)
-		str = "Malfunction";
-	else if (hdev->needs_reset)
-		str = "Needs Reset";
-	else
-		str = "Operational";
+	strncpy(str, hdev->status[hl_device_status(hdev)], HL_STR_MAX);
+
+	/* use uppercase for backward compatibility */
+	str[0] = 'A' + (str[0] - 'a');
 
 	return sprintf(buf, "%s\n", str);
 }
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index e3425bcc6d15..52ddb8365119 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -276,7 +276,9 @@ enum hl_device_status {
 	HL_DEVICE_STATUS_OPERATIONAL,
 	HL_DEVICE_STATUS_IN_RESET,
 	HL_DEVICE_STATUS_MALFUNCTION,
-	HL_DEVICE_STATUS_NEEDS_RESET
+	HL_DEVICE_STATUS_NEEDS_RESET,
+	HL_DEVICE_STATUS_IN_DEVICE_CREATION,
+	HL_DEVICE_STATUS_LAST = HL_DEVICE_STATUS_IN_DEVICE_CREATION
 };
 
 enum hl_server_type {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/4] habanalabs: disable IRQ in user interrupts spinlock
  2021-08-20 15:07 [PATCH 1/4] habanalabs: add "in device creation" status Oded Gabbay
@ 2021-08-20 15:07 ` Oded Gabbay
  2021-08-20 15:07 ` [PATCH 3/4] habanalabs: remove unnecessary device status check Oded Gabbay
  2021-08-20 15:07 ` [PATCH 4/4] habanalabs: never copy_from_user inside spinlock Oded Gabbay
  2 siblings, 0 replies; 5+ messages in thread
From: Oded Gabbay @ 2021-08-20 15:07 UTC (permalink / raw)
  To: linux-kernel

Because this spinlock is taken in an interrupt handler, we must use
the spin_lock_irqsave/irqrestore version to disable the interrupts
on the local CPU. Otherwise, we can have a potential deadlock (if
the interrupt handler is scheduled to run on the same cpu that the
code who took the lock was running on).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 .../habanalabs/common/command_submission.c    | 25 ++++++++++---------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 3a67265312ee..8a2f75de6df8 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -923,13 +923,14 @@ static void
 wake_pending_user_interrupt_threads(struct hl_user_interrupt *interrupt)
 {
 	struct hl_user_pending_interrupt *pend;
+	unsigned long flags;
 
-	spin_lock(&interrupt->wait_list_lock);
+	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 	list_for_each_entry(pend, &interrupt->wait_list_head, wait_list_node) {
 		pend->fence.error = -EIO;
 		complete_all(&pend->fence.completion);
 	}
-	spin_unlock(&interrupt->wait_list_lock);
+	spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 }
 
 void hl_release_pending_user_interrupts(struct hl_device *hdev)
@@ -2714,9 +2715,9 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 {
 	struct hl_user_pending_interrupt *pend;
 	struct hl_user_interrupt *interrupt;
-	unsigned long timeout;
-	long completion_rc;
+	unsigned long timeout, flags;
 	u32 completion_value;
+	long completion_rc;
 	int rc = 0;
 
 	if (timeout_us == U32_MAX)
@@ -2739,7 +2740,7 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 	else
 		interrupt = &hdev->user_interrupt[interrupt_offset];
 
-	spin_lock(&interrupt->wait_list_lock);
+	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 	if (!hl_device_operational(hdev, NULL)) {
 		rc = -EPERM;
 		goto unlock_and_free_fence;
@@ -2765,7 +2766,7 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 	 * handler to monitor
 	 */
 	list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
-	spin_unlock(&interrupt->wait_list_lock);
+	spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 
 wait_again:
 	/* Wait for interrupt handler to signal completion */
@@ -2777,12 +2778,12 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 	 * If comparison fails, keep waiting until timeout expires
 	 */
 	if (completion_rc > 0) {
-		spin_lock(&interrupt->wait_list_lock);
+		spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 
 		if (copy_from_user(&completion_value,
 				u64_to_user_ptr(user_address), 4)) {
 
-			spin_unlock(&interrupt->wait_list_lock);
+			spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 
 			dev_err(hdev->dev,
 				"Failed to copy completion value from user\n");
@@ -2792,13 +2793,13 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 		}
 
 		if (completion_value >= target_value) {
-			spin_unlock(&interrupt->wait_list_lock);
+			spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 			*status = CS_WAIT_STATUS_COMPLETED;
 		} else {
 			reinit_completion(&pend->fence.completion);
 			timeout = completion_rc;
 
-			spin_unlock(&interrupt->wait_list_lock);
+			spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 			goto wait_again;
 		}
 	} else if (completion_rc == -ERESTARTSYS) {
@@ -2812,11 +2813,11 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 	}
 
 remove_pending_user_interrupt:
-	spin_lock(&interrupt->wait_list_lock);
+	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 	list_del(&pend->wait_list_node);
 
 unlock_and_free_fence:
-	spin_unlock(&interrupt->wait_list_lock);
+	spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 	kfree(pend);
 	hl_ctx_put(ctx);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3/4] habanalabs: remove unnecessary device status check
  2021-08-20 15:07 [PATCH 1/4] habanalabs: add "in device creation" status Oded Gabbay
  2021-08-20 15:07 ` [PATCH 2/4] habanalabs: disable IRQ in user interrupts spinlock Oded Gabbay
@ 2021-08-20 15:07 ` Oded Gabbay
  2021-08-20 15:07 ` [PATCH 4/4] habanalabs: never copy_from_user inside spinlock Oded Gabbay
  2 siblings, 0 replies; 5+ messages in thread
From: Oded Gabbay @ 2021-08-20 15:07 UTC (permalink / raw)
  To: linux-kernel

Checking if the device is operational when entering the function to
wait for user interrupt is not something that is useful or necessary.

It is not done in any other wait_for_cs ioctl path.

If the device becomes non-operational during the wait, the reset
function will make sure the process wait is interrupted.

Instead, move the check to the beginning of hl_wait_ioctl(). It will
block any attempt to wait on CS or user interrupt once the device
is already marked as non-operational.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/command_submission.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 8a2f75de6df8..a97bb27ebb90 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -2741,10 +2741,6 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 		interrupt = &hdev->user_interrupt[interrupt_offset];
 
 	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
-	if (!hl_device_operational(hdev, NULL)) {
-		rc = -EPERM;
-		goto unlock_and_free_fence;
-	}
 
 	if (copy_from_user(&completion_value, u64_to_user_ptr(user_address),
 									4)) {
@@ -2891,6 +2887,12 @@ int hl_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 	u32 flags = args->in.flags;
 	int rc;
 
+	/* If the device is not operational, no point in waiting for any command submission or
+	 * user interrupt
+	 */
+	if (!hl_device_operational(hpriv->hdev, NULL))
+		return -EPERM;
+
 	if (flags & HL_WAIT_CS_FLAGS_INTERRUPT)
 		rc = hl_interrupt_wait_ioctl(hpriv, data);
 	else if (flags & HL_WAIT_CS_FLAGS_MULTI_CS)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 4/4] habanalabs: never copy_from_user inside spinlock
  2021-08-20 15:07 [PATCH 1/4] habanalabs: add "in device creation" status Oded Gabbay
  2021-08-20 15:07 ` [PATCH 2/4] habanalabs: disable IRQ in user interrupts spinlock Oded Gabbay
  2021-08-20 15:07 ` [PATCH 3/4] habanalabs: remove unnecessary device status check Oded Gabbay
@ 2021-08-20 15:07 ` Oded Gabbay
  2021-08-21 11:04   ` Oded Gabbay
  2 siblings, 1 reply; 5+ messages in thread
From: Oded Gabbay @ 2021-08-20 15:07 UTC (permalink / raw)
  To: linux-kernel

copy_from_user might sleep so we can never call it when we have
a spinlock.

Moreover, it is not necessary in waiting for user interrupt, because
if multiple threads will call this function on the same interrupt,
each one will have it's own fence object inside the driver. The
user address might be the same, but it doesn't really matter to us,
as we only read from it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 .../habanalabs/common/command_submission.c    | 35 +++++++------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index a97bb27ebb90..7b0516cf808b 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -2740,14 +2740,10 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 	else
 		interrupt = &hdev->user_interrupt[interrupt_offset];
 
-	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
-
-	if (copy_from_user(&completion_value, u64_to_user_ptr(user_address),
-									4)) {
-		dev_err(hdev->dev,
-			"Failed to copy completion value from user\n");
+	if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
+		dev_err(hdev->dev, "Failed to copy completion value from user\n");
 		rc = -EFAULT;
-		goto unlock_and_free_fence;
+		goto free_fence;
 	}
 
 	if (completion_value >= target_value)
@@ -2756,42 +2752,35 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 		*status = CS_WAIT_STATUS_BUSY;
 
 	if (!timeout_us || (*status == CS_WAIT_STATUS_COMPLETED))
-		goto unlock_and_free_fence;
+		goto free_fence;
 
 	/* Add pending user interrupt to relevant list for the interrupt
 	 * handler to monitor
 	 */
+	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 	list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
 	spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 
 wait_again:
 	/* Wait for interrupt handler to signal completion */
-	completion_rc =
-		wait_for_completion_interruptible_timeout(
-				&pend->fence.completion, timeout);
+	completion_rc = wait_for_completion_interruptible_timeout(&pend->fence.completion,
+										timeout);
 
 	/* If timeout did not expire we need to perform the comparison.
 	 * If comparison fails, keep waiting until timeout expires
 	 */
 	if (completion_rc > 0) {
-		spin_lock_irqsave(&interrupt->wait_list_lock, flags);
-
-		if (copy_from_user(&completion_value,
-				u64_to_user_ptr(user_address), 4)) {
-
-			spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
-
-			dev_err(hdev->dev,
-				"Failed to copy completion value from user\n");
+		if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
+			dev_err(hdev->dev, "Failed to copy completion value from user\n");
 			rc = -EFAULT;
 
 			goto remove_pending_user_interrupt;
 		}
 
 		if (completion_value >= target_value) {
-			spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
 			*status = CS_WAIT_STATUS_COMPLETED;
 		} else {
+			spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 			reinit_completion(&pend->fence.completion);
 			timeout = completion_rc;
 
@@ -2811,9 +2800,9 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
 remove_pending_user_interrupt:
 	spin_lock_irqsave(&interrupt->wait_list_lock, flags);
 	list_del(&pend->wait_list_node);
-
-unlock_and_free_fence:
 	spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
+
+free_fence:
 	kfree(pend);
 	hl_ctx_put(ctx);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 4/4] habanalabs: never copy_from_user inside spinlock
  2021-08-20 15:07 ` [PATCH 4/4] habanalabs: never copy_from_user inside spinlock Oded Gabbay
@ 2021-08-21 11:04   ` Oded Gabbay
  0 siblings, 0 replies; 5+ messages in thread
From: Oded Gabbay @ 2021-08-21 11:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Linux-Kernel@Vger. Kernel. Org

On Fri, Aug 20, 2021 at 6:09 PM Oded Gabbay <ogabbay@kernel.org> wrote:
>
> copy_from_user might sleep so we can never call it when we have
> a spinlock.
>
> Moreover, it is not necessary in waiting for user interrupt, because
> if multiple threads will call this function on the same interrupt,
> each one will have it's own fence object inside the driver. The
> user address might be the same, but it doesn't really matter to us,
> as we only read from it.
>
> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
> ---
>  .../habanalabs/common/command_submission.c    | 35 +++++++------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
> index a97bb27ebb90..7b0516cf808b 100644
> --- a/drivers/misc/habanalabs/common/command_submission.c
> +++ b/drivers/misc/habanalabs/common/command_submission.c
> @@ -2740,14 +2740,10 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
>         else
>                 interrupt = &hdev->user_interrupt[interrupt_offset];
>
> -       spin_lock_irqsave(&interrupt->wait_list_lock, flags);
> -
> -       if (copy_from_user(&completion_value, u64_to_user_ptr(user_address),
> -                                                                       4)) {
> -               dev_err(hdev->dev,
> -                       "Failed to copy completion value from user\n");
> +       if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
> +               dev_err(hdev->dev, "Failed to copy completion value from user\n");
>                 rc = -EFAULT;
> -               goto unlock_and_free_fence;
> +               goto free_fence;
>         }
>
>         if (completion_value >= target_value)
> @@ -2756,42 +2752,35 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
>                 *status = CS_WAIT_STATUS_BUSY;
>
>         if (!timeout_us || (*status == CS_WAIT_STATUS_COMPLETED))
> -               goto unlock_and_free_fence;
> +               goto free_fence;
>
>         /* Add pending user interrupt to relevant list for the interrupt
>          * handler to monitor
>          */
> +       spin_lock_irqsave(&interrupt->wait_list_lock, flags);
>         list_add_tail(&pend->wait_list_node, &interrupt->wait_list_head);
>         spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
>
>  wait_again:
>         /* Wait for interrupt handler to signal completion */
> -       completion_rc =
> -               wait_for_completion_interruptible_timeout(
> -                               &pend->fence.completion, timeout);
> +       completion_rc = wait_for_completion_interruptible_timeout(&pend->fence.completion,
> +                                                                               timeout);
>
>         /* If timeout did not expire we need to perform the comparison.
>          * If comparison fails, keep waiting until timeout expires
>          */
>         if (completion_rc > 0) {
> -               spin_lock_irqsave(&interrupt->wait_list_lock, flags);
> -
> -               if (copy_from_user(&completion_value,
> -                               u64_to_user_ptr(user_address), 4)) {
> -
> -                       spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
> -
> -                       dev_err(hdev->dev,
> -                               "Failed to copy completion value from user\n");
> +               if (copy_from_user(&completion_value, u64_to_user_ptr(user_address), 4)) {
> +                       dev_err(hdev->dev, "Failed to copy completion value from user\n");
>                         rc = -EFAULT;
>
>                         goto remove_pending_user_interrupt;
>                 }
>
>                 if (completion_value >= target_value) {
> -                       spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
>                         *status = CS_WAIT_STATUS_COMPLETED;
>                 } else {
> +                       spin_lock_irqsave(&interrupt->wait_list_lock, flags);
>                         reinit_completion(&pend->fence.completion);
>                         timeout = completion_rc;
>
> @@ -2811,9 +2800,9 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
>  remove_pending_user_interrupt:
>         spin_lock_irqsave(&interrupt->wait_list_lock, flags);
>         list_del(&pend->wait_list_node);
> -
> -unlock_and_free_fence:
>         spin_unlock_irqrestore(&interrupt->wait_list_lock, flags);
> +
> +free_fence:
>         kfree(pend);
>         hl_ctx_put(ctx);
>
> --
> 2.17.1
>

Hi Greg,
Thanks for pointing this issue out. It slipped my CR (my bad).
I believe this fixes the problem and I've gone over the entire driver
and didn't see any other occurrence of this bug.

Oded

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-21 11:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20 15:07 [PATCH 1/4] habanalabs: add "in device creation" status Oded Gabbay
2021-08-20 15:07 ` [PATCH 2/4] habanalabs: disable IRQ in user interrupts spinlock Oded Gabbay
2021-08-20 15:07 ` [PATCH 3/4] habanalabs: remove unnecessary device status check Oded Gabbay
2021-08-20 15:07 ` [PATCH 4/4] habanalabs: never copy_from_user inside spinlock Oded Gabbay
2021-08-21 11:04   ` Oded Gabbay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).