LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH v2 0/1] Summary: hwmon driver for temperature sensors on SATA drives @ 2019-12-15 17:45 Guenter Roeck 2019-12-15 17:45 ` [PATCH v2] hwmon: Driver " Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2019-12-15 17:45 UTC (permalink / raw) To: linux-hwmon Cc: Jean Delvare, Martin K . Petersen, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Guenter Roeck In the past, several attempts have been made to add support for reporting SCSI/[S]ATA drive temperatures to the Linux kernel. This is desirable to have a means to report drive temperatures to userspace without root privileges and in a standard format, but also to be able to tie reported temperatures with the thermal subsystem. The most recent attempt was [1] by Linus Walleij. It went through a total of seven iterations. At the end, it was rejected for a number of reasons; see the provided link for details. This implementation resides in the SCSI core. It originally resided in libata but was moved to SCSI per maintainer request, where it was ultimately rejected. The feedback on this approach suggests to use the SCSI Temperature log page [0x0d] as means to access drive temperature information. It is unknown if this is implemented in any real SCSI drive. The feedback also suggests to obtain temperature from ATA drives, convert it into the SCSI temperature log page in libata-scsi, and to use that information in a hardware monitoring driver. The format and method to do this is documented in [3]. This is not currently implemented in the Linux kernel. An earlier submission of a driver to report SCSI/SATA drive temperatures was made back in 2009 by Constantin Baranov [2]. This submission resides in the hardware monitoring subsystem. It does not rely on changes in the SCSI subsystem or in libata-scsi. Instead, it registers itself with the SCSI subsystem using scsi_register_interface(). It was rejected primarily because it executes ATA passthrough commands without verification that it is actually connected to an ATA drive. Both submissions use SMART attributes to read drive temperature information. [1] also tries to identify temperature limits from those attributes. Unfortunately, SMART attributes are not well defined, resulting in relative complex code trying to identify the exact format of the reported data. With the available information and feedback, we can make a number of observations and conclusions. a) Using available (S)ATA drive temperature information and convert it to a SCSI log page is an interesting idea. On the downside, it would add a substantial amount of complexity to libata-scsi. The code would either have to be optional, or it would have to be built into the kernel even if it is never used on a given system. Without access to SCSI drives supporting this feature, it would be all but impossible to test the code against such a drive. It would neither be possible to test correctness of the code in libata-scsi nor in the driver using that information. Overall it would be much easier and much less risky to implement such code on the receiving side (ie in a driver reporting the temperatures) instead of trying to convert the information from one format to another first. In summary, it is neither practical nor feasible. On top of that, there is no guarantee that code implementing this functionality would ever be accepted into the kernel for this very reason. b) The code needed to read and analyze SCSI temperature log pages is quite complex (see smartmontools [5]). There is no existing support code in the Linux kernel; such code would have to be written. This makes the approach discussed in a) even more risky and less practical. c) Overall, any attempt to report temperature information for anything but SATA drives in the kernel is not practical due to the complexity involved, and due to the inability to test the resulting code with non-SATA drives. d) Using SMART data for anything but basic temperature reporting is not really feasible due to the lack of standardization. Any attempt to do this would add a substantial amount of code, ambiguity, and risk. This submission implements a driver to report the temperature of SATA drives through the hardware monitoring subsystem. It is implemented as stand-alone driver in the hardware monitoring subsystem. The driver uses the mechanism from submission [1] to register with the SCSI subsystem. By using this mechanism, changes in the SCSI or ATA subsystems are not required. To reduce risk and complexity, it only instantiates after reliably validating that it is connected to a SATA drive. It does not attempt to report the temperature of non-SATA drives. The driver uses the SCT Command Transport feature set as specified in ATA8-ACS [4] to read and report the temperature as well as temperature limits and lowest/highest temperature information (if available) for SATA drives. If a drive does not support SCT Command Transport, the driver attempts to access a limited set of well known SMART attributes to read the drive temperature. In that case, only the current drive temperature is reported. --- v2: scsi_cmd variable is no longer static Fixed drive name in Kconfig Describe heuristics used to select SCT or SMART in commit message Added Reviewed-by: from Linus Walleij Note: I thought about waiting for more feedback, but maybe improvements can be made with follow-up patches. --- References: [1] https://patchwork.kernel.org/patch/10688021/ [2] https://lore.kernel.org/lkml/20090913040104.ab1d0b69.const@mimas.ru/ [3] http://www.t10.org/cgi-bin/ac.pl?t=f&f=sat5r02.pdf Information technology - SCSI / ATA Translation - 5 (SAT-5), section 10.3.8 (Temperature log page). [4] http://www.t13.org/documents/uploadeddocuments/docs2008/d1699r6a-ata8-acs.pdf ANS T13/1699-D "Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)" [5] https://github.com/mirror/smartmontools.git ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-15 17:45 [PATCH v2 0/1] Summary: hwmon driver for temperature sensors on SATA drives Guenter Roeck @ 2019-12-15 17:45 ` Guenter Roeck 2019-12-19 0:15 ` Martin K. Petersen 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2019-12-15 17:45 UTC (permalink / raw) To: linux-hwmon Cc: Jean Delvare, Martin K . Petersen, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Guenter Roeck, Chris Healy Reading the hard drive temperature has been supported for years by userspace tools such as smarttools or hddtemp. The downside of such tools is that they need to run with super-user privilege, that the temperatures are not reported by standard tools such as 'sensors' or 'libsensors', and that drive temperatures are not available for use in the kernel's thermal subsystem. This driver solves this problem by adding support for reading the temperature of SATA drives from the kernel using the hwmon API and by adding a temperature zone for each drive. With this driver, the hard disk temperature can be read using the unprivileged 'sensors' application: $ sensors satatemp-scsi-1-0 satatemp-scsi-1-0 Adapter: SCSI adapter temp1: +23.0°C or directly from sysfs: $ grep . /sys/class/hwmon/hwmon9/{name,temp1_input} /sys/class/hwmon/hwmon9/name:satatemp /sys/class/hwmon/hwmon9/temp1_input:23000 If the drive supports SCT transport and reports temperature limits, those are reported as well. satatemp-scsi-0-0 Adapter: SCSI adapter temp1: +27.0°C (low = +0.0°C, high = +60.0°C) (crit low = -41.0°C, crit = +85.0°C) (lowest = +23.0°C, highest = +34.0°C) The driver attempts to use SCT Command Transport to read the drive temperature. If the SCT Command Transport feature set is not available, or if it does not report the drive temperature, drive temperatures may be readable through SMART attributes. Since SMART attributes are not well defined, this method is only used as fallback mechanism. Cc: Chris Healy <cphealy@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- v2: scsi_cmd variable is no longer static Fixed drive name in Kconfig Describe heuristics used to select SCT or SMART in commit message Added Reviewed-by: from Linus Walleij Documentation/hwmon/index.rst | 1 + Documentation/hwmon/satatemp.rst | 48 +++ drivers/hwmon/Kconfig | 10 + drivers/hwmon/Makefile | 1 + drivers/hwmon/satatemp.c | 575 +++++++++++++++++++++++++++++++ 5 files changed, 635 insertions(+) create mode 100644 Documentation/hwmon/satatemp.rst create mode 100644 drivers/hwmon/satatemp.c diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst index 230ad59b462b..ecf1832dd013 100644 --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -133,6 +133,7 @@ Hardware Monitoring Kernel Drivers pxe1610 pwm-fan raspberrypi-hwmon + satatemp sch5627 sch5636 scpi-hwmon diff --git a/Documentation/hwmon/satatemp.rst b/Documentation/hwmon/satatemp.rst new file mode 100644 index 000000000000..59b105f3c79a --- /dev/null +++ b/Documentation/hwmon/satatemp.rst @@ -0,0 +1,48 @@ +Kernel driver satatemp +====================== + + +References +---------- + +ANS T13/1699-D +Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS) + +ANS Project T10/BSR INCITS 513 +Information technology - SCSI Primary Commands - 4 (SPC-4) + +ANS Project INCITS 557 +Information technology - SCSI / ATA Translation - 5 (SAT-5) + + +Description +----------- + +This driver supports reporting the temperature of SATA drives. +If supported, it uses the SCT Command Transport feature to read +the current drive temperature and, if available, temperature limits +as well as historic minimum and maximum temperatures. If SCT Command +Transport is not supported, the driver uses SMART attributes to read +the drive temperature. + + +Sysfs entries +------------- + +Only the temp1_input attribute is always available. Other attributes are +available only if reported by the drive. All temperatures are reported in +milli-degrees Celsius. + +======================= ===================================================== +temp1_input Current drive temperature +temp1_lcrit Minimum temperature limit. Operating the device below + this temperature may cause physical damage to the + device. +temp1_min Minimum recommended continuous operating limit +temp1_max Maximum recommended continuous operating temperature +temp1_crit Maximum temperature limit. Operating the device above + this temperature may cause physical damage to the + device. +temp1_lowest Minimum temperature seen this power cycle +temp1_highest Maximum temperature seen this power cycle +======================= ===================================================== diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index 13a6b4afb4b3..97acf4ebd5ca 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1346,6 +1346,16 @@ config SENSORS_RASPBERRYPI_HWMON This driver can also be built as a module. If so, the module will be called raspberrypi-hwmon. +config SENSORS_SATATEMP + tristate "SATA hard disk drives with temperature sensors" + depends on SCSI && ATA + help + If you say yes you get support for the temperature sensor on + SATA hard disk drives. + + This driver can also be built as a module. If so, the module + will be called satatemp. + config SENSORS_SHT15 tristate "Sensiron humidity and temperature sensors. SHT15 and compat." depends on GPIOLIB || COMPILE_TEST diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index 40c036ea45e6..fe55b8f76af9 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -148,6 +148,7 @@ obj-$(CONFIG_SENSORS_S3C) += s3c-hwmon.o obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o obj-$(CONFIG_SENSORS_SCH5627) += sch5627.o obj-$(CONFIG_SENSORS_SCH5636) += sch5636.o +obj-$(CONFIG_SENSORS_SATATEMP) += satatemp.o obj-$(CONFIG_SENSORS_SHT15) += sht15.o obj-$(CONFIG_SENSORS_SHT21) += sht21.o obj-$(CONFIG_SENSORS_SHT3x) += sht3x.o diff --git a/drivers/hwmon/satatemp.c b/drivers/hwmon/satatemp.c new file mode 100644 index 000000000000..9608716d422c --- /dev/null +++ b/drivers/hwmon/satatemp.c @@ -0,0 +1,575 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Hwmon client for SATA hard disk drives with temperature sensors + * Copyright (C) 2019 Zodiac Inflight Innovations + * + * With input from: + * Hwmon client for S.M.A.R.T. hard disk drives with temperature sensors. + * (C) 2018 Linus Walleij + * + * hwmon: Driver for SCSI/ATA temperature sensors + * by Constantin Baranov <const@mimas.ru>, submitted September 2009 + * + * The primary means to read hard drive temperatures and temperature limits + * is the SCT Command Transport feature set as specified in ATA8-ACS. + * It can be used to read the current drive temperature, temperature limits, + * and historic minimum and maximum temperatures. The SCT Command Transport + * feature set is documented in "AT Attachment 8 - ATA/ATAPI Command Set + * (ATA8-ACS)". + * + * If the SCT Command Transport feature set is not available, drive temperatures + * may be readable through SMART attributes. Since SMART attributes are not well + * defined, this method is only used as fallback mechanism. + * + * There are three SMART attributes which may report drive temperatures. + * Those are defined as follows (from + * http://www.cropel.com/library/smart-attribute-list.aspx). + * + * 190 Temperature Temperature, monitored by a sensor somewhere inside + * the drive. Raw value typicaly holds the actual + * temperature (hexadecimal) in its rightmost two digits. + * + * 194 Temperature Temperature, monitored by a sensor somewhere inside + * the drive. Raw value typicaly holds the actual + * temperature (hexadecimal) in its rightmost two digits. + * + * 231 Temperature Temperature, monitored by a sensor somewhere inside + * the drive. Raw value typicaly holds the actual + * temperature (hexadecimal) in its rightmost two digits. + * + * Wikipedia defines attributes a bit differently. + * + * 190 Temperature Value is equal to (100-temp. °C), allowing manufacturer + * Difference or to set a minimum threshold which corresponds to a + * Airflow maximum temperature. This also follows the convention of + * Temperature 100 being a best-case value and lower values being + * undesirable. However, some older drives may instead + * report raw Temperature (identical to 0xC2) or + * Temperature minus 50 here. + * 194 Temperature or Indicates the device temperature, if the appropriate + * Temperature sensor is fitted. Lowest byte of the raw value contains + * Celsius the exact temperature value (Celsius degrees). + * 231 Life Left Indicates the approximate SSD life left, in terms of + * (SSDs) or program/erase cycles or available reserved blocks. + * Temperature A normalized value of 100 represents a new drive, with + * a threshold value at 10 indicating a need for + * replacement. A value of 0 may mean that the drive is + * operating in read-only mode to allow data recovery. + * Previously (pre-2010) occasionally used for Drive + * Temperature (more typically reported at 0xC2). + * + * Common denominator is that the first raw byte reports the temperature + * in degrees C on almost all drives. Some drives may report a fractional + * temperature in the second raw byte. + * + * Known exceptions (from libatasmart): + * - SAMSUNG SV0412H and SAMSUNG SV1204H) report the temperature in 10th + * degrees C in the first two raw bytes. + * - A few Maxtor drives report an unknown or bad value in attribute 194. + * - Certain Apple SSD drives report an unknown value in attribute 190. + * Only certain firmware versions are affected. + * + * Those exceptions affect older ATA drives and are currently ignored. + * Also, the second raw byte (possibly reporting the fractional temperature) + * is currently ignored. + * + * Many drives also report temperature limits in additional SMART data raw + * bytes. The format of those is not well defined and varies widely. + * The driver does not currently attempt to report those limits. + * + * According to data in smartmontools, attribute 231 is rarely used to report + * drive temperatures. At the same time, several drives report SSD life left + * in attribute 231, but do not support temperature sensors. For this reason, + * attribute 231 is currently ignored. + * + * Following above definitions, temperatures are reported as follows. + * If SCT Command Transport is supported, it is used to read the + * temperature and, if available, temperature limits. + * - Otherwise, if SMART attribute 194 is supported, it is used to read + * the temperature. + * - Otherwise, if SMART attribute 190 is supported, it is used to read + * the temperature. + */ + +#include <linux/ata.h> +#include <linux/bits.h> +#include <linux/device.h> +#include <linux/hwmon.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <scsi/scsi_cmnd.h> +#include <scsi/scsi_device.h> +#include <scsi/scsi_driver.h> +#include <scsi/scsi_proto.h> + +struct satatemp_data { + struct list_head list; /* list of instantiated devices */ + struct mutex lock; /* protect data buffer accesses */ + struct scsi_device *sdev; /* SCSI device */ + struct device *dev; /* instantiating device */ + struct device *hwdev; /* hardware monitoring device */ + u8 smartdata[ATA_SECT_SIZE]; /* local buffer */ + int (*get_temp)(struct satatemp_data *st, u32 attr, long *val); + bool have_temp_lowest; /* lowest temp in SCT status */ + bool have_temp_highest; /* highest temp in SCT status */ + bool have_temp_min; /* have min temp */ + bool have_temp_max; /* have max temp */ + bool have_temp_lcrit; /* have lower critical limit */ + bool have_temp_crit; /* have critical limit */ + int temp_min; /* min temp */ + int temp_max; /* max temp */ + int temp_lcrit; /* lower critical limit */ + int temp_crit; /* critical limit */ +}; + +static LIST_HEAD(satatemp_devlist); + +#define ATA_MAX_SMART_ATTRS 30 +#define SMART_TEMP_PROP_190 190 +#define SMART_TEMP_PROP_194 194 + +#define SCT_STATUS_REQ_ADDR 0xe0 +#define SCT_STATUS_VERSION_LOW 0 /* log byte offsets */ +#define SCT_STATUS_VERSION_HIGH 1 +#define SCT_STATUS_TEMP 200 +#define SCT_STATUS_TEMP_LOWEST 201 +#define SCT_STATUS_TEMP_HIGHEST 202 +#define SCT_READ_LOG_ADDR 0xe1 +#define SMART_READ_LOG 0xd5 +#define SMART_WRITE_LOG 0xd6 + +#define INVALID_TEMP 0x80 + +#define temp_is_valid(temp) ((temp) != INVALID_TEMP) +#define temp_from_sct(temp) (((s8)(temp)) * 1000) + +static inline bool ata_id_smart_supported(u16 *id) +{ + return id[ATA_ID_COMMAND_SET_1] & BIT(0); +} + +static inline bool ata_id_smart_enabled(u16 *id) +{ + return id[ATA_ID_CFS_ENABLE_1] & BIT(0); +} + +static int satatemp_scsi_command(struct satatemp_data *st, + u8 ata_command, u8 feature, + u8 lba_low, u8 lba_mid, u8 lba_high) +{ + u8 scsi_cmd[MAX_COMMAND_SIZE]; + int data_dir; + + memset(scsi_cmd, 0, sizeof(scsi_cmd)); + scsi_cmd[0] = ATA_16; + if (ata_command == ATA_CMD_SMART && feature == SMART_WRITE_LOG) { + scsi_cmd[1] = (5 << 1); /* PIO Data-out */ + /* + * No off.line or cc, write to dev, block count in sector count + * field. + */ + scsi_cmd[2] = 0x06; + data_dir = DMA_TO_DEVICE; + } else { + scsi_cmd[1] = (4 << 1); /* PIO Data-in */ + /* + * No off.line or cc, read from dev, block count in sector count + * field. + */ + scsi_cmd[2] = 0x0e; + data_dir = DMA_FROM_DEVICE; + } + scsi_cmd[4] = feature; + scsi_cmd[6] = 1; /* 1 sector */ + scsi_cmd[8] = lba_low; + scsi_cmd[10] = lba_mid; + scsi_cmd[12] = lba_high; + scsi_cmd[14] = ata_command; + + return scsi_execute_req(st->sdev, scsi_cmd, data_dir, + st->smartdata, ATA_SECT_SIZE, NULL, HZ, 5, + NULL); +} + +static int satatemp_ata_command(struct satatemp_data *st, u8 feature, u8 select) +{ + return satatemp_scsi_command(st, ATA_CMD_SMART, feature, select, + ATA_SMART_LBAM_PASS, ATA_SMART_LBAH_PASS); +} + +static int satatemp_get_smarttemp(struct satatemp_data *st, u32 attr, + long *temp) +{ + u8 *buf = st->smartdata; + bool have_temp = false; + u8 temp_raw; + u8 csum; + int err; + int i; + + err = satatemp_ata_command(st, ATA_SMART_READ_VALUES, 0); + if (err) + return err; + + /* Checksum the read value table */ + csum = 0; + for (i = 0; i < ATA_SECT_SIZE; i++) + csum += buf[i]; + if (csum) { + dev_dbg(&st->sdev->sdev_gendev, + "checksum error reading SMART values\n"); + return -EIO; + } + + for (i = 0; i < ATA_MAX_SMART_ATTRS; i++) { + u8 *attr = buf + i * 12; + int id = attr[2]; + + if (!id) + continue; + + if (id == SMART_TEMP_PROP_190) { + temp_raw = attr[7]; + have_temp = true; + } + if (id == SMART_TEMP_PROP_194) { + temp_raw = attr[7]; + have_temp = true; + break; + } + } + + if (have_temp) { + *temp = temp_raw * 1000; + return 0; + } + + return -ENXIO; +} + +static int satatemp_get_scttemp(struct satatemp_data *st, u32 attr, long *val) +{ + u8 *buf = st->smartdata; + int err; + + err = satatemp_ata_command(st, SMART_READ_LOG, SCT_STATUS_REQ_ADDR); + if (err) + return err; + switch (attr) { + case hwmon_temp_input: + *val = temp_from_sct(buf[SCT_STATUS_TEMP]); + break; + case hwmon_temp_lowest: + *val = temp_from_sct(buf[SCT_STATUS_TEMP_LOWEST]); + break; + case hwmon_temp_highest: + *val = temp_from_sct(buf[SCT_STATUS_TEMP_HIGHEST]); + break; + default: + err = -EINVAL; + break; + } + return err; +} + +static int satatemp_identify(struct satatemp_data *st) +{ + struct scsi_device *sdev = st->sdev; + u8 *buf = st->smartdata; + bool is_ata, is_sata; + bool have_sct_data_table; + bool have_sct_temp; + bool have_smart; + bool have_sct; + u16 *ata_id; + u16 version; + long temp; + u8 *vpd; + int err; + + /* bail out immediately if there is no inquiry data */ + if (!sdev->inquiry || sdev->inquiry_len < 16) + return -ENODEV; + + /* + * Inquiry data sanity checks (per SAT-5): + * - peripheral qualifier must be 0 + * - peripheral device type must be 0x0 (Direct access block device) + * - SCSI Vendor ID is "ATA " + */ + if (sdev->inquiry[0] || + strncmp(&sdev->inquiry[8], "ATA ", 8)) + return -ENODEV; + + vpd = kzalloc(1024, GFP_KERNEL); + if (!vpd) + return -ENOMEM; + + err = scsi_get_vpd_page(sdev, 0x89, vpd, 1024); + if (err) { + kfree(vpd); + return err; + } + + /* + * More sanity checks. + * For VPD offsets and values see ANS Project INCITS 557, + * "Information technology - SCSI / ATA Translation - 5 (SAT-5)". + */ + if (vpd[1] != 0x89 || vpd[2] != 0x02 || vpd[3] != 0x38 || + vpd[36] != 0x34 || vpd[56] != ATA_CMD_ID_ATA) { + kfree(vpd); + return -ENODEV; + } + ata_id = (u16 *)&vpd[60]; + is_ata = ata_id_is_ata(ata_id); + is_sata = ata_id_is_sata(ata_id); + have_sct = ata_id_sct_supported(ata_id); + have_sct_data_table = ata_id_sct_data_tables(ata_id); + have_smart = ata_id_smart_supported(ata_id) && + ata_id_smart_enabled(ata_id); + + kfree(vpd); + + /* bail out if this is not a SATA device */ + if (!is_ata || !is_sata) + return -ENODEV; + if (!have_sct) + goto skip_sct; + + err = satatemp_ata_command(st, SMART_READ_LOG, SCT_STATUS_REQ_ADDR); + if (err) + goto skip_sct; + + version = (buf[SCT_STATUS_VERSION_HIGH] << 8) | + buf[SCT_STATUS_VERSION_LOW]; + if (version != 2 && version != 3) + goto skip_sct; + + have_sct_temp = temp_is_valid(buf[SCT_STATUS_TEMP]); + if (!have_sct_temp) + goto skip_sct; + + st->have_temp_lowest = temp_is_valid(buf[SCT_STATUS_TEMP_LOWEST]); + st->have_temp_highest = temp_is_valid(buf[SCT_STATUS_TEMP_HIGHEST]); + + if (!have_sct_data_table) + goto skip_sct; + + /* Request and read temperature history table */ + memset(buf, '\0', sizeof(st->smartdata)); + buf[0] = 5; /* data table command */ + buf[2] = 1; /* read table */ + buf[4] = 2; /* temperature history table */ + + err = satatemp_ata_command(st, SMART_WRITE_LOG, SCT_STATUS_REQ_ADDR); + if (err) + goto skip_sct_data; + + err = satatemp_ata_command(st, SMART_READ_LOG, SCT_READ_LOG_ADDR); + if (err) + goto skip_sct_data; + + /* + * Temperature limits per AT Attachment 8 - + * ATA/ATAPI Command Set (ATA8-ACS) + */ + st->have_temp_max = temp_is_valid(buf[6]); + st->have_temp_crit = temp_is_valid(buf[7]); + st->have_temp_min = temp_is_valid(buf[8]); + st->have_temp_lcrit = temp_is_valid(buf[9]); + + st->temp_max = temp_from_sct(buf[6]); + st->temp_crit = temp_from_sct(buf[7]); + st->temp_min = temp_from_sct(buf[8]); + st->temp_lcrit = temp_from_sct(buf[9]); + +skip_sct_data: + if (have_sct_temp) { + st->get_temp = satatemp_get_scttemp; + return 0; + } +skip_sct: + if (!have_smart) + return -ENODEV; + st->get_temp = satatemp_get_smarttemp; + return satatemp_get_smarttemp(st, hwmon_temp_input, &temp); +} + +static int satatemp_read(struct device *dev, enum hwmon_sensor_types type, + u32 attr, int channel, long *val) +{ + struct satatemp_data *st = dev_get_drvdata(dev); + int err = 0; + + if (type != hwmon_temp) + return -EINVAL; + + switch (attr) { + case hwmon_temp_input: + case hwmon_temp_lowest: + case hwmon_temp_highest: + mutex_lock(&st->lock); + err = st->get_temp(st, attr, val); + mutex_unlock(&st->lock); + break; + case hwmon_temp_lcrit: + *val = st->temp_lcrit; + break; + case hwmon_temp_min: + *val = st->temp_min; + break; + case hwmon_temp_max: + *val = st->temp_max; + break; + case hwmon_temp_crit: + *val = st->temp_crit; + break; + default: + err = -EINVAL; + break; + } + return err; +} + +static umode_t satatemp_is_visible(const void *data, + enum hwmon_sensor_types type, + u32 attr, int channel) +{ + const struct satatemp_data *st = data; + + switch (type) { + case hwmon_temp: + switch (attr) { + case hwmon_temp_input: + return 0444; + case hwmon_temp_lowest: + if (st->have_temp_lowest) + return 0444; + break; + case hwmon_temp_highest: + if (st->have_temp_highest) + return 0444; + break; + case hwmon_temp_min: + if (st->have_temp_min) + return 0444; + break; + case hwmon_temp_max: + if (st->have_temp_max) + return 0444; + break; + case hwmon_temp_lcrit: + if (st->have_temp_lcrit) + return 0444; + break; + case hwmon_temp_crit: + if (st->have_temp_crit) + return 0444; + break; + default: + break; + } + break; + default: + break; + } + return 0; +} + +static const struct hwmon_channel_info *satatemp_info[] = { + HWMON_CHANNEL_INFO(chip, + HWMON_C_REGISTER_TZ), + HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | + HWMON_T_LOWEST | HWMON_T_HIGHEST | + HWMON_T_MIN | HWMON_T_MAX | + HWMON_T_LCRIT | HWMON_T_CRIT), + NULL +}; + +static const struct hwmon_ops satatemp_ops = { + .is_visible = satatemp_is_visible, + .read = satatemp_read, +}; + +static const struct hwmon_chip_info satatemp_chip_info = { + .ops = &satatemp_ops, + .info = satatemp_info, +}; + +/* + * The device argument points to sdev->sdev_dev. Its parent is + * sdev->sdev_gendev, which we can use to get the scsi_device pointer. + */ +static int satatemp_add(struct device *dev, struct class_interface *intf) +{ + struct scsi_device *sdev = to_scsi_device(dev->parent); + struct satatemp_data *st; + int err; + + st = kzalloc(sizeof(*st), GFP_KERNEL); + if (!st) + return -ENOMEM; + + st->sdev = sdev; + st->dev = dev; + mutex_init(&st->lock); + + if (satatemp_identify(st)) { + err = -ENODEV; + goto abort; + } + + st->hwdev = hwmon_device_register_with_info(dev->parent, "satatemp", + st, &satatemp_chip_info, + NULL); + if (IS_ERR(st->hwdev)) { + err = PTR_ERR(st->hwdev); + goto abort; + } + + list_add(&st->list, &satatemp_devlist); + return 0; + +abort: + kfree(st); + return err; +} + +static void satatemp_remove(struct device *dev, struct class_interface *intf) +{ + struct satatemp_data *st, *tmp; + + list_for_each_entry_safe(st, tmp, &satatemp_devlist, list) { + if (st->dev == dev) { + list_del(&st->list); + hwmon_device_unregister(st->hwdev); + kfree(st); + break; + } + } +} + +static struct class_interface satatemp_interface = { + .add_dev = satatemp_add, + .remove_dev = satatemp_remove, +}; + +static int __init satatemp_init(void) +{ + return scsi_register_interface(&satatemp_interface); +} + +static void __exit satatemp_exit(void) +{ + scsi_unregister_interface(&satatemp_interface); +} + +module_init(satatemp_init); +module_exit(satatemp_exit); + +MODULE_AUTHOR("Guenter Roeck <linus@roeck-us.net>"); +MODULE_DESCRIPTION("ATA temperature monitor"); +MODULE_LICENSE("GPL"); -- 2.17.1 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-15 17:45 ` [PATCH v2] hwmon: Driver " Guenter Roeck @ 2019-12-19 0:15 ` Martin K. Petersen 2019-12-19 0:32 ` Guenter Roeck ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Martin K. Petersen @ 2019-12-19 0:15 UTC (permalink / raw) To: Guenter Roeck Cc: linux-hwmon, Jean Delvare, Martin K . Petersen, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, > This driver solves this problem by adding support for reading the > temperature of SATA drives from the kernel using the hwmon API and > by adding a temperature zone for each drive. My working tree is available here: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git/log/?h=5.6/drivetemp A few notes: - Before applying your patch I did s/satatemp/drivetemp/ - I get a crash in the driver core during probe if the drivetemp module is loaded prior to loading ahci or a SCSI HBA driver. This crash is unrelated to my changes. Haven't had time to debug. - I tweaked your ATA detection heuristics and now use the cached VPD page 0x89 instead of fetching one from the device. - I also added support for reading the temperature log page on SCSI drives. - Tested with a mixed bag of about 40 SCSI and SATA drives attached. - I still think sensor naming needs work. How and where are the "drivetemp-scsi-8-140" names generated? I'll tinker some more but thought I'd share what I have for now. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-19 0:15 ` Martin K. Petersen @ 2019-12-19 0:32 ` Guenter Roeck 2020-01-07 4:10 ` Martin K. Petersen 2019-12-19 7:37 ` Guenter Roeck 2020-01-01 17:46 ` Guenter Roeck 2 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2019-12-19 0:32 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On Wed, Dec 18, 2019 at 07:15:07PM -0500, Martin K. Petersen wrote: > > Guenter, > > > This driver solves this problem by adding support for reading the > > temperature of SATA drives from the kernel using the hwmon API and > > by adding a temperature zone for each drive. > > My working tree is available here: > > https://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git/log/?h=5.6/drivetemp > > A few notes: > > - Before applying your patch I did s/satatemp/drivetemp/ > > - I get a crash in the driver core during probe if the drivetemp module > is loaded prior to loading ahci or a SCSI HBA driver. This crash is > unrelated to my changes. Haven't had time to debug. > > - I tweaked your ATA detection heuristics and now use the cached VPD > page 0x89 instead of fetching one from the device. > > - I also added support for reading the temperature log page on SCSI > drives. > > - Tested with a mixed bag of about 40 SCSI and SATA drives attached. > > - I still think sensor naming needs work. How and where are the > "drivetemp-scsi-8-140" names generated? > Quick one: In libsensors, outside the kernel. The naming is generic, along the line of <driver name>-<bus name>-<bus index>-<slot>. > I'll tinker some more but thought I'd share what I have for now. > Thanks for sharing. I'll be out on vacation until January 1. I'll look at the code after I am back. Guenter > -- > Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-19 0:32 ` Guenter Roeck @ 2020-01-07 4:10 ` Martin K. Petersen 2020-01-07 13:00 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Martin K. Petersen @ 2020-01-07 4:10 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Guenter! >> - I still think sensor naming needs work. How and where are the >> "drivetemp-scsi-8-140" names generated? >> > Quick one: In libsensors, outside the kernel. The naming is generic, > along the line of <driver name>-<bus name>-<bus index>-<slot>. I understand that there are sensors that may not have an obvious associated topology and therefore necessitate coming up with a suitable naming or enumeration scheme. But in this case we already have a well-defined SCSI device name. Any particular reason you don't shift the chip.addr back and print the H:C:T:L format that you used as input? However arcane H:C:T:L may seem, I think that predictable naming would make things a lot easier for users that need to identify which device matches which sensor... -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-07 4:10 ` Martin K. Petersen @ 2020-01-07 13:00 ` Guenter Roeck 2020-01-08 1:29 ` Martin K. Petersen 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-07 13:00 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On 1/6/20 8:10 PM, Martin K. Petersen wrote: > > Hi Guenter! > >>> - I still think sensor naming needs work. How and where are the >>> "drivetemp-scsi-8-140" names generated? >>> >> Quick one: In libsensors, outside the kernel. The naming is generic, >> along the line of <driver name>-<bus name>-<bus index>-<slot>. > > I understand that there are sensors that may not have an obvious > associated topology and therefore necessitate coming up with a suitable > naming or enumeration scheme. But in this case we already have a > well-defined SCSI device name. Any particular reason you don't shift the > chip.addr back and print the H:C:T:L format that you used as input? > > However arcane H:C:T:L may seem, I think that predictable naming would > make things a lot easier for users that need to identify which device > matches which sensor... > Not sure I understand. Do you mean to add "H:C:T:L" to "drivetemp" ? That would make it something like "drivetemp:H:C:T:L-scsi-8-140". Not sure if that is really useful, and it would at least be partially redundant. "scsi-8-140" is created by libsensors, so any change in that would have to be made there, not in the kernel driver. Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-07 13:00 ` Guenter Roeck @ 2020-01-08 1:29 ` Martin K. Petersen 2020-01-08 15:32 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Martin K. Petersen @ 2020-01-08 1:29 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, > "scsi-8-140" is created by libsensors, so any change in that would > have to be made there, not in the kernel driver. Yes. Something like the patch below which will produce actual SCSI device instance names: drivetemp-scsi-7:0:29:0 drivetemp-scsi-8:0:30:0 drivetemp-scsi-8:0:15:0 drivetemp-scsi-7:0:24:0 Instead of the current: drivetemp-scsi-7-1d0 drivetemp-scsi-8-1e0 drivetemp-scsi-8-f0 drivetemp-scsi-7-180 Other question: Does hwmon have any notion of sensor topology? As I mentioned earlier, SCSI installations typically rely on SAF-TE or SES instead of the physical drive sensors. SES also includes monitoring of fans, power supplies, etc. And more importantly, it provides a representation of the location of a given component. E.g.: Tray number #4, disk drive bay #5. So it would be helpful if libsensors had a way to represent sensors in a way that mimics the physical device layout reported by SES. -- Martin K. Petersen Oracle Linux Engineering diff --git a/lib/data.c b/lib/data.c index c5aea42967a6..06cfa86f353b 100644 --- a/lib/data.c +++ b/lib/data.c @@ -202,8 +202,9 @@ int sensors_snprintf_chip_name(char *str, size_t size, return snprintf(str, size, "%s-mdio-%x", chip->prefix, chip->addr); case SENSORS_BUS_TYPE_SCSI: - return snprintf(str, size, "%s-scsi-%hd-%x", chip->prefix, - chip->bus.nr, chip->addr); + return snprintf(str, size, "%s-scsi-%u:%u:%u:%lu", chip->prefix, + chip->bus.nr, chip->addr, chip->sub_addr, + chip->sub_sub_addr); } return -SENSORS_ERR_CHIP_NAME; diff --git a/lib/sensors.h b/lib/sensors.h index 94f6f23051d2..f468cccabc72 100644 --- a/lib/sensors.h +++ b/lib/sensors.h @@ -65,6 +65,8 @@ typedef struct sensors_chip_name { char *prefix; sensors_bus_id bus; int addr; + unsigned int sub_addr; + unsigned long sub_sub_addr; char *path; } sensors_chip_name; diff --git a/lib/sysfs.c b/lib/sysfs.c index e63688b72aba..f76b4a99aa7d 100644 --- a/lib/sysfs.c +++ b/lib/sysfs.c @@ -620,6 +620,7 @@ static int classify_device(const char *dev_name, sensors_chip_features *entry) { int domain, bus, slot, fn, vendor, product, id; + unsigned long lun; char bus_path[NAME_MAX]; char *bus_attr; int ret = 1; @@ -687,11 +688,13 @@ static int classify_device(const char *dev_name, entry->chip.bus.nr = 0; } else if (subsys && !strcmp(subsys, "scsi") && - sscanf(dev_name, "%d:%d:%d:%x", &domain, &bus, &slot, &fn) == 4) { + sscanf(dev_name, "%u:%u:%u:%lu", &domain, &bus, &id, &lun) == 4) { /* adapter(host), channel(bus), id(target), lun */ - entry->chip.addr = (bus << 8) + (slot << 4) + fn; - entry->chip.bus.type = SENSORS_BUS_TYPE_SCSI; entry->chip.bus.nr = domain; + entry->chip.addr = bus; + entry->chip.sub_addr = id; + entry->chip.sub_sub_addr = lun; + entry->chip.bus.type = SENSORS_BUS_TYPE_SCSI; } else { /* Unknown device */ ret = 0; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-08 1:29 ` Martin K. Petersen @ 2020-01-08 15:32 ` Guenter Roeck 0 siblings, 0 replies; 31+ messages in thread From: Guenter Roeck @ 2020-01-08 15:32 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Martin, On Tue, Jan 07, 2020 at 08:29:40PM -0500, Martin K. Petersen wrote: > > Guenter, > > > "scsi-8-140" is created by libsensors, so any change in that would > > have to be made there, not in the kernel driver. > > Yes. Something like the patch below which will produce actual SCSI > device instance names: > > drivetemp-scsi-7:0:29:0 > drivetemp-scsi-8:0:30:0 > drivetemp-scsi-8:0:15:0 > drivetemp-scsi-7:0:24:0 > > Instead of the current: > > drivetemp-scsi-7-1d0 > drivetemp-scsi-8-1e0 > drivetemp-scsi-8-f0 > drivetemp-scsi-7-180 > > Other question: Does hwmon have any notion of sensor topology? As I > mentioned earlier, SCSI installations typically rely on SAF-TE or SES > instead of the physical drive sensors. SES also includes monitoring of > fans, power supplies, etc. And more importantly, it provides a > representation of the location of a given component. E.g.: Tray number > #4, disk drive bay #5. > Please go ahead and submit the patch for libsensors. > So it would be helpful if libsensors had a way to represent sensors in a > way that mimics the physical device layout reported by SES. > I guess it would be up to you to come up with a proposal. Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-19 0:15 ` Martin K. Petersen 2019-12-19 0:32 ` Guenter Roeck @ 2019-12-19 7:37 ` Guenter Roeck 2020-01-01 17:46 ` Guenter Roeck 2 siblings, 0 replies; 31+ messages in thread From: Guenter Roeck @ 2019-12-19 7:37 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Martin, On 12/18/19 4:15 PM, Martin K. Petersen wrote: > > Guenter, > >> This driver solves this problem by adding support for reading the >> temperature of SATA drives from the kernel using the hwmon API and >> by adding a temperature zone for each drive. > > My working tree is available here: > > https://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git/log/?h=5.6/drivetemp > I had a quick look at the patch. Looks good overall. I think you should be able to use the buffer allocated in struct drivetemp_data (named smartdata). Maybe rename it to something more generic. If it needs to be dma-aligned, maybe we can use kzalloc() to allocate it. Only reason I didn't use it for vpd was because that needed 1024 bytes. > A few notes: > > - Before applying your patch I did s/satatemp/drivetemp/ > > - I get a crash in the driver core during probe if the drivetemp module > is loaded prior to loading ahci or a SCSI HBA driver. This crash is > unrelated to my changes. Haven't had time to debug. > Definitely something we'll need to look into. Do you have a traceback ? Thanks, Guenter > - I tweaked your ATA detection heuristics and now use the cached VPD > page 0x89 instead of fetching one from the device. > > - I also added support for reading the temperature log page on SCSI > drives. > > - Tested with a mixed bag of about 40 SCSI and SATA drives attached. > > - I still think sensor naming needs work. How and where are the > "drivetemp-scsi-8-140" names generated? > > I'll tinker some more but thought I'd share what I have for now. > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2019-12-19 0:15 ` Martin K. Petersen 2019-12-19 0:32 ` Guenter Roeck 2019-12-19 7:37 ` Guenter Roeck @ 2020-01-01 17:46 ` Guenter Roeck 2020-01-03 3:06 ` Martin K. Petersen 2020-01-08 1:12 ` Martin K. Petersen 2 siblings, 2 replies; 31+ messages in thread From: Guenter Roeck @ 2020-01-01 17:46 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Martin, On 12/18/19 4:15 PM, Martin K. Petersen wrote: > > Guenter, > >> This driver solves this problem by adding support for reading the >> temperature of SATA drives from the kernel using the hwmon API and >> by adding a temperature zone for each drive. > > My working tree is available here: > > https://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git/log/?h=5.6/drivetemp > > A few notes: > > - Before applying your patch I did s/satatemp/drivetemp/ > > - I get a crash in the driver core during probe if the drivetemp module > is loaded prior to loading ahci or a SCSI HBA driver. This crash is > unrelated to my changes. Haven't had time to debug. > Any idea how I might be able to reproduce this ? So far I have been unsuccessful. Building drivetemp into the kernel, with ahci and everything SCSI built as module, doesn't trigger the crash for me. This is with the drivetemp patch (v3) as well as commit d188b0675b ("scsi: core: Add sysfs attributes for VPD pages 0h and 89h") applied on top of v5.4.7. Thanks, Guenter > - I tweaked your ATA detection heuristics and now use the cached VPD > page 0x89 instead of fetching one from the device. > > - I also added support for reading the temperature log page on SCSI > drives. > > - Tested with a mixed bag of about 40 SCSI and SATA drives attached. > > - I still think sensor naming needs work. How and where are the > "drivetemp-scsi-8-140" names generated? > > I'll tinker some more but thought I'd share what I have for now. > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-01 17:46 ` Guenter Roeck @ 2020-01-03 3:06 ` Martin K. Petersen 2020-01-08 1:12 ` Martin K. Petersen 1 sibling, 0 replies; 31+ messages in thread From: Martin K. Petersen @ 2020-01-03 3:06 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, >> - I get a crash in the driver core during probe if the drivetemp module >> is loaded prior to loading ahci or a SCSI HBA driver. This crash is >> unrelated to my changes. Haven't had time to debug. >> > > Any idea how I might be able to reproduce this ? So far I have been > unsuccessful. I'm chipping away at a mountain of email. Will take a look tomorrow. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-01 17:46 ` Guenter Roeck 2020-01-03 3:06 ` Martin K. Petersen @ 2020-01-08 1:12 ` Martin K. Petersen 2020-01-08 15:33 ` Guenter Roeck 1 sibling, 1 reply; 31+ messages in thread From: Martin K. Petersen @ 2020-01-08 1:12 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, > Any idea how I might be able to reproduce this ? So far I have been > unsuccessful. > > Building drivetemp into the kernel, with ahci and everything SCSI > built as module, doesn't trigger the crash for me. This is with the > drivetemp patch (v3) as well as commit d188b0675b ("scsi: core: Add > sysfs attributes for VPD pages 0h and 89h") applied on top of v5.4.7. This is with 5.5-rc1. I'll try another kernel. My repro is: # modprobe drivetemp # modprobe <any SCSI driver, including ahci> -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-08 1:12 ` Martin K. Petersen @ 2020-01-08 15:33 ` Guenter Roeck 2020-01-11 20:22 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-08 15:33 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On Tue, Jan 07, 2020 at 08:12:06PM -0500, Martin K. Petersen wrote: > > Guenter, > > > Any idea how I might be able to reproduce this ? So far I have been > > unsuccessful. > > > > Building drivetemp into the kernel, with ahci and everything SCSI > > built as module, doesn't trigger the crash for me. This is with the > > drivetemp patch (v3) as well as commit d188b0675b ("scsi: core: Add > > sysfs attributes for VPD pages 0h and 89h") applied on top of v5.4.7. > > This is with 5.5-rc1. I'll try another kernel. > > My repro is: > > # modprobe drivetemp > # modprobe <any SCSI driver, including ahci> > No luck on my side. Can you provide a traceback ? Maybe we can use it to find out what is happening. Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-08 15:33 ` Guenter Roeck @ 2020-01-11 20:22 ` Guenter Roeck 2020-01-12 11:17 ` Gabriel C 2020-01-14 3:03 ` Martin K. Petersen 0 siblings, 2 replies; 31+ messages in thread From: Guenter Roeck @ 2020-01-11 20:22 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On 1/8/20 7:33 AM, Guenter Roeck wrote: > On Tue, Jan 07, 2020 at 08:12:06PM -0500, Martin K. Petersen wrote: >> >> Guenter, >> >>> Any idea how I might be able to reproduce this ? So far I have been >>> unsuccessful. >>> >>> Building drivetemp into the kernel, with ahci and everything SCSI >>> built as module, doesn't trigger the crash for me. This is with the >>> drivetemp patch (v3) as well as commit d188b0675b ("scsi: core: Add >>> sysfs attributes for VPD pages 0h and 89h") applied on top of v5.4.7. >> >> This is with 5.5-rc1. I'll try another kernel. >> >> My repro is: >> >> # modprobe drivetemp >> # modprobe <any SCSI driver, including ahci> >> > No luck on my side. Can you provide a traceback ? Maybe we can use it > to find out what is happening. > I tried again, this time with v5.5-rc5. Loading and unloading ahci and drivetemp in any order does not cause any problems for me. At this point I don't know what else I could test. I went ahead and applied the drivetemp patch to hwmon-next. Maybe we'll get some additional test feedback this way. Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-11 20:22 ` Guenter Roeck @ 2020-01-12 11:17 ` Gabriel C 2020-01-12 11:21 ` Linus Walleij 2020-01-14 3:03 ` Martin K. Petersen 1 sibling, 1 reply; 31+ messages in thread From: Gabriel C @ 2020-01-12 11:17 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, LKML, linux-scsi, linux-ide, Chris Healy Am Sa., 11. Jan. 2020 um 21:24 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > > On 1/8/20 7:33 AM, Guenter Roeck wrote: > > On Tue, Jan 07, 2020 at 08:12:06PM -0500, Martin K. Petersen wrote: > >> > >> Guenter, > >> > >>> Any idea how I might be able to reproduce this ? So far I have been > >>> unsuccessful. > >>> > >>> Building drivetemp into the kernel, with ahci and everything SCSI > >>> built as module, doesn't trigger the crash for me. This is with the > >>> drivetemp patch (v3) as well as commit d188b0675b ("scsi: core: Add > >>> sysfs attributes for VPD pages 0h and 89h") applied on top of v5.4.7. > >> > >> This is with 5.5-rc1. I'll try another kernel. > >> > >> My repro is: > >> > >> # modprobe drivetemp > >> # modprobe <any SCSI driver, including ahci> > >> > > No luck on my side. Can you provide a traceback ? Maybe we can use it > > to find out what is happening. > > > > I tried again, this time with v5.5-rc5. Loading and unloading ahci and > drivetemp in any order does not cause any problems for me. > > At this point I don't know what else I could test. I went ahead and > applied the drivetemp patch to hwmon-next. Maybe we'll get some additional > test feedback this way. I've tested Linus git tree from right now + hwmon-next and I cannot make it crash. The driver seems to work fine here and temperature reportings are very accurate on all HDDs on that box. ( 8 x Seagate IronWolf 2 TB (ST2000VN004) ) What I've noticed however is the nvme temperature low/high values on the Sensors X are strange here. I'm not sure it is a v5.5 issue or a hwmon-next one right now, I didn't boot a vanilla v5.5-rc5 yet. Both nvme's are Samsung SSD 960 EVO 250GB. They look like this: nvme-pci-1300 Adapter: PCI adapter Composite: +27.9°C (low = -273.1°C, high = +76.8°C) (crit = +78.8°C) Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) nvme-pci-6100 Adapter: PCI adapter Composite: +23.9°C (low = -273.1°C, high = +76.8°C) (crit = +78.8°C) Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) Best Regards, Gabriel C. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 11:17 ` Gabriel C @ 2020-01-12 11:21 ` Linus Walleij 2020-01-12 12:02 ` Gabriel C 0 siblings, 1 reply; 31+ messages in thread From: Linus Walleij @ 2020-01-12 11:21 UTC (permalink / raw) To: Gabriel C Cc: Guenter Roeck, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > What I've noticed however is the nvme temperature low/high values on > the Sensors X are strange here. (...) > Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) (...) > Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) That doesn't look strange to me. It seems like reasonable defaults from the firmware if either it doesn't really log the min/max temperatures or hasn't been through a cycle of updating these yet. Just set both to absolute min/max temperatures possible. Yours, Linus Walleij ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 11:21 ` Linus Walleij @ 2020-01-12 12:02 ` Gabriel C 2020-01-12 12:07 ` Linus Walleij 0 siblings, 1 reply; 31+ messages in thread From: Gabriel C @ 2020-01-12 12:02 UTC (permalink / raw) To: Linus Walleij Cc: Guenter Roeck, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij <linus.walleij@linaro.org>: > > On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > > > What I've noticed however is the nvme temperature low/high values on > > the Sensors X are strange here. > (...) > > Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > > Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) > (...) > > Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > > Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) > > That doesn't look strange to me. It seems like reasonable defaults > from the firmware if either it doesn't really log the min/max temperatures > or hasn't been through a cycle of updating these yet. Just set both > to absolute min/max temperatures possible. Ok I'll check that. Do you mean by setting the temperatures to use a lmsensors config? Or is there a way to set these with a nvme command? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 12:02 ` Gabriel C @ 2020-01-12 12:07 ` Linus Walleij 2020-01-12 13:07 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Linus Walleij @ 2020-01-12 12:07 UTC (permalink / raw) To: Gabriel C Cc: Guenter Roeck, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: > Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij > <linus.walleij@linaro.org>: > > > > On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > > > > > What I've noticed however is the nvme temperature low/high values on > > > the Sensors X are strange here. > > (...) > > > Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > > > Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) > > (...) > > > Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > > > Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) > > > > That doesn't look strange to me. It seems like reasonable defaults > > from the firmware if either it doesn't really log the min/max temperatures > > or hasn't been through a cycle of updating these yet. Just set both > > to absolute min/max temperatures possible. > > Ok I'll check that. > > Do you mean by setting the temperatures to use a lmsensors config? > Or is there a way to set these with a nvme command? Not that I know of. The min/max are the minumum and maximum temperatures the device has experienced during this power-on cycle. (If I understood things right!) If the device firmware doesn't log that, or the firmware hasn't ran through a log point, it makes sense to report absolute min/max of the scales. Yours, Linus Walleij ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 12:07 ` Linus Walleij @ 2020-01-12 13:07 ` Guenter Roeck 2020-01-12 13:45 ` Gabriel C 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-12 13:07 UTC (permalink / raw) To: Linus Walleij, Gabriel C Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy On 1/12/20 4:07 AM, Linus Walleij wrote: > On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: >> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij >> <linus.walleij@linaro.org>: >>> >>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: >>> >>>> What I've noticed however is the nvme temperature low/high values on >>>> the Sensors X are strange here. >>> (...) >>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) >>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) >>> (...) >>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) >>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) >>> >>> That doesn't look strange to me. It seems like reasonable defaults >>> from the firmware if either it doesn't really log the min/max temperatures >>> or hasn't been through a cycle of updating these yet. Just set both >>> to absolute min/max temperatures possible. >> >> Ok I'll check that. >> >> Do you mean by setting the temperatures to use a lmsensors config? >> Or is there a way to set these with a nvme command? > > Not that I know of. > > The min/max are the minumum and maximum temperatures the > device has experienced during this power-on cycle. > No, that would be lowest/highest. The above are (or should be) per-sensor setpoints. The default for those is typically the absolute minimum / maximum of the supported range. Some SATA drives report the lowest/highest temperatures experienced since power cycle, like here. drivetemp-scsi-5-0 Adapter: SCSI adapter temp1: +23.0°C (low = +0.0°C, high = +60.0°C) (crit low = -41.0°C, crit = +85.0°C) (lowest = +20.0°C, highest = +31.0°C) Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 13:07 ` Guenter Roeck @ 2020-01-12 13:45 ` Gabriel C 2020-01-12 15:26 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Gabriel C @ 2020-01-12 13:45 UTC (permalink / raw) To: Guenter Roeck Cc: Linus Walleij, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy Am So., 12. Jan. 2020 um 14:07 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > > On 1/12/20 4:07 AM, Linus Walleij wrote: > > On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: > >> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij > >> <linus.walleij@linaro.org>: > >>> > >>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > >>> > >>>> What I've noticed however is the nvme temperature low/high values on > >>>> the Sensors X are strange here. > >>> (...) > >>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > >>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) > >>> (...) > >>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > >>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) > >>> > >>> That doesn't look strange to me. It seems like reasonable defaults > >>> from the firmware if either it doesn't really log the min/max temperatures > >>> or hasn't been through a cycle of updating these yet. Just set both > >>> to absolute min/max temperatures possible. > >> > >> Ok I'll check that. > >> > >> Do you mean by setting the temperatures to use a lmsensors config? > >> Or is there a way to set these with a nvme command? > > > > Not that I know of. > > > > The min/max are the minumum and maximum temperatures the > > device has experienced during this power-on cycle. > > > > No, that would be lowest/highest. The above are (or should be) per-sensor > setpoints. The default for those is typically the absolute minimum / > maximum of the supported range. > > Some SATA drives report the lowest/highest temperatures experienced > since power cycle, like here. > > drivetemp-scsi-5-0 > Adapter: SCSI adapter > temp1: +23.0°C (low = +0.0°C, high = +60.0°C) > (crit low = -41.0°C, crit = +85.0°C) > (lowest = +20.0°C, highest = +31.0°C) > The SATA temperatures are fine and reported like this here too, just the nvme ones are strange. drivetemp-scsi-4-0 Adapter: SCSI adapter temp1: +28.0°C (low = +1.0°C, high = +61.0°C) (crit low = +2.0°C, crit = +60.0°C) (lowest = +16.0°C, highest = +31.0°C) drivetemp-scsi-12-0 Adapter: SCSI adapter temp1: +29.0°C (low = +1.0°C, high = +61.0°C) (crit low = +2.0°C, crit = +60.0°C) (lowest = +18.0°C, highest = +32.0°C) and so on. Btw, where I can find the code does these calculations? Best regards, Gabriel C. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 13:45 ` Gabriel C @ 2020-01-12 15:26 ` Guenter Roeck 2020-01-12 18:37 ` Gabriel C 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-12 15:26 UTC (permalink / raw) To: Gabriel C Cc: Linus Walleij, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy On 1/12/20 5:45 AM, Gabriel C wrote: > Am So., 12. Jan. 2020 um 14:07 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: >> >> On 1/12/20 4:07 AM, Linus Walleij wrote: >>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: >>>> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij >>>> <linus.walleij@linaro.org>: >>>>> >>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: >>>>> >>>>>> What I've noticed however is the nvme temperature low/high values on >>>>>> the Sensors X are strange here. >>>>> (...) >>>>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) >>>>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) >>>>> (...) >>>>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) >>>>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) >>>>> >>>>> That doesn't look strange to me. It seems like reasonable defaults >>>>> from the firmware if either it doesn't really log the min/max temperatures >>>>> or hasn't been through a cycle of updating these yet. Just set both >>>>> to absolute min/max temperatures possible. >>>> >>>> Ok I'll check that. >>>> >>>> Do you mean by setting the temperatures to use a lmsensors config? >>>> Or is there a way to set these with a nvme command? >>> >>> Not that I know of. >>> >>> The min/max are the minumum and maximum temperatures the >>> device has experienced during this power-on cycle. >>> >> >> No, that would be lowest/highest. The above are (or should be) per-sensor >> setpoints. The default for those is typically the absolute minimum / >> maximum of the supported range. >> >> Some SATA drives report the lowest/highest temperatures experienced >> since power cycle, like here. >> >> drivetemp-scsi-5-0 >> Adapter: SCSI adapter >> temp1: +23.0°C (low = +0.0°C, high = +60.0°C) >> (crit low = -41.0°C, crit = +85.0°C) >> (lowest = +20.0°C, highest = +31.0°C) >> > > The SATA temperatures are fine and reported like this here too, just > the nvme ones are strange. > > drivetemp-scsi-4-0 > Adapter: SCSI adapter > temp1: +28.0°C (low = +1.0°C, high = +61.0°C) > (crit low = +2.0°C, crit = +60.0°C) > (lowest = +16.0°C, highest = +31.0°C) > > drivetemp-scsi-12-0 > Adapter: SCSI adapter > temp1: +29.0°C (low = +1.0°C, high = +61.0°C) > (crit low = +2.0°C, crit = +60.0°C) > (lowest = +18.0°C, highest = +32.0°C) > > and so on. > > Btw, where I can find the code does these calculations? > Not sure if that is what you are looking for, but the nvme hardware monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware monitoring driver is at drivers/hwmon/drivetemp.c. The limits on nvme drives are configurable. root@server:/sys/class/hwmon# sensors nvme-pci-0100 nvme-pci-0100 Adapter: PCI adapter Composite: +40.9°C (low = -273.1°C, high = +84.8°C) (crit = +84.8°C) Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C) root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max root@server:/sys/class/hwmon# sensors nvme-pci-0100 nvme-pci-0100 Adapter: PCI adapter Composite: +38.9°C (low = -273.1°C, high = +84.8°C) (crit = +84.8°C) Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C) Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C) If you dislike the defaults, just configure whatever you think is appropriate for your system. Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 15:26 ` Guenter Roeck @ 2020-01-12 18:37 ` Gabriel C 2020-01-12 20:08 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Gabriel C @ 2020-01-12 18:37 UTC (permalink / raw) To: Guenter Roeck Cc: Linus Walleij, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy Am So., 12. Jan. 2020 um 16:26 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > > On 1/12/20 5:45 AM, Gabriel C wrote: > > Am So., 12. Jan. 2020 um 14:07 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > >> > >> On 1/12/20 4:07 AM, Linus Walleij wrote: > >>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: > >>>> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij > >>>> <linus.walleij@linaro.org>: > >>>>> > >>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > >>>>> > >>>>>> What I've noticed however is the nvme temperature low/high values on > >>>>>> the Sensors X are strange here. > >>>>> (...) > >>>>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) > >>>>> (...) > >>>>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) > >>>>> > >>>>> That doesn't look strange to me. It seems like reasonable defaults > >>>>> from the firmware if either it doesn't really log the min/max temperatures > >>>>> or hasn't been through a cycle of updating these yet. Just set both > >>>>> to absolute min/max temperatures possible. > >>>> > >>>> Ok I'll check that. > >>>> > >>>> Do you mean by setting the temperatures to use a lmsensors config? > >>>> Or is there a way to set these with a nvme command? > >>> > >>> Not that I know of. > >>> > >>> The min/max are the minumum and maximum temperatures the > >>> device has experienced during this power-on cycle. > >>> > >> > >> No, that would be lowest/highest. The above are (or should be) per-sensor > >> setpoints. The default for those is typically the absolute minimum / > >> maximum of the supported range. > >> > >> Some SATA drives report the lowest/highest temperatures experienced > >> since power cycle, like here. > >> > >> drivetemp-scsi-5-0 > >> Adapter: SCSI adapter > >> temp1: +23.0°C (low = +0.0°C, high = +60.0°C) > >> (crit low = -41.0°C, crit = +85.0°C) > >> (lowest = +20.0°C, highest = +31.0°C) > >> > > > > The SATA temperatures are fine and reported like this here too, just > > the nvme ones are strange. > > > > drivetemp-scsi-4-0 > > Adapter: SCSI adapter > > temp1: +28.0°C (low = +1.0°C, high = +61.0°C) > > (crit low = +2.0°C, crit = +60.0°C) > > (lowest = +16.0°C, highest = +31.0°C) > > > > drivetemp-scsi-12-0 > > Adapter: SCSI adapter > > temp1: +29.0°C (low = +1.0°C, high = +61.0°C) > > (crit low = +2.0°C, crit = +60.0°C) > > (lowest = +18.0°C, highest = +32.0°C) > > > > and so on. > > > > Btw, where I can find the code does these calculations? > > > > Not sure if that is what you are looking for, but the nvme hardware > monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware > monitoring driver is at drivers/hwmon/drivetemp.c. > I have a look thanks. I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels. This is probably why I find the way the temperatures are now reported very strange. The ADATA XPG SX8200 Pro in my laptop seems to work better: nvme-pci-0200 Adapter: PCI adapter Composite: +37.9°C (low = -0.1°C, high = +74.8°C) (crit = +79.8°C) Low is 0° which is what the spec suggests. > The limits on nvme drives are configurable. Yes, I found this out already. > root@server:/sys/class/hwmon# sensors nvme-pci-0100 > nvme-pci-0100 > Adapter: PCI adapter > Composite: +40.9°C (low = -273.1°C, high = +84.8°C) > (crit = +84.8°C) > Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C) > Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C) > > root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min > root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max An lm-sensors configuration will work too. > root@server:/sys/class/hwmon# sensors nvme-pci-0100 > nvme-pci-0100 > Adapter: PCI adapter > Composite: +38.9°C (low = -273.1°C, high = +84.8°C) > (crit = +84.8°C) > Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C) > Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C) > > If you dislike the defaults, just configure whatever you think is > appropriate for your system. It's not about disliking the values. I want to find out if these Samsung models don't support that, or it is a bug somewhere in writing/calculating the values. In the case, Samsung and others don't support such a thing wouldn't be better to just ignore the bogus reading altogether? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 18:37 ` Gabriel C @ 2020-01-12 20:08 ` Guenter Roeck 2020-01-12 22:26 ` Gabriel C 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-12 20:08 UTC (permalink / raw) To: Gabriel C Cc: Linus Walleij, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy On 1/12/20 10:37 AM, Gabriel C wrote: > Am So., 12. Jan. 2020 um 16:26 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: >> >> On 1/12/20 5:45 AM, Gabriel C wrote: >>> Am So., 12. Jan. 2020 um 14:07 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: >>>> >>>> On 1/12/20 4:07 AM, Linus Walleij wrote: >>>>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: >>>>>> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij >>>>>> <linus.walleij@linaro.org>: >>>>>>> >>>>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: >>>>>>> >>>>>>>> What I've noticed however is the nvme temperature low/high values on >>>>>>>> the Sensors X are strange here. >>>>>>> (...) >>>>>>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) >>>>>>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) >>>>>>> (...) >>>>>>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) >>>>>>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) >>>>>>> >>>>>>> That doesn't look strange to me. It seems like reasonable defaults >>>>>>> from the firmware if either it doesn't really log the min/max temperatures >>>>>>> or hasn't been through a cycle of updating these yet. Just set both >>>>>>> to absolute min/max temperatures possible. >>>>>> >>>>>> Ok I'll check that. >>>>>> >>>>>> Do you mean by setting the temperatures to use a lmsensors config? >>>>>> Or is there a way to set these with a nvme command? >>>>> >>>>> Not that I know of. >>>>> >>>>> The min/max are the minumum and maximum temperatures the >>>>> device has experienced during this power-on cycle. >>>>> >>>> >>>> No, that would be lowest/highest. The above are (or should be) per-sensor >>>> setpoints. The default for those is typically the absolute minimum / >>>> maximum of the supported range. >>>> >>>> Some SATA drives report the lowest/highest temperatures experienced >>>> since power cycle, like here. >>>> >>>> drivetemp-scsi-5-0 >>>> Adapter: SCSI adapter >>>> temp1: +23.0°C (low = +0.0°C, high = +60.0°C) >>>> (crit low = -41.0°C, crit = +85.0°C) >>>> (lowest = +20.0°C, highest = +31.0°C) >>>> >>> >>> The SATA temperatures are fine and reported like this here too, just >>> the nvme ones are strange. >>> >>> drivetemp-scsi-4-0 >>> Adapter: SCSI adapter >>> temp1: +28.0°C (low = +1.0°C, high = +61.0°C) >>> (crit low = +2.0°C, crit = +60.0°C) >>> (lowest = +16.0°C, highest = +31.0°C) >>> >>> drivetemp-scsi-12-0 >>> Adapter: SCSI adapter >>> temp1: +29.0°C (low = +1.0°C, high = +61.0°C) >>> (crit low = +2.0°C, crit = +60.0°C) >>> (lowest = +18.0°C, highest = +32.0°C) >>> >>> and so on. >>> >>> Btw, where I can find the code does these calculations? >>> >> >> Not sure if that is what you are looking for, but the nvme hardware >> monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware >> monitoring driver is at drivers/hwmon/drivetemp.c. >> > > I have a look thanks. > > I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels. > This is probably why I find the way the temperatures are now reported > very strange. > > The ADATA XPG SX8200 Pro in my laptop seems to work better: > > nvme-pci-0200 > Adapter: PCI adapter > Composite: +37.9°C (low = -0.1°C, high = +74.8°C) > (crit = +79.8°C) > > Low is 0° which is what the spec suggests. > >> The limits on nvme drives are configurable. > > Yes, I found this out already. > >> root@server:/sys/class/hwmon# sensors nvme-pci-0100 >> nvme-pci-0100 >> Adapter: PCI adapter >> Composite: +40.9°C (low = -273.1°C, high = +84.8°C) >> (crit = +84.8°C) >> Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C) >> Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C) >> >> root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min >> root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max > > An lm-sensors configuration will work too. > Sure, the above was just an example. >> root@server:/sys/class/hwmon# sensors nvme-pci-0100 >> nvme-pci-0100 >> Adapter: PCI adapter >> Composite: +38.9°C (low = -273.1°C, high = +84.8°C) >> (crit = +84.8°C) >> Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C) >> Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C) >> >> If you dislike the defaults, just configure whatever you think is >> appropriate for your system. > > It's not about disliking the values. I want to find out if these Samsung models > don't support that, or it is a bug somewhere in writing/calculating the values. > No, this is not a bug. It is perfectly valid for individual sensors to have uninitialized limits. If I recall correctly, the NVME specification specifically states that the default settings for individual sensors shall be those values (0 and 65535 Kelvin, specifically). And, yes, I would agree that is a bit odd that NVME drives report temperatures in Kelvin, but such is the world. > In the case, Samsung and others don't support such a thing wouldn't be > better to just ignore > the bogus reading altogether? Again, you can set whatever limits you like. The default limits on many hardware sensor chips have odd values. Just looking at my system: nct6797-isa-0a20 Adapter: ISA adapter in0: +0.48 V (min = +0.00 V, max = +1.74 V) in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM in3: +3.31 V (min = +0.00 V, max = +0.00 V) ALARM in4: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM in6: +0.82 V (min = +0.00 V, max = +0.00 V) ALARM in7: +3.38 V (min = +0.00 V, max = +0.00 V) ALARM in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM in9: +1.82 V (min = +0.00 V, max = +0.00 V) ALARM in10: +0.00 V (min = +0.00 V, max = +0.00 V) in11: +0.74 V (min = +0.00 V, max = +0.00 V) ALARM in12: +1.20 V (min = +0.00 V, max = +0.00 V) ALARM in13: +0.68 V (min = +0.00 V, max = +0.00 V) ALARM in14: +1.50 V (min = +0.00 V, max = +0.00 V) ALARM Are you suggesting that we should not support setting min/max values for all drivers just because they are often not initialized to reasonable values by default ? Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-12 20:08 ` Guenter Roeck @ 2020-01-12 22:26 ` Gabriel C 0 siblings, 0 replies; 31+ messages in thread From: Gabriel C @ 2020-01-12 22:26 UTC (permalink / raw) To: Guenter Roeck Cc: Linus Walleij, Martin K. Petersen, linux-hwmon, Jean Delvare, Bart Van Assche, Linux Doc Mailing List, LKML, linux-scsi, open list:LIBATA SUBSYSTEM (Serial and Parallel ATA drivers), Chris Healy Am So., 12. Jan. 2020 um 21:08 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > > On 1/12/20 10:37 AM, Gabriel C wrote: > > Am So., 12. Jan. 2020 um 16:26 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > >> > >> On 1/12/20 5:45 AM, Gabriel C wrote: > >>> Am So., 12. Jan. 2020 um 14:07 Uhr schrieb Guenter Roeck <linux@roeck-us.net>: > >>>> > >>>> On 1/12/20 4:07 AM, Linus Walleij wrote: > >>>>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@gmail.com> wrote: > >>>>>> Am So., 12. Jan. 2020 um 12:22 Uhr schrieb Linus Walleij > >>>>>> <linus.walleij@linaro.org>: > >>>>>>> > >>>>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@gmail.com> wrote: > >>>>>>> > >>>>>>>> What I've noticed however is the nvme temperature low/high values on > >>>>>>>> the Sensors X are strange here. > >>>>>>> (...) > >>>>>>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>>> (...) > >>>>>>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C) > >>>>>>> > >>>>>>> That doesn't look strange to me. It seems like reasonable defaults > >>>>>>> from the firmware if either it doesn't really log the min/max temperatures > >>>>>>> or hasn't been through a cycle of updating these yet. Just set both > >>>>>>> to absolute min/max temperatures possible. > >>>>>> > >>>>>> Ok I'll check that. > >>>>>> > >>>>>> Do you mean by setting the temperatures to use a lmsensors config? > >>>>>> Or is there a way to set these with a nvme command? > >>>>> > >>>>> Not that I know of. > >>>>> > >>>>> The min/max are the minumum and maximum temperatures the > >>>>> device has experienced during this power-on cycle. > >>>>> > >>>> > >>>> No, that would be lowest/highest. The above are (or should be) per-sensor > >>>> setpoints. The default for those is typically the absolute minimum / > >>>> maximum of the supported range. > >>>> > >>>> Some SATA drives report the lowest/highest temperatures experienced > >>>> since power cycle, like here. > >>>> > >>>> drivetemp-scsi-5-0 > >>>> Adapter: SCSI adapter > >>>> temp1: +23.0°C (low = +0.0°C, high = +60.0°C) > >>>> (crit low = -41.0°C, crit = +85.0°C) > >>>> (lowest = +20.0°C, highest = +31.0°C) > >>>> > >>> > >>> The SATA temperatures are fine and reported like this here too, just > >>> the nvme ones are strange. > >>> > >>> drivetemp-scsi-4-0 > >>> Adapter: SCSI adapter > >>> temp1: +28.0°C (low = +1.0°C, high = +61.0°C) > >>> (crit low = +2.0°C, crit = +60.0°C) > >>> (lowest = +16.0°C, highest = +31.0°C) > >>> > >>> drivetemp-scsi-12-0 > >>> Adapter: SCSI adapter > >>> temp1: +29.0°C (low = +1.0°C, high = +61.0°C) > >>> (crit low = +2.0°C, crit = +60.0°C) > >>> (lowest = +18.0°C, highest = +32.0°C) > >>> > >>> and so on. > >>> > >>> Btw, where I can find the code does these calculations? > >>> > >> > >> Not sure if that is what you are looking for, but the nvme hardware > >> monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware > >> monitoring driver is at drivers/hwmon/drivetemp.c. > >> > > > > I have a look thanks. > > > > I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels. > > This is probably why I find the way the temperatures are now reported > > very strange. > > > > The ADATA XPG SX8200 Pro in my laptop seems to work better: > > > > nvme-pci-0200 > > Adapter: PCI adapter > > Composite: +37.9°C (low = -0.1°C, high = +74.8°C) > > (crit = +79.8°C) > > > > Low is 0° which is what the spec suggests. > > > >> The limits on nvme drives are configurable. > > > > Yes, I found this out already. > > > >> root@server:/sys/class/hwmon# sensors nvme-pci-0100 > >> nvme-pci-0100 > >> Adapter: PCI adapter > >> Composite: +40.9°C (low = -273.1°C, high = +84.8°C) > >> (crit = +84.8°C) > >> Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C) > >> Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C) > >> > >> root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min > >> root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max > > > > An lm-sensors configuration will work too. > > > Sure, the above was just an example. > > >> root@server:/sys/class/hwmon# sensors nvme-pci-0100 > >> nvme-pci-0100 > >> Adapter: PCI adapter > >> Composite: +38.9°C (low = -273.1°C, high = +84.8°C) > >> (crit = +84.8°C) > >> Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C) > >> Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C) > >> > >> If you dislike the defaults, just configure whatever you think is > >> appropriate for your system. > > > > It's not about disliking the values. I want to find out if these Samsung models > > don't support that, or it is a bug somewhere in writing/calculating the values. > > > No, this is not a bug. It is perfectly valid for individual sensors to have > uninitialized limits. If I recall correctly, the NVME specification > specifically states that the default settings for individual sensors > shall be those values (0 and 65535 Kelvin, specifically). > > And, yes, I would agree that is a bit odd that NVME drives report temperatures > in Kelvin, but such is the world. > > > In the case, Samsung and others don't support such a thing wouldn't be > > better to just ignore > > the bogus reading altogether? > > Again, you can set whatever limits you like. The default limits on many > hardware sensor chips have odd values. Just looking at my system: > > nct6797-isa-0a20 > Adapter: ISA adapter > in0: +0.48 V (min = +0.00 V, max = +1.74 V) > in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM > in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM > in3: +3.31 V (min = +0.00 V, max = +0.00 V) ALARM > in4: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM > in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM > in6: +0.82 V (min = +0.00 V, max = +0.00 V) ALARM > in7: +3.38 V (min = +0.00 V, max = +0.00 V) ALARM > in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM > in9: +1.82 V (min = +0.00 V, max = +0.00 V) ALARM > in10: +0.00 V (min = +0.00 V, max = +0.00 V) > in11: +0.74 V (min = +0.00 V, max = +0.00 V) ALARM > in12: +1.20 V (min = +0.00 V, max = +0.00 V) ALARM > in13: +0.68 V (min = +0.00 V, max = +0.00 V) ALARM > in14: +1.50 V (min = +0.00 V, max = +0.00 V) ALARM > > > Are you suggesting that we should not support setting min/max values for > all drivers just because they are often not initialized to reasonable values > by default ? No, I'm not suggesting that. I'm aware of strange I/O monitoring chips values and the lack of documentation, so in this case, something is better than nothing. In the nvme case, these are only 2 values who either are working/supported by firmware or not, so I thought it would be reasonable to have known-good values instead of 65261.8°C, which will probably cause users to report that as a bug a lot. Can we at least have that documented and explain how the values can be set/changed? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-11 20:22 ` Guenter Roeck 2020-01-12 11:17 ` Gabriel C @ 2020-01-14 3:03 ` Martin K. Petersen 2020-01-14 5:20 ` Guenter Roeck 1 sibling, 1 reply; 31+ messages in thread From: Martin K. Petersen @ 2020-01-14 3:03 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Guenter! > I tried again, this time with v5.5-rc5. Loading and unloading ahci and > drivetemp in any order does not cause any problems for me. I tried your hwmon-next branch and it still happens for me. Both in qemu and on real hw. I'm really low on bandwidth the next couple of days. Will try to look later this week unless you beat me to it. I get lots of these warnings after modprobe drivetemp; modprobe ahci: [ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0 A quick test forcing synchronous SCSI scanning made no difference. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-14 3:03 ` Martin K. Petersen @ 2020-01-14 5:20 ` Guenter Roeck 2020-01-16 4:12 ` Martin K. Petersen 0 siblings, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-14 5:20 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Martin, On 1/13/20 7:03 PM, Martin K. Petersen wrote: > > Hi Guenter! > >> I tried again, this time with v5.5-rc5. Loading and unloading ahci and >> drivetemp in any order does not cause any problems for me. > > I tried your hwmon-next branch and it still happens for me. Both in qemu > and on real hw. I'm really low on bandwidth the next couple of days. > Will try to look later this week unless you beat me to it. I get lots of > these warnings after modprobe drivetemp; modprobe ahci: > > [ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0 > > A quick test forcing synchronous SCSI scanning made no difference. > The hwmon-next branch is based on v5.5-rc1. It might be better to either merge hwmon-next into mainline, or to apply the drivetemp patch to mainline, and test the result. I have seen some (unrelated) weird tracebacks in the driver core with v5.5-rc1, so that may not be the best baseline for a test. Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-14 5:20 ` Guenter Roeck @ 2020-01-16 4:12 ` Martin K. Petersen 2020-01-16 5:09 ` Guenter Roeck 2020-01-16 17:47 ` Guenter Roeck 0 siblings, 2 replies; 31+ messages in thread From: Martin K. Petersen @ 2020-01-16 4:12 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, > The hwmon-next branch is based on v5.5-rc1. It might be better to > either merge hwmon-next into mainline, or to apply the drivetemp patch > to mainline, and test the result. I have seen some (unrelated) weird > tracebacks in the driver core with v5.5-rc1, so that may not be the > best baseline for a test. I'm afraid the warnings still happen with hwmon-next on top of linus/master. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-16 4:12 ` Martin K. Petersen @ 2020-01-16 5:09 ` Guenter Roeck 2020-01-16 17:47 ` Guenter Roeck 1 sibling, 0 replies; 31+ messages in thread From: Guenter Roeck @ 2020-01-16 5:09 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Hi Martin, On 1/15/20 8:12 PM, Martin K. Petersen wrote: > > Guenter, > >> The hwmon-next branch is based on v5.5-rc1. It might be better to >> either merge hwmon-next into mainline, or to apply the drivetemp patch >> to mainline, and test the result. I have seen some (unrelated) weird >> tracebacks in the driver core with v5.5-rc1, so that may not be the >> best baseline for a test. > > I'm afraid the warnings still happen with hwmon-next on top of > linus/master. > Can you possibly provide details, like the configuration you use for qemu, the qemu command line, and the exact command sequence you use in qemu to reproduce the problem ? Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-16 4:12 ` Martin K. Petersen 2020-01-16 5:09 ` Guenter Roeck @ 2020-01-16 17:47 ` Guenter Roeck 2020-01-17 1:43 ` Martin K. Petersen 1 sibling, 1 reply; 31+ messages in thread From: Guenter Roeck @ 2020-01-16 17:47 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On Wed, Jan 15, 2020 at 11:12:23PM -0500, Martin K. Petersen wrote: > > Guenter, > > > The hwmon-next branch is based on v5.5-rc1. It might be better to > > either merge hwmon-next into mainline, or to apply the drivetemp patch > > to mainline, and test the result. I have seen some (unrelated) weird > > tracebacks in the driver core with v5.5-rc1, so that may not be the > > best baseline for a test. > > I'm afraid the warnings still happen with hwmon-next on top of > linus/master. > Can you by any chance provide a full traceback ? The warning you reported suggests that a devm_ function was called on a device pointer prior to a device registration. I don't immediately see how that happens. A full traceback might give us an idea. I suspect that the underlying problem is in hwmon_device_register_with_info(), which uses devm_ functions to allocate memory associated with the device pointer passed to it, in this case the SCSI device. This is inherently wrong (independent of this driver), since the lifetime of the hardware monitoring device does not necessarily match the lifetime of its parent. The impact here is that we may get a memory leak under some circumstances. I'll have to fix that in the hardware monitoring core. Either case, I would like to track down how the warning happens, so any information you can provide that lets me reproduce the problem would be very helpful. Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-16 17:47 ` Guenter Roeck @ 2020-01-17 1:43 ` Martin K. Petersen 2020-01-17 3:53 ` Guenter Roeck 0 siblings, 1 reply; 31+ messages in thread From: Martin K. Petersen @ 2020-01-17 1:43 UTC (permalink / raw) To: Guenter Roeck Cc: Martin K. Petersen, linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy Guenter, > Can you by any chance provide a full traceback ? My test machines are tied up with something else right now. This is from a few days ago (pristine hwmon-next, I believe): [ 1055.611912] ------------[ cut here ]------------ [ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0 [ 1055.611925] Modules linked in: sd_mod sg ahci libahci libata drivetemp scsi_mod crc32c_intel igb i2c_algo_bit i2c_core dca hwmon ipv6 nf_defrag_ipv6 crc_ccitt [ 1055.611955] CPU: 3 PID: 3233 Comm: kworker/u17:1 Tainted: G W 5.5.0-rc1+ #21 [ 1055.611965] Workqueue: events_unbound async_run_entry_fn [ 1055.611973] RIP: 0010:really_probe+0x436/0x4f0 [ 1055.611979] Code: c7 30 69 f8 82 e8 ba 94 e5 ff e9 60 ff ff ff 48 8d 7b 38 e8 cc d9 b4 ff 48 8b 43 38 48 85 c0 0f 85 41 fd ff ff e9 4f fd ff ff <0f> 0b e9 66 fc ff ff 48 8d 7d 50 e8 aa d9 b4 ff 4c 8b 6d 50 4d 85 [ 1055.611983] RSP: 0018:ffff8881edb77c98 EFLAGS: 00010287 [ 1055.611989] RAX: ffff8881e1f8fb80 RBX: ffffffffa033a000 RCX: ffffffff8182e583 [ 1055.611993] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881dec506a8 [ 1055.611997] RBP: ffff8881dec50238 R08: 0000000000000001 R09: fffffbfff09629ed [ 1055.612000] R10: fffffbfff09629ec R11: 0000000000000003 R12: 0000000000000000 [ 1055.612004] R13: ffff8881dec506a8 R14: ffffffff8182eca0 R15: 000000000000000b [ 1055.612009] FS: 0000000000000000(0000) GS:ffff8881f8900000(0000) knlGS:0000000000000000 [ 1055.612013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1055.612017] CR2: 00007f957884a000 CR3: 00000001df5ec003 CR4: 00000000000606e0 [ 1055.612020] Call Trace: [ 1055.612038] ? driver_probe_device+0x170/0x170 [ 1055.612045] driver_probe_device+0x82/0x170 [ 1055.612058] ? driver_probe_device+0x170/0x170 [ 1055.612064] __driver_attach_async_helper+0xa3/0xe0 [ 1055.612076] async_run_entry_fn+0x68/0x2a0 [ 1055.612094] process_one_work+0x4df/0x990 [ 1055.612121] ? pwq_dec_nr_in_flight+0x110/0x110 [ 1055.612127] ? do_raw_spin_lock+0x113/0x1d0 [ 1055.612161] worker_thread+0x78/0x5c0 [ 1055.612190] ? process_one_work+0x990/0x990 [ 1055.612195] kthread+0x1be/0x1e0 [ 1055.612202] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 1055.612215] ret_from_fork+0x3a/0x50 [ 1055.612251] irq event stamp: 3512 [ 1055.612259] hardirqs last enabled at (3511): [<ffffffff81d2b874>] _raw_spin_unlock_irq+0x24/0x30 [ 1055.612265] hardirqs last disabled at (3512): [<ffffffff810029c9>] trace_hardirqs_off_thunk+0x1a/0x1c [ 1055.612272] softirqs last enabled at (3500): [<ffffffff820003a5>] __do_softirq+0x3a5/0x5a8 [ 1055.612281] softirqs last disabled at (3489): [<ffffffff810cec7b>] irq_exit+0xfb/0x100 [ 1055.612284] ---[ end trace f0a8dd9a37bea031 ]--- > Either case, I would like to track down how the warning happens, so any > information you can provide that lets me reproduce the problem would be > very helpful. The three systems that exhibit the problem are stock (2010/2012/2014 vintage) x86_64 servers with onboard AHCI and a variety of 4-6 SATA drives each. For the qemu test I didn't have ahci configured but I had my SCSI temp patch on top of yours and ran modprobe drivetemp; modprobe scsi_debug to trigger the warnings. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] hwmon: Driver for temperature sensors on SATA drives 2020-01-17 1:43 ` Martin K. Petersen @ 2020-01-17 3:53 ` Guenter Roeck 0 siblings, 0 replies; 31+ messages in thread From: Guenter Roeck @ 2020-01-17 3:53 UTC (permalink / raw) To: Martin K. Petersen Cc: linux-hwmon, Jean Delvare, Linus Walleij, Bart Van Assche, linux-doc, linux-kernel, linux-scsi, linux-ide, Chris Healy On 1/16/20 5:43 PM, Martin K. Petersen wrote: > > Guenter, > >> Can you by any chance provide a full traceback ? > > My test machines are tied up with something else right now. This is from > a few days ago (pristine hwmon-next, I believe): > > [ 1055.611912] ------------[ cut here ]------------ > [ 1055.611922] WARNING: CPU: 3 PID: 3233 at drivers/base/dd.c:519 really_probe+0x436/0x4f0 > [ 1055.611925] Modules linked in: sd_mod sg ahci libahci libata drivetemp scsi_mod crc32c_intel igb i2c_algo_bit i2c_core dca hwmon ipv6 nf_defrag_ipv6 crc_ccitt > [ 1055.611955] CPU: 3 PID: 3233 Comm: kworker/u17:1 Tainted: G W 5.5.0-rc1+ #21 > [ 1055.611965] Workqueue: events_unbound async_run_entry_fn > [ 1055.611973] RIP: 0010:really_probe+0x436/0x4f0 > [ 1055.611979] Code: c7 30 69 f8 82 e8 ba 94 e5 ff e9 60 ff ff ff 48 8d 7b 38 e8 cc d9 b4 ff 48 8b 43 38 48 85 c0 0f 85 41 fd ff ff e9 4f fd ff ff <0f> 0b e9 66 fc ff ff 48 8d 7d 50 e8 aa d9 b4 ff 4c 8b 6d 50 4d 85 > [ 1055.611983] RSP: 0018:ffff8881edb77c98 EFLAGS: 00010287 > [ 1055.611989] RAX: ffff8881e1f8fb80 RBX: ffffffffa033a000 RCX: ffffffff8182e583 > [ 1055.611993] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881dec506a8 > [ 1055.611997] RBP: ffff8881dec50238 R08: 0000000000000001 R09: fffffbfff09629ed > [ 1055.612000] R10: fffffbfff09629ec R11: 0000000000000003 R12: 0000000000000000 > [ 1055.612004] R13: ffff8881dec506a8 R14: ffffffff8182eca0 R15: 000000000000000b > [ 1055.612009] FS: 0000000000000000(0000) GS:ffff8881f8900000(0000) knlGS:0000000000000000 > [ 1055.612013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1055.612017] CR2: 00007f957884a000 CR3: 00000001df5ec003 CR4: 00000000000606e0 > [ 1055.612020] Call Trace: > [ 1055.612038] ? driver_probe_device+0x170/0x170 > [ 1055.612045] driver_probe_device+0x82/0x170 > [ 1055.612058] ? driver_probe_device+0x170/0x170 > [ 1055.612064] __driver_attach_async_helper+0xa3/0xe0 > [ 1055.612076] async_run_entry_fn+0x68/0x2a0 > [ 1055.612094] process_one_work+0x4df/0x990 > [ 1055.612121] ? pwq_dec_nr_in_flight+0x110/0x110 > [ 1055.612127] ? do_raw_spin_lock+0x113/0x1d0 > [ 1055.612161] worker_thread+0x78/0x5c0 > [ 1055.612190] ? process_one_work+0x990/0x990 > [ 1055.612195] kthread+0x1be/0x1e0 > [ 1055.612202] ? kthread_create_worker_on_cpu+0xd0/0xd0 > [ 1055.612215] ret_from_fork+0x3a/0x50 > [ 1055.612251] irq event stamp: 3512 > [ 1055.612259] hardirqs last enabled at (3511): [<ffffffff81d2b874>] _raw_spin_unlock_irq+0x24/0x30 > [ 1055.612265] hardirqs last disabled at (3512): [<ffffffff810029c9>] trace_hardirqs_off_thunk+0x1a/0x1c > [ 1055.612272] softirqs last enabled at (3500): [<ffffffff820003a5>] __do_softirq+0x3a5/0x5a8 > [ 1055.612281] softirqs last disabled at (3489): [<ffffffff810cec7b>] irq_exit+0xfb/0x100 > [ 1055.612284] ---[ end trace f0a8dd9a37bea031 ]--- > >> Either case, I would like to track down how the warning happens, so any >> information you can provide that lets me reproduce the problem would be >> very helpful. > > The three systems that exhibit the problem are stock (2010/2012/2014 > vintage) x86_64 servers with onboard AHCI and a variety of 4-6 SATA > drives each. > > For the qemu test I didn't have ahci configured but I had my SCSI temp > patch on top of yours and ran modprobe drivetemp; modprobe scsi_debug to > trigger the warnings. > Interesting. Looks like your system performs asynchronous probing. No idea how that can result in that kind of problem, but who knows. Can you send me the qemu command line ? Thanks, Guenter ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2020-01-17 3:53 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-15 17:45 [PATCH v2 0/1] Summary: hwmon driver for temperature sensors on SATA drives Guenter Roeck 2019-12-15 17:45 ` [PATCH v2] hwmon: Driver " Guenter Roeck 2019-12-19 0:15 ` Martin K. Petersen 2019-12-19 0:32 ` Guenter Roeck 2020-01-07 4:10 ` Martin K. Petersen 2020-01-07 13:00 ` Guenter Roeck 2020-01-08 1:29 ` Martin K. Petersen 2020-01-08 15:32 ` Guenter Roeck 2019-12-19 7:37 ` Guenter Roeck 2020-01-01 17:46 ` Guenter Roeck 2020-01-03 3:06 ` Martin K. Petersen 2020-01-08 1:12 ` Martin K. Petersen 2020-01-08 15:33 ` Guenter Roeck 2020-01-11 20:22 ` Guenter Roeck 2020-01-12 11:17 ` Gabriel C 2020-01-12 11:21 ` Linus Walleij 2020-01-12 12:02 ` Gabriel C 2020-01-12 12:07 ` Linus Walleij 2020-01-12 13:07 ` Guenter Roeck 2020-01-12 13:45 ` Gabriel C 2020-01-12 15:26 ` Guenter Roeck 2020-01-12 18:37 ` Gabriel C 2020-01-12 20:08 ` Guenter Roeck 2020-01-12 22:26 ` Gabriel C 2020-01-14 3:03 ` Martin K. Petersen 2020-01-14 5:20 ` Guenter Roeck 2020-01-16 4:12 ` Martin K. Petersen 2020-01-16 5:09 ` Guenter Roeck 2020-01-16 17:47 ` Guenter Roeck 2020-01-17 1:43 ` Martin K. Petersen 2020-01-17 3:53 ` Guenter Roeck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).