Quantcast
Channel: Hacker's ramblings - Hardware
Viewing all articles
Browse latest Browse all 142

Nuvoton NCT6796D lm_sensors output

$
0
0

LM-Sensors is set of libraries and tools for accessing your Linux server's motheboard sensors. See more @ https://github.com/lm-sensors/lm-sensors.

If you're ever wondered why in Windows it is tricky to get readings from your CPU-fan rotation speed or core temperatures from you fancy GPU without manufacturer utilities. Obviously vendors do provide all the possible readings in their utilities, but people who would want to read, record and store the data for their own purposes, things get hairy. Nothing generic exists and for unknown reason, such API isn't even planned.

In Linux, The One toolkit to use is LM-Sensors. On kernel side, there exists The Linux Hardware Monitoring kernel API. For this stack to work, you also need a kernel module specific to your motherboard providing the requested sensor information via this beautiful API. It's also worth noting, your PC's hardware will have multiple sensors data providers. An incomplete list would include: motherboard, CPU, GPU, SSD, PSU, etc.

Now that sensors-detect found all your sensors, confirm sensors will output what you'd expect it to. In my case there was a major malfunction. On a boot, following thing happened when system started sensord (in case you didn't know, kernel-stuff can be read via dmesg):

systemd[1]: Starting lm_sensors.service - Hardware Monitoring Sensors...
kernel: nct6775: Enabling hardware monitor logical device mappings.
kernel: nct6775: Found NCT6793D or compatible chip at 0x2e:0x290
kernel: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (_GPE.HWM) (20221020/utaddress-204)
kernel: ACPI: OSL: Resource conflict; ACPI support missing from driver?
systemd[1]: Finished lm_sensors.service - Hardware Monitoring Sensors.

This conflict resulted in no available mobo readings! NVMe, GPU and CPU-cores were ok, the part I was mostly looking for was fan RPMs and mobo temps just to verify my system health. No such joy. :-( Uff.

It seems, this particular Linux kernel module has issues. Or another way to state it: mobo manufacturers have trouble implementing Nuvoton chip into their mobos. On Gentoo forums, there is a helpful thread: [solved] nct6775 monitoring driver conflicts with ACPI
Disclaimer: For ROG Maximus X Code -mobo adding acpi_enforce_resources=no into kernel parameters is the correct solution. Results will vary depending on what mobo you have.

Such ACPI-setting can be permanently enforced by first querying about the Linux kernel version being used (I run a Fedora): grubby --info=$(grubby --default-index). The resulting kernel version can be updated by: grubby --args="acpi_enforce_resources=no" --update-kernel /boot/vmlinuz-6.3.8-200.fc38.x86_64. A reboot shows fix in effect, ACPI Warning is gone and mobo sensor data can be seen.

As a next step you'll need userland tooling to interpret the raw data into human-readable information with semantics. A new years back, I wrote about Improving Nuvoton NCT6776 lm_sensors output. It's mainly about bridging the flow of zeros and ones into something having meaning to humans. This is my LM-Sensors configuration for ROG Maximus X Code:

chip "nct6793-isa-0290"

    # 1. voltages
    ignore in0
    ignore in1
    ignore in2
    ignore in3
    ignore in4
    ignore in5
    ignore in6
    ignore in7
    ignore in8
    ignore in9
    ignore in10
    ignore in11

    label in12 "Cold Bug Killer"
    set in12_min 0.936
    set in12_max 2.613
    set in12_beep 1

    label in13 "DMI"
    set in13_min 0.550
    set in13_max 2.016
    set in13_beep 1

    ignore in14

    # 2. fans
    label fan1 "Chassis fan1"
    label fan2 "CPU fan"
    ignore fan3
    ignore fan4
    label fan5 "Ext fan?"

    # 3. temperatures
    label temp1 "MoBo"

    label temp2 "CPU"
    set temp2_max 90
    set temp2_beep 1

    ignore temp3
    ignore temp5
    ignore temp6
    ignore temp9
    ignore temp10
    ignore temp13
    ignore temp14
    ignore temp15
    ignore temp16
    ignore temp17
    ignore temp18

    # 4. other
    set beep_enable 1
    ignore intrusion0
    ignore intrusion1

I'd like to credit Mr. Peter Sulyok on his work about ASRock Z390 Taichi. This mobo happens to use the same Nuvoton NCT6796D -chip for LPC/eSPI SI/O (I have no idea what those acronyms are for, I just copy/pasted them from the chip data sheet). The configuration is in GitHub for everybody to see: https://github.com/petersulyok/asrock_z390_taichi

Also, I''d like to state my ignorance. After reading less than 500 pages of the NCT6796D data sheet, I have no idea what is:

  • Cold Bug Killer voltage
  • DMI voltage
  • AUXTIN1 is or exactly what temperature measurement it serves
  • PECI Agent 0 temperature
  • PECI Agent 0 Calibration temperature

Remember, I did mention semantics. From sensors-command output I can read a reading, what it translates into, no idea! Luckily there are some of the readings seen are easy to understand and interpret. As an example, fan RPMs are really easy to measure by removing the fan from its connector. Here is an excerpt from my mobo manual to explain fan-connectors:

As data quality is taken care of and output is meaningful, next step is to start recording data. In LM-Sensors, there is sensord for that. It is a system service taking a snapshot (you can define the frequency) and storing it for later use. I'll enrich the stored data points with system load averages, this enables me to estimate a relation with high temperatures and/or fan RPMs with how much my system is working.

Finally, all data gathered into a RRDtool database can be easily visualized with rrdcgi into HTML + set of PNG-images to present a web page like this:

Nice!


Viewing all articles
Browse latest Browse all 142

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>