Memory Controller Error Messages [Notes]
Published: 2019-05-25


Reference 1 (errors in the system log):

[root@hh-yun-compute-130125 ~]# cat /var/log/messages | grep -i error
Mar  1 04:58:05 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 04:58:06 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x16113a9000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 10:27:08 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 10:27:09 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x15e1c49000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 13:52:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 13:52:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x160e949000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a61000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a79000 => socket=1, Channel=2(mask=4), rank=0
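The log lines above can be tallied and partly decoded with a small shell sketch. `count_ce_by_label` and `decode_mc_errcode` are hypothetical helper names of my own. The decoding assumes that the right-hand half of `Err=0008:00c2` is the architectural MCA error code and that it follows the memory-controller compound format from the Intel SDM (`000F 0000 1MMM CCCC`, where MMM is the transaction type and CCCC the channel); the left-hand `0008` is model-specific and is not decoded here. Under that assumption `0x00c2` decodes to a scrubbing error on channel 2, which agrees with the `memory scrubbing` and `ch=2` text in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the EDAC log lines shown above (bash).

# Count "EDAC MCn: CE ..." lines per DIMM label.
# Reads the files given as arguments, or stdin if none are given.
count_ce_by_label() {
    sed -n 's/.*EDAC MC[0-9]*: CE .*label "\([^"]*\)".*/\1/p' "$@" \
        | sort | uniq -c
}

# Decode a 16-bit MCA error code, ASSUMING the Intel SDM
# memory-controller compound format 000F 0000 1MMM CCCC.
decode_mc_errcode() {
    local code=$(( $1 ))   # accepts hex input such as 0x00c2
    # Every bit except F, MMM and CCCC is fixed: must read 0000 0000 1xxx xxxx.
    if (( (code & 0xEF80) != 0x0080 )); then
        echo "not a memory controller error code: $1"
        return 1
    fi
    local types=("generic undefined request" "memory read" "memory write"
                 "address/command" "memory scrubbing")
    local mmm=$(( (code >> 4) & 0x7 ))   # transaction type
    local ch=$(( code & 0xF ))           # channel number; 0xF = not specified
    local type=${types[mmm]:-reserved}   # MMM values 101-111 are reserved
    if (( ch == 0xF )); then
        ch="unspecified"
    fi
    echo "type=$type channel=$ch"
}
```

For example, `count_ce_by_label /var/log/messages` prints one count per DIMM label, and `decode_mc_errcode 0x00c2` prints `type=memory scrubbing channel=2`.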

Reference 2 (EDAC error counters in sysfs):

[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc?/ce*count
0
0
8
0
[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc1/ce_count
8
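The glob in the first command most likely matched four files (`ce_count` and `ce_noinfo_count` under both `mc0` and `mc1`), which is why it printed four bare numbers. A sketch that prints each file name next to its value makes the output unambiguous. `show_edac_counts` is a hypothetical name, and the optional argument exists only so the function can be pointed at a test directory instead of the real sysfs path.

```shell
#!/usr/bin/env bash
# Print every EDAC controller CE counter as "mcN/<file>: <value>".
# Optional $1: base directory (defaults to the real sysfs path).
show_edac_counts() {
    local base=${1:-/sys/devices/system/edac/mc}
    local f
    for f in "$base"/mc*/ce*count; do
        [ -r "$f" ] || continue   # skip if the glob matched nothing
        # Strip the base prefix so only "mcN/<file>" is shown.
        printf '%s: %s\n' "${f#"$base"/}" "$(cat "$f")"
    done
}
```

On the host above this should list `mc1/ce_count` with the value 8 seen in the second command.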

Module information

[root@hh-yun-compute-130125 ~]# modinfo sb_edac
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/sb_edac.ko
description:    MC Driver for Intel Sandy Bridge and Ivy Bridge memory controllers -  Ver: 1.1.0
author:         Red Hat Inc. (http://www.redhat.com)
author:         Mauro Carvalho Chehab
license:        GPL
srcversion:     01CFEEBE911D55B6FE660BE
alias:          pci:v00008086d00002FA0sv*sd*bc*sc*i*
alias:          pci:v00008086d00000EA8sv*sd*bc*sc*i*
alias:          pci:v00008086d00003CA8sv*sd*bc*sc*i*
depends:        edac_core
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           edac_op_state:EDAC Error Reporting state: 0=Poll,1=NMI (int)
[root@hh-yun-compute-130125 ~]# modinfo edac_core
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/edac_core.ko
description:    Core library routines for EDAC reporting
author:         Doug Thompson www.softwarebitmaker.com, et al
license:        GPL
srcversion:     C21E296292A2174839A086C
depends:
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           check_pci_errors:Check for PCI bus parity errors: 0=off 1=on (int)
parm:           edac_pci_panic_on_pe:Panic on PCI Bus Parity error: 0=off 1=on (int)
parm:           edac_mc_panic_on_ue:Panic on uncorrected error: 0=off 1=on (int)
parm:           edac_mc_log_ue:Log uncorrectable error to console: 0=off 1=on (int)
parm:           edac_mc_log_ce:Log correctable error to console: 0=off 1=on (int)
parm:           edac_mc_poll_msec:Polling period in milliseconds
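`modinfo` only lists the parameters a module accepts, not the values currently in effect. One way to see those is to read `/sys/module/<module>/parameters/`, assuming the module exposed the parameter there (only parameters registered with read permission appear). `show_module_params` is a hypothetical helper; the optional second argument is there only for testing.

```shell
#!/usr/bin/env bash
# Print "name=value" for each parameter a loaded module exposes in sysfs.
# $1: module name; optional $2: base directory (defaults to /sys/module).
show_module_params() {
    local mod=$1
    local base=${2:-/sys/module}
    local p
    for p in "$base/$mod"/parameters/*; do
        [ -r "$p" ] || continue   # not readable, or no parameters exposed
        printf '%s=%s\n' "${p##*/}" "$(cat "$p")"
    done
}
```

Usage: `show_module_params edac_core` and `show_module_params sb_edac`.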

Official explanation (from the kernel's EDAC documentation):

Total Correctable Errors count attribute file:
'ce_count'
This attribute file displays the total count of correctable errors that have occurred on this csrow. This count is very important to examine. CEs provide early indications that a DIMM is beginning to fail. This count field should be monitored for non-zero values and report such information to the system administrator.
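Following that advice, a minimal per-csrow check could look like the sketch below. It assumes the `mcN/csrowM/ce_count` layout the documentation describes (newer kernels may organize the counters differently), and `check_csrow_ce` with its threshold argument is my own illustration, not part of EDAC. It prints every csrow whose count exceeds the threshold and returns non-zero, so it could be called from cron or a monitoring agent.

```shell
#!/usr/bin/env bash
# Warn about csrows whose correctable-error count exceeds a threshold.
# Optional $1: base directory (default: real sysfs path).
# Optional $2: threshold (default 0, i.e. report any non-zero count).
# Returns 1 if at least one csrow was reported, 0 otherwise.
check_csrow_ce() {
    local base=${1:-/sys/devices/system/edac/mc}
    local threshold=${2:-0}
    local f n status=0
    for f in "$base"/mc*/csrow*/ce_count; do
        [ -r "$f" ] || continue   # skip if the glob matched nothing
        n=$(cat "$f")
        if (( n > threshold )); then
            echo "WARNING: ${f#"$base"/} = $n"
            status=1
        fi
    done
    return $status
}
```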

Enable mcelog

[root@hh-yun-compute-130125 ~]# service mcelogd restart
Stopping mcelog                                     [  OK  ]
Starting mcelog daemon                              [  OK  ]
[root@hh-yun-compute-130125 ~]# mcelog
mcelog: Family 6 Model 3e CPU: only decoding architectural errors
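mcelog reports the CPU as `Family 6 Model 3e`, with the model in hex, while `/proc/cpuinfo` shows it in decimal. To compare the two, the sketch below prints the first CPU's family and model in mcelog's form; `cpu_family_model` is a hypothetical helper. Model 0x3e is, to my knowledge, an Ivy Bridge server part, which would be consistent with the Ivy Bridge support listed in the `sb_edac` description.

```shell
#!/usr/bin/env bash
# Print "Family <f> Model <m>" with the model in hex, as mcelog shows it.
# Optional $1: cpuinfo file (defaults to /proc/cpuinfo, which uses decimal).
cpu_family_model() {
    local cpuinfo=${1:-/proc/cpuinfo}
    local family model
    # Take the first CPU's values; lines look like "cpu family<TAB>: 6".
    family=$(awk -F': *' '/^cpu family/ {print $2; exit}' "$cpuinfo")
    # Match "model<TAB><TAB>: 62" but not "model name<TAB>: ...".
    model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' "$cpuinfo")
    printf 'Family %d Model %x\n' "$family" "$model"
}
```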

Check the log

[root@hh-yun-compute-130125 ~]# tail /var/log/mcelog
mcelog: failed to prefill DIMM database from DMI data
mcelog: mcelog server already running

Assessment

This is a harmless warning message. The DIMM database prefill relies on a specific non-standard format of the DIMMs in the DMI BIOS tables. If this format is not used by the BIOS, mcelog will only discover DIMMs as they get their first error (if the CPU reports DIMMs in machine check errors). Please understand for the most part, mcelog should be ignored.
So in the end I decided to ignore this warning.
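If this warning is to be ignored, it helps to filter out exactly that line and nothing else, so that any real machine-check report in `/var/log/mcelog` stays visible. A minimal sketch; `filter_mcelog_noise` is a hypothetical name, and it deliberately leaves every other line, including `mcelog server already running`, untouched.

```shell
#!/usr/bin/env bash
# Print mcelog log lines except the prefill warning assessed as harmless.
# Reads the files given as arguments, or stdin if none are given.
filter_mcelog_noise() {
    # -F: match the warning as a fixed string; -v: keep all other lines.
    grep -v -F 'failed to prefill DIMM database from DMI data' "$@"
}
```

Usage: `filter_mcelog_noise /var/log/mcelog | tail`.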



Reposted from: http://nonni.baihongyu.com/
