03-12-2012, 09:28 PM
unix syzadmin

Does redhat linux log all hardware events/issues/error in /var/log/mcelog?

Hi,

We run Red Hat Linux on Intel hardware (mostly Dell, lately Dell R710s).
We want to catch any hardware issues as they occur so that we can act on
them as quickly as possible.

My understanding is that all hardware events/issues/errors are logged in
/var/log/mcelog (Machine Check Events log). Is this correct? I can't stress
this enough: does it log all hardware issues
(CPU, memory, disk, Ethernet, fibre/HBA, etc.)?

Thanks,
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
 
03-13-2012, 12:00 AM
Paul Tader

I've used mcelog to catch some CPU events, but I think you might want to
check out Dell's OpenManage client. It reports and monitors a lot more
information.



http://linux.dell.com/wiki/index.php/Repository/OMSA


To install:

# wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash

# yum install srvadmin-base
# yum install srvadmin-storageservices

(log out and back in for the environment variables to take effect)

# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
...

# omreport chassis
Health

Main System Chassis

SEVERITY : COMPONENT
Ok : Fans
Ok : Intrusion
Ok : Memory
Ok : Power Supplies
Ok : Processors
Ok : Temperatures
Ok : Voltages
Ok : Hardware Log
Ok : Batteries

# omreport chassis temps
Temperature Probes Information

------------------------------------
Main System Chassis Temperatures: Ok
------------------------------------

Index : 0
Status : Ok
Probe Name : System Board Ambient Temp
Reading : 20.0 C
Minimum Warning Threshold : 8.0 C
Maximum Warning Threshold : 42.0 C
Minimum Failure Threshold : 3.0 C
Maximum Failure Threshold : 47.0 C

# omreport storage pdisk controller=0

List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)

Controller SAS 6/iR Integrated (Embedded)
ID : 0:0:0
Status : Ok
Name : Physical Disk 0:0:0
State : Online
Failure Predicted : No
Certified : Not Applicable
Encryption Capable : No
Secured : Not Applicable
Progress : Not Applicable
Bus Protocol : SAS
Media : HDD
Capacity : 67.75 GB (72746008576 bytes)
Used RAID Disk Space : 67.75 GB (72746008576 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : ST973402SS
Revision : S229

<snip>

You get the idea.
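
Since the goal in this thread is to catch problems as they occur, the chassis summary above can be checked from a script rather than read by eye. The following is only a sketch: it assumes the "SEVERITY : COMPONENT" layout shown in the sample output, and the function name chassis_problems is made up for illustration.

```shell
# Print every component whose severity is not "Ok" in `omreport chassis`
# style output read from stdin. Returns 1 if any such component is found.
# Assumes the "SEVERITY : COMPONENT" table layout shown in the post above.
chassis_problems() {
    awk -F' *: *' '
        { sub(/^[ \t]+/, "") }                # tolerate indented output
        /^SEVERITY/ { in_table = 1; next }    # rows follow this header line
        in_table && NF == 2 && $1 != "Ok" {
            print $2 ": " $1                  # e.g. "<component>: <severity>"
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Intended use on a host with OMSA installed (not executed here):
#   omreport chassis | chassis_problems || echo "hardware needs attention"
```

Run from cron or a monitoring agent, the non-zero exit status is what would trigger an alert; the exact severity strings OMSA prints for failing components are not shown in this thread, so the sketch treats anything other than Ok as a problem.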

 
03-13-2012, 12:36 PM
unix syzadmin

Thanks.
I have downloaded and installed OpenManage from Dell.
The following commands report whether the system components are healthy:
omreport chassis - health of all main components of the system chassis
omreport chassis processors - CPU health
omreport chassis memory - memory health
omreport chassis pwrsupplies - power supply health
omreport storage controller - RAID controller health

However, this leaves out the integrated NIC ports and the HBA adapters.
Which Linux or Dell OpenManage commands can be used to confirm that those
are healthy as well?

Thanks,


 
03-13-2012, 07:59 PM
Grzegorz Witkowski

Hi guys,

You can use OMSA (OpenManage Server Administrator) to monitor a particular
system. It is installed locally on the server.
You can set up SNMP alerts (traps) from it, for example.
To monitor many systems from a central management system, you can now use
ITA (OpenManage IT Assistant) or OME (OpenManage Essentials). There is also
DMC (Dell Management Console).

When you download OMSA and extract the tarball, you will find an
installation script in it, along with a folder containing all the required
RPMs. I recommend installing all of them to get every feature. Off the top
of my head, you will need compat-libstdc++-33 and gcc installed, and on
x86_64 you will need the 32-bit versions of those. There is no problem
having them installed alongside the 64-bit libraries.
In iptables (or your firewall of choice), you need to have port 1311 open;
you then access OMSA at https://<host>:1311 as user root with its
password.

Before you download or install anything, check the compatibility
matrix. Here is the one for version 7.0:
http://support.dell.com/support/edocs/software/smsom/7.0/en/index.htm

OpenManage Software (all versions):
http://support.dell.com/support/edocs/software/smsom/

Detailed OMSA documentation:
http://support.dell.com/support/edocs/software/svradmin/

If you have a ProSupport contract with your system, do not hesitate to
contact Dell Support for assistance.

Hope this helps.

Regards,
Ges
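
For the firewall step mentioned above, a rule along these lines is one way to open the OMSA web port with iptables. This is a sketch, not a tested recipe for any particular host: port 1311 comes from the post, while the INPUT chain, inserting at the top of it, and the helper name omsa_firewall_rule are assumptions.

```shell
# Build the iptables command that would allow inbound TCP to the OMSA web UI.
# Port 1311 is from the post above; the INPUT chain and inserting the rule at
# the top of it are assumptions about the local firewall layout.
omsa_firewall_rule() {
    port="${1:-1311}"
    echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
}

# Dry run: print the rule for review instead of applying it.
omsa_firewall_rule

# To apply it, run the printed command as root. On RHEL 5/6,
# `service iptables save` then persists the running rules across reboots.
```

Printing the command first keeps the change reviewable; restricting the rule to a management subnet with -s would be a sensible tightening where one exists.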
 
03-13-2012, 08:08 PM
Grzegorz Witkowski

Have you installed all OMSA components?
See the CLI User's Guide:
http://support.dell.com/support/edocs/software/svradmin/7.0/en/index.htm
http://support.dell.com/support/edocs/software/svradmin/7.0/en/CLI/PDF/CLIUG.pdf
Have you tried opening port 1311 and connecting to https://<server>:1311
to check whether OMSA is showing everything?

You can try:

omreport system summary
or
omreport servermodule summary
or
omreport chassis nics index=n
or
omreport mainsystem nics index=n
or
omreport chassis nics config=team index=n
or
omreport mainsystem nics config=team index=n

Regards,
Ges
 
03-14-2012, 12:51 AM
ajay raghuraj

Hi

Try ethtool, or peek into /proc, to check the network interface stats. For
the HBAs, you could use the vendor's software, installed on the OS to
support the hardware.

Example: if the HBA is from Emulex, you could use the HBAnyware CLI
commands to run a health check.

- Ajay
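
As a sketch of the "peek into /proc" suggestion, the function below lists interfaces whose receive or transmit error counters in /proc/net/dev are non-zero. The function name nic_errors is made up for illustration, and it assumes the usual layout of that file: two header lines, then one line per interface with eight receive counters followed by eight transmit counters.

```shell
# List interfaces whose RX or TX error counters in /proc/net/dev are non-zero.
# An optional file argument lets it read a saved copy instead of the live file.
nic_errors() {
    awk '
        NR <= 2 { next }          # skip the two header lines
        {
            sub(/:/, " ")         # "eth0:123" -> "eth0 123", so fields line up
            # $1 = name, $2..$9 = RX counters, $10..$17 = TX counters
            rx_errs = $4; tx_errs = $12
            if (rx_errs + 0 > 0 || tx_errs + 0 > 0)
                print $1 " rx_errs=" rx_errs " tx_errs=" tx_errs
        }
    ' "${1:-/proc/net/dev}"
}

# Intended use (not executed here):
#   nic_errors
#   ethtool -S eth0    # driver-level counters for one interface
```

These counters are cumulative since boot, so a monitoring check would normally compare successive readings rather than alert on any non-zero value.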
 
03-14-2012, 12:55 AM
ajay raghuraj

Also, if you are using RHEL 5 / 6, you might want to check under /sys or use
the systool command to extract enough info about the HBAs.

- Ajay
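
To illustrate the /sys suggestion for Fibre Channel HBAs, here is a minimal sketch that prints the port state of each FC host. It assumes the HBA driver registers its ports under /sys/class/fc_host, which is the usual location but not something confirmed in this thread; the function name fc_port_states is hypothetical.

```shell
# Report the port state of each Fibre Channel HBA port found under sysfs.
# The base directory is a parameter so the function can be pointed at a copy;
# /sys/class/fc_host is assumed to be where the FC driver exposes its hosts.
fc_port_states() {
    base="${1:-/sys/class/fc_host}"
    for host in "$base"/host*; do
        # Skip when no FC hosts exist or the attribute is missing.
        [ -r "$host/port_state" ] || continue
        echo "$(basename "$host"): $(cat "$host/port_state")"
    done
}

# Intended use (not executed here):
#   fc_port_states
#   systool -c fc_host -v    # the same attributes, via sysfsutils
```

A port that reports anything other than Online would be the one to investigate with the HBA vendor's own tools.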