03-19-2011, 05:34 AM
"Fabio M. Di Nitto"

RFC: generic improvement to fence agents api

Hi all,

While discussing support for the Tripp Lite switched PDU on linux-cluster,
it occurred to me that we could almost halve the time it takes to perform
power fencing of certain devices when, for example, more than one PSU
needs to be powered off to complete the action.

Say node X has 2 PSUs.

Currently, the config would look like:

<clusternode .....>
<fence>
<method...>
<device name="..." port="1"/>
<device name="..." port="2"/>
.....

This effectively means spawning the agent (most likely the same one)
twice, which increases the time it takes to fence and may increase the
chance that fencing fails if the second connection fails.

My suggestion would be to allow to specify a list of ports instead.

<clusternode .....>
<fence>
<method...>
<device name="..." ports="1 2"/>
....

Either by using a new keyword "ports" or re-using "port" itself. If
using "port", current configuration will continue to work as-is and the
change effectively would not introduce any backward compatibility issue.

This way the agent can:

1) connect once (reducing in most cases the ssh/telnet/whatever time)
2) issue the OFF command as fast as possible (almost in parallel)
3) then wait for the results.
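
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: FakePDU and its connect(),
power_off() and status() methods are hypothetical stand-ins for a device
session, not the API of any existing fence agent or of the fencing python
lib.

```python
import time


class FakePDU:
    """Hypothetical in-memory stand-in for a switched PDU session."""

    def __init__(self):
        self.connections = 0
        self.state = {}

    def connect(self):
        self.connections += 1

    def power_off(self, port):
        self.state[port] = "off"

    def status(self, port):
        return self.state.get(port, "on")


def fence_ports(pdu, ports, timeout=5.0, poll=0.1):
    """Connect once, send OFF to every port, then wait for the results."""
    pdu.connect()                       # 1) a single login
    for port in ports:                  # 2) OFF commands issued back to back
        pdu.power_off(port)
    deadline = time.monotonic() + timeout
    pending = set(ports)
    while pending:                      # 3) poll until every port reports off
        pending = {p for p in pending if pdu.status(p) != "off"}
        if not pending:
            break
        if time.monotonic() >= deadline:
            return False                # at least one port never went off
        time.sleep(poll)
    return True
```

With two ports, fence_ports(FakePDU(), ["1", "2"]) logs in once instead of
twice, which is where the time saving would come from on connection-based
devices.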

By adopting a list, the configuration would look cleaner too IMHO.

At a quick glance, the change should not affect fenced (David, can you
confirm please?), and most agents could handle it via the fencing python
lib (Marek?).

Does it sound reasonable?

Cheers
Fabio
 
03-19-2011, 04:14 PM
Digimer

On 03/19/2011 02:34 AM, Fabio M. Di Nitto wrote:
> My suggestion would be to allow to specify a list of ports instead.
>
> <clusternode .....>
> <fence>
> <method...>
> <device name="..." ports="1 2"/>
> ....
>
> Either by using a new keyword "ports" or re-using "port" itself. If
> using "port", current configuration will continue to work as-is and the
> change effectively would not introduce any backward compatibility issue.

I like this idea, but would like to suggest:

* Keep 'port' for a single port, as it is, and add 'ports' for multiple
port definitions.
* When using 'ports', I'd recommend comma-separated values and
dash-separated ranges (e.g. ports="1,2", ports="1-4", ports="1,3-5") and
combinations thereof. This strikes me as more "standard" and possibly
less prone to typos.
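
A minimal sketch of how such a value could be expanded, assuming purely
numeric port names. The function name parse_ports is made up for this
example and is not part of any existing agent or library; a named port
that itself contains "," or "-" would be ambiguous under this syntax.

```python
def parse_ports(spec):
    """Expand a spec such as "1,3-5" into ["1", "3", "4", "5"].

    Assumes numeric port names only; anything else raises ValueError.
    """
    ports = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty entry in port spec: %r" % spec)
        if "-" in item:
            first, last = item.split("-", 1)
            start, end = int(first), int(last)
            if start > end:
                raise ValueError("descending range: %r" % item)
            ports.extend(str(n) for n in range(start, end + 1))
        else:
            ports.append(str(int(item)))
    return ports
```

For instance, parse_ports("1,3-5") expands to the four ports 1, 3, 4 and 5.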

--
Digimer
E-Mail: digimer@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org
 
03-19-2011, 04:32 PM
"Fabio M. Di Nitto"

On 3/19/2011 6:14 PM, Digimer wrote:
> I like this idea, but would like to suggest:
>
> * Keep 'port' for a single port, as it is, and add 'ports' for multiple
> port definitions.
> * When using ports, I'd recommend comma-separated values and
> dash-separated ranges (ie: ports="1,2", ports="1-4", ports="1,3-5") and
> combinations there-of. This strikes me as more "standard" and possibly
> less prone to typos.
>

The only thing I have against "," or "-" is that they could easily be
part of a port name already.

Ranges don't make sense to me and they are complex to interpret/implement.
How many machines have you seen around with so many PSUs that they need
a range to avoid a headache? (leaving aside E10K or s390).

Fabio
 
03-19-2011, 05:44 PM
Digimer

On 03/19/2011 01:32 PM, Fabio M. Di Nitto wrote:
> The only thing I have against "," or "-" is that they might be easily
> part of a port name already.
>
> Range doesn't make sense to me and it's complex to interpret/implement.
> How many machines have you seen around with so many PSU's anyway that
> need a range to avoid headache? (leaving aside E10K or s390 ).
>
> Fabio

Lol, I've seen up to four in n-1 setups, but you are right, it's not
common enough to justify increasing complexity, so simple
space-separated numbers are fine. I still argue for "port" vs.
"ports" though.

--
Digimer
E-Mail: digimer@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org
 
03-21-2011, 07:40 AM
Marek Grac

Hi,

On 03/19/2011 07:34 AM, Fabio M. Di Nitto wrote:
> <device name="..." ports="1 2"/>
> ....
>
> Either by using a new keyword "ports" or re-using "port" itself. If
> using "port", current configuration will continue to work as-is and the
> change effectively would not introduce any backward compatibility issue.
>
> This way the agent can:
>
> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
> 2) issue the OFF command as fast as possible (almost in parallel)
> 3) then wait for the results.
>
> By adopting a list, the configuration would look cleaner too IMHO.
>
> A quick glance, the change should not affect fenced (David can you
> confirm please?), and most agents could handle it via the fencing python
> lib (Marek?).


1) Connecting once will only work for connection-based fence agents. It
won't help with SNMP or HTTP REST, and there won't be any benefit for
drac/ilo/ipmi, which can turn off only one machine. A rough estimate is
that it could improve fencing time for 1/3 to 1/2 of the fence agents.


2) Parallelism is possible only on those fence devices that work in
async mode. Issuing more than one command will also increase the need for
QE. Some of those devices are not even able to handle 'get status'
immediately after 'power off' (the reason for --power-wait). Serialization
within the same connection is definitely possible, and for the fencing
python lib we can implement that directly in the library.
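
Serialization within one connection could look roughly like the sketch
below. It is only an illustration: RecordingSession and serialized_off are
hypothetical names, not functions of the fencing python lib, and the
power_wait argument merely stands in for the kind of delay that
--power-wait provides.

```python
import time


class RecordingSession:
    """Hypothetical already-open device session that logs what it is asked."""

    def __init__(self, dead_ports=()):
        self.log = []
        self.dead = set(dead_ports)   # ports that ignore the OFF command
        self.off = set()

    def power_off(self, port):
        self.log.append(("off", port))
        if port not in self.dead:
            self.off.add(port)

    def status(self, port):
        self.log.append(("status", port))
        return "off" if port in self.off else "on"


def serialized_off(session, ports, power_wait=0.0, sleep=time.sleep):
    """Power off ports one after another over a single session.

    Returns the list of ports that did not report "off".
    """
    failed = []
    for port in ports:
        session.power_off(port)
        sleep(power_wait)             # let the device settle before 'get status'
        if session.status(port) != "off":
            failed.append(port)
    return failed
```

Each port is handled completely (off, wait, status) before the next one is
touched, so a device never sees two commands in flight.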


-) "ports" is better than "port" because such a change will also have an
impact on the UI, and we have to distinguish whether a fence agent accepts
more than one port or not.


-) There is no character that can't be used in the name of a virtual machine.

m,
 
03-21-2011, 09:44 AM
"Fabio M. Di Nitto"

On 3/21/2011 9:40 AM, Marek Grac wrote:
> Hi,
>
> 1) connect once will work only for connection-based fence agents. It
> won't help with SNMP + HTTP REST and there won't be any benefits for
> drac/ilo/ipmi that can turn off only one machine. Rough estimate is that
> it can help us to improve time in 1/3 to 1/2 fence agents.

Of course, it's a benefit for a subset of the agents.

>
> 2) parallelism is possible only on those fence devices that works in
> async mode. Issuing more than one command will also increase a need for
> QE. Some of those devices are not able even to handle 'get status'
> immediately after 'power off' (reason for --power-wait). Serialization
> within same connection is definitely possible and for fencing python lib
> we can implement that directly in library.

Assuming we agree to do it, let's get it upstream first, then we will
worry about QE at a later stage.

I think serialization within the same connection is already a good
start. The parallelism is not real anyway. I don't expect forking of
commands, as that would lead to other issues, as you already described.

>
> -) "ports" is better than "port" because such change will have impact
> also on UI and we have to distinguish if fence agent accept more than
> one port or not.

ACK.

>
> -) There is no character that can't be used for name of virtual machine.

I don't think VMs are a problem here, since each VM has only one port?

Fabio
 
03-21-2011, 04:07 PM
David Teigland

On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
> My suggestion would be to allow to specify a list of ports instead.

This comes up now and then. The current rule of one action per agent
execution is a tried and true, fundamental property of the agent api.
It should not be changed IMO. I'll need some time to come up with the
various specific reasons against it, but at least one of them (a big
one) is partial failure/success.

Dave
 
03-21-2011, 04:09 PM
Digimer

On 03/21/2011 01:07 PM, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
>> My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.
>
> Dave

Could it not be set so that anything shy of a complete success is
treated as a failure?
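
As a rough illustration of that rule, assuming the usual convention that
an agent exits with 0 on success and non-zero on failure. The function
name fence_result and the per-port result mapping are made up for this
sketch.

```python
def fence_result(per_port_ok):
    """Collapse per-port outcomes into a single exit code.

    per_port_ok maps a port name to True (confirmed off) or False.
    Returns 0 only when every port succeeded; an empty mapping is a failure.
    """
    if not per_port_ok:
        return 1
    return 0 if all(per_port_ok.values()) else 1
```

Note that the single exit code hides which ports were actually switched
off, so after a partial failure the caller would not know the real state
of the node.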

--
Digimer
E-Mail: digimer@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org
 
03-21-2011, 04:16 PM
"Fabio M. Di Nitto"

On 3/21/2011 6:07 PM, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
>> My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.

No, don't waste your time on it. This is big enough to nullify the benefit.

Indeed, it is something that I didn't consider, and it would make the
recovery matrix for a failure scenario too complex to handle.

Thanks
Fabio
 
03-21-2011, 04:37 PM
Lon Hohberger

On Mon, Mar 21, 2011 at 01:07:02PM -0400, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
> > My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.

All or nothing --

Some devices actually support port grouping; turn off port "Foo" and it
operates on plugs 1,2,3 at the same time.
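
A toy model of such grouping, purely for illustration. GroupingPDU is a
hypothetical class and does not describe the interface of any particular
device. The point is that the agent still performs one action on one
"port" and the device fans it out to the plugs.

```python
class GroupingPDU:
    """Hypothetical PDU where a named port maps to several physical plugs."""

    def __init__(self, groups):
        self.groups = {name: tuple(plugs) for name, plugs in groups.items()}
        self.plug_state = {
            plug: "on" for plugs in self.groups.values() for plug in plugs
        }

    def power_off(self, port):
        # One command from the agent; the device switches every plug in the group.
        for plug in self.groups[port]:
            self.plug_state[plug] = "off"

    def status(self, port):
        # The group only counts as off when all of its plugs are off.
        states = {self.plug_state[plug] for plug in self.groups[port]}
        return "off" if states == {"off"} else "on"
```

With GroupingPDU({"Foo": [1, 2, 3]}), a single power_off("Foo") turns off
plugs 1, 2 and 3 while the one-action-per-execution rule stays intact.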

--
Lon Hohberger - Red Hat, Inc.
 
