AW: AW: [Linux-HA] Apache failover / renaming the binary

Ehlers, Kolja ehlers at clinresearch.com
Thu Jul 3 03:15:27 MDT 2008


Actually, with your fix applied, weird things happen. Now Heartbeat (or I,
manually) can start apache with httpd renamed. Heartbeat reports

apache_2    (ocf::heartbeat:apache):        Stopped

But it's running.

-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org
[mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, 3 July 2008 10:23
To: General Linux-HA mailing list
Subject: Re: AW: [Linux-HA] Apache failover / renaming the binary


Your test case is not exactly the best, but it should still cause a
failover.

Please try the attached patch. I don't know why "start" was excluded at
that place. It does not make sense to me. Maybe someone can explain on the
dev list.

IMHO, what you're doing should not produce what you're seeing, and this
patch should fix it.

Comments please!

Regards
Dominik
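
For readers following the exit-code discussion in this thread: the log
quoted below shows the start failing with rc=6, which is the OCF code
OCF_ERR_CONFIGURED, and the cluster reacts by "Preventing apache_2 from
re-starting anywhere in the cluster". The suggested replacement,
OCF_ERR_INSTALLED (rc=5), signals a problem local to the node instead. The
following is only a minimal sketch of that idea, not the actual code of the
apache agent or of the attached patch; the function name
check_httpd_binary is hypothetical, and only the two numeric return-code
values are taken from the OCF resource agent API.

```shell
#!/bin/sh
# Standard OCF return codes (numeric values as defined by the OCF RA API).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5    # required software is missing on this node
OCF_ERR_CONFIGURED=6   # the resource configuration itself is invalid

# Hypothetical helper, not part of the real apache agent:
# treat a missing or non-executable binary as a node-local problem
# (OCF_ERR_INSTALLED) rather than a configuration error.
check_httpd_binary() {
    httpd_path="$1"
    if [ ! -x "$httpd_path" ]; then
        echo "httpd binary $httpd_path not found or not executable" >&2
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}
```

With a check like this, a node where httpd has been renamed to httpd_
would report rc=5 for the start action. As far as I understand it, that
should rule out only that node and leave the other one eligible.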

Ehlers, Kolja wrote:
> Thanks for the reply, but the problem remains. If apache cannot be
> started/restarted, it is not failed over to the second node. I have two
> identical servers and I want to run the virtual IP + apache (grouped) on
> either one of the nodes. To test the configuration I renamed httpd on
> one node to httpd_, since I am not sure how else to simulate a
> non-starting apache. Either way, when heartbeat is started, the apache
> start fails on www1test and nothing happens after that. I have attached
> my CIB and the logs.
>
> This is what crm_mon gives me:
>
> Refresh in 1s...
>
> ============
> Last updated: Thu Jul  3 09:53:34 2008
> Current DC: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): online
> Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online
>
> Resource Group: group_1
>     IPaddr_192_168_11_25        (ocf::heartbeat:IPaddr):        Started www1test
>     apache_2    (ocf::heartbeat:apache):        Stopped
>
> Failed actions:
>     apache_2_start_0 (node=www1test, call=6, rc=6): complete
>
>
>
> www1test:~ # crm_verify -VVVVL
> crm_verify[8124]: 2008/07/03_09:54:55 info: main: =#=#=#=#= Getting XML =#=#=#=#=
> crm_verify[8124]: 2008/07/03_09:54:55 info: main: Reading XML from: live cluster
> crm_verify[8124]: 2008/07/03_09:54:55 notice: main: Required feature set: 2.0
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'stonith-enabled'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '30' for cluster option 'batch-limit'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'start-failure-is-fatal'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default action timeout: 20s
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default stickiness: 1000000
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default failure stickiness: 0
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: STONITH of failed nodes is disabled
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
> crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www2test is online
> crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www1test is online
> crm_verify[8124]: 2008/07/03_09:54:55 debug: common_apply_stickiness: fail-count-apache_2: INFINITY
> crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=6.
> crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op:   Preventing apache_2 from re-starting anywhere in the cluster
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Processing failed op apache_2_start_0 on www1test: Error
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Compatability handling for failed op apache_2_start_0 on www1test
> crm_verify[8124]: 2008/07/03_09:54:55 notice: group_print: Resource Group: group_1
> crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print:     IPaddr_192_168_11_25    (ocf::heartbeat:IPaddr):        Started www1test
> crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print:     apache_2        (ocf::heartbeat:apache):        Stopped
> crm_verify[8124]: 2008/07/03_09:54:55 debug: group_rsc_location: Processing rsc_location pref_run_apache_group for group_1
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_merge_weights: IPaddr_192_168_11_25: Rolling back scores from apache_2
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: Assigning www1test to IPaddr_192_168_11_25
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: All nodes for resource apache_2 are unavailable, unclean or shutting down
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: native_color: Resource apache_2 cannot run anywhere
> crm_verify[8124]: 2008/07/03_09:54:55 notice: NoRoleChange: Leave resource IPaddr_192_168_11_25 (www1test)
> Warnings found during check: config may not be valid
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cib_native_signoff: Signing out of the CIB Service
>
>
>
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dominik Klein
> Sent: Thursday, 3 July 2008 08:27
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Apache failover / renaming the binary
>
>
> http://hg.linux-ha.org/dev/file/5072025b79b8/resources/OCF/apache
>
> lines 516-518
>
> Another example of how to use exit codes incorrectly.
>
> I'll commit a patch soon.
>
> In your script: Make line 518 look like this (on all nodes!):
> exit $OCF_ERR_INSTALLED
>
> Then clean up the resource or start the cluster from scratch and try
> again. Should fix it.
>
> Regards
> Dominik
>
>
> Ehlers, Kolja wrote:
>> Hello,
>>
>> My simple active/passive cluster seems to work, but when it is running
>> and I do:
>>
>> /opt/apache2/bin/apachectl stop && mv /opt/apache2/bin/httpd /opt/apache2/bin/httpd_
>>
>> Heartbeat does not fail apache over to node2 (Hard error:
>> apache_2_start_0 failed with rc=6.). This is really odd, because the log
>> states "All 2 cluster nodes are eligible to run resources." but then 4
>> lines further on it says "ERROR: unpack_rsc_op:   Preventing apache_2
>> from re-starting anywhere in the cluster". I am using a very simple CIB
>> with one virtual IP and apache grouped. If I stop apache manually,
>> heartbeat does restart apache fine. By the way, can I configure it so
>> that it fails over straight to the other node if apache is stopped or
>> fails? When manually stopping heartbeat, the failover does work.
>>
>> So I am not sure which part of my configuration or logs you need to
>> see. I guess I'm missing something important here.
>>
>> This is my CIB:
>>
>>  <cib admin_epoch="0" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" epoch="38" num_updates="3" cib-last-written="Wed Jul  2 16:16:51 2008" ccm_transition="2" dc_uuid="5e0f97b7-6780-4487-baf9-6c36500b1276">
>>    <configuration>
>>      <crm_config>
>>        <cluster_property_set id="cib-bootstrap-options">
>>          <attributes>
>>            <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>>            <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
>>            <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>>            <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>>            <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
>>          </attributes>
>>        </cluster_property_set>
>>      </crm_config>
>>      <nodes>
>>        <node id="5e0f97b7-6780-4487-baf9-6c36500b1276" uname="www2test" type="normal"/>
>>        <node id="3a325e23-2184-46ed-9e88-42a11f28c2be" uname="www1test" type="normal"/>
>>      </nodes>
>>      <resources>
>>        <group id="group_1">
>>          <primitive class="ocf" id="IPaddr_192_168_11_25" provider="heartbeat" type="IPaddr">
>>            <operations>
>>              <op id="IPaddr_192_168_11_25_mon" interval="5s" name="monitor" timeout="5s"/>
>>            </operations>
>>            <instance_attributes id="IPaddr_192_168_11_25_inst_attr">
>>              <attributes>
>>                <nvpair id="IPaddr_192_168_11_25_attr_0" name="ip" value="192.168.11.25"/>
>>              </attributes>
>>            </instance_attributes>
>>          </primitive>
>>          <primitive class="ocf" id="apache_2" provider="heartbeat" type="apache">
>>            <operations>
>>              <op id="apache_2_mon" interval="5s" name="monitor" timeout="10s"/>
>>            </operations>
>>            <instance_attributes id="apache_2_inst_attr">
>>              <attributes>
>>                <nvpair id="apache_2_attr_0" name="configfile" value="/opt/apache2/conf/httpd.conf"/>
>>              </attributes>
>>            </instance_attributes>
>>            <instance_attributes id="apache_2">
>>              <attributes>
>>                <nvpair id="apache_2-httpd" name="httpd" value="/opt/apache2/bin/httpd"/>
>>              </attributes>
>>            </instance_attributes>
>>          </primitive>
>>        </group>
>>      </resources>
>>      <constraints>
>>        <rsc_location id="run_group1" rsc="group_1">
>>          <rule id="pref_run_apache_group" score="0">
>>            <expression attribute="#uname" operation="eq" value="www1test" id="7667baf9-522d-40ac-a901-195bfe84a3df"/>
>>          </rule>
>>        </rsc_location>
>>      </constraints>
>>    </configuration>
>>  </cib>




