[Linux-HA] Apache failover / renaming the binary
Andrew Beekhof
beekhof at gmail.com
Thu Jul 3 03:07:07 MDT 2008
On Thu, Jul 3, 2008 at 09:59, Ehlers, Kolja <ehlers at clinresearch.com> wrote:
> thanks for the reply, still the problem remains.
Because you didn't follow his advice.
> Failed actions:
> apache_2_start_0 (node=www1test, call=6, rc=6): complete
Your RA is still returning 6 (OCF_ERR_CONFIGURED) instead of 5
(OCF_ERR_INSTALLED) when the binary is missing.
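As a rough sketch of what such a check could look like (this is not the code from the shipped apache RA; check_httpd_binary is a made-up helper name, and the fallback path is simply the one from your CIB), a missing binary would be reported like this:

```shell
#!/bin/sh
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6   # what the RA currently returns; shown for contrast only

# Hypothetical helper (assumption, not part of the real RA):
# returns OCF_ERR_INSTALLED when the binary is missing or not executable.
check_httpd_binary() {
    httpd_path="$1"
    if [ ! -f "$httpd_path" ] || [ ! -x "$httpd_path" ]; then
        echo "httpd binary missing or not executable: $httpd_path" >&2
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}

# The RA receives the "httpd" parameter from the CIB as OCF_RESKEY_httpd;
# the default below is only the path used in this thread.
rc=0
check_httpd_binary "${OCF_RESKEY_httpd:-/opt/apache2/bin/httpd}" || rc=$?
echo "rc=$rc"
```

With rc=5 the cluster should only rule out the node where the binary is missing, instead of refusing to start apache_2 anywhere as it does for rc=6.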
> If apache cannot be started or restarted, it is not failed over to the second node. I have two identical servers and I want to run the virtual IP + apache (grouped) on either one of the nodes. To test the configuration I renamed httpd on one node to httpd_, since I am not sure how else to simulate an apache that fails to start. Either way, when heartbeat is started, the apache start fails on www1test and then nothing happens. I have attached my CIB and the logs.
>
> This is what crm_mon gives me:
>
> Refresh in 1s...
>
> ============
> Last updated: Thu Jul 3 09:53:34 2008
> Current DC: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): online
> Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online
>
> Resource Group: group_1
> IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
> apache_2 (ocf::heartbeat:apache): Stopped
>
> Failed actions:
> apache_2_start_0 (node=www1test, call=6, rc=6): complete
>
>
>
> www1test:~ # crm_verify -VVVVL
> crm_verify[8124]: 2008/07/03_09:54:55 info: main: =#=#=#=#= Getting XML =#=#=#=#=
> crm_verify[8124]: 2008/07/03_09:54:55 info: main: Reading XML from: live cluster
> crm_verify[8124]: 2008/07/03_09:54:55 notice: main: Required feature set: 2.0
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'stonith-enabled'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '30' for cluster option 'batch-limit'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'start-failure-is-fatal'
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default action timeout: 20s
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default stickiness: 1000000
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default failure stickiness: 0
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: STONITH of failed nodes is disabled
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
> crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
> crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www2test is online
> crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www1test is online
> crm_verify[8124]: 2008/07/03_09:54:55 debug: common_apply_stickiness: fail-count-apache_2: INFINITY
> crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=6.
> crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Preventing apache_2 from re-starting anywhere in the cluster
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Processing failed op apache_2_start_0 on www1test: Error
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Compatability handling for failed op apache_2_start_0 on www1test
> crm_verify[8124]: 2008/07/03_09:54:55 notice: group_print: Resource Group: group_1
> crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print: IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
> crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print: apache_2 (ocf::heartbeat:apache): Stopped
> crm_verify[8124]: 2008/07/03_09:54:55 debug: group_rsc_location: Processing rsc_location pref_run_apache_group for group_1
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_merge_weights: IPaddr_192_168_11_25: Rolling back scores from apache_2
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: Assigning www1test to IPaddr_192_168_11_25
> crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: All nodes for resource apache_2 are unavailable, unclean or shutting down
> crm_verify[8124]: 2008/07/03_09:54:55 WARN: native_color: Resource apache_2 cannot run anywhere
> crm_verify[8124]: 2008/07/03_09:54:55 notice: NoRoleChange: Leave resource IPaddr_192_168_11_25 (www1test)
> Warnings found during check: config may not be valid
> crm_verify[8124]: 2008/07/03_09:54:55 debug: cib_native_signoff: Signing out of the CIB Service
>
>
>
> -----Original Message-----
> From: linux-ha-bounces at lists.linux-ha.org
> [mailto:linux-ha-bounces at lists.linux-ha.org] On behalf of Dominik Klein
> Sent: Thursday, 3 July 2008 08:27
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Apache failover / renaming the binary
>
>
> http://hg.linux-ha.org/dev/file/5072025b79b8/resources/OCF/apache
>
> lines 516-518
>
> Another example of how to use exit codes incorrectly.
>
> I'll commit a patch soon.
>
> In your script: Make line 518 look like this (on all nodes!):
> exit $OCF_ERR_INSTALLED
>
> Then cleanup the resource or start the cluster from scratch and try
> again. Should fix it.
>
> Regards
> Dominik
>
>
> Ehlers, Kolja wrote:
>> Hello,
>>
>> My simple active/passive cluster seems to work, but when it is running and I do:
>>
>> /opt/apache2/bin/apachectl stop && mv /opt/apache2/bin/httpd /opt/apache2/bin/httpd_
>>
>> Heartbeat does not fail over apache to node2 (Hard error: apache_2_start_0 failed with rc=6.). This is really odd because the log states "All 2 cluster nodes are eligible to run resources." but then 4 lines further it says "ERROR: unpack_rsc_op: Preventing apache_2 from re-starting anywhere in the cluster". I am using a very simple CIB with one virtual IP and apache grouped. If I stop apache manually, heartbeat does restart apache fine. By the way, can I configure it so that it fails over directly to the other node if apache is stopped or fails? When I stop heartbeat manually, the failover does work.
>>
>> So I am not sure which part of my configuration or logs you need to see. I guess I'm missing something important here.
>>
>> This is my cib
>>
>> <cib admin_epoch="0" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" epoch="38" num_updates="3" cib-last-written="Wed Jul 2 16:16:51 2008" ccm_transition="2" dc_uuid="5e0f97b7-6780-4487-baf9-6c36500b1276">
>> <configuration>
>> <crm_config>
>> <cluster_property_set id="cib-bootstrap-options">
>> <attributes>
>> <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
>> <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>> <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
>> </attributes>
>> </cluster_property_set>
>> </crm_config>
>> <nodes>
>> <node id="5e0f97b7-6780-4487-baf9-6c36500b1276" uname="www2test" type="normal"/>
>> <node id="3a325e23-2184-46ed-9e88-42a11f28c2be" uname="www1test" type="normal"/>
>> </nodes>
>> <resources>
>> <group id="group_1">
>> <primitive class="ocf" id="IPaddr_192_168_11_25" provider="heartbeat" type="IPaddr">
>> <operations>
>> <op id="IPaddr_192_168_11_25_mon" interval="5s" name="monitor" timeout="5s"/>
>> </operations>
>> <instance_attributes id="IPaddr_192_168_11_25_inst_attr">
>> <attributes>
>> <nvpair id="IPaddr_192_168_11_25_attr_0" name="ip" value="192.168.11.25"/>
>> </attributes>
>> </instance_attributes>
>> </primitive>
>> <primitive class="ocf" id="apache_2" provider="heartbeat" type="apache">
>> <operations>
>> <op id="apache_2_mon" interval="5s" name="monitor" timeout="10s"/>
>> </operations>
>> <instance_attributes id="apache_2_inst_attr">
>> <attributes>
>> <nvpair id="apache_2_attr_0" name="configfile" value="/opt/apache2/conf/httpd.conf"/>
>> </attributes>
>> </instance_attributes>
>> <instance_attributes id="apache_2">
>> <attributes>
>> <nvpair id="apache_2-httpd" name="httpd" value="/opt/apache2/bin/httpd"/>
>> </attributes>
>> </instance_attributes>
>> </primitive>
>> </group>
>> </resources>
>> <constraints>
>> <rsc_location id="run_group1" rsc="group_1">
>> <rule id="pref_run_apache_group" score="0">
>> <expression attribute="#uname" operation="eq" value="www1test" id="7667baf9-522d-40ac-a901-195bfe84a3df"/>
>> </rule>
>> </rsc_location>
>> </constraints>
>> </configuration>
>> </cib>
>>
>> Management: Dr. Michael Fischer, Reinhard Eisebitt
>> Amtsgericht Köln HRB 32356
>> Tax No.: 217/5717/0536
>> VAT ID No.: DE 204051920
>> --
>> This email transmission and any documents, files or previous email
>> messages attached to it may contain information that is confidential or
>> legally privileged. If you are not the intended recipient or a person
>> responsible for delivering this transmission to the intended recipient,
>> you are hereby notified that any disclosure, copying, printing,
>> distribution or use of this transmission is strictly prohibited. If you
>> have received this transmission in error, please immediately notify the
>> sender by telephone or return email and delete the original transmission
>> and its attachments without reading or saving in any manner.
>>
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA at lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
>
>
> --
>
> IN-telegence GmbH & Co. KG
> Oskar-Jäger-Str. 125
> 50825 Köln
>
> Registration court: Köln - HRA 14064, VAT ID No. DE 194 156 373
> General partner: komware Unternehmensverwaltungsgesellschaft mbH,
> Registration court: Köln - HRB 38396
> Managing partners: Christian Plätke and Holger Jansen