[Linux-HA] Filesystem OCF resource timeout problem

Andrew Beekhof beekhof at gmail.com
Mon Jan 29 02:24:31 MST 2007


On 1/26/07, George H <george.dma at gmail.com> wrote:
> Hi again,
>
> I keep having this recurring problem with heartbeat 2.0.8 (Release
> 2). When I start up heartbeat it is supposed to run drbddisk, then
> mount a file system on the drbd device, and then bring up a virtual IP.
>
> Running drbddisk works, but it keeps failing when running the
> Filesystem script. It actually mounts the filesystem, but it reports a
> timeout error after 5000ms. The mount may take longer than that, but
> I don't know where to set the timeout value.

pattern match what you already have :-)
the start timeout is set with an <op> entry, just like your monitor
timeout; see below...

>
> Is the problem related to the timeout value, or is the configuration
> wrong? I also read in the docs that resources enclosed in a <group>
> tag are executed in order; can I assume that the next resource doesn't
> get executed until the one before it finishes completely?
>
> This is the cib.xml snippet for the resources:
>
>         <group id="group_1">
>          <primitive class="heartbeat" id="drbddisk_1"
> provider="heartbeat" type="drbddisk">
>            <operations>
>              <op id="drbddisk_1_mon" interval="120s" name="monitor" timeout="60s"/>
>            </operations>
>            <instance_attributes id="drbddisk_1_inst_attr">
>              <attributes>
>                <nvpair id="drbddisk_1_attr_1" name="1" value="mirror"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>
>          <primitive class="ocf" id="Filesystem_2" provider="heartbeat" type="Filesystem">
>            <operations>
>              <op id="Filesystem_2_meta" interval="120s" name="meta-data" timeout="120s"/>

uhm, why are you calling the meta-data function every 2 minutes?

>              <op id="Filesystem_2_mon" interval="120s" name="monitor" timeout="60s"/>

Try adding this:
              <op id="Filesystem_2_start" name="start" timeout="60s"/>

You only need to specify the start action when you want something that
isn't the default (i.e., a bigger timeout than everyone else; the
5000ms in your log is that default).

>            </operations>
>            <instance_attributes id="Filesystem_2_inst_attr">
>              <attributes>
>                <nvpair id="Filesystem_2_attr_0" name="device" value="/dev/drbd0"/>
>                <nvpair id="Filesystem_2_attr_1" name="directory" value="/ha"/>
>                <nvpair id="Filesystem_2_attr_2" name="fstype" value="reiserfs"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>      </group>
>
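Putting the two comments above together, the Filesystem_2 primitive would
look something like this. It's a sketch, not tested against 2.0.8: the
meta-data op is simply dropped, the monitor op and instance attributes are
kept exactly as you had them, and the 60s start timeout is only a starting
point you may need to raise if the mount takes longer:

```xml
<primitive class="ocf" id="Filesystem_2" provider="heartbeat" type="Filesystem">
  <operations>
    <!-- periodic monitor, unchanged from the original config -->
    <op id="Filesystem_2_mon" interval="120s" name="monitor" timeout="60s"/>
    <!-- explicit start timeout; without it the 5000ms default applies -->
    <op id="Filesystem_2_start" name="start" timeout="60s"/>
  </operations>
  <instance_attributes id="Filesystem_2_inst_attr">
    <attributes>
      <nvpair id="Filesystem_2_attr_0" name="device" value="/dev/drbd0"/>
      <nvpair id="Filesystem_2_attr_1" name="directory" value="/ha"/>
      <nvpair id="Filesystem_2_attr_2" name="fstype" value="reiserfs"/>
    </attributes>
  </instance_attributes>
</primitive>
```

The start op has no interval because start is a one-shot action, not a
recurring one like monitor.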
> This is the snippet of ha-log that shows the timeout problem every time:
>
> crmd[6188]: 2007/01/26_15:16:54 info: do_lrm_rsc_op: Performing
> op=drbddisk_1_monitor_120000
> key=6:1:38d50eca-554a-4f7e-b4c4-3733cb566939)
> crmd[6188]: 2007/01/26_15:16:54 info: do_lrm_rsc_op: Performing
> op=Filesystem_2_start_0 key=7:1:38d50eca-554a-4f7e-b4c4-3733cb566939)
> cib[6227]: 2007/01/26_15:16:54 info: write_cib_contents: Wrote version
> 0.1.17 of the CIB to disk (digest: 213e7ddb5c1eccedd56321e521ebaa29)
> crmd[6188]: 2007/01/26_15:16:54 info: process_lrm_event: LRM operation
> drbddisk_1_monitor_120000 (call=6, rc=0) complete
> Filesystem[6229][6238]: 2007/01/26_15:16:55 INFO: Running start for
> /dev/drbd0 on /ha
> cib[6184]: 2007/01/26_15:16:55 info: cib_diff_notify: Update (client:
> 6188, call:22): 0.1.17 -> 0.1.18 (ok)
> cib[6248]: 2007/01/26_15:16:55 info: write_cib_contents: Wrote version
> 0.1.18 of the CIB to disk (digest: 5154bae537ed8dcca6b753da36cd9a68)
> lrmd[6185]: 2007/01/26_15:16:59 WARN: on_op_timeout_expired: TIMEOUT:
> operation start[7] on ocf::Filesystem::Filesystem_2 for client 6188,
> its parameters: directory=[/ha] fstype=[reiserfs]
> CRM_meta_op_target_rc=[7] device=[/dev/drbd0] CRM_meta_timeout=[5000]
> crm_feature_set=[1.0.7] .
> crmd[6188]: 2007/01/26_15:16:59 ERROR: process_lrm_event: LRM
> operation Filesystem_2_start_0 (7) Timed Out (timeout=5000ms)
> crmd[6188]: 2007/01/26_15:16:59 info: append_restart_list: Resource
> Filesystem_2 does not support reloads
> crmd[6188]: 2007/01/26_15:17:01 info: do_lrm_rsc_op: Performing
> op=Filesystem_2_stop_0 key=2:2:38d50eca-554a-4f7e-b4c4-3733cb566939)
> cib[6184]: 2007/01/26_15:17:01 info: cib_diff_notify: Update (client:
> 6188, call:23): 0.1.18 -> 0.1.19 (ok)
> cib[6257]: 2007/01/26_15:17:01 info: write_cib_contents: Wrote version
> 0.1.19 of the CIB to disk (digest: 8837e7604b65b22aee0d92155673a49e)
> Filesystem[6258][6264]: 2007/01/26_15:17:01 INFO: Running stop for
> /dev/drbd0 on /ha
> Filesystem[6258][6274]: 2007/01/26_15:17:01 INFO: Trying to unmount /ha
> Filesystem[6258][6280]: 2007/01/26_15:17:01 INFO: unmounted /ha successfully
> crmd[6188]: 2007/01/26_15:17:01 info: process_lrm_event: LRM operation
> Filesystem_2_stop_0 (call=8, rc=0) complete
>
> Thanks for any help I can get.
>
> --
> "Nothing is impossible for the person that doesn't have to do it"
> "The probability of anything happening is in inverse ratio to its desirability"
> "If I were a roman statue, I'd be made alabastard"
> --
> George H
> george.dma at gmail.com

