[Linux-HA] OCFS2 - Memory hog?
José Costa
meetra at gmail.com
Thu Feb 15 16:38:20 MST 2007
On the ocfs2 page:
OCFS2 1.2.4-2 patches have been incorporated into the SLES10 SP1 code tree.
and
r2981 fs - ocfs2_link() journal credits update
is included in http://ftp.suse.com/pub/projects/kernel/kotd/sle10-sp-i386/SLES10_SP1_BRANCH/kernel-smp.abuild-extra.tar.gz:
- patches.suse/ocfs2-1.2-svn-r2981.diff: ocfs2: ocfs2_link()
journal credits update.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=815
http://oss.oracle.com/bugzilla/show_bug.cgi?id=774
Maybe use the bigsmp kernel?
On 2/15/07, John Lange <john.lange at open-it.ca> wrote:
> As a quick follow-up to my own posting: I have the
> 2.6.16.37-SLES10_SP1_BRANCH_20070213192756-smp kernel running and I also
> found:
>
> ocfs2-tools-1.2.2-20.i586.rpm
>
> a package from OpenSUSE factory (10.3).
>
> Neither of these things has solved the problem as memory continues to
> climb out of control as soon as the nfs clients start accessing the
> ocfs2 file system.
>
> As a workaround, I'm wondering if anyone has an opinion on whether it's safe
> to do this:
>
> sync ; echo 3 > /proc/sys/vm/drop_caches
> sync ; echo 0 > /proc/sys/vm/drop_caches
>
> on a cron every 15 minutes?
>
> This might keep this thing alive until we can figure out how to patch
> ocfs to 1.2.4. The alternative is to stop nfs and remount ocfs2 every 15
> minutes but that causes us major grief as it knocks the clients offline.
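A minimal sketch of the 15-minute cron workaround, assuming it runs as root from the system crontab. The script path, the `--run` switch and the function name `drop_clean_caches` are hypothetical; the optional path argument exists only so the function can be tried against a scratch file. Writing to /proc/sys/vm/drop_caches is a one-shot request rather than a persistent setting, so the follow-up `echo 0` should not be needed. The operation is non-destructive (only clean, unused cache is discarded), but expect extra disk I/O while the caches refill, and it would not give back memory that is genuinely leaked.

```shell
#!/bin/sh
# Hypothetical cron job, e.g. /usr/local/sbin/drop-clean-caches, scheduled
# from /etc/crontab (which has a user field) with:
#   */15 * * * * root /usr/local/sbin/drop-clean-caches --run
#
# drop_caches is a one-shot trigger: each write frees clean cache once and
# does not change caching behavior afterwards, so no "echo 0" follows.

drop_clean_caches() {
    # The argument is only for trying this against a scratch file;
    # the real target is the /proc file.
    target=${1:-/proc/sys/vm/drop_caches}
    sync                  # dirty pages cannot be dropped, so flush them first
    echo 3 > "$target"    # 1 = page cache, 2 = dentries and inodes, 3 = both
}

# Do the real drop only when invoked with --run (as in the crontab line).
if [ "${1:-}" = "--run" ]; then
    drop_clean_caches
fi
```

Running it by hand as root with `--run` once before adding the crontab line is a cheap way to confirm the memory actually comes back on this system.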
>
> John
>
> On Thu, 2007-02-15 at 13:38 -0600, John Lange wrote:
> > Unfortunately, I _am_ running that kernel already:
> >
> > # uname -a
> > 2.6.16.37-SLES10_SP1_BRANCH_20070213192756-smp #1 SMP Tue Feb 13
> >
> > I had upgraded it to that kernel yesterday as part of something else we
> > were trying to fix.
> >
> > Are there 1.2.4 ocfs2 packages available someplace for SLED 10?
> >
> > John
> >
> > On Thu, 2007-02-15 at 19:06 +0000, José Costa wrote:
> > > Try these kernels from the SP1 branch.
> > >
> > > http://ftp.suse.com/pub/projects/kernel/kotd/sle10-sp-i386/SLES10_SP1_BRANCH/
> > >
> > > On 2/15/07, John Lange <john.lange at open-it.ca> wrote:
> > > > Yes, the clients are doing lots of creates.
> > > >
> > > > But my question is: if this is a memory leak, why does ocfs2 eat up the
> > > > memory as soon as the clients start accessing the filesystem? Within
> > > > about 5-10 minutes all physical RAM is consumed but then the memory
> > > > consumption stops. It does not go into swap.
> > > >
> > > > Do you happen to know what version of ocfs2 has the fix and is there an
> > > > update available for SUSE?
> > > >
> > > > If it were a leak, wouldn't the process be more gradual and continuous?
> > > > It would continue to eat into swap, no? And if it were a leak, would the
> > > > RAM be freed when ocfs2 was unmounted?
> > > >
> > > > Is there a command that shows what is using the kernel memory?
> > > >
> > > > Searching for "ocfs2 memory leak", I came across a posting which suggested:
> > > >
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > >
> > > > So I do a sync (no memory is recovered) and then run the above line, and
> > > > all the RAM comes back! But I don't know what that proves.
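One way to see what the drop_caches experiment shows is to record /proc/meminfo around it. This is a minimal sketch assuming root; the function names `mem_field` and `drop_caches_report` are hypothetical. If MemFree rises by roughly as much as Cached and Slab fall, that suggests the memory was reclaimable cache (page cache, dentries, inodes) rather than a leak in the usual sense; memory that a drop cannot give back would be the more suspicious sign.

```shell
#!/bin/sh
# Hypothetical helpers for checking what "echo 3 > drop_caches" gives back.

# Print one field's value (in kB) from a meminfo-format file,
# /proc/meminfo by default. Exact match, so "Cached" skips "SwapCached".
mem_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Sync, drop clean caches and report the change. Must run as root.
drop_caches_report() {
    free_before=$(mem_field MemFree)
    cached_before=$(mem_field Cached)
    slab_before=$(mem_field Slab)
    sync
    echo 3 > /proc/sys/vm/drop_caches || return 1
    echo "MemFree: ${free_before} -> $(mem_field MemFree) kB"
    echo "Cached:  ${cached_before} -> $(mem_field Cached) kB"
    echo "Slab:    ${slab_before} -> $(mem_field Slab) kB"
}
```

Calling `drop_caches_report` as root prints the three before/after pairs; running it while the nfs clients are busy and again right after a remount would help separate cache growth from anything that stays pinned.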
> > > >
> > > > Here is what /proc/slabinfo shows (cut down for formatting). I don't
> > > > understand how to read this so maybe someone can indicate if something
> > > > looks wrong?
> > > >
> > > > =======
> > > > # cat /proc/slabinfo
> > > >
> > > > # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
> > > > nfsd4_delegations 0 0 596 13 2
> > > > nfsd4_stateids 0 0 72 53 1
> > > > nfsd4_files 0 0 36 101 1
> > > > nfsd4_stateowners 0 0 344 11 1
> > > > rpc_buffers 8 8 2048 2 1
> > > > rpc_tasks 8 15 256 15 1
> > > > rpc_inode_cache 0 0 512 7 1
> > > > ocfs2_lock 152 203 16 203 1
> > > > ocfs2_inode_cache 12484 12536 896 4 1
> > > > ocfs2_uptodate 1381 1469 32 113 1
> > > > ocfs2_em_ent 37005 37406 64 59 1
> > > > dlmfs_inode_cache 1 6 640 6 1
> > > > dlm_mle_cache 10 10 384 10 1
> > > > configfs_dir_cache 33 78 48 78 1
> > > > fib6_nodes 7 113 32 113 1
> > > > ip6_dst_cache 7 15 256 15 1
> > > > ndisc_cache 1 15 256 15 1
> > > > RAWv6 5 6 640 6 1
> > > > UDPv6 3 6 640 6 1
> > > > tw_sock_TCPv6 0 0 128 30 1
> > > > request_sock_TCPv6 0 0 128 30 1
> > > > TCPv6 8 9 1280 3 1
> > > > ip_fib_alias 16 113 32 113 1
> > > > ip_fib_hash 16 113 32 113 1
> > > > dm_events 16 169 20 169 1
> > > > dm_tio 4157 7308 16 203 1
> > > > dm_io 4155 6760 20 169 1
> > > > uhci_urb_priv 0 0 40 92 1
> > > > ext3_inode_cache 1062 2856 512 8 1
> > > > ext3_xattr 0 0 48 78 1
> > > > journal_handle 74 169 20 169 1
> > > > journal_head 583 1224 52 72 1
> > > > revoke_table 6 254 12 254 1
> > > > revoke_record 0 0 16 203 1
> > > > qla2xxx_srbs 244 360 128 30 1
> > > > scsi_cmd_cache 106 130 384 10 1
> > > > sgpool-256 32 32 4096 1 1
> > > > sgpool-128 42 42 2048 2 1
> > > > sgpool-64 44 44 1024 4 1
> > > > sgpool-32 48 48 512 8 1
> > > > sgpool-16 75 75 256 15 1
> > > > sgpool-8 153 210 128 30 1
> > > > scsi_io_context 0 0 104 37 1
> > > > UNIX 377 399 512 7 1
> > > > ip_mrt_cache 0 0 128 30 1
> > > > tcp_bind_bucket 14 203 16 203 1
> > > > inet_peer_cache 81 118 64 59 1
> > > > secpath_cache 0 0 128 30 1
> > > > xfrm_dst_cache 0 0 384 10 1
> > > > ip_dst_cache 176 240 256 15 1
> > > > arp_cache 6 30 256 15 1
> > > > RAW 3 7 512 7 1
> > > > UDP 29 42 512 7 1
> > > > tw_sock_TCP 0 0 128 30 1
> > > > request_sock_TCP 0 0 64 59 1
> > > > TCP 19 35 1152 7 2
> > > > flow_cache 0 0 128 30 1
> > > > cfq_ioc_pool 194 240 96 40 1
> > > > cfq_pool 185 240 96 40 1
> > > > crq_pool 312 468 48 78 1
> > > > deadline_drq 0 0 52 72 1
> > > > as_arq 0 0 64 59 1
> > > > mqueue_inode_cache 1 6 640 6 1
> > > > isofs_inode_cache 0 0 384 10 1
> > > > minix_inode_cache 0 0 420 9 1
> > > > hugetlbfs_inode_cache 1 11 356 11 1
> > > > ext2_inode_cache 0 0 492 8 1
> > > > ext2_xattr 0 0 48 78 1
> > > > dnotify_cache 1 169 20 169 1
> > > > dquot 0 0 128 30 1
> > > > eventpoll_pwq 1 101 36 101 1
> > > > eventpoll_epi 1 30 128 30 1
> > > > inotify_event_cache 0 0 28 127 1
> > > > inotify_watch_cache 40 92 40 92 1
> > > > kioctx 0 0 256 15 1
> > > > kiocb 0 0 128 30 1
> > > > fasync_cache 1 203 16 203 1
> > > > shmem_inode_cache 612 632 460 8 1
> > > > posix_timers_cache 0 0 100 39 1
> > > > uid_cache 7 59 64 59 1
> > > > blkdev_ioc 103 127 28 127 1
> > > > blkdev_queue 58 60 960 4 1
> > > > blkdev_requests 354 418 176 22 1
> > > > biovec-(256) 312 312 3072 2 2
> > > > biovec-128 368 370 1536 5 2
> > > > biovec-64 480 485 768 5 1
> > > > biovec-16 480 495 256 15 1
> > > > biovec-4 480 531 64 59 1
> > > > biovec-1 1104 5481 16 203 1
> > > > bio 1140 2250 128 30 1
> > > > sock_inode_cache 456 483 512 7 1
> > > > skbuff_fclone_cache 36 40 384 10 1
> > > > skbuff_head_cache 655 825 256 15 1
> > > > file_lock_cache 5 42 92 42 1
> > > > acpi_operand 634 828 40 92 1
> > > > acpi_parse_ext 0 0 44 84 1
> > > > acpi_parse 0 0 28 127 1
> > > > acpi_state 0 0 48 78 1
> > > > delayacct_cache 183 390 48 78 1
> > > > taskstats_cache 9 32 236 16 1
> > > > proc_inode_cache 49 170 372 10 1
> > > > sigqueue 96 135 144 27 1
> > > > radix_tree_node 16046 16786 276 14 1
> > > > bdev_cache 56 56 512 7 1
> > > > sysfs_dir_cache 4831 4876 40 92 1
> > > > mnt_cache 30 60 128 30 1
> > > > inode_cache 1041 1276 356 11 1
> > > > dentry_cache 11588 13688 132 29 1
> > > > filp 2734 2820 192 20 1
> > > > names_cache 25 25 4096 1 1
> > > > idr_layer_cache 204 232 136 29 1
> > > > buffer_head 456669 459936 52 72 1
> > > > mm_struct 109 126 448 9 1
> > > > vm_area_struct 5010 5632 88 44 1
> > > > fs_cache 109 177 64 59 1
> > > > files_cache 94 135 448 9 1
> > > > signal_cache 159 160 384 10 1
> > > > sighand_cache 147 147 1344 3 1
> > > > task_struct 175 175 1376 5 2
> > > > anon_vma 2355 2540 12 254 1
> > > > pgd 81 81 4096 1 1
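On the question of what is using kernel memory: each /proc/slabinfo line gives a cache's num_objs and objsize, so their product approximates the bytes that cache holds (per-slab overhead is ignored). For example, buffer_head above is 459936 * 52 = 23,916,672 bytes and ocfs2_inode_cache is 12536 * 896 = 11,232,256 bytes. Summed over the caches shown (the listing omits some, such as the generic kmalloc caches), this comes to well under 100 MB, which would point at page cache rather than slab objects for most of the 1-2 GB, at least at the moment the snapshot was taken. A minimal sketch; the function names are hypothetical, and `slabtop` from procps gives a similar live view.

```shell
#!/bin/sh
# Hypothetical helpers: approximate memory held by each slab cache.
# /proc/slabinfo columns: name active_objs num_objs objsize ...
# bytes ~= num_objs * objsize (per-slab overhead is ignored).
# Reading /proc/slabinfo may require root.

# Print "bytes name" per cache, largest first.
slab_usage() {
    awk '!/^(slabinfo|#)/ && NF >= 4 { printf "%.0f %s\n", $3 * $4, $1 }' \
        "${1:-/proc/slabinfo}" | sort -rn
}

# Print the approximate total in bytes over all caches listed.
slab_total() {
    awk '!/^(slabinfo|#)/ && NF >= 4 { total += $3 * $4 }
         END { printf "%.0f\n", total }' "${1:-/proc/slabinfo}"
}
```

For a quick look, `slab_usage | head -n 10` lists the ten largest caches; comparing `slab_total` with the Cached line of /proc/meminfo shows which of the two is growing.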
> > > >
> > > >
> > > > On Thu, 2007-02-15 at 10:40 -0700, Robert Wipfel wrote:
> > > > > >>> On Thu, Feb 15, 2007 at 10:34 AM, in message
> > > > > <1171560898.4589.12.camel at ibmlaptop.darkcore.net>, John Lange
> > > > > <john.lange at open-it.ca> wrote:
> > > > > > System is SUSE SLES 10 running heartbeat, ocfs2, evms, and exporting the
> > > > > > file system via nfs.
> > > > > >
> > > > > > The ocfs2 partition is 12 Terabytes and is being exported via nfs.
> > > > > >
> > > > > > What we see is as soon as the nfs clients (80 nfs v2 clients) start
> > > > > > connecting, memory usage goes up and up and up until all the physical
> > > > > > RAM is consumed but it levels off before hitting swap. With 1G RAM, 1G
> > > > > > of ram is used. With 2G RAM, 2G of ram is used. It just seems to consume
> > > > > > everything.
> > > > > >
> > > > > > The system seems to run happily for a while. Then something happens and
> > > > > > there is a RAM spike. Next thing you know we see the dreaded kernel
> > > > > > oom-killer appear and start killing processes left and right, resulting
> > > > > > in a complete crash.
> > > > > >
> > > > > > I can confirm it is NOT nfs using the ram because when nfs is stopped,
> > > > > > no ram is recovered. But when the ocfs2 partition is unmounted the RAM
> > > > > > is freed.
> > > > > >
> > > > > > Can someone shed some light on what is going on here? Any suggestions on
> > > > > > how to resolve this problem?
> > > > >
> > > > > Are your clients doing lots of creates? There was an OCFS2 bug
> > > > > that left DLM structures lying around for each file create, that iirc is now
> > > > > fixed.
> > > > >
> > > > > Hth,
> > > > > Robert
> > > > > _______________________________________________
> > > > > Linux-HA mailing list
> > > > > Linux-HA at lists.linux-ha.org
> > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > See also: http://linux-ha.org/ReportingProblems
> > > > >
> > > > --
> > > > John Lange
> > > > Epic Information Solutions
> > > > p: (204) 975 7113
> > > >