[Linux-HA] OCFS2 - Memory hog?

José Costa meetra at gmail.com
Thu Feb 15 16:38:20 MST 2007


From the OCFS2 page:

OCFS2 1.2.4-2 patches have been incorporated into the SLES10 SP1 code tree.

and

r2981 fs  - ocfs2_link() journal credits update

From http://ftp.suse.com/pub/projects/kernel/kotd/sle10-sp-i386/SLES10_SP1_BRANCH/kernel-smp.abuild-extra.tar.gz

- patches.suse/ocfs2-1.2-svn-r2981.diff: ocfs2: ocfs2_link()
  journal credits update.


http://oss.oracle.com/bugzilla/show_bug.cgi?id=815
http://oss.oracle.com/bugzilla/show_bug.cgi?id=774

Maybe use the bigsmp kernel?

On 2/15/07, John Lange <john.lange at open-it.ca> wrote:
> As a quick follow-up to my own posting: I have the
> 2.6.16.37-SLES10_SP1_BRANCH_20070213192756-smp kernel running and I also
> found:
>
> ocfs2-tools-1.2.2-20.i586.rpm
>
> a package from OpenSUSE factory (10.3).
>
> Neither of these things has solved the problem as memory continues to
> climb out of control as soon as the nfs clients start accessing the
> ocfs2 file system.
>
> As a workaround, I'm wondering if anyone has an opinion on whether it's
> safe to do this:
>
> sync ; echo 3 > /proc/sys/vm/drop_caches
> sync ; echo 0 > /proc/sys/vm/drop_caches
>
> from cron every 15 minutes?
>
> This might keep this thing alive until we can figure out how to patch
> ocfs2 to 1.2.4. The alternative is to stop nfs and remount ocfs2 every 15
> minutes but that causes us major grief as it knocks the clients offline.
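On the cron idea: as far as I know, writing to drop_caches is non-destructive (only clean, unused cache is dropped, which is why the sync comes first), so the main cost should be performance, since everything dropped has to be read again. The values are 1 for page cache, 2 for dentries and inodes, and 3 for both. I believe the drop is triggered by the write itself, so the second line writing 0 should not be needed. Here is a minimal sketch of the same thing in Python; the function name and the `path` parameter are my own, and the path is only a parameter so the helper can be tried against an ordinary file.

```python
import os

# Real control file; writing to it requires root.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES):
    """Flush dirty data, then ask the kernel to drop clean caches.

    mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
    path: hypothetical override, only there to make the helper testable.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Dirty pages cannot be dropped, so write them out first.
    os.sync()
    with open(path, "w") as handle:
        handle.write(f"{mode}\n")
```

Calling `drop_caches()` from a small script in root's crontab would be the equivalent of the first shell line above. It only hides the symptom, of course.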
>
> John
>
> On Thu, 2007-02-15 at 13:38 -0600, John Lange wrote:
> > Unfortunately, I _am_ running that kernel already:
> >
> > # uname -a
> > 2.6.16.37-SLES10_SP1_BRANCH_20070213192756-smp #1 SMP Tue Feb 13
> >
> > I had upgraded it to that kernel yesterday as part of something else we
> > were trying to fix.
> >
> > Are there 1.2.4 ocfs2 packages available someplace for SLED 10?
> >
> > John
> >
> > On Thu, 2007-02-15 at 19:06 +0000, José Costa wrote:
> > > Try these kernels from the SP1 branch:
> > >
> > > http://ftp.suse.com/pub/projects/kernel/kotd/sle10-sp-i386/SLES10_SP1_BRANCH/
> > >
> > > On 2/15/07, John Lange <john.lange at open-it.ca> wrote:
> > > > Yes, the clients are doing lots of creates.
> > > >
> > > > But my question is: if this is a memory leak, why does ocfs2 eat up
> > > > the memory as soon as the clients start accessing the filesystem?
> > > > Within about 5-10 minutes all physical RAM is consumed, but then the
> > > > memory consumption stops. It does not go into swap.
> > > >
> > > > Do you happen to know what version of ocfs2 has the fix and is there an
> > > > update available for SUSE?
> > > >
> > > > If it were a leak, would the process not be more gradual and
> > > > continuous? It would continue to eat into swap, no? And if it were a
> > > > leak, would the RAM be freed when ocfs2 was unmounted?
> > > >
> > > > Is there a command that shows what is using the kernel memory?
> > > >
> > > > Searching for ocfs2 memory leak I came across a posting which suggested:
> > > >
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > >
> > > > So I ran sync (no memory was recovered) and then the line above, and
> > > > all the RAM came back! But I don't know what that proves.
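My reading, for what it is worth: memory that comes back after writing 3 to drop_caches was clean page cache or reclaimable dentries and inodes, which the kernel can normally give back by itself. That points to caching rather than a classic leak, though it does not explain the oom-killer on its own. One way to see the split is to look at MemFree, Buffers, Cached and Slab in /proc/meminfo before and after. Below is a rough sketch that does the arithmetic; the function name is mine and the sample numbers are made up purely for illustration, they are not from this system.

```python
def memory_breakdown(meminfo_text):
    """Summarise /proc/meminfo-style text; all values are in kB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    total = values.get("MemTotal", 0)
    free = values.get("MemFree", 0)
    cache = values.get("Buffers", 0) + values.get("Cached", 0)
    slab = values.get("Slab", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "cache_kb": cache,   # page cache plus buffer cache
        "slab_kb": slab,     # kernel slab allocations
        "other_kb": total - free - cache - slab,
    }


# Hypothetical figures, for illustration only.
SAMPLE_MEMINFO = """\
MemTotal:      2048000 kB
MemFree:         20000 kB
Buffers:       1500000 kB
Cached:         300000 kB
Slab:            60000 kB
"""

if __name__ == "__main__":
    for name, kilobytes in memory_breakdown(SAMPLE_MEMINFO).items():
        print(f"{name:10s} {kilobytes:10d} kB")
```

To use it on a live system, pass it the contents of /proc/meminfo instead of the sample text. If cache_kb is most of the total, the RAM is being used as cache rather than being lost.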
> > > >
> > > > Here is what /proc/slabinfo shows (cut down for formatting). I don't
> > > > understand how to read this so maybe someone can indicate if something
> > > > looks wrong?
> > > >
> > > > =======
> > > > # cat /proc/slabinfo
> > > >
> > > > # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
> > > > nfsd4_delegations      0      0    596   13    2
> > > > nfsd4_stateids         0      0     72   53    1
> > > > nfsd4_files            0      0     36  101    1
> > > > nfsd4_stateowners      0      0    344   11    1
> > > > rpc_buffers            8      8   2048    2    1
> > > > rpc_tasks              8     15    256   15    1
> > > > rpc_inode_cache        0      0    512    7    1
> > > > ocfs2_lock           152    203     16  203    1
> > > > ocfs2_inode_cache  12484  12536    896    4    1
> > > > ocfs2_uptodate      1381   1469     32  113    1
> > > > ocfs2_em_ent       37005  37406     64   59    1
> > > > dlmfs_inode_cache      1      6    640    6    1
> > > > dlm_mle_cache         10     10    384   10    1
> > > > configfs_dir_cache     33     78     48   78    1
> > > > fib6_nodes             7    113     32  113    1
> > > > ip6_dst_cache          7     15    256   15    1
> > > > ndisc_cache            1     15    256   15    1
> > > > RAWv6                  5      6    640    6    1
> > > > UDPv6                  3      6    640    6    1
> > > > tw_sock_TCPv6          0      0    128   30    1
> > > > request_sock_TCPv6      0      0    128   30    1
> > > > TCPv6                  8      9   1280    3    1
> > > > ip_fib_alias          16    113     32  113    1
> > > > ip_fib_hash           16    113     32  113    1
> > > > dm_events             16    169     20  169    1
> > > > dm_tio              4157   7308     16  203    1
> > > > dm_io               4155   6760     20  169    1
> > > > uhci_urb_priv          0      0     40   92    1
> > > > ext3_inode_cache    1062   2856    512    8    1
> > > > ext3_xattr             0      0     48   78    1
> > > > journal_handle        74    169     20  169    1
> > > > journal_head         583   1224     52   72    1
> > > > revoke_table           6    254     12  254    1
> > > > revoke_record          0      0     16  203    1
> > > > qla2xxx_srbs         244    360    128   30    1
> > > > scsi_cmd_cache       106    130    384   10    1
> > > > sgpool-256            32     32   4096    1    1
> > > > sgpool-128            42     42   2048    2    1
> > > > sgpool-64             44     44   1024    4    1
> > > > sgpool-32             48     48    512    8    1
> > > > sgpool-16             75     75    256   15    1
> > > > sgpool-8             153    210    128   30    1
> > > > scsi_io_context        0      0    104   37    1
> > > > UNIX                 377    399    512    7    1
> > > > ip_mrt_cache           0      0    128   30    1
> > > > tcp_bind_bucket       14    203     16  203    1
> > > > inet_peer_cache       81    118     64   59    1
> > > > secpath_cache          0      0    128   30    1
> > > > xfrm_dst_cache         0      0    384   10    1
> > > > ip_dst_cache         176    240    256   15    1
> > > > arp_cache              6     30    256   15    1
> > > > RAW                    3      7    512    7    1
> > > > UDP                   29     42    512    7    1
> > > > tw_sock_TCP            0      0    128   30    1
> > > > request_sock_TCP       0      0     64   59    1
> > > > TCP                   19     35   1152    7    2
> > > > flow_cache             0      0    128   30    1
> > > > cfq_ioc_pool         194    240     96   40    1
> > > > cfq_pool             185    240     96   40    1
> > > > crq_pool             312    468     48   78    1
> > > > deadline_drq           0      0     52   72    1
> > > > as_arq                 0      0     64   59    1
> > > > mqueue_inode_cache      1      6    640    6    1
> > > > isofs_inode_cache      0      0    384   10    1
> > > > minix_inode_cache      0      0    420    9    1
> > > > hugetlbfs_inode_cache      1     11    356   11    1
> > > > ext2_inode_cache       0      0    492    8    1
> > > > ext2_xattr             0      0     48   78    1
> > > > dnotify_cache          1    169     20  169    1
> > > > dquot                  0      0    128   30    1
> > > > eventpoll_pwq          1    101     36  101    1
> > > > eventpoll_epi          1     30    128   30    1
> > > > inotify_event_cache      0      0     28  127    1
> > > > inotify_watch_cache     40     92     40   92    1
> > > > kioctx                 0      0    256   15    1
> > > > kiocb                  0      0    128   30    1
> > > > fasync_cache           1    203     16  203    1
> > > > shmem_inode_cache    612    632    460    8    1
> > > > posix_timers_cache      0      0    100   39    1
> > > > uid_cache              7     59     64   59    1
> > > > blkdev_ioc           103    127     28  127    1
> > > > blkdev_queue          58     60    960    4    1
> > > > blkdev_requests      354    418    176   22    1
> > > > biovec-(256)         312    312   3072    2    2
> > > > biovec-128           368    370   1536    5    2
> > > > biovec-64            480    485    768    5    1
> > > > biovec-16            480    495    256   15    1
> > > > biovec-4             480    531     64   59    1
> > > > biovec-1            1104   5481     16  203    1
> > > > bio                 1140   2250    128   30    1
> > > > sock_inode_cache     456    483    512    7    1
> > > > skbuff_fclone_cache     36     40    384   10    1
> > > > skbuff_head_cache    655    825    256   15    1
> > > > file_lock_cache        5     42     92   42    1
> > > > acpi_operand         634    828     40   92    1
> > > > acpi_parse_ext         0      0     44   84    1
> > > > acpi_parse             0      0     28  127    1
> > > > acpi_state             0      0     48   78    1
> > > > delayacct_cache      183    390     48   78    1
> > > > taskstats_cache        9     32    236   16    1
> > > > proc_inode_cache      49    170    372   10    1
> > > > sigqueue              96    135    144   27    1
> > > > radix_tree_node    16046  16786    276   14    1
> > > > bdev_cache            56     56    512    7    1
> > > > sysfs_dir_cache     4831   4876     40   92    1
> > > > mnt_cache             30     60    128   30    1
> > > > inode_cache         1041   1276    356   11    1
> > > > dentry_cache       11588  13688    132   29    1
> > > > filp                2734   2820    192   20    1
> > > > names_cache           25     25   4096    1    1
> > > > idr_layer_cache      204    232    136   29    1
> > > > buffer_head       456669 459936     52   72    1
> > > > mm_struct            109    126    448    9    1
> > > > vm_area_struct      5010   5632     88   44    1
> > > > fs_cache             109    177     64   59    1
> > > > files_cache           94    135    448    9    1
> > > > signal_cache         159    160    384   10    1
> > > > sighand_cache        147    147   1344    3    1
> > > > task_struct          175    175   1376    5    2
> > > > anon_vma            2355   2540     12  254    1
> > > > pgd                   81     81   4096    1    1
> > > >
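On reading the slabinfo output: a rough estimate of what each cache holds is num_objs multiplied by objsize. The sketch below ranks caches that way. The function name is mine, the sample rows are copied from the listing above, and the estimate ignores per-slab overhead, so treat the numbers as approximate.

```python
def rank_slab_caches(slabinfo_text, top=5):
    """Return [(name, bytes)] for the largest caches by num_objs * objsize."""
    usage = []
    for line in slabinfo_text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# name ..." header and the version banner.
        if not line or line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            num_objs, objsize = int(fields[2]), int(fields[3])
        except ValueError:
            continue
        usage.append((fields[0], num_objs * objsize))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:top]


# Rows taken from the slabinfo listing quoted in this thread.
SAMPLE = """\
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
ocfs2_inode_cache  12484  12536    896    4    1
ocfs2_em_ent       37005  37406     64   59    1
radix_tree_node    16046  16786    276   14    1
dentry_cache       11588  13688    132   29    1
buffer_head       456669 459936     52   72    1
"""

if __name__ == "__main__":
    for name, size in rank_slab_caches(SAMPLE):
        print(f"{name:20s} {size / 2**20:8.1f} MiB")
```

By this estimate buffer_head is the largest cache at about 23.9 MB, followed by ocfs2_inode_cache at about 11.2 MB. Together these five caches come to roughly 44 MB, far less than the 1-2 GB being consumed. My guess is that the bulk is in the cached disk blocks those buffer heads point at rather than in slab objects themselves, but that is an inference from the numbers, not something I have confirmed.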
> > > >
> > > > On Thu, 2007-02-15 at 10:40 -0700, Robert Wipfel wrote:
> > > > > >>> On Thu, Feb 15, 2007 at 10:34 AM, in message
> > > > > <1171560898.4589.12.camel at ibmlaptop.darkcore.net>, John Lange
> > > > > <john.lange at open-it.ca> wrote:
> > > > > > System is SUSE SLES 10 running heartbeat, ocfs2, evms, and exporting the
> > > > > > file system via nfs.
> > > > > >
> > > > > > The ocfs2 partition is 12 Terabytes and is being exported via nfs.
> > > > > >
> > > > > > What we see is as soon as the nfs clients (80 nfs v2 clients) start
> > > > > > connecting, memory usage goes up and up and up until all the physical
> > > > > > RAM is consumed but it levels off before hitting swap. With 1G RAM, 1G
> > > > > > of ram is used. With 2G RAM, 2G of ram is used. It just seems to consume
> > > > > > everything.
> > > > > >
> > > > > > The system seems to run happily for a while. Then something happens and
> > > > > > there is a RAM spike. Next thing you know we see the dreaded kernel
> > > > > > > oom-killer appear and start killing processes left and right, resulting
> > > > > > in a complete crash.
> > > > > >
> > > > > > I can confirm it is NOT nfs using the ram because when nfs is stopped,
> > > > > > no ram is recovered. But when the ocfs2 partition is unmounted the RAM
> > > > > > is freed.
> > > > > >
> > > > > > Can someone shed some light on what is going on here? Any suggestions on
> > > > > > how to resolve this problem?
> > > > >
> > > > > Are your clients doing lots of creates? There was an OCFS2 bug
> > > > > that left DLM structures lying around for each file create, that iirc is now
> > > > > fixed.
> > > > >
> > > > > Hth,
> > > > > Robert
> > > > > _______________________________________________
> > > > > Linux-HA mailing list
> > > > > Linux-HA at lists.linux-ha.org
> > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > See also: http://linux-ha.org/ReportingProblems
> > > > >
> > > > --
> > > > John Lange
> > > > Epic Information Solutions
> > > > p: (204) 975 7113

