1
0
Fork 0
mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-24 09:13:20 -05:00
linux/fs
Fengguang Wu 8bc3be2751 writeback: speed up writeback of big dirty files
After making dirty a 100M file, the normal behavior is to start the
writeback for all data after 30s delays.  But sometimes the following
happens instead:

	- after 30s:    ~4M
	- after 5s:     ~4M
	- after 5s:     all remaining 92M

Some analyze shows that the internal io dispatch queues goes like this:

		s_io            s_more_io
		-------------------------
	1)	100M,1K         0
	2)	1K              96M
	3)	0               96M
1) initial state with a 100M file and a 1K file

2) 4M written, nr_to_write <= 0, so write more

3) 1K written, nr_to_write > 0, no more writes(BUG)

nr_to_write > 0 in (3) fools the upper layer to think that data have all
been written out.  The big dirty file is actually still sitting in
s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
may starve newly expired inodes in s_dirty.  It is also not an option to
draw inodes from both s_more_io and s_dirty, an let the loop go on: this
might lead to live locks, and might also starve other superblocks in sync
time(well kupdate may still starve some superblocks, that's another bug).

We have to return when a full scan of s_io completes.  So nr_to_write > 0
does not necessarily mean that "all data are written".  This patch
introduces a flag writeback_control.more_io to indicate that more io should
be done.  With it the big dirty file no longer has to wait for the next
kupdate invokation 5s later.

In sync_sb_inodes() we only set more_io on super_blocks we actually
visited.  This avoids the interaction between two pdflush deamons.

Also in __sync_single_inode() we don't blindly keep requeuing the io if the
filesystem cannot progress.  Failing to do so may lead to 100% iowait.

Tested-by: Mike Snitzer <snitzer@gmail.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:19 -08:00
..
9p 9p: use copy of the options value instead of original 2007-11-06 08:02:53 -06:00
adfs
affs
afs vfs: Add 64 bit i_version support 2008-01-28 23:58:27 -05:00
autofs Use task_pid_nr() instead of pid_nr(task_pid()) 2007-10-19 11:53:43 -07:00
autofs4 pid namespaces: round up the API 2007-10-19 11:53:37 -07:00
befs fs/: Spelling fixes 2008-02-03 17:33:42 +02:00
bfs regression: bfs endianness bug 2007-12-05 09:25:20 -08:00
cifs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
coda coda: convert struct class_device to struct device 2008-01-24 20:40:05 -08:00
configfs configfs: file.c fix possible recursive locking 2008-01-25 15:05:47 -08:00
cramfs fs/cramfs/inode.c: replace hardcoded value with preprocessor constant 2007-10-18 14:37:29 -07:00
debugfs Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
devpts
dlm dlm: static initialization improvements 2008-01-30 11:04:43 -06:00
ecryptfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
efs exportfs: make struct export_operations const 2007-10-22 08:13:21 -07:00
exportfs exportfs: update documentation 2007-10-22 08:13:21 -07:00
ext2 ext2: Fix the max file size for ext2 file system. 2008-01-28 23:58:26 -05:00
ext3 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
ext4 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
fat fat: optimize fat_count_free_clusters() 2008-01-08 16:10:35 -08:00
freevxfs fs/: Spelling fixes 2008-02-03 17:33:42 +02:00
fuse Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
gfs2 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
hfs hfs: fix coverity-found null deref 2008-01-17 15:38:58 -08:00
hfsplus
hostfs
hpfs
hppfs
hugetlbfs hugetlb: allow sticky directory mount option 2008-02-05 09:44:14 -08:00
isofs exportfs: make struct export_operations const 2007-10-22 08:13:21 -07:00
jbd spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
jbd2 spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
jffs2 Typoes: "whith" -> "with" 2008-02-03 15:14:02 +02:00
jfs Spelling fixes: lenght->length 2008-02-03 15:42:53 +02:00
lockd NLM: tear down RPC clients in nlm_shutdown_hosts 2008-02-01 16:42:15 -05:00
minix
msdos
ncpfs vm audit: add VM_DONTEXPAND to mmap for drivers that need it 2008-02-04 07:55:38 -08:00
nfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
nfs_common
nfsd nfsd: more careful input validation in nfsctl write methods 2008-02-01 16:42:15 -05:00
nls sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
ntfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
ocfs2 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
openpromfs [SPARC]: Constify function pointer tables. 2008-01-22 18:29:20 -08:00
partitions Kobject: convert fs/* from kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
proc Fix /proc dcache deadlock in do_exit 2008-02-05 09:44:18 -08:00
qnx4
ramfs
reiserfs Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
romfs
smbfs Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-02-01 11:45:47 +11:00
sysfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2008-01-25 17:19:08 -08:00
sysv
udf fs/udf/balloc.c: mark a variable as uninitialized_var() 2007-10-17 08:43:00 -07:00
ufs ufs: fix nexstep dir block size 2007-12-05 09:21:18 -08:00
vfat
xfs is_vmalloc_addr(): Check if an address is within the vmalloc boundaries 2008-02-05 09:44:14 -08:00
aio.c core: remove last users of empty FASTCALL macro 2008-01-30 13:31:17 +01:00
anon_inodes.c anon-inodes use open coded atomic_inc for the shared inode 2007-10-17 08:43:00 -07:00
attr.c VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations 2007-10-18 14:37:22 -07:00
bad_inode.c
binfmt_aout.c mm: fix exit_mmap BUG() on a.out binary exit 2007-12-20 07:49:53 -08:00
binfmt_elf.c fs/binfmt_elf.c: spello fix 2008-02-03 18:05:15 +02:00
binfmt_elf_fdpic.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
binfmt_em86.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_flat.c
binfmt_misc.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_script.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
binfmt_som.c
bio.c __bio_clone: don't calculate hw/phys segment counts 2008-01-28 10:04:46 +01:00
block_dev.c Driver core: convert block from raw kobjects to core devices 2008-01-24 20:40:36 -08:00
buffer.c bufferhead: revert constructor removal 2008-02-05 09:44:14 -08:00
char_dev.c Kobject: rename kobject_init_ng() to kobject_init() 2008-01-24 20:40:38 -08:00
compat.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
compat_binfmt_elf.c x86: compat_binfmt_elf 2008-01-30 13:31:46 +01:00
compat_ioctl.c remove __attribute_used__ 2008-01-28 23:21:18 +01:00
dcache.c dcache: don't expose uninitialized memory in /proc/<pid>/fd/<fd> 2007-10-22 08:13:18 -07:00
dcookies.c
direct-io.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
dnotify.c
dquot.c Don't send quota messages repeatedly when hardlimit reached 2007-12-23 12:54:36 -08:00
drop_caches.c
eventfd.c
eventpoll.c lockdep: annotate epoll 2008-02-05 09:44:07 -08:00
exec.c exec: rework the group exit and fix the race with kill 2008-02-05 09:44:07 -08:00
fcntl.c pid namespaces: changes to show virtual ids to user 2007-10-19 11:53:40 -07:00
fifo.c
file.c
file_table.c fs/file_table.c: use list_for_each_entry() instead of list_for_each() 2007-10-19 11:53:38 -07:00
filesystems.c
fs-writeback.c writeback: speed up writeback of big dirty files 2008-02-05 09:44:19 -08:00
generic_acl.c
inode.c ext4: Add inode version support in ext4 2008-01-28 23:58:27 -05:00
inotify.c [PATCH] new helper - inotify_evict_watch() 2007-10-21 02:37:38 -04:00
inotify_user.c change inotifyfs magic as the same magic is used for futexfs 2007-10-17 08:43:00 -07:00
internal.h
ioctl.c
ioprio.c cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions 2008-01-28 11:38:15 +01:00
Kconfig nfsd: select CONFIG_PROC_FS in nfsv4 and gss server cases 2008-02-01 16:42:04 -05:00
Kconfig.binfmt x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
libfs.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
locks.c pid-namespaces-vs-locks-interaction 2008-02-03 17:51:36 -05:00
Makefile x86: compat_binfmt_elf Kconfig 2008-01-30 13:31:46 +01:00
mbcache.c fs: Fix to correct the mbcache entries counter 2007-10-25 15:18:29 -07:00
mpage.c Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user 2008-02-05 09:44:13 -08:00
namei.c Use access mode instead of open flags to determine needed permissions 2008-01-12 14:47:58 -08:00
namespace.c kobject: convert main fs kobject to use kobject_create 2008-01-24 20:40:13 -08:00
nfsctl.c
no-block.c
open.c mark sys_open/sys_read exports unused 2007-11-14 18:45:42 -08:00
pipe.c
pnode.c
pnode.h [PATCH] new helpers - collect_mounts() and release_collected_mounts() 2007-10-21 02:37:25 -04:00
posix_acl.c
quota.c
quota_v1.c
quota_v2.c
read_write.c ext4: export iov_shorten from kernel for ext4's use 2008-01-28 23:58:27 -05:00
read_write.h
readdir.c Use mutex_lock_killable in vfs_readdir 2007-12-06 17:39:54 -05:00
select.c fs/select, remove unused macros 2007-10-19 11:53:41 -07:00
seq_file.c
signalfd.c Fix a small number of "memeber" typoes. 2008-02-03 15:12:15 +02:00
splice.c splice: always updated atime in direct splice 2008-02-01 09:26:32 +01:00
stack.c
stat.c
super.c Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
sync.c
timerfd.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
utimes.c
xattr.c [PATCH] pass dentry to audit_inode()/audit_inode_child() 2007-10-21 02:37:18 -04:00
xattr_acl.c