          UNIX Advisory File Locking Implications on c-client
                    Mark Crispin, 28 November 1995


        THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
        IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995.  SOME STATEMENTS
        IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
        IMAP TOOLKIT.

INTRODUCTION

     Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical.  It is not a mechanism to protect against
processes which do not cooperate in the locking.

     The most basic form of locking involves a counter.  This counter
is -1 when the resource is available.  If a process wants the lock, it
executes an atomic increment-and-test-if-zero.  If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource.  When it is finished, it sets the lock
back to -1.  In C terms:

        while (++lock)                  /* try to get lock */
          invoke_other_threads ();      /* failed, try again */
         .
         .                              /* critical code here */
         .
        lock = -1;                      /* release lock */

     This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels.  It
makes several presumptions:
 (1) it is alright to keep testing the lock (no overflow)
 (2) the critical resource is single-access only
 (3) there is shared writeable memory between the two threads
 (4) the threads can be trusted to release the lock when finished

     In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user.  Few operating systems offer shared
writeable memory between such processes.

     A means of communicating between such processes is a file with a
mutually agreed-upon name.  A binary semaphore can be passed by means
of the existence or non-existence of that file, provided that there is
an atomic means to create a file if and only if that file does not
exist.  In C terms:

                                        /* try to get lock */
        while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
          sleep (1);                    /* failed, try again */
        close (fd);                     /* got the lock */
         .
         .                              /* critical code here */
         .
        unlink ("lockfile");            /* release lock */

     This form of locking makes fewer presumptions, but it is still
guilty of presumptions (2) and (4) above.  Presumption (2) limits the
ability of processes to share a resource in a non-conflicting fashion
(e.g. reading from a file).  Presumption (4) leads to deadlocks should
the process crash while it has a resource locked.

     Most modern operating systems provide a resource locking system
call that has none of these presumptions.  In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks.  A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks.  In other words:

        current state           want shared     want exclusive
        -------------           -----------     --------------
        unlocked                YES             YES
        locked shared           YES             NO
        locked exclusive        NO              NO

     Furthermore, the operating system automatically relinquishes all
locks held by a process when that process terminates.

     A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared.  It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

     Most commonly, the resources being locked are files.  Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time.  Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data.  In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.


FILE LOCKING ON UNIX

     In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking.  Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split.  Both
of these system calls deal with file resources only.

     Most systems have only one or the other form of locking.  AIX
emulates the BSD form of locking as a jacket around the System V form.
Ultrix and OSF/1 implement both forms.
\f
BSD

     BSD added the flock() system call.  It offers the capabilities to
acquire a shared lock, acquire an exclusive lock, and unlock.
Optionally, the process can request an immediate error return instead
of blocking when the lock is unavailable.
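
In C terms (a minimal sketch; the file name and the demo_flock helper
name are hypothetical, and error handling is abbreviated):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

/* Returns 0 if the sequence of lock operations succeeded, -1 otherwise. */
int demo_flock (const char *path)
{
  int fd = open (path, O_RDWR|O_CREAT, 0600);
  if (fd < 0) return -1;
                                /* shared lock: other readers welcome */
  if (flock (fd, LOCK_SH) < 0) { close (fd); return -1; }
                                /* non-blocking try for exclusive; fails
                                 * immediately instead of blocking if another
                                 * process holds a conflicting lock */
  if (flock (fd, LOCK_EX|LOCK_NB) < 0)
    printf ("exclusive lock unavailable\n");
  flock (fd, LOCK_UN);          /* release the lock */
  close (fd);
  return 0;
}
```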


FLOCK() BUGS

     flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock.  Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page into believing otherwise.  This problem can be
programmed around, once the programmer is aware of it.

     flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op.  There is no way around this.

     Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.
\f
SYSTEM V/POSIX

     System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine.  This was
subsequently included in POSIX.  Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.

     Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

     The fcntl() locking functions are: query the lock status of a
file region, lock/unlock a region, and lock/unlock a region and block
until the lock is obtained.  The locks may be shared or exclusive.  By
means of the statd and lockd daemons, fcntl() locking is available on
NFS files.
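
In C terms, a region lock looks roughly like this (a sketch; the
region_lock helper name is not part of any API, and F_SETLKW is the
blocking variant, F_SETLK the immediate-error variant):

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Lock or unlock a region of fd; type is F_RDLCK (shared), F_WRLCK
 * (exclusive), or F_UNLCK.  Returns 0 on success, -1 on failure.
 * F_SETLKW blocks until the lock is granted. */
int region_lock (int fd, short type, off_t start, off_t len)
{
  struct flock fl;
  memset (&fl, 0, sizeof (fl));
  fl.l_type = type;             /* shared, exclusive, or unlock */
  fl.l_whence = SEEK_SET;       /* offsets relative to start of file */
  fl.l_start = start;
  fl.l_len = len;               /* 0 means "to end of file" */
  return fcntl (fd, F_SETLKW, &fl);
}
```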

     When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and its
/etc/sm directory (which contains a list of all remote sites which are
doing client or server locking with this site), and notifies the statd
on each of these systems that it has been restarted.  Each statd then
notifies the local lockd of the restart of that system.

     lockd receives fcntl() requests for NFS files.  It communicates
with the lockd at the server to request that it apply the lock, and
with the statd to request notification when the server goes down.  It
blocks until all these requests are completed.

     There is quite a mythos about fcntl() locking.

     One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work.  However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and OSF/1.

     Another religion, which I adhere to, has the opposite viewpoint.


FCNTL() BUGS

     For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations.  It's as if all they
knew about locking they got out of some CS textbook, with no
investigation of real-world needs.

     It is not possible to acquire an exclusive lock unless the file
is open for write.  Since appending can coexist with shared reading,
there are cases in which a read-only accessor may need to go
exclusive.  This problem can be programmed around once the programmer
is aware of it.

     If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator.  This is a very bad bug.  It
means that an application must keep track of all the files that it has
opened and locked.

     If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear.  This is a bad bug.  It means that
any attempt to lock on a server that doesn't run these daemons will
hang.  There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

     There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd.  These daemons are
running on mailer.u, although they appear not to have used much CPU
time.  A useful experiment would be to kill them and see if imapd is
affected in any way, but I decline to do so without an OK from UCS!
;-) If killing statd/lockd can be done without breaking fcntl() on
local files, this would become one of the primary means of dealing
with this problem.

     The statd and lockd daemons have quite a reputation for extreme
fragility.  There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear.  It is rumored that this wedge, once it
happens, also blocks local locking.  Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

     There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing.  On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.
\f
C-CLIENT USAGE

     c-client uses flock().  On System V systems, flock() is simulated
by an emulator that calls fcntl().  This emulator is provided by some
systems (e.g. AIX), or by c-client's own flock.c module.
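
In rough C terms, such an emulator maps the BSD operations onto
whole-file fcntl() locks (an illustrative sketch in the spirit of
flock.c, not the actual c-client source; the emulate_flock name is
hypothetical):

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#ifndef LOCK_SH                 /* mirror the <sys/file.h> values */
#define LOCK_SH 1               /* shared lock */
#define LOCK_EX 2               /* exclusive lock */
#define LOCK_NB 4               /* don't block when locking */
#define LOCK_UN 8               /* unlock */
#endif

int emulate_flock (int fd, int op)
{
  struct flock fl;
  memset (&fl, 0, sizeof (fl));
  fl.l_whence = SEEK_SET;       /* lock the entire file */
  fl.l_start = fl.l_len = 0;
  if (op & LOCK_UN) fl.l_type = F_UNLCK;
  else if (op & LOCK_EX) fl.l_type = F_WRLCK;
  else fl.l_type = F_RDLCK;     /* LOCK_SH */
                                /* F_SETLK = immediate error, F_SETLKW = block */
  return fcntl (fd, (op & LOCK_NB) ? F_SETLK : F_SETLKW, &fl);
}
```

Note the one minor exception mentioned above: F_WRLCK, and hence
LOCK_EX, fails if fd is not open for write.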


BEZERK AND MMDF

     Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

     (1) c-client attempts to create a .lock file (the mail file name
with ``.lock'' appended) whenever it reads from, or writes to, the
mail file.  This is an exclusive lock, and is held only for short
periods of time while c-client is actually doing the I/O.  There is a
5-minute timeout for this lock, after which it is broken on the
presumption that it is a stale lock.  If c-client can not create the
.lock file due to an EACCES (protection failure) error, it once
silently proceeded without this lock; this was for systems which
protect /usr/spool/mail from unprivileged processes creating files.
Today, c-client reports an error unless it is built otherwise.  The
purpose of this lock is to protect against unfavorable interactions
with mail delivery.
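
In C terms, the .lock protocol looks roughly like this (a sketch; the
dotlock helper name is hypothetical and the exact c-client retry logic
differs):

```c
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <sys/stat.h>

/* Acquire "name.lock" by atomic O_EXCL creation.  Returns 0 when the
 * lock is held, -1 on a hard failure such as EACCES. */
int dotlock (const char *lock)
{
  int fd;
  struct stat sb;
  while ((fd = open (lock, O_WRONLY|O_CREAT|O_EXCL, 0666)) < 0) {
    if (errno != EEXIST) return -1;     /* e.g. EACCES on /usr/spool/mail */
                                /* break locks older than 5 minutes on the
                                 * presumption that they are stale */
    if (!stat (lock, &sb) && (time (0) - sb.st_mtime > 300)) unlink (lock);
    else sleep (1);             /* otherwise wait and retry */
  }
  close (fd);                   /* the file now exists; we hold the lock */
  return 0;                     /* caller unlink()s the file to release */
}
```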

     (2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file.  This lock is freed as soon as the read or
write finishes.  The purpose of this lock is to protect against
unfavorable interactions with mail delivery.

     (3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the mail file)
when it opens the mail file.  This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand.  The purpose of this lock is to protect against unfavorable
interactions with other instances of c-client (rewriting of the mail
file).

     Mail delivery daemons use lock (1), (2), or both.  Lock (1) works
over NFS; lock (2) is the only one that works on sites that protect
/usr/spool/mail against unprivileged file creation.  Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

     If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things.  The normal state of locks (1) and (2) is
unlocked except for very brief periods.


TENEX AND MTX

     The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access.  It is almost the reverse of how locking works with
bezerk/mmdf.

     (1) c-client applies a shared flock() to the mail file when it
opens the mail file.  It upgrades this lock to exclusive whenever it
tries to expunge the mail file.  Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first.  The purpose of this lock is to
prevent an expunge from taking place while some other c-client has the
mail file open (and thus knows where all the messages are).

     (2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the mail file) when it
parses the mail file.  It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive.  The purpose of this lock
is to prevent data from being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).
\f
OBSERVATIONS

     In a perfect world, locking works.  You are protected against
unfavorable interactions with the mailer, and against your own mistake
of running more than one instance of your mail reader.  In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any other sharers of the mail file.

     If the mail file is NFS-mounted, then flock() locking is a silent
no-op.  This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality.  There is no locking protection for
tenex/mtx mail files at all, and only protection against the mailer
for bezerk/mmdf mail files.  This has been the accepted state of
affairs on UNIX for many sad years.

     If you can not create .lock files, it should not affect locking,
since the flock() locks suffice for all protection.  This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

     What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs.  It is impossible,
under these circumstances, for an unprivileged program to do anything
about it.  Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it.  This is arguably a site
configuration error.

     The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

     On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would hang forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (including,
possibly, local ones).  Without the special NFS test made by c-client,
there would be no way to request BSD-style no-op behavior, nor is
there any way to determine that this is happening other than by the
system being hung.

     The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it.  If it was stressed too far, all
hell broke loose.  Fortunately, this is now past history.
\f
TRADEOFFS

     c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files.  That's
what IMAP is for, after all.  It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.
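
In C terms, the guard pattern looks like this (a sketch with
hypothetical names; it assumes every process that might take the
exclusive lock on the mail file also honors the guard file, as the
tenex/mtx lock (2) protocol does):

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

/* Upgrade a shared flock() on mailfd to exclusive, serialized through
 * an exclusive flock() on guardfd.  Returns 0 on success. */
int guarded_upgrade (int mailfd, int guardfd)
{
                                /* serialize upgraders via the guard file */
  if (flock (guardfd, LOCK_EX) < 0) return -1;
                                /* the flock() upgrade still drops the shared
                                 * lock for a moment, but no competing
                                 * upgrader can slip in while we hold the
                                 * guard */
  if (flock (mailfd, LOCK_EX) < 0) {
    flock (guardfd, LOCK_UN);
    return -1;
  }
  return 0;                     /* caller releases both when done */
}
```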

     If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd.  You
basically end up with one of five unsatisfactory choices:
 1) Grit your teeth and live with it.
 2) Try to make it work:
     a) avoid NFS access so as not to stress statd/lockd.
     b) try to understand the code in statd/lockd and hack it
        to be more robust.
     c) hunt out the system limit on locks, if there is one,
        and increase it.  Figure on at least two locks per
        simultaneous imapd process and four locks per Pine
        process.  Better yet, make the limit be 10 times the
        maximum number of processes.
     d) increase the socket buffer (-S switch to lockd) if
        it is offered.  I don't know what this actually does,
        but giving lockd more resources to do its work can't
        hurt.  Maybe.
 3) Decide that it can't possibly work, and turn off the
    fcntl() calls in your program.
 4) If nuking statd/lockd can be done without breaking local
    locking, then do so.  This would make SVR4 have the same
    limitations as BSD locking, with a couple of additional
    bugs.
 5) Check for NFS, and don't do the fcntl() in the NFS case.
    This is what c-client does.

     Note that if you are going to use NFS to access files on a server
which does not have statd/lockd running, your only choice is (3), (4),
or (5).  Here again, IMAP can bail you out.

     These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

     Of the other two SVR4 locking bugs:

     Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write.  I
believe that c-client has fixed all of these cases.

     The problem of opening a second designator smashing any current
locks on the file has not been addressed satisfactorily yet.  This is
not an easy problem to deal with, especially in c-client, which really
doesn't know what other files/streams may be open by Pine.

     Aren't you so happy that you bought a System V system?