lmbench3-alpha1
	Added new benchmark line, which determines the cache line size.

	Added new benchmark tlb, which determines the effective TLB size.
	Note that this may differ from the hardware TLB size due to OS
	TLB entries and super-pages.

	Added new benchmark par_mem, which determines the possible
	speedup due to multiple memory reads progressing in parallel.
	This number usually depends heavily on the portion of the
	memory hierarchy being probed, with higher-level caches
	(those closer to the processor) generally having greater
	parallelism.

	Added new benchmark cache, which determines the number of caches,
	their sizes, latency, and available parallelism.  It also 
	reports the latency and available parallelism for main memory.

	Added new benchmark lat_ops, which attempts to determine the
	latency of basic operations, such as add, multiply and divide,
	for a variety of data types, such as int, int64, float and
	double.

	Added new benchmark par_ops, which attempts to determine the
	available scaling of the various basic operations for various
	data types.

	Added new benchmark stream, which reports memory bandwidth
	numbers using benchmark kernels from John McCalpin's STREAM
	and STREAM version 2 benchmarks.
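
	For reference, the four kernels of the original STREAM benchmark
	(copy, scale, add, triad) can be sketched as plain loops.  This
	is only an illustration of the kernels, not the code in lmbench's
	stream; the function names and the stream_bytes() helper are
	hypothetical, and timing is left out.

```c
#include <stddef.h>

/* Illustrative versions of the four original STREAM kernels.
 * Names are hypothetical; this is not lmbench's implementation. */

/* copy: c[i] = a[i] */
void stream_copy(double *c, const double *a, size_t n)
{
	for (size_t i = 0; i < n; i++)
		c[i] = a[i];
}

/* scale: b[i] = q * c[i] */
void stream_scale(double *b, const double *c, double q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		b[i] = q * c[i];
}

/* add: c[i] = a[i] + b[i] */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		c[i] = a[i] + b[i];
}

/* triad: a[i] = b[i] + q * c[i] */
void stream_triad(double *a, const double *b, const double *c,
		  double q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		a[i] = b[i] + q * c[i];
}

/* Bytes read plus written by one kernel call, given how many arrays
 * it touches (2 for copy and scale, 3 for add and triad).  Dividing
 * this by the elapsed time gives a bandwidth figure. */
size_t stream_bytes(int arrays_touched, size_t n)
{
	return (size_t)arrays_touched * n * sizeof(double);
}
```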

	Added new benchmark lat_sem, which reports SysV semaphore latency.

	Added getopt() command line parsing to most benchmarks.

	Added a new benchmark timing harness, benchmp(), which makes
	it relatively easy to design and build benchmarks which
	measure system performance under a fixed load.  It takes
	a few parameters:
		- initialize: a function pointer.  If this is non-NULL
		  the function is called in the child processes after
		  the fork but before any benchmark-related work is 
		  done.  The function is passed a cookie from the
		  benchmp() call.  This can be a pointer to a
		  data structure which lets the function know what
		  it needs to do.
		- benchmark: a function pointer.  This function
		  takes two parameters, an iteration count "iters",
		  and a cookie.  The benchmarked activity must be
		  run "iters" times (or some integer multiple of
		  "iters").  This function must be idempotent; i.e.,
		  the benchmark harness must be able to call it
		  as many times as necessary.
		- cleanup: a function pointer.  If this is non-NULL
		  the function is called after all benchmarking is
		  completed to clean up any resources that may have
		  been allocated.
		- enough: If this is non-zero then it is the minimum
		  amount of time, in microseconds, that the benchmark
		  must be run to provide reliable results.  In most
		  cases this is left at zero to allow the harness to
		  autoscale the timing intervals to the system clock's
		  resolution/accuracy.
		- parallel: the number of child processes that
		  should run the benchmark in parallel.  This is
		  effectively the load factor.
		- warmup: an optional time period, in microseconds,
		  during which each child process must run the
		  benchmarked activity before any timing intervals
		  begin.  This allows the system scheduler time to
		  settle in a parallel/distributed system before
		  measurements begin.
		- repetitions: If non-zero this is the number of
		  times we need to repeat each measurement.  The
		  default is 11.
		- cookie: An opaque value which can be used to
		  pass information to the initialize(), benchmark(),
		  and cleanup() routines.
	This new harness is now used by: bw_file_rd, bw_mem, bw_mmap_rd,
	bw_pipe, bw_tcp, bw_unix, lat_connect, lat_ctx, lat_fcntl,
	lat_fifo, lat_mem_rd, lat_mmap, lat_ops, lat_pagefault, lat_pipe,
	lat_proc, lat_rpc, lat_select, lat_sem, lat_sig, lat_syscall,
	lat_tcp, lat_udp, lat_unix, lat_unix_connect, and stream.
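
	The callback contract described above can be sketched with a
	toy, single-process stand-in for the harness.  Everything here
	is an assumption made for illustration: toy_benchmp() is not
	benchmp(), the callback typedefs and the state structure are
	hypothetical, and the stand-in does no forking, timing or
	autoscaling.  It only shows the order in which the callbacks
	run and how the cookie is passed through.

```c
#include <stddef.h>

/* Hypothetical callback types, following the description above. */
typedef void (*support_f)(void *cookie);
typedef void (*bench_f)(unsigned long iters, void *cookie);

/* Example cookie: per-benchmark state shared by all callbacks. */
struct state {
	unsigned long work_done;
	int initialized;
	int cleaned;
};

void init_state(void *cookie)
{
	struct state *s = cookie;
	s->work_done = 0;
	s->initialized = 1;
}

/* Runs the "benchmarked activity" exactly iters times.  It may be
 * called any number of times by the harness. */
void do_work(unsigned long iters, void *cookie)
{
	struct state *s = cookie;
	while (iters-- > 0)
		s->work_done++;
}

void cleanup_state(void *cookie)
{
	struct state *s = cookie;
	s->cleaned = 1;
}

/* Toy stand-in: same parameter order as described above, but it
 * ignores enough, parallel and warmup, and uses a fixed iteration
 * count where a real harness would scale it to the clock. */
void toy_benchmp(support_f initialize, bench_f benchmark,
		 support_f cleanup, int enough, int parallel,
		 int warmup, int repetitions, void *cookie)
{
	const unsigned long iters = 1000;
	int i;

	(void)enough;
	(void)parallel;
	(void)warmup;

	if (repetitions == 0)
		repetitions = 11;	/* default stated above */
	if (initialize != NULL)
		initialize(cookie);
	for (i = 0; i < repetitions; i++)
		benchmark(iters, cookie);
	if (cleanup != NULL)
		cleanup(cookie);
}
```

	A caller would fill in a struct state, pass its address as the
	cookie, and read the results back from it after the call
	returns.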