$Cambridge: exim/doc/doc-docbook/HowItWorks.txt,v 1.9 2008/02/04 17:28:44 fanf2 Exp $

CREATING THE EXIM DOCUMENTATION

"You are lost in a maze of twisty little scripts."


This document describes how the various versions of the Exim documentation, in
different output formats, are created from DocBook XML, and also how the
DocBook XML is itself created.


BACKGROUND: THE OLD WAY

From the start of Exim, in 1995, the specification was written in a local text
formatting system known as SGCAL. This is capable of producing PostScript and
plain text output from the same source file. Later, when the "ps2pdf" command
became available with GhostScript, that was used to create a PDF version from
the PostScript. (A few earlier versions were created by a helpful user who had
bought the Adobe distiller software.)

A demand for a version in "info" format led me to write a Perl script that
converted the SGCAL input into a Texinfo file. Because of the somewhat
restrictive requirements of Texinfo, this script always needed a lot of
maintenance, and was never totally satisfactory.

The HTML version of the documentation was originally produced from the Texinfo
version, but later I wrote another Perl script that produced it directly from
the SGCAL input, which made it possible to produce better HTML.

There were a small number of diagrams in the documentation. For the PostScript
and PDF versions, these were created using Aspic, a local text-driven drawing
program that interfaces directly to SGCAL. For the text and texinfo versions,
alternative ascii-art diagrams were used. For the HTML version, screen shots of
the PostScript output were turned into gifs.


A MORE STANDARD APPROACH

Although in principle SGCAL and Aspic could be generally released, they would
be unlikely to receive much (if any) maintenance, especially after I retire.
Furthermore, the old production method was only semi-automatic; I still did a
certain amount of hand tweaking of spec.txt, for example. As the maintenance of
Exim itself was being opened up to a larger group of people, it seemed sensible
to move to a more standard way of producing the documentation, preferably fully
automated. However, we wanted to use only non-commercial software to do this.

At the time I was thinking about converting (early 2005), the "obvious"
standard format in which to keep the documentation was DocBook XML. The use of
XML in general, in many different applications, was increasing rapidly, and it
seemed likely to remain a standard for some time to come. DocBook offered a
particular form of XML suited to documents that were effectively "books".

Maintaining an XML document by hand editing is a tedious, verbose, and
error-prone process. A number of specialized XML text editors were available,
but all the free ones were at a very primitive stage. I therefore decided to
keep the master source in AsciiDoc format, from which a secondary XML master
could be automatically generated.

The first "new" versions of the documents, for the 4.60 release, were generated
this way. However, there were a number of problems with using AsciiDoc for a
document as large and as complex as the Exim manual. As a result, I wrote a new
application called xfpt ("XML From Plain Text") which creates XML from a
relatively simple and consistent markup language. This application has been
released for general use, and the master sources for the Exim documentation are
now in xfpt format.

All the output formats are generated from the XML file. If, in the future, a
better way of maintaining the XML source becomes available, this can be adopted
without changing any of the processing that produces the output documents.
Equally, if better ways of processing the XML become available, they can be
adopted without affecting the source maintenance.

A number of issues arose while setting this all up, which are best summed up by
the statement that a lot of the technology was (in 2006) still very immature.
Trying to do this conversion any earlier would probably not have been anywhere
near as successful. The main issues that bother me in the XML-generated
documentation are described in the penultimate section of this document.

Initially, the major problems were in producing PostScript and PDF outputs. The
available free software for doing this was and still is (we are now in 2007)
cumbersome and slow, and does not support certain output features that I would
like. My response to this was, over a period of two years, to write an XML
processor called SDoP (Simple DocBook Processor). This program reads DocBook
XML and writes PostScript, without using any of the heavyweight apparatus that
is required for xmlto and fop (the previously used software).

An experimental first version of SDoP was used for the Exim 4.67
documentation. Subsequently SDoP was released for general use. SDoP's output
includes features that are missing when xmlto/fop is used, and it also runs
about 60 times faster. The main manual can be formatted in 2.5 seconds instead
of 2.5 minutes, which makes checking and fixing mistakes much easier.

The Makefile that is used to build the various forms of output will, for the
moment, support both ways of producing PostScript and PDF output, though the
default is now to use SDoP.

The following sections describe the processes by which the xfpt files are
transformed into the final output documents. In practice, the details are coded
into a Makefile that specifies the chain of commands for each output format.


REQUIRED SOFTWARE

Installing software to process XML puts lots and lots of stuff on your box. I
run Gentoo Linux, and a lot of things have been installed as dependencies that
I am not fully aware of. This is what I know about (version numbers are current
at the time of writing):

. xfpt 0.03

  This converts the master source file into a DocBook XML file.

. sdop 0.03

  This is my new DocBook-to-PostScript processor.

. ps2pdf

  This is a wrapper script that is part of the GhostScript distribution. It
  converts a PostScript file into a PDF file. It is used to process the output
  from SDoP. It is not required when xmlto/fop is being used to generate PDF
  output.

. xmlto 0.0.18

  This is a shell script that drives various XML processors. It is used to
  produce "formatted objects" when PostScript and PDF output is being generated
  using fop (the old way) rather than SDoP. It is always used to produce HTML
  output. It uses xsltproc, libxml, libxslt, libexslt, and possibly other
  things that I have not figured out, to apply the DocBook XSLT stylesheets.

. libxml 1.8.17
  libxml2 2.6.28
  libxslt 1.1.20

  These are all installed on my box; I do not know which of libxml or libxml2
  the various scripts are actually using.

. xsl-stylesheets-1.70.1

  These are the standard DocBook XSL stylesheets.

. fop 0.93

  FOP is a processor for "formatted objects". It is written in Java. The fop
  command is a shell script that drives it. It is required only if you do not
  want to use SDoP and ps2pdf to generate PostScript and PDF output.

. w3m 0.5.2

  This is a text-oriented web browser. It is used to produce the ASCII form of
  the Exim documentation (spec.txt) from a specially-created HTML format. It
  seems to do a better job than lynx.

. docbook2texi (part of docbook2X 0.8.5)

  This is a wrapper script for a two-stage conversion process from DocBook to a
  Texinfo file. It uses db2x_xsltproc and db2x_texixml. Unfortunately, there
  are two versions of this command; the old one is based on an earlier fork of
  docbook2X and does not work.

. db2x_xsltproc and db2x_texixml (part of docbook2X 0.8.5)

  More wrapping scripts (see previous item).

. makeinfo 4.8

  This is used to make an "info" file from a Texinfo file.

In addition, there are a number of locally written Perl scripts. These are
described below.


THE MAKEFILE

The makefile supports a number of targets of the form x.y, where x is one of
"filter", "spec", or "test", and y is one of "xml", "fo", "ps", "pdf", "html",
"txt", or "info". The intermediate targets "x.xml" and "x.fo" are provided for
testing purposes. The other five targets are production targets. For example:

  make spec.pdf

This runs the necessary tools in order to create the file spec.pdf from the
original source spec.xfpt. A number of intermediate files are created during
this process, including the master DocBook source, called spec.xml. Of course,
the usual features of "make" ensure that if this already exists and is
up-to-date, it is not needlessly rebuilt.

Because there are now two ways of creating the PostScript and PDF outputs,
there are two targets for each one. For example, fop-spec.ps makes PostScript
using fop, and sdop-spec.ps makes it using SDoP. The generic targets spec.ps
and spec.pdf now point to the SDoP versions.
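The naming scheme above is regular enough to check mechanically. The Python sketch below is illustrative only: it is not derived from the real Makefile, parse_target is a hypothetical helper name, and it assumes that the fop- and sdop- prefixes are accepted for every document rather than just for spec.

```python
# Illustrative sketch of the x.y target naming scheme described above.
# Not taken from the real Makefile; parse_target is a hypothetical name.

DOCS = ("filter", "spec", "test")
FORMATS = ("xml", "fo", "ps", "pdf", "html", "txt", "info")
INTERMEDIATE = {"xml", "fo"}          # provided for testing purposes
ENGINES = ("fop", "sdop")             # assumed valid for every document


def parse_target(target: str):
    """Split a target such as 'fop-spec.ps' into its parts.

    Returns (document, format, engine, kind); engine is None for formats
    that do not involve PostScript/PDF generation.
    """
    engine = None
    for name in ENGINES:
        prefix = name + "-"
        if target.startswith(prefix):
            engine = name
            target = target[len(prefix):]
            break

    doc, _, fmt = target.partition(".")
    if doc not in DOCS or fmt not in FORMATS:
        raise ValueError(f"not an x.y documentation target: {target!r}")

    if fmt in ("ps", "pdf"):
        engine = engine or "sdop"     # generic spec.ps/spec.pdf use SDoP
    elif engine is not None:
        raise ValueError("fop-/sdop- prefixes apply only to ps and pdf")

    kind = "intermediate" if fmt in INTERMEDIATE else "production"
    return doc, fmt, engine, kind
```

For example, parse_target("spec.pdf") gives ("spec", "pdf", "sdop", "production"). Targets outside the scheme, such as exim.8 and clean, raise ValueError, since they are separate rules rather than x.y combinations.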

The "test" series of targets were created so that small tests could easily be
run fairly quickly, because processing even the shortish XML document takes
a bit of time, and processing the main specification takes ages -- except when
using SDoP for PostScript and PDF.

Another target is "exim.8". This runs a locally written Perl script called
x2man, which extracts the list of command line options from the spec.xml file,
and creates a man page. There are some XML comments in the spec.xml file to
enable the script to find the start and end of the options list.

There is also a "clean" target that deletes all the generated files.


CREATING DOCBOOK XML FROM XFPT INPUT

The small amount of local configuration for xfpt is included at the start of
the two .xfpt files; there are no separate local xfpt configuration files.
Running the xfpt command creates a .xml file from a .xfpt file. When this
succeeds, there is no output.


DOCBOOK PROCESSING

Processing a .xml file into the five different output formats is not entirely
straightforward. For a start, the same XML is not suitable for all the
different output styles. When the final output is in a text format (.txt,
.texinfo) for instance, all non-ASCII characters in the input must be converted
to ASCII transliterations because the current processing tools do not do this
correctly automatically.

In order to cope with these issues in a flexible way, a Perl script called
Pre-xml was written. This is used to preprocess the .xml files before they are
handed to the main processors. Adding one more tool onto the front of the
processing chain does at least seem to be in the spirit of XML processing.

The XML processors other than SDoP make use of style files, which can be
overridden by local versions. There is one that applies to all styles, called
MyStyle.xsl, and others for the different output formats. I have included
comments in these style files to explain what changes I have made. Some of the
changes are quite significant.


THE PRE-XML SCRIPT

The Pre-xml script copies a .xml file, making certain changes according to the
options it is given. The currently available options are as follows:

-ascii

  This option is used for ASCII output formats. It makes the following
  character replacements:

    &#x2019;  =>  '          apostrophe
    &copy;    =>  (c)        copyright
    &dagger;  =>  *          dagger
    &Dagger;  =>  **         double dagger
    &nbsp;    =>  a space    hard space
    &ndash;   =>  -          en dash

  The apostrophe is specified numerically because that is what xfpt generates
  from an ASCII single quote character. Non-ASCII characters that are not in
  this list should not be used without thinking about how they might be
  converted for the ASCII formats.

  In addition to the character replacements, this option causes quotes to be
  put round <literal> text items, and <quote> and </quote> to be replaced by
  ASCII quote marks. You would think the stylesheet would cope with the latter,
  but it seems to generate non-ASCII characters that w3m then turns into
  question marks.

-bookinfo

  This option causes the <bookinfo> element to be removed from the XML. It is
  used for the PostScript/PDF forms of the filter document, in order to avoid
  the generation of a full title page.

-fi

  Replace any occurrence of "fi" by the ligature &#xFB01; except when it is
  inside an XML element, or inside a <literal> part of the text.

  The use of ligatures would be nice for the PostScript and PDF formats. Sadly,
  it turns out that fop cannot at present handle the FB01 character correctly.
  Happily this problem is now avoided when SDoP is used to generate PostScript
  (and thence PDF) because SDoP automatically uses an "fi" ligature for
  non-fixed-width fonts.

  The only xmlto format that handles FB01 is the HTML format, but when I used
  this in the test version, people complained that it made searching for words
  difficult. So this option is in practice not used at all.

-noindex

  Remove the XML that generates the indexes. The source document has three
  types of index entry, for the variables, options, and concept indexes.
  However, no index is required for the .txt and .texinfo outputs.

-oneindex

  Remove the XML to generate separate variables, options, and concept indexes,
  and add XML to generate a single index. The only output processors that
  support multiple indexes are SDoP and the processor that produces "formatted
  objects" for PostScript and PDF output for fop. The HTML processor ignores
  the XML settings for multiple indexes and just makes one unified index.
  Specifying three indexes gets you three copies of the same index, so this has
  to be changed.

-optbreak

  Look for items of the form <option>...</option> and <varname>...</varname> in
  ordinary paragraphs, and insert &#x200B; after each underscore in the
  enclosed text. The same is done for any word containing four or more upper
  case letters (compile-time options in the Exim specification). The character
  &#x200B; is a zero-width space. This means that the line may be split after
  one of these underscores, but no hyphen is inserted.
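Two of these transformations, -ascii and -optbreak, are simple enough to sketch. The Python below is not the Pre-xml script (which is written in Perl); it is a minimal illustration that assumes xfpt emits exactly the entity spellings in the table above, that plain double quotes are the ASCII quote marks used, and that <option> and <varname> carry no attributes. Unlike the real -optbreak, it does not restrict itself to ordinary paragraphs.

```python
import re

# -ascii: replacements from the table above (entity spellings assumed).
ASCII_MAP = {
    "&#x2019;": "'",    # apostrophe
    "&copy;": "(c)",    # copyright
    "&dagger;": "*",    # dagger
    "&Dagger;": "**",   # double dagger
    "&nbsp;": " ",      # hard space
    "&ndash;": "-",     # en dash
}

ZWSP = "&#x200B;"       # zero-width space inserted by -optbreak


def ascii_option(xml: str) -> str:
    """Transliterate to ASCII and quote <literal>/<quote> text (sketch)."""
    for entity, text in ASCII_MAP.items():
        xml = xml.replace(entity, text)
    # Assumed quoting style: plain double quotes in both cases.
    xml = re.sub(r"<literal>(.*?)</literal>", r'<literal>"\1"</literal>',
                 xml, flags=re.S)
    return xml.replace("<quote>", '"').replace("</quote>", '"')


def _break_upper_word(m):
    word = m.group(0)
    # Compile-time option names: four or more upper case letters.
    if sum(c.isupper() for c in word) >= 4:
        return word.replace("_", "_" + ZWSP)
    return word


def optbreak(xml: str) -> str:
    """Insert a zero-width space after underscores in option-like names."""
    out = []
    in_name = False
    # Splitting on tags keeps markup and text apart, so tags are never altered.
    for piece in re.split(r"(<[^>]+>)", xml):
        if piece.startswith("<"):
            if piece in ("<option>", "<varname>"):
                in_name = True
            elif piece in ("</option>", "</varname>"):
                in_name = False
            out.append(piece)
        elif in_name:
            out.append(piece.replace("_", "_" + ZWSP))
        else:
            out.append(re.sub(r"\w+", _break_upper_word, piece))
    return "".join(out)
```

Splitting on tags with a regular expression is a shortcut that works for the simple, predictable XML that xfpt generates; a general-purpose tool would use a real XML parser instead.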


CREATING POSTSCRIPT AND PDF

These two output formats are created either by using my new SDoP program to
produce PostScript which can then be run through ps2pdf to make a PDF, or by
using xmlto and fop in the old way.


USING SDOP TO CREATE POSTSCRIPT AND PDF

PostScript output is created in two stages. First, the XML is pre-processed by
the Pre-xml script. For the filter document, the <bookinfo> element is removed
so that no title page is generated. For the main specification, the only change
is to insert line breakpoints via -optbreak.

The SDoP program is then used to create PostScript output directly from the XML
input. Then the ps2pdf command is used to generate a PDF from the PostScript.
There are no external stylesheets that are used by SDoP. Any variations to the
default format are specified inline using "processing instructions".


USING XMLTO AND FOP TO CREATE POSTSCRIPT AND PDF

This is the original way of creating PostScript and PDF output. The processing
happens in three stages, with an additional fourth stage for PDF. First, the
XML is pre-processed by the Pre-xml script. For the filter document, the
<bookinfo> element is removed so that no title page is generated. For the main
specification, the only change is to insert line breakpoints via -optbreak.

Second, the xmlto command is used to produce a "formatted objects" (.fo) file.
This process uses the following stylesheets:

  (1) Either MyStyle-filter-fo.xsl or MyStyle-spec-fo.xsl
  (2) MyStyle-fo.xsl
  (3) MyStyle.xsl
  (4) MyTitleStyle.xsl

The last of these is not used for the filter document, which does not have a
title page. The first three stylesheets were created manually, either by typing
directly, or by copying from the standard stylesheet and editing.

The final stylesheet has to be created from a template document, which is
called MyTitlepage.templates.xml. This was copied from the standard styles and
modified. The template is processed with xsltproc to produce the stylesheet.
All this apparatus is appallingly heavyweight. The processing is also very slow
in the case of the specification document. However, there should be no errors.

The reference book that saved my life while I was trying to get all this to
work is "DocBook XSL, The Complete Guide", third edition (2005), by Bob
Stayton, published by Sagehill Enterprises.

In the third part of the processing, the .fo file that is produced by the xmlto
command is processed by the fop command to generate either PostScript or PDF.
This is also very slow, and you get a whole slew of errors, of which these are
a sample:

  [ERROR] property - "background-position-horizontal" is not implemented yet.

  [ERROR] property - "background-position-vertical" is not implemented yet.

  [INFO] JAI support was not installed (read: not present at build time).
  Trying to use Jimi instead
  Error creating background image: Error creating FopImage object (Error
  creating FopImage object
  (http://docbook.sourceforge.net/release/images/draft.png) :
  org.apache.fop.image.JimiImage

  [WARNING] table-layout=auto is not supported, using fixed!

  [ERROR] Unknown enumerated value for property 'span': inherit

  [ERROR] Error in span property value 'inherit':
  org.apache.fop.fo.expr.PropertyException: No conversion defined

  [ERROR] Areas pending, text probably lost in lineinclude parts matched in the
  response by response_pattern by means of numeric variables such as

The last one is particularly meaningless gobbledegook. Some of the errors and
warnings are repeated many times. Nevertheless, it does eventually produce
usable output, though I have a number of issues with it (see a later section of
this document). Maybe one day there will be a new release of fop that does
better. In the meantime, I have written my own program for making PostScript
output -- see the previous section -- because the problems with xmlto/fop were
sufficiently annoying.

The PDF file that is produced by this process has one problem: the pages, as
shown by acroread in its thumbnail display, are numbered sequentially from one
to the end. Those numbers do not correspond with the page numbers of the body
of the document, which makes finding a page from the index awkward. There is a
facility in the PDF format to give pages appropriate "labels", but I cannot
find a way of persuading fop to generate these. Fortunately, it is possible to
fix up the PDF to add page labels. I wrote a script called PageLabelPDF which
does this. The labels are shown correctly by acroread and xpdf, but not by
GhostScript (gv).


THE PAGELABELPDF SCRIPT

This script reads the standard input and writes the standard output. It is used
to "tidy up" the PDF output that is produced by fop. It is not needed when
PDF output is generated from SDoP's output using ps2pdf.

The PageLabelPDF script searches for the PDF object that sets data in its
"Catalog", and adds appropriate information about page labels. The number of
front-matter pages (those before chapter 1) is hard-wired into this script as
12 because I could not find a way of determining it automatically. As the
current table of contents finishes near the top of the 11th page, there is
plenty of room for expansion, so it is unlikely to be a problem.

Having added data to the PDF file, the script then finds the xref table at the
end of the file, and adjusts its entries to allow for the added text. This
simple processing seems to be enough to generate a new, valid, PDF file.
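The two fix-ups can be sketched in Python as follows; this is an illustration, not the PageLabelPDF script itself. The /PageLabels number tree and the fixed-width xref entry layout (10-digit byte offset, 5-digit generation number, keyword "n" or "f") come from the PDF specification. The use of lower-case roman numerals for the front matter and all the helper names are assumptions. A complete fix-up must also adjust the startxref value, because the xref table itself moves when text is inserted before it.

```python
import re

FRONT_MATTER_PAGES = 12   # hard-wired, as described above


def page_labels_entry(front_pages: int = FRONT_MATTER_PAGES) -> str:
    """Build a /PageLabels entry for insertion into the Catalog dictionary.

    Page indexes are zero-based: pages 0..front_pages-1 get lower-case roman
    labels (an assumption), and decimal numbering restarts at 1 for chapter 1.
    """
    return (f"/PageLabels << /Nums [ 0 << /S /r >> "
            f"{front_pages} << /S /D /St 1 >> ] >>")


# An in-use xref entry: 10-digit offset, 5-digit generation, keyword "n".
XREF_ENTRY = re.compile(r"^(\d{10}) (\d{5}) n(\s*)$")


def shift_xref(entries, insert_at: int, delta: int):
    """Move offsets of objects at or after insert_at by delta bytes."""
    shifted = []
    for line in entries:
        m = XREF_ENTRY.match(line)
        if m and int(m.group(1)) >= insert_at:
            line = f"{int(m.group(1)) + delta:010d} {m.group(2)} n{m.group(3)}"
        shifted.append(line)
    return shifted


def shift_startxref(tail: str, delta: int) -> str:
    """Adjust the byte offset that follows the startxref keyword."""
    return re.sub(r"startxref(\s+)(\d+)",
                  lambda m: f"startxref{m.group(1)}{int(m.group(2)) + delta}",
                  tail)
```

Free entries (keyword "f") are left alone, and objects located before the insertion point (including the Catalog object itself, which grows in place) keep their offsets.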


CREATING HTML

Only two stages are needed to produce HTML, but the main specification is
subsequently postprocessed. The Pre-xml script is called with the -optbreak and
-oneindex options to preprocess the XML. Then the xmlto command creates the
HTML output directly. For the specification document, a directory of files is
created, whereas the filter document is output as a single HTML page. The
following stylesheets are used:

  (1) Either MyStyle-chunk-html.xsl or MyStyle-nochunk-html.xsl
  (2) MyStyle-html.xsl
  (3) MyStyle.xsl

The first stylesheet references the chunking or non-chunking standard DocBook
stylesheet, as appropriate.

You may see a number of these errors when creating HTML: "Revisionflag on
unexpected element: literallayout (Assuming block)". They seem to be harmless;
the output appears to be what is intended.

The original HTML that I produced from the SGCAL input had hyperlinks back from
chapter and section titles to the table of contents. These links are not
generated by xmlto. One of the testers pointed out that the lack of these
links, or simple self-referencing links for titles, makes it harder to copy a
link name into, for example, a mailing list response.

I could not find where to fiddle with the stylesheets to make such a change, if
indeed the stylesheets are capable of it. Instead, I wrote a Perl script called
TidyHTML-spec to do the job for the specification document. It updates the
index.html file (which contains the table of contents) setting up anchors,
and then updates all the chapter files to insert appropriate links.

The index.html file as built by xmlto contains the whole table of contents in a
single line, which makes it hard to debug by hand. Since I was postprocessing
it anyway, I arranged to insert newlines after every '>' character.

The TidyHTML-spec script also processes every HTML file, to tidy up some of the
untidy features therein. It turns <div class="literallayout"><p> into
<div class="literallayout"> and a matching </p></div> into </div> to get rid of
unwanted vertical white space in literallayout blocks. Before each occurrence
of </td> it inserts &nbsp; so that each table cell is a little bit wider than
the text itself.
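The textual changes just described can be sketched with regular expressions. This Python version is illustrative only (TidyHTML-spec is a Perl script), and it assumes that xmlto emits the literallayout markup exactly as quoted above, with no attributes other than class.

```python
import re


def split_after_tags(html: str) -> str:
    """Put a newline after every '>' so one-line output becomes readable."""
    return html.replace(">", ">\n")


def tidy_html(html: str) -> str:
    """Remove the <p> wrapper in literallayout blocks and pad table cells."""
    html = re.sub(r'<div class="literallayout"><p>(.*?)</p></div>',
                  r'<div class="literallayout">\1</div>',
                  html, flags=re.S)
    # A hard space before each </td> makes the cell slightly wider than its text.
    return html.replace("</td>", "&nbsp;</td>")
```

In this sketch tidy_html has to run before split_after_tags, because the inserted newlines would otherwise break up the tag sequences that the regular expression looks for.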

The TidyHTML-spec script also takes the opportunity to postprocess the
spec_html/ix01.html file, which contains the document index. Again, the index
is generated as one single line, so it splits it up. Then it creates a list of
letters at the top of the index and hyperlinks them both ways from the
different letter portions of the index.
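Linking a letter list to the index sections in both directions might look like this in Python. The markup is an assumption made for illustration: each letter section is taken to begin with a heading of the form <h3>A</h3>. The "letters" and "ix-" anchor ids are hypothetical names, not necessarily those that TidyHTML-spec uses.

```python
import re

# Assumed markup: each letter group in the index begins with <h3>X</h3>.
HEADING = re.compile(r"<h3>([A-Z])</h3>")


def link_index_letters(index_body: str) -> str:
    """Prefix a letter bar and make each heading link back to it."""
    letters = HEADING.findall(index_body)
    bar = '<p id="letters">' + " ".join(
        f'<a href="#ix-{c}">{c}</a>' for c in letters) + "</p>\n"
    # Each heading gets an id (the bar's target) and a link back to the bar.
    body = HEADING.sub(
        lambda m: f'<h3 id="ix-{m.group(1)}">'
                  f'<a href="#letters">{m.group(1)}</a></h3>',
        index_body)
    return bar + body
```

The function operates on the body of the index only; placing the letter bar inside a complete HTML page is left to the caller.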

People wanted similar postprocessing for the filter.html file, so that is now
done using a similar script called TidyHTML-filter. It was easier to use a
separate script because filter.html is a single file rather than a directory,
so the logic is somewhat different.


CREATING TEXT FILES

This happens in four stages. The Pre-xml script is called with the -ascii,
-optbreak, and -noindex options to convert the input to ASCII characters,
insert line break points, and disable the production of an index. Then the
xmlto command converts the XML to a single HTML document, using these
stylesheets:

  (1) MyStyle-txt-html.xsl
  (2) MyStyle-html.xsl
  (3) MyStyle.xsl

The MyStyle-txt-html.xsl stylesheet is the same as MyStyle-nochunk-html.xsl,
except that it contains an additional item to ensure that a generated
"copyright" symbol is output as "(c)" rather than the Unicode character. This
is necessary because the stylesheet itself generates a copyright symbol as part
of the document title; the character is not in the original input.

The w3m command is used with the -dump option to turn the HTML file into ASCII
text, but this contains multiple sequences of blank lines that make it look
awkward. Furthermore, chapter and section titles do not stand out very well. A
local Perl script called Tidytxt is used to post-process the output. First, it
converts sequences of blank lines into a single blank line. Then it searches
for chapter and section headings. Each chapter heading is uppercased, and
preceded by an extra two blank lines and a line of equals characters. An extra
newline is inserted before each section heading, and section headings are
underlined with hyphens.
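A Python sketch of this post-processing follows. How Tidytxt actually recognizes headings is not described here, so the patterns are assumptions: a chapter heading is taken to be a line such as "1. Introduction" and a section heading a line such as "1.1 Exim documentation". The width of the equals-sign rule (79 characters) is also assumed. Patterns this crude would also match numbered list items, so a real script needs to be more careful.

```python
import re

CHAPTER = re.compile(r"^\d+\. \S")         # assumed form: "3. Title"
SECTION = re.compile(r"^\d+\.\d+\.? \S")   # assumed form: "3.1 Title"


def tidy_txt(text: str, rule_width: int = 79) -> str:
    """Collapse blank-line runs and make headings stand out."""
    out = []
    for line in text.split("\n"):
        if not line.strip():
            if out and out[-1] == "":
                continue                   # keep only one blank line
            out.append("")
        elif CHAPTER.match(line):
            # Two extra blank lines, a rule of '=', then the uppercased title.
            out += ["", "", "=" * rule_width, line.upper()]
        elif SECTION.match(line):
            # One extra blank line, then the title underlined with hyphens.
            out += ["", line, "-" * len(line)]
        else:
            out.append(line)
    return "\n".join(out)
```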

The output of xmlto also contains non-ASCII Unicode characters that w3m passes
through. Fortunately, they are few, and Tidytxt cleans them up as well. Some
headings use "box drawing" characters in the range U+2500 to U+253F, which are
translated into -, +, or | as appropriate, and U+00A0 (hard space) and U+25CF
(bullet) are translated into plain spaces and asterisks respectively. (It might
be possible to do all this in the same way as I dealt with copyright, as
described above, but adding a few lines of Perl to an existing script was a lot
easier.)
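This character clean-up amounts to a translation table. In the Python sketch below, the exact split of the box-drawing range into -, | and + is an assumption based on the Unicode character names in U+2500 to U+253F: horizontal strokes become -, vertical strokes become |, and corners, tees and crosses become +.

```python
# Box-drawing code points (U+2500..U+253F) that are pure horizontal or
# vertical strokes; everything else in the range is a corner, tee, or cross.
HORIZONTAL = {0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509}
VERTICAL = {0x2502, 0x2503, 0x2506, 0x2507, 0x250A, 0x250B}


def _build_table() -> dict:
    table = {0x00A0: " ", 0x25CF: "*"}   # hard space, bullet
    for cp in range(0x2500, 0x2540):
        if cp in HORIZONTAL:
            table[cp] = "-"
        elif cp in VERTICAL:
            table[cp] = "|"
        else:
            table[cp] = "+"
    return table


ASCII_TABLE = _build_table()


def to_ascii(text: str) -> str:
    """Replace the few non-ASCII characters that w3m passes through."""
    return text.translate(ASCII_TABLE)
```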


CREATING INFO FILES

This process starts with the same Pre-xml call as for text files. Non-ASCII
characters in the source are transliterated, and the <index> elements are
removed. The docbook2texi script is then called to convert the XML file into a
Texinfo file. However, this is not quite enough. The converted file ends up
with "conceptindex" and "optionindex" items, which are not recognized by the
makeinfo command. These have to be changed to "cindex" and "findex"
respectively in the final .texinfo file. Furthermore, the main menu lacks a
pointer to the index, and indeed the index node itself is missing. These
problems are fixed by running the file through a script called TidyInfo.
Finally, a call of makeinfo creates a .info file.
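The TidyInfo fix-ups might be sketched in Python as below. Several details are assumptions: that the index entries appear as @conceptindex and @optionindex commands at the start of a line, that a single node called "Index" printing the concept index (cp) is what is wanted, and that the first "@end menu" in the file closes the main menu. The real TidyInfo script may well differ.

```python
import re

RENAMES = {"conceptindex": "cindex", "optionindex": "findex"}

# Hypothetical index node; @printindex cp prints the concept index.
INDEX_NODE = "@node Index\n@unnumbered Index\n\n@printindex cp\n"


def tidy_info(texinfo: str) -> str:
    # Rename the index commands that makeinfo does not recognize.
    texinfo = re.sub(r"^@(conceptindex|optionindex)\b",
                     lambda m: "@" + RENAMES[m.group(1)],
                     texinfo, flags=re.M)
    # Add a menu pointer to the index (assumed: first menu is the main one).
    texinfo = texinfo.replace("\n@end menu", "\n* Index::\n@end menu", 1)
    # Add the missing index node just before the end of the document.
    return texinfo.replace("\n@bye", "\n" + INDEX_NODE + "\n@bye", 1)
```

As written, @findex entries go into the separate function index (fn), which this node does not print; merging them into the concept index would need something like "@synindex fn cp" near the start of the file.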

There is one apparently unconfigurable feature of docbook2texi: it does not
seem possible to give it a file name for its output. It chooses a name based on
the title of the document. Thus, the main specification ends up in a file
called the_exim_mta.texi and the filter document in exim_filtering.texi. These
files are removed after their contents have been copied and modified by the
TidyInfo script, which writes to a .texinfo file.


CREATING THE MAN PAGE

I wrote a Perl script called x2man to create the exim.8 man page from the
DocBook XML source. I deliberately did NOT start from the xfpt source,
because it is the DocBook source that is the "standard". This comment line in
the DocBook source marks the start of the command line options:

  <!-- === Start of command line options === -->

A similar line marks the end. If at some time in the future some way other
than xfpt is used to maintain the DocBook source, it needs to be capable of
maintaining these comments.
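The extraction step might look like this in Python. The end-marker text is an assumption (the text above says only that a similar line marks the end), as is the DocBook structure of each option: a <term> followed by a <listitem>. The output uses the standard man-page .TP macro with \fB...\fP for bold, but it is a simplification and is not claimed to match what x2man actually produces.

```python
import re

START = "<!-- === Start of command line options === -->"
END = "<!-- === End of command line options === -->"   # assumed wording


def extract_options(xml: str) -> str:
    """Return the DocBook text between the start and end marker comments."""
    start = xml.index(START) + len(START)   # ValueError if a marker is missing
    return xml[start:xml.index(END, start)]


def strip_tags(text: str) -> str:
    return re.sub(r"<[^>]+>", "", text)


def options_to_man(fragment: str) -> str:
    """Turn each <term>/<listitem> pair into a man-page tagged paragraph."""
    entries = []
    pattern = re.compile(r"<term>(.*?)</term>\s*<listitem>(.*?)</listitem>",
                         re.S)
    for term, body in pattern.findall(fragment):
        text = " ".join(strip_tags(body).split())   # collapse whitespace
        entries.append(f".TP 10\n\\fB{strip_tags(term).strip()}\\fP\n{text}")
    return "\n".join(entries)
```

A production version would also need to escape characters that are special to troff, for example writing hyphens in option names as \- and protecting lines that begin with a dot.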


UNRESOLVED PROBLEMS

There are a number of unresolved problems with producing the Exim documentation
in the manner described above. I will describe them here in the hope that in
future some way round them can be found. Some of the problems are solved by
using SDoP instead of xmlto/fop to produce PostScript and PDF output.

(1) When a whole chain of tools is processing a file, an error somewhere
    in the middle is often very hard to debug. For instance, an error in the
    xfpt file might not show up until an XML processor throws a wobbly because
    the generated XML is bad. You have to be able to read XML and figure out
    what generated what. One of the reasons for creating the "test" series of
    targets was to help in checking out these kinds of problem.

(2) There is a mechanism in XML for marking parts of the document as
    "revised", and I have arranged for xfpt markup to use it. However, the
    only xmlto output format that pays attention to this is the HTML output,
    which sets a green background. If xmlto/fop is used to generate PostScript
    and PDF, there are no revision marks (change bars). This problem
    is not present when SDoP is used. However, the text and Texinfo output
    formats lack revision indications.

(3) The index entries in the HTML format take you to the top of the section
    that is referenced, instead of to the point in the section where the index
    marker was set.

(4) The HTML output supports only a single index, so the variable, options,
    and concept index entries have to be merged.

(5) The index for the PostScript/PDF output created by xmlto/fop does not
    merge identical page numbers, which makes some entries look ugly. This is
    not a problem when SDoP is used.

(6) The HTML index and the PostScript/PDF indexes, when made with xmlto/fop,
    make no use of textual markup; the text is all roman, without any italic
    or boldface. For PostScript/PDF, this is not a problem when SDoP is used.

(7) I turned off hyphenation in the PostScript/PDF output produced by
    xmlto/fop, because it was being done so badly. Needless to say, I made
    SDoP do a better job. These comments apply to xmlto/fop:

    (a) It seems to force hyphenation if it is at all possible, without
        regard to the "tightness" or "looseness" of the line. Decent
        formatting software should attempt hyphenation only if the line is
        over some "looseness" threshold; otherwise you get far too many
        hyphenations, often for several lines in succession.

    (b) It uses an algorithmic form of hyphenation that doesn't always
        produce acceptable word breaks. (I prefer to use a hyphenation
        dictionary, which is what SDoP does.)

(8) The PostScript/PDF output produced by xmlto/fop is badly paginated:

    (a) There seems to be no attempt to avoid "widow" and "orphan" lines on
        pages. A "widow" is the last line of a paragraph at the top of a page,
        and an "orphan" is the first line of a paragraph at the bottom of a
        page.

    (b) There seems to be no attempt to prevent section headings being placed
        last on a page, with no following text on the page.

    Neither of these problems occurs when SDoP is used to produce the
    PostScript/PDF output.

(9) The fop processor does not support "fi" ligatures, not even if you put the
    appropriate Unicode character into the source by hand. Again, this is not
    a problem if SDoP is used.

(10) There are no diagrams in the new documentation. This is something I hope
     to work on. The previously used Aspic command for creating line art from a
     textual description can output Encapsulated PostScript or Scalable Vector
     Graphics, which are two standard diagram representations. Aspic could be
     formally released and used to generate output that could be included in at
     least some of the output formats.

(11) The use of a "zero-width space" works well as a way of specifying that
     Exim option names can be split, without hyphens, over line breaks.

     However, when xmlto/fop is being used and an option is not split, if the
     line is very "loose", the zero-width space is expanded, along with other
     spaces. This is a totally crazy thing to do, but unfortunately it is
     suggested by the Unicode definition of the zero-width space, which says
     "its presence between two characters does not prevent increased letter
     spacing in justification". It seems that the implementors of fop have
     understood "letter spacing" also to include "word spacing". Sigh.

     This problem does not arise when SDoP is used.

The consequence of (7), (8), and (9) is that the PostScript/PDF output as
produced by xmlto/fop looks as if it comes from some of the very early attempts
at text formatting of around 20 years ago. We can only hope that 20 years'
progress is not going to get lost, and that things will improve in this area.
My small contribution to this has been to write SDoP, which, though simple and
"non-standard", does get some of these formatting issues right.


LIST OF FILES

Markup.txt                  Describes the xfpt markup that is used
HowItWorks.txt              This document
Makefile                    The makefile
MyStyle-chunk-html.xsl      Stylesheet for chunked HTML output
MyStyle-filter-fo.xsl       Stylesheet for filter fo output
MyStyle-fo.xsl              Stylesheet for any fo output
MyStyle-html.xsl            Stylesheet for any HTML output
MyStyle-nochunk-html.xsl    Stylesheet for non-chunked HTML output
MyStyle-spec-fo.xsl         Stylesheet for spec fo output
MyStyle-txt-html.xsl        Stylesheet for HTML=>text output
MyStyle.xsl                 Stylesheet for all output
MyTitleStyle.xsl            Stylesheet for spec title page
MyTitlepage.templates.xml   Template for creating MyTitleStyle.xsl
Myhtml.css                  Experimental css stylesheet for HTML output
PageLabelPDF                Script to postprocess xmlto/fop PDF output
Pre-xml                     Script to preprocess XML
TidyHTML-filter             Script to tidy up the filter HTML output
TidyHTML-spec               Script to tidy up the spec HTML output
TidyInfo                    Script to sort out index problems in Texinfo output
Tidytxt                     Script to compact multiple blank lines
filter.xfpt                 xfpt source of the filter document
spec.xfpt                   xfpt source of the specification document
x2man                       Script to make the Exim man page from the XML


Philip Hazel
Last updated: 31 August 2007