CREATING THE EXIM DOCUMENTATION

"You are lost in a maze of twisty little scripts."


This document describes how the various versions of the Exim documentation, in
different output formats, are created from DocBook XML, and also how the
DocBook XML is itself created.


BACKGROUND: THE OLD WAY

From the start of Exim, in 1995, the specification was written in a local text
formatting system known as SGCAL. This is capable of producing PostScript and
plain text output from the same source file. Later, when the "ps2pdf" command
became available with GhostScript, that was used to create a PDF version from
the PostScript. (A few earlier versions were created by a helpful user who had
bought the Adobe Distiller software.)

A demand for a version in "info" format led me to write a Perl script that
converted the SGCAL input into a Texinfo file. Because of the somewhat
restrictive requirements of Texinfo, this script always needed a lot of
maintenance, and was never totally satisfactory.

The HTML version of the documentation was originally produced from the Texinfo
version, but later I wrote another Perl script that produced it directly from
the SGCAL input, which made it possible to produce better HTML.

There were a small number of diagrams in the documentation. For the PostScript
and PDF versions, these were created using Aspic, a local text-driven drawing
program that interfaces directly to SGCAL. For the text and Texinfo versions,
alternative ASCII-art diagrams were used. For the HTML version, screen shots of
the PostScript output were turned into GIFs.


A MORE STANDARD APPROACH

Although in principle SGCAL and Aspic could be generally released, they would
be unlikely to receive much (if any) maintenance, especially after I retire.
Furthermore, the old production method was only semi-automatic; I still did a
certain amount of hand tweaking of spec.txt, for example. As the maintenance of
Exim itself was being opened up to a larger group of people, it seemed sensible
to move to a more standard way of producing the documentation, preferably fully
automated. However, we wanted to use only non-commercial software to do this.

At the time I was thinking about converting (early 2005), the "obvious"
standard format in which to keep the documentation was DocBook XML. The use of
XML in general, in many different applications, was increasing rapidly, and it
seemed likely to remain a standard for some time to come. DocBook offered a
particular form of XML suited to documents that were effectively "books".

Maintaining an XML document by hand editing is a tedious, verbose, and
error-prone process. A number of specialized XML text editors were available,
but all the free ones were at a very primitive stage. I therefore decided to
keep the master source in AsciiDoc format, from which a secondary XML master
could be automatically generated.

The first "new" versions of the documents, for the 4.60 release, were generated
this way. However, there were a number of problems with using AsciiDoc for a
document as large and as complex as the Exim manual. As a result, I wrote a new
application called xfpt ("XML From Plain Text") which creates XML from a
relatively simple and consistent markup language. This application has been
released for general use, and the master sources for the Exim documentation are
now in xfpt format.

All the output formats are generated from the XML file. If, in the future, a
better way of maintaining the XML source becomes available, this can be adopted
without changing any of the processing that produces the output documents.
Equally, if better ways of processing the XML become available, they can be
adopted without affecting the source maintenance.

A number of issues arose while setting this all up, which are best summed up by
the statement that a lot of the technology was (in 2006) still very immature.
Trying to do this conversion any earlier would probably not have been anywhere
near as successful. The main issues that bother me in the XML-generated
documentation are described in the penultimate section of this document.

Initially, the major problems were in producing PostScript and PDF outputs. The
available free software for doing this was and still is (we are now in 2007)
cumbersome and slow, and does not support certain output features that I would
like. My response to this was, over a period of two years, to write an XML
processor called SDoP (Simple DocBook Processor). This program reads DocBook
XML and writes PostScript, without using any of the heavyweight apparatus that
is required for xmlto and fop (the previously used software).

An experimental first version of SDoP was used for the Exim 4.67
documentation. Subsequently SDoP was released for general use. SDoP's output
includes features that are missing when xmlto/fop is used, and it also runs
about 60 times faster. The main manual can be formatted in 2.5 seconds instead
of 2.5 minutes, which makes checking and fixing mistakes much easier.

The Makefile that is used to build the various forms of output will, for the
moment, support both ways of producing PostScript and PDF output, though the
default is now to use SDoP.

The following sections describe the processes by which the xfpt files are
transformed into the final output documents. In practice, the details are coded
into a Makefile that specifies the chain of commands for each output format.


REQUIRED SOFTWARE

Installing software to process XML puts lots and lots of stuff on your box. I
run Gentoo Linux, and a lot of things have been installed as dependencies that
I am not fully aware of. This is what I know about (version numbers are current
at the time of writing):

. xfpt 0.03

  This converts the master source file into a DocBook XML file.

. sdop 0.03

  This is my new DocBook-to-PostScript processor.

. ps2pdf

  This is a wrapper script that is part of the GhostScript distribution. It
  converts a PostScript file into a PDF file. It is used to process the output
  from SDoP. It is not required when xmlto/fop is being used to generate PDF
  output.

. xmlto 0.0.18

  This is a shell script that drives various XML processors. It is used to
  produce "formatted objects" when PostScript and PDF output is being generated
  using fop (the old way) rather than SDoP. It is always used to produce HTML
  output. It uses xsltproc, libxml, libxslt, libexslt, and possibly other
  things that I have not figured out, to apply the DocBook XSLT stylesheets.

. libxml 1.8.17
  libxml2 2.6.28
  libxslt 1.1.20

  These are all installed on my box; I do not know which of libxml or libxml2
  the various scripts are actually using.

. xsl-stylesheets-<version>

  These are the standard DocBook XSL stylesheets.

  The documents use http://docbook.sourceforge.net/release/xsl/current/ which
  should be mapped to an appropriate local path via the system catalogs.

. fop 0.93

  FOP is a processor for "formatted objects". It is written in Java. The fop
  command is a shell script that drives it. It is required only if you do not
  want to use SDoP and ps2pdf to generate PostScript and PDF output.

. w3m 0.5.2

  This is a text-oriented web browser. It is used to produce the ASCII form of
  the Exim documentation (spec.txt) from a specially-created HTML format. It
  seems to do a better job than lynx.

. docbook2texi (part of docbook2X 0.8.5)

  This is a wrapper script for a two-stage conversion process from DocBook to a
  Texinfo file. It uses db2x_xsltproc and db2x_texixml. Unfortunately, there
  are two versions of this command; the old one is based on an earlier fork of
  docbook2X and does not work.

. db2x_xsltproc and db2x_texixml (part of docbook2X 0.8.5)

  More wrapping scripts (see previous item).

. makeinfo 4.8

  This is used to make an "info" file from a Texinfo file.

In addition, there are a number of locally written Perl scripts. These are
described below.


THE MAKEFILE

The makefile supports a number of targets of the form x.y, where x is one of
"filter", "spec", or "test", and y is one of "xml", "fo", "ps", "pdf", "html",
"txt", or "info". The intermediate targets "x.xml" and "x.fo" are provided for
testing purposes. The other five targets are production targets. For example:

  make spec.pdf

This runs the necessary tools in order to create the file spec.pdf from the
original source spec.xfpt. A number of intermediate files are created during
this process, including the master DocBook source, called spec.xml. Of course,
the usual features of "make" ensure that if this already exists and is
up-to-date, it is not needlessly rebuilt.

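The "not needlessly rebuilt" behaviour comes from make's timestamp rule. As a
rough illustration only (a Python sketch, not part of the Exim Makefile; the
function name needs_rebuild is made up), the decision for a single target is:

```python
import os

def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Make-style test: rebuild if the target is missing or older than an input."""
    if not os.path.exists(target):
        return True  # never built, so it must be made
    built_at = os.path.getmtime(target)
    # Out of date if any prerequisite was modified after the target was built.
    return any(os.path.getmtime(p) > built_at for p in prerequisites)
```

For example, needs_rebuild("spec.xml", ["spec.xfpt"]) is true when spec.xml is
missing or older than spec.xfpt, and false otherwise.
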
Because there are now two ways of creating the PostScript and PDF outputs,
there are two targets for each one. For example, fop-spec.ps makes PostScript
using fop, and sdop-spec.ps makes it using SDoP. The generic targets spec.ps
and spec.pdf now point to the SDoP versions.

The "test" series of targets was created so that small tests could easily be
run fairly quickly, because processing even the shortish XML document takes
a bit of time, and processing the main specification takes ages -- except when
using SDoP for PostScript and PDF.

Another target is "exim.8". This runs a locally written Perl script called
x2man, which extracts the list of command line options from the spec.xml file,
and creates a man page. There are some XML comments in the spec.xml file to
enable the script to find the start and end of the options list.

There is also a "clean" target that deletes all the generated files.


CREATING DOCBOOK XML FROM XFPT INPUT

The small amount of local configuration for xfpt is included at the start of
the two .xfpt files; there are no separate local xfpt configuration files.
Running the xfpt command creates a .xml file from a .xfpt file. When this
succeeds, there is no output.


DOCBOOK PROCESSING

Processing a .xml file into the five different output formats is not entirely
straightforward. For a start, the same XML is not suitable for all the
different output styles. When the final output is in a text format (.txt,
.texinfo), for instance, all non-ASCII characters in the input must be converted
to ASCII transliterations because the current processing tools do not do this
correctly automatically.

In order to cope with these issues in a flexible way, a Perl script called
Pre-xml was written. This is used to preprocess the .xml files before they are
handed to the main processors. Adding one more tool onto the front of the
processing chain does at least seem to be in the spirit of XML processing.

The XML processors other than SDoP make use of style files, which can be
overridden by local versions. There is one that applies to all styles, called
MyStyle.xsl, and others for the different output formats. I have included
comments in these style files to explain what changes I have made. Some of the
changes are quite significant.


XSL INCLUDES

References to XSL paths should use the public URLs, such as:

  http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl

If this fails to work for you, then there is a problem with your system
catalogs. As a workaround, you can adjust the OS-Fixups script and then run:

  $ make os-fixup

As an example of how this should normally work, on a FreeBSD system the
resolution goes to /usr/local/share/xml/catalog, which contains a directive:

  <nextCatalog catalog="/usr/local/share/xml/catalog.ports" />

to pull in the file automatically maintained by the Ports system. That file
will contain:

  <delegateSystem
      systemIdStartString="http://docbook.sourceforge.net/release/xsl/"
      catalog="file:///usr/local/share/xsl/docbook/catalog" />
  <delegateURI
      uriStartString="http://docbook.sourceforge.net/release/xsl/"
      catalog="file:///usr/local/share/xsl/docbook/catalog" />

and that catalog file contains:

  <rewriteSystem
      systemIdStartString="http://docbook.sourceforge.net/release/xsl/current"
      rewritePrefix="file:///usr/local/share/xsl/docbook" />
  <rewriteURI
      uriStartString="http://docbook.sourceforge.net/release/xsl/current"
      rewritePrefix="file:///usr/local/share/xsl/docbook" />

and the full path is thus eventually arrived at.

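The effect of the rewrite entries can be pictured with a small Python sketch.
It is a simplification under stated assumptions: it flattens the nextCatalog
and delegateURI steps into one rule list, then applies the rewrite entry with
the longest matching uriStartString, as the XML Catalogs rules require when
several rewrite entries match. resolve_uri is an illustrative name, not part
of any catalog tool.

```python
# (uriStartString, rewritePrefix) pairs, copied from the FreeBSD example above.
REWRITE_URI = [
    ("http://docbook.sourceforge.net/release/xsl/current",
     "file:///usr/local/share/xsl/docbook"),
]

def resolve_uri(uri: str, rules: list[tuple[str, str]] = REWRITE_URI) -> str:
    """Apply the longest matching rewriteURI rule; return uri unchanged if none match."""
    matches = [(start, prefix) for start, prefix in rules if uri.startswith(start)]
    if not matches:
        return uri
    start, prefix = max(matches, key=lambda rule: len(rule[0]))
    # Keep whatever follows the matched start string and graft it onto the prefix.
    return prefix + uri[len(start):]
```

With the rule above, the stylesheet URL from the start of this section maps to
file:///usr/local/share/xsl/docbook/xhtml/docbook.xsl. To see what a real
catalog returns, use one of the tools listed next.
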
See also the tools:

  xmlcatalog(1) from libxml2
  xmlcatmgr(1), a lightweight tool written for the NetBSD Packages system.


THE PRE-XML SCRIPT

The Pre-xml script copies a .xml file, making certain changes according to the
options it is given. The currently available options are as follows:

-ascii

  This option is used for ASCII output formats. It makes the following
  character replacements:

    &#x2019;  =>  '         apostrophe
    &copy;    =>  (c)       copyright
    &dagger;  =>  *         dagger
    &Dagger;  =>  **        double dagger
    &nbsp;    =>  a space   hard space
    &ndash;   =>  -         en dash

  The apostrophe is specified numerically because that is what xfpt generates
  from an ASCII single quote character. Non-ASCII characters that are not in
  this list should not be used without thinking about how they might be
  converted for the ASCII formats.

  In addition to the character replacements, this option causes quotes to be
  put round <literal> text items, and <quote> and </quote> to be replaced by
  ASCII quote marks. You would think the stylesheet would cope with the latter,
  but it seems to generate non-ASCII characters that w3m then turns into
  question marks.

-bookinfo

  This option causes the <bookinfo> element to be removed from the XML. It is
  used for the PostScript/PDF forms of the filter document, in order to avoid
  the generation of a full title page.

-fi

  Replace any occurrence of "fi" by the ligature &#xFB01; except when it is
  inside an XML element, or inside a <literal> part of the text.

  The use of ligatures would be nice for the PostScript and PDF formats. Sadly,
  it turns out that fop cannot at present handle the FB01 character correctly.
  Happily, this problem is now avoided when SDoP is used to generate PostScript
  (and thence PDF) because SDoP automatically uses an "fi" ligature for
  non-fixed-width fonts.

  The only xmlto format that handles FB01 is the HTML format, but when I used
  this in the test version, people complained that it made searching for words
  difficult. So this option is in practice not used at all.

-noindex

  Remove the XML that would generate a Concept Index and an Options Index. The
  source document has three types of index entry, for variables, options, and
  concept indexes. However, no index is required for the .txt and .texinfo
  outputs.

-oneindex

  Remove the XML that would generate separate variables, options, and concept
  indexes, and add XML to generate a single index. The only output processors
  that support multiple indexes are SDoP and the processor that produces the
  "formatted objects" from which fop makes PostScript and PDF output. The HTML
  processor ignores the XML settings for multiple indexes and just makes one
  unified index. Specifying three indexes gets you three copies of the same
  index, so this has to be changed.

-optbreak

  Look for items of the form <option>...</option> and <varname>...</varname> in
  ordinary paragraphs, and insert &#x200B; after each underscore in the
  enclosed text. The same is done for any word containing four or more upper
  case letters (compile-time options in the Exim specification). The character
  &#x200B; is a zero-width space. This means that the line may be split after
  one of these underscores, but no hyphen is inserted.


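Two of these options, -ascii and -optbreak, are essentially text rewriting and
can be approximated in a few lines. The following Python sketch is not the
Pre-xml script (which is Perl), and the function names are hypothetical. It
also makes assumptions the text above does not settle: -ascii uses double
quotes, placed inside the <literal> element; "four or more upper case letters"
means four capitals anywhere in a word of letters, digits, and underscores; and
the restriction of -optbreak to ordinary paragraphs is not modelled, although
tags and attributes are never changed.

```python
import re

# -ascii: the character replacements listed in the table above.
ASCII_REPLACEMENTS = {
    "&#x2019;": "'",    # apostrophe (what xfpt generates from an ASCII quote)
    "&copy;": "(c)",    # copyright
    "&dagger;": "*",    # dagger
    "&Dagger;": "**",   # double dagger
    "&nbsp;": " ",      # hard space
    "&ndash;": "-",     # en dash
}

def pre_xml_ascii(xml: str) -> str:
    """Hypothetical sketch of Pre-xml -ascii."""
    for entity, text in ASCII_REPLACEMENTS.items():
        xml = xml.replace(entity, text)
    # Assumption: double quotes, placed inside the <literal> element.
    xml = re.sub(r"<literal>(.*?)</literal>", r'<literal>"\1"</literal>',
                 xml, flags=re.S)
    # Assumption: <quote> and </quote> both become a plain double quote.
    return xml.replace("<quote>", '"').replace("</quote>", '"')

# -optbreak
ZWSP = "&#x200B;"                   # zero-width space: break point, no hyphen
TAG = re.compile(r"(<[^>]+>)")      # separates markup from character data
WORD = re.compile(r"[A-Za-z0-9_]+")

def _after_underscores(text: str) -> str:
    return text.replace("_", "_" + ZWSP)

def _macro_name(m: re.Match) -> str:
    word = m.group()
    # Assumption: four capitals anywhere in the word mark a compile-time option.
    if sum(ch.isupper() for ch in word) >= 4:
        return _after_underscores(word)
    return word

def pre_xml_optbreak(xml: str) -> str:
    """Hypothetical sketch of Pre-xml -optbreak (paragraph check omitted)."""
    out = []
    depth = 0  # > 0 while inside <option> or <varname>
    for piece in TAG.split(xml):
        if TAG.fullmatch(piece):
            name = (piece.strip("<>/").split() or [""])[0]
            if name in ("option", "varname"):
                depth += -1 if piece.startswith("</") else 1
            out.append(piece)  # markup is copied unchanged
        elif depth > 0:
            out.append(_after_underscores(piece))
        else:
            out.append(WORD.sub(_macro_name, piece))
    return "".join(out)
```

For example, <option>log_file_path</option> gains a zero-width space after
each underscore, and so does a capitalized name such as SUPPORT_TLS in running
text, while a word like MAX_x (three capitals) is left alone.
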
CREATING POSTSCRIPT AND PDF

These two output formats are created either by using my new SDoP program to
produce PostScript which can then be run through ps2pdf to make a PDF, or by
using xmlto and fop in the old way.


USING SDOP TO CREATE POSTSCRIPT AND PDF

PostScript output is created in two stages. First, the XML is pre-processed by
the Pre-xml script. For the filter document, the <bookinfo> element is removed
so that no title page is generated. For the main specification, the only change
is to insert line breakpoints via -optbreak.

The SDoP program is then used to create PostScript output directly from the XML
input. Then the ps2pdf command is used to generate a PDF from the PostScript.
There are no external stylesheets that are used by SDoP. Any variations to the
default format are specified inline using "processing instructions".


USING XMLTO AND FOP TO CREATE POSTSCRIPT AND PDF

This is the original way of creating PostScript and PDF output. The processing
happens in three stages, with an additional fourth stage for PDF. First, the
XML is pre-processed by the Pre-xml script. For the filter document, the
<bookinfo> element is removed so that no title page is generated. For the main
specification, the only change is to insert line breakpoints via -optbreak.

Second, the xmlto command is used to produce a "formatted objects" (.fo) file.
This process uses the following stylesheets:

  (1) Either MyStyle-filter-fo.xsl or MyStyle-spec-fo.xsl
  (2) MyStyle-fo.xsl
  (3) MyStyle.xsl
  (4) MyTitleStyle.xsl

The last of these is not used for the filter document, which does not have a
title page. The first three stylesheets were created manually, either by typing
directly, or by copying from the standard stylesheet and editing.

The final stylesheet has to be created from a template document, which is
called MyTitlepage.templates.xml. This was copied from the standard styles and
modified. The template is processed with xsltproc to produce the stylesheet.
All this apparatus is appallingly heavyweight. The processing is also very slow
in the case of the specification document. However, there should be no errors.

The reference book that saved my life while I was trying to get all this to
work is "DocBook XSL, The Complete Guide", third edition (2005), by Bob
Stayton, published by Sagehill Enterprises.

In the third part of the processing, the .fo file that is produced by the xmlto
command is processed by the fop command to generate either PostScript or PDF.
This is also very slow, and you get a whole slew of errors, of which these are
a sample:

  [ERROR] property - "background-position-horizontal" is not implemented yet.

  [ERROR] property - "background-position-vertical" is not implemented yet.

  [INFO] JAI support was not installed (read: not present at build time).
  Trying to use Jimi instead
  Error creating background image: Error creating FopImage object (Error
  creating FopImage object
  (http://docbook.sourceforge.net/release/images/draft.png) :
  org.apache.fop.image.JimiImage

  [WARNING] table-layout=auto is not supported, using fixed!

  [ERROR] Unknown enumerated value for property 'span': inherit

  [ERROR] Error in span property value 'inherit':
  org.apache.fop.fo.expr.PropertyException: No conversion defined

  [ERROR] Areas pending, text probably lost in lineinclude parts matched in the
  response by response_pattern by means of numeric variables such as

The last one is particularly meaningless gobbledegook. Some of the errors and
warnings are repeated many times. Nevertheless, it does eventually produce
usable output, though I have a number of issues with it (see a later section of
this document). Maybe one day there will be a new release of fop that does
better. In the meantime, I have written my own program for making PostScript
output -- see the previous section -- because the problems with xmlto/fop were
sufficiently annoying.

The PDF file that is produced by this process has one problem: the pages, as
shown by acroread in its thumbnail display, are numbered sequentially from one
to the end. Those numbers do not correspond with the page numbers of the body
of the document, which makes finding a page from the index awkward. There is a
facility in the PDF format to give pages appropriate "labels", but I cannot
find a way of persuading fop to generate these. Fortunately, it is possible to
fix up the PDF to add page labels. I wrote a script called PageLabelPDF which
does this. The labels are shown correctly by acroread and xpdf, but not by
GhostScript (gv).


THE PAGELABELPDF SCRIPT

This script reads the standard input and writes the standard output. It is used
to "tidy up" the PDF output that is produced by fop. It is not needed when
PDF output is generated from SDoP's output using ps2pdf.

The PageLabelPDF script searches for the PDF object that sets data in its
"Catalog", and adds appropriate information about page labels. The number of
front-matter pages (those before chapter 1) is hard-wired into this script as
12 because I could not find a way of determining it automatically. As the
current table of contents finishes near the top of the 11th page, there is
plenty of room for expansion, so it is unlikely to be a problem.

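In PDF, page labels live in a /PageLabels number tree in the document catalog,
mapping zero-based page indexes to numbering styles. The Python sketch below
shows the kind of entry involved; it is a guess at the shape, not
PageLabelPDF itself. It assumes lower-case roman numerals (/S /r) for the 12
front-matter pages, decimal numbers (/S /D) restarting at 1 for chapter 1, and
a catalog written with the exact text "/Type /Catalog". The function names are
hypothetical.

```python
FRONT_MATTER_PAGES = 12  # hard-wired, as described above

def page_labels_entry(front_pages: int = FRONT_MATTER_PAGES) -> bytes:
    # Page index 0 starts lower-case roman labels (i, ii, ...); page index
    # `front_pages` starts decimal labels, which default to beginning at 1.
    return (b"/PageLabels << /Nums [ 0 << /S /r >> "
            + str(front_pages).encode() + b" << /S /D >> ] >>")

def add_page_labels(pdf: bytes) -> tuple[bytes, int, int]:
    """Insert the entry into the catalog dictionary (assumed spelling).

    Returns the new file, the insertion offset, and the number of bytes added,
    which is what a later xref fix-up needs to know.
    """
    marker = b"/Type /Catalog"
    pos = pdf.index(marker) + len(marker)
    entry = b" " + page_labels_entry()
    return pdf[:pos] + entry + pdf[pos:], pos, len(entry)
```
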
Having added data to the PDF file, the script then finds the xref table at the
end of the file, and adjusts its entries to allow for the added text. This
simple processing seems to be enough to generate a new, valid, PDF file.

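A classic PDF xref table holds fixed-width entries giving the byte offset of
each object, and the startxref value gives the offset of the table itself. If
n bytes are inserted at offset p, every offset at or beyond p must grow by n.
A minimal Python sketch of that adjustment, assuming a single uncompressed xref
table and no incremental updates (adjust_xref is a hypothetical name, not the
real script):

```python
import re

# An in-use xref entry: 10-digit byte offset, 5-digit generation, keyword "n".
IN_USE_ENTRY = re.compile(rb"(\d{10}) (\d{5}) n")

def adjust_xref(pdf: bytes, pos: int, delta: int) -> bytes:
    """Shift xref offsets after `delta` bytes were inserted at offset `pos`."""
    sx = pdf.rindex(b"startxref")
    table = pdf.rindex(b"xref", 0, sx)  # the table header, not "startxref"

    def shift(m: re.Match) -> bytes:
        offset = int(m.group(1))
        if offset >= pos:  # this object starts after the inserted text
            offset += delta
        return b"%010d %b n" % (offset, m.group(2))

    xref_part = IN_USE_ENTRY.sub(shift, pdf[table:sx])  # free ("f") entries kept

    tail = pdf[sx:]
    m = re.match(rb"startxref\s+(\d+)", tail)
    start = int(m.group(1))
    if start >= pos:  # the table itself has moved too
        start += delta
    tail = tail[:m.start(1)] + str(start).encode() + tail[m.end(1):]
    return pdf[:table] + xref_part + tail
```

Keeping every entry at exactly ten digits preserves the fixed 20-byte entry
size that PDF readers rely on.
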

CREATING HTML

Only two stages are needed to produce HTML, but the main specification is
subsequently postprocessed. The Pre-xml script is called with the -optbreak and
-oneindex options to preprocess the XML. Then the xmlto command creates the
HTML output directly. For the specification document, a directory of files is
created, whereas the filter document is output as a single HTML page. The
following stylesheets are used:

  (1) Either MyStyle-chunk-html.xsl or MyStyle-nochunk-html.xsl
  (2) MyStyle-html.xsl
  (3) MyStyle.xsl

The first stylesheet references the chunking or non-chunking standard DocBook
stylesheet, as appropriate.

You may see a number of these errors when creating HTML: "Revisionflag on
unexpected element: literallayout (Assuming block)". They seem to be harmless;
the output appears to be what is intended.

The original HTML that I produced from the SGCAL input had hyperlinks back from
chapter and section titles to the table of contents. These links are not
generated by xmlto. One of the testers pointed out that the lack of these
links, or simple self-referencing links for titles, makes it harder to copy a
link name into, for example, a mailing list response.

I could not find where to fiddle with the stylesheets to make such a change, if
indeed the stylesheets are capable of it. Instead, I wrote a Perl script called
TidyHTML-spec to do the job for the specification document. It updates the
index.html file (which contains the table of contents), setting up anchors,
and then updates all the chapter files to insert appropriate links.

The index.html file as built by xmlto contains the whole table of contents in a
single line, which makes it hard to debug by hand. Since I was postprocessing
it anyway, I arranged to insert newlines after every '>' character.

The TidyHTML-spec script also processes every HTML file, to tidy up some of the
untidy features therein. It turns <div class="literallayout"><p> into
<div class="literallayout"> and a matching </p></div> into </div> to get rid of
unwanted vertical white space in literallayout blocks. Before each occurrence
of </td> it inserts &nbsp; so that the table's cell is a little bit wider than
the text itself.

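The textual fix-ups described in the last two paragraphs are simple
substitutions. A Python sketch, not the Perl TidyHTML-spec script itself; the
function names are invented, and pairing each literallayout <p> with the
nearest following </p></div> is an assumption about the generated HTML:

```python
import re

def split_after_gt(html: str) -> str:
    """Put a newline after every '>' so one-line files can be read by hand."""
    return html.replace(">", ">\n")

def tidy_literallayout(html: str) -> str:
    """Drop the <p> wrapper inside literallayout divs and pad table cells."""
    html = re.sub(
        r'<div class="literallayout"><p>(.*?)</p></div>',
        r'<div class="literallayout">\1</div>',
        html,
        flags=re.S,  # literallayout content spans several lines
    )
    # A hard space before each </td> makes the cell slightly wider than its text.
    return html.replace("</td>", "&nbsp;</td>")
```
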
The TidyHTML-spec script also takes the opportunity to postprocess the
spec_html/ix01.html file, which contains the document index. Again, the index
is generated as one single line, so it splits it up. Then it creates a list of
letters at the top of the index and hyperlinks them in both directions with the
different letter portions of the index.

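The two-way letter links can be sketched as follows. This Python version
assumes that each letter section of ix01.html starts with a heading of the form
<h3>A</h3>, which may not match the real xmlto output exactly; the function name
and the ids "letter-bar" and "letter-X" are made up for the example.

```python
import re

# Assumed shape of a letter heading in the generated index page.
LETTER_HEADING = re.compile(r"<h3>([A-Za-z])</h3>")

def link_index_letters(html: str) -> str:
    """Add a bar of letters at the top and link it both ways with the sections."""
    letters = LETTER_HEADING.findall(html)

    def anchored(m: re.Match) -> str:
        letter = m.group(1)
        # Each heading gets an id (target of the bar) and links back up to the bar.
        return f'<h3 id="letter-{letter}"><a href="#letter-bar">{letter}</a></h3>'

    body = LETTER_HEADING.sub(anchored, html)
    bar = " ".join(f'<a href="#letter-{c}">{c}</a>' for c in letters)
    return f'<p id="letter-bar">{bar}</p>\n' + body
```
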
People wanted similar postprocessing for the filter.html file, so that is now
done using a similar script called TidyHTML-filter. It was easier to use a
separate script because filter.html is a single file rather than a directory,
so the logic is somewhat different.


CREATING TEXT FILES

This happens in four stages. The Pre-xml script is called with the -ascii,
-optbreak, and -noindex options to convert the input to ASCII characters,
insert line break points, and disable the production of an index. Then the
xmlto command converts the XML to a single HTML document, using these
stylesheets:

  (1) MyStyle-txt-html.xsl
  (2) MyStyle-html.xsl
  (3) MyStyle.xsl

The MyStyle-txt-html.xsl stylesheet is the same as MyStyle-nochunk-html.xsl,
except that it contains an additional item to ensure that a generated
"copyright" symbol is output as "(c)" rather than the Unicode character. This
is necessary because the stylesheet itself generates a copyright symbol as part
of the document title; the character is not in the original input.

The w3m command is used with the -dump option to turn the HTML file into ASCII
text, but this contains multiple sequences of blank lines that make it look
awkward. Furthermore, chapter and section titles do not stand out very well. A
local Perl script called Tidytxt is used to post-process the output. First, it
converts sequences of blank lines into a single blank line. Then it searches
for chapter and section headings. Each chapter heading is uppercased, and
preceded by an extra two blank lines and a line of equals characters. An extra
newline is inserted before each section heading, and they are underlined with
hyphens.

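A minimal Python sketch of the Tidytxt steps just described. How the real
script recognizes headings is not stated here, so the CHAPTER and SECTION
patterns (a line starting "Chapter N", and a line starting "N.M") are
assumptions, as is making each rule line as long as its heading. tidytxt is a
hypothetical name.

```python
import re

# Assumed heading shapes in the w3m output; the real Tidytxt may differ.
CHAPTER = re.compile(r"^Chapter \d+\b")
SECTION = re.compile(r"^\d+\.\d+\.? ")

def tidytxt(text: str) -> str:
    out: list[str] = []
    for line in text.split("\n"):
        if not line.strip():
            line = ""
            if out and out[-1] == "":
                continue  # collapse a run of blank lines into one
        if CHAPTER.match(line):
            heading = line.upper()
            # Two extra blank lines and a rule of '=' before each chapter heading.
            out += ["", "", "=" * len(heading), heading]
        elif SECTION.match(line):
            # An extra blank line before, and a '-' underline after, each section.
            out += ["", line, "-" * len(line)]
        else:
            out.append(line)
    return "\n".join(out)
```
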
The output of xmlto also contains non-ASCII Unicode characters that w3m passes
through. Fortunately, they are few, and Tidytxt cleans them up as well. Some
headings use "box drawing" characters in the range U+2500 to U+253F, which are
translated into -+| as appropriate, and U+00A0 (hard space) and U+25CF (bullet)
are translated into plain spaces and asterisks. (It might be possible to do all
this in the same way as I dealt with copyright - see above - but adding a few
lines of Perl to an existing script was a lot easier.)

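The character clean-up amounts to a translation table. In this Python sketch
(not the Perl code; the names are hypothetical), which code points count as
horizontal or vertical lines follows the Unicode Box Drawing block, where
U+2500 to U+250B are plain and dashed lines; every other code point up to
U+253F is a corner, tee, or cross and becomes "+".

```python
# Line-drawing code points in U+2500..U+250B (solid, triple- and quadruple-dash).
HORIZONTAL = {0x2500, 0x2501, 0x2504, 0x2505, 0x2508, 0x2509}
VERTICAL = {0x2502, 0x2503, 0x2506, 0x2507, 0x250A, 0x250B}

def _build_table() -> dict[int, str]:
    table = {0x00A0: " ", 0x25CF: "*"}  # hard space, bullet
    for cp in range(0x2500, 0x2540):    # U+2500 to U+253F inclusive
        if cp in HORIZONTAL:
            table[cp] = "-"
        elif cp in VERTICAL:
            table[cp] = "|"
        else:
            table[cp] = "+"              # corners, tees and crosses
    return table

ASCII_TABLE = _build_table()

def ascii_fixups(text: str) -> str:
    return text.translate(ASCII_TABLE)
```
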

CREATING INFO FILES

This process starts with the same Pre-xml call as for text files. Non-ASCII
characters in the source are transliterated, and the <index> elements are
removed. The docbook2texi script is then called to convert the XML file into a
Texinfo file. However, this is not quite enough. The converted file ends up
with "conceptindex" and "optionindex" items, which are not recognized by the
makeinfo command. These have to be changed to "cindex" and "findex"
respectively in the final .texinfo file. Furthermore, the main menu lacks a
pointer to the index, and indeed the index node itself is missing. These
problems are fixed by running the file through a script called TidyInfo.
Finally, a call of makeinfo creates a .info file.

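A rough Python sketch of the TidyInfo repairs. It assumes that the converted
file spells the commands @conceptindex and @optionindex, that the first @menu
is the main menu, and that printing the concept index (cp) is what is wanted;
none of these details is confirmed by the text above, and tidy_info is a
made-up name. @cindex, @findex, @node, @unnumbered and @printindex are standard
Texinfo commands.

```python
import re

def tidy_info(texi: str) -> str:
    # makeinfo does not know these command names; use the standard ones.
    texi = re.sub(r"@conceptindex\b", "@cindex", texi)
    texi = re.sub(r"@optionindex\b", "@findex", texi)
    # Add a pointer to the index at the end of the first (main) menu.
    texi = texi.replace("@end menu", "* Index::\n@end menu", 1)
    # Add the missing index node just before the end of the file.
    index_node = "@node Index\n@unnumbered Index\n\n@printindex cp\n\n"
    return texi.replace("@bye", index_node + "@bye", 1)
```
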
There is one apparently unconfigurable feature of docbook2texi: it does not
seem possible to give it a file name for its output. It chooses a name based on
the title of the document. Thus, the main specification ends up in a file
called the_exim_mta.texi and the filter document in exim_filtering.texi. These
files are removed after their contents have been copied and modified by the
TidyInfo script, which writes to a .texinfo file.


CREATING THE MAN PAGE

I wrote a Perl script called x2man to create the exim.8 man page from the
DocBook XML source. I deliberately did NOT start from the xfpt source,
because it is the DocBook source that is the "standard". This comment line in
the DocBook source marks the start of the command line options:

  <!-- === Start of command line options === -->

A similar line marks the end. If at some time in the future a way other
than xfpt is used to maintain the DocBook source, it needs to be capable of
maintaining these comments.

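Finding the options list between the two marker comments is a plain substring
search. In this Python sketch (not x2man itself, and with a hypothetical
function name) the end marker's wording is a guess modelled on the start
marker, because the text above only says that a similar line is used:

```python
START = "<!-- === Start of command line options === -->"
END = "<!-- === End of command line options === -->"  # assumed wording

def extract_options_section(xml: str) -> str:
    """Return the XML between the markers; raises ValueError if one is missing."""
    start = xml.index(START) + len(START)
    end = xml.index(END, start)  # search only after the start marker
    return xml[start:end]
```

A missing marker raises an exception rather than silently producing an empty
man page, which is the failure mode that matters if the DocBook source is ever
maintained by some other tool.
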

UNRESOLVED PROBLEMS

There are a number of unresolved problems with producing the Exim documentation
in the manner described above. I will describe them here in the hope that in
future some way round them can be found. Some of the problems are solved by
using SDoP instead of xmlto/fop to produce PostScript and PDF output.

(1) When a whole chain of tools is processing a file, an error somewhere
    in the middle is often very hard to debug. For instance, an error in the
    xfpt file might not show up until an XML processor throws a wobbly because
    the generated XML is bad. You have to be able to read XML and figure out
    what generated what. One of the reasons for creating the "test" series of
    targets was to help in checking out these kinds of problem.

600(2) There is a mechanism in XML for marking parts of the document as
f89d2485
PH
601 "revised", and I have arranged for xfpt markup to use it. However, the
602 only xmlto output format that pays attention to this is the HTML output,
603 which sets a green background. If xmlto/fop is used to generate PostScript
604 and PDF, there are no revision marks (change bars). This problem
605 is not present when SDoP is used. However, the text and Texinfo output
606 format lack revision indications.
168e428f
PH
607
608(3) The index entries in the HTML format take you to the top of the section
609 that is referenced, instead of to the point in the section where the index
610 marker was set.
611
(4) The HTML output supports only a single index, so the variable, option,
    and concept index entries have to be merged.

(5) The index for the PostScript/PDF output created by xmlto/fop does not
    merge identical page numbers, which makes some entries look ugly. This is
    not a problem when SDoP is used.

(6) The HTML index and the PostScript/PDF indexes, when made with xmlto/fop,
    make no use of textual markup; the text is all roman, without any italic
    or boldface. For PostScript/PDF, this is not a problem when SDoP is used.

(7) I turned off hyphenation in the PostScript/PDF output produced by
    xmlto/fop, because it was being done so badly. Needless to say, I made
    SDoP do a better job. These comments apply to xmlto/fop:

    (a) It seems to force hyphenation if it is at all possible, without
        regard to the "tightness" or "looseness" of the line. Decent
        formatting software should attempt hyphenation only if the line is
        over some "looseness" threshold; otherwise you get far too many
        hyphenations, often for several lines in succession.

    (b) It uses an algorithmic form of hyphenation that doesn't always produce
        acceptable word breaks. (I prefer to use a hyphenation dictionary,
        which is what SDoP does.)

(8) The PostScript/PDF output produced by xmlto/fop is badly paginated:

    (a) There seems to be no attempt to avoid "widow" and "orphan" lines on
        pages. A "widow" is the last line of a paragraph at the top of a page,
        and an "orphan" is the first line of a paragraph at the bottom of a
        page.

    (b) There seems to be no attempt to prevent section headings from being
        placed last on a page, with no following text on the page.

    Neither of these problems occurs when SDoP is used to produce the
    PostScript/PDF output.

(9) The fop processor does not support "fi" ligatures, not even if you put the
    appropriate Unicode character into the source by hand. Again, this is not
    a problem if SDoP is used.

(10) There are no diagrams in the new documentation. This is something I hope
     to work on. The previously used Aspic command for creating line art from a
     textual description can output Encapsulated PostScript or Scalable Vector
     Graphics, which are two standard diagram representations. Aspic could be
     formally released and used to generate output that could be included in at
     least some of the output formats.

(11) The use of a "zero-width space" works well as a way of specifying that
     Exim option names can be split, without hyphens, over line breaks.

     However, when xmlto/fop is being used and an option is not split, if the
     line is very "loose", the zero-width space is expanded, along with other
     spaces. This is a totally crazy thing to do, but unfortunately it is
     suggested by the Unicode definition of the zero-width space, which says
     "its presence between two characters does not prevent increased letter
     spacing in justification". It seems that the implementors of fop have
     understood "letter spacing" also to include "word spacing". Sigh.

     This problem does not arise when SDoP is used.
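
The distinction that problem (11) turns on can be sketched in a few lines of
Python. add_break_points() is a hypothetical helper that puts a zero-width space
(U+200B) after each underscore in an option name; choosing underscores as the
break points is an assumption for illustration, not a description of the actual
xfpt markup. distribute_extra_space() then shares out the width needed to
justify a line over the ordinary spaces only, so a zero-width space that is not
used as a break stays at zero width.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_break_points(option_name: str) -> str:
    """Hypothetical helper: permit a line break after each underscore.

    The zero-width space marks a permitted break point; no hyphen appears
    if the line is actually broken there.
    """
    return option_name.replace("_", "_" + ZWSP)


def distribute_extra_space(line: str, extra: int) -> dict[int, int]:
    """Spread `extra` units of width over the ordinary spaces in `line`.

    Returns a mapping from character index to added width. Zero-width
    spaces are never widened, so an unbroken option name stays intact.
    """
    gaps = [i for i, ch in enumerate(line) if ch == " "]
    if not gaps:
        return {}
    base, remainder = divmod(extra, len(gaps))
    # The first `remainder` gaps each take one extra unit.
    return {i: base + (1 if n < remainder else 0) for n, i in enumerate(gaps)}


line = "set " + add_break_points("acl_smtp_rcpt") + " here"
print(distribute_extra_space(line, 5))  # {3: 3, 19: 2}
```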

The consequence of (7), (8), and (9) is that the PostScript/PDF output as
produced by xmlto/fop looks as if it comes from some of the very early attempts
at text formatting of around 20 years ago. We can only hope that 20 years'
progress is not going to get lost, and that things will improve in this area.
My small contribution to this has been to write SDoP, which, though simple and
"non-standard", does get some of these formatting issues right.
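
Problem (5) is easy to state precisely. The Python sketch below is a
hypothetical illustration of what merging identical page numbers means for a
single index entry, not code from SDoP or the stylesheets: repeated page
references are dropped and the remaining ones are listed in ascending order.
The entry and page numbers in the example are invented.

```python
def merge_page_numbers(pages: list[int]) -> list[int]:
    """Drop repeated page numbers and return the rest in ascending order."""
    return sorted(set(pages))


def format_index_entry(term: str, pages: list[int]) -> str:
    """Render an index entry as 'term, p1, p2, ...' with no repeats."""
    return term + ", " + ", ".join(str(p) for p in merge_page_numbers(pages))


# An option indexed three times on page 101 and once on page 87.
print(format_index_entry("hosts_require_tls", [101, 87, 101, 101]))
# hosts_require_tls, 87, 101
```

Collapsing runs of consecutive pages into ranges (87-89, say) would be a
further step and is not attempted here.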


LIST OF FILES

Markup.txt                 Describes the xfpt markup that is used
HowItWorks.txt             This document
Makefile                   The makefile
MyStyle-chunk-html.xsl     Stylesheet for chunked HTML output
MyStyle-filter-fo.xsl      Stylesheet for filter fo output
MyStyle-fo.xsl             Stylesheet for any fo output
MyStyle-html.xsl           Stylesheet for any HTML output
MyStyle-nochunk-html.xsl   Stylesheet for non-chunked HTML output
MyStyle-spec-fo.xsl        Stylesheet for spec fo output
MyStyle-txt-html.xsl       Stylesheet for HTML=>text output
MyStyle.xsl                Stylesheet for all output
MyTitleStyle.xsl           Stylesheet for spec title page
MyTitlepage.templates.xml  Template for creating MyTitleStyle.xsl
Myhtml.css                 Experimental css stylesheet for HTML output
PageLabelPDF               Script to postprocess xmlto/fop PDF output
Pre-xml                    Script to preprocess XML
TidyHTML-filter            Script to tidy up the filter HTML output
TidyHTML-spec              Script to tidy up the spec HTML output
TidyInfo                   Script to sort index problems in Texinfo output
Tidytxt                    Script to compact multiple blank lines
filter.xfpt                xfpt source of the filter document
spec.xfpt                  xfpt source of the specification document
x2man                      Script to make the Exim man page from the XML
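
Tidytxt is described above only as a script that compacts multiple blank
lines. A minimal Python equivalent might look like the sketch below; it is not
the real script, and it assumes that "compact" means reducing each run of blank
lines to a single one, with lines containing only white space counted as blank.

```python
def compact_blank_lines(text: str) -> str:
    """Collapse each run of blank (or whitespace-only) lines to one empty line."""
    out: list[str] = []
    previous_blank = False
    for line in text.splitlines():
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # drop the second and later blank lines of a run
        out.append("" if blank else line)
        previous_blank = blank
    return "\n".join(out) + ("\n" if out else "")


print(repr(compact_blank_lines("a\n\n\n\nb\n")))  # 'a\n\nb\n'
```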


(Originally, and for the most part: Philip Hazel)
The Exim Maintainers
Last updated: 5 July 2010