XProc 3.0 Programmer Reference
Ebook, 573 pages, 4 hours


About this ebook

XProc 3.0 is a programming language for processing XML, JSON, and other documents in pipelines. XProc chains conversions and other steps, allowing for potentially complex processing. XProc is especially useful for applications, such as publishing, where content may come from multiple input sources, pass through multiple processing steps and result in multiple output streams.

XProc 3.0 Programmer Reference is aimed at programmers and others who process XML. It explains the language in detail, provides examples, and contains a set of example use cases. Anyone who uses the XProc language will find a wealth of information in this book.

Language: English
Publisher: XML Press
Release date: Mar 16, 2020
ISBN: 9781937434717
Author

Erik Siegel

Erik Siegel runs Xatapult, a consultancy that offers coaching, training, applications, and more to the publishing world.

    Book preview

    XProc 3.0 Programmer Reference - Erik Siegel

    XProc 3.0 Programmer Reference

    Table of Contents

    Preface

    Who is this book for?

    How to use this book

    The XProc logo

    Conventions used

    File extension

    Explaining XML structures

    Special data types

    Using and finding code examples

    Additional resources

    XProc and underlying standards

    Other information

    Contact information

    Acknowledgements

    1. Introduction

    1.1. What is XProc?

    1.2. A first example

    1.3. The language specification, step library, and this book

    1.4. A little history

    1.5. Who is using XProc and for what?

    1.6. Alternatives

    2. Getting started with XProc

    2.1. Installing and using an XProc processor

    2.1.1. XML Calabash

    2.1.2. MorganaXProc

    2.1.3. Hello XProc world

    2.1.4. Integration in oXygen

    2.2. XProc 101

    2.2.1. Converting to HTML

    2.2.1.1. Basic conversion

    2.2.2. Adding additional header information

    2.2.2.1. Using an option instead of a variable

    2.2.2.2. Loading from another input port

    2.2.2.3. Supplying defaults for input ports

    2.2.2.4. Indenting the result XML

    2.2.3. Inserting additional data

    2.2.4. Generic data conversion to HTML

    3. XProc fundamentals

    3.1. Pipelines and steps

    3.2. Documents

    3.2.1. Representations and properties

    3.2.2. Document types

    3.2.2.1. XML Documents

    3.2.2.2. HTML documents

    3.2.2.3. JSON documents

    3.2.2.4. Text documents

    3.2.2.5. Other documents

    3.3. Steps

    3.3.1. Properties of a step

    3.3.1.1. Type

    3.3.1.2. Ports

    3.3.1.3. Options

    3.3.2. Kinds of steps

    3.4. Ports

    3.4.1. Properties of ports

    3.4.2. Connecting or binding ports

    3.4.2.1. Implicit bindings

    3.4.2.2. Explicit binding

    3.4.2.3. The Default Readable Port

    4. Programming concepts

    4.1. XPath in XProc

    4.1.1. Extension functions

    4.2. XPath expressions and step options

    4.3. QName magic

    4.4. Map usage

    4.5. Document details

    4.5.1. Document properties

    4.5.1.1. content-type

    4.5.1.2. base-uri

    4.5.1.3. serialization

    4.5.2. JSON documents

    4.5.3. Documents and step results

    4.6. Attribute and Text Value Templates

    4.6.1. Attribute Value Templates in the XProc code

    4.6.2. Value Templates and the Default Readable Port

    4.6.3. Value Template expansion in inline content

    4.6.3.1. Turning Value Templates on or off for inline content

    4.6.3.2. TVT expansion in inline content

    4.7. Common attributes

    4.7.1. The [p:] notation for attributes

    4.7.2. The [p:]expand-text attribute

    4.7.3. The [p:]use-when attribute

    4.7.4. The xml:base attribute

    4.7.5. The xml:id attribute

    4.8. Documentation and annotations

    4.8.1. Documentation

    4.8.2. Annotations

    5. Defining steps: Prolog

    5.1. Declaring a step:

    5.1.1. Declaring an atomic step

    5.2. Declaring ports

    5.2.1. Declaring input ports:

    5.2.2. Declaring output ports:

    5.2.3. Specifying connections for input and output port declarations

    5.2.4. Specifying serialization

    5.2.5. Specifying content types

    5.3. Declaring options:

    5.3.1. Specifying an option’s acceptable values

    5.3.2. Static options

    5.3.3. Visibility of options

    5.4. Importing steps and step libraries:

    5.5. Importing function libraries:

    5.6. Building step libraries:

    6. Populating steps

    6.1. A word upfront: understandable steps

    6.2. Anatomy of a step’s body

    6.3. Invoking steps

    6.3.1. Default step names

    6.3.2. Step dependencies: [p:]depends attribute

    6.4. Connecting input ports:

    6.5. Setting options: (or by attribute)

    6.6. Declaring variables:

    6.6.1. Addressing multiple documents

    6.6.2. Visibility of variables

    6.7. Common connection constructs

    6.7.1. Specifying nothing:

    6.7.2. Reading external documents:

    6.7.3. Explicitly connecting to another port:

    6.7.4. Specifying documents inline:

    6.7.4.1. Excluding namespace prefixes

    7. Core steps

    7.1. Looping:

    7.2. Acting on a part of a document:

    7.3. Making decisions: and

    7.3.1. Multiple decisions:

    7.3.1.1. p:when

    7.3.1.2. p:otherwise

    7.3.2. Single decision:

    7.4. Grouping:

    7.5. Error handling:

    7.5.1. p:catch

    7.5.2. p:finally

    7.5.3. Error specification

    7.6. Output ports of subpipelines

    7.6.1. The subpipeline’s anonymous output port

    7.6.2. Explicitly defining output ports in subpipelines

    8. Built-in steps

    8.1. Classification of built-in steps

    8.2. The standard step library

    8.3. Some commonly used steps

    8.3.1. Doing nothing:

    8.3.2. XSLT Transformation:

    8.3.3. Adding attributes: and

    8.3.4. Deleting stuff:

    8.3.5. Inserting stuff:

    8.3.6. Wrapping a sequence:

    8.3.7. Resolving XIncludes:

    8.3.8. Storing documents:

    8.3.9. Changing a document’s type:

    9. Extension functions

    9.1. Environment information related functions

    9.2. Iteration related functions

    9.3. Document properties related functions

    9.4. Other functions

    10. XProc examples and recipes

    10.1. Using to clarify pipeline structure

    10.2. Working with document-properties

    10.3. Working with JSON documents

    10.4. Working with text documents

    10.5. Working with other documents

    10.6. Working with zip archives

    10.6.1. Inspect a zip archive

    10.6.2. Extract and change a file from a zip archive

    10.6.3. Creating a new zip archive

    10.7. Debugging hints

    10.7.1. Follow progress using messages

    10.7.2. See intermediate results using

    10.7.3. Turn code on/off with the [p:]use-when attribute and static options

    A. Standard step library

    A.1. p:add-attribute

    A.2. p:add-xml-base

    A.3. p:archive

    A.3.1. The archive manifest

    A.3.2. Handling of ZIP archives

    A.4. p:archive-manifest

    A.5. p:cast-content-type

    A.5.1. Casting from an XML media type

    A.5.2. Casting from an HTML media type

    A.5.3. Casting from a JSON media type

    A.5.4. Casting from a text media type

    A.5.5. Casting from any other media type

    A.6. p:compare

    A.7. p:compress

    A.8. p:count

    A.9. p:delete

    A.10. p:error

    A.11. p:filter

    A.12. p:hash

    A.13. p:http-request

    A.13.1. Construction of a multipart request

    A.13.2. Managing a multipart response

    A.14. p:identity

    A.15. p:in-scope-names

    A.16. p:insert

    A.17. p:json-join

    A.18. p:json-merge

    A.19. p:label-elements

    A.20. p:load

    A.20.1. Loading XML data

    A.20.2. Loading text data

    A.20.3. Loading JSON data

    A.20.4. Loading HTML data

    A.20.5. Loading binary data

    A.21. p:make-absolute-uris

    A.22. p:namespace-delete

    A.23. p:namespace-rename

    A.24. p:pack

    A.25. p:parameters

    A.25.1. The c:param element

    A.25.2. The c:param-set element

    A.26. p:rename

    A.27. p:replace

    A.28. p:set-attributes

    A.29. p:set-properties

    A.30. p:sink

    A.31. p:split-sequence

    A.32. p:store

    A.33. p:string-replace

    A.34. p:text-count

    A.35. p:text-head

    A.36. p:text-join

    A.37. p:text-replace

    A.38. p:text-sort

    A.39. p:text-tail

    A.40. p:unarchive

    A.41. p:uncompress

    A.42. p:unwrap

    A.43. p:uuid

    A.44. p:wrap

    A.45. p:wrap-sequence

    A.46. p:www-form-urldecode

    A.47. p:www-form-urlencode

    A.48. p:xinclude

    A.49. p:xquery

    A.49.1. Example

    A.50. p:xslt

    A.50.1. Invoking an XSLT 3.0 stylesheet

    A.50.2. Invoking an XSLT 2.0 stylesheet

    A.50.3. Invoking an XSLT 1.0 stylesheet

    B. Optional built-in steps overview

    B.1. Dynamic pipeline execution

    B.1.1. p:run

    B.2. File steps

    B.2.1. p:directory-list

    B.2.1.1. Directory list details

    B.2.2. p:file-copy

    B.2.3. p:file-create-tempfile

    B.2.4. p:file-delete

    B.2.5. p:file-info

    B.2.6. p:file-mkdir

    B.2.7. p:file-move

    B.2.8. p:file-touch

    B.3. Operating system steps

    B.3.1. p:os-exec

    B.3.2. p:os-info

    B.4. Mail steps

    B.4.1. p:send-mail

    B.5. Paged media steps

    B.5.1. p:css-formatter

    B.5.2. p:xsl-formatter

    B.6. Text steps

    B.6.1. p:markdown-to-html

    B.7. Validation steps

    B.7.1. Validate with NVDL

    B.7.2. Validate with RELAX NG

    B.7.3. Validate with Schematron

    B.7.4. Validate with XML Schema

    C. Namespaces

    D. Copyright credits

    Step Index

    E. Copyright and Legal Notices

    XProc 3.0

    Programmer Reference

    Erik Siegel

    Preface

    Welcome to this book about XProc 3.0, a programming language for processing XML (and other) documents in pipelines. XProc chains conversions and other steps together to achieve potentially complex results.

    XProc is currently not a well-known or a widely used programming language. Most systems that process XML are relatively simple. A single XSLT or even some DOM manipulation in some programming language often suffices. There is no need to bother with yet another processor and programming language.

    However, there are areas where XML processing gets complicated. One example is publishing, where XML is widely used to mark up content. Most publishers struggle with consolidating input from various sources (Word, InDesign, PDF, XML, HTML, etc.) into a central XML format such as DITA or DocBook. From this XML they create their output: books, magazines, newspapers, websites, etc. Both sides of this process—​to and from the central XML format—​involve heavy XML processing, and this is where XProc can play a starring role and sometimes even make the difference between impracticable and feasible.

    But people love what they know, and unknown means unloved. The syntax of XProc version 1.0 also didn’t help: it put off a lot of people, even seasoned XML programmers who were used to other pipeline tools such as Cocoon. In addition, introductory material for learning the language was hard to find.

    XProc 3.0 was designed with these shortcomings in mind. The syntax was simplified and new features added. These language advancements, backed up by this programmer’s manual, will hopefully make it a lot easier for people to learn and use XProc.

    Heavy XML processing is a niche in the programming world, so the XProc language will probably never become really mainstream. However, I hope that our efforts on the specification, together with the availability of this book, will make it more likely that people will use XProc.

    Who is this book for?

    XProc is a programming language originating from the XML world. So, no surprise, this book is aimed at programmers and others who do XML processing and use, or are considering using, XProc. It explains the language in detail, provides examples, and contains a selected set of example use cases (Chapter 10).

    Readers should have some background knowledge of XML and XML processing. Knowing, even superficially, languages like XSLT, XQuery, or XPath will certainly help you understand XProc better. This book does not set out to teach these other languages; it concentrates on XProc only. Consult the section called Additional resources if you want to know more about the underlying technologies, tools, and standards.

    How to use this book

    There are several ways you can use this book, including the following:

    If you want to learn it all, read it all. Skim the appendices and use them for reference.

    If you want an introduction, a basic understanding of what XProc is, what it can be used for, and how it does its magic, read Chapter 1 and Chapter 3.

    Use it as a reference guide as you use XProc.

    Find inspiration in the use cases in Chapter 10.

    From my own experience in learning a new programming language, I know it helps to read, or at least glance at, the entire book. Simply find out what’s there so you know what the capabilities are, even if you don’t initially dig into all of the details.


    A personal sorry tale: I did some foolish and heavy programming to get loop counters in loops before finding out that XProc has extension functions (Chapter 9). These would have done the job in a few keystrokes. Even just having known in the back of my head that these kinds of functions were there would have made me look them up. Functions for iterations are likely candidates for inclusion in a language, aren’t they? And yes, there it is: p:iteration-position()…


    Here is an overview of the structure of the book:

    Chapter 1 introduces XProc and provides a little (non-technical) background information.

    Chapter 2 helps you get started using XProc. It tells you how to install and run the appropriate software. Examples range from the proverbial Hello world! to a 101 crash course.

    Chapter 3 provides background information about XProc and how it sees the world without delving into the syntax just yet: what is a pipeline and a document, what are their main properties, how do you connect steps, etc.

    Chapter 4 explains the various programming concepts in XProc: the use of XPath, maps, Attribute and Text Value Templates (AVT/TVT), etc.

    Chapters 5–7 are key in creating XProc pipelines. They explain the full language:

    Chapter 5 explains how to declare a step: what does a pipeline document look like and how do you declare input ports, output ports, and options. It’s about the prolog of a pipeline.

    Chapter 6 tells you how to populate your pipelines with functionality: chaining steps, using variables, etc. It’s about the body of a pipeline.

    Chapter 7 handles the core steps in XProc, including looping, branching, etc.

    Chapter 8 lists the built-in steps that populate pipelines. There are many built-in steps. This chapter doesn’t handle them in detail; it mainly provides an overview.

    Chapter 9 lists the XProc extension functions that are available for use in the XPath expressions in your pipeline. Among many others it contains functions for iteration information.

    Chapter 10 provides some examples of XProc pipelines for specific use cases.

    There are several appendices with reference information and a Step Index.

    The XProc logo

    To make absolutely sure XProc will be taken seriously in the XML standards world, it has a logo (designed by Bethan Tovey-Walsh):

    Figure 1. The Kanava XProc logo

    The Kanava XProc logo

    We even gave it a name: Kanava. This not only sounds funny and is easy to pronounce, but also appears to be Finnish for pipeline!

    Conventions used

    File extension

    This book and its code examples use the preferred file extension for XProc documents: .xpl.

    Explaining XML structures

    This book uses a specific notation to explain XML structures. Although this notation looks like XML, it is not well-formed! So don’t copy/paste any of these examples directly into your code and expect them to work.

    Figure 2 shows an XML structure for a fictitious element.

    Figure 2. Sample XML structure for a fictitious element

                  attribute-2-required = (type)
                  attribute-3-fixed-values? = value-1 | value-2 | value-3
                  attribute-4-avt? = { (type) } >
      ?
      *
      +
      ( | )*

    Each example is followed by a table that describes the attributes and child elements in the example.

    Here are some details about the syntax used in Figure 2:

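    Occurrences are given in DTD fashion: ? means zero or one, * means zero or more, + means one or more, and no indicator means exactly once.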

    Attributes are followed by a data type (for instance xs:string or xs:boolean) or by the list of values it can have (like the attribute-3-fixed-values attribute in Figure 2).

    The xs namespace prefix for data types must be bound to the namespace http://www.w3.org/2001/XMLSchema (xmlns:xs="http://www.w3.org/2001/XMLSchema").

    There are some special, XProc specific, data types, for example SelPattern or XPathType. These are explained in the section called Special data types.

    When the type of an attribute is in curly braces {…} (like the attribute-4-avt attribute in the example), its value is an Attribute Value Template (AVT, see Section 4.6, Attribute and Text Value Templates). It can contain XPath expressions between curly braces, {…}, which will be evaluated and expanded.

    Child elements can be grouped, like the last two child elements in Figure 2. The elements in a group are separated with the pipe character (|) and surrounded by parentheses. This means you have a choice of elements, and you can repeat this choice as often as the occurrence indicator on the group allows.

    In Figure 2, the occurrence indicator on this group is * (zero-or-more), so any combination of both elements would be valid, as would an empty value.

    Special data types

    Some attributes have a special data type. Table 1 shows the data-type indicators used in XProc.

    Table 1 – Special data type indicators

    Using and finding code examples

    The code examples presented in this book, especially the ones in Chapters 2 and 10, can be found on GitHub at https://github.com/eriksiegel/XProc-3.0-book-sources.git. These examples are offered under the MIT License (see Appendix D). Feel free to download and use them.

    Because I don’t know where you’ll install these source files, every reference to them is prefixed with $SOURCES, for example $SOURCES/hello-world/hello-world.xpl.

    Additional resources

    This section contains additional information resources. It’s compiled based on my personal preferences combined with suggestions from others. Although incomplete and biased, it is nevertheless a good place to start exploring.

    XProc and underlying standards

    The XProc 3.0 specification (http://spec.xproc.org/)

    The XProc specification is written in the style of a W3C recommendation, using the terse prose necessary to reach the level of exactness a standard needs. This book was written based on the specification.

    If during your XProc adventures you’re ever in doubt who is right, this book or the specification, stop asking. The specification is always right.

    XPath 3.1 standard (https://www.w3.org/TR/xpath-31), XPath and XQuery Functions and Operators 3.1 (https://www.w3.org/TR/xpath-functions-31)

    XProc uses XPath 3.1 as an underlying standard. These links bring you to the official standard documents.

    If you’re new to XPath, I suggest you read Michael Kay’s book XSLT 2.0 and XPath 2.0. Although written for a previous version, it still stands. Its core concepts are explained well, and since 3.1 is backwards compatible with 2.0, anything you read is still true.

    Another well-written and informative resource for learning XPath is Priscilla Walmsley’s book XQuery: Search Across a Variety of XML Data. This book is very complete, covering difficult subjects such as higher-order functions, maps, and arrays. However, it doesn’t make the distinction between what’s XPath and what’s XQuery as clear as it could. So you might end up trying XQuery constructs (like FLWOR expressions) as XPath expressions. That won’t work inside an XProc pipeline. The book also includes a full list of available XPath functions.

    Other information

    The World Wide Web Consortium (W3C) (http://www.w3c.org)

    The W3C is the body that manages, among other things, the XProc specification and related XML standards. Their website is easy to use and informative.

    W3 schools (http://w3schools.com)

    W3Schools is a developer site with tutorials and references on web development languages, including XSLT, XQuery and XPath.[1]

    XSLT 2.0 and XPath 2.0: Programmer’s Reference by Michael Kay (Wiley, 2008)

    Although this book is a bit outdated (XSLT is on version 3.0 and XPath is on version 3.1), it is still an excellent resource and a good place to start with XPath.

    XSLT, 2nd Edition by Doug Tidwell (O’Reilly, 2008)

    A good book to learn and explore XSLT. It’s a bit outdated but still useful for the basics.

    XSLT Cookbook, 2nd Edition by Sal Mangano (O’Reilly, 2006)

    XSLT examples in cookbook style. Useful when you have an XSLT problem to crack, and you need an example to start from.

    XSL-List (https://www.mulberrytech.com/xsl/xsl-list/)

    The XSL-List is a place to ask XSLT questions and receive help. The list is usually very responsive. It also has an archive.

    XQuery: Search Across a Variety of XML Data by Priscilla Walmsley (O’Reilly, 2016)

    Considered the authoritative resource for programming XQuery. Also a very good place to start when you need information about the underlying XPath standard.

    Contact information

    This book was written by Erik Siegel (Xatapult, http://www.xatapult.com). You can reach me at erik@xatapult.nl.

    I would definitely like to hear from you. Whatever you have to say about this book, suggestions, omissions, errors, praise, likes, dislikes, disgust, please send me an e-mail. Knowing that there are people out there actually using what I’ve written will help me stay motivated.

    Acknowledgements

    I would like to thank the other members of the XProc 3.0 editorial team for their support, discussions, and encouragement: Achim Berndzen, Gerrit Imsieke, and Norman Tovey-Walsh. Extra kudos to Achim and Norman because they somehow managed to work on an XProc processor in addition to their daytime jobs and their work on the specification.

    Besides the XProc editors, a lot of people participated in the discussions and meetings around XProc 3.0. This community group, chaired by Ari Nordström, varied in attendance and so I’m not going to mention other names, because I will undoubtedly forget someone. But if you were there you know it: thanks!

    And there were, unbelievably, people who took the time and effort to actually read all this and helped me with detailed feedback and criticism: my fellow XProc editorial crew-mates Achim, Gerrit, and Norman as well as Pieter Masereeuw, who looked at it from an outsider’s viewpoint. Thanks for all your time and effort!


    [1] Be aware, W3Schools is a controversial website. Just mentioning it here caused a surprising amount of criticism from reviewers. It apparently has a history of errors and is infamous for heavy advertising. Nonetheless, I personally consult it regularly and find the information well-presented. Judge it for yourself.

    Chapter 1. Introduction

    1.1. What is XProc?

    Let’s try to answer this question with an overview of XProc’s main high-level characteristics:

    XProc is a programming language, expressed in XML, in which you can write pipelines.

    An XProc pipeline takes data as its input (often XML) and passes it through specialized steps to produce end results.

    Steps range from simple tasks, like reading and writing data, to complex ones like splitting/​combining/​pruning data, transforming data with XSLT or XQuery, and validating data.

    Within a pipeline you can work with variables, create branches and loops, catch errors, etc. All of this processing is based on the data flowing through the pipeline.

    XProc pipelines are not limited to a linear succession of steps. They can fork and merge.

    XProc allows you to create custom steps by combining other steps. These custom steps can be used just like any other step. Custom steps can be collected into libraries.

    XProc supports housekeeping functions such as inspecting directories, reading documents from zip files, writing data to disk, etc.

    There is software—​XProc processors—​that can execute these pipelines.
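
    The characteristics above (steps that each do one thing, chained into a pipeline, with custom steps built from other steps) can be illustrated in a language-neutral way. The following is only a conceptual sketch in Python, not XProc: the helper names (pipeline, delete_drafts, add_count, wrap_in_report) and the sample document are hypothetical and were made up for this illustration.

```python
from functools import reduce
from typing import Callable
import xml.etree.ElementTree as ET

# In this sketch a "step" is any function that takes a document and returns a document.
Step = Callable[[ET.Element], ET.Element]


def pipeline(*steps: Step) -> Step:
    """Combine steps into one new step that runs them in order."""
    def run(doc: ET.Element) -> ET.Element:
        return reduce(lambda current, step: step(current), steps, doc)
    return run


def delete_drafts(doc: ET.Element) -> ET.Element:
    """Prune the data: remove every <item> marked status="draft"."""
    for item in list(doc.findall("item")):
        if item.get("status") == "draft":
            doc.remove(item)
    return doc


def add_count(doc: ET.Element) -> ET.Element:
    """Record the number of remaining <item> children in a count attribute."""
    doc.set("count", str(len(doc.findall("item"))))
    return doc


def wrap_in_report(doc: ET.Element) -> ET.Element:
    """Wrap the document in a new <report> root element."""
    report = ET.Element("report")
    report.append(doc)
    return report


# A custom step built by combining other steps...
clean = pipeline(delete_drafts, add_count)
# ...which is then used like any other step in a larger pipeline.
publish = pipeline(clean, wrap_in_report)

source = ET.fromstring(
    '<items><item status="draft">a</item><item>b</item><item>c</item></items>'
)
output = ET.tostring(publish(source), encoding="unicode")
print(output)
```

    The point of the sketch is the composition: clean is itself a step, so it can be reused inside publish without publish knowing what it is made of.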

    Why and when would these capabilities be useful? In the physical world, pipelining and working in specialized steps is not unusual. For instance, an oil refinery takes crude oil as input and, through a series of steps and intermediate products, produces petrol/gasoline, kerosene, diesel, etc. Refineries take the word pipeline literally.

    Another classic example of working in pipelines (although not literal) with specialized steps is a factory producing cars. This usually consists of a conveyor belt (the main pipeline) that takes the cars-to-be from one specialized assembly station to the next (the steps). There will probably be sub-conveyor belts (subpipelines) for parts like the engine, the cabling, the interior, etc. At the end you have a complete and functioning car.

    Yet another example is the UNIX pipe. A command produces some output and you pipe that output to another command—​for example grep or tail—​which does further processing to get the desired result. The character used for chaining steps, |, is even called the pipe character.
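
    The same chaining idea can be sketched in Python, with two small functions standing in for grep and tail. This is an illustration only: the functions are simplified stand-ins (the grep here does a plain substring match, not a regular expression match), and the log lines are invented sample data.

```python
from collections import deque
from typing import Iterable, Iterator, List


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Pass on only the lines that contain pattern (substring match, not a regex)."""
    return (line for line in lines if pattern in line)


def tail(count: int, lines: Iterable[str]) -> List[str]:
    """Keep only the last `count` lines of the input."""
    return list(deque(lines, maxlen=count))


log_lines = [
    "INFO start",
    "ERROR disk full",
    "INFO retry",
    "ERROR timeout",
    "ERROR giving up",
]

# Roughly comparable to: grep ERROR logfile | tail -n 2
last_errors = tail(2, grep("ERROR", log_lines))
print(last_errors)
```

    The output of one function is fed straight into the next, which is the essence of a pipe.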

    So why do this in the world of information and document processing? One of the main reasons is that data is often not in the format you need it to be. Here are some examples:

    You have XML coming from some data source but need HTML for your website.

    You have data coming from multiple weather stations that needs to be merged into a single consolidated view. From this you produce a map with the information nicely laid out.

    Word processors produce zip files with lots of XML documents inside (most word processors do nowadays). You need the text in some other format so you have to inspect the zip file, combine the XML documents inside, and transform the result into what you need.
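
    To make the last example concrete, here is a minimal Python sketch of the "inspect the zip file, combine the XML documents inside" task. It is not XProc and does not reflect any real word processor format: the archive is built in memory, and the entry names (content/part1.xml and so on) and element names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_archive() -> bytes:
    """Create a small in-memory zip file with two XML documents and one other file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content/part1.xml", "<para>Hello</para>")
        archive.writestr("content/part2.xml", "<para>world</para>")
        archive.writestr("media/logo.png", b"not really an image")
    return buffer.getvalue()


def combine_xml_entries(data: bytes) -> ET.Element:
    """Inspect the archive and collect every .xml entry under one <document> root."""
    combined = ET.Element("document")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith(".xml"):
                combined.append(ET.fromstring(archive.read(name)))
    return combined


combined = combine_xml_entries(build_sample_archive())
text = " ".join(para.text for para in combined.findall("para"))
print(text)
```

    Even in this tiny form the task has several distinct stages (open the archive, select entries, parse, merge), which is the kind of work that benefits from being split into steps.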

    For straight transformation of XML data you can use languages such as XSLT and XQuery. But often tasks such as chaining, splitting, or merging are more complex than can be handled in a single transformation. For such tasks, you may need to perform housekeeping functions such as determining where to read from or write to, inspecting directories, creating zip files,
