Welcome to xmlplain’s documentation!

xmlplain module

XML as plain object module.

This module is a set of utility functions for parsing XML input into plain list/dict/string types.

These plain XML objects can in turn be emitted, for instance, as bare YAML that contains no Python-specific objects.

The motivating usage is to dump XML to YAML, manually edit files as YAML, and emit XML output back.

The original XML content is meant to be preserved, except for comments and, if requested, whitespace between elements.

Note that there are alternative modules with nearly the same functionality, but none of them provides both simple plain objects and preservation of the initial XML content for non-structured XML.

XML namespaces are preserved for attributes/elements and re-emitted as is.

WARNING: DTD specifications, XML comments and processing entities in the original XML document will be discarded. Also, system external entities are not allowed and will raise an exception. If one needs some of these features, this module is probably not the right tool. File an issue if unsure.

Example:
>>> import xmlplain, sys
>>> _ = sys.stdout.write(open("tests/example-1.xml").read())
<example>
  <doc>This is an example for xmlobj documentation. </doc>
  <content version="beta">
    <kind>document</kind>
    <class>example</class>
    <structured/>
    <elements>
      <item>Elt 1</item>
      <doc>Elt 2</doc>
      <item>Elt 3</item>
      <doc>Elt 4</doc>
    </elements>
  </content>
</example>
>>> root = xmlplain.xml_to_obj(open("tests/example-1.xml"), strip_space=True, fold_dict=True)
>>> xmlplain.obj_to_yaml(root, sys.stdout)
example:
  doc: 'This is an example for xmlobj documentation. '
  content:
    '@version': beta
    kind: document
    class: example
    structured: ''
    elements:
    - item: Elt 1
    - doc: Elt 2
    - item: Elt 3
    - doc: Elt 4
>>> xmlplain.xml_from_obj(root, sys.stdout)
<?xml version="1.0" encoding="UTF-8"?>
<example>
  <doc>This is an example for xmlobj documentation. </doc>
  <content version="beta">
    <kind>document</kind>
    <class>example</class>
    <structured></structured>
    <elements>
      <item>Elt 1</item>
      <doc>Elt 2</doc>
      <item>Elt 3</item>
      <doc>Elt 4</doc>
    </elements>
  </content>
</example>
xmlplain.events_filter_pretty(events, handler=None, indent='  ')[source]

Augment an XML event list for pretty printing.

This is a filter function that takes an events stream and returns an augmented events stream that includes ignorable whitespace for an indented pretty print. The generated events stream is still a valid events stream suitable for xml_from_events().

Parameters:
  • events – the input XML events stream
  • handler – events receiver implementing the append() method or None, in which case a new list will be generated
  • indent – the base indent string, defaults to 2-space indent
Returns:

the handler if not None or the newly created events list
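
As an illustration of the filtering idea, here is a minimal sketch, not the module's implementation. It assumes structured XML (no mixed content) and the events tuples format listed under xml_to_events(); the name sketch_filter_pretty is hypothetical.

```python
def sketch_filter_pretty(events, indent="  "):
    """Hypothetical sketch: add ("#", (whitespace,)) events for indentation.

    Assumes structured XML, i.e. an element holds either text or child
    elements, never both.
    """
    out = []
    depth = 0
    had_child = []  # one flag per open element: did it contain child elements?
    for kind, value in events:
        if kind == "<":
            if had_child:
                # Nested element: mark the parent and indent before the tag.
                had_child[-1] = True
                out.append(("#", ("\n" + indent * depth,)))
            had_child.append(False)
            depth += 1
        elif kind == ">":
            depth -= 1
            if had_child.pop():
                # Closing tag of an element with children goes on its own line.
                out.append(("#", ("\n" + indent * depth,)))
        out.append((kind, value))
    return out


sample_events = [
    ("[", ("",)),
    ("<", ("a",)),
    ("<", ("b",)),
    ("|", ("x",)),
    (">", ("b",)),
    (">", ("a",)),
    ("]", ("",)),
]
pretty_events = sketch_filter_pretty(sample_events)
```

Only "#" events are added, so removing them from the result gives back the input stream unchanged.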

xmlplain.events_from_obj(root, handler=None)[source]

Creates an XML events stream from plain object.

Generates an XML events stream suitable for xml_from_events() from a well-formed XML plain object and passes it, through the append() method, to the receiver or to a newly created list.

Parameters:
  • root – root of the XML plain object
  • handler – events receiver implementing the append() method or None, in which case a new list will be generated
Returns:

the handler if not None or the created events list
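
The traversal can be pictured with the following minimal sketch. It is an assumption-based illustration, not the module's code: it assumes attributes are stored under keys starting with '@' (as in the YAML output of the example above) and that they precede other children; sketch_events_from_obj is a hypothetical name.

```python
def sketch_events_from_obj(root):
    """Hypothetical sketch: walk a plain object and emit events tuples."""
    events = [("[", ("",))]

    def walk(node):
        if isinstance(node, str):
            # Plain strings are text content; empty strings emit nothing.
            if node:
                events.append(("|", (node,)))
        elif isinstance(node, dict):
            for key, value in node.items():
                if key.startswith("@"):
                    # Assumed convention: '@name' keys are attributes.
                    events.append(("@", (key[1:], value)))
                else:
                    events.append(("<", (key,)))
                    walk(value)
                    events.append((">", (key,)))
        else:
            # Lists hold the ordered children of an element.
            for child in node:
                walk(child)

    walk(root)
    events.append(("]", ("",)))
    return events


plain = {"content": [{"@version": "beta"}, {"kind": "document"}]}
obj_events = sketch_events_from_obj(plain)
```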

xmlplain.obj_from_yaml(inf, encoding='UTF-8', process_string=None)[source]

Read a YAML object, possibly holding an XML plain object.

Returns the XML plain obj from the YAML stream or string. The dicts read from the YAML stream are stored as OrderedDict such that the XML plain object elements are kept in order.

Parameters:
  • inf – input YAML file stream or string or bytestring
  • encoding – encoding of the input when a byte stream or byte string
  • process_string – a function to apply to strings (str for python3 or unicode for python2) after the YAML reader input
Returns:

the constructed plain object

xmlplain.obj_to_yaml(root, outf=None, encoding='UTF-8', process_string=None)[source]

Output an XML plain object to yaml.

Output an object to YAML with specific handling for OrderedDict, strings and tuples. This specific treatment is there to preserve the XML ordered structure while generating a bare YAML file without any Python objects.

Note that reading back the emitted YAML object should be done through obj_from_yaml() in order to preserve dict order.

To be used as an alternative to a bare yaml.dump if one needs an editable YAML view of the XML plain object.

Parameters:
  • root – root of the plain object to dump
  • outf – output file stream or None for bytestring output
  • encoding – output bytestring or file stream encoding
  • process_string – a function to apply to strings (str for python3 or unicode for python2) before the YAML writer output
Returns:

None, or the generated byte string if outf is None

xmlplain.xml_from_events(events, outf=None, encoding='UTF-8', process_content=None)[source]

Outputs the XML document from the events tuples.

From the given events tuples list, as specified in xml_to_events(), generates a well-formed XML document. The XML output is generated through xml.sax.saxutils.XMLGenerator().

Parameters:
  • events – events tuples list or iterator
  • outf – output file stream or None for bytestring output
  • encoding – output encoding
  • process_content – a function to apply to the cdata content (str for python3 or unicode for python2) before being processed by the XML writer
Returns:

the created byte string when outf is None
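
To make the mapping from events tuples to XMLGenerator calls concrete, here is a minimal standard-library sketch. It is not the module's implementation; sketch_xml_from_events is a hypothetical name, and the events format is the one listed under xml_to_events(). Since attributes arrive as separate events, the start tag is held back until all of its "@" events have been seen.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def sketch_xml_from_events(events, encoding="UTF-8"):
    """Hypothetical sketch: replay events tuples into an XMLGenerator."""
    out = io.BytesIO()
    gen = XMLGenerator(out, encoding=encoding)
    pending = None  # [name, attrs] of a start element not yet written

    for kind, value in events:
        if kind == "@":
            # Attach the attribute to the pending start element.
            pending[1][value[0]] = value[1]
            continue
        if pending is not None:
            # Any other event closes the attribute list: write the tag.
            gen.startElement(pending[0], AttributesImpl(pending[1]))
            pending = None
        if kind == "[":
            gen.startDocument()
        elif kind == "]":
            gen.endDocument()
        elif kind == "<":
            pending = [value[0], {}]
        elif kind == ">":
            gen.endElement(value[0])
        elif kind == "|":
            gen.characters(value[0])
        elif kind == "#":
            gen.ignorableWhitespace(value[0])
    return out.getvalue()


xml_bytes = sketch_xml_from_events([
    ("[", ("",)),
    ("<", ("a",)),
    ("@", ("k", "v")),
    ("|", ("hi",)),
    (">", ("a",)),
    ("]", ("",)),
])
```

With XMLGenerator's default settings, empty elements are written as an open/close pair, which matches the <structured></structured> output shown in the example at the top of this page.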

xmlplain.xml_from_obj(root, outf=None, encoding='UTF-8', pretty=True, indent='  ', process_content=None)[source]

Generate an XML output from a plain object.

Generates the XML representation of a plain object as produced by this module. This function does the opposite of xml_to_obj().

Parameters:
  • root – the root of the plain object
  • outf – output file stream or None for bytestring output
  • encoding – the encoding to be used (defaults to “UTF-8”)
  • pretty – indents the output when True
  • indent – base indent string (defaults to 2 spaces)
  • process_content – a function to apply to the cdata content (str for python3 or unicode for python2) before being processed by the XML writer
Returns:

the created byte string when outf is None

xmlplain.xml_to_events(inf, handler=None, encoding='UTF-8', process_content=None)[source]

Generates XML events tuples from the input stream.

The generated events consist of pairs (type, value), where type is a single-character identifier for the event and value is a variable-length tuple. Events correspond to xml.sax events, with the exception that attributes are generated as separate events instead of being part of the start element event. The XML stream is parsed with xml.sax.make_parser().

Parameters:
  • inf – input stream file or string or bytestring
  • handler – events receiver implementing the append() method or None, in which case a new list will be generated
  • encoding – encoding used when the input is a byte string
  • process_content – a function to apply to the cdata content (str for python3 or unicode for python2) after the XML reader content generation
Returns:

the handler if not None or the generated events list

The defined XML events tuples in this module are:

  • ("[", ("",)) for the document start
  • ("]", ("",)) for the document end
  • ("<", (elt_name,)) for an element start
  • (">", (elt_name,)) for an element end
  • ("@", (attr_name, attr_value)) for an attribute associated to the last start element
  • ("|", (content,)) for a CDATA string content
  • ("#", (whitespace,)) for an ignorable whitespace string
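
The events listed above can be reproduced with a small xml.sax handler. The following is a minimal sketch under stated assumptions, not the module's code: EventCollector and sketch_xml_to_events are hypothetical names, and ignorable whitespace ("#") is not distinguished here because, without a DTD, the parser reports all whitespace as character data.

```python
import io
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records SAX callbacks as (type, value) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("[", ("",)))

    def endDocument(self):
        self.events.append(("]", ("",)))

    def startElement(self, name, attrs):
        self.events.append(("<", (name,)))
        # Attributes become separate events right after the start element.
        for attr_name in attrs.getNames():
            self.events.append(("@", (attr_name, attrs.getValue(attr_name))))

    def endElement(self, name):
        self.events.append((">", (name,)))

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        self.events.append(("|", (content,)))


def sketch_xml_to_events(text):
    collector = EventCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return collector.events


parsed_events = sketch_xml_to_events('<a k="v">hi</a>')
```

A consumer that needs whole text nodes would have to merge consecutive "|" events, since the SAX characters() callback gives no guarantee about chunk boundaries.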
xmlplain.xml_to_obj(inf, encoding='UTF-8', strip_space=False, fold_dict=False, process_content=None)[source]

Generate a plain object representation from the XML input.

The representation consists of lists of plain elements, which are either XML elements as dict { elt_name: children_list } or XML CDATA text contents as plain strings. This plain object for an XML document can be emitted to YAML, for instance, with no Python dependency.

When the ‘fold_dict’ option is given, an elements list may be simplified into a multiple-key ordered dict or a single text content. Note that in this case some OrderedDict Python objects may be generated; one should then use obj_to_yaml() in order to get a bare YAML output.

When the ‘strip_space’ option is given, non-leaf text content is stripped. This is in most cases safe when managing structured XML, but note that it changes the XML document content. Generally one would use this in conjunction with pretty=True when emitting the object back to XML with xml_from_obj().

Parameters:
  • inf – input stream file or string or bytestring
  • encoding – encoding used when the input is a byte string
  • strip_space – strip spaces from non-leaf text content
  • fold_dict – optimize unambiguous lists of dicts into ordered dicts
  • process_content – a function to apply to the cdata content (str for python3 or unicode for python2) after the XML reader content generation
Returns:

the root of the generated plain object, actually a single key dict

Example:
>>> import xmlplain, yaml, sys
>>> root = xmlplain.xml_to_obj(open("tests/example-1.xml"), strip_space=True)
>>> yaml.safe_dump(root, sys.stdout, default_flow_style=False, allow_unicode=True)
example:
- doc: 'This is an example for xmlobj documentation. '
- content:
  - '@version': beta
  - kind: document
  - class: example
  - structured: ''
  - elements:
    - item: Elt 1
    - doc: Elt 2
    - item: Elt 3
    - doc: Elt 4
>>> root = xmlplain.xml_to_obj(open("tests/example-1.xml"), strip_space=True, fold_dict=True)
>>> xmlplain.obj_to_yaml(root, sys.stdout)
example:
  doc: 'This is an example for xmlobj documentation. '
  content:
    '@version': beta
    kind: document
    class: example
    structured: ''
    elements:
    - item: Elt 1
    - doc: Elt 2
    - item: Elt 3
    - doc: Elt 4
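
The difference between the two outputs above can be approximated with the following minimal sketch. It reflects one plausible reading of "unambiguous" (a list made only of dicts whose keys are all distinct) and the library's actual folding rules may differ; sketch_fold is a hypothetical name.

```python
from collections import OrderedDict


def sketch_fold(node):
    """Hypothetical sketch: fold lists of distinct-key dicts into OrderedDicts."""
    if isinstance(node, dict):
        return OrderedDict((key, sketch_fold(value)) for key, value in node.items())
    if isinstance(node, list):
        children = [sketch_fold(child) for child in node]
        all_dicts = all(isinstance(child, dict) for child in children)
        keys = [key for child in children if isinstance(child, dict) for key in child]
        if children and all_dicts and len(keys) == len(set(keys)):
            # No repeated key and no text child: order is kept by OrderedDict.
            merged = OrderedDict()
            for child in children:
                merged.update(child)
            return merged
        # Repeated keys (or mixed content) would lose information: keep the list.
        return children
    return node


unfolded = {
    "content": [
        {"@version": "beta"},
        {"kind": "document"},
        {"elements": [{"item": "Elt 1"}, {"doc": "Elt 2"}, {"item": "Elt 3"}]},
    ]
}
folded = sketch_fold(unfolded)
```

In this sketch the 'elements' list stays a list because the 'item' key repeats, which mirrors the folded YAML shown above.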
