P4DTI issue job000827

Title: P4DTI test suite with PyXML 0.8.3 fails without XHTML DTD file
Status: closed
Priority: essential
Assigned user: Nick Barnes
Organization: Ravenbrook
Description: The P4DTI test suite uses Python's XML libraries and PyXML extensions to check the XHTML documentation which forms part of the P4DTI product sources (e.g. manuals and design documents). When used with PyXML 0.8.3 (unlike PyXML 0.7.x), this part of the test suite fails because it can't find the XHTML DTD named in the DOCTYPE declaration.
The failure is reported in various ways, but the underlying error is this:
ValueError: unknown url type: /tmp/DTD/xhtml1-transitional.dtd
Analysis: A full backtrace of the failure looks like this:
 File "check_xhtml.py", line 1054, in check
   xml.sax.parse(path_or_stream, self, self)
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
   parser.parse(filename_or_stream)
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
   xmlreader.IncrementalParser.parse(self, source)
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
   self.feed(buffer)
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 216, in feed
   self._parser.Parse(data, isFinal)
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 395, in external_entity_ref
   self._source.getSystemId() or
 File "/usr/local/lib/python2.2/site-packages/_xmlplus/sax/saxutils.py", line 515, in prepare_input_source
   f = urllib2.urlopen(source.getSystemId())
 File "/usr/local/lib/python2.2/urllib2.py", line 138, in urlopen
   return _opener.open(url, data)
 File "/usr/local/lib/python2.2/urllib2.py", line 320, in open
   type_ = req.get_type()
 File "/usr/local/lib/python2.2/urllib2.py", line 224, in get_type
   raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: /tmp/DTD/xhtml1-transitional.dtd
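The last frames of the backtrace show urllib2 rejecting a bare filesystem path because it carries no URL scheme. As a minimal sketch of that behaviour, the following uses Python 3's urllib.request as a stand-in for Python 2.2's urllib2 (an assumption, since the original ran on Python 2.2); the helper name url_type_error is hypothetical.

```python
import urllib.request


def url_type_error(path):
    """Return the ValueError message raised for `path`, or None if none is raised.

    Hypothetical helper for illustration only. Call it only with scheme-less
    paths: a real URL would be fetched over the network.
    """
    try:
        urllib.request.urlopen(path)
    except ValueError as exc:
        return str(exc)
    return None


# A plain path has no scheme such as "file:" or "http:", so it is
# rejected before any file or network access is attempted.
message = url_type_error("/tmp/DTD/xhtml1-transitional.dtd")
print(message)
```

The same check explains why the error appears even when the path is otherwise plausible: the failure is about the missing scheme, not about whether the file exists.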
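The problem is that the default XML parser (expatreader) provided by the PyXML library tries to read the external DTD entity, and passes its system identifier (here a bare file path, not a URL) to urllib2.urlopen(), which rejects it. In principle this behaviour can be controlled either by calling SetParamEntityParsing() on the underlying expat parser (_parser), or by switching off a parser feature: xml.sax.make_parser().setFeature(xml.sax.handler.feature_external_ges, 0).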
Note that the XML parser doesn't care about the content of the DTD. The actual parsing is driven by the XML SAX handler which we provide. Providing an empty file in the location DTD/xhtml1-transitional.dtd fixes this problem, but it is better to turn off this parsing feature.
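A minimal sketch of the "explicit parser" approach follows, written for Python 3's standard xml.sax rather than PyXML 0.8.3 (an assumption). The handler class, the function name and the sample document are hypothetical; this is not the code from check_xhtml.py.

```python
import io
import xml.sax
import xml.sax.handler


class ElementRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in for the checker's own SAX handler."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        # Record element names in document order.
        self.elements.append(name)


def parse_without_external_entities(stream, handler):
    """Parse `stream` with an explicitly created parser that never
    fetches external general entities such as the DTD."""
    parser = xml.sax.make_parser()
    # Must be set before parsing starts.
    parser.setFeature(xml.sax.handler.feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(stream)


# The DOCTYPE names a DTD by relative path; the file need not exist.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "DTD/xhtml1-transitional.dtd">
<html><head><title>t</title></head><body><p>hello</p></body></html>
"""

handler = ElementRecorder()
parse_without_external_entities(io.BytesIO(SAMPLE), handler)
print(handler.elements)
```

Creating the parser explicitly, instead of calling xml.sax.parse(), is what gives access to setFeature(). Python 3.7.1 and later already leave this feature off by default, so the explicit call mainly documents intent there.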
How found: automated_test
Evidence: Run the test!
Observed in: 2.0.0
Introduced in: 1.1.4
Created by: Nick Barnes
Created on: 2003-12-02 15:01:06
Last modified by: Nick Barnes
Last modified on: 2003-12-02 15:05:12
History: 2003-12-02 NB Created.

Fixes

Change | Effect | Date | User | Description
66825 | closed | 2003-12-02 15:04:55 | Nick Barnes | Explicitly make an XML parser when we parse, so that we can turn off external entity reading.