python - BeautifulSoup fails parsing when it hits an unescaped bracket -


I am having trouble loading a page that contains a literal (unescaped) email address in angle brackets, such as

  <html><head><title>Trial</title></head><body><p>test test.</p><p>This has an email address <Joe@somewhere.com></p></body></html>

The parse fails when it hits that block:

  File "/tools/oss/packages/x86_64-rhel5/python/2.7.1/lib/python2.7/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
  HTMLParseError: malformed start tag, at line 748, column 82

I can't believe I am the first one to hit this, but I could not find any help or useful documentation right away. Is there something obvious I am missing?

Thank you,

- Paul

Always the way - you find the answer right after you post the question.

It seems that I hit an already-reported bug - updating to a later BeautifulSoup release fixes the problem.
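
For anyone who cannot upgrade, one possible workaround is to escape the bare address before handing the markup to the parser. This is only a minimal sketch, assuming the stray brackets always wrap something email-shaped with no whitespace inside; the helper name `escape_bare_emails` is hypothetical and not part of BeautifulSoup.

```python
import re

# Matches an angle-bracketed token that looks like an email address,
# e.g. <Joe@somewhere.com>. Ordinary tags such as <p> or </p> do not match,
# because the pattern requires an "@" before any whitespace or ">".
BARE_EMAIL = re.compile(r"<([^<>\s@]+@[^<>\s@]+)>")


def escape_bare_emails(markup):
    """Rewrite <user@host> as &lt;user@host&gt; so it is not read as a tag."""
    return BARE_EMAIL.sub(r"&lt;\1&gt;", markup)


html = "<p>This has an email address <Joe@somewhere.com></p>"
cleaned = escape_bare_emails(html)
print(cleaned)
```

The cleaned string can then be passed to the parser as usual. The pattern is deliberately narrow, so other kinds of unescaped brackets would still need separate handling.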

