python - BeautifulSoup fails parsing when it hits an unescaped bracket
I'm having trouble loading a page that contains a literal (unescaped) email address in angle brackets, such as:
<html><head><title>Trial</title></head><body><p>test test.</p><p>This has an email address for <Joe@somewhere.com></p></body></html>
Parsing fails when it reaches that block:
  File "/tools/oss/packages/x86_64-rhel5/python/2.7.1/lib/python2.7/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: malformed start tag, at line 748, column 82
I can't believe I'm the first one to hit this, but I couldn't find any help or useful documentation right away. Am I missing something obvious?
Thank you,
- Paul
Always the way: as soon as you post a question, you find the answer.
It seems I hit a bug that has already been described. Updating to a later version of BeautifulSoup does fix the problem.
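For anyone who can't upgrade, here is a rough sketch of a different workaround: escape the bracketed address before the markup reaches the parser. This is not the fix I used (upgrading was), and it makes a few assumptions. It is written for Python 3 and uses only the standard library `html.parser` rather than BeautifulSoup. The names `BRACKETED_EMAIL`, `escape_bracketed_emails`, `TextCollector` and `extract_text` are made up for illustration, and the regular expression is a simple guess at what an address looks like, not a full email validator.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: "<", some non-space characters containing exactly
# one "@", then ">". Real tags such as <p> or <a href="..."> do not match.
BRACKETED_EMAIL = re.compile(r"<([^<>\s@]+@[^<>\s@]+)>")


def escape_bracketed_emails(markup):
    """Turn <user@host> into &lt;user@host&gt; so it is not read as a tag."""
    return BRACKETED_EMAIL.sub(r"&lt;\1&gt;", markup)


class TextCollector(HTMLParser):
    """Collects the text content of a document and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup):
    """Escape bracketed addresses, parse the result, and return the text."""
    parser = TextCollector()
    parser.feed(escape_bracketed_emails(markup))
    parser.close()
    return "".join(parser.chunks)


sample = (
    "<html><head><title>Trial</title></head><body><p>test test.</p>"
    "<p>This has an email address for <Joe@somewhere.com></p></body></html>"
)

if __name__ == "__main__":
    print(escape_bracketed_emails(sample))
    print(extract_text(sample))
```

The escaped string can be handed to whichever parser you are stuck with; the address then arrives as ordinary text instead of a malformed start tag.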