HTML & CSS Validating Parser

SqueakSource3 project page

This is an HTML and CSS parser and DOM that copes well with malformed HTML and broken CSS. Developed by Todd Blanchard, it validates web pages and is the underlying technology behind http://www.badpage.info. The tag nesting and attribute rules are derived by interpreting the W3C DTDs, which should make it fairly future-proof. The CSS parser understands most of CSS 2 and some of CSS 3, and its CSS selectors can tell whether they match a DOM node. There is no visual rendering and no layout calculation.
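
To make the idea of a selector "matching a DOM node" concrete, here is a minimal sketch in Python. It is not this package's API: the Node class and the matches and matches_simple functions are hypothetical names invented for illustration, and the sketch assumes only type, #id, and .class selectors joined by the descendant combinator (a space).

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element (not the package's class)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def matches_simple(node: Node, simple: str) -> bool:
    """Match one compound selector such as 'div', '#main' or 'p.note'."""
    m = re.fullmatch(r"([A-Za-z][\w-]*|\*)?((?:[#.][\w-]+)*)", simple)
    if m is None:
        raise ValueError(f"unsupported selector: {simple!r}")
    tag, rest = m.group(1), m.group(2)
    # HTML tag names compare case-insensitively; '*' matches any tag.
    if tag and tag != "*" and tag.lower() != node.tag.lower():
        return False
    classes = node.attrs.get("class", "").split()
    for kind, name in re.findall(r"([#.])([\w-]+)", rest):
        if kind == "#" and node.attrs.get("id") != name:
            return False
        if kind == "." and name not in classes:
            return False
    return True


def matches(node: Node, selector: str) -> bool:
    """Match a selector made of compound parts separated by spaces.

    Parts are checked right to left: the last part must match the node
    itself, and each earlier part must match some ancestor further up.
    """
    parts = selector.split()
    if not parts or not matches_simple(node, parts[-1]):
        return False
    ancestor = node.parent
    for part in reversed(parts[:-1]):
        # Walk up until an ancestor satisfies this part of the selector.
        while ancestor is not None and not matches_simple(ancestor, part):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


# A tiny hand-built tree: <body><div id="main" class="content wide"><p class="note">
body = Node("body")
div = Node("div", {"id": "main", "class": "content wide"}, body)
para = Node("p", {"class": "note"}, div)
```

With this tree, matches(para, "div.content p.note") is true and matches(para, "span p") is false. A real implementation would also need to cover the other combinators, attribute selectors, and pseudo-classes defined by CSS.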

It's also useful for scraping web pages.

MIT license.

HTML-cmm.40.mcz
HTML-cmm.39.mcz
HTML-cmm.38.mcz
HTML-tb.37.mcz