While developing parsers for the web app “Is it Kosher?”, I found a pretty interesting way to parse the content of an HTML page.
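// $content_url is the page to fetch and $request is the keyword to search for; both are set elsewhere.
$ch = curl_init($content_url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_ENCODING, ''); // this option sets Accept-Encoding; '' accepts every compression cURL supports
curl_setopt($ch, CURLOPT_POST, 1); // sends the request as POST; drop this line for an ordinary GET
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$response = curl_exec($ch);
curl_close($ch);

// Parse the HTML; @ suppresses the warnings that malformed markup triggers.
$dom = new DOMDocument();
@$dom->loadHTML($response);
$xpath = new DOMXPath($dom);

// XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z before contains() runs.
// td[...] tests every cell of the row; translate(td, ...) would only look at the first one.
$rows = $xpath->evaluate("//div[@class='entry-content']//tr[td[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" . strtolower($request) . "')]]");

foreach ($rows as $row) {
    echo $dom->saveXML($row); // print the matching TR together with its markup
}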
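This gives you every TR element that has a child TD containing the requested keyword. The case of the keyword doesn’t matter: translate() lowercases the cell text inside the XPath expression, and strtolower() lowercases the keyword before it is inserted into the query.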
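To try the query without hitting a live site, here is a minimal sketch that runs the same kind of XPath expression against an inline HTML string. The helper name find_rows() and the sample table are made up for illustration, and the sketch assumes the keyword contains no single quote, since that would end the XPath string literal early.

```php
<?php
// Hypothetical helper (not from the original app): returns the markup of every
// <tr> inside div.entry-content that has a <td> containing $keyword, ignoring case.
function find_rows(string $html, string $keyword): array
{
    // Collect libxml warnings quietly instead of using the @ operator.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    // Assumption: $keyword has no single quote in it.
    $needle = strtolower($keyword);
    $query = "//div[@class='entry-content']//tr"
           . "[td[contains(translate(., '$upper', '$lower'), '$needle')]]";

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query($query) as $row) {
        $rows[] = $dom->saveXML($row); // serialize the <tr> with its children
    }
    return $rows;
}

// Made-up sample page with two table rows.
$html = <<<HTML
<html><body><div class="entry-content"><table>
<tr><td>Brand</td><td>Coca-Cola</td></tr>
<tr><td>Brand</td><td>Pepsi</td></tr>
</table></div></body></html>
HTML;

// An upper-case keyword still matches the mixed-case cell text.
foreach (find_rows($html, 'COLA') as $row) {
    echo $row, "\n";
}
```

Only the Coca-Cola row should be printed here, even though the keyword sits in the second cell and is written in a different case. Note that translate() only folds the letters listed in its arguments, so this approach covers plain A-Z and nothing beyond it.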