PHP XPath: case-insensitive text search on a page

While developing parsers for the web app “Is it Kosher?”, I found a pretty interesting way to parse the content of an HTML page.

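$ch = curl_init($content_url);

curl_setopt($ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
// CURLOPT_ENCODING controls Accept-Encoding (gzip, deflate, ...), not the charset;
// '' asks for every compression cURL supports and decodes the response automatically.
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_HTTPGET, true);        // reading a page is a GET request, not a POST
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // CURLOPT_AUTOREFERER only applies when redirects are followed
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
if ($response === false) {
    exit('cURL error: ' . curl_error($ch));
}

curl_close($ch);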

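$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about malformed HTML instead of hiding them with @
$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);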

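// td[...] tests every cell of the row ('.' is that cell's text), so a row matches if ANY cell
// contains the keyword. translate() lower-cases the cell text and strtolower() the keyword.
// $request must not contain a single quote, or the query string breaks.
$table_cells = $xpath->query("//div[@class='entry-content']//tr[td[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" . strtolower($request) . "')]]");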

foreach ($table_cells as $row)
{
    // saveXML() with a node argument serializes the whole <tr> element (tags included).
    $row_html = $dom->saveXML($row);
    echo $row_html;
}
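Pasting `$request` straight into the XPath string breaks as soon as the keyword contains a quote (for example `Ben & Jerry's`), and it lets the keyword change the query itself. XPath 1.0 has no escape character, so a common workaround is to wrap the value in whichever quote type it lacks and fall back to `concat()` when it contains both. Below is a minimal sketch of such a helper; `xpath_literal()` is a hypothetical name, not part of PHP's DOM extension, and the sample `$request` value is made up for illustration.

```php
<?php
// Hypothetical helper (not a DOM API): turn any PHP string into a valid
// XPath 1.0 string literal. XPath 1.0 has no escape character.
function xpath_literal(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";            // no single quote: wrap in '...'
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';            // no double quote: wrap in "..."
    }
    // Both quote types present: split on the single quote and rebuild the
    // value with concat(), passing each single quote as a separate "'" argument.
    $pieces = array_map(function ($p) { return "'" . $p . "'"; }, explode("'", $s));
    return 'concat(' . implode(", \"'\", ", $pieces) . ')';
}

// Usage with the row query ($request here is a made-up example keyword).
$request = 'Ben & Jerry\'s "Classic"';
$needle  = xpath_literal(strtolower($request));
$query   = "//div[@class='entry-content']//tr[td[contains(translate(., "
         . "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), $needle)]]";
echo $query, PHP_EOL;
```

Because the helper returns a complete literal (quotes included), it is dropped into the query without surrounding quotes.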

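This returns every TR tag that has at least one child TD containing the requested keyword, and the letter case of the keyword doesn’t matter: translate() lower-cases the cell text and strtolower() lower-cases the keyword before they are compared. Note that translate() as written only maps the letters A to Z, so accented or non-Latin letters are still compared case-sensitively.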
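To try the technique without fetching anything, the sketch below loads an inline HTML string in place of the downloaded page and runs the same kind of case-insensitive row query. `find_rows()` and the sample table are hypothetical; as in the post, the keyword is assumed to contain no single quote.

```php
<?php
// Network-free sketch: an inline HTML string stands in for the page that the
// post downloads with cURL. find_rows() is a hypothetical helper name.
function find_rows(string $html, string $keyword): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // don't print warnings for sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // td[...] tests each cell separately and '.' is that cell's text, so a row
    // matches if ANY of its cells contains the keyword. Only A to Z are
    // lower-cased by translate(). Assumes $keyword has no single quote.
    $query = "//div[@class='entry-content']//tr[td[contains("
           . "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '"
           . strtolower($keyword) . "')]]";

    $rows = [];
    foreach ($xpath->query($query) as $tr) {
        // Report each matching row by the text of its first cell.
        $rows[] = trim($tr->getElementsByTagName('td')->item(0)->textContent);
    }
    return $rows;
}

$html = <<<HTML
<html><body><div class="entry-content"><table>
<tr><td>Coca-Cola</td><td>KOSHER</td></tr>
<tr><td>Pork rinds</td><td>Not kosher</td></tr>
<tr><td>Shrimp</td><td>No</td></tr>
</table></div></body></html>
HTML;

print_r(find_rows($html, 'Kosher'));   // Coca-Cola and Pork rinds
```

The keyword matches here only in the second column. A predicate written as `translate(td, ...)` would miss these rows: XPath 1.0 converts a node-set to the string value of its first node, so only the first `td` of each row would be searched.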
