Based on Michael Knapp's code, with some added regular expressions, here's a function that returns the title and all meta tags for a given URL. If there's an error, it returns false. It relies on getUrlContents(), also included, which takes care of META REFRESH redirections, following up to the specified number of them (there is no limit when $maximumRedirections is left as null). Please note that the regular expressions are split into concatenated strings because php.net was complaining about the lines being too long ;)
<?php
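// Returns array('title' => ..., 'metaTags' => ...) for $url, or false if the page could not be fetched.
function getUrlData($url)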
{
$result = false;
$contents = getUrlContents($url);
if (isset($contents) && is_string($contents))
{
$title = null;
$metaTags = null;
// Allow attributes on <title> and capture everything up to the closing tag.
preg_match('/<title[^>]*>(.*?)<\/title>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) > 1)
{
$title = trim(strip_tags($match[1]));
}
// Only matches <meta name="..." content="..."> with the attributes in that order.
preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) == 3)
{
$originals = $match[0];
$names = $match[1];
$values = $match[2];
if (count($originals) == count($names) && count($names) == count($values))
{
$metaTags = array();
for ($i=0, $limiti=count($names); $i < $limiti; $i++)
{
$metaTags[$names[$i]] = array (
'html' => htmlentities($originals[$i]),
'value' => $values[$i]
);
}
}
}
$result = array (
'title' => $title,
'metaTags' => $metaTags
);
}
return $result;
}
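// Fetches $url and follows META REFRESH redirections, at most $maximumRedirections of them (null means no limit).
// Returns the page contents, or false on failure or when the limit is exceeded.
function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)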
{
$result = false;
$contents = @file_get_contents($url);
if (isset($contents) && is_string($contents))
{
preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
{
if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
{
return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
}
// Redirection limit reached: give up.
$result = false;
}
else
{
$result = $contents;
}
}
return $result;
}
?>
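To see what the META REFRESH pattern actually picks up, here is a minimal sketch that applies the same regular expression to a string instead of a fetched page. The helper name extractMetaRefreshUrl() and the sample markup are made up for illustration; they are not part of the functions above. Note that the pattern expects http-equiv to come before content inside the tag.

```php
<?php
// Hypothetical helper (not part of the code above): returns the META REFRESH
// target URL found in $html, or null if there is none.
function extractMetaRefreshUrl($html)
{
    // Same pattern as in getUrlContents(), split to keep the lines short.
    $pattern = '/<[\s]*meta[\s]*http-equiv="?REFRESH"?'
        . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?'
        . '[\s]*[\/]?[\s]*>/si';
    if (preg_match($pattern, $html, $match)) {
        return trim($match[1]);
    }
    return null;
}

// Made-up sample markup, for illustration only.
$html = '<html><head>'
    . '<meta http-equiv="refresh" content="0; URL=http://example.com/new/" />'
    . '</head></html>';

var_dump(extractMetaRefreshUrl($html));
var_dump(extractMetaRefreshUrl('<p>no redirect here</p>'));
```

The first call should give the target URL and the second null, which is the case where getUrlContents() simply keeps the page it fetched.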
Here's an example of getUrlData() in use. Note that the URL used here has a META REFRESH redirection:
<?php
$result = getUrlData('http://www.marianoiglesias.com.ar/');
echo '<pre>'; print_r($result); echo '</pre>';
?>
For the above code the output would be:
<?php
Array
(
[title] => Mariano Iglesias: El Eternauta
[metaTags] => Array
(
[description] => Array
(
[html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
[value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
)
[DC.title] => Array
(
[html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
[value] => Mariano Iglesias - Weblog
)
[ICBM] => Array
(
[html] => <meta name="ICBM" content="-34.6017, -58.3956" />
[value] => -34.6017, -58.3956
)
[geo.position] => Array
(
[html] => <meta name="geo.position" content="-34.6017;-58.3956" />
[value] => -34.6017;-58.3956
)
[geo.region] => Array
(
[html] => <meta name="geo.region" content="AR-BA">
[value] => AR-BA
)
[geo.placename] => Array
(
[html] => <meta name="geo.placename" content="Buenos Aires">
[value] => Buenos Aires
)
)
)
?>
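Since the result is either false or a nested array, reading a single tag takes a couple of checks. Below is a small sketch of one way to do it; getMetaValue() is a hypothetical helper, not part of the code above, and the array is a trimmed-down copy of the output shown above. Keep in mind that the keys are the tag names exactly as they appear in the page, so the lookup is case-sensitive.

```php
<?php
// Hypothetical helper (not part of the code above): returns the content of one
// meta tag from a getUrlData() result, or $default if the fetch failed or the
// tag is absent.
function getMetaValue($result, $name, $default = null)
{
    if (is_array($result) && isset($result['metaTags'][$name]['value'])) {
        return $result['metaTags'][$name]['value'];
    }
    return $default;
}

// Trimmed-down copy of the structure shown above.
$result = array(
    'title' => 'Mariano Iglesias: El Eternauta',
    'metaTags' => array(
        'geo.region' => array(
            'html' => '&lt;meta name=&quot;geo.region&quot; content=&quot;AR-BA&quot;&gt;',
            'value' => 'AR-BA',
        ),
    ),
);

echo getMetaValue($result, 'geo.region'), "\n";            // tag present
echo getMetaValue($result, 'keywords', '(not set)'), "\n"; // tag missing
echo getMetaValue(false, 'geo.region', '(not set)'), "\n"; // fetch failed
```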