Copyright © 2002 -
MCS Investments, Inc. sitemap


List Urls in XML Sitemap Using XPATH Query and registerNamespace and PHP

This script lists the URLs in an XML sitemap using an XPath query, the registerNamespace method, and PHP. The sample file is a website sitemap that we generated with XML Sitemaps.

The script uses PHP 5 and the PHP DOM extension. The DOM extension is enabled by default in most PHP installations, so the following should work fine; it does for us. The extension lets you operate on XML documents through the DOM API, and it supports XPath 1.0, which this script relies on. XPath has been around a while. What is it? XPath is a syntax for addressing parts of an XML document (or an HTML or XHTML one). It uses path expressions to navigate through documents, and it includes a library of standard functions.

The DOMXPath class has a document property (the DOMDocument it was created from) and several very useful methods. DOMXPath::__construct creates the object. DOMXPath::evaluate evaluates the given XPath expression and returns a typed result if possible, or a DOMNodeList containing all nodes matching the expression. DOMXPath::query evaluates the given XPath expression and returns a DOMNodeList containing all matching nodes. DOMXPath::registerNamespace associates a prefix with a namespace URI; it is necessary when you use XPath on documents that declare a default namespace with xmlns, which in a sitemap happens in the urlset tag. DOMXPath::registerPhpFunctions makes PHP functions callable from XPath expressions. Many XML files (PAD files, for example) seem to have no xmlns declaration and therefore need no namespace registration.
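The difference between query and evaluate is easiest to see side by side. The following is a minimal sketch, not the script from this page: the two-entry sitemap and its example.com URLs are made up and inlined so that the code runs without fetching anything.

```php
<?php
// Hypothetical two-entry sitemap, inlined so the sketch needs no network access.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// query() always hands back a DOMNodeList of matching nodes.
$locNodes = $xpath->query('//m:url/m:loc');

// evaluate() can return a typed result: count() comes back as a float.
$urlCount = $xpath->evaluate('count(//m:url)');

// string() comes back as a PHP string.
$firstLoc = $xpath->evaluate('string(//m:url[1]/m:loc)');

echo $locNodes->length, "\n";
echo $urlCount, "\n";
echo $firstLoc, "\n";
```

If you only need nodes, query is enough. evaluate is handy when the XPath expression itself computes a number or a string.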

We list each page URL along with its last-modified date, and we do it using XPath. The getElementsByTagName() method is arguably more straightforward for listing the URLs in an XML sitemap, since it doesn't even require the registerNamespace method. But we use an XPath query to illustrate how it's done. Also, keep in mind that XPath expressions can select nodes in ways that are awkward or impossible with DOMDocument methods alone. The non-XPath version is at List Urls in XML Sitemap by Tag Name Using XPATH and PHP. The version using an XPath query is below.

First we create a new DOMDocument object, because DOMXPath needs one to work on. We load the XML file into it with the load method.
The $doc->load('http://www.theliquidateher.com/sitemap.xml') code reads the sitemap file's contents into the DOM object. Next we use $xpath = new DOMXPath($doc) to create a DOMXPath object bound to that document. Then we call the registerNamespace() method to register the namespace, because we happen to know about this file's xmlns declaration, which in a sitemap is in the urlset tag. Next we define the $url_info array. It is not strictly needed, but it's a convenient place to store XML document info. Finally we perform an XPath query for the children of all elements with url as the tag, using the namespace prefix that we paired with the namespace URI in our registerNamespace call. The prefix can be whatever you want it to be, but it cannot be omitted.
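To see why the prefix matters, here is a small sketch under stated assumptions: the sitemap is a made-up, inlined one, and the prefixes m and sm are arbitrary choices. It also reads the namespace URI from the root element rather than typing it in, which is one possible approach when you do not know the xmlns value in advance.

```php
<?php
// Hypothetical inlined sitemap with the usual default namespace on urlset.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc><lastmod>2010-01-02</lastmod></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// An unprefixed name means "url in no namespace", so nothing matches.
$unprefixed = $xpath->query('//url');

// Read the namespace URI from the root element (urlset).
$ns = $doc->documentElement->namespaceURI;

// Bind a prefix to that URI, then use the prefix in the expression.
$xpath->registerNamespace('m', $ns);
$prefixed = $xpath->query('//m:url');

// The prefix name itself is arbitrary; only the URI it maps to matters.
$xpath->registerNamespace('sm', $ns);
$other = $xpath->query('//sm:url');

echo $unprefixed->length, " ", $prefixed->length, " ", $other->length, "\n";
```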

Then we use the length of this DOMNodeList in a for loop to walk through the nodes, getting strings we can echo by way of ->item($i)->nodeValue. The result of our query is a DOMNodeList, and we put the node values into the $url_info array. Examine the parameter in the XPath query: //m:url/*. The // means find the url tag anywhere in the XML file. The m is the namespace prefix. The url is the tag we are selecting. The /* means select all child elements of the url tag, which in this case are the loc tag and the lastmod tag. We need strings because echo only outputs strings, and nodeValue returns a node's content as a string.
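The effect of the /* step can be checked on a tiny document. This sketch assumes a made-up inlined sitemap in which the second url entry has no lastmod tag, and it collects each child's tag name next to its value.

```php
<?php
// Hypothetical inlined sitemap; the second entry deliberately lacks lastmod.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// Every child element of every url element, in document order.
$children = $xpath->query('//m:url/*');

$rows = array();
foreach ($children as $node) {
    // nodeName is the tag name, nodeValue is the text content as a string.
    $rows[] = $node->nodeName . ' = ' . $node->nodeValue;
}

echo implode("\n", $rows), "\n";
```

The list comes back flat: nothing in it marks where one url entry ends and the next begins, which is why a missing tag shifts everything after it.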

Note that we check whether the array value starts with http. If so, we echo the counter value and the array element value; otherwise we just echo the array element value. This is a precaution in case a tag is left out, and checking for http solves that problem. Since //m:url/* gets all the children of each url tag, the array normally gets two new elements for every url tag: the node values of loc and lastmod. But if the sitemap omits a tag for some reason, this complicates the display task. See the paragraph between the two PHP scripts, below.
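Checking for http is one way to cope with a missing tag. Another possibility, sketched here as an alternative and not as the method used on this page, is to loop over the url elements themselves and ask for loc and lastmod relative to each one. The inlined sitemap and the "(no lastmod)" placeholder text are assumptions for illustration.

```php
<?php
// Hypothetical inlined sitemap; the second entry has no lastmod tag.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc></url>
  <url><loc>http://example.com/contact.html</loc><lastmod>2010-02-15</lastmod></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$lines = array();
$n = 0;
foreach ($xpath->query('//m:url') as $url) {
    $n++;
    // The second argument is the context node, so these paths are relative to $url.
    $loc = $xpath->evaluate('string(m:loc)', $url);
    $lastmod = $xpath->evaluate('string(m:lastmod)', $url); // '' when the tag is absent
    $lines[] = $n . ' ' . $loc;
    $lines[] = '     ' . ($lastmod !== '' ? $lastmod : '(no lastmod)');
}

echo implode("\n", $lines), "\n";
```

Because each url entry is handled on its own, a missing child cannot shift the numbering of the entries that follow.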

The XPath syntax page needs to explain namespace prefixes better. Right now, they are not even mentioned.

As you will see in List Specified Elements in XML Document by Tag Name Using XPATH and PHP, you can get tags one at a time using getElementsByTagName, but this is useful only if there are a lot of unique tags with few or no children. In the script on this page, there are hundreds of loc tags in the sitemap file, so we loop. In the List Urls in XML Sitemap by Tag Name Using XPATH and PHP script, we loop through the results of the getElementsByTagName() method, which returns a new DOMNodeList containing the elements with a given tag name. These are easy to loop through.

The DOM-only versions that use the getElementsByTagName() method have no need for $xpath = new DOMXPath($doc), because getElementsByTagName() is a DOMDocument method and does not involve XPath. For $xpath->query() calls, however, the DOMXPath object is essential. Note also that we did not need to deal with namespace registration with the getElementsByTagName() method, because no XPath is involved, but we needed it for the XPath version. The registerNamespace method registers the namespace with the DOMXPath object we create in the script below. The query will not find the url elements without it, nor will it find them if we leave the prefix off the XPath query parameter.
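For comparison, a DOM-only listing might look like the sketch below. It is not the script from the other page, just a minimal illustration on a made-up inlined sitemap, showing that no DOMXPath object and no namespace registration are involved.

```php
<?php
// Hypothetical inlined sitemap with the usual default namespace.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc><lastmod>2010-01-02</lastmod></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// getElementsByTagName() matches on the tag name alone and returns a DOMNodeList.
$locs = $doc->getElementsByTagName('loc');

$urls = array();
foreach ($locs as $loc) {
    $urls[] = $loc->nodeValue;
}

echo implode("\n", $urls), "\n";
```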

If an XPath query or a getElementsByTagName() call returns a node set, you get a DOMNodeList that can be looped through to get values. In the non-XPath version in List Specified Elements in XML Document by Tag Name Using XPATH and PHP, we simply skip the loop and get the node values of four different tags found in the file. This is fine if no two tags share a tag name, or if there are few child tags under any one parent tag. But when many elements share the same tag name, as in the script below, looping is essential.

In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes. You can get more information on the syntax to use in XPath expressions in the W3Schools XPath expression page.


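$doc = new DOMDocument("1.0");
// Load the sitemap into the DOM object.
$doc->load('http://www.theliquidateher.com/sitemap.xml');
$xpath = new DOMXPath($doc);
// Map the prefix m to the sitemap namespace declared in the urlset tag.
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$url_info = array();
$n = 0; // counter for the numbered URLs
// Select every child (loc, lastmod) of every url element.
$appNodes = $xpath->query('//m:url/*');
for($i=0;$i<$appNodes->length;$i++) {
$url_info[$i] = $appNodes->item($i)->nodeValue;
// Number the loc values (they start with http); indent everything else.
if(substr($url_info[$i],0,4)=="http"){$n++;echo $n." ".$url_info[$i]."<BR>";}
else{echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;".$url_info[$i]."<BR>";}}
echo "<BR>";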

Below is the script that works perfectly only if the sitemaps made at http://www.xml-sitemaps.com/ are perfect. They very nearly are. But close only counts in horseshoes. We examined the sitemap http://www.theliquidateher.com/sitemap.xml they made for theliquidateher.com and found that it had 300 sets of tags, each set composed (in XML) of a url parent tag with two child tags under it: a loc tag and a lastmod tag. But the 128th set has no lastmod (date the page was last modified) tag. In our opinion, it would make much more sense if they had written an empty lastmod tag, or put in a space or question mark, when they couldn't find last-modified info. The script below falls on its face at the 128th tag set: from there on it puts the numbers in front of the lastmod values rather than in front of the loc (page URL location) values, as it did for the first 127 tag sets. So use the script above, not the script below (point it at your own sitemaps, if you like). The script below is GIGO, meaning it can only do as well as the sitemap it displays. The script above takes this problem into account and makes it a non-issue.

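$doc = new DOMDocument("1.0");
$doc->load('http://www.theliquidateher.com/sitemap.xml');
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$url_info = array();
$appNodes = $xpath->query('//m:url/*');
for($i=0;$i<$appNodes->length;$i++) {
$url_info[$i] = $appNodes->item($i)->nodeValue;
// Assumed flag: 0 at even positions (expected loc), 1 at odd positions (expected lastmod).
// This only holds while every url tag has exactly two children.
$f = $i % 2;
if(!$f){echo (($i/2)+1)." ";}
else{echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;";}
echo $url_info[$i]."<BR>";}
echo "<BR>";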
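An incomplete tag set like the 128th one described above can also be located with XPath itself. The following sketch is an illustration under assumptions, not part of the original scripts: it uses a made-up inlined sitemap and a predicate that selects url elements with no lastmod child, then reports each one's position among the url entries.

```php
<?php
// Hypothetical inlined sitemap; the second entry has no lastmod tag.
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>http://example.com/about.html</loc></url>
  <url><loc>http://example.com/contact.html</loc><lastmod>2010-02-15</lastmod></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$missing = array();
// The predicate [not(m:lastmod)] keeps only url elements without a lastmod child.
foreach ($xpath->query('//m:url[not(m:lastmod)]') as $url) {
    // Position of this url entry = number of url siblings before it, plus one.
    $position = (int) $xpath->evaluate('count(preceding-sibling::m:url) + 1', $url);
    $loc = $xpath->evaluate('string(m:loc)', $url);
    $missing[] = $position . ': ' . $loc;
}

echo implode("\n", $missing), "\n";
```

Run against a real sitemap loaded with $doc->load(), a check like this should list the entries that would throw off a script that expects strict loc and lastmod pairs.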