XPath Intro

XPath is yet another language built around XML. XML is the foundation of most current enterprise applications, both for data representation and storage and for data interchange, whether between systems within the same organization, which is EAI (Enterprise Application Integration), or between heterogeneous or disparate systems, which is B2B (Business to Business). .NET uses XML everywhere: for saving configuration, for project files, and so on. BizTalk Server 2006 is built upon XML, and Office 2007 uses XML to save its files, which is much better than before, with no proprietary format anymore. If you are curious enough, just like me, change the extension of any docx or xlsx file to zip and open it; you will find that it is no more than a collection of XML files storing the formatting, the data, and everything else about the Office 2007 file.
XPath is a language for navigating XML trees. It supports operators, wildcards and, not surprisingly, functions, like every other language! The core of XPath is built upon context. Context is simply where you are located within the tree; the same expression gives different results depending on where in the tree you run it, so it's very important to know your location, the context of the query, to better anticipate your result set. In this post I'll try to give a brief introduction to XPath.

Why do you need XPath?

A very interesting question, and one you should have an answer to. Suppose you have the following XML snippet:

<bookstore specialty="novel">
  <book style="autobiography">
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary </award>
    </author>
    <price>12</price>
  </book>
  <book style="reference">
    <author>
      <first-name>Kareem</first-name>
      <last-name>Shaker</last-name>
      <award>Hopefully getting one soon</award>
    </author>
    <price>29</price>
  </book>
</bookstore>

Say you just want to change the price value. First you need to navigate to that node, and only then can you change it. To navigate to the node you need a Location Path, which is simply your XPath expression. Surprisingly, an XPath expression looks a lot like a URL path; to navigate to the price elements you would write the following expression:

/bookstore/book/price
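To make "navigate first, then change" concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. A few assumptions of mine: the snippet is trimmed down to the price elements, the new price of 15 is arbitrary, and because ElementTree evaluates paths relative to the element you start from, /bookstore/book/price is written as ./book/price from the bookstore root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore snippet (author elements omitted for brevity).
XML = """
<bookstore specialty="novel">
  <book style="autobiography"><price>12</price></book>
  <book style="reference"><price>29</price></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# ElementTree paths are relative to the element they are called on, so
# /bookstore/book/price becomes ./book/price when starting from <bookstore>.
prices = root.findall("./book/price")
print([p.text for p in prices])  # ['12', '29']

# Navigate first, then change: set a new value on the first match.
# The value 15 is arbitrary, chosen only for this illustration.
prices[0].text = "15"
print([p.text for p in root.findall("./book/price")])  # ['15', '29']
```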

The expression itself is simple. Forward slashes separate the steps, just like in a URL, and each piece of text between forward slashes is called a Location Step. That's the simplest expression you may need to write, but the world is more complex than this. Suppose you want only the prices that are greater than 20. You guessed it: you need to write a condition to filter the results, and the expression then looks like this:

/bookstore/book[price>20]/price

See, it's that simple: you open square brackets and write your condition inside. Run this expression against the above snippet and you will get only the second price (29), as the first price (12) is less than 20. Note that the price element named between the brackets is the one under the book element, and this is what's meant by context. If you attached the same condition to the bookstore location step instead, as in /bookstore[price>20], you'd get no matches, because price is not a direct child of bookstore and so is out of context.
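If you want to see the filter and the context rule in action, here is a rough Python sketch. One caveat I should state loudly: the standard library's xml.etree.ElementTree only implements a subset of XPath and has no ">" comparison, so the price>20 condition is emulated with a plain Python filter rather than evaluated as a real predicate. The XML is again a trimmed copy of the snippet.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore snippet (author elements omitted for brevity).
XML = """
<bookstore specialty="novel">
  <book style="autobiography"><price>12</price></book>
  <book style="reference"><price>29</price></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# Emulates /bookstore/book[price>20]/price. The comparison is done in Python
# because ElementTree's XPath subset does not support the ">" operator.
expensive = [
    book.findtext("price")
    for book in root.findall("./book")
    if float(book.findtext("price")) > 20
]
print(expensive)  # ['29']

# Context: <price> is a child of <book>, not of <bookstore>, so asking the
# <bookstore> element directly for a <price> child matches nothing.
print(root.findall("./price"))  # []
```

A full XPath 1.0 engine would accept the predicate as written; the Python filter here is only a stand-in.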

Square brackets with a condition inside are called a Predicate, and inside a predicate you can use the comparison operators you are used to. Square brackets are also used to pass an index into a collection of results. Huh? Yes, you've got it: you can pick results by index. Ain't it nice?
As for functions, you have a bunch available, and a few of them deal with position within the result set, like position() and last(). Try this expression:

/bookstore/book/author[last()]

This gives the last author element of each book in the result, so for the above snippet you get both authors, since each book has exactly one. But what if you want the last author element in the entire document? In that case you need to group the path with parentheses. Let's try it:

(/bookstore/book/author)[last()]

As you can see from the above location path, the parentheses impose precedence: the grouped path is evaluated before the function is applied. So the engine first gets the whole collection of results, and then applies the function between the square brackets to that collection. Use parentheses whenever you need to control the order of evaluation; by default, as in most other languages, an expression is evaluated from left to right.
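The difference between the two last() expressions can be sketched in Python with xml.etree.ElementTree. It supports author[last()] directly, but as far as I know it does not accept the parenthesized form, so in this sketch a Python list index stands in for (/bookstore/book/author)[last()]. The XML is a trimmed copy of the snippet that keeps only the last names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore snippet: one author per book, last names only.
XML = """
<bookstore specialty="novel">
  <book style="autobiography">
    <author><last-name>Bob</last-name></author>
  </book>
  <book style="reference">
    <author><last-name>Shaker</last-name></author>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book/author[last()]: the last <author> inside each <book>.
per_book = root.findall("./book/author[last()]")
print([a.findtext("last-name") for a in per_book])  # ['Bob', 'Shaker']

# (/bookstore/book/author)[last()]: collect every author first, then take
# the last one. Python list indexing stands in for the parenthesized form.
all_authors = root.findall("./book/author")
print(all_authors[-1].findtext("last-name"))  # Shaker
```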
As mentioned before, you can also supply an index to extract a specific element from a collection of results. Indexes start at 1, so to extract the first book element you can write:

/bookstore/book[1]

This extracts the first book element along with all of its children.
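A quick way to try the index form is Python's xml.etree.ElementTree, which supports positional predicates. This is only a sketch on a trimmed copy of the snippet; note that XPath positions start at 1 while Python lists start at 0.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore snippet (author elements omitted for brevity).
XML = """
<bookstore specialty="novel">
  <book style="autobiography"><price>12</price></book>
  <book style="reference"><price>29</price></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# /bookstore/book[1], written relative to the <bookstore> root.
first = root.findall("./book[1]")
print([b.get("style") for b in first])  # ['autobiography']

# XPath position 1 is the same element as Python list index 0.
print(first[0] is root.findall("./book")[0])  # True

# The match is the whole <book> element, children included.
print(first[0].findtext("price"))  # 12
```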

Note:
You can get the full XML file from http://msdn2.microsoft.com/en-us/library/ms256095.aspx. I'll try to cover more advanced topics in future posts; this was just an appetizer to get your feet wet with XPath, and I am sure you got tempted!
