Fully support position() in xquery - by phe

Status : 

 


39
0
Sign in
to vote
ID 383888 Comments
Status Active Workarounds
Type Suggestion Repros 3
Opened 11/25/2008 7:28:57 AM
Access Restriction Public

Description

Right now position() can only be used in predict in xquery. However, in lots of scenarios, the position of the xml data is significant. 

DECLARE @x xml;
SET @x = N'<SelectAnswers xmlns="http://MyNamespace">
<AnswerType Type="selectOne" IsRequired="true">
<SelectAnswer>Once a month</SelectAnswer>
<SelectAnswer>Two or three times a month</SelectAnswer>
<SelectAnswer>Usually every week</SelectAnswer>
<SelectAnswer>More than once a week</SelectAnswer>
</AnswerType>
<AnswerType Type="MultipleChoice" IsRequired="true">
<SelectAnswer>This is my first time</SelectAnswer>
<SelectAnswer>Hardly ever/special occasions only</SelectAnswer>
<SelectAnswer>Less than once a month</SelectAnswer>
</AnswerType>
</SelectAnswers>'

-- Suppose you need to get the postion of AnswerType in the xml as AnswerTypeID  and the position of SelectAnswer in each AnswerType as AnswerID, you can not use position() to achieve it:

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT
	T.answer.value('position()','int') AS AnswerTypeID,
	T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
	T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
	TS.A.value('position()','int') AS AnswerID,
	TS.A.value('.', 'nvarchar(50)') as SelectAnswer
FROM @x.nodes('SelectAnswers/AnswerType') AS T(answer)
	CROSS APPLY T.answer.nodes('SelectAnswer') TS(A)
ORDER BY AnswerTypeID,AnswerID;

--It gives the following error
/*
Msg 2371, Level 16, State 1, Line 58
XQuery [value()]: 'position()' can only be used within a predicate or XPath selector
*/

-- There are two workarounds now. One is to use row_number() function:

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT
	TAnswerTypes.AnswerTypeID,
	TAnswerTypes.AnswerTypeName,
	TAnswerTypes.IsRequired,
	TS.A.value('.', 'nvarchar(50)') as SelectAnswer,
	ROW_NUMBER() OVER(PARTITION BY TAnswerTypes.AnswerTypeID ORDER BY TS.A.value('.', 'nvarchar(50)')) AS AnswerID
FROM (
	SELECT ROW_NUMBER() OVER(ORDER BY T.answer.value('@Type', 'nvarchar(50)')) AS AnswerTypeID,
	T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
	T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
	T.answer.query('.') AS xmlAnswer
	FROM @x.nodes('SelectAnswers/AnswerType') AS T(answer) ) TAnswerTypes
	CROSS APPLY TAnswerTypes.xmlAnswer.nodes('AnswerType/SelectAnswer') TS(A)
ORDER BY AnswerTypeID,AnswerID;

-- However, the AnswerTypeID and AnswerID may be different than the postion of the data in the xml because we need to specify the order by column for row_number function

-- One more work around is to use a number table like master..spt_values (http://www.sqlserverandxml.com/2008/08/tsql-lab-23-retrieving-values-and.html): 

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT 
	TAnswerTypes.AnswerTypeID,
	TAnswerTypes.AnswerTypeName,
	TAnswerTypes.IsRequired,
	TS.A.value('.', 'nvarchar(50)') as SelectAnswer,
	--ROW_NUMBER() OVER(PARTITION BY TAnswerTypes.AnswerTypeID ORDER BY TS.A.value('.', 'nvarchar(50)')) 
	p1.number AS AnswerID
FROM master..spt_values p1 CROSS APPLY (
	SELECT 
	--ROW_NUMBER() OVER(ORDER BY T.answer.value('@Type', 'nvarchar(50)')) 
	p.number AS AnswerTypeID,
	T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
	T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
	T.answer.query('.') AS xmlAnswer
	FROM master..spt_values p CROSS APPLY @x.nodes('SelectAnswers/AnswerType[position()=sql:column("p.number")]') AS T(answer)
	WHERE p.type=N'P') TAnswerTypes
	CROSS APPLY TAnswerTypes.xmlAnswer.nodes('AnswerType/SelectAnswer[position()=sql:column("p1.number")]') TS(A)
WHERE p1.type=N'P'
ORDER BY AnswerTypeID,AnswerID

-- The problem is the workaround will be slow for large xml and not convenient for queries of complex xml documents.

If position() can be used in .value method in the first query, it will be a neat,simple, and quick solution.

Note: Sample data/queires are based on a post in MS forum: http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=4152191&SiteID=1

Sign in to post a comment.
Posted by Bronk on 12/28/2015 at 9:55 PM
If position() is already allowed to be used in the context of a context-dependent predicate, i.e. [position()<=2], that means the position() is already known during execution and it just need to be output, isn't it?
Posted by Benhaha on 2/14/2014 at 2:33 AM
Also works (but not deterministic): row_number() over (partition by 0 order by nullif(0*rand(),0))
Posted by Benhaha on 2/14/2014 at 2:30 AM
Workaround: Use row_number() over (partition by 0 order by nullif(0*b.value('count(.)','int'),0))
Posted by Jacob Sebastian on 2/5/2009 at 10:11 PM
Just wanted to correct the url mentioned above. The resource http://www.sqlserverandxml.com/2008/08/tsql-lab-23-retrieving-values-and.htm is moved to http://blog.sqlserver.me/2008/08/tsql-lab-23-retrieving-values-and.html
Posted by Microsoft on 12/3/2008 at 4:42 PM
Just as a quick add-on...

currently are explaining

select x.value('c', 'int')
from @x.nodes('/a/b') N(x)

as being equivalent in a sense to /a/b/c

however, /a/b/position() based on the definition above would always return 1 for every row. So we would need to change that definition and before we do that, we need to understand the general impact.

Thanks
Michael
Posted by Microsoft on 12/3/2008 at 4:28 PM
Thanks for the feedback. The following shows how we would have to approach this.

The nodes() method returns N rows one for each node selected by the expression inside the nodes() method. Each of that node is being set as the context item for the expressions inside the value/query/exist/nodes methods. The definition of the position() function is

http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#func-position

16.1 fn:position
fn:position() as xs:integer
Summary: Returns the context position from the dynamic context.

where context position is defined as:

[Definition: The context position is the position of the context item within the sequence of items currently being processed.] It changes whenever the context item changes. When the focus is defined, the value of the context position is an integer greater than zero. The context position is returned by the expression fn:position(). When an expression E1/E2 or E1[E2] is evaluated, the context position in the inner focus for an evaluation of E2 is the position of the context item in the sequence obtained by evaluating E1. The position of the first item in a sequence is always 1 (one). The context position is always less than or equal to the context size.

So we would need to define the context position for each of this node and consider each nodes() method to create an inner focus for the generated rows. If we would not do that, we would end up always returning 1 for every node.value('position()', 'int').

We will take a closer look at this in conjunction with some other improvements (like integrating hierarchyID with the nodes() method), based on the urgency of your feedback and resource availability.

Best regards
Michael