CSS foo~:nth-child(2) gives incorrect XPath #707

Open
SimonSapin opened this issue Jun 18, 2012 · 3 comments

Comments

@SimonSapin

Hi,

I’m the maintainer of cssselect, which does pretty much the same thing in Python as Nokogiri does for CSS selectors: it translates them to XPath. It looks like the scrapy/cssselect#12 bug also applies to Nokogiri. Namely, the XPath translation of :nth-child() and similar pseudo-classes is wrong when used after the + or ~ combinator. Here is a test case:

require 'nokogiri'
doc = Nokogiri::XML('<root><child1/><child2/><child3/></root>')
puts doc.css(':nth-child(2)').map { |e| e.name }
puts doc.css('child1 ~ :nth-child(2)').map { |e| e.name }

Expected output: child2 child2. Actual output: child2 child3.

The problem is in the XPath translation of the latter selector: //child1/following-sibling::*[position() = 2 and self::*] gives the element at position 2 when counting from child1, while we want the position among the parent’s children.
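
To make the two ways of counting concrete, here is a minimal sketch in plain Ruby. It does not run any XPath engine; it only models the children of root as an array of names, and both method names are made up for this illustration.

```ruby
# Model the children of <root> as an ordered list of element names.
SIBLINGS = %w[child1 child2 child3].freeze

# What following-sibling::*[position() = n] selects: the n-th node on the
# following-sibling axis, counted from the context element.
def nth_on_following_axis(siblings, context, n)
  start = siblings.index(context)
  siblings[(start + 1)..][n - 1]
end

# What `context ~ :nth-child(n)` means in CSS: a following sibling of the
# context element that is the n-th child of the shared parent.
def following_sibling_nth_child(siblings, context, n)
  start = siblings.index(context)
  n - 1 > start ? siblings[n - 1] : nil
end

puts nth_on_following_axis(SIBLINGS, "child1", 2)       # child3
puts following_sibling_nth_child(SIBLINGS, "child1", 2) # child2
```

The first method reproduces the actual output above (child3), the second the expected one (child2).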

I am not sure it is even possible to correctly translate this selector to XPath: the = XPath operator on node-sets compares the text content of elements, not their identity.

The issue is similar for scrapy/cssselect#4 and Nokogiri’s #394.

@AurelPaulovic

you should not use position() which depends on the context position

instead try

//child1/following-sibling::*[(count(preceding-sibling::*) + 1)=2]

and similarly for Xn

//child1/following-sibling::*[(count(preceding-sibling::*) + 1) mod X = 0]
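
The two forms above could be generated by a small helper. This is only a sketch in plain Ruby: nth_child_predicate is a hypothetical name, not something Nokogiri or cssselect provides, and it is limited to non-negative a and b in :nth-child(an+b). The combined case (both a and b non-zero) is my own extension of the same idea, not taken from this thread.

```ruby
# Hypothetical helper: build an XPath 1.0 predicate for :nth-child(a*n + b)
# that does not depend on the context position.
def nth_child_predicate(a, b)
  raise ArgumentError, "only non-negative a and b are handled" if a.negative? || b.negative?

  # 1-based position among all element children of the parent.
  position = "(count(preceding-sibling::*) + 1)"

  return "#{position} = #{b}" if a.zero?         # :nth-child(b)
  return "#{position} mod #{a} = 0" if b.zero?   # :nth-child(an)

  # Assumption beyond the thread: general a*n + b with a > 0 and b > 0.
  "#{position} >= #{b} and (#{position} - #{b}) mod #{a} = 0"
end

puts "//child1/following-sibling::*[#{nth_child_predicate(0, 2)}]"
# //child1/following-sibling::*[(count(preceding-sibling::*) + 1) = 2]
```

The resulting string is ordinary XPath 1.0, so it should be usable with any XPath 1.0 engine.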

@SimonSapin
Author

Thank you @AurelPaulovic , I think that should work.

Now for a selector h2 ~ div:nth-of-type(2) the XPath expression could be //h2/following-sibling::div[count(preceding-sibling::div)=1].
But for the more general case, h2 ~ *:nth-of-type(2), the XPath would be //h2/following-sibling::*[count(preceding-sibling::*[name(.)=name(…)])=1]. Is there some expression we could put instead of … to refer to the outer scope? Or maybe a way to bind the outer scope to a variable?
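
For the case where the element type is known, the expression can be built mechanically. A minimal sketch in plain Ruby follows; following_sibling_nth_of_type is a hypothetical name, and it deliberately rejects * because that is exactly the case in question.

```ruby
# Hypothetical helper: XPath for `context ~ type:nth-of-type(n)` when `type`
# is a concrete element name.
def following_sibling_nth_of_type(context, type, n)
  raise ArgumentError, "type must be a concrete element name" if type == "*"
  raise ArgumentError, "n must be >= 1" if n < 1

  # The n-th element of a type has exactly n - 1 preceding siblings of that type.
  "//#{context}/following-sibling::#{type}[count(preceding-sibling::#{type}) = #{n - 1}]"
end

puts following_sibling_nth_of_type("h2", "div", 2)
# //h2/following-sibling::div[count(preceding-sibling::div) = 1]
```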

@AurelPaulovic

Sadly, there is no way to do that in XPath 1.0. You can't assign any variables, and there is no way to get to the outer context.
