Cookiejars exposed #6218
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #6218      +/-   ##
==========================================
+ Coverage   88.48%   88.67%   +0.18%
==========================================
  Files         160      161       +1
  Lines       11607    11792     +185
  Branches     1883     1912      +29
==========================================
+ Hits        10271    10457     +186
+ Misses       1009     1007       -2
- Partials      327      328       +1
```
I haven't read the original ticket recently, but why is this feature optional?
This is how it works on the current version of the PR.

script_sample.py

```python
import scrapy
from scrapy.crawler import CrawlerProcess


class Quotes(scrapy.Spider):
    name = "quotes"
    custom_settings = {"DOWNLOAD_DELAY": 1}

    def start_requests(self):
        yield scrapy.Request(url="https://quotes.toscrape.com/login", callback=self.login)

    def login(self, response):
        self.logger.info(self.cookie_jars[None])      # scrapy.http.cookies.CookieJar object
        self.logger.info(self.cookie_jars[None].jar)  # http.cookiejar.CookieJar object
        locale_cookie = self.cookie_jars[None]._cookies["quotes.toscrape.com"]["/"].get("session")
        locale_cookie.value = locale_cookie.value.upper()
        self.logger.info(self.cookie_jars[None].jar)


if __name__ == "__main__":
    p = CrawlerProcess()
    p.crawl(Quotes)
    p.start()
```

log_output (fragment)
I'm slightly hesitant about setting a spider attribute from a middleware, and I wonder if maybe it should be set from a different place or in a different way (e.g. the crawler), but in general I'm fine with the approach. @kmike Any thoughts on the general approach? Should @GeorgeA92 go on with tests and docs?
Hey! My main worry is the obscure API, which we'd need to document & support in the future. It'd require good documentation to explain a line like
It also needs access to a private property (._cookies).
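For context on why that access pattern feels obscure: the underlying stdlib http.cookiejar.CookieJar keeps its cookies in exactly such a nested private dict, keyed domain → path → name. A minimal stdlib-only sketch (the domain and cookie value below are made up for illustration):

```python
# Stdlib-only illustration of the nested layout that ._cookies exposes:
#   jar._cookies[domain][path][name] -> http.cookiejar.Cookie
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, value: str, domain: str, path: str = "/") -> Cookie:
    # Cookie() takes a long positional signature; fill the minimum sensibly.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={}, rfc2109=False,
    )


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc123", "quotes.toscrape.com"))
# The access pattern the PR sample relies on:
print(jar._cookies["quotes.toscrape.com"]["/"]["session"].value)  # abc123
```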
Another option is to update Lines 18 to 30 in 1d11ea3
```
@@ -293,6 +293,52 @@ Here's an example of a log with :setting:`COOKIES_DEBUG` enabled::

     [...]

+.. _cookiejars:
+
+Direct access to cookiejars from spider
```
Maybe this section could go before COOKIES_ENABLED?
I am not sure about this. Obviously COOKIES_ENABLED is more important than this, so I think this section should come after it, as it is placed now.
@GeorgeA92 @kmike @wRAR What about an API like this?

```python
# cookie middleware code
from scrapy import Request
from http.cookies import SimpleCookie

_UNSET = object()


# We define a Cookie class that we can extend in the future, based on the
# Python API for server-side cookie handling.
class Cookie(SimpleCookie):
    pass


# We define functions, in line with the get_retry_request approach, that can be
# used to easily interact with the cookies of a request. We extract the cookie
# jar ID and domain from the request, the user indicates the key and optionally
# the path.
def get_cookie(request: Request, key: str, path="/") -> Cookie:
    ...


def pop_cookie(request: Request, key: str, path="/", default=_UNSET) -> Cookie:
    ...


def set_cookie(request: Request, cookie: Cookie) -> None:
    ...
```
Co-authored-by: Adrián Chaves <[email protected]>
Isn't it like this?

```python
def get_cookie(self, request: Request, key: str, path="/") -> Cookie:
    jar = self.jars[request.meta.get("cookiejar")]
    cookie = jar._cookies[key][path]
    return cookie
```

This looks clean enough and doesn't require users to touch any other APIs, or am I missing something?
Aimed to fix #1878
based on suggestion from #1878 (comment)