SCP45: Unsafe meta copy
What it does
Reports the use of response.meta when
creating a request.
Why is this bad?
response.meta is an alias to
response.request.meta, and includes request
metadata, set by components, that is specific to the corresponding request and
should not be passed on to new requests.
For example, RetryMiddleware uses
meta to keep track of how many times a request has been
retried. If you pass response.meta to a new
request, you will also pass the retry count, which will lower the number of
times that the new request will be retried.
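The effect on the retry count can be sketched without Scrapy, using plain dicts as stand-ins for request metadata. The sketch assumes the retry count is stored under the `retry_times` meta key and that the retry budget is 2 (the default of Scrapy's `RETRY_TIMES` setting). The helper `retries_left` is hypothetical and exists only for illustration.

```python
# Plain-dict stand-in for request.meta; no Scrapy import is needed to run this.
RETRY_TIMES = 2  # assumed retry budget (default of Scrapy's RETRY_TIMES setting)


def retries_left(meta, max_retry_times=RETRY_TIMES):
    """Hypothetical helper: retries still available to a request with this meta."""
    return max(max_retry_times - meta.get("retry_times", 0), 0)


# Meta of a request that was retried once before it produced a response.
response_meta = {"foo": "bar", "retry_times": 1}

copied_meta = dict(response_meta)              # effect of meta=response.meta
explicit_meta = {"foo": response_meta["foo"]}  # only the key the spider owns

print(retries_left(copied_meta))    # the new request inherits a used-up retry
print(retries_left(explicit_meta))  # the new request keeps its full budget
```

The same reasoning applies to other component-managed keys: copying the whole dict carries over state that belongs to the previous request.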
How to fix it?
Options include:
Use cb_kwargs. For example, instead of:

    def parse(self, response):
        return response.follow("/foo", self.parse2, meta={"foo": "bar"})

    def parse2(self, response):
        return response.follow("/bar", self.parse3, meta=response.meta)

    def parse3(self, response):
        foo = response.meta["foo"]

Do:

    def parse(self, response):
        return response.follow("/foo", self.parse2, cb_kwargs={"foo": "bar"})

    def parse2(self, response, foo):
        return response.follow("/bar", self.parse3, cb_kwargs={"foo": foo})

    def parse3(self, response, foo):
        ...

If cb_kwargs feels too verbose, use the scrapy-sticky-meta-params plugin. For example, instead of:

    def parse(self, response):
        return response.follow("/foo", self.parse2, meta={"foo": "bar"})

    def parse2(self, response):
        return response.follow("/bar", self.parse3, meta=response.meta)

    def parse3(self, response):
        foo = response.meta["foo"]

Configure the StickyMetaParamsMiddleware middleware, set sticky_meta_keys = ["foo"] in your spider class, and do:

    def parse(self, response):
        return response.follow("/foo", self.parse2, meta={"foo": "bar"})

    def parse2(self, response):
        return response.follow("/bar", self.parse3)

    def parse3(self, response):
        foo = response.meta["foo"]

Explicitly map the meta keys to pass along.
For example, instead of:

    def parse(self, response):
        return response.follow("/foo", self.parse2, meta={"foo": "bar"})

    def parse2(self, response):
        return response.follow("/bar", self.parse3, meta=response.meta)

    def parse3(self, response):
        foo = response.meta["foo"]

Do:

    def parse(self, response):
        return response.follow("/foo", self.parse2, meta={"foo": "bar"})

    def parse2(self, response):
        return response.follow("/bar", self.parse3, meta={"foo": response.meta["foo"]})

    def parse3(self, response):
        foo = response.meta["foo"]
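
When several keys need to be passed along, the explicit mapping can be factored into a small helper. The following is a minimal sketch, not part of Scrapy or of this rule: `pick_meta` is a hypothetical name, and the sample dict only imitates what a response's meta might contain after components have added their own keys.

```python
def pick_meta(meta, keys):
    """Hypothetical helper: copy only the listed keys that are present in meta."""
    return {key: meta[key] for key in keys if key in meta}


# Stand-in for response.meta: one spider-owned key plus component-managed keys.
response_meta = {
    "foo": "bar",
    "retry_times": 1,
    "depth": 3,
    "download_latency": 0.42,
}

new_meta = pick_meta(response_meta, ["foo"])
print(new_meta)
```

In a spider callback, this would be used as `meta=pick_meta(response.meta, ["foo"])` in place of `meta=response.meta`.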