SCP03: Improper response URL join

What it does

Finds usage of urljoin() that can be replaced with Response.urljoin().

Why is this bad?

Response.urljoin()

  • uses the HTML <base> tag to properly resolve relative URLs,

  • doesn’t require an extra import, and

  • is more readable.

Example

import scrapy
from urllib.parse import urljoin


class MySpider(scrapy.Spider):
    name = "myspider"

    def parse(self, response):
        yield {"homepage": urljoin(response.url, "/")}

Use instead:

import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"

    def parse(self, response):
        yield {"homepage": response.urljoin("/")}