Adjust retry logic #1170

Open
wants to merge 1 commit into base: develop

Conversation

niuxiaozu
Contributor

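This changes how the status code is handled after a successful download: if the status code is not accepted by site.acceptStatCode, the request is treated as a failure and goes through the doCycleRetry retry logic. It also sleeps before the request is added back to the queue.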
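
The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: `StatusRetrySketch`, its fields, and `onDownloaded` are hypothetical stand-ins for the spider's own bookkeeping, and the retry limit and sleep time are assumed parameters.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for the spider's retry bookkeeping; not WebMagic code.
class StatusRetrySketch {
    private final Set<Integer> acceptedCodes;  // plays the role of site.acceptStatCode
    private final int maxCycleRetries;         // assumed limit on re-queue attempts per URL
    private final long retrySleepMillis;       // assumed pause applied before re-queueing
    private final Map<String, Integer> triedTimes = new HashMap<>();
    final Queue<String> queue = new ArrayDeque<>();

    StatusRetrySketch(Set<Integer> acceptedCodes, int maxCycleRetries, long retrySleepMillis) {
        this.acceptedCodes = acceptedCodes;
        this.maxCycleRetries = maxCycleRetries;
        this.retrySleepMillis = retrySleepMillis;
    }

    // Returns true if the page should be processed, false if it was treated as a failure.
    boolean onDownloaded(String url, int statusCode) {
        if (acceptedCodes.contains(statusCode)) {
            return true;
        }
        // Unaccepted status code: count the attempt and re-queue while under the limit.
        int tried = triedTimes.merge(url, 1, Integer::sum);
        if (tried <= maxCycleRetries) {
            try {
                Thread.sleep(retrySleepMillis); // sleep first, then add back to the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            queue.add(url);
        }
        return false;
    }
}
```

With a limit of 2, a URL that keeps returning an unaccepted code is re-queued twice and then dropped.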

@sutra
Collaborator

sutra commented Jun 17, 2024

The original design only calls doCycleRetry when an exception occurs.

@niuxiaozu
Contributor Author

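The main issue is that there is no way to handle status code errors. Previously I subclassed HttpClientDownloader and, when the status code was wrong, set page.downloadSuccess to false inside it, but that never felt quite right, so I later went with this approach instead. If you feel this does not match the design, never mind, but I do think there needs to be retry logic for when the status code is wrong.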
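
For readers unfamiliar with the subclassing workaround mentioned above, the idea looks roughly like this. The sketch does not use the real WebMagic classes: `SketchPage`, `SketchDownloader`, and `StatusCheckingDownloader` are hypothetical stand-ins, and the demo downloader derives its status code from the URL purely for illustration.

```java
import java.util.Set;

// Hypothetical minimal stand-ins; the real page and downloader classes differ in detail.
class SketchPage {
    final String url;
    final int statusCode;
    boolean downloadSuccess = true;

    SketchPage(String url, int statusCode) {
        this.url = url;
        this.statusCode = statusCode;
    }
}

class SketchDownloader {
    // Pretends to fetch a page; the status code is derived from the URL for the demo.
    SketchPage download(String url) {
        int code = url.endsWith("/missing") ? 404 : 200;
        return new SketchPage(url, code);
    }
}

// Subclass that downgrades a "successful" download when the status code is not accepted,
// so that the caller's existing failure path (and its retry) takes over.
class StatusCheckingDownloader extends SketchDownloader {
    private final Set<Integer> acceptedCodes;

    StatusCheckingDownloader(Set<Integer> acceptedCodes) {
        this.acceptedCodes = acceptedCodes;
    }

    @Override
    SketchPage download(String url) {
        SketchPage page = super.download(url);
        if (!acceptedCodes.contains(page.statusCode)) {
            page.downloadSuccess = false;
        }
        return page;
    }
}
```

The trade-off is that the status check lives in the downloader rather than in the spider, which is presumably why it "never felt quite right".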

@sutra
Collaborator

sutra commented Jun 17, 2024

I recommend implementing this with SpiderListener; even the built-in retries could all be unified under that mechanism.

@niuxiaozu
Contributor Author

That would be a major change. I was only making a small modification on top of the existing mechanism 😅

@sutra
Collaborator

sutra commented Jun 17, 2024 via email

@niuxiaozu
Contributor Author

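That doesn't seem to work, because onError is only called when an exception is thrown. When the status code is wrong, no exception is thrown; the code only writes a log entry: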

private void onDownloadSuccess(Request request, Page page) {
    if (site.getAcceptStatCode().contains(page.getStatusCode())) {
        pageProcessor.process(page);
        extractAndAddRequests(page, spawnUrl);
        if (!page.getResultItems().isSkip()) {
            for (Pipeline pipeline : pipelines) {
                pipeline.process(page.getResultItems(), this);
            }
        }
    } else {
        logger.info("page status code error, page {} , code: {}", request.getUrl(), page.getStatusCode());
    }
    sleep(site.getSleepTime());
    return;
}

@sutra
Collaborator

sutra commented Jun 17, 2024

Try implementing us.codecraft.webmagic.SpiderListener.onSuccess(Request) and checking the status code in there.
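
A rough sketch of the listener idea, under stated assumptions. The types below are hypothetical stand-ins rather than the real `SpiderListener` and `Request`, and the sketch assumes the status code has already been attached to the request under a made-up `statusCode` extra; how the code would reach the listener in practice is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a request that carries extras.
class SketchRequest {
    final String url;
    final Map<String, Object> extras = new HashMap<>();

    SketchRequest(String url) {
        this.url = url;
    }
}

// Hypothetical listener shape with a success/error callback pair.
interface SketchListener {
    void onSuccess(SketchRequest request);

    void onError(SketchRequest request, Exception e);
}

// Collects requests that should be retried, whether they failed with an exception
// or "succeeded" with a status code outside the accepted set.
class StatusCodeRetryListener implements SketchListener {
    private final Set<Integer> acceptedCodes;
    final List<SketchRequest> retryQueue = new ArrayList<>();

    StatusCodeRetryListener(Set<Integer> acceptedCodes) {
        this.acceptedCodes = acceptedCodes;
    }

    @Override
    public void onSuccess(SketchRequest request) {
        // Assumption: the status code was stored on the request before this callback runs.
        Object code = request.extras.get("statusCode");
        if (code instanceof Integer && !acceptedCodes.contains((Integer) code)) {
            retryQueue.add(request);
        }
    }

    @Override
    public void onError(SketchRequest request, Exception e) {
        retryQueue.add(request);
    }
}
```

Putting both paths in one listener is what would allow the built-in exception retries and the status code retries to share a single mechanism, as suggested earlier in the thread.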

@niuxiaozu
Contributor Author

That works. I'll give it a try first, thanks.


2 participants