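We have recently received a lot of user feedback that websites are often very slow or cannot be opened at all, with very high CPU usage and high overall server load. When our technical staff analyzed the site logs, they found many unidentified spiders constantly crawling customers' sites. In our experience, in the big-data era all kinds of crawlers (security scanners, public-opinion monitoring, AI large-model training, and so on) scan and harvest website data continuously, and this kind of traffic accounts for more than 99% of the sites' total traffic. That is where the problem lies, so we need to block unnecessary spiders to reduce the load on the site. Below we show how to block some uncommon spiders through web.config.
The setup is simple. Log in to your server, open web.config with Notepad or a similar tool, find the rewrite node, and add the rule named "Block Non-Mainstream Crawlers" (together with its leading comment) as shown in the following configuration: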
<system.webServer>
<modules runAllManagedModulesForAllRequests="true">
<add type="Kesion.APPCode.HttpModule" name="HttpModule" />
<remove name="Session" />
<add name="Session" type="System.Web.SessionState.SessionStateModule" />
</modules>
<security>
<requestFiltering>
<requestLimits maxAllowedContentLength="262144000" />
</requestFiltering>
</security>
<defaultDocument>
<files>
<clear />
<add value="index.aspx" />
<add value="index.html" />
</files>
</defaultDocument>
<handlers>
<add name="html" path="*.html" verb="*" type="System.Web.UI.PageHandlerFactory" />
<add name="all" path="*" verb="*" modules="IsapiModule" scriptProcessor="%windir%\Microsoft.NET\Framework\v4.0.30319\aspnet_isapi.dll" resourceType="Unspecified" requireAccess="None" preCondition="classicMode,runtimeVersionv2.0,bitness32" />
<remove name="ExtensionlessUrlHandler-Integrated-4.0" />
<remove name="OPTIONSVerbHandler" />
<remove name="TRACEVerbHandler" />
<add name="ExtensionlessUrlHandler-Integrated-4.0" path="*." verb="*" type="System.Web.Handlers.TransferRequestHandler" preCondition="integratedMode,runtimeVersionv4.0" />
</handlers>
<staticContent>
<!-- <mimeMap fileExtension=".mp4" mimeType="application/octet-stream" />
<mimeMap fileExtension=".woff" mimeType="application/x-woff" />
<mimeMap fileExtension="." mimeType="application/octet-stream" />
-->
<mimeMap fileExtension=".vue" mimeType="text/html" />
</staticContent>
<directoryBrowse enabled="false" />
<httpErrors errorMode="Custom">
<remove statusCode="404" />
<error statusCode="404" prefixLanguageFilePath="" path="/index.aspx?c=Go404" responseMode="ExecuteURL" />
</httpErrors>
<rewrite>
<rules>
<!-- Block non-mainstream crawlers -->
<rule name="Block Non-Mainstream Crawlers" stopProcessing="true">
<match url=".*" />
<conditions>
<add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
ignoreCase="true" />
<add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider" negate="true" />
</conditions>
<action type="AbortRequest" />
</rule>
<rule name="https" stopProcessing="true">
<match url="(.*)" />
<conditions>
<add input="{HTTPS}" pattern="^OFF$" />
<add input="{PATH_INFO}" pattern="^websystem" />
</conditions>
<action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Temporary" />
</rule>
</rules>
</rewrite>
</system.webServer>
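Notes on the rule added above:
The following condition blocks a set of unidentified spiders by default. To block other spiders, append their names to the pattern in the same way: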
<add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
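The second condition lets the mainstream spiders through (Google, Bing, Sogou, 360, and Baidu). Because all of a rule's conditions must match by default, a request is aborted only when its User-Agent matches the block list and does not match this allow list, so Baiduspider and 360Spider are not caught by the generic "spider" keyword: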
<add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider" negate="true" />
With the rewrite rule above in place, most spider crawling is blocked, which effectively improves the stability of the site.
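Before deploying the rule, it can help to check offline how the two conditions combine. The following is a minimal sketch in Python, not part of the IIS configuration itself. It assumes that Python's `re` module behaves like the IIS regular-expression engine for these simple alternations, that the conditions are combined with the default "match all" grouping, and that matching is case-insensitive. The function name `is_blocked` and the sample User-Agent strings are made up for illustration.

```python
import re

# Patterns copied verbatim from the two <add> conditions of the rule.
BLOCK_PATTERN = (
    r"spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|"
    r"semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|"
    r"Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|"
    r"EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|"
    r"WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|"
    r"TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
)
ALLOW_PATTERN = r"Googlebot|Bingbot|Sogou|360Spider|Baiduspider"

# Assumption: both conditions are evaluated case-insensitively.
BLOCK_RE = re.compile(BLOCK_PATTERN, re.IGNORECASE)
ALLOW_RE = re.compile(ALLOW_PATTERN, re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True when the block list matches AND the allow list does not."""
    matches_block_list = BLOCK_RE.search(user_agent) is not None
    matches_allow_list = ALLOW_RE.search(user_agent) is not None
    return matches_block_list and not matches_allow_list


if __name__ == "__main__":
    # Hypothetical sample User-Agent strings, including an empty one.
    samples = [
        "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "curl/8.4.0",
        "",
    ]
    for ua in samples:
        print(repr(ua), "->", "blocked" if is_blocked(ua) else "allowed")
```

Note that the check relies only on the User-Agent header, which a client can set to any value, so a crawler that claims to be Baiduspider would pass both the sketch and the real rule.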
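Of course, where conditions permit, we strongly recommend enabling a WAF for your site, for example by purchasing Alibaba Cloud's website WAF security product and configuring the appropriate filtering rules.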
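For reference, the names of the major spiders:
Google spider: googlebot
Baidu spider: baiduspider
Baidu mobile spider: baiduboxapp
Yahoo spider: slurp
Alexa spider: ia_archiver
MSN spider: msnbot
Bing spider: bingbot
AltaVista spider: scooter
Lycos spider: lycos_spider_(t-rex)
AllTheWeb spider: fast-webcrawler
Inktomi spider: slurp
Youdao spider: YodaoBot and OutfoxBot
Retu (热土) spider: Adminrtspider
Sogou spider: sogou spider
SOSO spider: sosospider
360 Search spider: 360spider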
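To decide which names to add to the block pattern, it helps to see which user agents actually dominate your own logs. The sketch below tallies user agents from log lines in the W3C extended format that IIS writes. It assumes the log includes a `#Fields:` header containing `cs(User-Agent)`, which depends on how logging is configured on your server. The function name `count_user_agents` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_user_agents(lines: Iterable[str]) -> Counter:
    """Count requests per user agent in W3C extended log lines."""
    counts: Counter = Counter()
    ua_index = None  # column of cs(User-Agent), taken from the #Fields header
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            ua_index = (
                fields.index("cs(User-Agent)")
                if "cs(User-Agent)" in fields
                else None
            )
            continue
        # Skip blank lines, other directives, and data seen before any header.
        if not line or line.startswith("#") or ua_index is None:
            continue
        parts = line.split()
        if ua_index < len(parts):
            # The token is kept as logged; spaces appear as '+' in this format.
            counts[parts[ua_index]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass an open log file instead.
    sample = [
        "#Fields: date time cs-method cs-uri-stem cs(User-Agent) sc-status",
        "2024-01-01 00:00:01 GET /index.aspx Mozilla/5.0+(compatible;+AhrefsBot/7.0) 200",
        "2024-01-01 00:00:02 GET /index.aspx Mozilla/5.0+(compatible;+AhrefsBot/7.0) 200",
        "2024-01-01 00:00:03 GET /index.aspx curl/8.4.0 200",
    ]
    for agent, hits in count_user_agents(sample).most_common(10):
        print(hits, agent)
```

Agents that appear near the top of this list and are not search engines you care about are candidates for the block pattern.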