
Blocking Junk Spider User Agents in IIS to Reduce Server Load

2022/7/15 14:56:02

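Recently we have received many reports from users whose sites are often very slow or fail to load at all, with very high CPU usage and high overall server load. When our engineers analyzed the site logs, they found many unknown spiders continuously crawling the customers' sites. In our experience, all kinds of data crawlers (security scanners, public-opinion monitoring, AI model training, and so on) now scan and harvest website data around the clock, and on affected sites this traffic can account for more than 99% of total traffic. That is where the problem lies, so unnecessary spiders should be blocked to reduce the load on the site.

Below we show how to block uncommon spiders through web.config. The setup is simple: log in to your server, open web.config in a text editor such as Notepad, find the rewrite node shown below, and add the rule named "Block Non-Mainstream Crawlers". The <rewrite> section requires the IIS URL Rewrite module to be installed on the server: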

<system.webServer>
  <modules runAllManagedModulesForAllRequests="true">
    <add type="Kesion.APPCode.HttpModule" name="HttpModule" />
    <remove name="Session" />
    <add name="Session" type="System.Web.SessionState.SessionStateModule" />
  </modules>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="262144000" />
    </requestFiltering>
  </security>
  <defaultDocument>
    <files>
      <clear />
      <add value="index.aspx" />
      <add value="index.html" />
    </files>
  </defaultDocument>
  <handlers>
    <add name="html" path="*.html" verb="*" type="System.Web.UI.PageHandlerFactory" />
    <add name="all" path="*" verb="*" modules="IsapiModule" scriptProcessor="%windir%\Microsoft.NET\Framework\v4.0.30319\aspnet_isapi.dll" resourceType="Unspecified" requireAccess="None" preCondition="classicMode,runtimeVersionv2.0,bitness32" />
    <remove name="ExtensionlessUrlHandler-Integrated-4.0" />
    <remove name="OPTIONSVerbHandler" />
    <remove name="TRACEVerbHandler" />
    <add name="ExtensionlessUrlHandler-Integrated-4.0" path="*." verb="*" type="System.Web.Handlers.TransferRequestHandler" preCondition="integratedMode,runtimeVersionv4.0" />
  </handlers>
  <staticContent>
    <!-- <mimeMap fileExtension=".mp4" mimeType="application/octet-stream" />
    <mimeMap fileExtension=".woff" mimeType="application/x-woff" />
    <mimeMap fileExtension="." mimeType="application/octet-stream" />
    -->
    <mimeMap fileExtension=".vue" mimeType="text/html" />
  </staticContent>
  <directoryBrowse enabled="false" />
  <httpErrors errorMode="Custom">
    <remove statusCode="404" />
    <error statusCode="404" prefixLanguageFilePath="" path="/index.aspx?c=Go404" responseMode="ExecuteURL" />
  </httpErrors>
  <rewrite>
    <rules>
      <!-- Block non-mainstream crawlers -->
      <rule name="Block Non-Mainstream Crawlers" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
          <add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider" negate="true" />
        </conditions>
        <action type="AbortRequest" />
      </rule>
      <rule name="https" stopProcessing="true">
        <match url="(.*)" />
        <conditions>
          <add input="{HTTPS}" pattern="^OFF$" />
          <add input="{PATH_INFO}" pattern="^websystem" />
        </conditions>
        <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Temporary" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
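The listing above shows only the <system.webServer> section. In a complete web.config that section sits inside the <configuration> root element. If your web.config has no <rewrite> node yet, the following minimal sketch shows where the rule would go. It assumes the IIS URL Rewrite module is installed and that the site has no other rewrite rules. The user-agent pattern is deliberately shortened here for readability; in practice, use the full pattern from the configuration above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Block non-mainstream crawlers; evaluated first, stops further rule processing on a match -->
        <rule name="Block Non-Mainstream Crawlers" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <!-- Shortened block pattern for illustration only; use the full list from the article -->
            <add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|python|Scrapy|Wget|^$" ignoreCase="true" />
            <!-- Mainstream search-engine spiders are exempted -->
            <add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider" negate="true" />
          </conditions>
          <!-- Drop the connection without sending a response -->
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

If the site already has other rules, such as the "https" redirect above, put the blocking rule first inside <rules>. That way unwanted crawlers are rejected before any other rule runs.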


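How the added rule works:

The first condition blocks a default set of unknown spiders. To block other spiders, add their user-agent keywords to this pattern, separated by a vertical bar (|). The final ^$ alternative also blocks requests that send an empty User-Agent: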

<add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
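As an illustration, here is the same condition with two more keywords appended just before the final ^$ alternative. DotBot and PetalBot are chosen only as examples. Replace them with the user-agent keywords you actually find in your own IIS logs.

```xml
<!-- Example only: DotBot and PetalBot appended to the block pattern; substitute the keywords seen in your logs -->
<add input="{HTTP_USER_AGENT}" pattern="spider|scanner|curl|MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|DotBot|PetalBot|^$" ignoreCase="true" />
```

Each keyword is matched as a substring of the User-Agent header, so a short keyword such as "spider" already covers every UA that contains it.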

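This second condition lets the mainstream search-engine spiders through (Google, Bing, Sogou, 360 and Baidu). Both conditions must match for the rule to fire, so a request is blocked only if its user agent matches the block pattern and does not match this allow pattern: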

<add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider" negate="true" />
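The generic keyword "spider" in the block pattern also matches crawlers such as sosospider or Adminrtspider from the list at the end of this article. Any such crawler you want to keep must therefore be added to the allow condition. This is a sketch with sosospider added purely as an example. It also sets ignoreCase explicitly, so the match does not depend on the capitalization the crawler uses.

```xml
<!-- Allow condition with sosospider added as an example. -->
<!-- negate="true": the condition holds only when the UA does NOT match any of these names. -->
<add input="{HTTP_USER_AGENT}" pattern="Googlebot|Bingbot|Sogou|360Spider|Baiduspider|sosospider" ignoreCase="true" negate="true" />
```

The conditions of a rule are combined with AND by default (logicalGrouping="MatchAll" on <conditions>). Adding a name here therefore exempts that crawler even though it still matches the block pattern.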


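With the rewrite rule above in place, most unwanted spider traffic is blocked, which effectively improves the stability of the site. Matching requests are handled by the AbortRequest action, which drops the connection without sending a response, so they never reach the site's pages.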


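Of course, where conditions allow, we strongly recommend enabling a WAF (web application firewall) for your site. For example, you could purchase Alibaba Cloud's WAF product and configure the appropriate filtering rules.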



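For reference, here are the user-agent names of the major spiders:

Google spider: googlebot
Baidu spider: baiduspider
Baidu mobile spider: baiduboxapp
Yahoo spider: slurp
Alexa spider: ia_archiver
MSN spider: msnbot
Bing spider: bingbot
AltaVista spider: scooter
Lycos spider: lycos_spider_(t-rex)
AlltheWeb spider: fast-webcrawler
Inktomi spider: slurp
Youdao spider: YodaoBot and OutfoxBot
Retu (热土) spider: Adminrtspider
Sogou spider: sogou spider
SOSO spider: sosospider
360 Search spider: 360spider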

