Chapter 16: Gathering data on mobile phones #20

@we0530

Hello @coderLMN,

library(stringr)  # str_c
library(RCurl)    # getURL
library(XML)      # htmlParse
url <- str_c(baseURL, keyword)
firstSearchPage <- getURL(url, encoding = "UTF-8")
parsedFirstSearchPage <- htmlParse(firstSearchPage, encoding = "UTF-8")

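The HTML this returns does not match what the page shows in the browser: parts of it are missing and some characters are garbled. What causes this?

Here is the tail of the output I get: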

   <div class="a-divider a-divider-section"><div class="a-divider-inner"></div></div>

        <div class="a-text-center a-spacing-small a-size-mini">
            <a href="https://www.amazon.com/gp/help/customer/display.html/ref=footer_cou?ie=UTF8&amp;nodeId=508088">Conditions of Use</a>
            <span class="a-letter-space"></span>
            <span class="a-letter-space"></span>
            <span class="a-letter-space"></span>
            <span class="a-letter-space"></span>
            <a href="https://www.amazon.com/gp/help/customer/display.html/ref=footer_privacy?ie=UTF8&amp;nodeId=468496">Privacy Policy</a>
        </div>
        <div class="a-text-center a-size-mini a-color-secondary">
          © 1996-2014, Amazon.com, Inc. or its affiliates
          <script>
           if (true === true) {
             document.write('<img src="https://fls-na.amaz'+'on.com/'+'1/oc-csi/1/OP/requestId=WHMJ70XYD6ZR2TSDJGT9&js=1" />');
           };
          </script><noscript>
            <img src="https://fls-na.amazon.com/1/oc-csi/1/OP/requestId=WHMJ70XYD6ZR2TSDJGT9&amp;js=0">
</noscript>
        </div>
    </div>
    <script>
    if (true === true) {
        var elem = document.createElement("script");
        elem.src = "https://images-na.ssl-images-amazon.com/images/G/01/csminstrumentation/csm-captcha-instrumentation.min._V" + (+ new Date()) + "_.js";
        document.getElementsByTagName('head')[0].appendChild(elem);
    }
    </script>
</body>
</html>
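
One thing worth checking in the output above: the last script loads `csm-captcha-instrumentation.min`, which may mean the server returned a robot-check page rather than the search results. That reading is an assumption based only on the script name, not something confirmed in this thread. A minimal sketch of how to test for it before parsing, using a hypothetical helper `looks_like_captcha_page()` that is not part of the book's code or of any package:

```r
# Hypothetical helper (not from the book or any package): returns TRUE when
# the fetched HTML mentions "captcha" anywhere, which hints at a robot-check
# page instead of the real search results.
looks_like_captcha_page <- function(html) {
  # ignore.case = TRUE so "Captcha" and "CAPTCHA" also match;
  # any() covers the case where html is a vector of several strings.
  any(grepl("captcha", html, ignore.case = TRUE))
}

# A fragment copied from the output above
sample_html <- 'elem.src = "https://images-na.ssl-images-amazon.com/images/G/01/csminstrumentation/csm-captcha-instrumentation.min._V"'

looks_like_captcha_page(sample_html)                   # TRUE
looks_like_captcha_page("<html><body></body></html>")  # FALSE
```

Calling `looks_like_captcha_page(firstSearchPage)` right after `getURL()` would show whether the content handed to `htmlParse()` is the search page at all. If it returns `TRUE`, the missing content is probably not a parsing or encoding problem on the R side.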
