Teacher, my scraper for the 365淘房 (house365) website is failing; could you please take a look? #14

@dayushan

Description

.libPaths("D:/R/library")
library(RCurl)
library(bitops)
library(XML)
library(stringr)
library(plyr)
library(rvest)
## The following error is raised when i is 2010 or 2011:
## Error in eval(substitute(expr), envir, enclos) :
##   input conversion failed due to input error, bytes 0xA9 0x4F 0xC6 0xF0 [6003]
for (i in 2012:2014) {
	for (j in 1:12) {
		# Listing page dated the 11th of the month
		mac_url <- paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=", i, "-", j, "-11/", sep = "")
		# Equivalent: paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=", paste(i, j, 11, sep = "-"), "/", sep = "")
		url <- getHTMLLinks(mac_url)[4]
		# If the 4th link is only a JavaScript placeholder, try the 21st instead
		if (url == "javascript:void(0);") {
			mac_url <- paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=", i, "-", j, "-21/", sep = "")
			# Equivalent: paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=", paste(i, j, 21, sep = "-"), "/", sep = "")
			url <- getHTMLLinks(mac_url)[4]
		}
		# wp <- getURL(url, .encoding = "gb2312")  # use the page's own encoding
		# wp2 <- iconv(wp, "gb2312", "UTF-8")      # convert to UTF-8
		# Encoding(wp2)                            # UTF-8
		# doc <- htmlParse(wp2, asText = TRUE, encoding = "UTF-8")
		web <- read_html(url, encoding = "gb2312")
		# .......... (code omitted here) ..........
	}
}

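The code above is what I am running, but it fails when scraping the market data for April 2012, with the following error:
## Error in eval(substitute(expr), envir, enclos) :
##   input conversion failed due to input error, bytes 0xA9 0x4F 0xC6 0xF0 [6003]

Teacher Wu, could you please take a look?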
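
A possible explanation, offered as a guess rather than a confirmed diagnosis: in the reported byte pair 0xA9 0x4F, the second byte is below 0xA1, which is outside the GB2312 range but inside the GBK range. The page may therefore contain GBK characters even though it is read as gb2312. The simplest thing to try is passing encoding = "GBK" or "GB18030" to read_html. The sketch below shows a more tolerant alternative: decode the raw bytes with base R's iconv and substitute anything that still cannot be converted. The helper name decode_gb is hypothetical, and whether "GB18030" is available depends on the platform's iconv.

```r
# Minimal sketch, not a verified fix for this site.
# decode_gb is a hypothetical helper; it is not part of rvest, xml2 or RCurl.
decode_gb <- function(raw_bytes, from = "GB18030") {
	# Interpret the raw bytes as one string in the source encoding
	txt <- rawToChar(raw_bytes)
	# sub = "?" replaces unconvertible bytes instead of returning NA
	iconv(txt, from = from, to = "UTF-8", sub = "?")
}

# Small offline check: the GB2312/GBK bytes for "南京"
sample_bytes <- as.raw(c(0xC4, 0xCF, 0xBE, 0xA9))
decoded <- decode_gb(sample_bytes)

# Assumed usage inside the loop (needs network access, so left commented out):
# page_bytes <- RCurl::getBinaryURL(url)
# web <- rvest::read_html(decode_gb(page_bytes), encoding = "UTF-8")
```

Decoding first and then parsing as UTF-8 keeps a single bad character from aborting the whole page, at the cost of a "?" wherever a byte sequence could not be converted.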
