Web page scraping question

Date: 2011-12-09

Source: Internet

HTML code

<form action="http://fccx.runsky.com/app/chushouSearch.php"  name="SaleSearchForm" method="post">
  <select name="paixu">
    <option value="">Select a sort order</option>
    <option value="1" >Sort by date</option>
    <option value="2" >Price, high to low</option>
    <option value="3" >Price, low to high</option>
    <option value="4" >Area, high to low</option>
    <option value="5" >Area, low to high</option>
  </select>
  <input type='hidden' name='houseFlag' value='0'/>
  <input type='hidden' name='randid' value='20.0.0.18013234048941874285921'/>
  <input name='page' value='30'/>
  <input type="submit" value="Submit" />
</form>


VBScript code

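<%
Dim PostData, RefererUrl, HTTPReq
PostData = "paixu=1&houseFlag=0&randid=20.0.0.18013234048941874285921&page=3"
' RefererUrl was never assigned in the original code.
' Using the form's own URL here is an assumption.
RefererUrl = "http://fccx.runsky.com/app/chushouSearch.php"

' ServerXMLHTTP is the MSXML component intended for server-side code such as ASP.
Set HTTPReq = Server.CreateObject("Msxml2.ServerXMLHTTP")
HTTPReq.Open "POST", "http://fccx.runsky.com/app/chushouSearch.php", False
' Content-Length is filled in by the component, so it is not set by hand.
HTTPReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
HTTPReq.setRequestHeader "Referer", RefererUrl
HTTPReq.Send PostData

Response.Write Bytes2bStr(HTTPReq.responseBody)
Set HTTPReq = Nothing

' Convert the raw response bytes into a string.
Function Bytes2bStr(vin)
  Dim BytesStream, StringReturn
  Set BytesStream = Server.CreateObject("ADODB.Stream")
  BytesStream.Type = 1            ' adTypeBinary: responseBody is a byte array
  BytesStream.Open
  BytesStream.Write vin           ' Write (not WriteText) accepts binary data
  BytesStream.Position = 0
  BytesStream.Type = 2            ' adTypeText: reread the same bytes as text
  BytesStream.Charset = "utf-8"   ' must match the encoding of the remote page
  StringReturn = BytesStream.ReadText
  BytesStream.Close
  Set BytesStream = Nothing
  Bytes2bStr = StringReturn
End Function
%>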



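Submitting the HTML form works, but the ASP version fails. How can I fix this? I want to use ASP to fetch the different result pages from the remote site.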
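Since the goal is to fetch several result pages, one possible first step is to build the POST body from a page number instead of hard-coding it. The sketch below is only an illustration: `BuildPostData` is a hypothetical helper, the field names are copied from the form above, and it assumes the site still accepts the same `randid` value for every page, which has not been verified.

```vbscript
<%
' Hypothetical helper (not from the original post): build the form body
' for one result page. Field names come from the HTML form above.
Function BuildPostData(sortOrder, randId, pageNo)
  BuildPostData = "paixu=" & Server.URLEncode(sortOrder) & _
                  "&houseFlag=0" & _
                  "&randid=" & Server.URLEncode(randId) & _
                  "&page=" & CLng(pageNo)
End Function

' Show the body that would be sent for pages 1 to 3.
Dim pageNo
For pageNo = 1 To 3
  Response.Write Server.HTMLEncode( _
    BuildPostData("1", "20.0.0.18013234048941874285921", pageNo)) & "<br>"
Next
%>
```

Each generated string can then be passed to `HTTPReq.Send` in place of the fixed `PostData` value.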

Author: notended   Posted: 2011-12-09

HTTPReq.Open "POST", "http://fccx.runsky.com/app/chushouSearch.php?", False
I have tried passing True, switching to GET, and removing the trailing ?. None of these work.
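The post does not say what the error actually is, so it may help to look at what the remote server returns before changing the request any further. The following is a rough diagnostic sketch, assuming `Msxml2.ServerXMLHTTP` is installed on the server; the variable names `req` and `formBody` are made up for this example.

```vbscript
<%
' Diagnostic sketch: print the HTTP status and response headers
' instead of the page body.
Dim req, formBody
formBody = "paixu=1&houseFlag=0&randid=20.0.0.18013234048941874285921&page=3"

Set req = Server.CreateObject("Msxml2.ServerXMLHTTP")
req.Open "POST", "http://fccx.runsky.com/app/chushouSearch.php", False
req.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

On Error Resume Next
req.Send formBody
If Err.Number <> 0 Then
  ' Network-level failure (DNS, timeout, refused connection, ...)
  Response.Write "Request failed: " & Server.HTMLEncode(Err.Description)
Else
  Response.Write "Status: " & req.status & " " & _
                 Server.HTMLEncode(req.statusText) & "<br>"
  Response.Write "<pre>" & Server.HTMLEncode(req.getAllResponseHeaders()) & "</pre>"
End If
On Error GoTo 0
Set req = Nothing
%>
```

If the status is 200, the request itself is probably fine and the problem is more likely in how the response bytes are converted to text. The Content-Type response header, when it names a charset, indicates which encoding to use for that conversion.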

Author: notended   Posted: 2011-12-09

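Scraping a web page: I wanted a weather forecast that updates in real time, so I used the XMLHTTP component to fetch one specific part of a page.
This requires analyzing the HTML source of the page.
In this example, the HTML source being scraped looks like this:
<p align=left>2004年8月24日星期二;白天:晴有时多云南风3—4级;夜间:晴南风3—4级;气温:最高29℃最低19℃ </p>
The program searches for the date, here 2004年8月24日, as the keyword
and reads from there up to the closing </p>,
so the captured content becomes "2004年8月24日星期二;白天:晴有时多云南风3—4级;夜间:晴南风3—4级;气温:最高29℃最低19℃ "
with no markup left around it. Posting it here for the record.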
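The keyword search described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `ExtractBetween` is a hypothetical name, the sample string is a shortened copy of the HTML line quoted above, and the file is assumed to be saved in an encoding that matches the ASP code page so that the Chinese literals compare correctly.

```vbscript
<%
' Hypothetical helper: return the text from startKey up to
' (but not including) the first endKey that follows it.
Function ExtractBetween(source, startKey, endKey)
  Dim startPos, endPos
  ExtractBetween = ""
  startPos = InStr(1, source, startKey, vbTextCompare)
  If startPos = 0 Then Exit Function            ' keyword not found
  ' Search for the end marker only after the keyword,
  ' so an earlier </p> on the page is ignored.
  endPos = InStr(startPos, source, endKey, vbTextCompare)
  If endPos = 0 Then endPos = Len(source) + 1   ' no end marker: take the rest
  ExtractBetween = Mid(source, startPos, endPos - startPos)
End Function

' Shortened sample of the HTML quoted in the post.
Dim sample
sample = "<p align=left>2004年8月24日星期二;白天:晴有时多云 </p>"
Response.Write ExtractBetween(sample, "2004年8月24日", "</p>")
%>
```

Returning an empty string when the keyword is missing makes it easy for the caller to skip output on days when the page layout or the date format has changed.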

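<%
' Errors are suppressed, so a failed request produces empty output
' rather than an error page.
On Error Resume Next
Server.ScriptTimeOut=9999999

' Fetch a page and decode it as GB2312 text.
Function getHTTPPage(Path)
  Dim t
  t = GetBody(Path)
  getHTTPPage=BytesToBstr(t,"GB2312")
End function

' Download the URL and return the raw response bytes.
Function GetBody(url) 
  on error resume next
  Dim Retrieval
  Set Retrieval = CreateObject("Microsoft.XMLHTTP") 
  With Retrieval 
  .Open "Get", url, False, "", "" 
  .Send 
  GetBody = .ResponseBody
  End With 
  Set Retrieval = Nothing 
End Function

' Convert a byte array to a string using the given charset.
Function BytesToBstr(body,Cset)
  dim objstream
  set objstream = Server.CreateObject("adodb.stream")
  objstream.Type = 1      ' adTypeBinary
  objstream.Mode =3       ' adModeReadWrite
  objstream.Open
  objstream.Write body
  objstream.Position = 0
  objstream.Type = 2      ' adTypeText
  objstream.Charset = Cset
  BytesToBstr = objstream.ReadText 
  objstream.Close
  set objstream = nothing
End Function

' Return the position of strng in wstr (case-insensitive),
' or Len(wstr) if it is not found.
Function Newstring(wstr,strng)
  Newstring=Instr(lcase(wstr),lcase(strng))
  if Newstring<=0 then Newstring=Len(wstr)
End Function
%>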

<html>

<BODY bgColor=#ffffff leftMargin=0 topMargin=0 MARGINHEIGHT=0 MARGINWIDTH=0>
<!-- start -->

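<%
Dim wstr,str,url,start,over,dtime,body
' Build today's date in the format used on the page, e.g. 2004年8月24日
dtime=Year(Date)&"年"&Month(Date)&"月"&Day(Date)&"日"
url="http://www.qianhuaweb.com/"
wstr=getHTTPPage(url)
body=""
start=InStr(1,wstr,dtime,1)
If start>0 Then
  ' Look for the closing </p> after the keyword, not the first </p> on the page
  over=InStr(start,wstr,"</p>",1)
  If over<=0 Then over=Len(wstr)+1
  body=Mid(wstr,start,over-start)
End If

response.write "<MARQUEE onmouseover=""this.stop();"" onmouseout=""this.start();"">"&body&"</marquee>"
%>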
<!-- end -->
</body></html>

Author: hefeng_aspnet   Posted: 2011-12-09