
[Help] Using Python to log into a website and fetch data

Date: 2011-09-01

Source: the Internet

The site I need to log into: http://shipping.capitallink.com/members/login.html
The page with the data I want: http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI

The data is not visible unless you are logged in; the screenshot below showed the page with data after logging in:

[screenshot not preserved]
The page source contains a lot of the data I want, in this form:
<tr>

<td align="left" valign="bottom" class="div_line">
&nbsp;&nbsp;<font class="text">Baltic Dry Index [BDI]</font>
</td>
<td align="center" valign="bottom" class="div_line" width="150">
&nbsp;<font class="text">Aug 31, 2011</font>
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
<font class="text">1619.00</font>&nbsp;
</td>
</tr>


<tr>

<td align="left" valign="bottom" class="div_line">
&nbsp;&nbsp;<font class="text">Baltic Dry Index [BDI]</font>
</td>
<td align="center" valign="bottom" class="div_line" width="150">
&nbsp;<font class="text">Aug 30, 2011</font>
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
<font class="text">1537.00</font>&nbsp;
</td>
</tr>
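Once the page has been fetched, rows like the sample above can be scraped with the standard library alone. A minimal sketch, assuming every data row holds exactly three `<font class="text">` cells (index name, date, value) as in the snippet; the regex approach is a guess that works for this markup, not a general HTML parser:

```python
# Extract (name, date, value) triples from rows shaped like the sample above.
# Assumes each data row contains exactly three <font class="text"> cells.
import re

CELL_RE = re.compile(r'<font class="text">([^<]+)</font>', re.IGNORECASE)

def parse_rows(html):
    """Group every three consecutive text cells into one (name, date, value) row."""
    cells = [c.strip() for c in CELL_RE.findall(html)]
    return [tuple(cells[i:i + 3]) for i in range(0, len(cells) - 2, 3)]
```

Running `parse_rows` over the two sample rows would yield `('Baltic Dry Index [BDI]', 'Aug 31, 2011', '1619.00')` and the Aug 30 row.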

I installed httplib2 and am running Python 3.2. I have the code below (I've replaced my real username and password):
Python code
#python3.2
import urllib
import httplib2
url_login='http://shipping.capitallink.com/members/login.html'
body={'clUser_memberID':'myusername','clUser_password':'mypassword'}
headers={'Content-type':'application/x-www-form-urlencoded'}
httplib2.debuglevel=1
http=httplib2.Http()
response,content=http.request(url_login,'POST',headers=headers,body=urllib.parse.urlencode(body))
#headers={'Cookie':response['set-cookie']}
url_bdi='http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'
response2,content2=http.request(url_bdi,'GET')


This code doesn't get the data I want, and I can't tell where the problem is. I've never developed a website and don't understand how this works; I've only followed examples from a Python book and can't figure out what's going on. Could someone point me in the right direction? Thanks!
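For reference, the likely missing step is cookie handling: httplib2 does not remember cookies between requests, so any session cookie returned by the login POST must be copied by hand into a `Cookie` header on the follow-up GET. A minimal sketch of that round trip, assuming the site issues a session cookie at all (the commented-out line in the code above was on the right track); the cookie name and the helper are illustrative, not confirmed:

```python
# Convert a Set-Cookie *response* value into a Cookie *request* value,
# dropping server-only attributes such as path and expires.
from http.cookies import SimpleCookie

def cookie_header(set_cookie_value):
    """Build a Cookie header value from a Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return '; '.join('{}={}'.format(name, morsel.value)
                     for name, morsel in jar.items())

def fetch_with_login():
    """The original flow with the missing step filled in.  Not called here;
    needs httplib2, network access, and real credentials."""
    import urllib.parse
    import httplib2
    http = httplib2.Http()
    body = {'clUser_memberID': 'myusername', 'clUser_password': 'mypassword'}
    headers = {'Content-type': 'application/x-www-form-urlencoded'}
    resp, content = http.request(
        'http://shipping.capitallink.com/members/login.html', 'POST',
        headers=headers, body=urllib.parse.urlencode(body))
    if 'set-cookie' in resp:  # only present if the server issued a cookie
        headers = {'Cookie': cookie_header(resp['set-cookie'])}
    resp2, content2 = http.request(
        'http://shipping.capitallink.com/baltic_exchange/'
        'historical_data.html?ticker=BDI', 'GET', headers=headers)
    return content2
```

Note that the transcript below shows no `set-cookie` in the login response at all, so the login POST itself may be failing (wrong form fields, or login handled elsewhere) rather than the cookie forwarding.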

Below are the steps I tried in IDLE, following similar steps from the book; I can't see why no data comes back.
>>> import urllib
>>> import httplib2
>>> url_login='http://shipping.capitallink.com/members/login.html'
>>> body={'clUser_memberID':'casion98','clUser_password':'panzifei'}
>>> headers={'Content-type':'application/x-www-form-urlencoded'}
>>> httplib2.debuglevel=1
>>> http=httplib2.Http()
>>> response,content=http.request(url_login,'POST',headers=headers,body=urllib.parse.urlencode(body))

send: b'POST /members/login.html HTTP/1.1\r\nHost: shipping.capitallink.com\r\nContent-Length: 49\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\nclUser_memberID=casion98&clUser_password=panzifei'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type 
>>> response
{'status': '200', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:34:40 GMT', 'content-type': 'text/html'}
>>> len(content)
62728
Looking at response I didn't see the cookie mentioned in Google's httplib2 example, so I carried on:

>>> url_bdi='http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'

>>> response2,content2=http.request(url_bdi,'GET')
send: b'GET /baltic_exchange/historical_data.html?ticker=BDI HTTP/1.1\r\nHost: shipping.capitallink.com\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type 
>>> response2
{'status': '200', 'content-location': 'http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:45:54 GMT', 'content-type': 'text/html'}
>>> len(content2)
62465
>>> fh=open('bdi.txt','w')
>>> fh.write(content2.decode('utf8'))
62465
>>> fh.close()
I wrote the fetched response to a file and opened it: no data — it looks like the page you get when you're not logged in. What should I do?



Author: panzifei   Posted: 2011-09-01

Without a cookie, there's an 80% chance you won't get the data.
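The reply above can be checked directly against the transcript: httplib2 folds all response headers into a dict with lowercase keys, so whether the login issued a session cookie is a simple key lookup. A small diagnostic sketch; the two dicts below are stand-ins mirroring the shape of the transcript's responses, not real output:

```python
def has_session_cookie(response):
    """True if an httplib2 response dict carries a Set-Cookie header
    (httplib2 lowercases all header names)."""
    return 'set-cookie' in response

# Stand-in dicts: the real login response in the transcript has no
# 'set-cookie' key at all, matching the second shape.
logged_in = {'status': '200', 'set-cookie': 'PHPSESSID=abc; path=/'}
no_cookie = {'status': '200', 'content-type': 'text/html'}
```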

Author: iambic   Posted: 2011-09-01

Since there's no set-cookie in the response, does that mean this site doesn't use cookies and controls access to the data some other way?
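Not necessarily — a missing set-cookie can also mean the login POST itself didn't succeed (wrong field names, or the form submits to a different URL). One way to rule out hand-rolled cookie mistakes is to drop httplib2 and use the standard library's `urllib.request` with an `HTTPCookieProcessor`, which stores and re-sends cookies automatically. A sketch under that assumption; the URLs and field names are from the post, but whether this site actually relies on cookies is unconfirmed:

```python
# Cookie-aware session using only the standard library: the opener keeps
# every cookie the server sets and attaches it to later requests.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def make_session():
    """Build an opener that stores and re-sends cookies automatically."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def fetch_bdi(username, password):
    """Log in, then fetch the BDI page with the session cookie attached.
    Not called here; requires network access and a valid account."""
    opener, jar = make_session()
    data = urllib.parse.urlencode({
        'clUser_memberID': username,
        'clUser_password': password,
    }).encode('ascii')
    opener.open('http://shipping.capitallink.com/members/login.html', data)
    return opener.open('http://shipping.capitallink.com/baltic_exchange/'
                       'historical_data.html?ticker=BDI').read()
```

After the login call, inspecting `jar` shows whether the server set any cookie at all — if it stays empty, the problem is the login request, not the cookie handling.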

Author: panzifei   Posted: 2011-09-01
