
[Help] Logging in to a website with Python to fetch data

Date: 2011-09-01

Source: Internet

Site I need to log in to: http://shipping.capitallink.com/members/login.html
Page I want to fetch data from: http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI

The data is not visible unless you are logged in. The image below shows the page with the data, as seen after logging in:

The page source contains much of the data I want, in this form:
<tr>

<td align="left" valign="bottom" class="div_line">
  Baltic Dry Index [BDI]
</td>
<td align="center" valign="bottom" class="div_line" width="150">
 Aug 31, 2011
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
1619.00 
</td>
</tr>

<tr>

<td align="left" valign="bottom" class="div_line">
  Baltic Dry Index [BDI]
</td>
<td align="center" valign="bottom" class="div_line" width="150">
 Aug 30, 2011
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
1537.00 
</td>
</tr>
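Once the logged-in page has been fetched, rows like these can be pulled out with Python's standard html.parser module. Below is a minimal sketch that assumes every data row looks like the snippet above (three <td> cells: index name, date, value). The class and function names are illustrative, not from the original post, and nested layout tables on the real page may need extra handling.

```python
from html.parser import HTMLParser


class BalticRowParser(HTMLParser):
    """Collect the stripped text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags is ignored because _cell is None there
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_index_rows(html):
    """Return (name, date, value) tuples for 3-cell rows whose last cell is numeric."""
    parser = BalticRowParser()
    parser.feed(html)
    parser.close()
    result = []
    for cells in parser.rows:
        if len(cells) != 3:
            continue  # not shaped like a data row
        name, date, value = cells
        try:
            result.append((name, date, float(value.replace(",", ""))))
        except ValueError:
            continue  # header rows or other non-numeric cells
    return result
```

Feeding it the decoded page source (for example content2.decode('utf8')) would return one tuple per matching row.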

I installed httplib2 and am using Python 3.2. I have the following code (I've replaced my real login username and password):

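# Python 3.2
import urllib.parse  # urlencode lives here; a bare "import urllib" does not guarantee this submodule is loaded
import httplib2
url_login = 'http://shipping.capitallink.com/members/login.html'
# Login form fields (real credentials replaced)
body = {'clUser_memberID': 'myusername', 'clUser_password': 'mypassword'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
httplib2.debuglevel = 1  # print the raw HTTP exchange
http = httplib2.Http()
# Log in by POSTing the form-encoded fields
response, content = http.request(url_login, 'POST', headers=headers,
                                 body=urllib.parse.urlencode(body))
# httplib2 does not store cookies; a session cookie would have to be passed on by hand:
#headers = {'Cookie': response['set-cookie']}
url_bdi = 'http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'
# Fetch the data page (as written, no Cookie header is sent)
response2, content2 = http.request(url_bdi, 'GET')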

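This code doesn't get the data I want, and I can't tell where the problem is. I've never built a website and don't understand how this works; I've only read a few examples in a Python book and can't figure it out. Any pointers from the experts would be appreciated, thanks!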
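One commonly used alternative to httplib2 for this kind of login is the standard library's urllib.request together with http.cookiejar: the opener stores any cookies the server sets and sends them back automatically on later requests made through the same opener. This is a sketch under assumptions, not a confirmed fix for this site. The form field names are the ones from the post, the POST target is assumed to be the login page itself (the real form's action may differ), and the helper names make_session, build_login_request and login_and_fetch are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://shipping.capitallink.com/members/login.html"
DATA_URL = ("http://shipping.capitallink.com/baltic_exchange/"
            "historical_data.html?ticker=BDI")


def make_session():
    """Return (opener, jar): the opener saves cookies into jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Build a form-encoded POST request from a dict of form fields."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def login_and_fetch(user, password):
    """Log in, then fetch the data page through the same cookie-aware opener."""
    opener, jar = make_session()
    # Visit the login page first, in case the server sets a session cookie there
    opener.open(LOGIN_URL).read()
    # Assumption: these two fields are all the form needs and it posts to LOGIN_URL
    req = build_login_request(LOGIN_URL, {"clUser_memberID": user,
                                          "clUser_password": password})
    opener.open(req).read()
    print("cookies held:", [c.name for c in jar])
    # Any cookies stored so far are sent automatically with this GET
    return opener.open(DATA_URL).read()
```

Calling login_and_fetch('myusername', 'mypassword') makes real network requests; the printed cookie names show whether the server set anything during login.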

Below are some steps I tried in IDLE, following similar examples from the book. I can't see why the data isn't coming back.
>>> import urllib
>>> import httplib2
>>> url_login='http://shipping.capitallink.com/members/login.html'
>>> body={'clUser_memberID':'casion98','clUser_password':'panzifei'}
>>> headers={'Content-type':'application/x-www-form-urlencoded'}
>>> httplib2.debuglevel=1
>>> http=httplib2.Http()
>>> response,content=http.request(url_login,'POST',headers=headers,body=urllib.parse.urlencode(body))

send: b'POST /members/login.html HTTP/1.1\r\nHost: shipping.capitallink.com\r\nContent-Length: 49\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\nclUser_memberID=casion98&clUser_password=panzifei'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type
>>> response
{'status': '200', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:34:40 GMT', 'content-type': 'text/html'}
>>> len(content)
62728
Looking at response, I don't see anything like the cookie mentioned in the httplib2 examples on Google Code. Continuing:

>>> url_bdi='http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'

>>> response2,content2=http.request(url_bdi,'GET')
send: b'GET /baltic_exchange/historical_data.html?ticker=BDI HTTP/1.1\r\nHost: shipping.capitallink.com\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type
>>> response2
{'status': '200', 'content-location': 'http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:45:54 GMT', 'content-type': 'text/html'}
>>> len(content2)
62465
>>> fh=open('bdi.txt','w')
>>> fh.write(content2.decode('utf8'))
62465
>>> fh.close()
I wrote the response to a file and opened it: there is no data. It looks like the page you get when you are not logged in. What should I do?

Author: panzifei  Posted: 2011-09-01
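Since the login POST returned 200 with what looks like the not-logged-in page again, one thing worth checking is whether the posted fields match the site's actual login form: the form may submit to a different action URL, or carry hidden inputs (or a named submit button) that the server expects. Below is a minimal sketch with the standard html.parser that lists each form's action, method and input names. The class name is hypothetical, and <select>/<textarea> fields are not handled.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Record each <form>'s action and method plus the name/value of its <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> found
        self._in_form = False  # True between <form> and </form>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs and named submit buttons are recorded too
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False
```

For example, after p = FormFieldParser() and p.feed(content.decode('utf8')) on the login-page response, print(p.forms) shows where each form posts and which field names it sends.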

Without a cookie, there's an 80% chance you won't get the data.

Author: iambic  Posted: 2011-09-01
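httplib2's Http object does not store cookies between requests, so if the login response does carry a Set-Cookie header, its value has to be copied into a Cookie header by hand (the commented-out line in the original code points in this direction). httplib2 returns headers as a dict with lowercase keys, so the header would appear as response['set-cookie']. Below is a minimal sketch using the standard http.cookies module. The function name is hypothetical, and it keeps only the name=value pairs, dropping attributes such as path and expires.

```python
from http.cookies import CookieError, SimpleCookie


def cookie_header(response):
    """Build a Cookie request-header value from an httplib2-style response dict.

    Returns None if there is no 'set-cookie' entry or it cannot be parsed.
    Only name=value pairs are kept; attributes like path/expires are dropped.
    """
    raw = response.get("set-cookie")
    if not raw:
        return None
    jar = SimpleCookie()
    try:
        jar.load(raw)
    except CookieError:
        return None
    if not jar:
        return None
    return "; ".join("%s=%s" % (name, morsel.value) for name, morsel in jar.items())
```

It would be used as cookie = cookie_header(response), followed by http.request(url_bdi, 'GET', headers={'Cookie': cookie}) when cookie is not None. For the login response shown in this thread, which has no set-cookie key, it returns None. That may mean the login was rejected, or that the cookie is set on a different response (for example, an earlier GET of the login page).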

I didn't see set-cookie in response. Does that mean this site doesn't use cookies and controls access to the data some other way?

Author: panzifei  Posted: 2011-09-01