[Help] Using Python to log in to a website and fetch data
Date: 2011-09-01
Source: Internet
The site I need to log in to: http://shipping.capitallink.com/members/login.html
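The page I want to get data from: http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI
The data isn't visible unless you're logged in. After logging in, the page looks like this:
[Screenshot: the historical data page with data, as seen after logging in]
The page source contains lots of the data I want, looking like this: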
<tr>
<td align="left" valign="bottom" class="div_line">
<font class="text">Baltic Dry Index [BDI]</font>
</td>
<td align="center" valign="bottom" class="div_line" width="150">
<font class="text">Aug 31, 2011</font>
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
<font class="text">1619.00</font>
</td>
</tr>
<tr>
<td align="left" valign="bottom" class="div_line">
<font class="text">Baltic Dry Index [BDI]</font>
</td>
<td align="center" valign="bottom" class="div_line" width="150">
<font class="text">Aug 30, 2011</font>
</td>
<td align="right" valign="bottom" class="div_linex" width="100">
<font class="text">1537.00</font>
</td>
</tr>
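Each data row in the snippet above has the same shape: a `<tr>` with three `<font class="text">` cells holding the index name, the date and the value. Once the logged-in page has been fetched, those rows could be pulled out with the standard library's `html.parser`. This is a minimal sketch that assumes every data row looks like the two shown here; `BDIRowParser` and `parse_bdi_rows` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class BDIRowParser(HTMLParser):
    """Collect the text of <font class="text"> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a tuple of cell texts
        self._row = None       # cells of the <tr> currently open, or None
        self._in_text = False  # True while inside <font class="text">

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "font" and ("class", "text") in attrs and self._row is not None:
            self._in_text = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "font":
            self._in_text = False
        elif tag == "tr" and self._row is not None:
            # keep only rows with exactly three text cells: name, date, value
            if len(self._row) == 3:
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None
            self._in_text = False

    def handle_data(self, data):
        if self._in_text and self._row is not None:
            self._row[-1] += data


def parse_bdi_rows(html):
    """Return (name, date, value) tuples for rows whose third cell is a number."""
    parser = BDIRowParser()
    parser.feed(html)
    parser.close()
    result = []
    for name, date, value in parser.rows:
        try:
            result.append((name, date, float(value.replace(",", ""))))
        except ValueError:
            continue  # not a data row, e.g. a header such as "Index / Date / Value"
    return result
```

Rows whose third cell is not a number are skipped, so a header row with the same three-cell layout does not break the conversion to `float`.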
I installed httplib2 and I'm using Python 3.2. Here is my code (I've replaced the real login username and password):
```python
# python3.2
import urllib.parse  # urlencode lives in urllib.parse; a bare "import urllib" does not guarantee it is loaded
import httplib2

url_login = 'http://shipping.capitallink.com/members/login.html'
body = {'clUser_memberID': 'myusername', 'clUser_password': 'mypassword'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
httplib2.debuglevel = 1  # print the raw request and response lines
http = httplib2.Http()
# log in by POSTing the form fields
response, content = http.request(url_login, 'POST', headers=headers, body=urllib.parse.urlencode(body))
#headers={'Cookie':response['set-cookie']}
url_bdi = 'http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'
# fetch the data page
response2, content2 = http.request(url_bdi, 'GET')
```
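This code doesn't get me the data I want, and I don't know where the problem is. I've never developed a website and don't understand how this works; I've only read some examples in a Python book and can't figure out what's going on. Any pointers from the experts would be much appreciated, thanks!
Below is what I tried in IDLE, following similar steps from the book. I can't see why no data comes back.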
>>> import urllib
>>> import httplib2
>>> url_login='http://shipping.capitallink.com/members/login.html'
>>> body={'clUser_memberID':'casion98','clUser_password':'panzifei'}
>>> headers={'Content-type':'application/x-www-form-urlencoded'}
>>> httplib2.debuglevel=1
>>> http=httplib2.Http()
>>> response,content=http.request(url_login,'POST',headers=headers,body=urllib.parse.urlencode(body))
send: b'POST /members/login.html HTTP/1.1\r\nHost: shipping.capitallink.com\r\nContent-Length: 49\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\nclUser_memberID=casion98&clUser_password=panzifei'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type
>>> response
{'status': '200', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:34:40 GMT', 'content-type': 'text/html'}
>>> len(content)
62728
I looked at response and didn't see anything like the cookie mentioned in the httplib2 example on Google, so I carried on:
>>> url_bdi='http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI'
>>> response2,content2=http.request(url_bdi,'GET')
send: b'GET /baltic_exchange/historical_data.html?ticker=BDI HTTP/1.1\r\nHost: shipping.capitallink.com\r\naccept-encoding: gzip, deflate\r\nuser-agent: Python-httplib2/0.7.0 (gzip)\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date header: Server header: X-Powered-By header: Connection header: Transfer-Encoding header: Content-Type
>>> response2
{'status': '200', 'content-location': 'http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI', 'x-powered-by': 'PHP/4.4.4', 'transfer-encoding': 'chunked', 'server': 'Apache/1.3.42 Ben-SSL/1.60 (Unix) PHP/4.4.4 mod_perl/1.30', 'connection': 'close', 'date': 'Thu, 01 Sep 2011 03:45:54 GMT', 'content-type': 'text/html'}
>>> len(content2)
62465
>>> fh=open('bdi.txt','w')
>>> fh.write(content2.decode('utf8'))
62465
>>> fh.close()
I wrote the response to a file and opened it: there's no data. I think what I got is the page you see when you're not logged in. What should I do?
Author: panzifei  Posted: 2011-09-01
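In the IDLE session above, the body is decoded as UTF-8 and then re-encoded by `open(..., 'w')` with the platform's default encoding; the server's `content-type` header (`text/html`) names no charset, so UTF-8 is a guess. Writing the raw bytes avoids that round trip, and searching for something that only appears in the data rows gives a quick check of whether the fetched page is the logged-in one. A minimal sketch: `save_and_check` is a hypothetical helper, and the marker `class="div_linex"` is taken from the value cells in the snippet above, on the assumption that the logged-out page does not contain it.

```python
def save_and_check(content, path="bdi.txt", marker=b'class="div_linex"'):
    """Write the raw response bytes to path; return True if marker occurs in them.

    The default marker is an assumption: it comes from the value cells of the
    logged-in page and is hoped to be absent from the logged-out page.
    """
    with open(path, "wb") as fh:  # binary mode: bytes are written unchanged
        fh.write(content)
    return marker in content
```

For example, calling `save_and_check(content2)` right after the GET would return `False` if the server sent back the logged-out page, given the marker assumption.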
Without a cookie, there's an 80% chance you won't get the data.
Author: iambic  Posted: 2011-09-01
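Following the reply above: one standard-library way to keep a login session is `urllib.request` with an `http.cookiejar.CookieJar`, which stores any `Set-Cookie` headers (including those on redirect responses) and sends the matching cookies on later requests made through the same opener. This sketch swaps httplib2 for `urllib.request`; the URLs and the two field names come from the post, while `make_opener` and `login_and_fetch` are hypothetical helpers. It assumes the login form really posts to `login.html` with only those two fields, which the post does not confirm.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://shipping.capitallink.com/members/login.html"
DATA_URL = "http://shipping.capitallink.com/baltic_exchange/historical_data.html?ticker=BDI"


def make_opener():
    """Build an opener whose CookieJar stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(username, password):
    """POST the login form, then GET the data page with the same opener (and cookies)."""
    opener, jar = make_opener()
    # urlencode gives an ASCII str; passing bytes as data makes this a POST
    # with Content-Type application/x-www-form-urlencoded
    form = urllib.parse.urlencode(
        {"clUser_memberID": username, "clUser_password": password}
    ).encode("ascii")
    with opener.open(LOGIN_URL, data=form) as resp:
        resp.read()
    # any cookies set by the login response are sent automatically here
    with opener.open(DATA_URL) as resp:
        return resp.read(), jar
```

After `content, jar = login_and_fetch('myusername', 'mypassword')`, `len(jar)` shows how many cookies the server set, which is a quick way to see whether the login response set any at all.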
I didn't see set-cookie in the response. Does that mean this site doesn't use cookies and controls access to the data some other way?
Author: panzifei  Posted: 2011-09-01
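On the last question: a `200 OK` reply with no `Set-Cookie` header does not by itself show that the site avoids cookies; it may simply mean the POST did not log in at all, so the server answered with an ordinary page. One possible cause is that the browser submits the login form to a different `action` URL, or sends extra fields (hidden inputs, for instance) besides `clUser_memberID` and `clUser_password`. A minimal sketch that lists each `<form>` on a page with its action, method and input fields, so the script's request can be compared with what the form declares; `FormInspector` and `inspect_forms` are hypothetical names.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record each <form>'s action and method plus the name/value of its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # default value from the markup (empty string if none)
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def inspect_forms(html):
    """Return a list of {"action", "method", "fields"} dicts, one per <form>."""
    parser = FormInspector()
    parser.feed(html)
    parser.close()
    return parser.forms
```

It can be run on the login page fetched with a plain GET, e.g. `inspect_forms(content.decode('latin-1'))`; decoding as `latin-1` never raises, which is enough for reading tag and field names.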