A question that looks simple but is actually quite complicated =.=

Date: 2014-07-03

Source: Internet

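I want to batch-extract the characters marked in red (they were originally black; I colored them red so everyone can see which ones I mean). Any method or program is fine. How can I extract them?
Ideally the extracted red text could then be pasted into Excel.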

q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://hksdfangyeah.com">tsuihaassng</a>q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://hkhangyeah.asfcom">tsuihassang</a>q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://hkhaagngyeah.com">happyfridayss222</a>q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://hkhangyageah.com">billykillerss1021</a>q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://hkhangyeagh.com">hangysseah</a>q：<a href="http://hk.rdeeg/cat/1/3//aid/*http://bbc.tin.fokom">ssssab</a>

[ Last edited by yy_happy on 2014-6-17 04:08 PM ]

Author: yy_happy Posted: 2014-07-03

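Quote: Originally posted by yy_happy on 2014-6-17 04:06 PM
I want to batch-extract the characters marked in red (they were originally black; I colored them red so everyone can see which ones I mean). Any method or program is fine. How can I extract them?
Ideally the extracted red text could then be pasted into Excel.

q：tsuihaassngq：tsuihassangq：happyfridayss222q：billykillerss1021q： ...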

I have my own function, cut(@target, left, right).

Observe that each string you want is enclosed between ">" and "</a>".

So the solution is something like:
full_string := "q：<a .................om">ssssab</a>"
loop
    one_token := cut(@full_string,">","</a>")
    if empty then
        exit loop
    else
        print one_token
    endif
end loop
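
For readers without such a helper, the loop can be sketched in Python. This is only an illustration: the poster's actual `cut` was not shown, so the version here (which returns the token plus the position to resume from, instead of modifying the string in place) is an assumption, and the sample string uses made-up URLs and names.

```python
def cut(text, left, right, start=0):
    """Hypothetical stand-in for the poster's cut().

    Returns (token, next_position) for the first span found between
    `left` and `right` at or after `start`, or (None, start) if none.
    """
    i = text.find(left, start)
    if i == -1:
        return None, start
    i += len(left)
    j = text.find(right, i)
    if j == -1:
        return None, start
    return text[i:j], j + len(right)


def extract_all(text, left=">", right="</a>"):
    """Repeat cut() until no further token is found."""
    tokens = []
    pos = 0
    while True:
        token, pos = cut(text, left, right, pos)
        if token is None:
            break
        tokens.append(token)
    return tokens


# Made-up sample in the same shape as the original post.
sample = ('q：<a href="http://example.invalid/a">alpha</a>'
          'q：<a href="http://example.invalid/b">beta</a>')
print(extract_all(sample))
```

This relies on the URLs containing no ">" character, which holds for the snippet in the question but not for HTML in general.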

Author: 111x111=12321 Posted: 2014-07-03

.NET:
' Replace every HTML tag with a line break, leaving only the text between the tags.
Str = "<your text>"
Str = System.Text.RegularExpressions.Regex.Replace(Str, "<[^>]*>", vbCrLf)

Author: kckcp Posted: 2014-07-03

Shouldn't the proper way be to use something like XPath?
http://www.w3schools.com/xml/xml_xpath.asp
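
A rough sketch of the XPath idea in Python, using the standard library's ElementTree, which supports only a limited XPath subset. It assumes the snippet is well-formed enough to parse as XML once wrapped in a root element; real-world HTML often is not, in which case an HTML parser is the safer choice. The `root` wrapper, the function name, and the sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the same shape as the original post.
fragment = ('q：<a href="http://example.invalid/a">alpha</a>'
            'q：<a href="http://example.invalid/b">beta</a>')


def link_texts(fragment):
    """Return the text of every <a href=...> element in the fragment."""
    # Wrap the fragment so it parses as a single XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # ".//a[@href]" selects all <a> descendants that carry an href attribute.
    return [a.text or "" for a in root.findall(".//a[@href]")]


print(link_texts(fragment))
```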

[ Last edited by xianrenb on 2014-6-17 05:54 PM ]

Author: xianrenb Posted: 2014-07-03

With .NET regex:

// requires: using System; using System.Text.RegularExpressions;
// Capture the text between each <a ...> opening tag and its closing </a>.
MatchCollection matches = Regex.Matches(htmlString, @"<a\b[^>]*>(?<link>.*?)</a>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match m in matches)
{
    Console.WriteLine(m.Groups["link"].Value);
}
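
The same pattern can be tried in Python, together with one possible way to get the result into Excel, which the original question asked for. This is a sketch under assumptions: the function names and the sample string are made up, and producing CSV text is just one convenient route into a spreadsheet.

```python
import csv
import io
import re

# Non-greedy capture of everything between <a ...> and </a>.
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def link_texts(html):
    """Return the inner text of every <a>...</a> pair."""
    return LINK_RE.findall(html)


def to_csv(values):
    """Render the values as one-column CSV text, one value per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()


# Made-up sample in the same shape as the original post.
sample = ('q：<a href="http://example.invalid/a">alpha</a>'
          'q：<a href="http://example.invalid/b">beta</a>')
names = link_texts(sample)
print(names)
print(to_csv(names))
```

Saving the CSV text to a file with the "utf-8-sig" encoding generally helps Excel detect UTF-8 when the extracted text contains non-ASCII characters.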

Author: form5 Posted: 2014-07-03

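With HtmlAgilityPack:

// requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlString);
// SelectNodes returns null when nothing matches, so guard before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        Console.WriteLine(link.InnerText);
    }
}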
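
For comparison, a parser-based approach can also be sketched with Python's standard library. This is not HtmlAgilityPack and has no XPath support; it is a stand-in that shows the same idea of letting an HTML parser find the link text. The class name and the sample string are made up for illustration.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text inside every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = None  # None while outside an <a href=...> element

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append("".join(self._buffer))
            self._buffer = None


# Made-up sample in the same shape as the original post.
sample = ('q：<a href="http://example.invalid/a">alpha</a>'
          'q：<a href="http://example.invalid/b">beta</a>')
collector = LinkTextCollector()
collector.feed(sample)
collector.close()
print(collector.texts)
```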

Author: form5 Posted: 2014-07-03

Quote: Originally posted by kckcp on 2014-6-17 05:05 PM
.NET:
Str = "<your text>"
Str = System.Text.RegularExpressions.Regex.Replace(Str, "<[^>]*>", vbCrLf)

I think the "q：" strings would not be filtered out here.
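
A small Python sketch of this point, using the same tag-stripping pattern: the "q：" labels sit outside the tags, so they survive the replacement and need a separate filtering step. The sample string and the function names are made up, and dropping lines equal to "q：" is just one possible filter.

```python
import re

# Made-up sample in the same shape as the original post.
sample = ('q：<a href="http://example.invalid/a">alpha</a>'
          'q：<a href="http://example.invalid/b">beta</a>')


def strip_tags(html):
    """Replace every tag with a newline, as in the quoted snippet."""
    return re.sub(r"<[^>]*>", "\n", html)


def drop_labels(text, label="q："):
    """Keep non-empty lines that are not just the label."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and line != label]


stripped = strip_tags(sample)
print(repr(stripped))         # the "q：" labels are still present
print(drop_labels(stripped))  # only the link text remains
```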

Author: form5 Posted: 2014-07-03

As the others have already replied.

Regular expressions aren't that complicated, are they?

Author: pc_chai Posted: 2014-07-03

Same as everyone above: a regular expression will do.
If you really want to get it done, even Notepad++ can handle it.