
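Help with a shell command to filter a log

Date: 2011-03-20

Source: Internet

Last edited by Tennessee3Waltz on 2011-03-20 14:28

I'm a shell beginner and have run into a log-filtering problem. I can't work out the script myself, so I'm asking the experts on this forum.
The source file is: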
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO" title="C-plane.CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO">CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO</a>
<td class="msg">error: (10060, 'Operation timed out')</td>
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters Paging 20M SISO" title="C-plane.CN4240-TD LTE CP Counters Paging 20M SISO">CN4240-TD LTE CP Counters Paging 20M SISO</a>
<td class="msg">IndexError: list index out of range</td>
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO" title="C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO">CN4240-TD LTE CP Counters S1 SETUP 20M SISO</a>
<td class="msg">'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']'</td>
<td class="msg">'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']'</td>
<td class="msg">IndexError: list index out of range</td>
The goal is to write a shell command that extracts two things from this file: from each class="name" line, the suite name that follows "C-plane." (for example, CN4240-TD LTE CP Counters Paging 20M SISO), and from each class="msg" line, the text inside the cell (for example, IndexError: list index out of range).

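Question 1:
I can write a command for each kind of line. For the first line, for example: awk 'BEGIN {FS="\""} {print $4}' | awk 'BEGIN {FS="."} {print $2}'. I can also write one for the second line. But the two kinds of line need different filtering commands, so how do I write a single command that handles every line?

Question 2:
Each class="name" line may be followed by several class="msg" lines, and I only need the first msg of each group. How do I do that?

Question 3:
I'd like to add a label to each line of output, for example: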
SUITE NAME:CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO;
ERROR MASSAGE:error: (10060, 'Operation timed out');
SUITE NAME:CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE:IndexError: list index out of range;
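The "SUITE NAME:" and "ERROR MASSAGE:" prefixes are the labels. I know how to add a label to a line, but what do I do when odd and even lines need different labels?

That's everything. I know it's a lot of questions, and I'm really hoping an expert can help. Will anyone answer?

Author: Tennessee3Waltz   Posted: 2011-03-20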

Try:
awk -v RS="<a|</?td" -v FS="C-plane.|\"|>" '/class="name"/{print "SUITE NAME: "$8";";getline v;print "ERROR MASSAGE: "gensub(/.*msg">(.*)/,"\\1",1,v)";"} ' file
Output:
SUITE NAME: CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO;
ERROR MASSAGE: error: (10060, 'Operation timed out');
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 20M SISO;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';

Author: yinyuemi   Posted: 2011-03-20

It can be simplified a little further:
awk -v RS="<a|</td|<td class=\"msg\">" -v FS="C-plane.|\"|>" '/class="name"/{print "SUITE NAME: "$8";";getline v;print "ERROR MASSAGE: "v";"} ' file
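
The commands above depend on gawk: gensub is a gawk extension, and POSIX leaves the behaviour of a multi-character RS unspecified. For readers without gawk, here is a minimal line-oriented sketch that should run under any POSIX awk. It assumes each class="name" and class="msg" element sits on a single line, as in the sample; the function name extract_pairs, the flag want_msg and the file sample.html are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, reduced from the log in the question.
cat > sample.html <<'EOF'
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters Paging 20M SISO" title="C-plane.CN4240-TD LTE CP Counters Paging 20M SISO">CN4240-TD LTE CP Counters Paging 20M SISO</a>
<td class="msg">IndexError: list index out of range</td>
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO" title="C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO">CN4240-TD LTE CP Counters S1 SETUP 20M SISO</a>
<td class="msg">'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']'</td>
<td class="msg">IndexError: list index out of range</td>
EOF

# extract_pairs FILE: print each suite name and only its first message.
extract_pairs() {
    awk '
    /class="name"/ {
        line = $0
        sub(/^.*title="C-plane\./, "", line)   # drop everything up to the suite name
        sub(/".*$/, "", line)                  # drop the closing quote and the rest
        print "SUITE NAME: " line ";"
        want_msg = 1                           # the next msg line belongs to this suite
        next
    }
    /class="msg"/ && want_msg {
        line = $0
        sub(/^.*class="msg">/, "", line)       # strip the opening tag
        sub(/<\/td>.*$/, "", line)             # strip the closing tag
        print "ERROR MASSAGE: " line ";"
        want_msg = 0                           # ignore further msg lines for this suite
    }' "$1"
}

extract_pairs sample.html
```

The want_msg flag is what answers Question 2: it is set when a suite line is seen and cleared after the first msg line, so any later msg lines in the same group are skipped.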

Author: yinyuemi   Posted: 2011-03-20

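Last edited by Tennessee3Waltz on 2011-03-20 14:52

Reply to yinyuemi
You're amazing, you wrote that so quickly. I'm running apt-get install gawk right now so that I can use your command.
I have a follow-up question.
Suppose the output of the processing above is: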
SUITE NAME: CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO;
ERROR MASSAGE: error: (10060, 'Operation timed out');
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 20M SISO;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 10M MIMO;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M MIMO for CP;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 15M MIMO ENB EPS BEARE;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';
Here the ERROR MASSAGE of the 4th and 6th pairs repeats that of the 3rd pair, and the ERROR MASSAGE of the 5th pair repeats that of the 2nd.

As you can see, some ERROR MASSAGE values are repeated (SUITE NAME values never repeat). Let's call a SUITE NAME together with its ERROR MASSAGE a pair. If an ERROR MASSAGE has already appeared, it should not be printed again, and the SUITE NAME paired with it should not be printed either, because all I want is a list of the distinct error types. In other words, I'd like the output to be:
SUITE NAME: CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO;
ERROR MASSAGE: error: (10060, 'Operation timed out');
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 20M SISO;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';
How can I do that?

Author: Tennessee3Waltz   Posted: 2011-03-20

Here is a sed script that only does the extraction; nothing else is handled:

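# Suite line: keep only the name that follows "suite_C-plane." in the name attribute.
/<a class="name"/ s/.*suite_C-plane\.\([^"]*\)" title=".*/\1/

# Message line: keep only the text inside the msg cell.
/<td class="msg"/ s/.*class="msg">\([^<]*\)<.*/\1/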
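
Building on that extraction, here is a minimal sketch of how sed could also add the labels and keep only the first msg after each suite. It assumes every class="name" line is immediately followed by at least one class="msg" line; the function name label_pairs and the file sample.html are illustrative, not from the thread.

```shell
#!/bin/sh
# Hypothetical sample input, reduced from the log in the question.
cat > sample.html <<'EOF'
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters Paging 20M SISO" title="C-plane.CN4240-TD LTE CP Counters Paging 20M SISO">CN4240-TD LTE CP Counters Paging 20M SISO</a>
<td class="msg">IndexError: list index out of range</td>
<a class="name" name="suite_C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO" title="C-plane.CN4240-TD LTE CP Counters S1 SETUP 20M SISO">CN4240-TD LTE CP Counters S1 SETUP 20M SISO</a>
<td class="msg">'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']'</td>
<td class="msg">IndexError: list index out of range</td>
EOF

# label_pairs FILE: print a labelled suite line plus its first message line.
label_pairs() {
    sed -n '
    /<a class="name"/ {
        # Print the suite name with its label.
        s/.*suite_C-plane\.\([^"]*\)" title=".*/SUITE NAME: \1;/p
        # Load the next input line, assumed to be the first msg of this suite.
        n
        s/.*class="msg">\([^<]*\)<.*/ERROR MASSAGE: \1;/p
    }' "$1"
}

label_pairs sample.html
```

Because of -n, nothing is printed unless a substitution carries the p flag, so second and later msg lines, which never match the class="name" address, are dropped silently.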

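Author: chenbin200818   Posted: 2011-03-20

To keep errors from repeating, use each extracted value as an array subscript, then check whether that subscript already exists before printing.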
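
A minimal sketch of that idea in POSIX awk, assuming the input is the labelled output shown earlier in the thread, with each SUITE NAME line followed directly by its ERROR MASSAGE line. The array name seen, the function name unique_errors and the file pairs.txt are illustrative choices.

```shell
#!/bin/sh
# Hypothetical labelled input: the third pair repeats the first pair's message.
cat > pairs.txt <<'EOF'
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 20M SISO;
ERROR MASSAGE: error: (10060, 'Operation timed out');
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M MIMO for CP;
ERROR MASSAGE: IndexError: list index out of range;
EOF

# unique_errors FILE: print a pair only the first time its message appears.
unique_errors() {
    awk '
    /^SUITE NAME:/ { name = $0; next }   # remember the suite, wait for its message
    /^ERROR MASSAGE:/ {
        if (!($0 in seen)) {             # subscript not present yet: a new error type
            seen[$0] = 1
            print name
            print $0
        }
    }' "$1"
}

unique_errors pairs.txt
```

With this sample the first two pairs are printed and the third is dropped, because its message is already a subscript of seen.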

Author: chenbin200818   Posted: 2011-03-20

awk 'BEGIN{ RS="SUITE NAME";FS="\n";ORS=""}!a[$2]++&&NR>1{print RS$0}' file
Output:
SUITE NAME: CN4240-TD LTE CP Counters ENB EPS BEARER REL REQ RNL RLF 20M SISO;
ERROR MASSAGE: error: (10060, 'Operation timed out');
SUITE NAME: CN4240-TD LTE CP Counters Paging 20M SISO;
ERROR MASSAGE: IndexError: list index out of range;
SUITE NAME: CN4240-TD LTE CP Counters S1 SETUP 20M SISO;
ERROR MASSAGE: 'D:\TestCase\trunk\log\UDPLog_Thu_Mar_17_23_21_28_2011.log' does not contain '[u'now OnAir', u'PBCH']';

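Author: yinyuemi   Posted: 2011-03-20

Reply to chenbin200818
Thanks for your reply. This place really is full of experts. Could I trouble you to write out the code? I don't know how to write it myself.

Author: Tennessee3Waltz   Posted: 2011-03-20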
