Help wanted: a script that removes duplicate entries from a file

Date: 2010-12-16

Source: Internet

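I have a file with the following contents:
< 1
< 1
> 1
< 2
> 2
> 2
< 3
< 3
The value in the second field repeats (1 with 1, 2 with 2), but duplicates should be removed by one-to-one matching: each "<" line cancels exactly one ">" line with the same value.
For this example, the result after removal should be:
< 1
> 2
< 3
< 3
I have a script that is supposed to remove the duplicates, but its actual result is:
< 3
< 3
It removes lines that should have been kept. The script is as follows: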
#!/usr/bin/perl

if( @ARGV != 2 )
{
    print ("command line : ./eraser-same.pl src dest ! \n");
}
my $source = $ARGV[0];
open FILE, "<$source" or die "it can not open file $!";
my $dest = $ARGV[1];
open DESTFILE, ">$dest" or die "it can not open file $!";
our @newout=();
our @oldout=();
our @sameout=();

######################################################################
#FUNCTION: compare_two_out #
# get the same out of newout and oldout,push to @sameout #
######################################################################
sub compare_two_out
{
    foreach( @oldout )
    {
        $old = $_;
        if( $old =~ /^< (.*)$/ )
        {
            $old_text = $1;
        }
        else
        {
            next;
        }
        foreach( @newout )
        {
            $new = $_;
            if( $new =~ /^> (.*)$/ )
            {
                $new_text = $1;
            }
            else
            {
                next;
            }
            if( $old_text eq $new_text )
            {
                push( @sameout, $new );
            }
        }
    }
}

######################################################################
#FUNCTION: display_out #
# display the variable of the list except the same one #
######################################################################
sub display_out
{
    @out = @_;
    $count = 0;
    $num_same = 0;
    for( $count=0; $count<@out; $count++ )
    {
        $out1 = $out[$count];
        if( $out1 =~ /^[<>] (.*)$/ )
        {
            $out_text = $1;
        }
        else
        {
            print ( DESTFILE $out1);
            next;
        }
        for( $num_same=0; $num_same<@sameout; $num_same++ )
        {
            $same = $sameout[$num_same];
            if( $same =~ /^[<>] (.*)$/ )
            {
                $same_text = $1;
            }
            if( $out_text eq $same_text )
            {
                last;
            }
        }
        if( $num_same >= @sameout )
        {
            print ( DESTFILE $out1);
        }
    }
}

######################################################################
##########START MAIN FUNCTION HERE####################################
######################################################################
while( <FILE> )
{
    if( $_ =~ /^diff / )
    {
        if( (@oldout==0) && (@newout==0) )
        {
            print( DESTFILE "$_" );
            next;
        }
        &compare_two_out;
        &display_out(@oldout);
        &display_out(@newout);
        print( DESTFILE "$_" );
        @newout=();
        @oldout=();
        @sameout=();
    }
    elsif( $_ =~ /^<.*/ ) ##old out
    {
        push(@oldout, $_);
    }
    elsif( $_ =~ /^>.*/ ) ##new out
    {
        push(@newout, $_);
    }
    else
    {
        next;
    }
}
&compare_two_out;
&display_out(@oldout);
&display_out(@newout);
close DESTFILE;
close FILE;

I have spent a whole day on this and still cannot work out how to fix the script. 100 points offered.

Author: wengfanfan Posted: 2010-12-16
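
The one-to-one matching described in the question can be sketched by counting, rather than by collecting every matching line the way compare_two_out does. This is only a minimal sketch under stated assumptions: cancel_pairs is a hypothetical helper that is not part of the posted script, it drops the earliest occurrences of a cancelled line, and it keeps the surviving lines in input order (the posted script writes all "<" lines of a block before the ">" lines).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): cancel "< text" and
# "> text" lines one-to-one and return the survivors in input order.
sub cancel_pairs {
    my @lines = @_;
    my (%old, %new);

    # Count how often each text occurs on each side.
    for my $line (@lines) {
        if    ($line =~ /^< (.*)$/) { $old{$1}++ }
        elsif ($line =~ /^> (.*)$/) { $new{$1}++ }
    }

    # A text can be cancelled min(old count, new count) times on each side.
    my (%skip_old, %skip_new);
    for my $text (keys %old) {
        my $n     = exists $new{$text} ? $new{$text} : 0;
        my $pairs = $old{$text} < $n ? $old{$text} : $n;
        $skip_old{$text} = $skip_new{$text} = $pairs;
    }

    # Drop exactly that many lines per side; keep everything else.
    my @kept;
    for my $line (@lines) {
        if ($line =~ /^< (.*)$/ && $skip_old{$1}) { $skip_old{$1}--; next }
        if ($line =~ /^> (.*)$/ && $skip_new{$1}) { $skip_new{$1}--; next }
        push @kept, $line;
    }
    return @kept;
}

# The example from the question.
my @input = ("< 1\n", "< 1\n", "> 1\n", "< 2\n", "> 2\n", "> 2\n", "< 3\n", "< 3\n");
print cancel_pairs(@input);
```

For the example input this prints "< 1", "> 2", "< 3", "< 3", which is the result the question asks for. In the posted script the same idea would have to be applied once per diff block, where compare_two_out and display_out are called now.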

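This is too long. Please describe your problem and the logic of the code in plain language, for example how exactly the matching is supposed to work.
Format the code and give several examples; a single example leaves too much ambiguity.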

Author: iambic Posted: 2010-12-16

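Quoting reply #1 by iambic:
This is too long. Please describe your problem and the logic of the code in plain language, for example how exactly the matching is supposed to work.
Format the code and give several examples; a single example leaves too much ambiguity.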

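Yes, it does seem too long; even I don't really want to read it. The formatting is a mess as well, so I will tidy it up.
The part I mainly don't understand is the display_out function: I can't see how it removes the duplicates.
Why does
if( $out_text eq $same_text )
{
    last;
}
cause the line to be deleted? It seems very strange.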

Author: wengfanfan Posted: 2010-12-16
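
On the `last` question: the `last` statement itself deletes nothing. It leaves the inner loop early, so $num_same is still smaller than the size of @sameout, and the check `if( $num_same >= @sameout )` that follows the loop fails, so the print is skipped. Only when the loop runs to the end without a match does $num_same reach the list size and the line get printed. The sketch below isolates that logic; would_print is a hypothetical stand-alone function written for illustration, not code from the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone version of the inner loop in display_out.
# Returns 1 if the line would be printed, 0 if it would be dropped.
sub would_print {
    my ($out_text, @same_texts) = @_;
    my $num_same;
    for ($num_same = 0; $num_same < @same_texts; $num_same++) {
        # A match leaves the loop early, so $num_same stays below the list size.
        last if $out_text eq $same_texts[$num_same];
    }
    # Only a loop that ran to the end (no match) reaches the list size.
    return $num_same >= @same_texts ? 1 : 0;
}

print would_print("1", "1", "2"), "\n";  # match found: prints 0 (dropped)
print would_print("3", "1", "2"), "\n";  # no match: prints 1 (printed)
```

Because display_out compares only the text after the marker, every "< 1", "> 1", "< 2" and "> 2" line matches an entry in @sameout and is dropped, no matter how many copies exist on each side. That is why only the two "< 3" lines survive in the posted example.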