Looking for a Perl program; hoping for help with actual working code
Date: 2010-09-18
Source: Internet
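Requirement: if the letter sequences that follow two or more > lines are exactly identical, output the corresponding > lines.
For example: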
>gi|123|CAV28776.1| unnamed protein product [Physcomitrella patens]
PYCVRMGLKRKILHASEPQSPVGVL
>gi|34fg|gb|CAV28776.1| unnamed protein product [Physcomitrella patens]
PYCVRMGLKRKILHASEPQSPVGVL
Both records contain the same string PYCVRMGLKRKILHASEPQSPVGVL, so the output is:
>gi|123|CAV28776.1| unnamed protein product [Physcomitrella patens]
>gi|34fg|gb|CAV28776.1| unnamed protein product [Physcomitrella patens]
Strings are identical
Given a file with the following contents:
>gi|218328416|gb|CAV28776.1| unnamed protein product [Physcomitrella patens]
MPQIQYSEKYFDDTYEYRHVVLPPDIAKLLPKNRLLSEAEWRGIGVQQSRGWVHYAIHRPEPHIMLFRRP
LNYGQPQQAAAVQQQPTGMKA
>gi|218328416|emb|CAV28776.1| unnamed protein product [Physcomitrella patens]
MPQIQYSEKYFDDTYEYRHVVLPPDIAKLLPKNRLLSEAEWRGIGVQQSRGWVHYAIHRPEPHIMLFRRP
LNYGQPQQAAAVQQQPTGMKA
>gi|51833416|emb|CAV28776.1| unnamed protein product [Physcomitrella patens]
MPQIQYSEKYFDDTYEYRHVVLPPDIAKLLPKNRLLSEAEWRGIGVQQSRGWVHYAIHRPEPHIMLFRRP
LNYGQPQQAAAVQQQPTGMKA
>gi|26190151|emb|CAD21955.1| cyclin D [Physcomitrella patens]
MSPSVDCLASLYCAEDVSGTAWNESEMCGAADRVFESQPAVFMDFPVEDDEAIATLLMKEAQFMPEADYL
ERYQSRKLSLEARLAAIEWILKVHSFYNYSPLTVALAVNYMDRFLSRYYFPEGKEWMLQLLSVACISLAA
KMEESDVPILLDFQVEQEEHIFEAHTIQRMELLVLSTLEWRMSGVTPFSYVDYFFHKLGVSDLLLRALLS
RVSEIILKSIRVTTSLQYLPSVVAAASIICALEEVTTIRTGDLLRTFNELLVNVESVKDCYIDMRQSEIG
PYCVRMGLKRKILHASEPQSPVGVLEAADVSSPSGTVLGFSSRESSPDVTDSPPSTNSQRKRRKLCLHNE
SCLHVESASL
Expected output:
>gi|218328416|gb|CAV28776.1| unnamed protein product [Physcomitrella patens]
>gi|218328416|emb|CAV28776.1| unnamed protein product [Physcomitrella patens]
>gi|51833416|emb|CAV28776.1| unnamed protein product [Physcomitrella patens]
Strings are identical
The attachment is a file that needs to be processed.

sequences.rar (8.68 KB)
Author: bioinfor  Posted: 2010-09-18
Last edited by longbow0 on 2010-09-18 11:48
Use BioPerl's Bio::SeqIO.
Roughly:
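use strict;
use warnings;
use Bio::SeqIO;

# Input file name taken from the command line
my $infile = shift or die "Usage: $0 sequences.fasta\n";

my $o_seqi = Bio::SeqIO->new(
    -file   => $infile,
    -format => 'fasta',
);

while (my $o_seq = $o_seqi->next_seq) {
    # Comparing the $o_seq->seq values is all that is needed
}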
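Author: longbow0  Posted: 2010-09-18
Last edited by iamlimeng on 2010-09-18 12:24
Use a hash; it is easy to implement. If the data set is very large, you need to consider memory capacity, or switch to a different way of reading the file.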
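#!/usr/bin/perl

use strict;
use warnings;

my $data_file   = "sequences.txt";
my $result_file = "result.txt";

# Slurp the whole file; $/ is only undefined inside this block.
my $data = do {
    local $/;
    open(my $in, '<', $data_file) or die "Cannot open $data_file: $!";
    <$in>;
};

# Map each sequence to the list of header lines that carry it.
my %hash;
foreach my $record ( split(/^>/m, $data) ) {
    next if $record !~ /\S/;
    # The first line is the header; the remaining lines are the sequence.
    my ($key, @lines) = split /\r?\n/, $record;
    my $string = join '', @lines;
    push @{ $hash{$string} }, ">$key";
}

open(my $out, '>', $result_file) or die "Cannot write $result_file: $!";
foreach my $string (keys %hash) {
    # Only sequences shared by more than one header are reported.
    next unless @{ $hash{$string} } > 1;
    print {$out} "$_\n" for @{ $hash{$string} };
    print {$out} "Strings are identical\n\n";
}
close $out;
print "OK!\n";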
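As a rough sketch of the "different way of reading the file" mentioned above, the records can be read line by line instead of slurping the whole file into one string. This is only one possible approach, not the poster's own code: the sub name group_headers_by_sequence and the built-in sample data are made up for illustration. Note that the hash still holds every distinct sequence in memory; only the single large file string is avoided.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA-style records line by line from a file
# handle and return a hash reference mapping each sequence to the list of
# header lines that share it.
sub group_headers_by_sequence {
    my ($fh) = @_;
    my (%headers_for, $header, $seq);

    # Store the record collected so far (if any) before starting the next one.
    my $flush = sub {
        push @{ $headers_for{$seq} }, $header if defined $header;
    };

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip Unix or Windows line endings
        if ($line =~ /^>/) {
            $flush->();
            ($header, $seq) = ($line, '');
        }
        elsif (defined $header) {
            $seq .= $line;                 # a sequence may wrap over several lines
        }
    }
    $flush->();
    return \%headers_for;
}

# Sample data taken from the question; used only when no file name is given.
my $sample = join "\n",
    '>gi|123|CAV28776.1| unnamed protein product [Physcomitrella patens]',
    'PYCVRMGLKRKILHASEPQSPVGVL',
    '>gi|34fg|gb|CAV28776.1| unnamed protein product [Physcomitrella patens]',
    'PYCVRMGLKRKILHASEPQSPVGVL',
    '';

my $fh;
if (@ARGV) {
    open($fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
}
else {
    open($fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
}
my $groups = group_headers_by_sequence($fh);
close $fh;

# Report every sequence that appears under more than one header.
for my $seq_key (sort keys %$groups) {
    my @headers = @{ $groups->{$seq_key} };
    next unless @headers > 1;
    print "$_\n" for @headers;
    print "Strings are identical\n\n";
}
```

Run without arguments, it processes the two-record sample from the question; with a file name as the first argument (for example sequences.txt), it processes that file instead.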
Author: iamlimeng  Posted: 2010-09-18