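What I want to achieve
For each alignment, I push the Query (amino acid sequence) positions from the input file onto the array @query_pos.
Each time an alignment ends, I want to reset @query_pos and push the Query positions of the next alignment in the same way, and then, for each alignment,
display the first and last values of @query_pos.
For the output below, I want to display 24 and 548, and 613 and 782.
Note: "alignment" here means lining up two strings (sequences) against each other.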
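The reset-per-alignment idea described above could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the subroutine name alignment_ranges and the shortened sample lines are hypothetical, and it assumes that every alignment begins with a " Score =" line, as in the BLAST output shown in this question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [first, last] pair per alignment.
# Assumption: each alignment starts with a line matching /^\s*Score =/.
sub alignment_ranges {
    my @lines = @_;
    my (@ranges, @query_pos);
    for my $line (@lines) {
        if ($line =~ /^\s*Score =/) {
            # A new alignment starts: store the previous one, then reset.
            push @ranges, [ $query_pos[0], $query_pos[-1] ] if @query_pos;
            @query_pos = ();
        }
        elsif ($line =~ /^Query\s+(\d+)\s+[a-zA-Z-]+\s+(\d+)/) {
            # Collect the start and end position of this Query line.
            push @query_pos, $1, $2;
        }
    }
    # Store the final alignment, which has no following " Score =" line.
    push @ranges, [ $query_pos[0], $query_pos[-1] ] if @query_pos;
    return @ranges;
}

# Shortened sample lines (sequences abbreviated for illustration).
my @sample = (
    ' Score = 833 bits (2151), Expect = 0.0',
    'Query 24 TYVVNPNALAILEE 83',
    'Query 545 SPMD 548',
    ' Score = 321 bits (822), Expect = 2e-88',
    'Query 613 KVPIDTSHFF 672',
    'Query 733 lllrrfNPSIHS 782',
);
print "$_->[0]\t$_->[1]\n" for alignment_ranges(@sample);
```

With the sample lines above this prints 24 and 548 on one line, then 613 and 782 on the next. The "Query= ..." header line is not picked up, because the pattern requires whitespace directly after "Query".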
Program output (example)
Query= sp|P49021|TIM_DROME Protein timeless OS=Drosophila melanogaster
Sbjct > NT_033779.5 Drosophila melanogaster chromosome 2L
Score = 833 bits (2151), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 517/544 (95%), Positives = 518/544 (95%), Gaps = 22/544 (4%)
Frame = -1
Query 24 TYVVNPNALAILEEINYKLTYEDQTLRTFRRAIGFGQNVRSDLIPLLENAKDDAVLESVI 83
Query 84 RILVNLTVPVECLFSVDVMYRTDVGRHTIFELNKLLYTSKEAFTEARSTKSVVEYMKHIL 143
Query 144 ESDPKLSPHKCDQinncllllrnilhiPETHAHCVMPMMQSMPHGISMQNTILWNLFIQS 203
Query 204 IDKLLLYLMTCPQRAFWGVTMVQLIALIYKDQHVSTLQKLLSLWFeaslsessednesnt 263
Query 264 sPPKQGSGDsspmltsdptsdssdngsngrgmgggmregtAATLQEVSRKGQEYQNAMAR 323
Query 324 VPADKPDGSEEASDMTGNDSEQPGSPEQSQPAGESMDDGDYEDQRHRQLNEHGeededed 383
Query 384 e-------------------veeeeYLQLGPASEPLNLTQQPADKVNNTTNPTSSAPQGC 424
Query 425 LGNEPFKPPPPLPVRASTSAHAQMQKFNESSYASHVSAVKLGQKSPHAGQLQLTKGKCCP 484
Query 485 QKRECPSSQSELSDCGYGTQVENQESISTSSNDDDgpqgkpqhqkppCNTKPRNKPRTIM 544
Query 545 SPMD 548
Score = 321 bits (822), Expect = 2e-88, Method: Compositional matrix adjust.
Identities = 170/170 (100%), Positives = 170/170 (100%), Gaps = 0/170 (0%)
Frame = -3
Query 613 KVPIDTSHFFWLVTYFLKFAAQLELDMEHIDTILTYDVLSYLTYEGVSLCEQLELNARQE 672
Query 673 GSDLKPYLRRMHLVVTAIREFLQAIDTYNKVTHLNEDDKAHLRQLQLQISEMSDlrclfv 732
Query 733 lllrrfNPSIHSKQYLQDLVVTNHILLLILDSSAKLGGCQTIRLSEHITQ 782
...
...
Source code
Perl
#!/usr/bin/perl
use strict;
use warnings;

my @query_pos;

# Double quotes are needed so that $! is interpolated into the message.
open(my $fh, '<', $ARGV[0]) or die "Cannot open file: $!";

while (my $buff = <$fh>) {
    chomp $buff;
    # Echo the header lines of each hit and alignment.
    print $&.$',"\n" if ($buff =~ /^Query=/);
    print "\nSbjct ",$&.$',"\n" if ($buff =~ /^>/);
    print "\n",$&.$',"\n" if ($buff =~ /^ Score =/);
    print $&.$',"\n" if ($buff =~ /^ Identities =/);
    print $&.$',"\n\n" if ($buff =~ /^ Frame =/);
    # Collect the start and end position of every Query line.
    if ($buff =~ /Query\s+(\d+)\s+[a-zA-Z-]+\s+(\d+)/) {
        push(@query_pos, $1, $2);
        print "\t",$&,"\n";
    }
}

close($fh);
Input file
result_tim_fly.txt
TBLASTN 2.8.1+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Database: User specified sequence set (Input: GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna).
1,870 sequences; 143,726,002 total letters

Query= sp|P49021|TIM_DROME Protein timeless OS=Drosophila melanogaster
OX=7227 GN=tim PE=1 SV=3
Length=1398

Score E
Sequences producing significant alignments: (Bits) Value

NT_033779.5 Drosophila melanogaster chromosome 2L 833 0.0
NT_033778.4 Drosophila melanogaster chromosome 2R 35.0 1.9

> NT_033779.5 Drosophila melanogaster chromosome 2L
Length=23513712

 Score = 833 bits (2151), Expect = 0.0, Method: Compositional matrix adjust.
 Identities = 517/544 (95%), Positives = 518/544 (95%), Gaps = 22/544 (4%)
 Frame = -1

Query 24 TYVVNPNALAILEEINYKLTYEDQTLRTFRRAIGFGQNVRSDLIPLLENAKDDAVLESVI 83
TY +P AILEEINYKLTYEDQTLRTFRRAIGFGQNVRSDLIPLLENAKDDAVLESVI
Sbjct 3504318 TYFQSP---AILEEINYKLTYEDQTLRTFRRAIGFGQNVRSDLIPLLENAKDDAVLESVI 3504148

Query 84 RILVNLTVPVECLFSVDVMYRTDVGRHTIFELNKLLYTSKEAFTEARSTKSVVEYMKHIL 143
RILVNLTVPVECLFSVDVMYRTDVGRHTIFELNKLLYTSKEAFTEARSTKSVVEYMKHIL
Sbjct 3504147 RILVNLTVPVECLFSVDVMYRTDVGRHTIFELNKLLYTSKEAFTEARSTKSVVEYMKHIL 3503968

Query 144 ESDPKLSPHKCDQinncllllrnilhiPETHAHCVMPMMQSMPHGISMQNTILWNLFIQS 203
ESDPKLSPHKCDQINNCLLLLRNILHIPETHAHCVMPMMQSMPHGISMQNTILWNLFIQS
Sbjct 3503967 ESDPKLSPHKCDQINNCLLLLRNILHIPETHAHCVMPMMQSMPHGISMQNTILWNLFIQS 3503788

Query 204 IDKLLLYLMTCPQRAFWGVTMVQLIALIYKDQHVSTLQKLLSLWFeaslsessednesnt 263
IDKLLLYLMTCPQRAFWGVTMVQLIALIYKDQHVSTLQKLLSLWFEASLSESSEDNESNT
Sbjct 3503787 IDKLLLYLMTCPQRAFWGVTMVQLIALIYKDQHVSTLQKLLSLWFEASLSESSEDNESNT 3503608

Query 264 sPPKQGSGDsspmltsdptsdssdngsngrgmgggmregtAATLQEVSRKGQEYQNAMAR 323
SPPKQGSGDSSPMLTSDPTSDSSDNGSNGRGMGGGMREGTAATLQEVSRKGQEYQNAMAR
Sbjct 3503607 SPPKQGSGDSSPMLTSDPTSDSSDNGSNGRGMGGGMREGTAATLQEVSRKGQEYQNAMAR 3503428

Query 324 VPADKPDGSEEASDMTGNDSEQPGSPEQSQPAGESMDDGDYEDQRHRQLNEHGeededed 383
VPADKPDGSEEASDMTGNDSEQPGSPEQSQPAGESMDDGDYEDQRHRQLNEHGEEDEDE
Sbjct 3503427 VPADKPDGSEEASDMTGNDSEQPGSPEQSQPAGESMDDGDYEDQRHRQLNEHGEEDEDEV 3503248

Query 384 e-------------------veeeeYLQLGPASEPLNLTQQPADKVNNTTNPTSSAPQGC 424
VEEEEYLQLGPASEPLNLTQQPADKVNNTTNPTSSAPQGC
Sbjct 3503247 SFYDQCH*NI*QNIVLFQDEVEEEEYLQLGPASEPLNLTQQPADKVNNTTNPTSSAPQGC 3503068

Query 425 LGNEPFKPPPPLPVRASTSAHAQMQKFNESSYASHVSAVKLGQKSPHAGQLQLTKGKCCP 484
LGNEPFKPPPPLPVRASTSAHAQMQKFNESSYASHVSAVKLGQKSPHAGQLQLTKGKCCP
Sbjct 3503067 LGNEPFKPPPPLPVRASTSAHAQMQKFNESSYASHVSAVKLGQKSPHAGQLQLTKGKCCP 3502888

Query 485 QKRECPSSQSELSDCGYGTQVENQESISTSSNDDDgpqgkpqhqkppCNTKPRNKPRTIM 544
QKRECPSSQSELSDCGYGTQVENQESISTSSNDDDGPQGKPQHQKPPCNTKPRNKPRTIM
Sbjct 3502887 QKRECPSSQSELSDCGYGTQVENQESISTSSNDDDGPQGKPQHQKPPCNTKPRNKPRTIM 3502708

Query 545 SPMD 548
SPMD
Sbjct 3502707 SPMD 3502696

 Score = 321 bits (822), Expect = 2e-88, Method: Compositional matrix adjust.
 Identities = 170/170 (100%), Positives = 170/170 (100%), Gaps = 0/170 (0%)
 Frame = -3

Query 613 KVPIDTSHFFWLVTYFLKFAAQLELDMEHIDTILTYDVLSYLTYEGVSLCEQLELNARQE 672
KVPIDTSHFFWLVTYFLKFAAQLELDMEHIDTILTYDVLSYLTYEGVSLCEQLELNARQE
Sbjct 3502126 KVPIDTSHFFWLVTYFLKFAAQLELDMEHIDTILTYDVLSYLTYEGVSLCEQLELNARQE 3501947

Query 673 GSDLKPYLRRMHLVVTAIREFLQAIDTYNKVTHLNEDDKAHLRQLQLQISEMSDlrclfv 732
GSDLKPYLRRMHLVVTAIREFLQAIDTYNKVTHLNEDDKAHLRQLQLQISEMSDLRCLFV
Sbjct 3501946 GSDLKPYLRRMHLVVTAIREFLQAIDTYNKVTHLNEDDKAHLRQLQLQISEMSDLRCLFV 3501767

Query 733 lllrrfNPSIHSKQYLQDLVVTNHILLLILDSSAKLGGCQTIRLSEHITQ 782
LLLRRFNPSIHSKQYLQDLVVTNHILLLILDSSAKLGGCQTIRLSEHITQ
Sbjct 3501766 LLLRRFNPSIHSKQYLQDLVVTNHILLLILDSSAKLGGCQTIRLSEHITQ 3501617

 Score = 257 bits (657), Expect = 3e-68, Method: Compositional matrix adjust.
 Identities = 122/124 (98%), Positives = 122/124 (98%), Gaps = 0/124 (0%)
 Frame = -1

Query 994 ILLDLIIKENKAQHLLWLQRILIECCFVKLTLRSGLKVPEGDHIMEPVAYHCICKQKSIP 1053
ILLDLIIKENKAQHLLWLQRILIECCFVKLTLRSGLKVPEGDHIMEPVAYHCICKQKSIP
Sbjct 3500361 ILLDLIIKENKAQHLLWLQRILIECCFVKLTLRSGLKVPEGDHIMEPVAYHCICKQKSIP 3500182

Query 1054 VVQWNNEQSTTMLYQPFVLLLHKLGIQLPADAGSIFARIPDYWTPETMYGLAKKLGPLDK 1113
VVQWNNEQSTTMLYQPFVLLLHKLGIQLPADAGSIFARIPDYWTPETMYGLAKKLGPLDK
Sbjct 3500181 VVQWNNEQSTTMLYQPFVLLLHKLGIQLPADAGSIFARIPDYWTPETMYGLAKKLGPLDK 3500002

Query 1114 LNLK 1117
LK
Sbjct 3500001 RELK 3499990

...
...
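What I tried
I added the code below to print the values stored in the array @query_pos.
# $#query_pos is the last index, so the condition uses <= to include the last element.
for (my $i = 0; $i <= $#query_pos; $i++) {
    print "Query\t$query_pos[$i]\n";
}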
Query 24
Query 83
Query 84
Query 143
Query 144
Query 203
Query 204
Query 263
Query 264
Query 323
Query 324
Query 383
Query 384
Query 424
Query 425
Query 484
Query 485
Query 544
Query 545
Query 548
Query 613
Query 672
Query 673
Query 732
Query 733
Query 782
Query 994
Query 1053
Query 1054
Query 1113
Query 1114
Query 1117
...
...
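For reference, the first and last values of a Perl array can be read with index 0 and index -1, and $#array gives the last index rather than the number of elements. A small sketch, using a hypothetical array @pos filled with the first few values from the list above:

```perl
use strict;
use warnings;

my @pos = (24, 83, 84, 143);    # hypothetical sample values

my $first      = $pos[0];       # first element
my $last       = $pos[-1];      # last element (negative index counts from the end)
my $last_index = $#pos;         # index of the last element, one less than the count
my $count      = scalar @pos;   # number of elements

print "first=$first last=$last last_index=$last_index count=$count\n";
```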