いつも勉強させて頂いています。
Centos8、Anaconda3.5、Python3.8の環境で
以下のModifyBarcodeInBam_2ndSetBC.pyを使おうとしていますが、
ModifyBarcodeInBam_2ndSetBC.py
1#!/usr/bin/env python 2''' 3Created on May 23, 2019 4 5@author: cervera(modified by Dong) 6''' 7import pysam 8import csv 9import sys 10from re import sub 11 12 13if len(sys.argv) != 4: 14 print ("############################################") 15 print ("\ni.e." ,sys.argv[0], "file.bam barcode.txt output.bam") 16 print ("############################################") 17 sys.exit() 18 19 20def modifyXC(read): 21 ''' replace XC tag, tags is from a pysam.AlignmentFile.tags, returns list of tags ''' 22 tag_value = read.get_tag(tag="XC") 23 if(tag_value not in BCsFromProtocol): 24 for BC in BCsFromProtocol: 25 mism_counter=0 26 for j in range(0,8): 27 if tag_value[j] != BC[j] : 28 mism_counter+=1 29 if mism_counter > 1: 30 break 31 if(mism_counter==1): 32 # this BC has 1 mismatch so we merge it with a known BC 33 read.set_tag(tag="XC", value=BC) 34 break 35 36 37# take the barcodes used in the protocol: BCsFromProtocol 38# ####################################################### 39BCsFromProtocol_filename = sys.argv[2] 40#print ("###########", sys.argv[2]) 41#sys.exit() 42BCsFromProtocol=[] 43with open(BCsFromProtocol_filename, 'r') as f: 44 reader = csv.reader(f, delimiter=';') 45 for row in reader: 46 BCsFromProtocol.append(row[0]) 47 48 49# ######################################## 50# ######################################## 51# ######################################## 52 53if len(sys.argv) == 4: 54 assert sys.argv[1].endswith('.bam') 55 inbamfn = sys.argv[1] 56 outbamfn = sys.argv[3] #sub('.bam$', '_modifiedBC.bam', inbamfn) 57 58 inbam = pysam.AlignmentFile(inbamfn, 'rb',check_sq=False) 59 outbam = pysam.AlignmentFile(outbamfn, 'wb', template=inbam) 60 61 n = 0 62 for read in inbam: 63 modifyXC(read) 64 outbam.write(read) 65 n+=1 66 if n % 100000 == 0: 67 print (n) 68 69 outbam.close() 70 inbam.close() 71 print ("Finished. Reads were written to", outbamfn)
[E::idx_find_and_load] Could not retrieve index file for 'file.bam' 100000 Finished. Reads were written to output.bam
というエラーが出ます。
barcode.txt(index file)は以下のような8文字の配列が入っているテキストです。
CGTCTAAT AGACTCGT GCACGTCA TCAACGAC ATTTAGCG ATACAGAC TGCGTAGG TGGAGCTC TGAATACC TCTCACAC TACTGGTA ACGATAGG GATGTCGA TTACGGGT GAATGAGT …
これが読めていないエラーだと思うのですが、
テキストの記述の仕方が違うのか、何が原因なのかよく分かりません。
エラーの原因、修正箇所を教えて頂けないでしょうか。
お願いします。
<追記>
file.bamは以下のような形です。
BAMファイルの読み方が分からず、
samtools fasta file.bam > file.fa
で変換したものが以下です。
>M01400:201:000000000-B57RM:1:1113:26374:10799 TAGTCCCAGCTGCTCAGGAGGCTGAAGCAGGAGAATTACTTGAACTGAAGCGGAGGTTGCAGTGAGCTGAGATCACGCCACTGCACCTCCAGCCTGGGTAACAAAGAGAGACA >M01400:201:000000000-B57RM:1:1119:26033:11432 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGGAGGTATTCA >M01400:201:000000000-B57RM:1:1107:18890:12462 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGTTTTCAA >M01400:201:000000000-B57RM:1:2108:16617:8847 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGCGTA >M01400:201:000000000-B57RM:1:2102:14973:23007 GCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGT >M01400:201:000000000-B57RM:1:1119:11253:10536 CTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGAGAGGT ... >M01400:201:000000000-B57RM:1:2110:20940:24899 CTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTGGCA ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTT >M01400:201:000000000-B57RM:1:1101:5379:9468 >M01400:201:000000000-B57RM:1:1112:19959:11614 >M01400:201:000000000-B57RM:1:1111:17929:2377 >M01400:201:000000000-B57RM:1:2114:10378:19188 >M01400:201:000000000-B57RM:1:2107:3172:12498 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTAATTTTGCCC >M01400:201:000000000-B57RM:1:1109:7422:9436 >M01400:201:000000000-B57RM:1:1112:25404:3648 >M01400:201:000000000-B57RM:1:1114:10367:23476 CCCTCCCAAAGTTAACTAATCCTCAGAGACTGCAAACAACCTACCTGTGAGCATGCCTTTGATATGCAA >M01400:201:000000000-B57RM:1:2105:21791:8292 CTTCCGATCTCACTTATCTTGACTATAGGAGAGATAAGTG >M01400:201:000000000-B57RM:1:2111:6584:14960 >M01400:201:000000000-B57RM:1:1116:11971:15695 >M01400:201:000000000-B57RM:1:2107:14103:2153 >M01400:201:000000000-B57RM:1:2112:6123:8257 >M01400:201:000000000-B57RM:1:1112:16612:2657 AGTCATCATCTAATGGAATTGCATGGAATCATCATCAAATGGAATCGAATGGAATCAACATCAAATGGAATCAAATGGAATCATTGAACGGA >M01400:201:000000000-B57RM:1:2116:3980:8645 >M01400:201:000000000-B57RM:1:1108:15797:6718 ATGGTGCCCTCTTGCTTCTAAAAAGCAAGACTGCTTTGTAATGGCTGCTCT >M01400:201:000000000-B57RM:1:1116:7506:23225 >M01400:201:000000000-B57RM:1:2113:16155:14842 TGTTCAACTCCCACTTATGAGTGAGAACATGGGGTATTTGGTTTTCTCTTCCTGTGTT >M01400:201:000000000-B57RM:1:1104:23739:8369 GAATTCACTCGGAATAGGAAATCTCAGCCTGATTCTCCCCCAGCATCTCCCTCCCCTATAGAGCAATGCCTGT >M01400:201:000000000-B57RM:1:2108:13627:25167 CTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCC >M01400:201:000000000-B57RM:1:1117:20846:18468 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAA >M01400:201:000000000-B57RM:1:1119:28174:17064 >M01400:201:000000000-B57RM:1:2105:23616:19481 AGTGGCCTGGGTCCCATTCTTTCTGAAGTCTCTCCCTCTCCAGCTCTGGACTGCATCGTCGCCCACAGAGGGGATGCTTCTTATCCGGTGG >M01400:201:000000000-B57RM:1:1102:3493:9712 ATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCAGATCACCTGAAGTCAGGAGTTTGAGACCAGCTTGACCAACATAGTGA >M01400:201:000000000-B57RM:1:2116:22487:4781 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGTAAGATCGGAAAA >M01400:201:000000000-B57RM:1:1118:21295:23935 AGTCCATGGGTTATTCCATTCCATTCAATTAGATAATTCCATTC >M01400:201:000000000-B57RM:1:1110:20805:19431 >M01400:201:000000000-B57RM:1:1116:25365:19844 >M01400:201:000000000-B57RM:1:2105:17405:22459 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTTAGCGG >M01400:201:000000000-B57RM:1:2110:24292:21276 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGAG >M01400:201:000000000-B57RM:1:1115:11693:3270 >M01400:201:000000000-B57RM:1:2104:19336:14696 CTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA >M01400:201:000000000-B57RM:1:2102:27456:10532 ACTAGCCGGGCTTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCA >M01400:201:000000000-B57RM:1:2114:7890:12218 >M01400:201:000000000-B57RM:1:1102:12104:24151 >M01400:201:000000000-B57RM:1:2111:18205:18888 >M01400:201:000000000-B57RM:1:2107:4141:15987 >M01400:201:000000000-B57RM:1:1119:13339:17827 >M01400:201:000000000-B57RM:1:1109:20097:6551 TCTATTTTAATGCATGGCTTCGGATGTTAAGAATACTTAATGAAAATGTCAAAACTTGGCAATTTTCAGCAGAAACTCA >M01400:201:000000000-B57RM:1:2119:24934:23194 >M01400:201:000000000-B57RM:1:2101:13434:14822 >M01400:201:000000000-B57RM:1:1112:27249:14630 CTGCTCTGGAGGCTGCTGCAGTTGTCTGGGTGAGAGACGGTGGTGGCTTGGACCA >M01400:201:000000000-B57RM:1:2102:28197:13960 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGA >M01400:201:000000000-B57RM:1:2105:29041:15461 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGA >M01400:201:000000000-B57RM:1:2119:21220:15826 ATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGA ... >M01400:201:000000000-B57RM:1:2112:8709:10847 >M01400:201:000000000-B57RM:1:2109:15574:3676 >M01400:201:000000000-B57RM:1:2113:23952:23774 >M01400:201:000000000-B57RM:1:2118:18170:22640 >M01400:201:000000000-B57RM:1:2108:20065:19990 AGAAAGAAACATAAAGTAGATGGTGGTTGCCAGGGTTTGGATGGAAGGGGGCATAGGAGTGACTACTAACA >M01400:201:000000000-B57RM:1:1101:10277:16222 ATTTCGCTCGCTCAGCGGGAAGACCTTGGTCAGCGTGAGGGTCTGTGT >M01400:201:000000000-B57RM:1:1101:10348:17597 ATTTCGCTCGCTCAGCGGGAAGACCTTGGTCAGCGTGAGGGTCTGTGT >M01400:201:000000000-B57RM:1:1101:10673:12347 ATTTCGCTCGCTCAGCGGGAAGACCTTGGTCAGCGTGAGGGTCTGTGT >M01400:201:000000000-B57RM:1:1101:10824:16029 ATTTCGCTCGCTCAGCGGGAAGACCTTGGTCAGCGTGAGGGTCTGTG ...
回答1件
あなたの回答
tips
プレビュー