Background / what I want to achieve
>sp|O43236|SEPT4_HUMAN Septin-4 OS=Homo sapiens OX=9606 GN=SEPTIN4 PE=1 SV=1
MDRSLGWQGNSVPEDRTEAGIKRFLEDTTDDGELSKFVKDFSGNASCHPPEAKTWASRPQ
VPEPRPQAPDLYDDDLEFRPPSRPQSSDNQQYFCAPAPLSPSARPRSPWGKLDPYDSSED
DKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEE
RIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWKPVAEYIDQQFEQYFRDE
SGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPPEVD
HKKRKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRV
RGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRL
KERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQKQMKENY
>sp|P01094|IPA3_YEAST Protease A inhibitor 3 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=PAI3 PE=1 SV=1
MNTDQQKVSEIFQSSKEKLQGDAKVVSDAFKKMASQDKDGKTTDADESEKHNYQEQYNKL
KGAGHKKE
>sp|P02686|MBP_HUMAN Myelin basic protein OS=Homo sapiens OX=9606 GN=MBP PE=1 SV=3
MGNHAGKRELNAEKASTNSETNRGESEKKRNLGELSRTTSEDNEVFGEADANQNNGTSSQ
DTAVTDSKRTADPKNAWQDAHPADPGSRPHLIRLFSRDAPGREDNTFKDRPSESDELQTI
QEDSAATSESLDVMASQKRPSQRHGSKYLATASTMDHARHGFLPRHRDTGILDSIGRFFG
GDRGAPKRGSGKDSHHPARTAHYGSLPQKSHGRTQDENPVVHFFKNIVTPRTPPPSQGKG
RGLSLSRFSWGAEGQRPGFGYGGRASDYKSAHKGFKGVDAQGTLSKIFKLGGRDSRSGSP
MARR
I want to output the list above as a dictionary of the following form:
FASTA = {
    "ac1": {"seq": sequence, "composition": {"A": 0, "C": 0, ..., "W": 0}},
    "ac2": {"seq": sequence, "composition": {"A": 0, "C": 0, ..., "W": 0}},
    "ac3": {"seq": sequence, "composition": {"A": 0, "C": 0, ..., "W": 0}},
}
Here ac1, ac2 and ac3 stand for the accession between the | characters (O43236, P01094, P02686, ...). The sequence is the run of letters, i.e. the MDRSLGWQGNSVPEDRTEAGIKRFLEDTTDDGELSKFVKDFSGNASCHPPEAKTWASRPQ
VPEPRPQAPDLYDDDLEFRPPSRPQSSDNQQYFCAPAPLSPSARPRSPWGKLDPYDSSE...... part, and composition holds a count of each of these letters.
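For a single record, the target structure described above could be built roughly as follows. This is only a minimal sketch: `make_entry` and `RESIDUES` are hypothetical names, and it assumes UniProt-style headers (accession between the first two `|`) and the 20 standard amino-acid letters.

```python
from collections import Counter

# The 20 standard amino-acid one-letter codes (assumed alphabet).
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"

def make_entry(header, seq):
    """Return (accession, entry) for one FASTA record.

    header: a header line such as ">sp|P01094|IPA3_YEAST ..."
    seq:    the sequence with line breaks already removed
    """
    accession = header.split("|")[1]
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur, so every residue gets a key.
    composition = {r: counts[r] for r in RESIDUES}
    return accession, {"seq": seq, "composition": composition}

ac, entry = make_entry(">sp|P01094|IPA3_YEAST Protease A inhibitor 3", "MNTDQQKVSEIFQSS")
print(ac, entry)
```

Each accession then maps to its own `seq` and its own `composition`, which is the nesting the question asks for.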
Problem / error message
O43236 {'seq': 'MDRSLGWQGNSVPEDRTEAGIKRFLEDTTDDGELSKFVKDFSGNASCHPPEAKTWASRPQVPEPRPQAPDLYDDDLEFRPPSRPQSSDNQQYFCAPAPLSPSARPRSPWGKLDPYDSSEDDKEYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVNSLFLTDLYRDRKLLGAEERIMQTVEITKHAVDIEEKGVRLRLTIVDTPGFGDAVNNTECWKPVAEYIDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPLDVEFMKALHQRVNIVPILAKADTLTPPEVDHKKRKIREEIEHFGIKIYQFPDCDSDEDEDFKLQDQALKESIPFAVIGSNTVVEARGRRVRGRLYPWGIVEVENPGHCDFVKLRTMLVRTHMQDLKDVTRETHYENYRAQCIQSMTRLKERNRNKLTRESGTDFPIPAVPPGTDPETEKLIREKDEELRRMQEMLHKIQKQMKENY'} P01094 {'seq': 'MNTDQQKVSEIFQSSKEKLQGDAKVVSDAFKKMASQDKDGKTTDADESEKHNYQEQYNKLKGAGHKKE'} P02686 {'seq': 'MGNHAGKRELNAEKASTNSETNRGESEKKRNLGELSRTTSEDNEVFGEADANQNNGTSSQDTAVTDSKRTADPKNAWQDAHPADPGSRPHLIRLFSRDAPGREDNTFKDRPSESDELQTIQEDSAATSESLDVMASQKRPSQRHGSKYLATASTMDHARHGFLPRHRDTGILDSIGRFFGGDRGAPKRGSGKDSHHPARTAHYGSLPQKSHGRTQDENPVVHFFKNIVTPRTPPPSQGKGRGLSLSRFSWGAEGQRPGFGYGGRASDYKSAHKGFKGVDAQGTLSKIFKLGGRDSRSGSPMARR'} P04638 {'seq': 'MKLLAMVALLVTICSLEGALVRRQAAETDVQTLFSQYLQSLTDYGKDLMEKAQPSEIQNQAKAYFQNAQERLTPFVQRTGTNLMDFLSRLMSPEEKPAPAAK'} P04639 {'seq': 'MKAAVLAVALVFLTGCQAWEFWQQDEPQSQWDRVKDFATVYVDAVKDSGRDYVSQFESSTLGKQLNLNLLDNWDTLGSTVGRLQEQLGPVTQEFWANLEKETDWLRNEMNKDLENVKQKMQPHLDEFQEKWNEEVEAYRQKLEPLGTELHKNAKEMQRHLKVVAEEFRDRMRVNADALRAKFGLYSDQMRENLAQRLTEIKNHPTLIEYHTKASDHLKTLGEKAKPALDDLGQGLMPVLEAWKAKIMSMIDEAKKKLNA'} P05221 {'seq': 'MASTVSNTSKLEKPVSLIWGCELNEQDKTFEFKVEDDEEKCEHQLALRTVCLGDKAKDEFNIVEIVTQEEGAEKSVPIATLKPSILPMATMVGIELTPPVTFRLKAGSGPLYISGQHVAMEEDYSWAEEEDEGEAEGEEEEEEEEDQESPPKAVKRPAATKKAGQAKKKKLDKEDESSEEDSPTKKGKGAGRGRKPAAKK'} P09883 {'seq': 
'MSGGDGRGHNTGAHSTSGNINGGPTGIGVSGGASDGSGWSSENNPWGGGSGSGIHWGGGSGRGNGGGNGNSGGGSGTGGNLSAVAAPVAFGFPALSTPGAGGLAVSISASELSAAIAGIIAKLKKVNLKFTPFGVVLSSLIPSEIAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVPMSVPVVDAKPTERPGVFTASIPGAPVLNISVNDSTPAVQTLSPGVTNNTDKDVRPAGFTQGGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNYERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPMAGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSAAQERRKQKENKEKDAKDKLDKESKRNKPGKATGKGKPVGDKWLDDAGKDSGAPIPDRIADKLRDKEFKSFDDFRKAVWEEVSKDPELSKNLNPSNKSSVSKGYSPFTPKNQQVGGRKVYELHHDKPISQGGEVYDMDNIRVTTPKRHIDIHRGK'} P45976 {'seq': 'MSSSEDEDDKFLYGSDSELALPSSKRSRDDEADAGASSNPDIVKRQKFDSPVEETPATARDDRSDEDIYSDSSDDDSDSDLEVIISLGPDPTRLDAKLLDSYSTAATSSSKDVISVATDVSNTITKTSDERLITEGEANQGVTATTVKATESDGNVPKAMTGSIDLDKEGIFDSVGITTIDPEVLKEKPWRQPGANLSDYFNYGFNEFTWMEYLHRQEKLQQDYNPRRILMGLLSLQQQGKLNSANDTDSNLGNIIDNNNNVNNANMSNLNSNMGNSMSGTPNPPAPPMHPSFPPLPMFGSFPPFPMPGMMPPMNQQPNQNQNQNSK'} P9WHN5 {'seq': 'MAQEQTKRGGGGGDDDDIAGSTAAGQERREKLTEETDDLLDEIDDVLEENAEDFVRAYVQKGGQ'} Q06253 {'seq': 'MQSINFRTARGNLSEVLNNVEAGEEVEITRRGREPAVIVSKATFEAYKKAALDAEFASLFDTLDSTNKELVNR'} Q6BBK3 {'seq': 'MSDIQEKIEQARQEAHAISEEKGATSPDAAAAWDAVEELQAEAAHQRQQKSETEPFFGDYCSENPDAAECLIYDD'} Q98XH7 {'seq': 'MEPVDPKLEPWKHPGSQPKTACNNCYCKRCCLHCQVCFMTKGLGIYYGRKKRRQRRRASQDRQTHQDSLSEQ'} Q99LM3 {'seq': 'MEQTEGNSSEDGTTVSPTAGNLETPGSQGIAEEVAEGTVGTSDKEGPSDWAEHLCKAASKSGESGGSPGEASILDELKTDLQGEARGKDEAQGDLAEEKVGKEDTTAASQEDTGKKEETKPEPNEVREKEEAMLASEKQKVDEKETNLESKEKSDVNDKAKPEPKEDAGAEVTVNEAETESQEEADVKDQAKPELPEVDGKETGSDTKELVEPESPTEEQEQGKENESEERAAVIPSSPEEWPESPTDEGPSLSPDGLAPESTGETSPSASESSPSEVPGSPTEPQPSEKKKDRAPERRVSAPSRPRGPRAQNRKAIMDKFGGAASGPTALFRNTKAAGAAIGGVKNMLLEWCRAMTRNYEHVDIQNFSSSWSSGMAFCALIHKFFPEAFDYAELDPAKRRHNFTLAFSTAEKLADCAQLLEVDDMVRLAVPDSKCVYTYIQELYRSLVQKGLVKTKKK'} Q9Y6Q9 {'seq': 
'MSGLGENLDPLASDSRKRKLPCDTPGQGLTCSGEKRRREQESKYIEELAELISANLSDIDNFNVKPDKCAILKETVRQIRQIKEQGKTISNDDDVQKADVSSTGQGVIDKDSLGPLLLQALDGFLFVVNRDGNIVFVSENVTQYLQYKQEDLVNTSVYNILHEEDRKDFLKNLPKSTVNGVSWTNETQRQKSHTFNCRMLMKTPHDILEDINASPEMRQRYETMQCFALSQPRAMMEEGEDLQSCMICVARRITTGERTFPSNPESFITRHDLSGKVVNIDTNSLRSSMRPGFEDIIRRCIQRFFSLNDGQSWSQKRHYQEAYLNGHAETPVYRFSLADGTIVTAQTKSKLFRNPVTNDRHGFVSTHFLQREQNGYRPNPNPVGQGIRPPMAGCNSSVGGMSMSPNQGLQMPSSRAYGLADPSTTGQMSGARYGGSSNIASLTPGPGMQSPSSYQNNNYGLNMSSPPHGSPGLAPNQQNIMISPRNRGSPKIASHQFSPVAGVHSPMASSGNTGNHSFSSSSLSALQAISEGVGTSLLSTLSSPGPKLDNSPNMNITQPSKVSNQDSKSPLGFYCDQNPVESSMCQSNSRDHLSDKESKESSVEGAENQRGPLESKGHKKLLQLLTCSSDDRGHSSLTNSPLDSSCKESSVSVTSPSGVSSSTSGGVSSTSNMHGSLLQEKHRILHKLLQNGNSPAEVAKITAEATGKDTSSITSCGDGNVVKQEQLSPKKKENNALLRYLLDRDDPSDALSKELQPQVEGVDNKMSQCTSSTIPSSSQEKDPKIKTETSEEGSGDLDNLDAILGDLTSSDFYNNSISSNGSHLGTKQQVFQGTNSLGLKSSQSVQSIRPPYNRAVSLDSPVSVGSSPPVKNISAFPMLPKQPMLGGNPRMMDSQENYGSSMGGPNRNVTVTQTPSSGDWGLPNSKAGRMEPMNSNSMGRPGGDYNTSLPRPALGGSIPTLPLRSNSIPGARPVLQQQQQMLQMRPGEIPMGMGANPYGQAAASNQLGSWPDGMLSMEQVSHGTQNRPLLRNSLDDLVGPPSNLEGQSDERALLDQLHTLLSNTDATGLEEIDRALGIPELVNQGQALEPKQDAFQGQEAAVMMDQKAGLYGQTYPAQGPPMQGGFHLQGQSPSFNSMMNQMNQQGNFPLQGMHPRANIMRPRTNTPKQLRMQLQQRLQGQQFLNQSRQALELKMENPTAGGAAVMRPMMQPQVSSQQGFLNAQMVAQRSRELLSHHFRQQRVAMMMQQQQQQQQQQQQQQQQQQQQQQQQQQQQQTQAFSPPPNVTASPSMDGLLAGPTMPQAPPQQFPYQPNYGMGQQPDPAFGRVSSPPNAMMSSRMGPSQNPMMQHPQAASIYQSSEMKGWPSGNLARNSSFSQQQFAHQGNPAVYSMVHMNGSSGHMGQMNMNPMPMSGMPMGPDQKYC'}
composition {'K': 55, 'A': 70, 'C': 16, 'D': 64, 'E': 60, 'F': 34, 'G': 119, 'H': 29, 'I': 43, 'L': 111, 'M': 71, 'N': 94, 'P': 111, 'Q': 146, 'R': 68, 'S': 179, 'T': 63, 'V': 59, 'W': 5, 'Y': 27}
Only one composition is printed, at the very bottom, and it appears to be the count for Q9Y6Q9.
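A likely explanation for this output, based on the posted code: `multi_fasta.update({"composition": composition})` stores one entry under the literal top-level key "composition", and the counting loop reads `multi_fasta[ac]`, where `ac` still holds the last accession left over from the parsing loop. The small reproduction below uses made-up data (`records`, `id1` and `id2` are hypothetical) to show the same effect.

```python
# Two toy records standing in for the parsed FASTA entries.
records = {"id1": {"seq": "AAC"}, "id2": {"seq": "CCC"}}
ac = "id2"  # left over from the parsing loop: the last accession seen

composition = {}
records.update({"composition": composition})  # one top-level key, not one per record

for key in records:
    for residue in "AC":
        composition[residue] = 0        # the same shared dict is reset every time
    for residue in records[ac]["seq"]:  # always the last record, never `key`
        composition[residue] += 1

print(records)
```

The result has a single `composition` next to the records, holding the counts for `id2` only, which matches the pattern in the output above.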
Relevant source code
import sys

file_path=sys.argv[1]

f=open(file_path,"r")
data=f.readlines()
f.close()

fasta={}
multi_fasta={}
composition={}
residues = ["K","A","C","D","E","F","G","H","I","L","M","N","P","Q","R","S","T","V","W","Y"]
for line in data:
    line=line.strip()
    if ">" in line:
        ac=line.split('|')[1]
        #multi_fasta.update({ac:{"seq":""}})
        multi_fasta.update({ac:{"seq":""}})
        # multi_fasta={"ac":ac,"seq":""}
    else:
        multi_fasta[ac]["seq"]+=line

multi_fasta.update({"composition":composition})

for k,v in multi_fasta.items():

    for a in residues:
        composition.update({a:0})
    for a in multi_fasta[ac]["seq"]:
        composition[a]+=1
    multi_fasta.update({"composition":composition})
    print(k,v)
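One possible corrected version is sketched below. It is an illustration rather than the only fix. It assumes UniProt-style headers (`>db|accession|name ...`), uses a hypothetical function name `parse_fasta`, and skips letters outside the 20 standard residues. The two changes from the posted code are that a fresh composition dict is created for every record, and that it is stored inside that record's own entry instead of under a top-level "composition" key.

```python
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard amino acids

def parse_fasta(lines):
    """Build {accession: {"seq": str, "composition": {residue: count}}}."""
    multi_fasta = {}
    ac = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the accession sits between the first two '|'.
            ac = line.split("|")[1]
            multi_fasta[ac] = {"seq": ""}
        elif ac is not None:
            # Sequence line: append to the record opened by the last header.
            multi_fasta[ac]["seq"] += line

    for entry in multi_fasta.values():
        # A new dict per record, so the counts are not shared between records.
        composition = {r: 0 for r in RESIDUES}
        for residue in entry["seq"]:
            if residue in composition:  # skip non-standard letters such as X
                composition[residue] += 1
        entry["composition"] = composition
    return multi_fasta

# Tiny made-up input to show the shape of the result.
sample = [">sp|P1|A_X demo\n", "MKK\n", "A\n", ">sp|P2|B_X demo\n", "WW\n"]
result = parse_fasta(sample)
for accession, entry in result.items():
    print(accession, entry)
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `with open(sys.argv[1]) as f: fasta = parse_fasta(f)` after `import sys`.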
2019/12/24 07:41