Python pipeline using GNU Parallel


I'm trying to write a wrapper around GNU Parallel in Python to run a command in parallel, but I seem to be misunderstanding how GNU Parallel works, how system pipes work, and/or how Python subprocess pipes work.

Essentially, I'm looking to use GNU Parallel to handle splitting an input file and running a command in parallel on multiple hosts.

I may investigate a pure Python approach in the future, but it seems like this should be doable with GNU Parallel.

t.py

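```python
#!/usr/bin/env python3
import sys

# Wrap whatever arrives on stdin in blank lines so chunk boundaries are visible.
print()
print(sys.stdin.read())
print()
```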

p.py

```python
from subprocess import *
import os
from os.path import *

args = ['--block', '10', '--recstart', '">"', '--sshlogin', '3/:', '--pipe', './t.py']

infile = 'test.fa'

# Write a small FASTA test file: one header line and one sequence line per record.
fh = open(infile, 'w')
fh.write('''>m02261:11:000000000-adwj7:1:1101:16207:1115 1:n:0:1
cagctactcggggaatccttgttgctgagctcttcccttttcgctcgcagctactcggggaatccttgttgctgagctcttcccttttcgctcgcagctactcggggaatccttgttgctgagctcttcccttttcgctcgcagctactcggggaatccttgttgctgagctcttcccttt
>m02261:11:000000000-adwj7:1:1101:21410:1136 1:n:0:1
atagtagatagggacatagggaatctcgttaatccattcatgcgcgtcactaattagatgacgaggcatttggctaccttaagagagtcatagttactcccgccgtttacc
>m02261:11:000000000-adwj7:1:1101:13828:1155 1:n:0:1
ggtttagagtctctagtcgatagatcaatgtaggtaagggaagtcggcaaattagatccgtaacttcgggataaggattggctctgaaggctgggatgactcgggctctggtgccttcgcgggtgctttgcctcaacgcgcgccggccggctcgggtggtttgcgccgcctgtggtcgcgtcggccgctgcagtcatcaataaacagccaattcagaactggcacggctgagggaatccgacggtctaattaaaacaaagcattgtgatggactccgcaggtgttgacacaatgtgatttt
>m02261:11:000000000-adwj7:1:1101:14120:1159 1:n:0:1
gagtagctgcgagcgaaaagggaagagctcaaggggaggaaaagaaactaacaaggattccccgagtagctgcgagcgaaaagggaagcgcccaaggggggcaacaggaactaacaagaattcgccgactagctgcgacctgaaaaggaaaaacccaaggggaggaaaagaaactaacaaggattccccgagtagctgcgagcagaaaaggaaaagcacaagaggaggaaacgacactaataagacttcccatacaagcggcgagcaaaacagcacgagcccaacggcgagaaaagcaaaa
>m02261:11:000000000-adwj7:1:1101:8638:1172 1:n:0:1
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
''')
fh.close()

# call 1: parallel reads the file directly on stdin
Popen(['parallel'] + args, stdin=open(infile, 'rb', 0), stdout=open('output', 'w')).wait()

# call 2: parallel reads from a cat process through a pipe
_cat = Popen(['cat', infile], stdout=PIPE)
Popen(['parallel'] + args, stdin=_cat.stdout, stdout=open('output2', 'w')).wait()

# call 3: the whole pipeline is handed to the shell as one string
Popen('cat ' + infile + ' | parallel ' + ' '.join(args), shell=True, stdout=open('output3', 'w')).wait()
```

Call 1 and call 2 produce the same output, while call 3 produces the output I expect: the input file is split, and the output contains empty lines between records.

I'm mostly curious about the difference between calls 1 and 2 on one hand and call 3 on the other.

tl;dr: Don't quote ">" when shell=False.

If you use shell=True, you can use the shell's facilities, such as globbing, I/O redirection, etc. You need to quote anything that needs to be escaped from the shell. You can pass the entire command line as a single string, and the shell will parse it.

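```python
import subprocess

# The shell performs command substitution, quote removal, globbing and redirection.
unsafe = subprocess.Popen('echo `date` "my files" * >output', shell=True)
```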
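A minimal sketch of the shell=True side, assuming a POSIX /bin/sh: the shell strips the quotes around ">" before echo ever sees it, and when part of the command line comes from data, the standard library's `shlex.quote` can do the escaping instead of hand-written quotes. The filename below is made up for illustration.

```python
import shlex
import subprocess

# /bin/sh parses the string: the quotes are removed, echo just sees '>' and 'my files'.
out = subprocess.run('echo ">" "my files"', shell=True,
                     capture_output=True, text=True).stdout
print(out, end='')   # > my files

# Hypothetical untrusted name: shlex.quote keeps it a single, inert argument.
name = 'my files; echo pwned'
cmd = 'echo ' + shlex.quote(name)
out2 = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
print(out2, end='')  # my files; echo pwned
```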

With shell=False, there are no "secret" side effects behind the scenes, and none of the shell's facilities are available to you. You need to take care of globbing, redirection, etc. on the Python side. On the plus side, you save a (potentially significant) process, you have more control, and you don't need to (and indeed mustn't) quote things you had to quote when the shell was involved. In summary, this is safer, because you can see exactly what you are doing.

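```python
import subprocess
import time
from glob import glob

cmd = ['echo']
cmd.append(time.strftime('%c'))  # the date, computed in Python instead of `date`
cmd.append('my files')  # notice the absence of shell quotes around the string
cmd.extend(glob('*'))   # expand the wildcard ourselves
safer = subprocess.Popen(cmd, shell=False, stdout=open('output', 'w+'))
```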

(This still differs slightly, because in modern shells echo is a builtin, whereas now you are executing the external utility /bin/echo, or whichever executable with that name comes first in your PATH.)
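To see exactly what a child process receives with shell=False, here is a small sketch that sidesteps the builtin-versus-external echo question by using the current Python interpreter (`sys.executable`) as a stand-in that prints its own argument list. Nothing is globbed, and the double quotes survive as part of the argument:

```python
import subprocess
import sys

# Stand-in for echo: the child prints the arguments it was given as a Python list.
show_argv = [sys.executable, '-c', 'import sys; print(sys.argv[1:])']

# No shell involved: '*' is not expanded and '">"' keeps its double quotes.
out = subprocess.run(show_argv + ['*', '">"'],
                     capture_output=True, text=True).stdout
print(out, end='')  # ['*', '">"']
```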

Now, returning to your examples: the problem is that your args quote the literal ">" record separator. When the shell is involved, an unquoted right angle bracket invokes redirection, so to pass it as a string it has to be escaped or quoted. When there is no shell in the picture, nothing handles (or requires) the quotes, so to pass a literal > as an argument, simply pass it literally.
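The difference between the calls can be reproduced without running parallel at all. This sketch uses `shlex.split` as an approximation of how the shell splits the call 3 string into words; it shows that the shell turns `'">"'` back into a bare `>`, while calls 1 and 2 hand parallel the quotes as well. The `literal` list is the corrected argument list for shell=False.

```python
import shlex

# Args as written in the question: the double quotes are part of the string.
quoted = ['--block', '10', '--recstart', '">"', '--sshlogin', '3/:', '--pipe', './t.py']
# Corrected for shell=False: pass the record separator literally.
literal = ['--block', '10', '--recstart', '>', '--sshlogin', '3/:', '--pipe', './t.py']

# Call 3 joins the quoted list into one shell string; word splitting removes the quotes.
shell_line = 'parallel ' + ' '.join(quoted)
print(shlex.split(shell_line)[1:] == literal)  # True

# Calls 1 and 2 skip the shell, so parallel's --recstart value is '">"', quotes included.
print(quoted[3])  # ">"

# If a shell string is ever needed, build it from the literal list (Python 3.8+).
print(shlex.join(['parallel'] + literal))
```

With the `literal` list, call 1 becomes `Popen(['parallel'] + literal, stdin=open(infile, 'rb', 0), stdout=open('output', 'w')).wait()`.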

With that out of the way, call #1 seems like the way to go. (Though I'm not entirely convinced it's sane to write a Python wrapper for a shell command implemented in Perl. I suspect juggling a bunch of parallel child processes directly in Python would not be much more complicated.)
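The pure-Python route could look roughly like the following sketch. It runs only on the local machine (there is no equivalent of `--sshlogin`), splits on lines that start with `>`, and groups a fixed number of records per chunk rather than using parallel's size-based `--block`; the function names, chunk size and job count are hypothetical choices.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def split_records(text, recstart='>'):
    """Split FASTA-like text into records that each begin with recstart."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(recstart) and current:
            records.append(''.join(current))
            current = []
        current.append(line)
    if current:
        records.append(''.join(current))
    return records


def run_chunk(cmd, chunk):
    """Feed one chunk to cmd on stdin and return what it wrote to stdout."""
    result = subprocess.run(cmd, input=chunk, capture_output=True,
                            text=True, check=True)
    return result.stdout


def run_pipeline(cmd, text, records_per_chunk=2, jobs=3):
    """Run cmd once per chunk of records, up to `jobs` at a time."""
    records = split_records(text)
    chunks = [''.join(records[i:i + records_per_chunk])
              for i in range(0, len(records), records_per_chunk)]
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in the same order as the chunks.
        return list(pool.map(lambda chunk: run_chunk(cmd, chunk), chunks))


if __name__ == '__main__':
    fasta = '>r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n'
    # Inline stand-in for ./t.py: wraps each chunk in blank lines.
    t_py = [sys.executable, '-c',
            'import sys; print(); print(sys.stdin.read()); print()']
    for out in run_pipeline(t_py, fasta):
        print(out, end='')
```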
