Here is part of my data. I already have an analysis script, but it processes samples one at a time; going through all 600 would take half a year. I need a loop that can run 30 at once, but the sample IDs are not contiguous. Could someone help me with such a loop?
#!/usr/bin/bash
doc=/data/re-sequencedata/cleandata
Ref=/data/re-sequencedata/mappingZ/Reference/hic.fa
FM_index=/data/re-sequencedata/mappingZ/index/hic
gatkdoc=/home/yuning/software/gatk-4.1.9.0/gatkdoc
t=20
p_Xmx=50g
tmpdir="tmpdir"
bam_dir="."
G_Xmx=50g
# sample IDs are read from stdin, one per line: bash this_script.sh < id_list
while read id
do
di=${id}
echo "===========processing ${di} file============="
gunzip -c ${doc}/${di}_1_clean.fq.gz >${di}_1_clean.fq
gunzip -c ${doc}/${di}_2_clean.fq.gz >${di}_2_clean.fq
grep "^@A00" ${di}_1_clean.fq|cut -d ':' -f1-4|sort|uniq >${di}.RG.table &&
while read rg
do
affix=$(echo $rg | tr ":" "-")
# --no-group-separator keeps GNU grep from inserting "--" lines between
# non-adjacent matches, which would corrupt the split fastq files
grep -A 3 --no-group-separator ${rg} ${di}_1_clean.fq >${di}-${affix}_1_clean.fq
grep -A 3 --no-group-separator ${rg} ${di}_2_clean.fq >${di}-${affix}_2_clean.fq
done <${di}.RG.table
rm ${di}_1_clean.fq ${di}_2_clean.fq &&
while read rg
do
affix=$(echo $rg | tr ":" "-")
bwa mem -t $t -M -R "@RG\tID:${affix}\tSM:${di}\tPL:illumina" $FM_index ${di}-${affix}_1_clean.fq ${di}-${affix}_2_clean.fq >${di}-${affix}.sam && rm ${di}-${affix}_*.fq
done <${di}.RG.table
# collect every per-read-group SAM as an I= argument and pass them all to one MergeSamFiles call
ls -v ${di}-*.sam|xargs -I [] echo "I="[]|xargs -L 1000000 \
java -Xmx${p_Xmx} -jar /home/yuning/software/picard/build/libs/picard.jar MergeSamFiles \
O=${di}.merged.sorted.bam \
SORT_ORDER=coordinate \
CREATE_INDEX=true \
VALIDATION_STRINGENCY=LENIENT \
REFERENCE_SEQUENCE=/data/re-sequencedata/mappingZ/Reference/hic.fa \
TMP_DIR=${tmpdir} 2>${di}.merge.log \
&& rm ${di}-*.sam &&
java -Xmx${p_Xmx} -jar /home/yuning/software/picard/build/libs/picard.jar MarkDuplicates \
I=${di}.merged.sorted.bam \
O=${di}.merged.sorted.dedup.bam \
M=${di}.Marked_dup_metrics.txt \
CREATE_INDEX=true \
VALIDATION_STRINGENCY=LENIENT \
REFERENCE_SEQUENCE=/data/re-sequencedata/mappingZ/Reference/hic.fa \
TMP_DIR=${tmpdir} 2>${di}.markdup.log \
&& rm ${di}.merged.sorted.bam* &&
if [ ! -d "${tmpdir}/${di}_h" ];then
mkdir -p ${tmpdir}/${di}_h
else
rm -r ${tmpdir}/${di}_h/*
fi
gatk --java-options "-Djava.io.tmpdir=${tmpdir}/${di}_h -Xmx${G_Xmx}" \
HaplotypeCaller \
-ERC GVCF \
-R /data/re-sequencedata/mappingZ/Reference/hic.fa \
-I ${di}.merged.sorted.dedup.bam \
-O ${di}.gvcf.gz \
1>${di}.gvcf.log 2>&1 \
&& rm -r ${tmpdir}/${di}_h
done
The script above is what I use for the analysis, taken from https://blog.csdn.net/Gossie/article/details/109296315. It does work, but it analyses one sample at a time, which takes very long. I am not very good with loops, so any help would be greatly appreciated. Also, how can I give bwa, picard and gatk their own temporary folders? The script sometimes stops running with an out-of-memory error.
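Since the sample IDs are not contiguous, the simplest approach is to put them in a plain list file and let `xargs -P` schedule the jobs. Below is a minimal sketch, assuming the body of your `while read id` loop has been moved into a separate per-sample script (here hypothetically called `run_one.sh`) that takes one sample ID as its only argument; `id_list` and `run_batch` are also names I made up for illustration:

```shell
#!/usr/bin/env bash
# run_batch LISTFILE JOBS CMD...
# Reads sample IDs (one per line) from LISTFILE and runs "CMD... ID" for
# each, keeping up to JOBS jobs running at the same time. As soon as one
# sample finishes, the next ID from the list is started, so gaps in the
# numbering do not matter at all -- the list file is the only index.
run_batch() {
  local list=$1 jobs=$2
  shift 2
  # -n 1: one ID per command; -P: max concurrent jobs (GNU/BSD xargs)
  xargs -n 1 -P "$jobs" "$@" < "$list"
}

# Intended use (run_one.sh is hypothetical -- it holds your current loop body):
#   run_batch id_list 30 bash run_one.sh
```

One caveat: your script already gives bwa `-t 20` threads, so 30 samples at once could mean up to 600 bwa threads plus 30 Java processes each asking for 50g of heap. You will likely want to lower `-t`, `-Xmx`, or the job count to fit your machine.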
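On the temporary folders: bwa streams its output and has no temp-dir option; it is the Java tools (Picard, GATK) that spill intermediate files to the Java temp directory. Pointing each sample at its own scratch directory on a large disk is usually enough. A sketch, where the `/data/tmp` base path, the `TMP_BASE` variable, and the `make_tmp` helper are all hypothetical names of my own:

```shell
#!/usr/bin/env bash
# make_tmp SAMPLE -- create and print a per-sample scratch directory.
# The base directory can be overridden with TMP_BASE (hypothetical variable).
make_tmp() {
  local base=${TMP_BASE:-/data/tmp}
  mkdir -p "${base}/$1"
  printf '%s\n' "${base}/$1"
}

# Inside the per-sample pipeline you would then use it like this
# (paths abbreviated; TMP_DIR= is Picard's option, --tmp-dir is GATK4's):
#   tmp=$(make_tmp "$di")
#   java -Xmx50g -Djava.io.tmpdir="$tmp" -jar picard.jar MarkDuplicates \
#       TMP_DIR="$tmp" ...
#   gatk --java-options "-Djava.io.tmpdir=$tmp -Xmx50g" HaplotypeCaller \
#       --tmp-dir "$tmp" ...
#   rm -r "$tmp"   # clean up when the sample is done
```

Note that a bigger temp directory only helps when the error is about disk space ("No space left on device"). If the failure is a genuine Java heap `OutOfMemoryError`, the fix is raising `-Xmx` or running fewer samples concurrently.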