MA_9270965g0010 PF00010 Helix-loop-helix DNA-binding domain
MA_10437060g0010 PF00082;PF05922 Subtilase family;Peptidase inhibitor I9
Here is the awk script:
# Read the file line by line, using tab as the field separator
awk 'BEGIN{FS="\t"}{
    # Split the second and third columns on ";"
    second_field_array_length=split($2,second_field_array,";");
    third_field_array_length=split($3,third_field_array,";");
    concat_str="";
    # Loop over the second-column array and pair each element with the
    # third-column element at the same position
    for(i=1;i<=second_field_array_length;i++){
        # Prepend ";" for every element after the first
        if(i>1){
            concat_str=concat_str";"second_field_array[i]"-"third_field_array[i]
        }else{
            concat_str=second_field_array[i]"-"third_field_array[i]
        }
    }
    print $1,concat_str
}' Pabies1.0-Pfam-update.txt | head
Here is the output:
MA_9270965g0010 PF00010-Helix-loop-helix DNA-binding domain
MA_10437060g0010 PF00082-Subtilase family;PF05922-Peptidase inhibitor I9
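The script above assumes that every row has the same number of Pfam IDs and descriptions. As a minimal sketch of a more defensive variant, the same logic can be wrapped in a shell function that skips rows where the two counts differ and reports them on stderr. The function name `merge_pfam`, the tab output separator, and the skip-on-mismatch behaviour are assumptions made for illustration and are not part of the original script.

```shell
# Hypothetical wrapper (the name is illustrative): pairs each Pfam ID in
# column 2 with the description at the same position in column 3.
merge_pfam() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }   # assumption: tab-separated output
  {
    n_ids   = split($2, ids, ";")
    n_names = split($3, names, ";")
    # Assumption: rows with mismatched counts are reported and skipped
    if (n_ids != n_names) {
      printf "line %d: %d IDs but %d descriptions\n", NR, n_ids, n_names > "/dev/stderr"
      next
    }
    out = ""
    for (i = 1; i <= n_ids; i++)
      out = out (i > 1 ? ";" : "") ids[i] "-" names[i]
    print $1, out
  }' "$@"
}

# Try it on the two sample rows, typed here with explicit tabs
printf 'MA_9270965g0010\tPF00010\tHelix-loop-helix DNA-binding domain\nMA_10437060g0010\tPF00082;PF05922\tSubtilase family;Peptidase inhibitor I9\n' | merge_pfam
```

It can also be run on the file directly, for example `merge_pfam Pabies1.0-Pfam-update.txt | head`.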