Question Details

No question body available.

Tags

awk sed bioinformatics fastq

Answers (8)

July 4, 2025 Score: 4 Rep: 34,338 Quality: Medium Completeness: 60%

With sed, supposing you want to modify SRR111497061.fastq :

sed -E -e "s=[0-9]+/==" -i SRR111497061.fastq

Example of execution on my Pi 5 (Debian bookworm)

bruno@raspberrypi:/tmp $ cat SRR111497061.fastq
@SRR11149706.16630586 16630586/1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 16630587/1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA
bruno@raspberrypi:/tmp $
bruno@raspberrypi:/tmp $ sed -E -e "s=[0-9]+/==" -i SRR111497061.fastq
bruno@raspberrypi:/tmp $
bruno@raspberrypi:/tmp $ cat SRR111497061.fastq
@SRR11149706.16630586 1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA
bruno@raspberrypi:/tmp $

If you do not want to modify SRR111497061.fastq, remove the option -i and maybe redirect the output into the expected result file.
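A minimal sketch of that non-destructive variant. The function name `strip_first_number` and the output name `cleaned.fastq` are made up for illustration, and the sample record is shortened:

```shell
# Hypothetical helper: the same substitution as above, but reading stdin and
# writing stdout, so the input file itself is never modified.
strip_first_number() {
  sed -E -e 's=[0-9]+/=='
}

# Usage sketch (cleaned.fastq is an assumed output name):
#   strip_first_number < SRR111497061.fastq > cleaned.fastq

# Quick check on a shortened, made-up record:
printf '%s\n' '@SRR11149706.16630586 16630586/1' 'CCCAACAACAAC' | strip_first_number
```

Keeping the original file around makes it easy to diff the result before replacing anything.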


Above I assumed there is only one occurrence of a number followed by / to remove per line. If you want to remove all the occurrences on each line:

sed -E -e "s=[0-9]+/==g" -i SRR111497061.fastq

The title of your question mentions two special characters, but you only talk about /.

If number/ must be removed only on lines starting with @:

sed -E -e "/^@/ s=[0-9]+/==" -i SRR111497061.fastq

Of course, replace @ with @SRR11149706 or similar if needed, and add g as before to remove all occurrences of number/ on each selected line rather than just the first one.
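A small sketch combining the address and the g flag. The helper name and the four-line sample are invented for the demonstration; the last line stands in for a quality string that happens to contain a digit followed by /, which is one reason the restriction to @ lines can matter:

```shell
# Hypothetical helper: strip every "digits/" run, but only on lines
# that start with "@".
strip_on_headers() {
  sed -E -e '/^@/ s=[0-9]+/==g'
}

# Made-up sample: a header with two "digits/" runs, a sequence line,
# a separator, and a quality-like line containing "5/".
printf '%s\n' '@read.1 12/34/1' 'ACGT' '+' 'II5/' | strip_on_headers
```

Only the first line changes (to `@read.1 1`); the line containing `5/` passes through untouched because it does not start with @.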

July 5, 2025 Score: 3 Rep: 59,413 Quality: Medium Completeness: 50%

This might work for you (GNU sed):

sed -E '\#^@[^/]*/.$#s#\S+/##' file

Look for a line that starts with @ and has a / immediately before its last character.

Then remove the run of non-space characters before that /, along with the / itself.

N.B. The \#...# form replaces the usual /.../ address delimiters and allows a literal / in the search regex. Of course the / could have been escaped instead, but this is perhaps more elegant than /^@[^\/]*\/.$/, and the subsequent substitution uses the same # delimiters.
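To make the delimiter point concrete, here is a sketch of both spellings side by side. It assumes GNU sed (for `\S`) and that the intended address is "starts with @, no earlier /, then / and one final character"; the function names are made up:

```shell
# Address written with a custom delimiter: \# opens it, # closes it,
# so the literal / inside the regex needs no escaping.
with_hash_delims() {
  sed -E '\#^@[^/]*/.$#s#\S+/##'
}

# Same address with the default / delimiters: every literal / is escaped.
with_slash_delims() {
  sed -E '/^@[^\/]*\/.$/s#\S+/##'
}

sample='@SRR11149706.16630586 16630586/1'
printf '%s\n' "$sample" | with_hash_delims
printf '%s\n' "$sample" | with_slash_delims
```

Both calls should print the same header with `16630586/` removed; lines that do not match the address are passed through unchanged.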

July 4, 2025 Score: 2 Rep: 12,848 Quality: Medium Completeness: 100%

Use this Perl one-liner:

perl -pe 's{\s+\d+/}{ }' infile.fastq > outfile.fastq

or modify the file in-place:

perl -i.bak -pe 's{\s+\d+/}{ }' infile.fastq

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.

s{PATTERN}{REPLACEMENT} : Replace regex PATTERN with REPLACEMENT.

\s+\d+/ : 1 or more whitespace characters, followed by 1 or more digits, followed by a literal /.

July 5, 2025 Score: 2 Rep: 38,522 Quality: Medium Completeness: 60%

I would use GNU AWK for this task in the following way. Let the content of file.txt be

@SRR11149706.16630586 16630586/1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 16630587/1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA

then

awk '/\//{sub(/^[0-9]+\//,"",$NF)}{print}' file.txt

gives output

@SRR11149706.16630586 1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA

Explanation: for each line containing a slash (note that it must be escaped, as otherwise it would be mistaken for the regular expression terminator), I replace one or more leading digits followed by a slash with the empty string in the last field. I print every line. Disclaimer: this solution assumes your fields are separated by exactly one SPACE character.

(tested in GNU Awk 5.3.1)
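
The single-space disclaimer comes from awk rebuilding the line when a field is modified. If that matters, one possible workaround (a sketch, not part of the tested answer above; the helper name is made up) is to substitute in the whole record on header lines instead:

```shell
# Hypothetical helper: edit the whole record ($0) rather than a field, so awk
# does not rebuild the line and the original spacing is preserved.
strip_in_record() {
  awk '/^@/ { sub(/[0-9]+\//, "") } { print }'
}

# Made-up header with TWO spaces between the fields:
header='@SRR11149706.16630586  16630586/1'

# Field-based version from above: the two spaces collapse to one.
printf '%s\n' "$header" | awk '/\//{sub(/^[0-9]+\//,"",$NF)}{print}'

# Record-based sketch: the two spaces are kept.
printf '%s\n' "$header" | strip_in_record
```

The trade-off is that the record-based version removes the first digits-slash run anywhere in the header, not specifically the one at the start of the last field.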

July 4, 2025 Score: 2 Rep: 114,432 Quality: Low Completeness: 50%

Sorry, I don't really know AWK (and I got dizzy from the 5 pages of info awk :-) )

But this can also be achieved with a Python one-liner, although it is a bit more verbose: reading from stdin (other than line by line) and regexps are not Python built-ins, and regexps are not special-cased in the language, so they require some quoting.

After adding these, it simply works, and you can type this at the shell:

 cat input.fastq| python -c 'import sys,re; print(re.sub(r"^(@[A-Z 0-9 .]+\s)(\d+)(\/.*)", r"\1\3", sys.stdin.read(), flags=re.MULTILINE))' >output.fastq

What I am doing here: I am using Python's re.sub, which simply returns the input unchanged wherever there is no match. For matching lines, it breaks the line into three groups and then replaces them with the first and the last combined, dropping the second group, which holds the digits you want to drop.

July 5, 2025 Score: 2 Rep: 209,490 Quality: Low Completeness: 50%

Using any sed:

$ sed 's:[0-9]*/::' SRR111497061.fastq
@SRR11149706.16630586 1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA

or any awk:

$ awk '{sub("[0-9]+/","")} 1' SRR111497061.fastq
@SRR11149706.16630586 1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA

July 8, 2025 Score: 1 Rep: 1,027 Quality: Low Completeness: 80%

You can use Raku/Sparrow for that, it's quite simple, given input data inside data.txt file:

task.bash

cat data.txt

task.check

~regexp: (\S+) \s+ (\d+) "/" (.*)

code:

August 20, 2025 Score: 1 Rep: 3,100 Quality: Low Completeness: 50%

awk half-liner, using the outcome of a regex match as an exponent:

echo '
@SRR11149706.16630586 16630586/1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 16630587/1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA' |
awk '(NF != 2)^(/^@/) || NF = NF' FS=' [0-9]+[/]'

@SRR11149706.16630586 1
CCCAACAACAACAACAGCAACCTCCTCACGCCAACGCCGATCCCGCCGCTGTTTTCCAA

@SRR11149706.16630587 1
CAAAGCACCAGGTGCAGTGCACCTTGTCCGTCGGTCTGAATATCTGCTCTCTGTTCTCCA
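
For readers puzzling over the exponent: anything raised to the power 0 is 1, so lines that do not start with @ always print as-is, while for @ lines the value is just (NF != 2). Only an @ line that splits into exactly two fields falls through to NF = NF, which rebuilds it with a single space. A more explicit spelling of what I believe is the same logic (a sketch with a made-up function name, not the answerer's code):

```shell
# FS is the " digits/" run itself, so a matching header splits into exactly
# two fields; assigning $1 = $1 rebuilds the line with a single space (OFS).
strip_explicit() {
  awk -F' [0-9]+[/]' '/^@/ && NF == 2 { $1 = $1 } { print }'
}

printf '%s\n' '@SRR11149706.16630586 16630586/1' 'CCCAACAACAAC' | strip_explicit
```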