SeqScrub
Cleaning sequences!
Upload a FASTA file to rewrite every header in the cleaned format you specify.
Choose a file
Choose a tree (optional)
Type of sequence content:
Amino acids
Nucleotides
Select header output format (click the field to add items, drag items to rearrange):
Gene information
Species
Common name
Genus
Family
Order
Class
Phylum
Kingdom
Sub Genus
Super Genus
Sub Family
Super Family
Sub Order
Super Order
Sub Class
Super Class
Sub Phylum
Super Phylum
Sub Kingdom
Super Kingdom
Curation options
Remove obsolete sequences
Remove un-mappable sequences
Remove sequences containing:
Remove these characters from header:
Keep original headers - just remove characters from headers:
Don't check databases - just remove characters from headers:
Retain only the first ID from headers with multiple IDs
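Two of the curation options above, removing chosen characters from a header and keeping only the first ID, can be illustrated with a minimal Python sketch. This is not SeqScrub's own code: the function names are hypothetical, and the `;` separator between multiple IDs is an assumption made only for the example.

```python
def strip_characters(header: str, unwanted: str) -> str:
    """Delete every character listed in `unwanted` from the header."""
    # str.maketrans with a third argument maps those characters to None.
    return header.translate(str.maketrans("", "", unwanted))


def first_id_only(header: str, separator: str = ";") -> str:
    """Keep only the text before the first separator.

    The ';' default is an assumed delimiter between IDs, not a
    documented SeqScrub convention.
    """
    return header.split(separator, 1)[0].strip()
```

For instance, `strip_characters(">sp|P12345 [Homo sapiens]", "[]")` drops both brackets and leaves the rest of the header untouched.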
Formatting options
Format UniProt IDs like this:
>tr|A0A1A8UQI7|A0A1A8UQI7_NOTFU
>tr|A0A1A8UQI7
>A0A1A8UQI7
Add this character after ID
Use this character to split gene information:
Use this character to split species name information:
Use this character to split taxonomic / common name:
Change spaces to underscores in header
Add square brackets around species name
Remove internal brackets in species name
If cleaning a tree and the new label contains whitespace, add quotation marks
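As a rough illustration of some of the formatting options above, the following Python sketch reproduces the three UniProt ID formats, the space-to-underscore conversion, and the quoting of tree labels that contain whitespace. It is a sketch under stated assumptions rather than SeqScrub's implementation: the function names and style constants are hypothetical, and the choice of double quotes for tree labels is an assumption.

```python
# Hypothetical names for the three UniProt ID formats listed above.
FULL = "full"                   # >tr|A0A1A8UQI7|A0A1A8UQI7_NOTFU
DB_AND_ACCESSION = "db_acc"     # >tr|A0A1A8UQI7
ACCESSION_ONLY = "accession"    # >A0A1A8UQI7


def format_uniprot_id(header_id: str, style: str = FULL) -> str:
    """Reduce a UniProt-style ID token (db|accession|entry_name) to a format."""
    body = header_id.lstrip(">")
    parts = body.split("|")
    if len(parts) < 2:
        # Not in db|accession|... form, so return it unchanged.
        return ">" + body
    if style == FULL:
        return ">" + body
    if style == DB_AND_ACCESSION:
        return ">" + "|".join(parts[:2])
    if style == ACCESSION_ONLY:
        return ">" + parts[1]
    raise ValueError(f"unknown style: {style!r}")


def spaces_to_underscores(header: str) -> str:
    """Replace every space in the header with an underscore."""
    return header.replace(" ", "_")


def quote_tree_label(label: str, quote: str = '"') -> str:
    """Wrap a tree label in quotation marks only if it contains whitespace.

    The double-quote default is an assumption; some tree formats
    expect single quotes instead.
    """
    if any(ch.isspace() for ch in label):
        return f"{quote}{label}{quote}"
    return label
```

Applied to `>tr|A0A1A8UQI7|A0A1A8UQI7_NOTFU`, the three styles yield the three example headers shown in the list above.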
Cleaned sequences:
Sequences with illegal characters:
Obsolete sequences:
Un-mappable sequences:
Which output fields should be saved?
Select all output
Cleaned sequences
Sequences with illegal characters
Sequences that are obsolete
Sequences that couldn't be cleaned
Cleaned phylogenetic tree
Summary of changes made (.csv)
Summary of changes made (.txt)