Academic Publication: The Pfam protein families database: embracing AI/ML
Research Abstract & Technology Focus
The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
Correlated Market Trend: Database
Bridging academia to market: 60-day public search velocity for the core technology of this paper. The dashed line represents the 7-day moving average.
AI Semantic Synergy Context
Connecting this academic literature to real-world market discussions and products.
Foundation models in bioinformatics
With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinformatics and has successfully addressed many historic...
AlphaFold hits ‘next level’: the AI tool now includes protein pairing
The database of 200 million protein-structure predictions now includes homodimers, adding new biological relevance.
Bilingual language model for protein sequence and structure
Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein struct...
InterPro: the protein sequence classification resource in 2025
InterPro (https://www.ebi.ac.uk/interpro) is a freely accessible resource for the classification of protein sequences into families. It integrates predictive models, known a...
AI-driven multi-omics integration for multi-scale predictive modeling of genotype-environment-phenotype relationships
Frequently Asked Questions (FAQ)
Curated market intelligence mapped to this research.
What is the core focus of the research titled 'The Pfam protein families database: embracing AI/ML'?
This paper focuses on: The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describe...
Are there open-source GitHub repositories related to The Pfam protein families database: embracing AI/ML?
Yes, open-source projects like t8y2/dbx (15MB, lightweight, cross-platform database client. Supports MySQL, PostgreSQL, SQLite, Redis, MongoDB, DuckDB, ClickHouse, SQL Server and more.) are actively building upon these concepts.
Which startups are commercializing the technology behind The Pfam protein families database: embracing AI/ML?
Products like HelixDB are bringing this to market. Their focus is: an open-source OLTP graph-vector database built in Rust.
What other academic literature is closely related to 'The Pfam protein families database: embracing AI/ML'?
Highly correlated activity was mapped. An entry titled 'Foundation models in bioinformatics' discusses this: With the adoption of foundation models (FMs), artificial intelligence (AI) has become increasingly significant in bioinform...
Are there commercial applications of 'The Pfam protein families database: embracing AI/ML' in market news publications?
Yes, highly correlated activity was mapped. An entry titled 'AlphaFold hits ‘next level’: the AI tool now includes protein pairing' discusses this: The database of 200 million protein-structure predictions now includes homodimers, adding new biological relevance.
Commercial Realization
Startups and open-source tools heavily associated with the concepts explored in this paper.
- GitHub: t8y2/dbx
- GitHub: mark9-droid/TomodachiPC
- Product Hunt: HelixDB
- Product Hunt: Actian VectorAI DB