SAAINT-DB: a comprehensive structural antibody database for antibody modeling and design
2025-07-09
From: Mabnus

Background

Antibody (Ab) structures and antibody-antigen (Ag) interactions (AAI) are crucial for understanding immune recognition and designing Ab therapeutics. Although existing structural Ab databases, such as PDB, IMGT/3Dstructure-DB, BEID, AgAbDb, SAbDab, etc., provide valuable insights, they still have limitations in data accuracy, completeness, and/or update frequency.

On June 30, 2025, an article titled "SAAINT-DB: A comprehensive structural antibody database for antibody modeling and design" was published in Acta Pharmacologica Sinica. The article introduces SAAINT-parser, a computational workflow for rapid and accurate processing of PDB entries to extract structured antibody and antigen information. It also describes SAAINT-DB, which, as of its May 1, 2025 update, contains 19,128 data entries from 9,757 PDB structures, providing a comprehensive and up-to-date resource. A detailed analysis shows that SAAINT-DB outperforms the widely used SAbDab in data accuracy and completeness. SAAINT-DB also provides nearly twice as many non-redundant, manually curated antibody-antigen binding affinity entries as SAbDab. To support antibody research and the broader scientific community, SAAINT-parser, the SAAINT-DB summary files, the unprocessed PDB structures, and the SAAINT-parser-processed structural models are openly available at https://github.com/tommyhuangthu/SAAINT.

SAAINT-parser workflow

The SAAINT-parser workflow takes PDB IDs as input and extracts paired monoclonal antibodies (Abs), unpaired antibody chains, and antibody-antigen interactions (AAIs) from the corresponding experimentally determined structures. It consists of three main modules that process the PDB-associated FASTA files, mmCIF files, and web content. These modules generate intermediate data, which are then integrated to infer the final results.
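
The summary does not show SAAINT-parser's code. As a minimal sketch of the kind of input handling a FASTA-processing module performs, the Python below parses RCSB-style FASTA headers of the form `>ENTITY|Chains A, B|description|organism` into chain IDs, molecule names, and sequences. The function names, the record layout, and the example entry (placeholder ID 9ZZZ, invented sequence fragments) are hypothetical; the `[auth X]` handling assumes the RCSB convention of appending author chain IDs in brackets.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class FastaRecord:
    entity_id: str     # e.g. "9ZZZ_1" (entry ID + entity number)
    chains: List[str]  # chain IDs from the header (author IDs when given)
    description: str   # molecule name, e.g. "Fab heavy chain"
    organism: str      # source organism with taxonomy ID, may be empty
    sequence: str


def _chain_id(token: str) -> str:
    """Return the author chain ID from 'A[auth H]', or the bare token ('C')."""
    m = re.fullmatch(r"\s*(\S+?)\[auth (\S+?)\]\s*", token)
    return m.group(2) if m else token.strip()


def _make_record(header: str, seq_lines: List[str]) -> FastaRecord:
    fields = header.split("|")
    # Drop the leading "Chain"/"Chains" word, then split the chain list on commas.
    chain_field = re.sub(r"^Chains?\s+", "", fields[1]) if len(fields) > 1 else ""
    chains = [_chain_id(tok) for tok in chain_field.split(",") if tok.strip()]
    return FastaRecord(
        entity_id=fields[0],
        chains=chains,
        description=fields[2] if len(fields) > 2 else "",
        organism=fields[3] if len(fields) > 3 else "",
        sequence="".join(seq_lines),
    )


def parse_rcsb_fasta(text: str) -> List[FastaRecord]:
    """Split a multi-record FASTA string into FastaRecord objects."""
    records: List[FastaRecord] = []
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, seq_lines))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append(_make_record(header, seq_lines))
    return records


# Invented example: one heavy-chain entity present as two copies, one light chain.
EXAMPLE_FASTA = """\
>9ZZZ_1|Chains A[auth H], C|Fab heavy chain|Homo sapiens (9606)
EVQLVESGGGLVQPGG
SLRLSCAAS
>9ZZZ_2|Chain B[auth L]|Fab light chain|Homo sapiens (9606)
DIQMTQSPSSLSAS
"""

if __name__ == "__main__":
    for rec in parse_rcsb_fasta(EXAMPLE_FASTA):
        print(rec.entity_id, rec.chains, rec.description, len(rec.sequence))
```

Grouping chains by entity like this is only a first step; deciding which chains are antibodies, how they pair, and what they bind would still need the structural (mmCIF) information.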

SAAINT-DB Statistics and Analysis

SAAINT-DB contains 19,128 data entries derived from 9,757 PDB entries published between May 19, 1976, and April 30, 2025. The number of entries increased markedly between 2020 and 2023, which may be related to research driven by the global COVID-19 pandemic.

SAAINT-DB defines 29 Ab types; the most common are FabH:FabL, VH:VL, VHH, and scFv, with 9,377, 3,643, 3,283, and 1,377 entries, respectively. These classifications are based on a comprehensive analysis of sequence data, structural features, and PDB annotations, and more accurately reflect the actual structure and function of the antibodies. Within the VH:VL type, this number ranges from as few as 20 to as many as 80 amino acids. VH:VL and scFv entries both consist of one VH and one VL domain, and their PDB-seq lengths cluster around 220 amino acids, which makes the two types difficult to distinguish by sequence alone.
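
Because VH:VL and scFv entries have similar total sequence lengths, a length cutoff alone cannot separate them, whereas chain-level domain composition can: an scFv carries both variable domains on one polypeptide, while VH:VL has them on two separate chains. The Python sketch below illustrates that idea with a few hypothetical rules. It assumes each chain has already been annotated with domain labels in N-to-C order (the label strings are placeholders, not AbRSA's actual output format), and it does not reproduce SAAINT-DB's 29 types or its real decision logic.

```python
from typing import List

Chain = List[str]  # domains of one chain in N-to-C order, e.g. ["VH", "CH1"]


def classify_ab(chains: List[Chain]) -> str:
    """Assign a coarse Ab type from chain-level domain composition.

    Hypothetical, simplified rules for illustration only.
    """
    if len(chains) == 1:
        domains = chains[0]
        # Both variable domains on one polypeptide: single-chain Fv.
        if "VH" in domains and "VL" in domains:
            return "scFv"
        # A lone heavy variable domain with no partner chain. These rules cannot
        # tell a true VHH (nanobody) from an unpaired conventional VH.
        if domains == ["VH"]:
            return "VHH"
        return "other"
    if len(chains) == 2:
        # Put the chain carrying VH first, regardless of input order.
        heavy, light = sorted(chains, key=lambda d: "VH" not in d)
        if heavy == ["VH", "CH1"] and light == ["VL", "CL"]:
            return "FabH:FabL"
        if heavy == ["VH"] and light == ["VL"]:
            return "VH:VL"
    return "other"


if __name__ == "__main__":
    examples = {
        "Fab": [["VH", "CH1"], ["VL", "CL"]],
        "Fv": [["VH"], ["VL"]],
        "scFv": [["VL", "VH"]],
        "nanobody": [["VH"]],
    }
    for name, chains in examples.items():
        print(name, "->", classify_ab(chains))
```

Anything outside these few patterns falls through to "other", which is where a real classifier would need many more rules (and where the ambiguity with engineered or unusually long chains arises).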

Of the 19,128 entries in SAAINT-DB, 14,316 (74.8%) were classified as AAIs, in which an antibody interacts with one or more antigen chains, including proteins, peptides, DNA, and RNA. The most common antigen sources were humans (Homo sapiens), SARS-CoV-2, HIV-1, influenza A virus, and Plasmodium falciparum.

AAIs are remarkably diverse: the number of interface residues (N_ab_ag_infres) varies widely across AAIs, with most falling between 30 and 60. A key structural feature of AAIs is that Abs interact with Ags primarily through their CDR residues. The number of CDR residues at the Ab-Ag interface ranges from 5 to 40, reflecting diverse binding properties. The proportion of interface CDR residues among all interface Ab residues ranges from 25% to 100%, with most above 70%, consistent with the knowledge that most interface Ab residues are located in the CDRs. Because optimizing an antibody's affinity for its target is a critical step in developing antibody drugs, integrating antibody-antigen binding affinity data into SAAINT-DB is important.
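
Interface residues are commonly defined with an interatomic distance cutoff. As a minimal sketch, assuming a 5 Å cutoff (the summary does not state the criterion SAAINT-parser uses), the Python below flags Ab and Ag residues that have any atom pair within the cutoff and computes the fraction of interface Ab residues that lie in the CDRs. The residue keys, toy coordinates, and CDR set are made up, and counting both sides of the interface toward the interface residue total is also an assumption.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Residues = Dict[str, List[Coord]]  # hypothetical key "chain:resnum" -> atom coordinates


def interface_residues(ab: Residues, ag: Residues,
                       cutoff: float = 5.0) -> Tuple[Set[str], Set[str]]:
    """Return (Ab, Ag) residues with at least one atom pair within `cutoff` Å.

    Brute-force all-pairs comparison: fine for a sketch, but whole structures
    would call for a spatial index such as a k-d tree.
    """
    ab_hits: Set[str] = set()
    ag_hits: Set[str] = set()
    for ab_res, ab_atoms in ab.items():
        for ag_res, ag_atoms in ag.items():
            if any(math.dist(a, b) <= cutoff for a in ab_atoms for b in ag_atoms):
                ab_hits.add(ab_res)
                ag_hits.add(ag_res)
    return ab_hits, ag_hits


def cdr_fraction(ab_interface: Set[str], cdr_residues: Set[str]) -> float:
    """Share of interface Ab residues that lie in the CDRs (0.0 if no interface)."""
    if not ab_interface:
        return 0.0
    return len(ab_interface & cdr_residues) / len(ab_interface)


# Toy coordinates in Å, one atom per residue for brevity.
AB = {"H:33": [(0.0, 0.0, 0.0)], "H:75": [(3.0, 0.0, 0.0)], "H:100": [(20.0, 0.0, 0.0)]}
AG = {"A:101": [(4.0, 0.0, 0.0)], "A:200": [(40.0, 0.0, 0.0)]}
CDR = {"H:33", "H:100"}  # hypothetical CDR positions

if __name__ == "__main__":
    ab_if, ag_if = interface_residues(AB, AG)
    print("interface residues (both sides):", len(ab_if) + len(ag_if))
    print("CDR fraction of Ab interface:", cdr_fraction(ab_if, CDR))
```

In this toy case two Ab residues and one Ag residue are in contact, and one of the two Ab interface residues is a (hypothetical) CDR residue, giving a fraction of 0.5.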

SAAINT-DB contains 1,444 non-redundant antibody-antigen binding affinity entries involving 1,331 PDB structures. These data cover a wide range of affinities, from high micromolar to sub-picomolar, with pKD values ranging from 4 to 14 and a median of approximately 8.5.
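
The affinity range above follows from the definition pKD = -log10(KD), with KD in mol/L: pKD 4 corresponds to 100 µM, pKD 14 to 10 fM, and a median pKD of about 8.5 to roughly 3.2 nM. A minimal Python helper for converting between the two (the function names are illustrative):

```python
import math


def pkd_from_kd(kd_molar: float) -> float:
    """pKD = -log10(KD), with KD in mol/L."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return -math.log10(kd_molar)


def kd_from_pkd(pkd: float) -> float:
    """Inverse conversion: KD (mol/L) = 10 ** (-pKD)."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # Endpoints and median of the pKD range reported for SAAINT-DB.
    for pkd in (4.0, 8.5, 14.0):
        print(f"pKD {pkd:4.1f} -> KD = {kd_from_pkd(pkd):.2e} M")
```

Working on the log scale is the usual choice for affinity data because KD values span ten orders of magnitude here, and one pKD unit always corresponds to a tenfold change in KD.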

Comparison of SAAINT-DB with existing antibody databases

Database size and update frequency: Compared with other databases such as IMGT/3Dstructure-DB, AbDb, and SAbDab, SAAINT-DB contains more data entries and PDB structures and is updated more frequently.

Antibody chain pairing accuracy: SAAINT-DB pairs antibody chains more accurately, correctly pairing some structures that are mispaired in SAbDab. For example, in PDB entries 8d01 and 2oqj, SAAINT-parser correctly identifies the HC-LC pairing, whereas SAbDab's pairing results do not match the actual pairing.
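
The summary does not say how SAAINT-parser pairs heavy and light chains. One plausible approach, shown here purely as a hypothetical sketch, is to pair chains by structural contacts rather than by chain labels: given residue-contact counts between each heavy chain (HC) and light chain (LC), greedily accept the highest-count pair whose chains are both still unpaired. The contact counts and the `min_contacts` threshold are invented; greedy matching is not guaranteed to be globally optimal, and an assignment method such as the Hungarian algorithm could be swapped in.

```python
from typing import Dict, List, Tuple


def pair_chains(contacts: Dict[Tuple[str, str], int],
                min_contacts: int = 10) -> List[Tuple[str, str]]:
    """Greedy HC-LC pairing by contact count.

    `contacts` maps (heavy_chain_id, light_chain_id) to the number of residue
    contacts between them. Pairs below `min_contacts` (a hypothetical threshold)
    are treated as incidental, e.g. crystal-packing contacts.
    """
    pairs: List[Tuple[str, str]] = []
    used_h, used_l = set(), set()
    # Highest contact count first; ties broken by chain IDs for determinism.
    for (h, l), n in sorted(contacts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_contacts:
            break
        if h in used_h or l in used_l:
            continue  # each chain joins at most one pair
        pairs.append((h, l))
        used_h.add(h)
        used_l.add(l)
    return sorted(pairs)


# Invented counts for two Fab copies whose chain names do not follow H/L adjacency.
CONTACTS = {("H", "L"): 3, ("H", "B"): 85, ("A", "L"): 90, ("A", "B"): 0}

if __name__ == "__main__":
    print(pair_chains(CONTACTS))
```

Here pairing by name would join H with L, while the contact counts pair H with B and A with L.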

Detailed classification of antibody types: SAAINT-DB provides a finer-grained classification of antibody types and can accurately distinguish types such as Fab, Fv, VHH, and scFv. SAbDab focuses only on the VH/VL domains and provides only a coarse annotation of whether an Ab is an scFv, without a detailed classification of Ab types.

Antigen-antibody interaction data: SAAINT-DB identifies and records more AAI data, including detailed information on antigen sources and interface residues, providing richer data to support the design and optimization of antibody therapeutics.

SAAINT-DB also has some limitations. SAAINT-parser relies on AbRSA for Ab chain labeling, so its accuracy depends on the precision of AbRSA. There is still some ambiguity in the classification of engineered or unusually long Ab chains. SAAINT-parser and SAAINT-DB currently only support proteins, peptides, RNA, and DNA Ags, limiting their applicability to other Ag types, such as carbohydrates and haptens.

Summary

This study introduced SAAINT-parser, an advanced tool for efficiently processing PDB structures and extracting structural Ab information. By applying this tool to the PDB, we constructed a new, comprehensive database of structural Abs. Detailed analysis and comparison with existing databases highlighted the advantages of SAAINT-DB in terms of data completeness, accuracy, and update frequency. However, it still faces several limitations, particularly in terms of its dependency on antibody chain recognition, its ability to handle complex antibody structures, the breadth of antigen types supported, and its user-friendliness. Future research and development are needed to address these issues to further enhance the utility and impact of SAAINT-DB.
