Input files

GPathFinder requires three mandatory input files, which must be prepared in advance:

A .yaml file with the configuration of the calculation.
A .mol2 file containing the 3D structure of the ligand molecule.
A .mol2 file containing the 3D structure of the receptor molecule.

If you plan to minimize the sample structures generated by Normal Mode Analysis (NMA) for the global motions of the receptor, and your receptor has non-standard residues, you will also need a .prmtop file with the receptor topology and the parametrization of those non-standard residues.

.yaml configuration file

GPathFinder uses a YAML-formatted input file to set up the calculation. YAML is a human-readable serialization format with parsers available in a broad range of programming languages. Input files must contain these five sections:

output. Project options. Configure them to your liking.
ga. Genetic algorithm configuration. Normally, you don’t have to touch this, except perhaps the number of generations and the population size.
similarity. The similarity function used to compare potentially redundant solutions.
genes. The list of descriptors used to define an individual.
objectives. The list of functions that will evaluate your individuals.
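Before launching a calculation, it can be handy to confirm that all five sections are present. The following is a minimal sketch, assuming each section appears as a top-level key of the YAML file (a key starting in the first column). The function name `missing_sections` is hypothetical and not part of GPathFinder, and the check uses a simple regular expression rather than a full YAML parser, so it does not validate the contents of each section.

```python
import re

# The five sections listed above.
REQUIRED_SECTIONS = ("output", "ga", "similarity", "genes", "objectives")


def missing_sections(yaml_text):
    """Return the required section names not found as top-level keys.

    Assumption: a top-level key starts in column 0 and is followed by a colon.
    """
    found = set(
        re.findall(r"^([A-Za-z_][\w-]*)\s*:", yaml_text, flags=re.MULTILINE)
    )
    return [name for name in REQUIRED_SECTIONS if name not in found]


# Deliberately incomplete example with empty placeholder sections.
example = """\
output: {}
ga: {}
genes: []
"""

print(missing_sections(example))  # ['similarity', 'objectives']
```

To check a real file, read it first, for example `missing_sections(open("input.yaml").read())`, where `input.yaml` stands for your own configuration file.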

Normally, you can start from one of our standard input files, where the Genetic Algorithm parameters have been set to appropriate values for a general case. Choose the input file according to the following criteria:

What kind of experiment you want to perform: discover (un)binding pathways or analyze a known pathway (initial and final points given in advance).
Whether you want to minimize the receptor samples before starting the actual calculation.
Which method you will use to evaluate the solutions (clashes, vina, smina).

Link	Use	Minimization	Clashes	Vina	Smina
Input file 1	Discover unbinding pathways	No	Yes	No	No
Input file 2	Discover unbinding pathways	No	Yes	Yes	No
Input file 3	Discover unbinding pathways	Yes	Yes	Yes	No
Input file 4	Discover unbinding pathways	No	Yes	No	Yes
Input file 5	Discover unbinding pathways	Yes	Yes	No	Yes
Input file 6	Discover binding pathways	No	Yes	No	No
Input file 7	Discover binding pathways	No	Yes	Yes	No
Input file 8	Discover binding pathways	Yes	Yes	Yes	No
Input file 9	Discover binding pathways	No	Yes	No	Yes
Input file 10	Discover binding pathways	Yes	Yes	No	Yes
Input file 11	Analyze a known pathway	No	Yes	No	No
Input file 12	Analyze a known pathway	No	Yes	Yes	No
Input file 13	Analyze a known pathway	Yes	Yes	Yes	No
Input file 14	Analyze a known pathway	No	Yes	No	Yes
Input file 15	Analyze a known pathway	Yes	Yes	No	Yes
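The table follows a regular pattern: each experiment type has five consecutive input files, ordered by the same combinations of minimization and evaluation method. As an illustration only, the lookup can be sketched as below. The function name and the labels `"unbinding"`, `"binding"`, and `"known_pathway"` are hypothetical shorthand for the three values in the Use column; `"vina"` and `"smina"` mean that method combined with clashes, as in the table.

```python
# Each experiment type owns five consecutive rows of the table.
EXPERIMENT_OFFSET = {"unbinding": 0, "binding": 5, "known_pathway": 10}

# Position within each group of five, keyed by (minimization, method).
# Minimization combined with clashes-only evaluation has no row in the table.
VARIANT_INDEX = {
    (False, "clashes"): 1,
    (False, "vina"): 2,
    (True, "vina"): 3,
    (False, "smina"): 4,
    (True, "smina"): 5,
}


def input_file_number(experiment, minimize, method):
    """Return the number of the standard input file matching the choices."""
    if experiment not in EXPERIMENT_OFFSET:
        raise ValueError(f"unknown experiment type: {experiment!r}")
    key = (bool(minimize), method)
    if key not in VARIANT_INDEX:
        raise ValueError("no standard input file for this combination")
    return EXPERIMENT_OFFSET[experiment] + VARIANT_INDEX[key]


print(input_file_number("binding", True, "vina"))          # 8
print(input_file_number("known_pathway", False, "smina"))  # 14
```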

To learn more about the different parameters and fine-tune the input file, follow the tutorial Understanding the different sections of the input file.

A list of all the parameters and their default values is available in List of parameters.

.mol2 files for ligand and receptor

A typical workflow to prepare the ligand and receptor files starts from a .pdb structure from the Protein Data Bank. Of course, you can also use your own ligand and/or receptor files. For example, you can test a ligand other than the crystallographic one, or a receptor conformation obtained from a Molecular Dynamics simulation.

The requirements for the receptor file are:

Small molecules that are not essential to the calculation (such as solvent molecules) should be removed.
Alternate locations for residues (i.e. alternative rotamers) should be removed. Only one conformation per residue is allowed.
For clashes evaluation, adding hydrogens is optional: add them only if you want them to be considered.
For clashes+vina evaluation, adding hydrogens is mandatory.
If you minimize the NMA samples, take special care that the terminal residues are correct and repair any missing residues in your structure. Otherwise OpenMM, which performs the minimization, will raise an error and the calculation will stop.
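When the receptor comes from a .pdb file, the first two requirements can be checked before converting to .mol2. The sketch below is an assumption-laden illustration, not a GPathFinder tool: it relies on the fixed-column layout of PDB ATOM/HETATM records (alternate location indicator in column 17, residue name in columns 18-20, chain in column 22, residue number in columns 23-26) and treats only the residue names `HOH` and `WAT` as solvent. The helper `atom_line` exists only to build correctly aligned demo records.

```python
SOLVENT_NAMES = {"HOH", "WAT"}  # assumption: extend for other solvents


def scan_pdb(pdb_text):
    """Return (residues with alternate locations, number of solvent atoms)."""
    altloc_residues = set()
    solvent_atoms = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        alt_loc = line[16:17].strip()   # column 17
        res_name = line[17:20].strip()  # columns 18-20
        chain = line[21:22]             # column 22
        res_seq = line[22:26].strip()   # columns 23-26
        if alt_loc:
            altloc_residues.add((chain, res_seq, res_name))
        if res_name in SOLVENT_NAMES:
            solvent_atoms += 1
    return sorted(altloc_residues), solvent_atoms


def atom_line(record, serial, name, alt_loc, res_name, chain, res_seq):
    """Build a demo record, truncated after the residue number."""
    return (
        f"{record:<6}{serial:>5} {name:<4}{alt_loc:1}"
        f"{res_name:>3} {chain}{res_seq:>4}"
    )


demo = "\n".join([
    atom_line("ATOM", 1, "CA", "", "ALA", "A", 1),
    atom_line("ATOM", 2, "CB", "A", "SER", "A", 2),  # alternate location A
    atom_line("ATOM", 3, "CB", "B", "SER", "A", 2),  # alternate location B
    atom_line("HETATM", 4, "O", "", "HOH", "A", 101),
])

print(scan_pdb(demo))  # ([('A', '2', 'SER')], 1)
```

A non-empty residue list or a non-zero solvent count indicates that the structure still needs cleaning before it is used as a receptor.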

The requirements for the ligand file are:

For unbinding pathway discovery, it is recommended that the 3D coordinates correspond to the bound position. Otherwise, the center of the binding site should be indicated explicitly in the .yaml configuration file (parameter gaudi.genes.path.origin).
For clashes evaluation, adding hydrogens is optional: add them only if you want them to be considered.
For clashes+vina evaluation, adding hydrogens is mandatory.
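A quick way to verify the hydrogen requirement is to count hydrogen atoms in the .mol2 file. This is a minimal sketch, assuming the file uses SYBYL atom types, where hydrogens are typed `H` or `H.` followed by a suffix (for example `H.spc`) in the sixth column of the `@<TRIPOS>ATOM` section. Files written with other atom-typing schemes would need a different test. The function name is hypothetical.

```python
def count_hydrogens(mol2_text):
    """Return (hydrogen atoms, total atoms) from the @<TRIPOS>ATOM section."""
    in_atoms = False
    hydrogens = 0
    total = 0
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only the ATOM section is parsed; any other section ends it.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atoms or not stripped:
            continue
        fields = stripped.split()
        if len(fields) < 6:
            continue  # not a complete atom record
        total += 1
        atom_type = fields[5]  # atom_id, name, x, y, z, atom_type, ...
        if atom_type == "H" or atom_type.startswith("H."):
            hydrogens += 1
    return hydrogens, total


# Made-up three-atom fragment, used only to exercise the parser.
example = """\
@<TRIPOS>MOLECULE
demo
 3 2 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 O1          0.0000    0.0000    0.0000 O.3       1 LIG  0.0000
      2 H1          0.9572    0.0000    0.0000 H         1 LIG  0.0000
      3 C1         -1.2000    0.8000    0.0000 C.3       1 LIG  0.0000
@<TRIPOS>BOND
     1     1     2    1
     2     1     3    1
"""

print(count_hydrogens(example))  # (1, 3)
```

A hydrogen count of zero for a file intended for clashes+vina evaluation suggests that hydrogens still need to be added.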

A tutorial on generating .mol2 files with UCSF Chimera, starting from a crystallographic structure from the PDB, is provided in Preparing ligand and protein files.