A suite of mathematical solutions to describe ternary complex formation and their application to targeted protein degradation by heterobifunctional ligands

Small molecule–induced targeted protein degradation by heterobifunctional ligands or molecular glues represents a new modality in drug development, allowing development of therapeutic agents for targets previously considered undruggable. Successful target engagement requires the formation of a ternary complex (TC) when the ligand brings its target protein in contact with an E3 ubiquitin ligase. Unlike traditional drugs, where target engagement can be described by a simple bimolecular equilibrium equation, similar mathematical tools are currently not available to describe TC formation in a universal manner. This current limitation substantially increases the challenges of developing drugs with targeted protein degradation mechanism. In this article, I provide a full, exact, and universal mathematical description of the TC system at equilibrium for the first time. I have also constructed a comprehensive suite of mathematical tools for quantitative measurement of target engagement and equilibrium constants from experimental data. Mechanistic explanations are provided for many common challenges associated with developing this type of therapeutic agent. Insights from these analyses provide testable hypotheses and grant direction to drug development efforts in this promising area. The mathematical and analytical tools described in this article may also have broader applications in other areas of biology and chemistry in which ternary complexes are observed.

An equilibrium binding of a hetero-bifunctional ligand (L) with its target protein (P) and an E3 ligase (E) can be described as a series of bimolecular binding reactions as depicted above. The ligand, L, can initially form a binary complex either with the target protein, P, or with an E3 ligase, E, to form PL or EL with an equilibrium dissociation constant of KP1 and KE1, respectively (Eq.1-1 and Eq.1-2). Each of these bimolecular complexes in turn can bind with E or P to form a ternary complex PLE with an equilibrium dissociation constant of KE2 and KP2, respectively (Eq.1-5 and Eq.1-4). Alternatively, all three of the L, P, and E may simultaneously come together to form a ternary complex in one step with an equilibrium dissociation constant of . While there are up to five different binding reactions in this system, the equilibrium dissociation constants are constrained in certain ways dictated by the thermodynamic principle of pathway independence. In other words, the overall equilibrium constant, KPLE, is dictated only by the difference in the free energy status between the final and initial states and is independent of the specific path. The following equation captures this principle.
Indeed, substituting equations 1-1,1-2,1-4, and 1-5 into the above expression confirms that the overall equilibrium dissociation constant KPLE turns out to be the same regardless of the specific pathway.
First, we find a few simple equilibrium relationships involving the target protein, P, below.
At all times, total concentration of all forms of the target protein species, , should stay the same.

Supporting Information 2. Excel-based kinetic simulation program for ternary complex formation.
An Excel-based program for kinetic simulation can be found in

BHan_TCKinSim_v3.5.4_200505.xlsx
Instruction on how to use the program is provided in the "Instruction" tab.
There are twelve hidden tabs, labeled as "DR1" through "DR12". Do not modify them. The program is run within the "Run" tab, which serves as a control panel. All user-provided data and parameters are entered in this tab only within the areas highlighted with bright yellow.
A backup copy of the control panel is in the "Run_backup" tab. If restoration of default settings is desired or if the program codes get corrupted within the control panel, them make a copy of the desired area from the backup tab and paste into the "Run" tab.
Typical simulation exercise can be done in four steps.
1. Initiation of the simulation w precision value of 0.1, or 10% maximum error.
2. Iterative simulation w precision value of 0.01 in the circular reference mode.
3. 1 st refinement w precision 0.001 4. Optional 2 nd refinement w precision 0.0001 Contact Bomie Han at bomiehan@lilly.com for further questions/instructions. Additional details on how the program works are provided below.
This program contains 12 copies of the identical module that allows simultaneous execution of 12 different reaction conditions, which can be used to simulate a 12-point dose-response experiment. These 12 modules are controlled from a central panel within the tab, labeled as "Run". Each module contains a block of 1000 simulation steps, which can be looped into a circular reference so that one cycle of simulation can go through hundreds of blocks of simulation steps without interference. A typical simulation experiment consists of running one block of simulation followed by three cycles of simulation with increasing precision and decreasing speed at each cycle. Before running the simulation cycles, basic information on the experimental condition is provided within the central panel such as the equilibrium constants (KP1, KE1, and α) and concentrations of individual protein and the ligand. From the three independent equilibrium constants, all the other equilibrium dissociation constants in the Figure 2A are calculated. Each of these equilibrium dissociation constants (Kd) need to be broken down into an association and a dissociation rate constant (k_on and k_off, respectively) in a way to satisfy the relationship of Kd = k_off/k_on. The program provides suggested values based on a simple set of rules show below to maintain consistency across different parts of the system: 1) The ratio of the forward (k_on) and reverse (k_off) rate constants and the equilibrium dissociation constant (Kd) for any given binding reaction must satisfy the equation, Kd = k_off / k_on. 2) The value for the rate constant k_off follows the rule k_off = ln2/t1/2, and the half time of dissociation (t1/2) is 15 sec for equilibrium dissociation constant of 100 nM. 3) Change in Kd values from 100 nM is reflected equally between decrease in k_on and increase in k_off. As a result, half time of dissociation for X nM Kd is SquareRoot(X nM/100 nM) * 15 sec. The user can enter any value as long as the thermodynamic equilibrium principle is not broken.
Once the kinetic simulation is initiated, a small test value of time interval, delta t, is selected and changes in concentration of each component is calculated for the time interval. These values are compared to the pre-set user definable parameters that dictate the speed and precision of simulation. If overall changes in concentration are bigger than maximum value allowed by the simulation precision, then the test time interval is scaled back. If the overall changes in concentration are smaller than this preset value, then a bigger time interval is taken within limits dictated by the maximum simulation speed. The re-calculated changes in concentration of each component within the adjusted time interval is fed into the concentration of each component and the elapsed time as well as the optimized time interval for the simulation cycle is recorded. This completes one step in the simulation process. In the next step, simulation starts with the previously optimized time interval and repeats the whole process. As the reaction progresses and concentrations of individual components change, the optimized time interval keeps changing for each cycle but always starts with the optimized time interval for the previous step. In this manner, the simulation proceeds with the maximum efficiency allowed within the preset precision throughout the whole course of the simulation. Since each simulation cycle records total time elapsed, simulations can be tracked by the reaction time as well as by the number of simulation steps.
For most efficient execution of simulation with the maximum precision, the whole simulation exercise is broken into four different cycles; initiation, execution, refinement, and optional fine refinement. The first cycle of "initiation" consists of one block of 1000 steps of simulation with a precision factor of 0.1, corresponding to maximum allowed change in concentration at any simulation step is 10%. At the end of the initiation cycle, the circular reference is engaged by feeding the outcome of the block of 1000 steps as the initial condition into next block of simulation steps. The second cycle of "execution" consists of repetition of up to 500 blocks (500,000 simulation steps) with a precision factor of 0.01, or 1% maximal change in concentration per simulation step. The third cycle of "refinement" is done in the same manner with a precision factor of 0.001. The optional fourth cycle of "fine refinement" is done with a precision factor of 0.0001. Detailed instruction is provided in the "Instruction" tab within the Excel-based program file. Outcomes of each simulation cycle can be followed from graphical output of the results built in the program.

Supporting Information 3-1, Common variations of the ternary complex system with additional equilibria
Three common variations of the ternary complex system are shown below along with mathematical methods to handle them. In A, the target protein P is in conformational equilibrium with a closed form, Pc, with an equilibrium constant Kc as defined in Eq. 3-1a. Pc cannot bind the ligand. In B, there is an endogenous ligand, C, that binds to the target protein with an equilibrium dissociation constant Ki as described in Eq. 3-2a. The endogenous ligand C competes with the exogenous hetero-bifunctional ligand L. In C, the two extra equilibria described above coexist. In each case, the equilibrium concentration of the ternary complex can be calculated by the equations Eq. 3-4a and 3-4b in D. Notice that these equations are identical to the equations Eq. 2-1a & 2-1b in Figure 1B for the simple ternary complex system except that the binary dissociation constant between the hetero-bifunctional ligand and the target protein, Kp1, was replaced with K'p1 as shown in equations Eq. 3-1b, Eq. 3-2b, and Eq. 3-3c, respectively. Full mathematical derivation for each of these equations are shown in Supporting Information 3-2A, 3-2B, and 3-2C, respectively. Mathematical equations for similar cases in the binary complex system are shown in Supporting Information 3-3 as a comparison.

Supporting Information 3-2B. Case for a competition by a mono-functional ligand for the target protein
When there is an additional equilibrium involving the target protein binding a mono-functional ligand, or a competitor, C, with an equilibrium constant of Ki, total concentration of the target protein, Pt, can be written as

Supporting Information 4-2A. Solving ECmax
ECmax is defined as the concentration of the hetero-bifunctional ligand that gives the maximum concentration of the ternary complex, or the maximum effective concentration. Graphically, this is the concentration that gives the slope of the PLE as a function of L on the semi-log plot zero, or

Supporting Information 5-1. Summary of features for Gaussian function and their interconversion to geometric features for ternary complex dose-response curve
Geometric features of the Gaussian function are shown below along with their translation to the geometric features of the ternary complex dose-response curve shown in Supporting Information 4-1. Derivation for the equation Eq.5-7 is shown in Supporting Information 5-2.

Supporting Information 5-3. Iterative LeastSumSquare method (iLSS) for Gaussian curve fitting of the experimental dose-response data using total ligand concentration
The iterative LeastSumSquare (iLSS) method was used for curve fitting of the simulated ternary complex signal (TCS) dose-response data with Gaussian function. Log10 values were used for all concentration values for this function. Initial set of values were assigned to each of the three parameters for the function, Amplitude, Mean, and SD using a simple rule as follows: maximum observed TCS signal was assigned to Amplitude. Concentration that gave the maximum TCS signal was assigned to Mean. Standard deviation of all concentration values in log10 base was assigned to SD. With this initial set of values, expected TCS values were calculated and compared with the measured TCS values. Sum of the square of the differences at each ligand concentration (SumSquare) was calculated. Small variations were tested for all parameters in a systematic manner and the set of parameters that gives the minimum value of SumSquare (LeastSumSquare) was searched. In other words, the parameters were optimized to minimize the SumSquare value. Multiple iterations are required for this method and this process was automated by using Solver function within Microsoft Excel. For successful optimization of the parameters for this method, starting with a reasonable set of initial test values is important. The rule described above worked satisfactorily in most cases. A template program is provided under "Gss_totalL" tab in an Excel file, BHan_TCextLSS_v3.5.4_200221.xlsx. Solver function within Microsoft Excel needs to be enabled by following the menu options under File → Options → Add-Ins → Analysis ToolPak → then checking Solver Add-In option. Once enabled, the Solver function can be accessed under Data → Solver. The program automatically calculates not only the three parameters of the Gaussian function but also other useful parameters such as FWHM and AUC.

Supporting Information 6A. Iterative LeastSumSquare method (iLSS) for optimizing equilibrium constants (KE1, KP1, and α) in the PLE function from experimental dose-response data using total ligand concentration
The same iLSS method shown in Supporting Information 5A can be used to curve fit the experimental dose-response data with the PLE function and obtain optimized values for the equilibrium constants KP1, KE1, α, and the system conversion factor β. In this case, unmodified total ligand concentration should be used as opposed to the log10 value for the Gaussian function. However, initial evaluation of the method using the simulated experimental data showed that the equilibrium constants from this exercise deviated significantly from the true values due to ligand depletion. Information on free ligand concentration was needed for this purpose. A numeric approach can be used to get a highly precise value for the free ligand concentration as explained in the Supporting Information 6B. Extended LeastSumSquare (extLSS) method (Supporting Information 6C) was developed to incorporate numeric calculation of the free ligand concentrations in every iteration within the iLSS method.

The concentration symbol, [ ], is omitted for [L], [PLE], [Pt], and [Et] below for convenience.
Solution for a cubic function, y = L 3 + L 2 + L + can be obtained numerically as illustrated in the graph below: Starting with an initial value of L0 = Lt, find a line that passes through (L0, y0) with a slope of y' = 3L 2 + 2 L + . Intercept of this line at the horizontal axis is (L1, 0). Repeat the above process until y value approaches zero. Usually, 5 cycles (L5) is more than sufficient to get an excellent estimate of L that satisfies the equation y = L 3 + L 2 + L + = 0.

Supporting Information 6C. Extended LeastSumSquare (extLSS) method for finding equilibrium constants from Ternary Complex Dose-Response data
In the extended LeastSumSquare (extLSS) method, numeric calculation of the free ligand concentration through an iterative process was looped into the iLSS method for every iteration of the parameter adjustment. Similar to the Gaussian curve fitting of the data with iLSS method, selection of reasonable initial condition was important for successful estimation of the equilibrium constants. ECmax and TCSmax (maximum ternary complex signal) values from the Gaussian curve fitting of the same data using the total ligand concentration were used to generate initial values for the four parameters, KP1, KE1, α, and β in combination with the information on total protein concentration. Initial value for KE1 was assigned to be 100 nM, and the initial value for KP1 was calculated from the relationship KP1 = ECmax 2 /KE1. Median value of [PLE]max across all data sets was assumed to be 10% of the total protein concentration and the initial value for β for individual data set was calculated from the tentative [PLE]max value and the measured TCSmax value. With these assumptions, initial value for α was calculated from the Eq.6-1a in Supporting Information 6D. When system conversion factor β was floated for initial evaluation and for system calibration, it was obtained from TCSmax = β * [PLE]max. Otherwise, β was fixed to the calibrated value. A template program is provided under "extLSS_freeL" tab in the Excel file BHan_TCextLSS_v3.5.4_200221.xlsx. Solver function needs to be enabled within Microsoft Excel as explained for the iLSS method in the Supporting Information 5-3. Evaluation of the extLSS method using simulated training data set is described in the Supporting Information 6E.
The Excel-based curve fitting programs, BHan_TCextLSS_v3.5.4_200221.xlsx, contains 1. Gss_ totalL; Gaussian Curve fitting on total [L] by iterative LeastSumSquare (iLSS) method 2. extLSS_freeL; Gaussian Curve fitting on total [L] by iLSS method + PLE curve fitting on free [L] by extLSS method 3. Instruction; instruction on how to use the program User-provided dose-response data and total protein concentration information are entered within the areas highlighted with bright yellow. Back up copy of each module is placed as "Hidden" tabs. If restoration of default settings is desired or if the program codes get corrupted within the program areas, them make a copy of the desired area from the backup tab and paste into the affected areas.
Contact Bomie Han at bomiehan@lilly.com for further questions/instructions. In all cases, the optimized value of ECmax was very close to the true value with the extLSS method, whereas iLSS method consistently reported higher value than the true value. This was consistent with lack of consideration for ligand depletion in the iLSS method based on total ligand concentration.

Supporting Information 6C. Equations for α, KP1, & KE1
Assigning individual values for KP1 and KE1 from this optimization exercise was more challenging when the simulated data contained random errors. Overall, for ligands with similar KE1 values, the extLSS method performed very well to predict changes in the KP1 or the binary affinity for the target protein, but the system performed poorly in correctly predicting changes in KE1 value in all cases. This behavior stems from the symmetry in mathematical terms between the target protein P and the E3 ligase E. When either KP1 or KE1 values are significantly different from the initial test values for the optimization iteration, then the optimization method cannot determine which one needs to be changed to get the best fit. Some of the optimized values in this figure have the KP1 and KE1 values flipped. Optimized binary equilibrium constants (KP1 and KE1) will have to be compared to structural information of the heterobifunctional ligand for proper assignment of the KP1 and KE1 values. Optimizing the cooperativity factor α from the extLSS method did not perform well when all four parameters were floated (Top panel). When the calibration factor β was fixed to the median value from the first round of optimization (Middle panel), the rank order of the α values was correctly predicted even though the absolute values showed deviations from the true values especially for high α values. When the β value was fixed to the true median value (Bottom panel), the optimized α values showed an excellent agreement with the true values. In other words, when the system is correctly calibrated to obtain the best β value, then the extLSS method can correctly give an excellent estimates of the cooperativity factor α values.
From these observations, I suggest the following steps for best estimation of the equilibrium parameters from the ternary complex dose-response data. Once the assay system is chosen for measurement of the ternary complex, evaluate existing hetero-bifunctional ligands using simple Gaussian curves and determine ECmax and TCSmax values based on total ligand concentration. Choose multiple ligands with various ECmax and TCSmax values as a set of calibrators for the system. When significant changes in affinity for the E3 ligase is expected among different ligands based on structural information, then select a separate set of ligands with similar expected affinity for the E3 ligase within the set. Use these set of calibrators to go through extLSS method with all four parameters floating. Alternatively, the system can be calibrated using a ligand(s) with known equilibrium constants, if available. Next step is to calculate the median value of the system calibration factor β from these calibrators and the system is considered "calibrated." Use the calibrated value of β for all subsequent optimization of the three equilibrium constants, KP1, KE1, and α, for these and other ligands. As with all non-linear curve fitting methods that rely on minimizing the sum of squares of the difference between the measured and predicted values, successful search for optimum set of parameters depend on starting with a reasonable set of initial test values. In this extLSS method in the template provided in the BHan_TCextLSS_v3.5.4_200221.xlsx, outcomes from the Gaussian curve fitting of the experimental data with the total ligand concentration were used to generate a "reasonable" set of initial test values.
Once the system conversion factor β is determined from the curve fitting, then the measured TCS data can be converted to molar concentration of the ternary complex PLE. For this reason, β can be also called a system calibration factor. When the system is calibrated, it becomes possible to calculate what fraction of the total target protein was engaged in the ternary complex under given experimental condition. It may seem counterintuitive at first that absolute concentrations can be calculated from relative signal intensities without an external calibration. The reason why this is possible is because the PLE function (Eq. 2-1a/b) contains reference to known molar concentrations of the total protein, [Pt] and [Et], which serve as a frame of reference in this calculation.