### 3D-DAOSTORM algorithm

This algorithm attempts to fit overlapping images of emitters simultaneously with multiple Gaussian peaks using an approach similar to that employed by DAOPHOT, an algorithm previously developed for analyzing images of stars (Stetson 1987). DAOPHOT has been recently used to analyze 2D super-resolution data where the PSFs of all emitters are assumed to have the same shape, an approach termed DAOSTORM (Holden et al. 2011). Here we extend this approach to analyze astigmatism-based 3D super-resolution data, where the PSFs of emitters can be approximated by elliptical Gaussians whose ellipticity depends on the *z* position of the emitter.

The basic idea behind the DAOPHOT algorithm is to fit the detected emitters in an image, and then examine the residual image after subtracting the fit for evidence of any undetected emitters. These emitters are then fit simultaneously with the emitters identified in the previous cycle, and the process is repeated until there is no further indication of undetected emitters in the residual image. The primary differences between the original DAOPHOT and the 3D-DAOSTORM algorithm developed here are as follows:

- (1)
DAOPHOT fits the image of every emitter with a fixed-shape PSF. To extend the algorithm to the analysis of astigmatism-based 3D super-resolution data, where the PSFs of emitters vary with their *z* position and can be modeled by elliptical Gaussians with varying *x* and *y* widths, the images of individual emitters are fit with:

$bg+h\,e^{-2(x-x_{0})^{2}/w_{x}^{2}(z)}\,e^{-2(y-y_{0})^{2}/w_{y}^{2}(z)}\qquad (1)$

where *w*_{x} and *w*_{y} are pre-determined functions of *z*. The dependence of *w*_{x} and *w*_{y} on *z* was determined by fitting defocusing curves to images of single emitters bound to a coverslip (Huang et al. 2008).

- (2)
The error in the fit is calculated using the maximum likelihood estimator suitable for a Poisson distribution of error as previously described (Laurence et al. 2010; Mortensen et al. 2010; Smith et al. 2010). This approach gives superior fitting performance for data where the number of detected photons is sufficiently low that the Gaussian distribution of error assumed by least squares fitting is not valid.

- (3)
The DAOPHOT algorithm groups overlapping images of emitters and simultaneously fits all of them together. This approach involves deciding which emitters overlap sufficiently to be grouped together. It also requires solving a set of coupled linear equations by inverting a matrix with (M×N)^{2} elements, where M is the number of parameters that describe the PSF of each emitter and N is the number of emitters whose images overlap. Because solving these coupled equations is computationally expensive, owing to the poor scaling of matrix inversion with matrix size, we sought to simplify the problem by fitting each emitter independently, but in an iterative fashion. We accomplished this by performing a single cycle of fit improvement for each emitter independently, then recalculating the overall fit image based on the updated position of every identified emitter (details are given below). This procedure is repeated until the algorithm either converges or reaches a pre-determined maximum number of cycles.

- (4)
Unlike the DAOPHOT algorithm we do not include a cubic spline term to correct for deviations in the PSF from an idealized Gaussian. Including this term could further improve the fitting accuracy.
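As an illustration, the emitter model of Eq. 1 can be evaluated on a pixel grid as in the following minimal sketch. The function name `elliptical_gaussian` and its argument names are hypothetical and are not taken from the released implementation; the widths are passed in directly rather than computed from *z*.

```python
import numpy as np

def elliptical_gaussian(shape, bg, h, x0, y0, wx, wy):
    """Evaluate Eq. 1 on a pixel grid with the given (rows, cols) shape."""
    # Row index is treated as y and column index as x (an assumption of this sketch).
    y, x = np.indices(shape, dtype=float)
    return bg + h * np.exp(-2.0 * (x - x0) ** 2 / wx ** 2) \
                  * np.exp(-2.0 * (y - y0) ** 2 / wy ** 2)

# One emitter centered at (7, 7) with different x and y widths.
img = elliptical_gaussian((15, 15), bg=20.0, h=100.0, x0=7.0, y0=7.0, wx=2.0, wy=3.0)
```

With this parameterization the peak drops to *h*·e^{-2} above background at a distance of one width from the center along either axis.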

The flow of operations of the algorithm is described below:

- 0.
Initialize the algorithm.

0.1. To start by fitting only the peaks from the brightest emitters in the field of view, we first set a threshold value for the peak height to be equal to 4× the user-specified minimum peak height, *h*_{0} (*h*_{0} typically equals 75 photons).

0.2. Set the residual image equal to the original image.

- 1.
New localization identification.

1.1. Identify pixels in the residual image that are both greater than all neighboring pixels within a user-specified radius (typically 5 pixels) and greater than the current peak height threshold. Mark the center positions of these peak pixels as new localizations, and ignore a pixel if it has already been chosen twice previously as a potential localization. The latter criterion was added to avoid getting trapped in an infinite, futile cycle of first identifying a localization, then removing it due to other screening criteria (such as being too close to a neighbor), and then identifying the same localization again in the next cycle. We found empirically that allowing at most two localizations per pixel was a good compromise, which breaks the repeated, futile localization addition cycle without substantially restricting the addition of valid localizations.

1.2. If no new localizations were identified and the localization height threshold is at its minimum value *h*_{0}, then exit the algorithm and return the current list of localizations.

1.3. If the localization threshold value is greater than *h*_{0}, decrease the localization height threshold value by *h*_{0} in preparation for the next cycle of localization identification.

1.4. Add the newly identified localizations that are at least 1 pixel away from all current localizations into the current list of localizations and flag them as “running”. Even though it is possible that some activated emitters are separated by less than a single pixel, we do not attempt to discriminate them as the signal-to-noise ratio of the images is not high enough. Current localizations that are within a user-specified distance (typically 5 pixels) of a newly identified localization are flagged as “running” to indicate that further refinement of their parameters may be necessary.

1.5. Set the parameter-dependent clamp values, *C*_{k}, for all the localizations to the default values (1000.0 for *h*, 1.0 for *x* and *y*, 3.0 for *w*_{x} and *w*_{y}, 100.0 for *bg* and 0.1 for *z*) in preparation for the parameter refinement in the next step.
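The peak identification of step 1.1 can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the released code: the names `find_peaks`, `taken` and `max_taken` are hypothetical, the neighborhood is taken to be a disk of the given radius, and the per-pixel counter implements the rule that a pixel is ignored once it has been chosen twice.

```python
import numpy as np

def find_peaks(residual, threshold, radius=5, taken=None, max_taken=2):
    """Return strict local maxima above threshold, plus the updated per-pixel counter."""
    rows, cols = residual.shape
    if taken is None:
        taken = np.zeros(residual.shape, dtype=int)
    peaks = []
    for r, c in zip(*np.nonzero(residual > threshold)):
        if taken[r, c] >= max_taken:
            continue  # already chosen twice previously, so ignore this pixel
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        patch = residual[r0:r1, c0:c1]
        yy, xx = np.indices(patch.shape)
        dist2 = (yy + r0 - r) ** 2 + (xx + c0 - c) ** 2
        # Neighbors are all pixels inside the disk, excluding the candidate itself.
        neighbors = patch[(dist2 <= radius ** 2) & (dist2 > 0)]
        if neighbors.size == 0 or residual[r, c] > neighbors.max():
            taken[r, c] += 1
            peaks.append((int(r), int(c)))
    return peaks, taken

h0 = 75.0                                  # user-specified minimum peak height
image = np.full((32, 32), 20.0)
image[10, 12] = 400.0                      # bright emitter
image[20, 20] = 120.0                      # dimmer emitter
peaks_hi, taken = find_peaks(image, threshold=4 * h0)          # initial threshold (step 0.1)
peaks_lo, taken = find_peaks(image, threshold=h0, taken=taken)  # minimum threshold
peaks_again, taken = find_peaks(image, threshold=h0, taken=taken)
```

In this toy example the residual image is not updated between calls, so the bright pixel is picked twice and then ignored on the third call, which is the behavior the counter is meant to enforce.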

- 2.
Refining localization parameters.

2.1. For each localization, determine a fitting neighborhood within which fitting to the image is performed to refine localization parameters. We calculate the neighborhood size (defined by *X* and *Y* along the *x* and *y* directions) based on the current *w*_{x} and *w*_{y} values of the localization. This neighborhood extends to twice *w*_{x} or *w*_{y} from the localization’s center position.

2.2. Calculate the fit image, *f*, from the list of localizations by drawing an elliptical Gaussian for each localization using Eq. 1 and the current fitting parameters. For reasons of efficiency, *f* is only calculated within the fitting neighborhood of the localizations as determined in step 2.1.

2.3. Calculate the fit error for each of the “running” localizations using the following equation (Laurence and Chromy 2010):

$\chi_{\mathit{MLE}}^{2}=2\sum_{i=1}^{N}(f_{i}-g_{i})-2\sum_{i=1,\,g_{i}\ne 0}^{N}g_{i}\ln\left(\frac{f_{i}}{g_{i}}\right)$

where *f*_{i} is the value of the fit image at pixel *i* in the neighborhood of the localization, *g*_{i} is the actual image intensity at pixel *i*, and *N* is the number of pixels in the fitting neighborhood. If |current error – previous error|/current error is less than a threshold (typically set to 1.0 × 10^{-6}), flag the localization as “converged”.

2.4. For each “running” localization (i.e. those localizations that have not converged as judged by the convergence criteria in step 2.3) perform a single cycle of fit optimization as described below.

2.4.1. Calculate the Jacobian (*J*) vector using the following equation (Laurence and Chromy 2010)

$J_{k}=\nabla_{a}\chi_{\mathit{MLE}}^{2}=\frac{\partial \chi_{\mathit{MLE}}^{2}}{\partial a_{k}}=2\sum_{i=1}^{N}\left(1-\frac{g_{i}}{f_{i}}\right)\frac{\partial f_{i}}{\partial a_{k}}$

where *J* is a vector containing the first derivatives of *χ*_{MLE}^{2} with respect to each parameter in the Gaussian that is fit to the localization, and *a*_{k} are the parameters describing the Gaussian, which include the background value, *bg*, the peak height, *h*, the centroid position of the peak in *x* and *y*, (*x*_{0}, *y*_{0}), and the peak widths in *x* and *y*, (*w*_{x}, *w*_{y}).

2.4.2. Calculate the Hessian matrix (*H*) using the following equation (Laurence and Chromy 2010)

$H_{kl}=\nabla_{a}\nabla_{a}\chi_{\mathit{MLE}}^{2}=\frac{\partial^{2}\chi_{\mathit{MLE}}^{2}}{\partial a_{k}\partial a_{l}}=2\sum_{i=1}^{N}\frac{\partial f_{i}}{\partial a_{k}}\frac{\partial f_{i}}{\partial a_{l}}\frac{g_{i}}{f_{i}^{2}}$

Note that we ignore the second derivative terms in *H* as suggested in (Laurence and Chromy 2010).

2.4.3. Calculate the parameter update vector *U* by solving *HU* = *J* using the LAPACK function dposv (Anderson et al. 1999). The vector *U* describes how to best adjust each of the parameters *a*_{k} of the localization to reduce the error in the fit.

2.4.4. Subtract the Gaussian peak calculated using the current localization parameters from the fit image *f* calculated in 2.2. As more cycles of optimization are performed, more of the localizations will have converged. To avoid having to recalculate *f* for all of the localizations since many of them will not have changed, we subtract the localizations with the “current parameters” from *f* in this step and then add the localizations with the “updated parameters” back to *f* in a later step (step 2.4.8).

2.4.5. Update individual localization parameters (*a*_{k}) based on the parameter update vector *U* and the parameter-specific clamp value *C*_{k} using the formula

$a_{k}(\mathit{new})=a_{k}(\mathit{old})+\frac{U_{k}}{1+\left|U_{k}\right|/C_{k}}$

If the sign of *U*_{k} has changed since the previous iteration, then *C*_{k} is first reduced by a factor of 2. The *C*_{k} value suppresses oscillations in the optimization and damps excessively large corrections (Stetson 1987). Initial values for each *C*_{k} of the localization are set when the localization is created (step 1.5).

2.4.6. Flag localizations that have a negative background value, *bg*, peak height, *h*, or peak widths (*w*_{x}, *w*_{y}), as “bad”. These localizations are ignored in subsequent iterations of the fit, and removed from the current list of localizations in step 3.1.

2.4.7. Adjust the size of the localization’s neighborhood, *X* and *Y*, based on the updated *w*_{x} and *w*_{y} parameters.

2.4.8. If the localization is not “bad” add it back to the fit image calculated in 2.2 with the updated parameters.

2.5. If there are still “running” localizations and the maximum number of iterations (typically set to 200) has not been reached, go to step 2.3.
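A single cycle of fit optimization (steps 2.3 to 2.4.5) can be sketched for one isolated emitter as follows. This is a simplified illustration, not the released implementation: all function names are hypothetical, the fit image contains only the one emitter rather than all overlapping neighbors, the widths are fit directly rather than through *z*, and `numpy.linalg.solve` stands in for the LAPACK dposv call. Because *J* is defined above as the gradient of the fit error, the sketch subtracts the damped step so that the error decreases; this sign convention is an assumption of the sketch.

```python
import numpy as np

def model_and_derivs(a, x, y):
    """Return f (Eq. 1) and df/da_k for a = [bg, h, x0, y0, wx, wy]."""
    bg, h, x0, y0, wx, wy = a
    dx, dy = x - x0, y - y0
    e = np.exp(-2.0 * dx ** 2 / wx ** 2 - 2.0 * dy ** 2 / wy ** 2)
    dfda = np.stack([
        np.ones_like(e),                  # d f / d bg
        e,                                # d f / d h
        h * e * 4.0 * dx / wx ** 2,       # d f / d x0
        h * e * 4.0 * dy / wy ** 2,       # d f / d y0
        h * e * 4.0 * dx ** 2 / wx ** 3,  # d f / d wx
        h * e * 4.0 * dy ** 2 / wy ** 3,  # d f / d wy
    ])
    return bg + h * e, dfda

def chi2_mle(f, g):
    """Poisson maximum likelihood fit error of step 2.3."""
    nz = g > 0  # the logarithmic term is summed only over pixels with g_i != 0
    return 2.0 * np.sum(f - g) - 2.0 * np.sum(g[nz] * np.log(f[nz] / g[nz]))

def newton_cycle(a, clamp, prev_sign, x, y, g):
    """One clamped update of all parameters of a single localization."""
    f, dfda = model_and_derivs(a, x, y)
    J = 2.0 * np.sum((1.0 - g / f) * dfda, axis=1)              # step 2.4.1
    H = 2.0 * np.einsum("ki,li,i->kl", dfda, dfda, g / f ** 2)  # step 2.4.2
    U = np.linalg.solve(H, J)                                   # step 2.4.3
    sign = np.sign(U)
    # Halve the clamp of any parameter whose update changed sign (step 2.4.5).
    clamp = np.where(sign * prev_sign < 0, clamp / 2.0, clamp)
    a_new = a - U / (1.0 + np.abs(U) / clamp)
    return a_new, clamp, sign

# Noise-free test image of one emitter, flattened to pixel lists.
yy, xx = np.indices((15, 15), dtype=float)
x, y = xx.ravel(), yy.ravel()
truth = np.array([20.0, 100.0, 7.0, 7.0, 2.0, 3.0])
g, _ = model_and_derivs(truth, x, y)

a = np.array([18.0, 90.0, 7.4, 6.7, 2.2, 2.8])         # perturbed starting guess
clamp = np.array([100.0, 1000.0, 1.0, 1.0, 3.0, 3.0])  # default clamps of step 1.5
sign = np.zeros(6)
for _ in range(50):
    a, clamp, sign = newton_cycle(a, clamp, sign, x, y, g)
```

The demo runs a fixed number of cycles instead of applying the relative-error convergence test, because the fit error of noise-free data approaches zero and the ratio in step 2.3 then becomes ill-defined.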

- 3.

3.1. Construct a new localization list containing only those localizations that are “converged” or “running”, have a height (*h*) greater than 0.9*h*_{0} and have widths (*w*_{x}, *w*_{y}) greater than a user-specified value (typically 0.5 pixels).

3.2. Remove all the localizations in this list whose height is less than that of any neighboring localization within a user-specified distance (typically 1 pixel). Such localizations tend to be false localizations due to the limited signal-to-noise ratio of our images. Localizations near those that are removed are flagged as “running”.

3.3. Repeat step 2 (parameter refinement) with the new list of localizations, then go to step 4. This additional step is performed even if no localizations are removed in step 3.2. It gives localizations that may still be “running” additional cycles to converge. In the event that all the localizations have “converged”, this repetition of step 2 will finish almost immediately.
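The screening in steps 3.1 and 3.2 can be sketched with plain Python dictionaries. The function name `screen_localizations`, the dictionary keys, and the `rerun_dist` radius used to decide which survivors count as “nearby” a removed localization are all assumptions made for illustration; the text does not specify that radius.

```python
import math

def screen_localizations(locs, h0, min_width=0.5, neighbor_dist=1.0, rerun_dist=5.0):
    """Apply steps 3.1 and 3.2 to a list of localization dicts."""
    # Step 3.1: keep converged or running localizations that are bright and wide enough.
    kept = [p for p in locs
            if p["status"] in ("converged", "running")
            and p["h"] > 0.9 * h0
            and p["wx"] > min_width and p["wy"] > min_width]

    def dist(p, q):
        return math.hypot(p["x"] - q["x"], p["y"] - q["y"])

    # Step 3.2: remove a localization if any neighbor within neighbor_dist is brighter.
    removed = [p for p in kept
               if any(q is not p and dist(p, q) < neighbor_dist and q["h"] > p["h"]
                      for q in kept)]
    survivors = [p for p in kept if not any(p is r for r in removed)]

    # Survivors near a removed localization are flagged for further refinement.
    for p in survivors:
        if any(dist(p, r) < rerun_dist for r in removed):
            p["status"] = "running"
    return survivors

def loc(name, x, y, h, wx=2.0, wy=2.0, status="converged"):
    return {"name": name, "x": x, "y": y, "h": h, "wx": wx, "wy": wy, "status": status}

locs = [
    loc("A", 10.0, 10.0, 200.0),
    loc("B", 10.5, 10.0, 100.0),            # dimmer than A and within 1 pixel of it
    loc("C", 30.0, 30.0, 50.0),             # below 0.9 * h0
    loc("D", 20.0, 20.0, 150.0, wx=0.3),    # too narrow
    loc("E", 40.0, 40.0, 300.0, status="bad"),
    loc("F", 12.0, 10.0, 90.0),
    loc("G", 50.0, 50.0, 100.0),
]
result = screen_localizations(locs, h0=75.0)
```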

- 4.
Update the residual image.

4.1. Estimate the background by subtracting the fit image from the original image, then smoothing the result with a 2D Gaussian with a sigma of 8 pixels. The smoothing helps to suppress noise in the background image, and is justified under the assumption that the actual background varies smoothly across the image.

4.2. Calculate the new residual image.

4.2.1. Set the residual image equal to the original image minus the fit image.

4.2.2. Compute the mean value of the residual image.

4.2.3. Subtract the estimated background from the residual image. This step flattens the residual image in situations where the background is not uniform across the image. Flattening the residual image in turn makes it easier to identify new localizations in subsequent iterations of the algorithm.

4.2.4. Add the mean value from step 4.2.2 back to the residual image.
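The residual update of step 4 can be sketched as follows, following the text literally. The function names are hypothetical, and the small separable Gaussian blur written here with NumPy (edge padding, kernel truncated at three sigma) is a stand-in for whatever smoothing routine the implementation uses; a library filter such as `scipy.ndimage.gaussian_filter` would normally be preferred.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable 2D Gaussian blur with edge padding (kernel truncated at 3 sigma)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    # Convolve along rows, then along columns; "valid" removes the padding again.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, tmp)

def update_residual(image, fit_image, sigma=8.0):
    """Steps 4.1 to 4.2.4: background-flattened residual image."""
    background = gaussian_smooth(image - fit_image, sigma)  # step 4.1
    residual = image - fit_image                            # step 4.2.1
    mean = residual.mean()                                  # step 4.2.2
    residual = residual - background                        # step 4.2.3
    return residual + mean                                  # step 4.2.4

# Flat background with one unfit bright pixel and an empty fit image.
image = np.full((64, 64), 20.0)
image[32, 32] += 500.0
residual = update_residual(image, np.zeros_like(image))
```

Adding the mean back keeps the residual on the same intensity scale as the original image, so the peak height thresholds of step 1 remain meaningful.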

- 5.
Termination of the algorithm.

Go to step 1 if the maximum total number of iterations (typically 20) has not been exceeded. If the residual image is such that no new localizations will be found, then the algorithm will exit at step 1.2. If new localizations can still be found in the residual image even after 20 iterations, we terminate anyway, as we are most likely caught in an infinite loop. For most of the images that we have analyzed, the total number of iterations performed is less than 7.

The algorithm was implemented in a combination of the C and Python languages. It is available for download at http://zhuang.harvard.edu/software.html.

#### Generation of simulated STORM images

Simulated STORM images were generated with the following parameters: 20 photons/pixel background, a constant 2000 photons per emitter, an overall camera gain of 3, and a camera read noise of 2. These parameter values are close to real experimental values. The emitters were placed on the image with a uniform random distribution in *x* and *y*. The *z* location of the localization was randomly distributed in a range of 800 nm. Localization widths (*w*_{x}, *w*_{y}) were calculated based on the *z* location using the defocusing curve ${w}_{x,y}\left(z\right)={w}_{o}\sqrt{1+{\left(\frac{z-{c}_{x,y}}{d}\right)}^{2}}$ with *w*_{o} = 2 pixels, *c*_{x} = 150 nm, *c*_{y} = −150 nm, *d* = 400 nm and *z* = −400 nm to 400 nm. These values are again close to real experimental values. The overall image was generated as the sum of the elliptical Gaussian functions associated with individual localizations based on the above-described parameters. The noise due to the EMCCD gain of the acquisition camera was modeled with an exponential distribution.
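A simulation along these lines can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the 2000 photons are interpreted as the integrated signal of each emitter, photon arrival is drawn from a Poisson distribution, the EMCCD gain is applied as one exponential draw per detected photon (so that *n* photons give a gamma-distributed output), and the read noise is Gaussian with a standard deviation of 2 in output units. The function names are hypothetical.

```python
import numpy as np

def defocus_width(z, c, w0=2.0, d=400.0):
    """Defocusing curve: width in pixels at axial position z (nm)."""
    return w0 * np.sqrt(1.0 + ((z - c) / d) ** 2)

def simulate_frame(shape, n_emitters, rng, bg=20.0, photons=2000.0,
                   gain=3.0, read_noise=2.0):
    """Return one simulated camera frame under the assumptions listed above."""
    y, x = np.indices(shape, dtype=float)
    expected = np.full(shape, bg)
    for _ in range(n_emitters):
        x0 = rng.uniform(0, shape[1])
        y0 = rng.uniform(0, shape[0])
        z = rng.uniform(-400.0, 400.0)
        wx = defocus_width(z, 150.0)    # c_x = 150 nm
        wy = defocus_width(z, -150.0)   # c_y = -150 nm
        # The Gaussian of Eq. 1 integrates to h * pi * wx * wy / 2.
        h = 2.0 * photons / (np.pi * wx * wy)
        expected += h * np.exp(-2.0 * (x - x0) ** 2 / wx ** 2
                               - 2.0 * (y - y0) ** 2 / wy ** 2)
    detected = rng.poisson(expected)
    # Sum of n exponential draws with mean `gain` is gamma(n, gain).
    amplified = np.zeros(shape)
    mask = detected > 0
    amplified[mask] = rng.gamma(detected[mask], gain)
    return amplified + rng.normal(0.0, read_noise, size=shape)

rng = np.random.default_rng(0)
frame = simulate_frame((64, 64), n_emitters=10, rng=rng)
```

With *c*_{x} and *c*_{y} offset in opposite directions, the emitter image is narrowest in *x* at *z* = 150 nm and narrowest in *y* at *z* = −150 nm, which is what encodes the axial position in the ellipticity.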

#### Lectin labeling of retina and brain tissue samples

3–6 month old C57 mice were euthanized by asphyxiation with CO2 following procedures approved by the Harvard University Animal Care and Use Committees. The eyes or brains were then dissected, fixed by immersion in 4% paraformaldehyde (PFA), and select areas of interest, such as retina or regions of the cerebral cortex were further dissected. The fixed tissue was then washed with phosphate buffered saline (PBS) and stored in PBS at 4°C until use. Fixed tissue was incubated with the Alexa-647 dye labeled lectin at a concentration of 0.25 mg/ml for 3–5 days at 4°C in a labeling buffer containing PBS supplemented with 0.49 mM Mg^{2+} and 0.90 mM Ca^{2+}. The tissue was then washed extensively with the labeling buffer and fixed overnight with 2% PFA, 0.2% Glutaraldehyde in the labeling buffer. The tissue was sectioned at 50 nm (for 2D STORM imaging) or 100 nm (for 3D STORM imaging) thickness with a Leica UC6 ultra-microtome. The tissue sections were transferred to cleaned coverslips and stored on coverslips at room temperature prior to use. The following Alexa-647 labeled lectins were used in this study: Concanavalin A (ConA, #C21421), Wheat Germ Agglutinin (WGA, #W32466), Peanut lectin (PNA, #L32460) and Red Kidney Bean lectin (PHA-L, #L32457), all purchased from Invitrogen.

#### STORM imaging of the lectin-labeled brain and retina tissue

Flow channels containing the tissue samples were constructed by sandwiching two pieces of double stick tape (3M) between the coverslip with the tissue sections and a microscope slide. The following imaging buffer was added to the flow channel: 10 mM Tris, 50 mM NaCl, 0.1% Triton-X100, pH 8.0, supplemented with 0.5 mg/ml Glucose Oxidase (Sigma, G2133), 40 µg/ml Catalase (Sigma, C100), 5% Glucose and 100 mM cysteamine (Sigma 30070). For 3D STORM imaging, 1% (v/v) beta-mercaptoethanol (Sigma, 63689) was used instead of 100 mM cysteamine. After the addition of the imaging buffer, the flow channel was sealed with 5 minute epoxy and placed on a custom microscope setup built for STORM imaging. Epoxy sealed samples were imaged within a few hours of preparation.

Low resolution conventional fluorescence images were taken with a 2x air objective (Nikon, Plan Apo λ, 0.1NA) first to locate the tissue sections on the coverslip and to find the areas of interest. Once an area of interest was identified, a high resolution conventional fluorescence picture was taken with a 100x oil immersion objective (Nikon, Plan Apo λ, 1.45NA), followed by a STORM image.

STORM imaging was performed on a custom setup built around a Nikon TiU inverted microscope (Huang et al. 2008; Bates et al. 2007). Illumination of the Alexa-647 dye was provided by a 300 mW 656 nm solid state laser (Crystalaser, CL656-300). The 656 nm laser light excites fluorescence from Alexa 647 and switches the dye off rapidly. The same light also reactivates Alexa 647 back to the fluorescent state, but at a very low rate such that only a small fraction of the dye molecules (~0.1%) emit fluorescence at any given instant. When necessary, a 50 mW 405 nm diode laser (Coherent, Cube-405) was used to increase the dye activation rate (Dempsey et al. 2011). The output of the lasers was combined and coupled into a single mode photonic fiber (NKT Photonics, LMA-8) for transmission to the STORM microscope. Light from the fiber was collimated and focused on the back-focal plane of the microscope objective. The illumination was adjusted from epi-fluorescence to total internal reflection by translating the illumination beam across the back-focal plane of the objective. Imaging was performed with a 100x oil immersion objective (Nikon, Plan Apo λ, 1.45NA). The laser intensities at the sample were ~1 kW/cm^{2} for the 656 nm laser light and ~20 W/cm^{2} for the 405 nm laser light. The fluorescence signal was recorded with an EMCCD camera (Andor, DU-897). For 3D STORM images, a 1 m focal length cylindrical lens was added to the optical path to provide astigmatism, such that the PSFs of individual emitters appear elliptical, with ellipticity depending on the *z* position of the emitter (Huang et al. 2008). In addition, the setup had an infrared focus lock system that was used to stabilize the distance between the microscope objective and the sample (Huang et al. 2008).