INTRODUCTION    
    Cells of a multicellular organism can assume different    phenotypes that can have markedly different morphological and    gene expression patterns. A fundamental question in    developmental biology is how a single fertilized egg develops    into different cell types in a spatialtemporally controlled    manner. Cell phenotypic transition (CPT) also takes place for    differentiated cells under physiological and pathological    conditions. A well-studied example is the    epithelial-to-mesenchymal transition (EMT), central to many    fundamental biological processes, including embryonic    development and tissue regeneration, wound healing, and    disease-like states such as fibrosis and tumor invasiveness    (1). One additional example is    artificially reprogramming differentiated cells, such as    fibroblasts, into induced pluripotent stem cells and other    differentiated cell types such as neurons and cardiomyocytes    (2). CPT is ubiquitous in biology,    and a mechanistic understanding of how a CPT proceeds emerges    as a focused research area with an ultimate goal of achieving    effective control of the phenotype of a cell.  
    Recent advances in snapshot single-cell techniques, including    single-cell RNA sequencing (scRNA-seq) and imaging-based    techniques, pose new questions. One such question is, how does    a CPT process proceed, step by step, in the continuously    changed high-dimensional gene expression (e.g., transcriptome    and proteome) space? These destructive methods, however, are    inherently unable to reveal the temporal dynamics of how an    individual cell evolves over time during a CPT.Approaches such    as pseudo-time trajectory analysis (3),    ergodic rate analysis (4), and RNA    velocities (5) have been    developed to retrieve partial dynamical information from    snapshot data.  
    However, some fundamental limits exist in inferring dynamical    information from snapshot data (6).    Figure 1 illustrates that inference from    snapshot data unavoidably misses key dynamical features. A    bistable system is coupled to a hidden process, e.g.,    epigenetic modification for a cellular system, which is slow    compared to the transition process being studied (Fig. 1A). Presence of the slow variable    leads to observed heterogeneous dynamics (Fig. 1B,    more details of the simulation are in fig. S1). Individual    trajectories show characteristic stepping dynamics, but the    transition positions vary among different trajectories due to    the system assuming different values for the hidden variable.    Consequently, when only snapshot data are available,    information about the temporal correlation of individual cell    trajectories is missing, and one cannot deduce the underlying    two-state dynamics (Fig. 1C).  
    (A) A double well potential with one    observable coupled to a hidden variable with dynamics much    slower than that of the process under study. Arrows represent    simulations of two trajectories that start from well A and jump    to B. The black line represents the boundary of the two wells.    (B) One-dimensional (1D) potential slices    along the observable coordinate corresponding to the    transitions indicated in (A). (C) Superimposed    trajectories from stochastic simulation (shown as background),    with two typical trajectories highlighted corresponding to the    transitions in (A). The hidden variables for the green    trajectory and the magenta trajectory take the values 0.47 and    0.45, respectively. a.u., arbitrary units. (D)    Stacked histograms of the observable at various time points    [t0, t1,    t2, t3, and    t4 in (C)], reflecting the snapshot data.    Blue bars were sampled from points that reside in well A. Red    bars were sampled from points that reside in well B. Notice    that the information as to which well a sampling point resides    is obtained from the full 2D potential and cannot be accurately    derived from the snapshot data.  
    The necessity of live-cell trajectories has been illustrated in    a number of studies, such as the information capacity of a    signal transduction network (7),    incoherent feed-forward loops to detect only fold change but    not the absolute change of an input signal (8), and stepwise cellular responses    to drugs (9). Two recent studies    conclude a linear path for EMT from analyzing single-cell    RNA-seq and proteomic data (10, 11). Discrepancy, however, exists    between this conclusion and theoretical predictions of parallel    paths for a multistable system like EMT (12, 13), raising a question whether    the theoretical models need to be modified or the inferred    linear path is an artifact from snapshot data. Live-cell    imaging is needed to address such a question.  
    Therefore, acquiring information from long-term CPT dynamics    requires tracking individual cells through live-cell imaging,    typically with time-lapse fluorescent imaging. However,    identifying appropriate molecular species that faithfully    reflect the process for labeling and generating such labeling    can be tedious and time-consuming. In addition, multiplex and    frequent fluorescent image acquisition over a long period of    time, e.g., days, is necessary for characterizing a CPT process    but is severely limited by the number of available fluorescence    channels and cytotoxicity concerns.  
    In short, a technical dilemma exists: Fixed cellbased    techniques provide high-dimensional expression profiles of    individual cells but lack true dynamical information, while    fluorescent labelingbased live-cell imaging techniques    typically provide dynamical information for only a small number    of dynamical variables. To tackle the substantial challenges in    CPT studies, here, we develop a framework for extracting cell    dynamical information through quantitative analysis of    live-cell trajectories in a high-dimensional cell feature    space. Our method resides on the observation that a cell state    can be defined by either the expression pattern or cell    morphological features. The latter broadly refers to collective    cellular properties such as cell body shape, organelle    distribution, etc., which are convenient for live-cell imaging,    with and without labeling. Hundreds of these morphological    features have been routinely used in pathology and in a number    of fixed- and live-cell studies for defining and studying cell    phenotypes and drug responses (9).    Introduction of this framework allows one to study CPTs in the    context of well-established rate theories (14). Rate theories study dynamical    processes of escaping from a metastable attractor or relaxing    to a newly established attractor.  
    We applied this framework to study transforming growth factor    (TGF-)induced EMT in a human A549 derivative cell line with    endogenous vimentinred fluorescent protein (VIM-RFP) labeling    [American Type Culture Collection (ATCC) CCL-185EMT]. We    represented the state of a cell at a given time as a point (or    equivalently a vector) in a 309-dimensional composite feature    space of the cell body contour shape and texture features of    vimentin, an intermediate filament, and a key mesenchymal    marker. While the framework is for morphological features in    general, in this study, we focus on the cell body shape and use    cell shape and morphology indistinctively. Through quantifying    time-lapse images, aided by a deep learningbased image    analysis algorithm, we were able to define the epithelial and    mesenchymal regions in the state space and unravel two parallel    pathways that EMT proceeds through. We provide a Python    package, multiplex trajectory recording and analysis of    cellular kinetics (M-TRACK), for studying CPT in the composite    feature space. The framework will provide a foundation for    quantitative experimental and theoretical studies of CPT    dynamics.  
      The A549 VIM-RFP cell line (ATCC CCL-185EMT) was generated as      a model system for studying EMT. It was created using      CRISPR-Cas9 technology, in which the RFP sequence was      inserted just before the stop codon of one allele of the      endogenous vimentin gene (Fig. 2A and      fig. S2A). The VIM-RFP knock-in allele was confirmed by      sequencing (fig. S2B) and colocalization immunostaining of      vimentin and VIM-RFP (Fig. 2B) with a Pearson      correlation coefficient 0.9 and a Manders overlap coefficient      0.93.    
      (A) Schema of CRISPR-Cas9mediated      generation of VIM-RFP knock-in allele. LHA, left homology      arm; RHA, right homology arm; pA, polyadenylation signal;      BSR, blasticidin resistance; exon9, exon #9 of vimentin gene.      (B) Colocalization immunostaining of      vimentin and endogenous VIM-RFP. Scale bar, 20 m.      (C) Matrigel invasion assay of A549 VIM-RFP      cells. After TGF- induction, cells show increased invasive      capacity. Scale bar, 100 m. Data represent mean  SD from      three repeated experiments. Students t test;      asterisk denotes P < 0.01.    
      EMT studies reach a consensus that there is a continuous      spectrum of EMT phenotypes (1). Previous studies report that      the parental A549 cells have already undergone partial EMT      with detectable basal vimentin expression and can complete a      full EMT upon TGF- treatment (11). Consistently, with      recombinant human TGF-1 treatment, the A549 VIM-RFP cells      showed an increased invasive capacity reflecting the      functionality of mesenchymal cells (Fig. 2C)      and increased Snail1 and N-cadherin expressions (fig. S2C).      After 2 days of continuous treatment, most cells underwent      apparent cell shape changes from round polygon shapes to      elongated spear shapes, characteristic changes of EMT. These      results confirmed that under TGF-1 treatment, the A549      VIM-RFP cells completed a full EMT to convert from a partial      EMT to a complete mesenchymal phenotype, and this is the      process that we focused on in this study.    
      We performed time-lapse imaging on the A549 VIM-RFP cells      continuously treated with TGF-1 for 2 days, with an imaging      frequency of one frame per 5 min for morphology and one frame      per 10 min for vimentin (more details in Materials and      Methods). To mathematically describe how a CPT (EMT here)      proceeds, one needs to choose a mathematical representation      of cell status at a given time. For representing the cell      shape, we adopted the active shape model that has been widely      used in computer-based image analyses (15) and particularly, in cell      biology studies (16, 17), but here, we use it for the      purpose of forming a complete orthonormal basis set (Fig. 3). That is, we first segmented the      images using our modified deep convolutional neural network      procedure (Fig. 3A, see Supplemental Text for      details) and tracked individual cell trajectories. Each cell      shape was aligned to a reference shape and was approximated      by N (=150) landmark points equally spaced along the      cell contour (Fig. 3B) (18). For two-dimensional (2D)      images, a cell was specified by a point z =      (x1, x2, ,      xN; y1,      y2, , yN) in the      2N-dimensional morphology space (Fig. 3C). By performing principal      components analysis (PCA) on the dataset of a collection of      single-cell trajectories, one constructs a complete      orthonormal basis set with the 2N  4 eigenvectors      {a}, or the principal modes, for the morphology      space. Notice that alignment fixes four degrees of freedom      (center and orientation). An attractive feature of the      principal modes is that they have clear physical meanings.      For example, the two leading principal modes of A549 VIM-RFP      cells undergoing EMT reflect cell growth along the long and      short axes, respectively (Fig. 3D). Then,      any cell shape that is approximated by the landmark point      z(t), which is generally time      dependent in live-cell imaging, can be expressed as a linear      combination of these principal modes z(t)=i=12N4ci(t)ai(Fig. 3E). Therefore,      c(t) =      (c1(t),,      c2N4(t)) forms a      trajectory in the shape space expanded by the basis set      {a}.    
      (A) Segmentation of single cells with the      deep convolution neural networks (DCNN)/watershed method.      (B) Extraction of cell outline from      segmented mask of individual cells, followed by resampling      using the active shape model and aligning all cell outlines      to a mean cell shape for consistent digitization of cell      morphology features. (C) Representation of      single-cell shapes as points in the 2N       4dimensional morphology space. Each dot represents one cell.      (D) Principal modes of morphology variation.      Left: first principal mode (PC1). Right: second principal      mode (PC2). The 1 represents the corresponding coordinate      value on the axis of morphology PC1 or PC2. The principal      modes reflect the characteristics of cell morphology      variation along the PC axes. (E) A      representative cell shape (left) and its reconstruction with      the first seven leading principal modes. (F)      A typical single-cell trajectory in the two leading      morphology PC domains (left) and its corresponding contours      (triangle dots marked by arrows in the left that have the      same color as the contours) at various time points (right).      Each dot represents an instantaneous state of the cell in the      morphology space. Color bar represents time (unit in hour).    
      Figure 3F shows a typical trajectory      projected to the first two leading principal component (PC)      modes in the morphology space and their corresponding time      courses of cell contour shape changes. Over time, this cell      elongated along the major axis (PC1), while shortened      slightly along the minor axis (PC2), resulting in a long rod      shape with an enlarged cell size. Two additional trajectories      in fig. S3 further reveal that single-cell trajectories are      heterogeneous with switch-like or continuous transitions      while sharing similar elongation of PC1 over time.    
      During a CPT, cell morphology changes are accompanied by      global changes in gene expression profiles (19). Specifically, in A549      VIM-RFP cells that have been treated with TGF- for 2 days,      vimentin was up-regulated with texture change from being      condensed in certain regions of the cell to be dispersed      throughout the cytosol (Fig. 4A), consistent with      previous reports (2022). These previous studies used      fixed cells and lack temporal information about the vimentin      dynamics. Therefore, we recorded the change in vimentin      within individual cells with time-lapse imaging.    
      (A) Flowchart of quantification. Left:      typical vimentin fluorescence images of single cell before      (top) and after treatment of TGF- (4 ng/ml) for 2 days      (bottom; scale bar, 50 m). Middle: segmented single-cell      image with only pixels inside the cell mask kept for Haralick      feature calculations. Right: framework of calculating the      single-cell Haralick features. All pixel values inside an      individual segmented cell were used to calculate GLCM and      Haralick features. (B) A typical single-cell      trajectory on the plane of vimentin Haralick features PC1 and      PC2 (left) and on the plane of PC3 and PC4 (right). Color bar      represents time (units in hours). Large triangular dots      labeled with numbers correspond to 0, 12, 24, 36, and 48      hours, respectively. Each dot represents an instantaneous      state of the cell in the Haralick feature space.      (C) Segmented single-cell images of vimentin      at various time points corresponding to the trajectory in (B)      (labeled with large, triangular dots).    
      Texture features are widely used for image profiling in drug      screening, phenotype discovery, and classification (2325). We hypothesized that the      texture features of vimentin can be quantified as an      indicator of EMT progression. For quantification, we used      Haralick features on the basis of the co-occurrence      distribution of gray levels (see Materials and Methods for      detailed information of Haralick features used). After      segmentation, we calculated the gray-level co-occurrence      matrix (GLCM) in the mask of each cell and 13 Haralick      features based on the single-cell GLCM and averaged all four      directions (Fig. 4A, see Materials and Method for      details). Nearly every Haralick feature shows a shift in      distribution after TGF- treatment (fig. S4).    
      To capture the major variation in vimentin Haralick features      during EMT, we performed linear dimension reduction with PCA      (more details are in Materials and Methods). Figure 4 (B and C) shows a typical      single-cell trajectory in the vimentin Haralick feature space      and corresponding segmented single-cell images at various      time points along this trajectory, respectively. The dynamics      in the vimentin feature space are again heterogeneous, as      indicated here, and from two additional trajectories in fig.      S5.    
      Overall, we described a cell state in a 309-dimensional      composite feature space of cell morphology and vimentin      texture features. The cells occupy distinct regions in the      composite feature space at the initial (0 to 2 hours) and      final stages (46 to 48 hours) of TGF- treatment (4 ng/ml)      (Fig. 5A). Physically, upon TGF-      treatment, the cell population relaxes from an initial      stationary distribution in the composite feature space into a      new one, and this study focuses on the dynamics of this      relaxation process.    
      (A) Kernel density plots in the plane of      morphology PC1 and vimentin Haralick feature PC1 estimated      from 3567 single-cell states from 196 trajectories      (represented by dots), at 0 to 2 hours (top, 1920 single-cell      states) and 46 to 48 hours after addition of TGF- (bottom,      1647 single-cell states). (B) Distributions      of cells at 0 to 2 hours and 46 to 48 hours cells after      adding TGF- in various features (morphology PC1, vimentin      Haralick PC1, PC3, and PC4). The diagonal axes are plots of      kernel density estimation of the 1D distribution of the      corresponding features. The four PC      modes capture 82.7% of morphology variance, 57.0, 8.8, and      5.2% of total vimentin feature variance, respectively. The      distributions along vimentin PC2 for cells before and after      treatment largely overlap with each other and were not      included here. A complete combination of distributions      is shown in fig. S6. (C) Scatter plot of 0-      to 2-hour data (left) and 46- to 48-hour data (right) on the      plane of morphology PC1 and vimentin Haralick features PC1.      Color represents the region a cell resides at a given time      point, as predicted by the fitted label-spreading function.      E, epithelial region; I, intermediate      region; M, mesenchymal region. (D)      A single-cell trajectory with its residing regions predicted      by the label-spreading function on the domain of morphology      PC1 and vimentin Haralick features PC1 and PC3 (with regions      represented by different colors).    
      Close examination reveals major distribution shifts along      four coordinates: morphology PC1 (82.7% variance of      morphology), vimentin Haralick PC1 (57.0% variance), PC3      (8.8% variance), and PC4 (5.2% variance) (Fig. 5B and fig. S6). This observation      permits subsequent analyses that are restricted to these      collective coordinates.    
      In rate theories, to define a reactive event, one typically      divides the configuration space describing a reaction system      into reactant, intermediate, and product regions (26, 27). Specifically, for the      present system, we developed a computational procedure that      combines the Gaussian mixture model (GMM) analysis of the      cell distributions (fig. S7, A and B) followed by fitting a      label-spreading function (28) using the k-nearest      neighbor (KNN) method (Material and Methods). This      label-spreading function takes coordinates (a      multidimensional vector) of individual cells in the composite      feature space as input and predicts the label of a cell,      i.e., the region identity (E, I, or      M discussed below) in which it resides (see Material      and Methods for details). Combining the biology      characteristics of EMT, this procedure divides the      four-dimensional space into epithelial (E; or more      precisely partial E for A549 VIM-RFP), intermediate      (I), and mesenchymal (M) regions (Fig. 5C). Figure 5D shows      a trajectory that starts within the E region and      then progresses to the I and then to the M      regions.    
      Consistent with a standard definition of reactive events in      rate theories, we defined an ensemble of reactive      trajectories as all the single-cell trajectories similar to      the one in Fig. 5D (and fig. S7C) that leave the      E region and end in the M region before      returning to E. Overall, we recorded      NT (=196) acceptable continuous      trajectories (see Materials and Methods); among them,      NR (=139) are reactive trajectories      (movies S1 and S2).    
      Single-cell trajectories in the composite feature space show      clear heterogeneous transition dynamics. In one      representative trajectory (Fig. 6A and      fig. S8A, left), the cell transits from the E to the      M region following a series of transitions first      along the vimentin Haralick PC1 and then the morphology PC1.      In contrast, in another trajectory (Fig. 6B      and fig. S8A, right), the cell proceeds with concerted      morphological and vimentin Haralick feature changes.    
      (A) A typical single-cell trajectory in      which the major change along the vimentin Haralick PC1      precedes the major change along the morphology PC1 (class I).      (B) A typical single-cell trajectory in      which the morphology PC1 and vimentin Haralick PC1 show      concerted variation (class II). (C)      Projection of recorded 139 reactive trajectories on 2D t-SNE      space using the DTW distances. Each dot is a single-cell      trajectory that undergoes EMT. Color represents labels of      k-means clustering on the DTW distances.      (D) Mean trajectories of class I and II      trajectories, respectively. They were calculated using the      soft-DTW barycenter method. (E) Plausible      mechanistic model. In this network, both morphology and      vimentin changes are induced by TGF-, and they both activate      each other and themselves. (F) Transition      paths simulated with the model in (E). The transition paths      show dynamical characteristics similar to those observed      experimentally.    
      To systematically study the two types of distinct behaviors,      we used soft-dynamic time warping (DTW) (29) to calculate the distance      between different trajectories, and t-distributed stochastic      neighbor embedding (t-SNE) to project the trajectory distance      matrix to 2D space (30). We      found that these trajectories form two communities (Fig. 6C). A k-means clustering on      these trajectories separated them into two groups, consistent      with the two communities in the t-SNE space. The two groups      of trajectories reveal different dynamical characteristics.      In one group (class I), vimentin Haralick PC1 varies first,      followed by marked change of morphology PC1. In the other      group (class II), for most trajectories, the morphology PC1      and vimentin Haralick PC1 change concertedly, while for only      a small percentage, the morphology PC1 changes earlier than      vimentin Haralick PC1 (fig. S8B). Distinction between the two      groups of trajectories is apparent from the scattered plot in      the morphology PC1/vimentin Haralick PC1 plane (fig. S8C) and      the nonoverlapping mean trajectories obtained using the      soft-DTW barycenter (Fig. 6D and fig. S8D)      (29, 31).    
      To rule out the possibility that the existence of two classes      of trajectories is an artifact of DTW, we analyzed      cross-correlation between morphology PC1 and vimentin      Haralick PC1 of individual reactive trajectories.      Cross-correlation analysis calculates the time delay at which      the correlation between morphology PC1 and vimentin PC1      reaches a maximum value (32). The time delay shows a      stretched distribution (fig. S8E). A large portion of      trajectories have vimentin Haralick PC1 change before      morphology PC1 change, while another main group of      trajectories has the time delay between morphology PC1 and      vimentin Haralick PC1 close to zero. After separating the      trajectories into two groups based on the sign of time delay,      the mean trajectories of the two groups (fig. S8E), referred      to as transition paths that connect the initial and final      cell states, are similar to what was obtained with      k-means clustering on the DTW distance.    
      Therefore, a main conclusion of this study is that the      live-cell platform revealed two types of paths for the      TGF-induced EMT in A549 VIM-RFP cells. Figure 6E      shows a plausible mechanistic model summarizing the existing      literature (see Materials and Methods for details and      references therein). TGF- activates morphological change and      vimentin to induce EMT, while morphological change and      vimentin expression can induce each other and themselves.      Computer simulations with the model followed by      k-means clustering on DTW distance reproduce the two      parallel EMT paths (Fig. 6F and fig. S9, C and      D). The cross-correlation analysis also showed results      similar to what was observed experimentally (fig. S9E). That      is, the live-cell imaging platform presented here can provide      mechanistic insight for further analyses.    
      We also compared our analysis on trajectories with snapshot      data analysis. Similar to Fig. 1, we      extracted single-cell data from the live-cell trajectories      every 6 hours, treated them as snapshot data with no      information of temporal correlation, and performed      pseudo-time analysis with Scanpy and Wishbone (33, 34). The pseudo-time results      (fig. S10, A and B) show only gradual change from epithelial      to mesenchymal regions. We then enforced a two-branch      analysis with Wishbone (fig. S10C), but the results are      difficult for biologically meaningful interpretation. These      results corroborate with Fig. 1 and      demonstrate that some dynamical features revealed from      live-cell imaging can be missing from snapshot data due to      lack of temporal correlation information in the latter.    
    Compared to the recent advances of fixed cellbased single-cell    techniques, live-cell imaging remains underdeveloped especially    in studying CPTs, due to technical challenges. Specifically,    the degrees of freedom specifying cell coordinates should be    experimentally feasible for live-cell measurement and    faithfully represent cell states. However, individual gene    products typically only reflect partial dynamical information    of a CPT process, and simultaneous fluorescence labeling of    multiple genes is challenging. Recently, tracking cell    morphological features through live-cell imaging emerges as a    means of extracting temporal information about cellular    processes in conjunction with expression-based cell state    characterization (9, 3538). Cellular and subcellular    morphologies reflect collective gene expression pattern and    cell phenotype (3941). Furthermore, hundreds or more    of morphology features such as cell size and shape can be    conveniently extracted from bright-field images without    necessity of additional fluorescence labeling. Here, we further    developed a quantitative framework for recording and analyzing    single-cell trajectories in a composite feature space,    including cell shape and texture features and a computational    package for related image analyses.  
    Our application to the TGF-induced A549 VIM-RFP EMT process    demonstrates the importance of extracting dynamical information    from live-cell data. A cell has a large number of molecular    species that form an intricately connected network, and it    interacts with a fluctuating extracellular environment,    including cell-cell interactions. Consequently, even isogenetic    cells show cell-cell heterogeneity, which further manifest as    large trajectory-to-trajectory heterogeneity in single-cell CPT    dynamics, and some dynamical features characteristic to a    particular process might be unavoidably concealed from snapshot    data. Our live-cell data reveals information on the two    distinct types of paths of EMT with distinct vimentin dynamics.    We then constructed and analyzed a mechanistic model on the    basis of the previous reports that vimentin is a regulator and    marker of EMT, and the model predicts these parallel paths.    Further studies, however, are needed to investigate whether    alternative mechanistic models can also explain the    observations and examine whether these parallel paths exist in    other EMT processes, in addition to TGF-induced EMT with the    A549 VIM-RFP cells. One can also apply the framework to    investigate the reverse process, mesenchymal-to-epithelial    transition (MET), and examine whether MET follows different    paths, as predicted from analyzing snapshot data (11).  
    While we present a general framework here, it has some    limitations and needs further development. In this study, we    restricted specification of the state of a cell by its cell    shape and vimentin texture features. Application with recently    developed imaging techniques, aided by machine learningbased    computational algorithms, may provide additional features such    as organelle texture and distributions in three dimensions from    long-term label-free imaging, as demonstrated by a recent work    of Sandoz et al. (42) using holotomographic    microscopy to study multiorganelle dynamics. Cell cycle is a    possible hidden variable contributing to the observed cell-cell    heterogeneity in this study and can be tracked with proper    reporters. Adding these new dimensions of information can    provide finer resolution of cell state in an expanded cell    state space.  
    Another limitation of the morphology/texture-based live-cell    imaging framework is that it provides mostly indirect    information on the dynamics of involved molecular species. For    example, further mechanistic understanding of the observed    parallel paths requires information on how the cell expression    pattern changes along the paths. One can use the paths    identified from live-cell imaging data, rather than currently    used pseudo-trajectory approaches based on a perceived    expression-similarity criterion, to time-order snapshot    single-cell data, thus resolving the dilemma experienced in    single-cell studies. For this purpose, one needs to establish a    mapping system between the composite feature space and the    expression space.  
    In summary, in this study, we demonstrate that live-cell    imaging is necessary to reveal certain dynamical features of a    CPT process concealed in snapshot data due to cell-cell    heterogeneity. Meanwhile, we present a framework that    facilitates recent emerging efforts of using live-cell imaging    to investigate how a CPT process proceeds along continuous    paths at multiplex, albeit lower-dimensional feature space,    complementing fixed cellbased approaches that can provide    snapshots of genome-wide expression profiles of individual    cells. We expect that the framework can be generally applied    since marked morphological changes typically accompany a CPT    process.  
      The human nonsmall cell lung carcinoma lines, A549 (ATCC      CCL-185) and A549 VIM-RFP (ATCC CCL-185EMT), were from ATCC.      Cells were cultured in F-12K medium (Corning) with 10% fetal      bovine serum (FBS) in MatTek glass bottom culture dishes      (P35G-0-10-C) in a humidified atmosphere at 37C and 5%      CO2. Culture medium was changed every 3 to 5 days.      During imaging, antibiotic-antimycotic (100) (Thermo Fisher      Scientific, 15240062) and 10 mM Hepes (Thermo Fisher      Scientific, 15630080) were added to the culture medium.    
      The CHOPCHOP website (https://chopchop.cbu.uib.no/)      was used to design high-performance single guide RNAs      (sgRNAs) to target the sequence near the stop codon of the      human vimentin gene. The cleavage activities of the gRNAs      were validated using the T7 endonuclease 1 (T7E1) assay      according to the manufacturers instructions (New England      Biolabs, no. E3321). The sgRNA VIM-AS3      (5-CTAAATTATCCTATATATCA-3) was chosen in this study. To      generate the sgRNA-expressing vector, VIM-AS-3 gRNA oligos      were designed, phosphorylated, annealed, and cloned into the      PX458 (Addgene, catalog no. 48138) vector, using Bbs I      ligation. Multiple colonies were chosen for Sanger sequencing      to identify the correct clones, using the primer U6      (forward): 5-AAGTAATAATTTCTTGGGTAGTTTGCAG-3.    
      The VIM-RFP knock-in donor was designed and constructed to      contain approximately 800base pair left and right homology      arms, a Cayenne RFP gene (ATUM, no. FPB-55-609), preceded by      a 22amino acid linker, and followed by a bovine growth      hormone polyadenylation signal sequence. To assist in      drug-based selection of gene-edited cell clones, human      elongation 1-alpha (EF1)blasticidin selection cassette,      flanked by Loxp sites, was also cloned into the vector and      positioned upstream of the right homology arm.    
      CRISPR-Cas9 technology was used to incorporate the RFP      reporter into the 3 terminal end of the vimentin gene.      Briefly, A549 cells were plated at a density of 2       105 cells per well in a six-well plate. After 24      hours, cells were transfected with 4.0 g of PX458_VIM-AS3      plasmid, 4.0 g VIM-RFP knock-in donor plasmid, and 24 l of      transfeX (ATCC ACS-4005). Blasticidin selection (10 g/ml)      was applied 24 hours after transfection. RFP-positive cells      were single-cell sorted and expanded for molecular      characterization.    
      RFP positive A549 VIM-RFP cells were harvested, and DNA was      extracted using QuickExtract (Epicentre, QE09050). Primers      were designed for left homology arm and right homology arm      junction polymerase chain reaction (PCR): left junction:      5-TAGAAACTAATCTGGATTCACTCCCTCTG-3 (forward) and      5-ATGAAGGAGGTAGCCAGGATGTCG-3 (reverse); right homology:      5-ATTGCTGCCCTCTGGTTATGTGTG-3 (forward) and      5-ATTACACCTACAGTTAGCACCATGCG-3 (reverse). Junction PCR was      performed using the Phire Hot Start II DNA Polymerase (Thermo      Fisher Scientific), and the PCR amplicons were subjected to      Sanger sequencing for identification of clones that contained      the expected junction sequences at both left and right      homology junctions.    
      A549 VIM-RFP cells were washed with phosphate-buffered saline      (PBS), fixed with 4% formaldehyde, and blocked with 5% normal      goat serum/0.1% Triton X-100 in PBS for 30 mins. Afterward,      the primary antibodies were added to the blocking buffer, and      cells were incubated for 1 hour at room temperature. Cells      were subsequently washed and incubated with the secondary      antibodies for 1 hour and wrapped in aluminum foil. After      washing, images were taken with a Nikon Ti-E microscope      (Hamamatsu Flash 4.0V2). The primary antibodies were rabbit      anti-vimentin (D21H3) (1:500 dilution; Cell Signaling      Technologies, catalog no. 5741), mouse anti-N-cadherin (13A9)      (1:100 dilution; Cell Signaling Technologies, catalog no.      14215), and mouse anti-Snail (L70G2) (1:300 dilution; Cell      Signaling Technologies, catalog no. 3895). For secondary      antibodies, goat anti-mouse or goat anti-rabbit Alexa Fluor      488 (Thermo Fisher Scientific, catalog no. A-21235) was used      at a 1:1000 dilution.    
      A549 VIM-RFP cells were plated at a density of 1       104 cells/cm2 and maintained in F-12K      medium (ATCC 30-2004) supplemented with 10% FBS (ATCC      30-2020). After 24 to 48 hours, culture medium was replaced      with fresh medium supplemented with TGF- (4.0 ng/ml; R&D      Systems 240-B) for 1 to 3 days to induce EMT. A549 VIM-RFP      cells treated with PBS were used as a control.    
      Control and EMT-induced A549 VIM-RFP cells were seeded into      inserts of Boyden chambers (BD Biosciences, San Jose, CA)      that were precoated with Matrigel (1 mg/ml), at 5       104 cells per insert in culture medium without      FBS. Inserts were then transferred to wells with culture      medium containing 10% FBS as a nutritional attractor. After      24-hour incubation, invading cells on the bottom side of the      insert membrane were fixed with 4% paraformaldehyde for 2      min, permeabilized with 100% methanol for 20 min, and stained      with 0.05% crystal violet for 15 min at 37C. Noninvading      cells on the top side of the membrane were removed by cotton      swab. Photographs were taken from five random fields per      insert. Cells in the five random fields were counted.    
      The colocalization analysis was performed with JACoP (plugin      of ImageJ) (43, 44). The Pearson correlation      coefficient and Manders overlap coefficient were calculated      following the method described in (45).    
      Time-lapse images were taken with a Nikon Ti-E microscope      (Hamamatsu Flash 4.0V2) with differential interference      contrast (DIC) and tetramethyl rhodamine isothiocyanate      (TRITC) channels (excitation wavelength is 555 nm and      emission wavelength is 587) (20 objective, numerical      aperture = 0.75). The cell culture condition was maintained      with the Tokai Hit Microscope Stage Top Incubator. Cells were      imaged every 5 min with the DIC channel and every 10 min with      the TRITC channel. The exposure time for DIC was 100 ms and      that for the TRITC channel was 30 ms. That is, each full (2      days long) single-cell trajectory contains 577 DIC images and      289 fluorescent images. While taking the images, all the      imaging fields were chosen randomly.    
      We segmented single cells using a previously developed method      combining deep convolution neural networks (DCNNs) and      watershed (46). To quantify      cell morphology, we adopted the active shape model method      (15, 18). After single-cell      segmentation, the cell outline was extracted and resampled      into 150 points. All the single-cell outlines were aligned to      a reference outline (calculated on the basis of the average      of several hundred cells). The 150 points (x and      y coordinates) are the 300 features of cell      morphology.    
      For single-cell tracking, we used the TrackObjects module in      CellProfiler on the segmented images using a linear      assignment algorithm (47, 48). In long-term imaging, the      accurate tracking of cells can be lost for several reasons,      such as cells moving in or out of the field of view or      inaccurate segmentation. We kept trajectories that were      continuously tracked with the starting point no later than 12      hours and the end point no earlier than 30 hours after adding      TGF-. These 196 trajectories were used for subsequent PCA.      Among them, 139 were identified as reactive trajectories.    
      Haralick features have been widely used for classifying      normal and tumor cells in the lungs (49) and the subcellular features      or patterns such as protein subcellular locations (50, 51). After cell segmentation,      each cell was extracted, and its Haralick features were      calculated using mahotas (52). Haralick features describe      the texture as coarse or smooth and complexity of images, and      we used the following 13 features: 1, angular second moment;      2, contrast; 3, correlation; 4, sum of squares: variance; 5,      inverse difference moment; 6, sum average; 7, sum variance;      8, sum entropy; 9, entropy; 10, difference variance; 11,      difference entropy; 12, information measure of correlation 1;      and 13, information measure of correlation 2 (50). Haralick feature      calculation was based on the GLCMs (53). The GLCMs size was      determined by the number of gray levels in the cell image.      Because of cell heterogeneity, the numbers of gray levels      varied in different cells. Because the GLCM has four      directions (0, 45, 90, and 135), the Haralick features      were averaged on all four directions to keep rotation      invariance.    
      Because of cell heterogeneity, it is more informative to      examine the temporal change of an individual cell relative to      its initial state, such as the basal level of gene expression      in signal transduction studies (8,      54, 55). For the present system, it      is the initial position in the composite feature space. We      used a stay pointsearching algorithm (56) to find the initial stay      point of each cell in the space of cell shape and vimentin      Haralick features. For each trajectory, we scaled all the      landmark points by the square root of the area of the initial      stay point. Physically, the latter is a characteristic length      of the cell, and the scaling reflects the observation that      the cell size does not affect EMT (57). All the vimentin Haralick      features were reset so that the values at the initial stay      point assume zero. The PCs were calculated after scaling. The      scaling allows one to examine the relative temporal variation      of single cells.    
      PCA was performed on all NT trajectories,      i.e., a total of Nc (49,689 cells) with      300 morphology features (Nc  300 matrix)      for linear dimensionality reduction (58). The first seven components      explained more than 98% of the variance. Specifically, the      first and second components explained 82.7 and 10.5% of the      variance, respectively. After calculation of Haralick      features for each cell, PCA was calculated on the      Nc  13 matrix for linear dimension      reduction.    
      We fitted the distribution on each of the four      morphology/texture coordinates with a two-component      (c0, c1) GMM      separately (fig. S7A) (58) and used the four GMMs to      define the E, I, and M states      (fig. S7B).    
      For each single cell in the space of morphology PC1, vimentin      Haralick PC1, PC3, and PC4      X(xii = 1,2,3,4), the      label of each coordinate Li is defined      using the GMM with the following equationsLi({xi{p(xi{ci,0)>0.5}{xi0.9}{x>ci,1})=2(2)where      p(xici,      ) is the posterior probability of certain component of      xi, and      ci,0 and      ci,1 are the mean      values of the two components of ith GMM.      The label of complementary set is defined as 1.    
      With the labels defined on all four coordinates, we first      defined the E state asS({L1=0}{L2=0}{L3=0}{L4=0})=E(3)the      M state asS({L1=2}{L2=2}{L3=2}{L4=2})=M(4)and      the I state otherwise. However, this definition      suffers from one weakness in that it assigns the same weight      to vimentin Haralick PC1, PC3, and PC4, although vimentin      Haralick PC3 and PC4 count for less variance than the PC1      does. Because of this equal weight distribution, small      fluctuations in vimentin Haralick PC3 and PC4 could lead to      unstable assignment of cell states.    
      To solve this problem, we use the above definition as an      initial estimate of cell state to fit to a label-spreading      function (28, 58). When fitting the      label-spreading function, we adopted the KNN method (50      neighbors) and used a high clamping factor (0.5) to assure      the global and local consistency. The KNN algorithm in the      label-spreading function allows one to take the different      scales of vimentin Haralick PC1, PC3, and PC4, and community      structure into consideration (Fig. 5C). Since      PC1 is more important in defining the range of neighbors, the      weight of PC1 in the definition was automatically increased.      This change in definition avoids a situation in which cells      belonging to a common community and close to each other in      the composite feature space get assigned to different cell      states.    
      The cross-correlation was used to calculate the time delay      between different signals. We observed different transition      time of morphology PC1 and vimentin Haralick PC1. The      cross-correlation of the two time series (of morphology PC1      and vimentin Haralick PC1) was calculated (59). The time lag between the      two signals was set for when the value of cross-correlation      reaches the maximum (32). We      separated all the trajectories by the sign of time delay      between morphology PC1 and vimentin Haralick PC1.    
      While a cellular system is far from thermodynamic      equilibrium, for simplicity, we illustrated the effect of      cell-cell heterogeneity due to hidden slow variables with the      following model system (https://github.com/opnumten/M-TRACK).    
      The potential function isU=(o2h)4(o2h)32(o2h)2+3y4(5)where      o is the observable and h is the hidden      slow variable (Fig. 1A).    
      The simulations were performed using the following      procedures:    
      1. Generate initial condition (o0,      h0) in the left well of this potential of      multiple trajectories (4711) (fig. S1) with the      Metropolis-Hastings algorithm (60).    
      2. For each initial condition, with fixed hidden slow      variable h0, propagate the observable      o with Langevin simulations along the 1D potential      and with a Gaussian white noise ((t))o(t+dt)=o(t)+dt(4(o2h0)+3(o2h0)24(o2h0)3)+dt(t)(6)where      dt is 0.01.    
      3. Propagate each trajectory to t = 10.    
      We built the network model by summarizing the existing      literatures. The EMT morphology variation is mainly      contributed to by generation of filament actin and E-cadherin      down-regulation, which can activate the YAP/TAZ      (yes-associated protein/transcriptional coactivator with      PDZ-binding motif) pathway (61). The YAP/TAZ pathway can      induce translocation of Smad2 (mothers against      decapentaplegic homolog 2), which plays important roles in      EMT (62, 63). Thus, morphology variation      can activate both vimentin and itself. Vimentin can induce      the EMT morphological change through regulating 1-intergrin      and E-cadherin (64). Vimentin can      be activated upon TGF- induction through Slug and it also      activates Slug through dephosphorylation of extracellular      signalregulated kinase, which forms a self-activation loop      (65). Vimentin is required for      the mediation of Slug and Axl (64, 6668), and it can induce variation      of cell morphology, motility, and adhesion (68). Vimentin fibers regulate      cytoskeleton architecture (64), and more vimentin fibers      are assembled in A549 cells during EMT (20).    
      Next, we formulated a mathematical model corresponding to the      networkdMdt=m+m+vmV2V2+Kvm2+cmM4M4+Km4mM(7)dVdt=v+v+mvM2M2+Kmv2+cvV4V4+Kv4vV(8)where      m (=0.1) and v      (=0.12) are the basal generate rates of morphology variable      and vimentin, separately; m and      v are the generate rates of morphology      change and vimentin activated by TGF-, respectively;      vm (=2.0) and mv      (=1.0) are the activation coefficients of vimentin and      morphology to each other; cm (= 3.8) and      cv (= 4.0) are the self-activation      coefficients of morphology and vimentin, respectively; and      Kvm (=8.0),      Kmv (=8.0),      Kv (=2.4), and Km      (=2.4) are the half-maximal effective concentrations of the      Hill function. m (1.0) and      v (=1.0) are degradation rates of      morphology and vimentin, respectively.    
      The Langevin simulations were performed as follows:    
      1. We set m and v      as 0 for simulating the control condition (i.e., without      TGF- treatment). Initialize a trajectory with random point      (M0 and V0) sampled      from a uniform distribution within a range of [0,5). Run      simulations with the following equationM(t+dt)=M(t)+dt(m+m+vmV2V2+Kvm2+cmM4M4+Km4mM)+dt(t)(9)dt(v+v+mvM2M2+Kmv2+cvV4V4+Kv4vV)+dt(t)(10)where      dt was set to be 0.01, and (t) was the      normal Gaussian white noise. The duration of simulation was      set to 100. At the end of the simulation, the cell state      relaxed to the basin of the epithelial state (fig. S9A),      which was then set as the initial condition under TGF-      treatment.    
      2. After generating multiple initial conditions from the      first step, we increased the values of      m(to 0.6) and v(to      1.0) to simulate the condition of TGF- treatment. If the      cell gets into the range that its distance to the attractor      of the mesenchymal state (Fig. 6F and      fig. S9B) is less than one, then this trajectory was      considered to be a trajectory of EMT.    
      3. After getting Nsimu (=185) reactive      EMT trajectories, we performed analysis with the simulated      trajectories similar to what we did with the experimental      trajectories.    
      We obtained the steady-state probability distribution      Pss by solving the diffusion equation      (using Matlab 2018a PDEtool)P(M,V,t)t=(P(M,V,t)F(M,V))+D22P(M,V,t)(11)where      F(M,V)=(dMdt,dVdt), and      D is the diffusion coefficient. With the      steady-state probability distribution, we obtained the      quasi-potential of EMT defined as U   ln      (Pss) (69, 70). Without TGF-, there exists      a deep basin as epithelial state and a shallow basin as      mesenchymal state in the quasi-potential landscape (fig.      S9A). After TGF- treatment, the landscape is changed, on      which the mesenchymal basin becomes deep, and a valley is      formed where the vimentin level is high (Fig. 6E      and fig. S9B).    
      The snapshot data were extracted from live-cell trajectories      with a time step of 6 hours. We processed the data with      Scanpy and Wishbone (33, 34). The root cell in Scanpy or      the start cell in Wishbone was set as the cell that is      closest to the average of data at 0 hour in feature space.      The number of neighbors was set to 50 in constructing a      neighborhood graph in Scanpy. In addition, the number of way      points in Wishbone was set to 200.    
      The M-TRACK program is written in Python 3 and provided with      a graphical user interface. It provides tools for analyses of      cell morphology with the active shape model, distribution and      texture features of protein or gene florescence in single      cell, and single-cell trajectories in the PC domain. The      input files include the original gray-level images, segmented      cell mask, and a database file of tracking results from      Cellprofiler. The computer package can be downloaded from      GitHub (https://github.com/opnumten/M-TRACK).      Part of the source code is adapted from CellTool (18).    
      Statistical analyses were performed mainly with Python      package, including SciPy and scikit-learn (58, 59). Students t test      was used to calculate the statistical difference between      different groups of samples. The samples for imaging were      randomly selected to avoid bias.    
More here:
Live-cell imaging and analysis reveal cell phenotypic transition dynamics inherently missing in snapshot data - Science Advances