The paper *Deep Learning Phase Segregation* (2018) by Farimani et al. presents a data-driven approach for learning, modeling, and predicting phase segregation. The authors use a conditional Generative Adversarial Network (cGAN) to map an initially dispersed binary fluid to its equilibrium concentration field. They claim that their deep learning approach makes predictions which conserve phase fraction, correctly predict phase transition, and reproduce area, perimeter, and total free energy distributions with up to 98% accuracy.

# Overview

Phase segregation describes many physical processes in which materials start in a mixed state and then transition to some equilibrium determined by the initial conditions and global parameters. The macroscopic time evolution of phase segregation is most commonly modeled by the Cahn-Hilliard equation, which is generally solved numerically. These numerical methods are computationally intensive, and the decomposition they simulate cannot be run in reverse.

The paper uses cGANs to perform the task of decomposition. A cGAN is a generative model which learns a direct mapping between a set of conditioning variables and the desired data distribution (through deep learning). The authors use this model to learn a direct, reversible mapping between an initially dispersed binary fluid and its equilibrium concentration field across a range of initial binary concentrations. This has the benefit of being able to learn and infer the non-linear physical phenomenon of phase segregation without any knowledge of the underlying physical laws.

The above figure shows the paper's cGAN architecture and training procedure. The activation layer outputs of the generator (Figure 1b) demonstrate how spinodal decomposition inference occurs. During encoding, regions of similar concentration are identified and aggregated by convolutional downsampling. This is particularly interesting, as typically the output of convolutional activation layers holds little to no interpretable information. Similarly, the decoding process shows how the aggregated regions of similar concentration form seeds by which single-phase enriched regions manifest.

This analysis of the convolutional activation layers' outputs seems like a key takeaway from this paper. The cGAN itself merely attempts to map an initial state directly to a final state. However, the intermediate layers seem to learn some of the essential characteristics of the underlying process. I have not seen this phenomenon in any other convolutional neural network (although I'm by no means an expert on the topic), but it's a digestible idea. If the process of phase segregation can be roughly thought of as a process of convolutions, in that localized regions of the image react in a spatially aware manner, then it would make sense that the generator mimics this process.

On the other hand, I feel this exact detail might give some insight into the limitations of this approach. The task this paper solves might be particularly suitable for cGANs if the process is "simple" enough to be interpreted this way. But if a similar task can't be modeled in such a way, then we wouldn't expect to see this emergent phenomenon, which calls into question how accurate the model would turn out to be. Still, it's hard to imagine such a process that *can't* be interpreted this way, so perhaps this method is general.

Ultimately, the authors demonstrate that their cGAN approach can be used to directly learn the dynamics of phase segregation based solely on observations. The paper demonstrates successful learning and prediction of the steady-state spatial decomposition, as well as the ability to learn a reverse mapping from the phase-segregated concentration field back to the initial binary mixture. The authors also show that their process passes both geometric *and* thermodynamic validation, including the chemical free energy component, surface free energy component, total free energy, mean concentration, phase area, and phase perimeter.
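
To make the geometric side of that validation concrete, here is a minimal sketch of how mean concentration, phase area, and phase perimeter could be measured on a gridded concentration field. The threshold of 0.5, the edge-counting definition of perimeter, and the function names are my own assumptions for illustration; the paper may define these quantities differently.

```python
def phase_metrics(field, threshold=0.5):
    """Return (mean concentration, phase area, phase perimeter) of a 2D field.

    Assumptions (not from the paper): a cell belongs to the enriched phase
    when its concentration exceeds `threshold`; area is the number of such
    cells; perimeter is the number of cell edges separating the two phases,
    with periodic wrap-around at the borders.
    """
    rows, cols = len(field), len(field[0])
    mean_c = sum(sum(row) for row in field) / (rows * cols)
    mask = [[value > threshold for value in row] for row in field]
    area = sum(sum(row) for row in mask)
    perimeter = 0
    for i in range(rows):
        for j in range(cols):
            # Compare each cell with its right and lower neighbours only,
            # so every shared edge is counted exactly once.
            if mask[i][j] != mask[i][(j + 1) % cols]:
                perimeter += 1
            if mask[i][j] != mask[(i + 1) % rows][j]:
                perimeter += 1
    return mean_c, area, perimeter


def relative_error(predicted, reference):
    """Relative deviation of a predicted metric from its reference value."""
    return abs(predicted - reference) / abs(reference)
```

Comparing these numbers between a generated field and the simulated ground truth gives one simple accuracy measure per sample.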

# Methods

The authors use a combination of \( L1 \) loss and the cGAN loss to produce the final generator. A hyperparameter \( \lambda \) is used to weigh the contribution of each of the losses to the total loss for training.

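In the usual conditional-GAN notation, with generator \( G \), discriminator \( D \), input \( x \), target \( y \), and noise \( z \), the loss for the cGAN is given by:

\[ \mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}\left[ \log D(x, y) \right] + \mathbb{E}_{x, z}\left[ \log\left( 1 - D(x, G(x, z)) \right) \right] \]

while the \( L1 \) loss is given by:

\[ \mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}\left[ \lVert y - G(x, z) \rVert_1 \right] \]

so that the final generator, \( G^* \), is given by:

\[ G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \]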

The \( L1 \) component of the loss function ensures that the output approximates the ground truth images in an "\( L1 \) sense," which, here, works on a pixel-level and gives the overall structure of the final solution. The authors note that the \( L1 \) loss alone fails to capture many of the subtler, high-frequency details of the phase segregated liquids. The addition of the cGAN loss compensates for this by being, as the authors describe, a "learned component" of the loss which, in practice, adds a sharpness to the images produced (where the \( L1 \) loss, alone, produces washed out, blurry images).
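
As a rough illustration of how the two terms combine from the generator's point of view, here is a small sketch in plain Python. It assumes the discriminator has already produced a probability `d_fake` that the generated image is real; the function names and the per-image (rather than batched) formulation are simplifications of mine, not the authors' code.

```python
import math


def l1_loss(generated, target):
    """Mean absolute pixel difference between two same-sized 2D images."""
    pairs = [(g, t)
             for g_row, t_row in zip(generated, target)
             for g, t in zip(g_row, t_row)]
    return sum(abs(g - t) for g, t in pairs) / len(pairs)


def generator_adversarial_loss(d_fake):
    """The log(1 - D(x, G(x, z))) term, which the generator tries to minimize.

    `d_fake` is the discriminator's probability (strictly below 1) that the
    generated sample is real.
    """
    return math.log(1.0 - d_fake)


def generator_objective(generated, target, d_fake, lam):
    """Adversarial term plus lambda-weighted L1 term for a single sample."""
    return generator_adversarial_loss(d_fake) + lam * l1_loss(generated, target)
```

With `lam = 0` only the adversarial term remains, and as `lam` grows the objective is dominated by pixel-level agreement with the ground truth, which matches the blurry-versus-sharp trade-off described above.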

The domain for the dataset is two-dimensional with no flux into or out of the domain. The goal is to obtain a long-time (steady-state) solution of the concentration field \( c(x, y, t) \). The Cahn-Hilliard equation involves fourth-order spatial partial-differential operators:
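
For orientation, a standard constant-mobility statement of the Cahn-Hilliard equation is sketched here. The placement of \( \epsilon \) is an assumption (some references write \( \epsilon^2 \) or a separate gradient-energy coefficient), and the chemical potential \( \mu \) is a symbol I introduce for readability, so the authors' exact form may differ:

\[ \frac{\partial c}{\partial t} = D \nabla^2 \mu, \qquad \mu = \frac{\partial f}{\partial c} - \epsilon \nabla^2 c \]

Substituting \( \mu \) into the first equation produces a \( \nabla^2 \nabla^2 c \) term, which is where the fourth-order spatial derivative comes from.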

The phase field is in the form of \( c(x, y, t) \), and the authors assume a constant diffusion coefficient \( D \). The \( \epsilon \) parameter is the energetic penalty of gradients in the concentration field. \( f(c) \) is the free energy of the system. The parameter \( a \) defines the depth of the wells, and the paper considered \( a = 1.0 \). The equation is solved for the long-time behavior \( t \rightarrow \infty \).
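
To see how these symbols interact, here is a minimal sketch of a single time step in plain Python. It is deliberately *not* the authors' solver: it uses a simple explicit finite-difference update rather than a spectral method, and it assumes the form \( \partial c / \partial t = D \nabla^2 ( f'(c) - \epsilon \nabla^2 c ) \) with a double well \( f(c) = a c^2 (1 - c)^2 \), both of which are my assumptions rather than details taken from the paper.

```python
import random


def laplacian(field, dx=1.0):
    """Five-point Laplacian on a 2D grid with periodic boundaries."""
    n, m = len(field), len(field[0])
    return [[(field[(i + 1) % n][j] + field[(i - 1) % n][j]
              + field[i][(j + 1) % m] + field[i][(j - 1) % m]
              - 4.0 * field[i][j]) / dx ** 2
             for j in range(m)] for i in range(n)]


def dfdc(c, a=1.0):
    """Derivative of the assumed double well f(c) = a * c^2 * (1 - c)^2."""
    return 2.0 * a * c * (1.0 - c) * (1.0 - 2.0 * c)


def cahn_hilliard_step(c, dt=0.01, D=1.0, eps=1.0, a=1.0):
    """Advance the field by one explicit Euler step (small dt required)."""
    n, m = len(c), len(c[0])
    lap_c = laplacian(c)
    # Chemical potential: bulk free-energy term minus the gradient penalty.
    mu = [[dfdc(c[i][j], a) - eps * lap_c[i][j] for j in range(m)]
          for i in range(n)]
    lap_mu = laplacian(mu)
    return [[c[i][j] + dt * D * lap_mu[i][j] for j in range(m)]
            for i in range(n)]


def mean(field):
    """Average concentration over the whole grid."""
    return sum(sum(row) for row in field) / (len(field) * len(field[0]))


rng = random.Random(0)
field = [[0.5 + 0.1 * (2.0 * rng.random() - 1.0) for _ in range(8)]
         for _ in range(8)]
start_mean = mean(field)
for _ in range(20):
    field = cahn_hilliard_step(field)
```

Because the update is the Laplacian of a periodic field, the mean concentration stays fixed up to floating-point error, which is the discrete counterpart of the phase-fraction conservation the authors check for.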

The authors used a semi-implicit Fourier-spectral method with periodic boundary conditions to solve Equation 4. This produced 8,640 training samples with initial concentrations varying over \( [0.05, 0.94] \) in steps of \( 0.01 \), with the initial amplitude of thermal noise held constant at \( 0.1 \). At each of the 90 concentrations, \( 96 \) random initial conditions were generated (\( 90 \times 96 = 8{,}640 \)). The domain is a \( 64 \times 64 \) grid with a single channel (i.e., a grayscale image). Another 8,640 test samples were randomly generated for evaluation.
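
The sample count follows directly from the sweep over concentrations, which the sketch below reproduces. The noise model (uniform noise around the mean) and the function names are assumptions of mine; the paper only states the amplitude.

```python
import random


def concentration_grid(start=0.05, stop=0.94, step=0.01):
    """List the mean concentrations swept over, rounded to two decimals."""
    count = round((stop - start) / step) + 1
    return [round(start + k * step, 2) for k in range(count)]


def initial_condition(mean_c, size=64, noise=0.1, rng=random):
    """Uniform field at `mean_c` plus zero-mean noise of the given amplitude.

    The uniform noise distribution is an assumed stand-in for the paper's
    thermal noise.
    """
    return [[mean_c + noise * (2.0 * rng.random() - 1.0) for _ in range(size)]
            for _ in range(size)]


concentrations = concentration_grid()
samples_per_concentration = 96
total_samples = len(concentrations) * samples_per_concentration
```

Each initial condition would then be evolved to steady state by the numerical solver to form one (input, ground truth) training pair.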

The above figure shows a sample of input configurations, the output of their model given the conditions, and the ground truth with \( \lambda = 1.0 \). The results are quite impressive.

The above figure shows long-time reverse phase separation behavior. Again, the input configuration, output of their model, and ground truth are displayed for comparison. We see that their model is capable of accurately reversing this mapping, which isn't possible with the numerical methods used to generate the training data.

# Thoughts

The paper itself is pretty comprehensive and well-written, and the results are impressive. The actual implementation is not very complicated, although it's able to solve a rather complicated task. Basically, their work boils down to an image-to-image CNN with an augmented cGAN used to tune the results. Frankly, I find the deep learning prospects to be more interesting than the phase segregation task they aimed to solve, but I suppose this paper does a good job making them one and the same.

I don't recall another architecture where the loss is a linear combination of the \( L1 \) and cGAN losses, but, in retrospect, it seems like an obvious experiment (one which is used in many other places, I'm sure). What is more surprising is the insight into the convolutional activation layers' outputs.

The network managed to encode the physical process of spinodal decomposition into its forward propagation through its intermediate convolutional layers. That is, the network learns to solve a dynamic process with a single forward pass. This raises the question: can a network learn any dynamic process like this?

## Reinforcement Learning

In reinforcement learning, we're aiming to solve a Markov decision process (MDP). To do so, we train a model that takes a representation of the current state as input and outputs a distribution over actions, which we then sample from. The process starts with an initial state \( s_0 \), which is given to our policy \( \pi \). We sample from the policy to produce an action \( a_0 \), which is applied to the environment to produce the next state \( s_1 \) and a scalar cost \( c_1 \) in response. This repeats for every timestep \( t \) in our time horizon \( T \), with our policy \( \pi \) producing actions \( a_t \) given state \( s_t \) so as to minimize the cumulative discounted future cost \( \sum_{t=1}^{T} \gamma^t \cdot c_t \) for \( \gamma \in [ 0, 1 ] \). The environment consists of a set of states \( S \), a state-transition probability function \( P(s' \mid s, a) = Pr(s_{t+1} = s' \mid s_t = s, a_t = a) \), and a cost function \( C (s, s'): S \times S \rightarrow \mathbb{R} \), which together characterize its dynamics.
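
The loop described above can be sketched in a few lines. This is a toy illustration only: the policy here is deterministic rather than sampled, and `rollout`, `step`, and the one-dimensional environment are hypothetical names I made up for the example.

```python
def discounted_cost(costs, gamma):
    """Sum of gamma^t * c_t, where the first cost received is c_1."""
    return sum(gamma ** t * c for t, c in enumerate(costs, start=1))


def rollout(policy, step, initial_state, horizon):
    """Run `policy` for `horizon` steps and collect the costs.

    `policy(state)` returns an action and `step(state, action)` returns
    (next_state, cost); both are supplied by the caller.
    """
    state, costs = initial_state, []
    for _ in range(horizon):
        action = policy(state)
        state, cost = step(state, action)
        costs.append(cost)
    return state, costs


def toy_policy(state):
    """Deterministic stand-in for sampling an action: walk toward zero."""
    return -1 if state > 0 else 0


def toy_step(state, action):
    """Toy environment: the cost is the distance of the next state from zero."""
    next_state = state + action
    return next_state, abs(next_state)


final_state, costs = rollout(toy_policy, toy_step, initial_state=3, horizon=4)
total = discounted_cost(costs, gamma=0.5)
```

A real policy would return a distribution to sample from, and training would adjust it to reduce `discounted_cost` over many rollouts.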

Can we frame phase segregation as such a process? If our environment is the actual (or, at least, numerically simulated) physical process, then our states can be roughly represented by the images of the material configuration, while the laws of physics (or the output of numerical simulation) are represented by the state-transition function. The policy then attempts to map the current state to the next state or, more precisely, to the next expected state transitioned to by \( P \). The cost, then, is a measurement of how close our predicted next state is to the actual state (which may be in terms of image similarity or the geometric/thermodynamic properties mentioned in this paper).

It would seem, at least without thinking *too* hard, that this could work in theory. But this paper avoids the complications of reinforcement learning and instead learns to directly simulate the physical process through a single pass in a convolutional neural network. In terms of RL, it would seem they've learned a model of the environment, and they have done so without needing a proper MDP. In fact, because they trained their model on only the first state and the last state, their approach is even more powerful, as it learns a (rough version of a) dynamic process *without ever seeing it*.

Conversely, one could ask whether any MDP can be solved using a cGAN such as the one in this paper. If our policy were architected so as to incorporate a cGAN, it could potentially learn to map the current state to some future state by encoding the process through convolutions. Our agent could then potentially learn to predict the result of these encoded processes in order to take better actions. Similar things have been done before, but encoding the process directly in the network still seems novel.