Artificial intelligence (AI) has experienced a revival of pretty large proportions in the last decade. We’ve gone from AI being mostly useless to letting it ruin our lives in obscure and opaque ways. We’ve even given AI the task of crashing our cars for us.
AI experts will tell us that we just need bigger neural networks and the cars will probably stop crashing. Bigger networks, though, mean more computation, and more computation means more energy.
To give you an idea of the scale of energy we’re talking about here, a good GPU uses 20 picojoules (1 pJ is 10⁻¹² J) for each multiply and accumulate operation. A purpose-built integrated circuit can reduce that to about 1 pJ. But if a team of researchers is correct, an optical neural network might reduce that number to an incredible 50 zeptojoules (1 zJ is 10⁻²¹ J).
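To put those three figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the dictionary keys and the function name are made up for illustration.

```python
# Energy per multiply-and-accumulate (MAC) operation, in joules.
# The three values are the figures quoted in the text; nothing else is assumed.
PICO = 1e-12
ZEPTO = 1e-21

energy_per_mac = {
    "gpu": 20 * PICO,
    "purpose_built_ic": 1 * PICO,
    "optical_claimed": 50 * ZEPTO,
}


def improvement_factor(baseline: str, candidate: str) -> float:
    """Return how many times less energy per MAC the candidate uses."""
    return energy_per_mac[baseline] / energy_per_mac[candidate]


if __name__ == "__main__":
    for name in ("gpu", "purpose_built_ic"):
        factor = improvement_factor(name, "optical_claimed")
        print(f"{name} vs optical claim: {factor:.1e}x less energy per MAC")
```

If the claim holds, that is a factor of roughly 400 million over the GPU and 20 million over the purpose-built chip.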
How did the researchers come to that conclusion?
Layered, like onions
Let’s start with an example of how a neural network works. A set of inputs is spread across a set of neurons. Each input to a neuron is weighted and added, then the output from each neuron is given a boost. Stronger signals are amplified more than weak signals, making differences larger. That combination of multiplication, addition, and boost occurs in a single neuron, and neurons are placed in layers, with the output from one layer becoming the input of the next. As signals propagate through layers, this structure will amplify some and suppress others.
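The weight, add, and boost recipe above can be sketched in a few lines of Python. This is a toy illustration only: the boost function used here (a rectifier that passes positive sums and blocks the rest) and the example weights are assumptions picked for readability, not anything taken from the researchers' work.

```python
def boost(total):
    """Illustrative nonlinear function: pass positive sums, suppress the rest."""
    return max(0.0, total)


def neuron(inputs, weights):
    """Weight each input, add the results, then apply the boost."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return boost(total)


def layer(inputs, weight_rows):
    """One layer: every neuron sees all inputs, each with its own weights."""
    return [neuron(inputs, row) for row in weight_rows]


def network(inputs, layers):
    """The output of one layer becomes the input of the next."""
    signal = inputs
    for weight_rows in layers:
        signal = layer(signal, weight_rows)
    return signal


if __name__ == "__main__":
    # Two inputs -> two neurons -> one neuron (made-up weights).
    layers = [
        [[0.5, -1.0], [1.0, 1.0]],
        [[1.0, 0.5]],
    ]
    print(network([1.0, 0.5], layers))
```

In this example the first neuron's weighted sum comes out at zero and is suppressed, while the second neuron's signal carries through to the output.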
For this system to do useful calculations, we need to preset the weighting of all inputs in all layers, as well as the boost function (more accurately, the nonlinear function) parameters. These weights are usually set by giving the neural network a training dataset to work on. During training, the weighting and function parameters evolve to good values through repeated failure and occasional success.
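As a rough picture of what "repeated failure and occasional success" looks like, here is a minimal sketch that trains a single neuron to behave like a logical AND. It uses the classic perceptron update rule as a stand-in: real networks are trained with more sophisticated methods, and the learning rate, epoch count, and dataset here are all illustrative assumptions.

```python
def predict(weights, inputs):
    """Fire (1) if the weighted sum is positive, otherwise stay quiet (0)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(samples, epochs=20, rate=0.1):
    """Nudge the weights every time the neuron gets an answer wrong."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, inputs)  # 0 when correct
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights


# Training data for AND. The third input is always 1 and acts as an
# adjustable threshold (an assumption of this sketch, not of the article).
AND_SAMPLES = [
    ([0, 0, 1], 0),
    ([0, 1, 1], 0),
    ([1, 0, 1], 0),
    ([1, 1, 1], 1),
]

if __name__ == "__main__":
    trained = train(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs[:2], "->", predict(trained, inputs), "(want", target, ")")
```

Each wrong answer shifts the weights a little; once every sample comes out right, the error is zero and the weights stop moving.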
There are two basic consequences here. First, a neural network requires a lot of neurons to have the flexibility to cope with complicated problems. Second, a neural network needs to be able to adjust its parameters as it accumulates new data. This is where our theoretical optical neural network flexes its nascent muscles.
All optical AI
In the optical neural network, the inputs are pulses of light that are divided up. The weights are set by changing the brightness. If these are set in physical hardware, they often can’t be changed, which is undesirable. In the researchers’ scheme, however, the weightings come from a second set of optical pulses, making it substantially more flexible.
At a single neuron, all the optical pulses arrive together and are added through the process of interference. The interfering pulses hit a photodetector to perform the multiplication. Then the electrical output of the photodetector can have whatever boost function we like applied to it electronically. The final value this produces is then emitted as light to be sent on to the next neural network layer.
The cool thing about this is that the weight is an optical pulse that can be continuously adjusted, which should result in an optical neural network with the flexibility of a computer-based neural network, but operating much faster.
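How can interference plus a photodetector amount to multiplication? The text doesn't spell out the detection scheme, so the sketch below assumes one common arrangement: the input pulse and the weight pulse meet on a 50/50 beamsplitter, two detectors measure the intensities at its outputs, and the difference between the two readings is proportional to the product of the two amplitudes. All function names are hypothetical, amplitudes are treated as plain real numbers, and the rectifier used as the boost is an arbitrary choice.

```python
import math


def beamsplitter(a, b):
    """Ideal 50/50 beamsplitter: two field amplitudes interfere into two outputs."""
    root2 = math.sqrt(2)
    return (a + b) / root2, (a - b) / root2


def photoelectric_product(x, w):
    """Detect both outputs (intensity = amplitude squared) and subtract.

    ((x + w)^2 - (x - w)^2) / 2 = 2 * x * w, so halving the difference
    of the two intensities recovers x * w.
    """
    plus, minus = beamsplitter(x, w)
    return (plus ** 2 - minus ** 2) / 2


def optical_neuron(inputs, weights, boost=lambda v: max(0.0, v)):
    """Accumulate the detected products, then apply the boost electronically."""
    charge = sum(photoelectric_product(x, w) for x, w in zip(inputs, weights))
    return boost(charge)


if __name__ == "__main__":
    print(photoelectric_product(3.0, -2.0))
    print(optical_neuron([1.0, 0.5], [1.0, 1.0]))
```

Because the weight is just the amplitude of a second pulse, changing a weight means sending a different pulse rather than rebuilding hardware.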
Remarkably, the researchers propose to do all of this in free space rather than on optical integrated circuits. Their argument is that combinations of diffractive optical elements (elements for manipulating optical beams in complex ways) and photodiode arrays are much more precise and scalable than the photonic circuits we can print on chips.
They may be right. Fabrication reliability is the bane of photonic circuits at the moment. I cannot imagine creating a large-scale photonic circuit successfully using today’s techniques. So I’m willing to buy that argument, and I will even agree that scaling to millions of neurons over multiple layers is feasible. It all looks good.
You have to count all the energy
But I don’t buy the energy argument at all. The researchers have calculated how much energy the optical pulses require to ensure that the output of the detection stage from a single neuron is accurate. That leads to an impressive-sounding 50 zJ per operation.
That may be right, but it ignores a lot of rather important stuff. How much energy for the boost function? How much energy to turn the electrons back into light? The researchers have attached some numbers to some of that, but their calculations essentially tell us that the per-op energy isn’t easy to calculate because the required electronics don’t exist.
Instead, the big win is in energy saved for data transport. A large neural network might be spread across multiple GPUs. This creates two large energy costs: shoveling data around on the GPU itself and shoveling data between GPUs. The optical architecture virtually eliminates that cost while also being able to scale to a larger number of gates.
Even size-wise, the optics are not going to be much worse than a box full of GPUs. I think you could fit a good-sized optical neural network on an average-sized desk. It might be important to keep the lights out while it’s running, however.
So where is all this going? I reckon researchers will demonstrate this neural network in the next year or two. It will consume more than a lot of energy per operation at the photodiode alone. Once you take into account the supporting structure, the total energy cost will be well beyond 1 pJ per operation. In the end, I think they will demonstrate a large degree of scalability but no significant energy saving.