Multimodal Neurons in Artificial Neural Networks
Distill
Authors and Affiliations
Gabriel Goh, OpenAI
Nick Cammarata †, OpenAI
Chelsea Voss †, OpenAI
Shan Carter, Observable
Michael Petrov, OpenAI
Ludwig Schubert
Alec Radford, OpenAI
Chris Olah
Published: March 4, 2021
DOI: 10.23915/distill.00030
Acknowledgments
We are deeply grateful to Sandhini Agarwal, Daniela Amodei, Dario Amodei, Tom Brown, Jeff Clune, Steve Dowling, Gretchen Krueger, Brice Menard, Reiichiro Nakano, Aditya Ramesh, Pranav Shyam, Ilya Sutskever and Martin Wattenberg.
Author Contributions
Gabriel Goh: Research lead. Gabriel Goh first discovered multimodal neurons, sketched out the project direction and paper outline, and did much of the conceptual and engineering work that allowed the team to investigate the models in a scalable way. This included developing tools for understanding how concepts were built up and decomposed (which were applied to emotion neurons), developing zero-shot neuron search (which made neurons easy to discover), and working with Michael Petrov on porting CLIP to Microscope. He subsequently developed faceted feature visualization and text feature visualization.
Chris Olah: Worked with Gabe on the overall framing of the article, actively mentored each member of the team through their work, providing both high- and low-level contributions to their sections, and contributed to the text of much of the article, setting the stylistic tone. He worked with Gabe on better understanding the relevant neuroscience literature. Additionally, he wrote the sections on region neurons and developed diversity feature visualization, which Gabe used to create faceted feature visualization.
Alec Radford: Developed CLIP. First observed that CLIP was learning to read. Advised Gabriel Goh on project direction on a weekly basis. Upon the discovery that CLIP was using text to classify images, proposed typographical adversarial attacks as a promising research direction.
Shan Carter: Worked on the initial investigation of CLIP with Gabriel Goh. Created multimodal activation atlases to understand the space of multimodal representations and geometry, and neuron atlases, which potentially helped with the arrangement and display of neurons. Provided much useful advice on the visual presentation of ideas, and helped with many aspects of visual design.
Michael Petrov: Worked on the initial investigation of multimodal neurons by implementing and scaling dataset examples. Discovered, with Gabriel Goh, the original “Spider-Man” multimodal neuron in the dataset examples, and many more multimodal neurons. Assisted substantially with the engineering of Microscope, both early on and at the end, including helping Gabriel Goh with the difficult technical challenges of porting Microscope to a different backend.
Chelsea Voss†: Investigated the typographical attack phenomenon, both via linear probes and zero-shot, confirming that the attacks were indeed real and state of the art. Proposed and successfully found “in-the-wild” attacks in the zero-shot classifier. Subsequently wrote the section “typographical attacks”. Upon completion of this part of the project, investigated the responses of neurons to rendered text of dictionary words. Also assisted with the organization of neurons into neuron cards.
Nick Cammarata†: Drew the connection between multimodal neurons in neural networks and multimodal neurons in the brain, which became the overall framing of the article. Created the conditional probability plots (regional, Trump, mental health), labeling more than 1500 images. Discovered that negative pre-ReLU activations are often interpretable, and that neurons sometimes contain a distinct regime change between medium and strong activations. Wrote the identity section and the emotion sections, building off Gabriel’s discovery of emotion neurons and discovering that “complex” emotions can be broken down into simpler ones. Edited the overall text of the article and built infrastructure allowing the team to collaborate in Markdown with embeddable components.
Ludwig Schubert: Helped with general infrastructure.
† equal contributors
Discussion and Review
Review 1 - Anonymous
Review 2 - Anonymous
Review 3 - Anonymous