Show HN: AdvertBench, ranking the ability of LLMs to create image ads

joegibbs | 1 pts | 0 comments

This is an experiment that I've made. The models get access to an E2B sandbox and are instructed to create an ad according to the specifications. They can choose whatever tools they want to use for it (e.g. Pillow, Chromium). The task is meant as a proxy for their ability to use tools, create other kinds of images, do complex layouts, etc. Currently Opus 4.8 is on top (not surprising, but it did take 66 conversation turns to create the image) and GLM-5.2 is in fifth (which I do find surprising because it doesn't have image capability).
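To give a rough idea of what the task can look like on the Chromium route, here is a minimal sketch of the kind of script a model might write in the sandbox: it turns an ad spec into a self-contained HTML page sized to the ad canvas, which a headless browser could then screenshot. Everything named here (`AdSpec`, its fields, `render_ad_html`, the sizes and colours) is made up for illustration and is not taken from the benchmark's actual specifications or harness.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class AdSpec:
    """Hypothetical ad specification; field names are illustrative only."""
    width: int
    height: int
    headline: str
    tagline: str
    cta: str
    brand_color: str = "#1a73e8"


def render_ad_html(spec: AdSpec) -> str:
    """Return a self-contained HTML page whose .ad box matches the ad canvas."""
    # User-supplied text is HTML-escaped so characters like "&" render literally.
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  html, body {{ margin: 0; }}
  .ad {{
    width: {spec.width}px; height: {spec.height}px;
    box-sizing: border-box; padding: 48px;
    display: flex; flex-direction: column; justify-content: center;
    background: {spec.brand_color}; color: #fff;
    font-family: sans-serif;
  }}
  h1 {{ font-size: 64px; margin: 0 0 16px; }}
  p {{ font-size: 28px; margin: 0 0 32px; }}
  .cta {{
    align-self: flex-start; padding: 16px 32px; border-radius: 8px;
    background: #fff; color: {spec.brand_color};
    font-size: 24px; font-weight: bold;
  }}
</style>
</head>
<body>
  <div class="ad">
    <h1>{escape(spec.headline)}</h1>
    <p>{escape(spec.tagline)}</p>
    <span class="cta">{escape(spec.cta)}</span>
  </div>
</body>
</html>
"""


if __name__ == "__main__":
    # Example spec; the copy and dimensions are placeholders.
    spec = AdSpec(1200, 628, "Fresh coffee, delivered",
                  "Roasted this week & shipped free", "Order now")
    print(render_ad_html(spec))
```

A model going the Pillow route would draw the same elements directly onto a bitmap instead; the HTML version just makes the layout part (sizing, padding, alignment) easy to see.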
