How to automate Instagram engagements with computer vision (and get banned)

florianherrengt


Obviously, Instagram does not want you to automate engagement. Their HTML is a mess of randomly generated class names and deeply nested divs. The structure changes every deployment. Any script that relies on DOM selectors breaks within weeks because the class names it targets no longer exist.

But it doesn't matter anyway. Instagram can obfuscate their code all they want because code is for machines. But the UI... The UI is for humans. A heart icon has to look like a heart icon. A comment button has to be where users expect it. The layout has to be consistent enough that a person can easily navigate it.

So instead of fighting the DOM, let's just bypass it entirely. Take a screenshot. Find the heart by its visual appearance. Get its coordinates. Move the cursor there. Click. Done.
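As a rough sketch of that loop (not the author's code), the four steps can be wired together with the platform-specific parts passed in as functions. The names `take_screenshot`, `locate` and `click` are hypothetical placeholders for whatever screenshot, detection and cursor tools you use; only the "click the center of the match" logic is shown concretely.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in screen pixels


def box_center(box: Box) -> Tuple[int, int]:
    """Return the pixel at the middle of a bounding box."""
    x, y, w, h = box
    return x + w // 2, y + h // 2


def click_icon(
    take_screenshot: Callable[[], object],       # hypothetical: grabs the screen
    locate: Callable[[object], Optional[Box]],   # hypothetical: finds the icon
    click: Callable[[int, int], None],           # hypothetical: moves and clicks
) -> bool:
    """Screenshot, find the icon, click its center. False if nothing was found."""
    screenshot = take_screenshot()
    box = locate(screenshot)
    if box is None:
        return False
    click(*box_center(box))
    return True
```

Keeping the three callables separate means the same loop works for a browser, a native app or a game, as long as each one can be screenshotted and clicked.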

This works on anything that renders to pixels. Web apps, native apps, games, terminals. If a human can see it and click it, a computer can too. No selectors, no APIs, no platform-specific hooks. Just computer vision and cursor automation.

Unfortunately, you can't just hardcode a position. Things move around all the time. A long caption pushes the action bar down. A location tag adds a line. A carousel of images takes up more vertical space. Every post compresses or expands the layout differently.

Navigate between 2 posts and watch what happens to the hearts' position:

Hearts move between posts. The positions are never the same

Computer vision solves this. Instead of guessing where the hearts should be, you look at the screen and find where they actually are.

The First Problem: Too Much Screen, Too Many False Positives

The naive approach is simple: take the heart icon as a template and find it on the screen. Wherever it matches, that's a heart. It's the most basic computer vision operation you can do.

It doesn't work very well.

A full screenshot is huge. Over 7 million pixels on a typical screen. And a heart is small, roughly 70x60 pixels. That's a lot of surface area to search. In that sea of pixels there are plenty of things that vaguely resemble a heart. You get too many false positives.
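To put a number on it, here is a back-of-the-envelope count of how many places a 70x60 template can sit on a full screenshot. The 3456x2234 screen size is an assumption picked only because it lands above 7 million pixels; the article does not state the actual resolution.

```python
def candidate_positions(screen_w: int, screen_h: int, tmpl_w: int, tmpl_h: int) -> int:
    """Number of top-left positions where the template fits entirely on screen."""
    return (screen_w - tmpl_w + 1) * (screen_h - tmpl_h + 1)


# Assumed screen size (not given in the article).
SCREEN_W, SCREEN_H = 3456, 2234
# Heart size from the article: roughly 70x60 pixels.
HEART_W, HEART_H = 70, 60

total_pixels = SCREEN_W * SCREEN_H
positions = candidate_positions(SCREEN_W, SCREEN_H, HEART_W, HEART_H)
print(f"{total_pixels:,} pixels, {positions:,} candidate positions")
```

Under that assumption there are over seven million candidate positions, and every one of them is a chance for something heart-like to score well.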

The detection technique is fine. The search space is the problem. The screen is full of noise. The more area you search, the more noise you find.

Shrink the Search Space

The fix is to stop searching the whole screen. Instead, find things on screen that are easy to detect and use them to figure out where the hearts must be.

On Instagram, two things are consistently easy to find:

The triple-dots menu (⋯) in the top-right corner of every post. It's small, high-contrast and visually distinctive. It's always present.

The action bar (like/comment/share) at the bottom of every post. It's a wide, predictable pattern of icons.

Both can be found with basic template matching in milliseconds. But we don't care about them for their own sake. We care about what they tell us: the triple-dots menu sits directly above the heart column. The action bar sits directly below it. If we know where those two landmarks are, we know exactly where the hearts are. They're in the vertical strip between them.

    crop.x = triple_dots.x
    crop.y = triple_dots.y + triple_dots.height
    crop.width = triple_dots.width
    crop.height = action_bar.y - crop.y - action_bar.height * 0.2
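The same geometry as a runnable sketch. The `Rect` type and the `heart_search_region` name are made up for illustration, and the `margin` parameter is simply the 0.2 factor from the formula above pulled out into an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in screenshot pixels; (x, y) is the top-left corner."""
    x: int
    y: int
    width: int
    height: int


def heart_search_region(triple_dots: Rect, action_bar: Rect, margin: float = 0.2) -> Rect:
    """Vertical strip under the triple-dots menu, stopping just above the action bar."""
    top = triple_dots.y + triple_dots.height
    # Same formula as above: trim 20% of the action bar's height off the bottom.
    height = int(action_bar.y - top - action_bar.height * margin)
    if height <= 0:
        raise ValueError("action bar must sit below the triple-dots menu")
    return Rect(triple_dots.x, top, triple_dots.width, height)


# Example with made-up landmark positions.
dots = Rect(x=900, y=100, width=40, height=20)
bar = Rect(x=500, y=620, width=440, height=50)
crop = heart_search_region(dots, bar)
```

With these made-up landmarks the strip is 40 pixels wide and 490 pixels tall, a tiny fraction of the full screenshot.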

The only things left in the search region are actual hearts and whatever happens to be in that exact column.

And since the crop region is derived from the actual positions of the landmarks on screen rather than being hardcoded, it adapts to every post automatically. The landmarks might be higher or lower depending on the post content, but the geometric relationship between them and the hearts is always the same.

The Sliding Window

Now that the search space is tiny and clean, you can run the sliding window. Take the heart template and slide it across the search region pixel by pixel. Score every position. The better the match, the more likely it's a heart.
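A minimal pure-Python sketch of the idea, assuming grayscale images stored as lists of rows. The scoring function (one minus the mean absolute pixel difference) is an assumption chosen for readability, not necessarily what the author uses; in practice a library routine such as OpenCV's `matchTemplate` does the same job much faster.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale values, 0-255


def match_score(region: Image, template: Image, left: int, top: int) -> float:
    """Similarity in [0, 1]: 1 minus the mean absolute pixel difference."""
    total = 0
    for ty, row in enumerate(template):
        for tx, value in enumerate(row):
            total += abs(region[top + ty][left + tx] - value)
    pixels = len(template) * len(template[0])
    return 1.0 - total / (255 * pixels)


def sliding_window(
    region: Image, template: Image, threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Score every top-left position and keep those at or above the threshold."""
    th, tw = len(template), len(template[0])
    rh, rw = len(region), len(region[0])
    hits = []
    for top in range(rh - th + 1):
        for left in range(rw - tw + 1):
            score = match_score(region, template, left, top)
            if score >= threshold:
                hits.append((left, top, score))
    return hits
```

Lowering `threshold` makes the search looser: it catches more real hearts and more things that only resemble one.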

The sliding window is deliberately loose to catch every possible heart. But that means it also catches things that aren't hearts.

Hearts on Instagram are all in one vertical column. Every single one. Most detections will be on that line. Anything not on it is an outlier:

· · · · · ♡ · · ← most detections are here
· · · · · ♡ · · ← same column
· ✕ · · · · · · ← outlier (off to the left)
· · · · · ♡ · · ← same column
· · · · · ♡ · · ← same column
· · · · · · ✕ · ← outlier (off to the right)
· · · · · ♡ · · ← same column

The hearts (♡) cluster on one X coordinate. The false positives (✕) are scattered. The sliding window thought they looked heart-shaped, but they're not in the column.

So we find the most common X among all detections, the consensus line, and discard anything more than 10 pixels away. It's just finding the mode of the X values and treating everything else as noise. A few lines of code and nearly all false positives are gone. The sliding window was deliberately...
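The consensus filter described above might look something like this. The function name is hypothetical and detections are assumed to be `(x, y)` pixel pairs; the 10-pixel tolerance comes from the text.

```python
from collections import Counter
from typing import List, Tuple


def filter_by_consensus_column(
    detections: List[Tuple[int, int]], tolerance: int = 10
) -> List[Tuple[int, int]]:
    """Keep detections whose X is within `tolerance` pixels of the most common X."""
    if not detections:
        return []
    # Mode of the X values. On a tie, Counter returns the value seen first.
    consensus_x, _ = Counter(x for x, _ in detections).most_common(1)[0]
    return [(x, y) for x, y in detections if abs(x - consensus_x) <= tolerance]


# Made-up detections: five in the heart column, two outliers.
detections = [(500, 100), (500, 220), (120, 300), (503, 410),
              (500, 530), (640, 600), (500, 700)]
hearts = filter_by_consensus_column(detections)
```

Here the detections at X=120 and X=640 are dropped, while the one at X=503 survives because it is within the tolerance of the consensus line.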
