I found 10k GitHub repositories distributing Trojan malware

theorchid | 2 pts | 0 comments

How I found 10,000 GitHub repositories distributing Trojan malware

18 June 2026

This is the story of how I found 10,000 repositories on GitHub that distribute Trojan malware. They all come from different contributors, have different names, and are not forks of other repositories. But they share a common pattern, which allowed me to write a script that finds them.

Introduction

I have a project on GitHub, and I wanted to check whether search engines had indexed it. I typed the project name into Google, and my repository appeared in the results. I entered the same query into Bing, and someone else’s repository appeared instead, with exactly the same name and description. It was a copy of my repository with all the commits, and I was listed as a contributor. But an hour earlier, one more commit had been pushed: a change to the readme that added a link to a zip archive.

Separately, I was choosing tags for another one of my GitHub projects and clicked through those tags to look at similar projects. In the list, I found a repository whose name and description exactly matched those of another repository on the same list. It turned out to contain copies of all the commits from that repository as well, and two hours earlier a link to a zip archive had been added to its readme.

After monitoring these two repositories, I discovered that every few hours they delete the previous commit and push the exact same commit again. This commit contains only one change: it adds a link to the archive to the readme file.

I submitted a request to GitHub support asking them to delete these repositories. Two weeks passed and nothing changed; GitHub support didn’t respond. I discussed with an AI what else could be done about this, but it didn’t offer any useful advice.
I opened a thread on GitHub, and three people replied with the same AI slop that was of no use at all.

Another month later, GitHub support sent me an email saying that they had removed these repositories.

You can open other similar repositories, look at the latest commit, and see that a link to a zip archive was added to the readme a few hours ago:

https://github.com/Dicrida123/java-sdk
https://github.com/A2A-MC/ccresume
https://github.com/1-RAY-1/project-startup-cursor
https://github.com/123abukhaled0/FinCoach

The zip archive contains 4 files:
- Application.cmd or Launcher.cmd
- loader.exe or luajit.exe or another_name.exe
- random_name.cso or random_name.txt
- lua51.dll

If you submit a link to the archive to VirusTotal, it reports 0 detections.
If you submit the zip file itself, it detects a Trojan inside it.

Continued

It seemed like I had already forgotten about this event, but my subconscious hadn’t. And my subconscious often throws interesting ideas at me when I’m sleeping or waking up. Recently, I woke up and in that very second realized what I needed to do: come up with a general pattern, then write a script that analyzes all GitHub repositories and finds the ones that match it.

Search pattern:
- Every few hours the previous commit is deleted and a new one is pushed
- Only the readme file is updated in the commit
- The readme file contains a link to a zip archive
- The commits are copied from another repository
- This is a new repository, not a fork
- All repositories have different contributors and different names

From the last two points, it is clear that even if we find one such repository, we can’t use it to find the others, so we have to search across all of GitHub. But there are 500 million repositories on GitHub. How can we analyze all of them? GitHub allows 5,000 requests per hour with a single token.
For each repository, we need to make several requests to get the list of commits, the modified files, and the content of the readme file. Even at one request per repository, 500 million repositories at 5,000 requests per hour would take 100,000 hours, which is more than 11 years. I didn’t want to wait that long for the script to analyze all the repositories.

But we don’t need all the repositories; we only need the ones that are updated every few hours. I found a service called gharchive, which lets you download all public GitHub events for any given day. So we need to download the event archives for the last few days, keep only the commit push events, and identify the repositories that are updated between 2 and 10 times every 10 hours.

Over the past 5 days, there were 16 million commit pushes. Only 3,000 repositories among them are updated every few hours.

However, the events do not include information about which specific files were modified. This means that for each relevant repository, we need to make additional requests to the GitHub API.

The script returned a large number of repositories, so I added several parameters to the filters:
- The commit must be from a user, not a bot
- More than a month passed between the latest commit and the one before it
- The repository has more than one contributor

After that, only 14 repositories were found...
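
The gharchive filtering step described above (keep only push events, then look for repositories pushed to between 2 and 10 times every 10 hours) could be sketched roughly like this. This is not the original script: the function names `read_events`, `parse_time`, and `frequent_pushers` are made up for illustration, and reading "every 10 hours" as "every fixed 10-hour window of the downloaded period" is an assumption. The only things taken from the event format itself are the `type`, `repo.name`, and `created_at` fields.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timedelta


def read_events(path):
    """Yield events from one downloaded archive file (gzipped JSON lines)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def parse_time(value):
    # Event timestamps look like "2026-06-01T12:34:56Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def frequent_pushers(events, window_hours=10, min_pushes=2, max_pushes=10):
    """Return repositories whose push count stays within
    [min_pushes, max_pushes] in every full fixed window (assumed reading
    of "between 2 and 10 times every 10 hours")."""
    pushes = defaultdict(list)
    for event in events:
        if event.get("type") == "PushEvent":  # ignore stars, issues, etc.
            pushes[event["repo"]["name"]].append(parse_time(event["created_at"]))
    if not pushes:
        return []

    start = min(min(times) for times in pushes.values())
    end = max(max(times) for times in pushes.values())
    window = timedelta(hours=window_hours)
    # Count only full windows; a trailing partial window is ignored.
    n_windows = max(1, int((end - start) / window))

    matches = []
    for repo, times in pushes.items():
        counts = [0] * n_windows
        for ts in times:
            index = int((ts - start) / window)
            if index < n_windows:
                counts[index] += 1
        if all(min_pushes <= c <= max_pushes for c in counts):
            matches.append(repo)
    return sorted(matches)
```

In practice the events from several hourly files would be chained together, for example `frequent_pushers(e for p in paths for e in read_events(p))`, where `paths` is a list of already downloaded archive files. The result is only a candidate list: the remaining checks from the search pattern (readme-only commits, a zip link, copied history) still need GitHub API requests per repository.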
