6M Fake GitHub Stars: How to Vet Open-Source AI Tools

6 Million Fake GitHub Stars: How to Vet Open-Source AI Tools Before You Bet on Them

Your team finds a promising AI agent framework on GitHub. It has 12,000 stars, an active-looking README, and a Discord link. The CTO greenlights a proof-of-concept. Three months later the project is abandoned, the maintainer vanishes, and someone on Hacker News points out that 70% of those stars came from bot accounts created in the same week. You are now maintaining a fork of a dead project as a core dependency.

This scenario is not hypothetical. A peer-reviewed study from Carnegie Mellon University, presented at ICSE 2026, found approximately 6 million fake stars distributed across 18,617 repositories by roughly 301,000 accounts. AI and LLM repositories were the largest non-malicious category of recipients. If your organization is evaluating open-source AI tools, the star count on the repo page is one of the least reliable signals you can use.

A 2026 CMU study found 6 million fake stars across 18,600+ GitHub repos, and AI/LLM projects are the most-manipulated non-malicious category. GitHub stars are not a reliability signal. Use fork-to-star ratios, contributor depth, commit cadence, and issue response time to evaluate open-source AI tools before building on them.

Why Are GitHub Stars Unreliable as a Quality Signal?

A GitHub star is a one-click, zero-commitment gesture. It does not mean the person who starred a repository has read the code, used the tool, or even cloned the repo. It is closer to a social media "like" than a product endorsement. Yet stars have become the default shorthand for open-source credibility, appearing in pitch decks, vendor comparison spreadsheets, and internal tool evaluations. The gap between what stars measure (casual interest) and what teams use them to infer (adoption, quality, community health) is where the manipulation lives.
The CMU study used a tool called StarScout to analyze 20 terabytes of GitHub metadata (6.7 billion events and 326 million stars from 2019 to 2024). By mid-2024, 16.66% of all repositories with 50 or more stars were involved in fake star campaigns. That number was near zero before 2022. The researchers confirmed their detection accuracy: 90.42% of flagged repositories and 57.07% of flagged accounts had been deleted by January 2025, meaning GitHub itself recognized these as illegitimate.

The incentive structure makes the problem worse. Venture capital firms explicitly use star counts as sourcing signals. Jordan Segall at Redpoint Ventures published an analysis of 80 developer tool companies showing that the median GitHub star count at seed financing was 2,850 and at Series A was 4,980. He confirmed that "many VCs write internal scraping programs to identify fast growing GitHub projects for sourcing." When stars convert directly into investor attention, the financial incentive to inflate them is obvious.

How Does the Star-Buying Market Work?

Stars sell for $0.03 to $0.85 each on at least a dozen websites, Fiverr gigs, and Telegram channels. No dark web access required. Budget services ($0.03 to $0.10 per star) use disposable new accounts that deliver in days. Premium services ($0.80 to $0.90 per star) use aged accounts with years of activity history, delivering gradually to mimic organic growth. Some vendors offer 30-day replacement guarantees and formal APIs for programmatic purchasing.

The fingerprints are consistent. Independent analysis of manipulated repos found that 36% to 76% of stargazers have zero followers and zero public repositories. These are not new developers casually exploring GitHub. They are empty shells, many with account ages over 1,000 days (purchased or farmed specifically for star campaigns), designed to pass simple "young account" filters. The accounts star but do not fork, do not file issues, and do not watch for updates.
They exist to increment a counter.

The economics are striking. At the seed-round median of 2,850 stars, manufacturing that number costs $85 to $285 using budget services. A typical seed round unlocks $1 million to $10 million in funding. The return on investment for purchased credibility ranges from 3,500x to 117,000x. For an AI startup facing pressure to demonstrate traction, the math is unfortunately compelling.

How Do You Spot a Repo With Inflated Stars?

No single metric proves manipulation, but a combination of weak signals creates a clear picture. Here are the heuristics that matter most, drawn from both the CMU research and independent analyses of known-organic versus known-manipulated repositories.

Fork-to-star ratio. This is the strongest simple heuristic. A fork means someone copied the repository into their own account, usually to use or modify the code. A star costs nothing. Healthy, actively used projects show fork-to-star ratios between 10% and 25%. Flask (71,000 stars) has a ratio of 23.5%. LangChain (133,000 stars) is at 15.5%. Projects with confirmed manipulation campaigns routinely fall below 5%. One repo with...
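
The fork-to-star heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not a detection tool: the `RepoCounts` container, the function names, and the three labels are assumptions introduced here, and the thresholds are simply the 5% and 10% to 25% figures quoted in the text. Counts are typed in by hand rather than fetched from GitHub.

```python
from dataclasses import dataclass

# Thresholds lifted from the prose above; treat them as rough guides,
# not as validated cutoffs.
SUSPICIOUS_BELOW = 0.05  # manipulated repos "routinely fall below 5%"
HEALTHY_MIN = 0.10       # healthy projects sit between 10% and 25%
HEALTHY_MAX = 0.25


@dataclass(frozen=True)
class RepoCounts:
    """Hypothetical container for counts copied from a repo page."""
    name: str
    stars: int
    forks: int


def fork_to_star_ratio(repo: RepoCounts) -> float:
    """Return forks divided by stars as a fraction (0.15 means 15%)."""
    if repo.stars <= 0:
        raise ValueError("stars must be positive to compute a ratio")
    return repo.forks / repo.stars


def classify(repo: RepoCounts) -> str:
    """Map the ratio to one of three illustrative labels."""
    ratio = fork_to_star_ratio(repo)
    if ratio < SUSPICIOUS_BELOW:
        return "low"           # worth a closer look, not proof of fraud
    if HEALTHY_MIN <= ratio <= HEALTHY_MAX:
        return "typical"       # inside the range quoted for healthy projects
    return "inconclusive"      # between 5% and 10%, or above 25%


if __name__ == "__main__":
    # The first fork count is back-computed from the ratio quoted in the
    # article (71,000 stars at 23.5%), not read from live data. The second
    # repo is entirely made up.
    for repo in (
        RepoCounts("flask-like", stars=71_000, forks=16_685),
        RepoCounts("made-up-agent-framework", stars=12_000, forks=300),
    ):
        print(f"{repo.name}: {fork_to_star_ratio(repo):.1%} -> {classify(repo)}")
```

A "low" label is only one weak signal. As the text notes, no single metric proves manipulation, so a result like this is a prompt to look at the other heuristics, not a verdict.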
