Can AI coding systems generate $1 million as freelancers?
OpenAI researchers tested advanced AI systems on software development tasks for which humans had earned $1 million. Here's how they performed.
Freelance software engineering is a lucrative and evolving field where skilled developers handle various tasks, from bug fixes to full-stack feature development. In recent years, many have integrated AI tools into their workflow to assist with coding.

Figure 1. AI Coders Make Million as Freelancers.
This raises an intriguing question: could an AI system perform the same job independently? Have software engineers, in effect, automated themselves out of their own roles? Figure 1 illustrates AI coders working as freelancers.
Now, a study by Samuel Miserendino, Michele Wang, and colleagues at OpenAI Research provides insight. They have developed a benchmarking tool to assess whether state-of-the-art large language models (LLMs) can complete real-world software development tasks previously solved by humans. These developers collectively earned $1 million for their work, prompting the key question—could AI systems earn that amount on their own?
The findings offer human developers only partial reassurance. “Real-world freelance work in our benchmark remains challenging for frontier language models,” note Miserendino, Wang, and their colleagues. Even so, they estimate that the top-performing models could still earn a substantial portion of the $1 million.
Code Red
Software engineering extends far beyond just writing code. Engineers must interpret client requirements, navigate complex codebases, and make high-level architectural decisions [1]. Real-world freelance jobs demand full-stack development, debugging, and even managerial skills.
Evaluating LLMs on these tasks is difficult because most benchmarks focus on standard coding problems, which make up only a small part of a freelancer’s workload.
Miserendino, Wang, and their team aimed to bridge this gap by developing a database of real-world software engineering tasks that had previously been solved by human freelancers. They call this benchmark SWE-Lancer, hoping it will become the industry standard for assessing advanced LLMs' real-world coding capabilities.
The team sourced these freelance tasks from Expensify, a public company providing an expense management system used by 12 million customers. This software requires ongoing maintenance and development, which Expensify outsources to freelance developers. The company publicly shares these coding tasks on Upwork, a popular freelancing platform.
The OpenAI team selected 1,488 freelance tasks, roughly split between two categories:
- Individual programming tasks, where developers created coding patches to fix real-world issues.
- Management tasks, where freelancers selected the best solution from multiple competing proposals.
Putting AI to the Test
To evaluate cutting-edge AI performance, the team assigned these tasks to Anthropic’s Claude 3.5 Sonnet, OpenAI’s GPT-4o, and OpenAI’s o1 models.
- For coding tasks, the AI received the original task description from Upwork, a snapshot of the code before the fix, and the goal of the fix.
- For management tasks, the AI was provided with multiple proposed solutions, the original code, and the goal of selecting the best approach.
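The two kinds of input described above can be pictured as simple records. The Python sketch below is purely illustrative: the class names, field names, and example values are hypothetical and are not taken from the SWE-Lancer benchmark itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class CodingTask:
    """Hypothetical record for an individual programming task."""
    description: str    # original task description from Upwork
    code_snapshot: str  # label for the code as it was before the fix
    goal: str           # what the fix is supposed to achieve


@dataclass
class ManagementTask:
    """Hypothetical record for a management task."""
    proposals: List[str]  # competing proposed solutions
    original_code: str    # label for the original code
    goal: str             # here: select the best approach


def model_inputs(task: Union[CodingTask, ManagementTask]) -> Dict[str, object]:
    """Return the pieces of information the model is shown for a task."""
    if isinstance(task, CodingTask):
        return {
            "description": task.description,
            "code_snapshot": task.code_snapshot,
            "goal": task.goal,
        }
    return {
        "proposals": task.proposals,
        "original_code": task.original_code,
        "goal": task.goal,
    }


# Made-up example values, for illustration only.
coding = CodingTask("Fix crash on report export", "snapshot-before-fix", "Export works")
management = ManagementTask(["Proposal A", "Proposal B"], "snapshot-before-fix",
                            "Select the best approach")

print(sorted(model_inputs(coding)))
print(sorted(model_inputs(management)))
```

The point of the sketch is only that the two task types differ in what the model sees: a coding task carries a description to act on, while a management task carries a set of proposals to choose from.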
The Results
The findings were revealing. Claude 3.5 Sonnet performed best, followed by o1, and then GPT-4o. But none of them were perfect.
“All models earn well below the full $1 million USD of possible payout on the full SWE-Lancer dataset,” the researchers note.
However, some results were impressive:
- Claude 3.5 Sonnet earned over $400,000 out of the possible $1 million, a substantial sum for an AI system working on freelance tasks.
- The models performed better on management tasks than on actual coding, where they often provided superficial fixes rather than addressing deeper issues.
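The dollar figures above can be read as a simple tally. The Python sketch below assumes an all-or-nothing rule, in which a model collects a task's full payout only if its submission is accepted. The rule, the names `TaskResult`, `total_earned`, and `earn_rate`, and the sample numbers are assumptions made for illustration and are not figures from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TaskResult:
    payout_usd: float  # what the human freelancer was paid for the task
    solved: bool       # whether the model's submission was accepted


def total_earned(results: Iterable[TaskResult]) -> float:
    """Sum the payouts of the tasks the model solved (all-or-nothing)."""
    return sum(r.payout_usd for r in results if r.solved)


def earn_rate(results: Iterable[TaskResult]) -> float:
    """Fraction of the available money that the model earned."""
    results = list(results)
    available = sum(r.payout_usd for r in results)
    return total_earned(results) / available if available else 0.0


# Made-up numbers, for illustration only.
sample = [
    TaskResult(payout_usd=250.0, solved=True),
    TaskResult(payout_usd=1000.0, solved=False),
    TaskResult(payout_usd=750.0, solved=True),
]

print(total_earned(sample))  # 1000.0
print(earn_rate(sample))     # 0.5
```

Under this assumption, earning over $400,000 of a possible $1 million corresponds to an earn rate above 0.4. Because payouts differ from task to task, the share of money earned and the share of tasks solved need not be the same number.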
The Verdict
AI is proving valuable for evaluating solutions but still struggles with complex software development tasks. Overall, the models completed fewer than half of the available tasks, leading to a sobering conclusion:
“The real-world freelance work in our benchmark remains challenging for frontier language models.”
While AI is making strides, human expertise remains critical in software engineering—at least for now.
Money-Making Potential
The team attributes LLMs' inability to surpass human freelancers to several key limitations. Unlike human engineers, AI models don’t truly understand code—they generate patterns based on training data rather than engaging in deep problem-solving [2]. Additionally, human developers refine their solutions iteratively, running tests and debugging issues—something that LLMs struggle to replicate.
However, while AI isn’t ready to replace human engineers, the SWE-Lancer benchmark highlights a major opportunity: AI assistants can automate routine coding tasks, allowing developers to focus on more complex problem-solving.
Speed vs. Accuracy
One aspect the researchers do not explore in depth is the time required to complete tasks. AI systems may not yet match humans in accuracy, but if they can complete tasks significantly faster, that speed could still change how businesses plan and staff software work.
Some tasks are already being automated by forward-thinking freelancers and companies, and as AI models continue to improve, the share of automated work is likely to grow.
Given the rapid advancements in AI’s ability to tackle complex mathematical and engineering problems, progress in software development is likely to accelerate dramatically.
The Era of Disruptive Change
The takeaway? AI-driven transformation is no longer a distant future—it’s happening now. While human expertise remains crucial, businesses and developers who effectively integrate AI into their workflows stand to gain the most in the evolving software industry.
References
[1] https://www.discovermagazine.com/technology/can-ai-coding-systems-earn-usd1-million-as-freelancers
[2] https://www.reddit.com/r/singularity/comments/1isgv4q/new_openai_paper_can_llms_make_1_million/?rdt=48393
Cite this article:
Keerthana S (2025), Can AI coding systems generate $1 million as freelancers? AnaTechmaz, pp. 72