AnaTechMaz Technology Magazine

Can AI coding systems generate $1 million as freelancers?

Keerthana S | February 22, 2025 | 02:35 PM | Technology

OpenAI researchers tested advanced AI systems on software development tasks for which humans had earned $1 million. Here's how they performed.

Freelance software engineering is a lucrative and evolving field where skilled developers handle various tasks, from bug fixes to full-stack feature development. In recent years, many have integrated AI tools into their workflow to assist with coding.

Figure 1. AI Coders Make a Million as Freelancers.

This raises an intriguing question: could an AI system do the same job on its own? Have software engineers, in effect, automated themselves out of their own roles? Figure 1 depicts AI coders working as freelancers.

Now, a study by Samuel Miserendino, Michele Wang, and colleagues at OpenAI offers some answers. They have developed a benchmark to assess whether state-of-the-art large language models (LLMs) can complete real-world software development tasks previously solved by humans. The human freelancers who solved these tasks collectively earned $1 million for the work, which raises the key question: could AI systems earn that amount on their own?

The findings may offer some reassurance to human developers. “Real-world freelance work in our benchmark remains challenging for frontier language models,” note Miserendino, Wang, and their colleagues. Even so, they estimate that the top-performing models could still earn a substantial portion of the $1 million.

Code Red

Software engineering extends far beyond just writing code. Engineers must interpret client requirements, navigate complex codebases, and make high-level architectural decisions [1]. Real-world freelance jobs demand full-stack development, debugging, and even managerial skills.

Evaluating LLMs on these tasks is difficult because most benchmarks focus on self-contained coding problems, which make up only a small part of a freelancer’s workload.

Miserendino, Wang, and their team aimed to bridge this gap by developing a database of real-world software engineering tasks that had previously been solved by human freelancers. They call this benchmark SWE-Lancer, hoping it will become the industry standard for assessing advanced LLMs' real-world coding capabilities.

The team sourced these freelance tasks from Expensify, a public company providing an expense management system used by 12 million customers. This software requires ongoing maintenance and development, which Expensify outsources to freelance developers. The company publicly shares these coding tasks on Upwork, a popular freelancing platform.

The OpenAI team selected 1,488 freelance tasks, split roughly evenly between two categories:

Individual programming tasks, where developers wrote code patches to fix real-world issues.
Management tasks, where freelancers selected the best solution from multiple competing proposals.

Each task had previously been completed by human freelancers, with payments ranging from $250 to $32,000, totaling $1 million across all tasks.
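Because payouts range from $250 to $32,000, a model’s dollar earnings and its share of tasks solved can tell very different stories. The sketch below illustrates payout-weighted scoring, assuming each task is all-or-nothing: the model earns the full payout if its solution passes grading and nothing otherwise. The `Task` record, its fields, and the payout figures are hypothetical and are not taken from the SWE-Lancer data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One freelance task (hypothetical record layout, not the benchmark's format)."""
    task_id: str
    payout_usd: int  # what the human freelancer was paid for the task
    passed: bool     # whether the model's submission passed grading


def score(tasks: list[Task]) -> tuple[float, int, float]:
    """Return (share of tasks solved, dollars earned, share of total payout earned)."""
    total_payout = sum(t.payout_usd for t in tasks)
    # All-or-nothing: a task contributes its full payout only if it passed.
    earned = sum(t.payout_usd for t in tasks if t.passed)
    pass_rate = sum(t.passed for t in tasks) / len(tasks)
    return pass_rate, earned, earned / total_payout


# Hypothetical toy data: three cheap tasks solved, one expensive task missed.
tasks = [
    Task("bugfix-1", 250, True),
    Task("bugfix-2", 250, True),
    Task("bugfix-3", 500, True),
    Task("feature-1", 32_000, False),
]

pass_rate, earned, share = score(tasks)
print(f"tasks solved: {pass_rate:.0%}, earned: ${earned:,} ({share:.1%} of payout)")
```

With these toy numbers the script prints `tasks solved: 75%, earned: $1,000 (3.0% of payout)`: the model solves three of four tasks yet earns only about 3% of the money, because the one task it misses is worth far more than the rest combined.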

Putting AI to the Test

To evaluate cutting-edge AI performance, the team assigned these tasks to Anthropic’s Claude 3.5 Sonnet, OpenAI’s GPT-4o, and OpenAI’s o1 models.

For coding tasks, the AI received the original task description from Upwork, a snapshot of the code before the fix, and the objective of fixing the issue.
For management tasks, the AI was provided with multiple proposed solutions, the original code, and the goal of selecting the best approach.
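The two task types call for different kinds of grading. According to the SWE-Lancer paper, coding tasks are graded with end-to-end tests verified by experienced engineers, and management decisions are judged against the proposal the original engineering manager chose. The sketch below is a simplified illustration of that all-or-nothing idea, not the benchmark’s actual harness. The classes and function names are hypothetical, and the `tests` lambda is a toy stand-in for a real end-to-end test suite.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CodingTask:
    payout_usd: int
    # Hypothetical stand-in for an end-to-end test suite: it receives the
    # patched code (here just a string) and reports pass or fail.
    tests: Callable[[str], bool]


@dataclass
class ManagerTask:
    payout_usd: int
    proposals: list[str]
    chosen_index: int  # index of the proposal the original manager selected


def grade_coding(task: CodingTask, patched_code: str) -> int:
    """Award the full payout only if the test suite passes; otherwise nothing."""
    return task.payout_usd if task.tests(patched_code) else 0


def grade_manager(task: ManagerTask, model_choice: int) -> int:
    """Award the payout only if the model picks the manager's original choice."""
    return task.payout_usd if model_choice == task.chosen_index else 0


# Usage with toy data (hypothetical values).
fix = CodingTask(500, tests=lambda code: "validate_amount(" in code)
review = ManagerTask(1_000, proposals=["A", "B", "C"], chosen_index=1)

print(grade_coding(fix, "def submit(x):\n    validate_amount(x)"))
print(grade_manager(review, model_choice=2))
```

The first call prints `500` because the toy test finds the expected call in the patch; the second prints `0` because the model chose proposal 2 while the original manager chose proposal 1.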

The Results

The findings were revealing. Claude 3.5 Sonnet performed best, followed by o1, and then GPT-4o. But none of them were perfect.

“All models earn well below the full $1 million USD of possible payout on the full SWE-Lancer dataset,” the researchers note.

However, some results were impressive:

Claude 3.5 Sonnet earned over $400,000 out of the possible $1 million, a substantial sum for an AI working without human help.
The models performed better on management tasks than on actual coding, where they often provided superficial fixes rather than addressing deeper issues.

The Verdict

AI is proving valuable for evaluating solutions but still struggles with complex software development tasks. Overall, the models completed fewer than half of the available tasks, leading to a sobering conclusion:

“The real-world freelance work in our benchmark remains challenging for frontier language models.”

While AI is making strides, human expertise remains critical in software engineering—at least for now.

Money-Making Potential

The team attributes the models’ failure to match human freelancers to several key limitations. Unlike human engineers, AI models don’t truly understand code; they generate patterns based on their training data rather than engaging in deep problem-solving [2]. Additionally, human developers refine their solutions iteratively, running tests and debugging issues, which LLMs struggle to replicate.

Still, while AI isn’t ready to replace human engineers, the SWE-Lancer benchmark highlights a major opportunity: AI assistants can automate routine coding tasks, allowing developers to focus on more complex problem-solving.

Speed vs. Accuracy

One aspect not deeply explored by the researchers is the time required to complete tasks. AI systems may not yet outperform humans in accuracy, but if they can complete tasks significantly faster, that could still transform business planning.

Forward-thinking freelancers and companies are already automating some of these tasks, and as AI models continue to improve, the share of automated work is likely to grow.

Given the rapid advancements in AI’s ability to tackle complex mathematical and engineering problems, progress in software development is likely to accelerate dramatically.

The Era of Disruptive Change

The takeaway? AI-driven transformation is no longer a distant future—it’s happening now. While human expertise remains crucial, businesses and developers who effectively integrate AI into their workflows stand to gain the most in the evolving software industry.

Reference

https://www.discovermagazine.com/technology/can-ai-coding-systems-earn-usd1-million-as-freelancers
https://www.reddit.com/r/singularity/comments/1isgv4q/new_openai_paper_can_llms_make_1_million/?rdt=48393

Cite this article:

Keerthana S (2025), Can AI coding systems generate $1 million as freelancers? AnaTechMaz, pp. 72
