Frequently asked questions

Did WriteHuman win the July 2026 HumanizerBench cycle?

Yes. WriteHuman finished first with a composite score of 73.07 out of 100, ahead of Undetectable AI at 72.17 and Humanize AI Pro at 70.49. All thirteen tools were scored the same way, across 33 samples and 429 tests against five commercial detectors.

Undetectable AI had a higher detector pass rate. How did WriteHuman finish first?

Undetectable AI posted a higher raw detector pass rate (0.957 versus our 0.816), but it reached that number by padding its outputs. Twenty-six of its 33 rewrites ran more than 1.4 times the length of the input, which triggered the length-inflation penalty and cost it the maximum ten points. WriteHuman took a single one-point penalty, so we finished ahead once the penalties were applied.

Is HumanizerBench independent?

No, and we do not describe it that way. WriteHuman operates HumanizerBench. What we can promise is that it is fully transparent: we pay for every competing tool ourselves, run each one by hand, and publish every input, output, detector verdict, and the scoring code, so anyone can clone the repository and recompute the leaderboard from scratch.

What is a detector pass rate?

It is the share of detector tests that read a tool's rewrite as human rather than machine-written, averaged across all five detectors. It is one of four scored properties, and on its own it does not tell you whether the rewrite kept your meaning or your length.

Why are the July composite scores lower than June's?

The scoring was revised this cycle and the readability component was rescaled, which lowered the absolute numbers for every tool. Because of that, month-to-month composite scores are not directly comparable. The cleaner comparison is by rank: WriteHuman moved from second in June to first in July.

AI Humanizer Rankings for July 2026 | WriteHuman #1 on HumanizerBench

The July 2026 HumanizerBench cycle is out, and WriteHuman finished first with a composite score of 73.07 out of 100, ahead of Undetectable AI at 72.17 and Humanize AI Pro at 70.49. We are glad to be back on top. But the number we most want to talk about is not our own. It is the one just below us.

Undetectable AI posted the highest raw detector pass rate of any tool this month: 0.957, which means its rewrites read as human to the detectors about 96 percent of the time. WriteHuman's pass rate was 0.816. On the single headline number that most humanizer marketing leads with, we lost by a wide margin. And we still won the cycle.

That gap is the most useful thing in the July data, because it is a clean illustration of something that is easy to miss when you shop for a humanizer by one statistic. A very high detection-evasion number can overstate what a tool actually did for you. Undetectable AI reached 0.957 in a way that cost it much of what a high score is supposed to represent, and the benchmark's penalty rules caught it. This post walks through how that happened, shows the full leaderboard, and explains the scoring so you can check every claim yourself.

The July 2026 leaderboard

Thirteen tools were scored this cycle against five commercial detectors (GPTZero, Winston AI, ZeroGPT, Copyleaks, and Originality.ai) across 33 samples each, for 429 individual tests (13 × 33), under methodology version 1.2.0. Here is where everyone landed.

Rank	Tool	Composite / 100	Detector pass rate	Meaning	Readability	Penalties
1	WriteHuman	73.07	0.816	0.729	0.562	-1
2	Undetectable AI	72.17	0.957	0.730	0.559	-10
3	Humanize AI Pro	70.49	0.704	0.743	0.603	0
4	Stealth Writer	68.07	0.812	0.685	0.445	-3
5	Humbot	66.42	0.707	0.753	0.439	-1
6	HIX Bypass	64.84	0.659	0.738	0.429	0
7	Walter Writes	62.84	0.806	0.647	0.615	-8
8	StealthGPT	61.52	0.816	0.630	0.606	-10
9	Phrasly	61.47	0.734	0.609	0.721	-8
10	ai-humanize-io	59.66	0.760	0.698	0.557	-10
11	Super Humanizer	54.13	0.423	0.732	0.671	-5
12	Grammarly	53.38	0.000	0.945	0.821	0
13	NoteGPT	44.04	0.000	0.691	0.745	0

A few things jump out. Grammarly and NoteGPT barely change the text at all, so they post the two highest readability scores in the table (0.821 and 0.745), and Grammarly's meaning score of 0.945 is the highest by a wide margin. But both have a detector pass rate of 0.000, because they do almost nothing to how the writing reads to a detector. That is not a knock on them. They are editors, not humanizers, and this is a benchmark about detection evasion. StealthGPT matched our detector pass rate of 0.816 exactly and still finished eighth, largely for the same reason Undetectable AI finished second: a ten-point penalty. And the only two humanizers with a clean penalty column, Humanize AI Pro and HIX Bypass, placed well above where their evasion numbers alone would put them, precisely because they were not docked. Humanize AI Pro climbed six places from June on that clean, no-shortcuts profile.

Why the runner-up's 0.957 does not mean what it looks like

Here is the tension in one sentence. Undetectable AI cleared detectors more often than we did and finished behind us anyway, because of how it got there.

When we replayed the July data, 26 of Undetectable AI's 33 outputs tripped the length-inflation rule. That rule flags any rewrite that runs more than 1.4 times the length of the input. In plain terms, on about four out of every five tests, Undetectable AI handed back noticeably more words than it was given. Padding is a real and well-known way to push a detector score up: more filler dilutes the statistical fingerprint a detector keys on, and the pass rate climbs. It also means the text you get back is not the text you wrote. It is your text plus a meaningful amount of stuffing you did not ask for and now have to cut back out.

The benchmark treats that as a quality failure, not a feature. Length inflation costs one point per flagged output, capped at ten points total, and Undetectable AI hit the cap. Now look at our row. WriteHuman took a single penalty this month, worth one point. It came from one output out of 33 where our meaning-preservation score dipped below the 0.85 threshold, which the benchmark flags as meaning drift. One flag, one point. We are not going to pretend that penalty does not exist, and we are not thrilled about it, but the contrast is the whole story: we won by pairing near-top detection evasion with outputs that stayed the length you gave us and said what you meant.
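For readers who want to see the rule as logic rather than prose, here is a minimal sketch of the length-inflation check and its capped penalty. It is not the benchmark's frozen scoring code. The names `Sample`, `wordCount`, `isLengthInflated`, and `lengthInflationPenalty` are hypothetical, and measuring length as a whitespace-delimited word count is an assumption, since this post does not state the unit.

```typescript
// Sketch of the length-inflation rule described above.
// Assumption: "length" is a whitespace-delimited word count; the unit the
// benchmark actually uses (words, characters, tokens) is not stated here.
const INFLATION_RATIO = 1.4; // flag rewrites longer than 1.4x the input
const PENALTY_CAP = 10;      // maximum points lost in one penalty category

interface Sample {
  input: string;
  output: string;
}

function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function isLengthInflated(sample: Sample): boolean {
  return wordCount(sample.output) > INFLATION_RATIO * wordCount(sample.input);
}

// One point per flagged output, capped at PENALTY_CAP.
function lengthInflationPenalty(samples: Sample[]): number {
  const flagged = samples.filter(isLengthInflated).length;
  return Math.min(flagged, PENALTY_CAP);
}

// Hypothetical data: 26 padded rewrites (15 words from a 10-word input)
// and 7 rewrites that keep the input length.
const input = "one two three four five six seven eight nine ten";
const padded = input + " plus five extra words here";
const samples: Sample[] = [
  ...Array.from({ length: 26 }, () => ({ input, output: padded })),
  ...Array.from({ length: 7 }, () => ({ input, output: input })),
];

console.log(lengthInflationPenalty(samples)); // 10 (26 flags, capped)
```

With 26 flags the cap does the work: the first ten flagged outputs cost a point each, and the remaining sixteen cost nothing further.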

The math, laid out plainly

Strip the penalties out and Undetectable AI actually posted the highest weighted score in the cycle, roughly 82 out of 100, against our 74. In other words, on raw scoring alone it was ahead. The ten-point padding penalty is exactly what pulled it back below us, to 72.17, while our single point left us at 73.07. Our final margin was 0.90 points, which is narrower than the penalty swing between us. That is the honest way to read the result: the penalties are what flipped the order. If padding did not carry a cost, the ranking would look different. We think padding should carry a cost, because a humanizer's job is to make your writing read as human without quietly changing how much of it there is.
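The arithmetic in that paragraph can be checked from the two published rows alone. The sketch below adds each tool's penalty back to its composite, which follows from the benchmark subtracting penalties from the weighted score. The `Row` type and the helper names are hypothetical.

```typescript
// Published July 2026 values from the leaderboard table.
// Penalties are stored here as positive point deductions.
interface Row {
  tool: string;
  composite: number;
  penalty: number;
}

const writeHuman: Row = { tool: "WriteHuman", composite: 73.07, penalty: 1 };
const undetectable: Row = { tool: "Undetectable AI", composite: 72.17, penalty: 10 };

// Round to two decimals to avoid floating-point noise in the printed values.
function round2(value: number): number {
  return Number(value.toFixed(2));
}

// Adding the penalty back recovers the weighted score before penalties.
function prePenalty(row: Row): number {
  return round2(row.composite + row.penalty);
}

const finalMargin = round2(writeHuman.composite - undetectable.composite);
const penaltySwing = undetectable.penalty - writeHuman.penalty;

console.log(prePenalty(writeHuman));   // 74.07
console.log(prePenalty(undetectable)); // 82.17
console.log(finalMargin);              // 0.9
console.log(penaltySwing);             // 9
```

The nine-point penalty swing is ten times the 0.90-point final margin, which is why the order flips once penalties are applied.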

This is also why we ask people not to shop for a humanizer on the detector pass rate alone. That number tells you how often a tool cleared the detectors. It does not tell you what the tool did to your writing to get there. A high pass rate earned by inflating length, loosening your meaning, or handing back bloated text is a worse outcome than a slightly lower pass rate on text that still reads like you.

What actually moved this cycle

Two things are worth flagging for anyone comparing July to earlier months.

First, the composites are lower across the board than they were in June, and that is a scoring change, not a collapse in quality. Readability is now scored as an overall writing-quality rating of the output rather than the simpler grammar-based measure earlier cycles used, and the penalty caps were tightened. Those revisions pulled the absolute numbers down for everyone, us included, which is why our readability sits at 0.562 rather than near the top of the scale. Because the scoring was revised, month-to-month composite scores are not directly comparable, so the cleaner way to read June against July is by rank.

Second, by rank, WriteHuman moved from second in June to first in July. We are not going to oversell a 0.90-point margin as a blowout. It is a narrow lead in a month where the tool that would otherwise have edged us gave back ten points to the penalty rules. We will take the top spot, and we will also tell you it was close.

How the benchmark works

The composite score out of 100 is a weighted blend of four measured properties:

Detector results, weighted 42 percent. How often a tool's output reads as human across all five detectors. This is the detector pass rate column.
Meaning preservation, weighted 32 percent. How faithfully the rewrite keeps the original meaning, measured as the similarity between the input and the output.
Readability, weighted 16 percent. An overall writing-quality rating of the output.
Consistency, weighted 10 percent. How steady a tool's performance is across different writing categories.

On top of that weighted score, the benchmark subtracts penalties for specific quality failures: meaning drift, length inflation, over-trimming, refusals, and returning the input unchanged. Each failure costs a point per flagged output, capped at ten per category. Penalties are already reflected in the composite shown in the table. That penalty layer is exactly what separated the top two tools this month.
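Put together, the weights and the penalty layer amount to something like the sketch below. Treat it as an illustration under stated assumptions rather than the repository's scoring code. It assumes every sub-score, including consistency (which is not shown in the table above), sits on a 0 to 1 scale, and that the weighted sum is scaled to 100 before penalties are subtracted. The names `SubScores`, `weightedScore`, `totalPenalty`, and `compositeScore` are hypothetical, and the example tool is made up.

```typescript
// Weights as stated in this post; they sum to 1.0.
const WEIGHTS = {
  detector: 0.42,
  meaning: 0.32,
  readability: 0.16,
  consistency: 0.10,
} as const;

// Assumption: each sub-score is on a 0..1 scale, as in the leaderboard table.
interface SubScores {
  detector: number;
  meaning: number;
  readability: number;
  consistency: number;
}

// Weighted score on a 0..100 scale, before penalties.
function weightedScore(s: SubScores): number {
  return 100 * (
    WEIGHTS.detector * s.detector +
    WEIGHTS.meaning * s.meaning +
    WEIGHTS.readability * s.readability +
    WEIGHTS.consistency * s.consistency
  );
}

// One point per flagged output, capped at 10 points per penalty category.
function totalPenalty(flagsByCategory: Record<string, number>): number {
  return Object.values(flagsByCategory)
    .reduce((sum, flags) => sum + Math.min(flags, 10), 0);
}

function compositeScore(s: SubScores, flagsByCategory: Record<string, number>): number {
  return weightedScore(s) - totalPenalty(flagsByCategory);
}

// Hypothetical tool, not a row from the leaderboard.
const example: SubScores = { detector: 0.8, meaning: 0.7, readability: 0.5, consistency: 0.9 };
const flags = { meaningDrift: 1, lengthInflation: 12, overTrimming: 0, refusal: 0, unchanged: 0 };

console.log(compositeScore(example, flags).toFixed(2)); // 62.00
```

In this made-up case the weighted score is 73, and the tool loses 11 points: one for meaning drift and ten for length inflation, because the twelve flags are capped at ten.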

Every tool is bought and paid for by us and run by hand on the most evasion-focused setting it advertises. There are no affiliate arrangements and no vendor-supplied numbers. Each of the thirteen tools was run through the same 33 prompts, and all thirteen carry a medium confidence rating this cycle. The source passages the tools were asked to rewrite were generated by three separate models (Claude Sonnet 5, Gemini 3.5 Flash, and GPT-5.5) and span categories including blog posts, news articles, marketing copy, and general discussion writing.

Reproduce this yourself

We would rather you not take our word for any of this, especially since we run the benchmark. The full cycle is published as a public audit record, and you can replay it end to end:

Clone the public repository at github.com/HumanizerBench/humanizerbench.
Run the verifier with npm run verify, or verify a single cycle with npx tsx scripts/verify-cycle.ts "July 2026".
The verifier re-derives the prompts, recomputes a checksum of every data file, and replays the frozen scoring code against the raw inputs, outputs, and detector verdicts. If any number had been altered after the cycle started, the check fails.
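To give a feel for what the checksum step does, here is a minimal sketch of comparing data files against a frozen manifest. It is not the verifier itself. SHA-256 is an assumption, as are the manifest layout and the names `checksum` and `findAlteredFiles`. The file names and contents are invented and held in memory so the example stays self-contained.

```typescript
import { createHash } from "node:crypto";

// Assumption: SHA-256 over the raw file contents. The repository's actual
// checksum algorithm and manifest layout are not described in this post.
function checksum(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns the names of files that are missing or whose checksum no longer
// matches the value recorded in the manifest.
function findAlteredFiles(
  files: Record<string, string>,
  manifest: Record<string, string>,
): string[] {
  return Object.keys(manifest).filter(
    (name) => !(name in files) || checksum(files[name]) !== manifest[name],
  );
}

// Hypothetical data files, frozen when the cycle started.
const frozen: Record<string, string> = {
  "outputs.json": '{"score":0.816}',
  "verdicts.json": '{"pass":true}',
};
const manifest: Record<string, string> = Object.fromEntries(
  Object.entries(frozen).map(([name, content]) => [name, checksum(content)]),
);

// The same files with one number changed after the fact.
const tampered: Record<string, string> = { ...frozen, "outputs.json": '{"score":0.999}' };

console.log(findAlteredFiles(frozen, manifest));   // []
console.log(findAlteredFiles(tampered, manifest)); // [ 'outputs.json' ]
```

Any edit to a published file changes its hash, so the altered file shows up by name in the result.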

The prompts themselves are protected against gaming by a commit-and-reveal scheme: a random seed fixes each cycle's prompts, only its hash is published at the start, and the seed is revealed at the close so anyone can confirm the prompts were set before any tool was tested. Vendors get auditability without the ability to pre-train against next month's inputs. The specific figures in this post, including the 26 length-inflation flags on Undetectable AI's outputs and our single meaning-drift flag, come straight from replaying that published July 2026 data.
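The commit-and-reveal idea is small enough to sketch in a few lines. This is an illustration of the general technique, not the benchmark's implementation. SHA-256, the hex-encoded 32-byte seed, and the function names `commit` and `verifyReveal` are all assumptions.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Assumption: the commitment is a SHA-256 hash of a hex-encoded random seed.
// The benchmark's actual hash function and seed format are not stated here.
function commit(seed: string): string {
  return createHash("sha256").update(seed, "utf8").digest("hex");
}

// Anyone holding the published hash can check a revealed seed against it.
function verifyReveal(revealedSeed: string, publishedHash: string): boolean {
  return commit(revealedSeed) === publishedHash;
}

// Start of the cycle: draw a seed in private and publish only its hash.
const seed = randomBytes(32).toString("hex");
const publishedHash = commit(seed);

// Close of the cycle: reveal the seed so the commitment can be checked.
console.log(verifyReveal(seed, publishedHash));       // true
console.log(verifyReveal(seed + "x", publishedHash)); // false
```

Because the hash is published before any tool is tested, a seed that verifies at the close shows the prompts were fixed in advance, while the hash alone gives vendors nothing to train against.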

The takeaway

WriteHuman is ranked first in the July 2026 HumanizerBench cycle, and we are proud of that. But the more durable lesson is the one the runner-up taught: the biggest evasion number on the leaderboard belonged to a tool that got there by padding its outputs, and it finished behind us once the cost of that padding was counted. When you are choosing a humanizer, the question is not only how often it clears the detectors. It is what it did to your writing on the way there. We think that is the right way to judge these tools, which is why the benchmark is built to reward it, and why we are comfortable publishing the whole thing for you to check.

You can see the full leaderboard, per-detector breakdowns, and every tool's sub-scores at humanizerbench.com/leaderboard.
