Every number here is computed, not guessed.

Two gauges sit on top of each compound: how well it's studied, and how the community reads it. This page is the whole machine behind them — the evidence ladder, the quality rubric, and the actual formulas. No vibes, no eyeballing. If you disagree with a score, you can check our working.

From a pile of papers to one honest read

Most of what a search returns isn't about a human who lifts. We screen it down before anything gets scored. Real example — Trenbolone:

900+ raw PubMed records mentioning the compound
235 survive the relevance screen (cattle-residue, assays and groundwater work thrown out)
62 scored 1–5 for quality and kept as the evidence base
12 of those are in humans, with 0 randomized trials

That last number is why Trenbolone's Studies gauge lands mid-scale while its animal literature is huge: volume of non-human work never substitutes for human evidence. The formula below is built around exactly that principle.

Two gauges, two different questions

Studies — how studied it is

Confidence in the science. Driven by human-trial evidence and its quality. Animal and in-vitro data count, but are capped. Nothing here is sentiment.

Community — sentiment & popularity

How much the real world talks about it and how clearly they lean. High discussion + a clear consensus reads strong — independent of whether the science exists.

The evidence ladder

Every source is weighted by method, not by how exciting the claim is. Human studies are the only tier that can push a score to the top; the rest is pooled and capped.

Human studies: Trials, cohorts, and case reports in people who actually took it. (full weight, uncapped)
Reviews / meta: Secondary synthesis. Useful, but only as good as what it summarizes. (×0.7 · pooled+capped)
Animal work: Rodent and livestock studies. Translatable, never conclusive for humans. (×1.0 · pooled+capped)
In-vitro / mechanistic: Cells and pathways. Explains how, not whether it works in a body. (×0.4 · pooled+capped)

Then each study earns a quality score

1–5, on method. A q5 is worth roughly 1.75× a q4 and 28× a q1 — top-tier evidence is rewarded steeply and marginal evidence barely moves the needle (which also makes the score hard to inflate with weak papers).

q1 · weight 0.05 · Anecdote-grade, tiny or uncontrolled.
q2 · weight 0.15 · Weak design or very small sample.
q3 · weight 0.40 · Decent observational / solid animal.
q4 · weight 0.80 · Good controlled study.
q5 · weight 1.40 · Large RCT or strong meta-analysis.
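The ratios quoted above can be checked directly from the weights. This is a minimal sketch: the weights are copied from the rubric, while the names `QUALITY_WEIGHT` and `weight_ratio` are made up for illustration and are not the site's actual code.

```python
# Quality weights copied from the rubric above (score -> weight).
# The dict and function names are illustrative assumptions.
QUALITY_WEIGHT = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.80, 5: 1.40}


def weight_ratio(q_high: int, q_low: int) -> float:
    """How many studies at q_low it takes to match one study at q_high."""
    return QUALITY_WEIGHT[q_high] / QUALITY_WEIGHT[q_low]


print(round(weight_ratio(5, 4), 2))  # 1.75
print(round(weight_ratio(5, 1), 2))  # 28.0
```

Read the second number as "28 anecdote-grade papers carry the same mass as one top-tier study", which is why padding a compound with weak papers barely moves the gauge.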

The Studies formula

Human evidence is summed linearly (it can always raise the score). Everything else is pooled and passed through a saturating cap, so a mountain of animal papers adds a bounded amount and no more. The total is squeezed into 0.10–0.98.

# weighted evidence mass, by tier (qw = the 1–5 quality weight above)
Wh   = Σ qw(q) over human studies                                   → linear, uncapped
Wlow = Σ qw(q)·animal + 0.7·Σ qw(q)·review + 0.4·Σ qw(q)·invitro
L    = 6 · (1 − e^(−Wlow / 7))                                      → low-tier support, saturating toward a cap of 6
Weff = Wh + L

# map to the gauge (0.10 floor, 0.98 ceiling)
E    = 0.10 + 0.885 · (1 − e^(−Weff / 15))

Worked example: Trenbolone. Its 12 human studies are 5×q4, 6×q2, 1×q3, so Wh = 5(0.80) + 6(0.15) + 1(0.40) = 5.30. Animal, review, and in-vitro work pools to Wlow = 6.65, so L = 6(1 − e^(−6.65/7)) = 3.68. Then Weff = 5.30 + 3.68 = 8.98 and E = 0.10 + 0.885(1 − e^(−8.98/15)) = 0.50. Big literature, but no randomized human trials → Moderate, honestly earned. Anavar, with 76 high-quality human studies, runs the same math to 0.98.
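For anyone who wants to rerun the arithmetic, here is a rough Python sketch of the Studies formula. The constants come from this page. The function and variable names are illustrative assumptions. The pooled value `w_low = 6.65` is taken from the worked example as given, since its per-tier breakdown is not listed here.

```python
import math

# Quality weights and tier multipliers as stated on this page.
# All names below are illustrative, not the site's actual code.
QUALITY_WEIGHT = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.80, 5: 1.40}
TIER_MULTIPLIER = {"animal": 1.0, "review": 0.7, "invitro": 0.4}


def evidence_mass(quality_scores):
    """Sum of quality weights for a list of 1-5 scores."""
    return sum(QUALITY_WEIGHT[q] for q in quality_scores)


def pooled_low_tier(scores_by_tier):
    """Wlow: tier-weighted mass of animal, review, and in-vitro studies."""
    return sum(
        TIER_MULTIPLIER[tier] * evidence_mass(scores)
        for tier, scores in scores_by_tier.items()
    )


def low_tier_support(w_low):
    """L: saturating cap, approaches 6 however large w_low gets."""
    return 6.0 * (1.0 - math.exp(-w_low / 7.0))


def studies_gauge(w_human, w_low):
    """E: human mass counts linearly, low-tier mass only through the cap."""
    w_eff = w_human + low_tier_support(w_low)
    return 0.10 + 0.885 * (1.0 - math.exp(-w_eff / 15.0))


# Trenbolone worked example: 5 x q4, 6 x q2, 1 x q3 in humans.
w_h = evidence_mass([4] * 5 + [2] * 6 + [3])
w_low = 6.65  # pooled value quoted in the worked example

print(round(w_h, 2))                        # 5.3
print(round(low_tier_support(w_low), 2))    # 3.68
print(round(studies_gauge(w_h, w_low), 2))  # 0.5
```

Note that the expression as written approaches 0.985 in the limit. This sketch neither clamps nor rounds it, and the text above does not say whether the 0.98 ceiling comes from a clamp or from display rounding.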

Labels: Thin < 0.40 · Moderate 0.40–0.65 · Strong ≥ 0.66.
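The label cut-offs can be sketched as a small lookup. One assumption is flagged loudly here: the stated bands leave a gap between 0.65 and 0.66, so this sketch rounds the score to two decimals before bucketing. That rounding step is a guess, not something this page specifies, and `studies_label` is a made-up name.

```python
def studies_label(score: float) -> str:
    """Map a Studies score to its label using the thresholds listed above.

    Assumption: the score is rounded to two decimals first, so that
    'Moderate 0.40-0.65' and 'Strong >= 0.66' meet without a gap.
    """
    s = round(score, 2)
    if s < 0.40:
        return "Thin"
    if s < 0.66:
        return "Moderate"
    return "Strong"


print(studies_label(0.50))  # Moderate (the Trenbolone worked example)
print(studies_label(0.98))  # Strong
```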

The Community formula

Popularity leads: how much real reporting exists (named expert voices + structured cycle logs). A clear consensus — most voices leaning the same way — nudges it up; a genuine split holds it back.

vol   = 1 − e^(−(voices + 0.7·logs) / 9)      # popularity / documentation depth
agree = how one-sided the voices are           # 0 = even split, 1 = unanimous
C     = 0.05 + 0.90 · vol · (0.70 + 0.30·agree)

A compound nobody logs sits near the floor. A widely-run compound with a clear community verdict — good or bad — reads Strong. This gauge measures signal, not endorsement: Trenbolone scores high because it's discussed constantly and consistently, most of it cautionary.
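The Community formula can be sketched the same way. The constants are the ones given above. This page does not spell out how `agree` is computed from the voices, so the sketch takes it as an input between 0 and 1 rather than guessing. The function name and the sample inputs are hypothetical.

```python
import math


def community_gauge(voices: float, logs: float, agree: float) -> float:
    """C = 0.05 + 0.90 * vol * (0.70 + 0.30 * agree).

    voices: count of named expert voices
    logs:   count of structured cycle logs (weighted 0.7)
    agree:  0 = even split, 1 = unanimous (supplied by the caller here)
    """
    if not 0.0 <= agree <= 1.0:
        raise ValueError("agree must be between 0 and 1")
    vol = 1.0 - math.exp(-(voices + 0.7 * logs) / 9.0)
    return 0.05 + 0.90 * vol * (0.70 + 0.30 * agree)


# A compound nobody reports on sits at the floor, whatever the lean.
print(community_gauge(0, 0, 0.5))  # 0.05

# Hypothetical inputs: same volume, split opinion versus one-sided opinion.
split = community_gauge(30, 20, 0.0)
one_sided = community_gauge(30, 20, 1.0)
print(split < one_sided)  # True
```

Because `agree` only scales the volume term between 0.70 and 1.00, consensus can lift a well-documented compound but cannot rescue one that nobody reports on.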

The parts that are just counting

The tallies under every scorecard are deterministic — no judgment involved.

Total scored: Studies that cleared the relevance screen and got a 1–5. (count)
Human / animal: Split by study type tag. (count by tag)
High-quality: The ones that earned a 4 or 5. (count of q ≥ 4)
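These tallies are plain counting, as the sketch below shows. The record layout (a `type` tag and a `q` score per study) and the sample records are hypothetical, chosen only to illustrate the three counts.

```python
from collections import Counter

# Hypothetical scored records; field names are assumptions for illustration.
sample_studies = [
    {"type": "human", "q": 4},
    {"type": "human", "q": 2},
    {"type": "animal", "q": 3},
    {"type": "animal", "q": 5},
    {"type": "invitro", "q": 1},
]


def tallies(studies):
    """Deterministic counts shown under a scorecard."""
    by_type = Counter(s["type"] for s in studies)
    return {
        "total_scored": len(studies),
        "human": by_type["human"],
        "animal": by_type["animal"],
        "high_quality": sum(1 for s in studies if s["q"] >= 4),
    }


print(tallies(sample_studies))
# {'total_scored': 5, 'human': 2, 'animal': 2, 'high_quality': 2}
```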

And the needle is just geometry

A score s ∈ [0,1] becomes an angle. That's the whole trick behind the dial.

θ = 180° · (1 − s) # s=0 points hard-left (red), s=1 hard-right (green)
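As a sketch of that geometry, the angle can be turned into a needle-tip position. The angle formula is the one above. The unit-length needle and the y-up coordinate convention are assumptions, and a screen renderer with y pointing down would flip the sign of y.

```python
import math


def needle_angle_deg(s: float) -> float:
    """theta = 180 * (1 - s), for a score s in [0, 1]."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return 180.0 * (1.0 - s)


def needle_tip(s: float, length: float = 1.0):
    """Tip of the needle, pivot at the origin, y pointing up (assumed)."""
    theta = math.radians(needle_angle_deg(s))
    return (length * math.cos(theta), length * math.sin(theta))


print(needle_angle_deg(0.0))  # 180.0 (hard-left)
print(needle_angle_deg(0.5))  # 90.0  (straight up)
print(needle_angle_deg(1.0))  # 0.0   (hard-right)
```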

How the verdict gets written

The headline sentence on each page isn't a fourth score — it's the reconciliation of the two gauges with the plain yes/no of whether it works and how sharp the risks are.

Studies (is the proof there?) + Community (what do real users find?) + risk profile → the one-line verdict. When the two gauges disagree — strong community, thin science, or vice-versa — we say so out loud rather than splitting the difference. That tension is often the most useful thing on the page.

What we don't pretend

The score is only as clean as the tags

A study mislabeled "human, high-quality" would be over-counted. We audit for this; where a compound looks over-scored, the fix is re-tagging the data, not fudging the gauge.

Popularity ≠ truth

The Community gauge can read Strong for a compound the science barely covers. That's the point of keeping the two gauges separate instead of averaging them into mush.

No sample-size term (yet)

Where trial sizes aren't reliably captured, a 20-person study and a 2,000-person study can weigh the same within a quality tier. We'd rather admit that than fake precision.

We never invent a number

If the evidence isn't there, the gauge sits low and the page says "we don't know" — instead of manufacturing confidence to fill the space.

Now go read a compound — and check our working.

Every scorecard runs the exact math on this page. If a number looks wrong, tell us where.