Full transparency
How suchsignal works
Every scoring decision on this site is explained in full. This page shows you the exact AI models used, the word-for-word instructions they receive, the math behind the numbers, how cross-referencing and synthesis work, how articles are generated, and the hard limits of what the system can and cannot assess. Nothing is a black box.
1. Plain-English overview
suchsignal is a tool for evaluating UAP (Unidentified Aerial Phenomena) and UFO disclosure materials. Admins submit documents, testimonies, images, data files, or URLs. An AI reads the content and scores it across seven dimensions — things like "how credible is the source?" and "how specific and checkable are the claims?" — then outputs a single headline percentage we call Signal Strength.
Signal Strength is not a measure of whether something is "true." It is a measure of how well-evidenced and how verifiable the claims in a piece of content are. A poorly-sourced claim that happens to be accurate will score low. A carefully documented, multiply-corroborated claim will score high. The score tells you how seriously you should take the content as evidence — not whether you should believe it.
Beyond individual items, suchsignal runs a continuous synthesis pipeline that clusters related claims across all published items, scores competing hypotheses, identifies contradictions, and produces an overall Verdict — a data-driven summary of what the evidence collectively supports. This is separate from any individual item's score and is updated whenever the item corpus changes materially.
The AI does not have opinions. It applies the same rubric, described in full below, to every piece of content it sees. Human editors review all AI output before anything is published and can attach explanatory notes. The AI score itself is never changed after the fact.
2. Step-by-step process
- 01 Submission
An admin submits content in one of five formats: a URL (fetched and stripped of HTML), pasted plain text, a PDF (text extracted automatically), an image (sent to Claude's vision model), or a CSV data file. The source type (e.g. "Government Document", "Whistleblower Testimony") is selected manually — the AI does not decide this.
- 02 Ingestion
Content is prepared for analysis. URLs are fetched and HTML is stripped to plain text. PDFs are parsed to extract their text layer. Images are kept in their original binary form (JPEG, PNG, etc.) and sent directly to the vision model. CSVs are passed as text for statistical interpretation. The full ingested content is stored alongside the final scores.
- 03 AI analysis
The content is sent to Claude alongside a fixed system prompt (shown in full below). Claude reads the content and produces structured scores for all seven dimensions, a reasoning note for each score, a title, a description, and a list of key claims. This is done in a single API call using Claude's structured tool use feature.
- 04 Scoring
The raw dimension scores (each 0–10) are fed into a fixed mathematical formula to produce the headline Signal Strength percentage. No judgment, no adjustment — just arithmetic. The formula is shown in full below.
- 05 Embedding and cross-referencing
Immediately after scoring, a vector embedding is generated from the item's title, description, and content. This embedding is stored in a vector database (Cloudflare Vectorize) and queried against all existing published items to find semantically similar content. Claude Haiku then classifies the relationship between the new item and each similar item (supporting, contradicting, contextualising, or unrelated) and stores these cross-references for display on the item page.
- 06 Editorial review
An admin reviews all AI output before the item is published. They can add explanatory notes to any dimension if they feel context is missing. They cannot change numeric scores. Any override note is displayed publicly on the item page.
- 07 Publication
The item goes live. Every score, every reasoning note, the cross-references found, and the rubric version used are all displayed on the public item page. Nothing is hidden.
- 08 Synthesis (automatic)
Publishing an item triggers the synthesis pipeline: claims are clustered, hypotheses are scored against the full evidence base, contradictions are grouped, and an updated Verdict is generated. This runs asynchronously via a job queue — it does not delay publication. The updated Verdict is visible at /analysis/verdict/.
3. The AI models we use
We use two Claude models from Anthropic, each matched to the task's complexity and cost:
Claude Sonnet 4.6 — primary analysis model
Used for: item scoring, verdict generation, article drafting from URLs.
Sonnet is the more capable of the two models and handles complex reasoning tasks: evaluating evidence quality across seven dimensions, synthesising the full item corpus into a Verdict, and generating structured intelligence articles from source content.
Claude Haiku 4.5 — fast classification model
Used for: cross-reference classification, claim clustering, contradiction grouping.
Haiku handles high-volume, repetitive classification tasks where speed and cost matter more than deep reasoning. Each published item may be compared against dozens of existing items; Haiku processes these comparisons quickly, which reserves the more expensive Sonnet calls for primary analysis.
Neither model has internet access during analysis. They cannot look up additional sources beyond what they are given. Their knowledge has a training cutoff date. See Limitations below.
4. What Claude sees
For each item, the analysis model receives exactly two things:
- A system prompt — fixed instructions that define the scoring task, explain each dimension, and specify what constitutes a low, medium, or high score. This prompt is the same for every item and is reproduced in full in the next section.
- The content to analyze — prepared differently depending on the input type:
  - URL: page text after all HTML tags, scripts, and styles are stripped.
  - Plain text: the raw pasted input, unchanged.
  - PDF: text extracted from the PDF's text layer, page by page.
  - Image: the original image bytes (JPEG, PNG, GIF, or WebP), sent directly to Claude's vision capability alongside any context notes the submitter provided.
  - CSV: the raw CSV text, which Claude reads as structured data for statistical interpretation.
Claude does not receive: the item's source type label, the submitter's name, any previous scores for this item, or any context about what score we might expect. Content is capped at approximately 60,000 characters for text inputs.
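The URL preparation and the length cap described above could look roughly like the sketch below. It is an illustration only, not the site's actual code: the names `strip_html`, `prepare_text`, and `MAX_CHARS` are hypothetical, the cap is assumed to be a simple truncation, and only Python's standard-library HTML parser is used.

```python
from html.parser import HTMLParser

MAX_CHARS = 60_000  # assumed value; the page says "approximately 60,000 characters"


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Join fragments with a space so block elements do not run together,
        # then collapse the whitespace left behind by the removed markup.
        return " ".join(" ".join(self._parts).split())


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def prepare_text(raw: str, is_html: bool) -> str:
    """Strip markup for fetched pages, pass pasted text through, then cap length."""
    text = strip_html(raw) if is_html else raw
    return text[:MAX_CHARS]
```

For example, `strip_html("<p>Hello <b>world</b></p><script>x()</script>")` returns `"Hello world"`.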
5. The exact instructions we give Claude
The following is the complete, unedited system prompt sent with every analysis request. It has not been paraphrased or summarised. This is word-for-word what Claude reads before analyzing content.
You are a rigorous analyst evaluating UAP (Unidentified Aerial Phenomena) and UFO disclosure materials for Signal Strength scoring on behalf of suchsignal.com. Score each of the following 7 dimensions on a scale of 0.0 to 10.0: SOURCE QUALITY (determines 30% of Signal Strength): 1. Credibility (0–10): The source's background, institutional position, verifiable track record, and incentive alignment. Higher scores for credentialed officials with institutional backing and no obvious financial motive. Score 9–10 only for sitting officials on the record. Score 1–2 for anonymous or unverifiable sources. 2. Integrity (0–10): Internal consistency, absence of contradictions, acknowledgment of uncertainty, and verifiability of cited facts. Deduct for known false statements, retractions, or suspicious pattern changes. CLAIM QUALITY (determines 70% of Signal Strength): 3. Corroboration (0–10): Degree to which independent sources confirm the same claims. Score 9–10 only if multiple unaffiliated, credentialed sources independently confirm. Score 1–2 for sole-source claims with no corroboration. 4. Consistency (0–10): Whether claims align with or contradict the established evidentiary record. Deduct for direct contradictions to official records or prior testimony. Reward for alignment with documented evidence. 5. Cross-reference (0–10): How well this item's claims cross-reference with other known disclosure materials, official records, declassified documents, or scientific literature. Score higher when specific named documents, dates, or individuals can be verified. 6. Recency (0–10): How recent the information is and how relevant its timeframe is to current disclosure efforts (post-2017 UAP Task Force era scores higher). Deduct for items describing only decades-old events with no present-day relevance. 7. Specificity (0–10): The precision and verifiability of claims — named individuals, specific dates, locations, document references, and technical details vs. vague generalities. 
Score 9–10 for items with highly specific, checkable claims. Use the score_item tool to return your analysis.
This prompt is versioned alongside the scoring rubric. The version number attached to each item tells you which version of this prompt was used to produce its scores. If the prompt changes materially, a new rubric version is created and all existing item scores remain attributed to the version that produced them.
6. How Claude returns its answer
Claude does not return free-form text. We use a feature of the Claude API called tool use (also called "function calling"). This constrains Claude to return a structured JSON response that conforms to a schema we define.
The tool is named score_item. Claude must return an object containing:
- A title (max 80 characters)
- A description (one sentence, max 200 characters)
- A claims array: 3 to 7 key factual claims extracted from the content
- A scores object with all 7 dimension scores (each a number 0.0–10.0) and one paragraph of reasoning per dimension
If Claude fails to produce a valid tool call (which is rare but can happen with very short or ambiguous content), the analysis is marked as failed and the item is flagged for manual review. The error message is shown to admins.
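As a rough sketch of what such a tool definition and a server-side re-check might look like: the limits below come from the list above, and the `name` / `description` / `input_schema` layout is the general shape of a Claude API tool definition. Everything else is an assumption for illustration, including the helper names, the tool description wording, and the spelling of the score keys other than `credibility` and `integrity`.

```python
from typing import Any

# Only "credibility" and "integrity" (with their "...Note" fields) appear in the
# page's example output; the other five key spellings are assumptions.
DIMENSIONS = (
    "credibility", "integrity", "corroboration", "consistency",
    "crossReference", "recency", "specificity",
)


def build_score_item_tool() -> dict[str, Any]:
    """Build a tool definition in the name / description / input_schema shape."""
    score_props: dict[str, Any] = {}
    for dim in DIMENSIONS:
        score_props[dim] = {"type": "number", "minimum": 0, "maximum": 10}
        score_props[f"{dim}Note"] = {"type": "string"}
    return {
        "name": "score_item",
        "description": "Return the structured analysis for one item.",  # assumed wording
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "maxLength": 80},
                "description": {"type": "string", "maxLength": 200},
                "claims": {
                    "type": "array",
                    "items": {"type": "string"},
                    "minItems": 3,
                    "maxItems": 7,
                },
                "scores": {
                    "type": "object",
                    "properties": score_props,
                    "required": list(score_props),
                },
            },
            "required": ["title", "description", "claims", "scores"],
        },
    }


def validate_score_item(payload: dict[str, Any]) -> list[str]:
    """Re-check the limits; an empty list means the payload passes."""
    problems: list[str] = []
    title = payload.get("title")
    if not isinstance(title, str) or len(title) > 80:
        problems.append("title must be a string of at most 80 characters")
    description = payload.get("description")
    if not isinstance(description, str) or len(description) > 200:
        problems.append("description must be a string of at most 200 characters")
    claims = payload.get("claims")
    if not isinstance(claims, list) or not 3 <= len(claims) <= 7:
        problems.append("claims must be a list of 3 to 7 entries")
    scores = payload.get("scores")
    if not isinstance(scores, dict):
        return problems + ["scores must be an object"]
    for dim in DIMENSIONS:
        value = scores.get(dim)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 10:
            problems.append(f"{dim} must be a number from 0 to 10")
        if not isinstance(scores.get(f"{dim}Note"), str):
            problems.append(f"{dim}Note must be a string")
    return problems
```

A non-empty result from `validate_score_item` would correspond to the "analysis failed, flag for manual review" path described above.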
Example output structure (abbreviated)
{
  "title": "Grusch Congressional Testimony 2023",
  "description": "Former intelligence officer claims US government operates UAP retrieval programs.",
  "claims": [
    "US government operates non-human intelligence retrieval programs",
    "Programs conducted without Congressional oversight",
    "Multiple additional whistleblowers with direct knowledge exist"
  ],
  "scores": {
    "credibility": 8.0,
    "credibilityNote": "Grusch is a named, credentialed former NGA official...",
    "integrity": 7.5,
    "integrityNote": "Testimony is internally consistent across multiple hearings...",
    ...
  }
}
7. The scoring formula
Signal Strength is computed from the seven dimension scores using a fixed weighted average. The weights reflect a deliberate editorial judgment: the quality of the claims matters more than the quality of the source, because even credible sources can make poorly-evidenced claims.
Formula
Source Quality (30% weight)
source_quality = (credibility + integrity) / 2
Claim Quality (70% weight)
claim_quality = (corroboration + consistency + cross_reference + recency + specificity) / 5
Signal Strength
signal = round((source_quality × 0.3 + claim_quality × 0.7) × 10)
All dimension scores are 0–10. Signal Strength is the result scaled to 0–100%.
Example: an item scoring credibility 8, integrity 7, and all five claim dimensions averaging 6 would produce: source_quality = 7.5, claim_quality = 6.0, signal = round((7.5 × 0.3 + 6.0 × 0.7) × 10) = round(64.5) = 65%.
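The formula can be sketched in a few lines of Python. This is an illustrative version, not the site's implementation: the function name and the dictionary keys are assumptions, scores are assumed to carry at most one decimal place, and the rounding is assumed to be half-up so that 64.5 becomes 65 as in the worked example (Python's built-in `round` would give 64, because it rounds halves to the nearest even number).

```python
SOURCE_DIMS = ("credibility", "integrity")
CLAIM_DIMS = ("corroboration", "consistency", "cross_reference", "recency", "specificity")


def signal_strength(scores: dict[str, float]) -> int:
    """Return Signal Strength (0-100) from seven dimension scores (each 0-10)."""
    # Convert each score to integer tenths so the rest of the arithmetic is exact.
    tenths = {name: round(value * 10) for name, value in scores.items()}
    source_sum = sum(tenths[d] for d in SOURCE_DIMS)
    claim_sum = sum(tenths[d] for d in CLAIM_DIMS)
    # (source_quality * 0.3 + claim_quality * 0.7) * 10, rewritten over a common
    # denominator: each source dimension carries 15% and each claim dimension 14%.
    numerator = 15 * source_sum + 14 * claim_sum  # equals signal * 100
    return (numerator + 50) // 100  # integer round-half-up


example = {
    "credibility": 8, "integrity": 7,
    "corroboration": 6, "consistency": 6, "cross_reference": 6,
    "recency": 6, "specificity": 6,
}
print(signal_strength(example))  # 65
```

The 15 and 14 in the numerator are the same per-dimension shares that follow from splitting 30% across two source dimensions and 70% across five claim dimensions.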
8. Dimension-by-dimension breakdown
Each dimension is scored 0–10. The definitions below follow the system prompt above; where a sentence goes beyond the prompt's wording, it is added here for readability and is not part of what Claude reads.
Source Quality · 15% of total signal
1. Credibility
The source's background, institutional position, verifiable track record, and incentive alignment. Higher scores for credentialed officials with institutional backing and no obvious financial motive.
Source Quality · 15% of total signal
2. Integrity
Internal consistency, absence of contradictions, acknowledgment of uncertainty, and verifiability of cited facts. Deducted for known false statements, retractions, or suspicious pattern changes.
Claim Quality · 14% of total signal
3. Corroboration
Degree to which independent sources confirm the same claims. Independence is critical — sources that cite each other do not count as independent.
Claim Quality · 14% of total signal
4. Consistency
Whether claims align with or contradict the established evidentiary record. Also checks whether the document contradicts itself — are dates, names, and technical claims coherent throughout?
Claim Quality · 14% of total signal
5. Cross-reference
How well claims cross-reference with other known disclosure materials, official records, declassified documents, or scientific literature. Higher when specific named documents, dates, or individuals can be independently verified.
Claim Quality · 14% of total signal
6. Recency
How recent the information is and how relevant its timeframe is to current disclosure efforts. The post-2017 UAP Task Force era scores higher. Older documents can still score well if they are primary sources disclosed for the first time.
Claim Quality · 14% of total signal
7. Specificity
The precision and verifiability of claims. Named locations, exact dates, named personnel, document reference numbers, and technical parameters score highest. Vague assertions that cannot be tested or disproved score lowest.
9. Cross-referencing: the RAG pipeline
After each item is scored, suchsignal automatically searches the existing item corpus for semantically similar content and classifies the relationship between them. This is called retrieval-augmented generation (RAG) — retrieving relevant prior items and using AI to reason about how they relate.
How it works, step by step
- 1 An embedding vector is generated from the new item's title, description, and content using Cloudflare's Workers AI (bge-large-en-v1.5, 1024 dimensions). This numerical representation captures the item's semantic meaning.
- 2 The vector is queried against Cloudflare Vectorize — our vector database containing embeddings for all published items. The top 10 nearest neighbours by cosine similarity are retrieved.
- 3 Claude Haiku receives the new item's text alongside the titles, descriptions, and signal scores of the similar items. It classifies each relationship as one of: supports, contradicts, contextualises, or unrelated, with a one-sentence explanation.
- 4 The classified cross-references are stored and displayed on each item's public page, showing which other items it supports, contradicts, or provides context for. The new item's vector is also stored so future items can find it in turn.
Cross-referencing is non-fatal: if the vector database is unavailable or Haiku fails to classify a relationship, the item is still published without cross-references. Cross-references accumulate over time — they are not retroactively updated when new items are added.
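The retrieval step (step 2 above) amounts to a nearest-neighbour search by cosine similarity. The sketch below shows that idea with a plain in-memory dictionary standing in for the vector database; it does not use or imitate the Cloudflare Vectorize API, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 10) -> list[tuple[str, float]]:
    """Return the k item ids most similar to the query, best match first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; the real embeddings have 1024 dimensions.
index = {
    "item-a": [1.0, 0.0, 0.0],
    "item-b": [0.0, 1.0, 0.0],
    "item-c": [1.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], index, k=2)
print([item_id for item_id, _ in nearest])  # ['item-a', 'item-c']
```

Each returned pair would then be handed to the classification model, which decides whether the relationship is supporting, contradicting, contextualising, or unrelated.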
The dimension score called "Cross-reference" in the scoring rubric (section 8, dimension 5) is separate from this pipeline. That score reflects Claude Sonnet's assessment of how well the item's own claims reference the external documentary record. The RAG pipeline compares items within suchsignal's corpus against each other.
10. Synthesis pipeline
Individual item scores tell you how strong each piece of evidence is in isolation. The synthesis pipeline asks a different question: what does the evidence collectively suggest? It runs automatically after each publication event and produces four outputs visible at /analysis/.
Step 1 — Claim clustering
All key claims extracted from published items are gathered. Claude Haiku groups them into thematic clusters — claims that address the same underlying question, even if they phrase it differently or come from different sources. Each cluster is assigned a label and a summary. Clusters with high-signal supporting items are distinguished from those with contradictory or low-signal evidence.
Step 2 — Hypothesis scoring
Five fixed hypotheses about UAP disclosure are scored against the full evidence base. For each hypothesis, Claude Sonnet identifies which published items support it, which contradict it, and computes a weighted support score using those items' Signal Strength values. Higher-signal items move the hypothesis score more than lower-signal ones. The hypotheses are fixed across rubric versions so scores are comparable over time.
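The exact weighting is not spelled out on this page. One simple scheme that has the stated property (higher-signal items move the score more) is to take the supporting items' share of the total Signal Strength on both sides. The sketch below assumes that scheme; the function name and the formula itself are illustrative assumptions, not the production calculation.

```python
from typing import Optional


def hypothesis_support(supporting: list[int], contradicting: list[int]) -> Optional[float]:
    """Assumed scheme: supporting share of total Signal Strength, as a percentage.

    Each list holds the Signal Strength (0-100) of the items on that side.
    Returns None when there is no evidence either way.
    """
    total = sum(supporting) + sum(contradicting)
    if total == 0:
        return None
    return 100 * sum(supporting) / total


# Two supporting items (80%, 60%) against one weak contradicting item (20%).
print(hypothesis_support([80, 60], [20]))  # 87.5
```

Under this scheme a single contradicting item with Signal Strength 90 pulls the score down much further than one with Signal Strength 10, which is the behaviour described above.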
Step 3 — Contradiction grouping
Pairs of claims that contradict each other — identified during cross-referencing and claim clustering — are grouped into contradiction sets. Each set is described with a plain-English summary of the conflict and a list of the items on each side. These are visible at /analysis/contradictions/.
Step 4 — Verdict generation
Claude Sonnet receives the hypothesis scores, claim clusters, contradiction groups, and a summary of the highest-signal items. It produces a structured Verdict: a markdown narrative, a confidence level, an evidentiary status, and a list of key findings. Each Verdict is versioned — previous verdicts are archived and remain accessible. The current Verdict is always at /analysis/verdict/.
The synthesis pipeline runs via a background job queue — it does not block publication. If it fails for any reason, the previous Verdict remains in place and synthesis retries on the next publication event. The synthesis state is visible to admins.
11. Articles and bias prevention
Articles are longer-form intelligence briefings that synthesise evidence across multiple items. They can be written manually by admins or generated automatically from a source URL.
URL-to-article generation
When an admin provides a URL, suchsignal fetches the page, strips its HTML to plain text, and sends the content to Claude Sonnet with instructions to produce a structured intelligence article: a title, description, full body in markdown, and a list of external references found in or related to the source.
Simultaneously, the article's content is embedded and queried against the item vector database to identify the most semantically relevant published items. These items are automatically selected as the article's "cited items" — the evidence base from which the aggregate signal is computed.
Why cited items are AI-locked for URL-generated articles
For articles generated from a URL, the cited items are selected by the AI and cannot be changed by an admin. This is a deliberate bias-prevention measure.
The aggregate signal displayed on an article is the mean Signal Strength across all cited items. If an admin could add or remove cited items after the fact, they could manipulate the aggregate signal to support a pre-determined narrative — cherry-picking high-signal items to make a weak article look well-supported, or excluding items that contradict the article's claims. AI selection at generation time, based purely on semantic similarity, removes this vector for editorial interference.
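A small sketch makes the cherry-picking risk concrete. The mean is taken from the description above; the function name, the handling of an empty list, and the example numbers are made up for illustration.

```python
from typing import Optional


def aggregate_signal(cited_signals: list[int]) -> Optional[float]:
    """Mean Signal Strength across an article's cited items (None if there are none)."""
    if not cited_signals:
        return None
    return sum(cited_signals) / len(cited_signals)


# Hypothetical article whose semantically nearest items score 70, 40 and 40.
print(aggregate_signal([70, 40, 40]))  # 50.0

# Dropping the two weaker items by hand would inflate the displayed figure.
print(aggregate_signal([70]))  # 70.0
```

Locking the cited list at generation time means the second call is never available to an editor for URL-generated articles.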
For manually-written articles (not generated from a URL), admins do select cited items themselves, because those articles are explicitly editorial in nature and the admin is already making a human judgment about what the article covers. The distinction is made clear on both the admin interface and the public article page.
12. URL verification
Any URL associated with a piece of content — whether a source URL on an item, an external reference in an article, or a URL submitted for article generation — is automatically verified at the time of creation.
Verification sends an HTTP HEAD request to the URL (falling back to GET if HEAD returns 405). A URL is marked as available if it returns a 2xx or 3xx status code, and unavailable if it returns 4xx, 5xx, or fails to connect within 8 seconds. Up to 5 URLs are checked concurrently to avoid blocking the submission flow.
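The rules in this paragraph can be sketched with Python's standard library as follows. This is an approximation under stated assumptions, not the site's code: all function and constant names are hypothetical, the 8-second limit is applied as a socket timeout, and any status outside 2xx/3xx is treated as unavailable. Note that `urllib` follows redirects on its own, so in this sketch a 3xx would normally be resolved to the final response's status.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

TIMEOUT_SECONDS = 8  # from the text: "fails to connect within 8 seconds"
MAX_CONCURRENT = 5   # from the text: "up to 5 URLs are checked concurrently"

Fetch = Callable[[str, str], Optional[int]]


def http_status(url: str, method: str) -> Optional[int]:
    """Return the HTTP status code, or None if the request could not complete."""
    try:
        request = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except (OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, timeout, refused connection, malformed URL


def is_available(url: str, fetch: Fetch = http_status) -> bool:
    """HEAD first; retry with GET only when the server answers 405."""
    status = fetch(url, "HEAD")
    if status == 405:
        status = fetch(url, "GET")
    return status is not None and 200 <= status < 400


def verify_urls(urls: list[str], fetch: Fetch = http_status) -> dict[str, bool]:
    """Check all URLs with at most MAX_CONCURRENT requests in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        results = pool.map(lambda u: is_available(u, fetch), urls)
        return dict(zip(urls, results))
```

The `fetch` parameter exists only so the logic can be exercised without network access, for example by passing a function that returns canned status codes.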
On public item and article pages, unavailable URLs are shown with a "URL unavailable" badge and the link text is struck through. This makes it immediately clear when a source can no longer be verified, without silently hiding it. The verification timestamp is stored so you know when the check was last performed.
URL checks are performed once at creation time. URLs are not continuously re-checked. A URL marked as available at creation may have gone offline since; one marked as unavailable may have been a transient error. Treat the verification status as a snapshot, not a live guarantee.
13. What the AI cannot do
These are hard limits. They are not caveats — they are things the system genuinely cannot assess. You should factor them into how you interpret every score on this site.
- ✗ Verify claims against real-time external sources
Claude has no internet access during analysis. It cannot look up whether a document exists, whether a person said what is attributed to them, or whether a claim has been contradicted or confirmed since its training cutoff.
- ✗ Analyze audio or video
Text, images, PDFs, and CSVs are supported. Audio recordings and video content cannot be analyzed directly. For media containing testimony, a transcript must be submitted instead.
- ✗ Know anything after its training cutoff
Claude's training data has a cutoff date. Events, disclosures, retractions, or corroborations that occurred after that date are invisible to it. The admin editorial review is the intended mitigation for this gap.
- ✗ Determine truth
The AI scores evidence quality, not truth. A sophisticated disinformation campaign that is internally consistent, well-sourced, and corroborated by other disinformation could score high. Signal Strength is a heuristic, not a verdict.
- ✗ Be fully consistent across identical inputs
Large language models are probabilistic. Running the same content through the same prompt twice may produce slightly different scores. We do not re-run analysis to "optimize" scores — the first run is published as-is.
- ✗ Guarantee cross-reference completeness
The RAG pipeline finds semantically similar items using vector embeddings — items that use different terminology to describe the same events may not surface as similar. Cross-references are probabilistic, not exhaustive. The synthesis pipeline may also miss connections that a domain expert would recognize.
14. Editorial overrides
Admin editors can attach a written note to any dimension on any published item. This note is displayed publicly directly beneath the AI's reasoning for that dimension. It is intended for cases where the AI's assessment is missing important context — for example, if a corroboration source was published after Claude's training cutoff, or if the AI missed a relevant contradiction in the public record.
Editorial override notes do not change the numeric score. The number remains exactly as Claude produced it. The note is an addition, not a correction. This preserves the integrity of the automated scoring system while still allowing human judgment to add value.
All override notes are attributed to "editorial" rather than a specific admin, to preserve editor privacy. The existence of overrides is always visible on the item page.
15. Community Trust Score
The Community Trust Score is entirely separate from the AI Signal Strength. It reflects how members of the public vote on an item's trustworthiness. Visitors can vote on the item overall and on each of the seven dimensions individually.
The two scores are displayed side by side, never merged or averaged. A large gap between them is itself informative — it may indicate that the public has knowledge the AI doesn't, or that the content is politically divisive, or that the scoring rubric is poorly calibrated for a particular type of source.
Community votes are rate-limited by IP address to deter gaming. The trust score is the share of all votes that are upvotes, expressed as a percentage: (upvotes / total votes) × 100.
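As a quick illustration of that formula (the function name and the behaviour for zero votes are assumptions, since the page does not say what is shown before anyone has voted):

```python
from typing import Optional


def community_trust_score(upvotes: int, total_votes: int) -> Optional[float]:
    """Percentage of votes that are upvotes; None when nobody has voted yet."""
    if total_votes == 0:
        return None
    return upvotes / total_votes * 100


# 30 upvotes out of 40 votes in total.
print(community_trust_score(30, 40))  # 75.0
```

Because the numerator is the upvote count alone, 30 upvotes and 10 downvotes give 75%, not the 50% that subtracting downvotes from upvotes would produce.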
16. Rubric versioning
The scoring rubric — the dimension definitions, score-band examples, and system prompt — is stored as a versioned record in our database. Every published item records which rubric version produced its scores. The current version is v1.0.
When the rubric changes materially (new dimensions, changed weights, revised score bands, or a new system prompt), the version number increments and a new version record is created. Historical scores are never retroactively recalculated — if you see a score produced under v1.0, it was produced under v1.0's exact rules. Comparisons between items scored under different rubric versions should be made with caution and are flagged on the item page when applicable.
Minor edits to the wording of this methodology page that do not change how scores are computed do not increment the rubric version. The synthesis pipeline, cross-referencing, article generation, and URL verification are operational features, not part of the scoring rubric — changes to them do not affect rubric versioning.