Accuracy proof page

Detection accuracy, explained with evidence and caveats.

ScamGuard is a WebyStudio product. This page shows the numbers behind message and URL detection, including held-out benchmarks, model checksums, Indian-script coverage, and the areas where we deliberately avoid over-claiming.


Message Model: 96.54% accuracy

18,696 held-out SMS samples

URL Model: 99.33% accuracy

35,999 held-out URL samples

Indian-Script Smoke Test: 15/15

9/9 (100%) scam recall, 6/6 (100%) legitimate messages correctly passed

Training Samples: 1,42,117

1,05,939 real + 36,178 augmented

Detection Stack

How ScamGuard reaches a risk decision

Every scan is evaluated by several independent signals before ScamGuard produces a final risk explanation for the user.

Heuristic Signal Engine

Extracts risky entities, urgency language, payment cues, brand impersonation, and scam-category patterns before any model score is applied.
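As a rough illustration of the kind of rule a heuristic engine like this applies, the sketch below flags cue categories with regular expressions and returns the matched snippets so they can be shown to the user. The category names, keyword lists, and the `heuristic_cues` function are hypothetical; ScamGuard's actual rule set is not published on this page.

```python
import re

# Hypothetical cue patterns for illustration only; not ScamGuard's real rules.
CUE_PATTERNS = {
    "urgency": re.compile(r"\b(urgent|immediately|within 24 hours|last chance|blocked|suspended)\b", re.I),
    "payment": re.compile(r"\b(upi|pay|transfer|refund|fee)\b", re.I),
    "brand": re.compile(r"\b(sbi|hdfc|icici|rbi|irctc|trai)\b", re.I),
    "link": re.compile(r"https?://\S+", re.I),
}

def heuristic_cues(text: str) -> dict[str, list[str]]:
    """Return each cue category that fires, mapped to the snippets that triggered it."""
    hits: dict[str, list[str]] = {}
    for name, pattern in CUE_PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(text)]
        if found:
            hits[name] = found
    return hits

print(heuristic_cues("URGENT: your SBI account is blocked, pay fee now at http://x.co"))
```

Returning the matched text, not just a score, is what lets a later stage explain why a message was flagged.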

Machine Learning Classifier

Uses character n-grams and India-context SMS training data to catch altered wording, Hinglish, and regional scam phrasing.
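The sketch below shows why character n-grams tolerate altered wording: a deliberately misspelled scam phrase still shares most of its n-grams with the original, whereas word-level features would treat the misspelled word as entirely new. The n-gram range (2 to 4), the space padding, and the Jaccard comparison are assumptions chosen for illustration, not ScamGuard's published configuration.

```python
from collections import Counter

def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count character n-grams of lengths n_min..n_max over a padded, lowercased string."""
    padded = f" {text.lower()} "  # padding marks word starts and ends
    grams: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the n-gram sets of two strings."""
    ga, gb = set(char_ngrams(a)), set(char_ngrams(b))
    return len(ga & gb) / len(ga | gb)

print(overlap("verify your kyc", "verlfy your kyc"))   # high: one altered letter
print(overlap("verify your kyc", "see you at lunch"))  # low: unrelated text
```

A production pipeline would more likely feed such features to a vectorizer such as scikit-learn's `TfidfVectorizer(analyzer="char_wb")` and a trained classifier; this page does not state which vectorizer or classifier ScamGuard uses.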

NLP Assist Layer

Adds phishing and manipulation-tactic checks for polished messages that avoid obvious scam keywords.

Highest-Risk Wins

The strongest confident signal becomes the final risk explanation, so users see why ScamGuard raised the warning.
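A minimal sketch of a highest-risk-wins rule, assuming each layer emits a risk score, a confidence, and a human-readable reason. The `Signal` fields, the layer names, and the 0.6 confidence cut-off are hypothetical; the page does not state how ScamGuard defines a "confident" signal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class Signal:
    source: str        # hypothetical layer name, e.g. "heuristics", "ml", "nlp"
    risk: float        # 0.0 = looks safe, 1.0 = looks like a scam
    confidence: float  # how sure the layer is about its own risk score
    reason: str        # explanation shown to the user if this signal wins

def highest_risk_wins(signals: Iterable[Signal], min_confidence: float = 0.6) -> Optional[Signal]:
    """Return the riskiest signal among those that clear the confidence bar, or None."""
    confident = [s for s in signals if s.confidence >= min_confidence]
    if not confident:
        return None
    return max(confident, key=lambda s: s.risk)

signals = [
    Signal("heuristics", risk=0.40, confidence=0.90, reason="Payment request detected"),
    Signal("ml", risk=0.92, confidence=0.80, reason="Wording matches known KYC scams"),
    Signal("nlp", risk=0.99, confidence=0.30, reason="Possible urgency tactic"),
]
winner = highest_risk_wins(signals)
print(winner.source, winner.reason)  # ml Wording matches known KYC scams
```

Filtering on confidence first keeps a single unsure layer from overriding the others, while taking the maximum means any one confident layer can raise a warning, and its reason becomes the explanation.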

Model Reports

Benchmarks that are easy to scan

Message and URL detection are measured separately, with test sizes, benchmark scores, and model checksums shown beside each claim.

Message Detection

SMS / Message Classification Model v5.2

Accuracy: 96.54%

Precision: 93.19%

Recall: 95.84%

F1 score: 94.50%

Actual \ Predicted	Legit	Scam
Legit	12,496 true negatives (legit passed)	406 false positives (legit flagged)
Scam	241 false negatives (scam missed)	5,553 true positives (scam caught)

SHA-256: 48abaaa3c82c74707ddd1bfa485dc4294aa1aca80ff72ad3be9f387ff943b69d
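The four headline SMS metrics can be re-derived from the confusion-matrix counts for this model; the short script below does that arithmetic so readers can check the figures themselves, treating "scam" as the positive class. The function name is illustrative; only the counts come from this page.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive headline metrics from a binary confusion matrix (scam = positive class)."""
    precision = tp / (tp + fp)  # of messages flagged as scam, share that really were scams
    recall = tp / (tp + fn)     # of real scams, share that were caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts from the SMS v5.2 held-out confusion matrix (18,696 messages in total).
report = classification_metrics(tp=5553, fp=406, fn=241, tn=12496)
for name, value in report.items():
    print(f"{name}: {value:.2%}")
# accuracy: 96.54%
# precision: 93.19%
# recall: 95.84%
# f1: 94.50%
```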

URL Detection

URL Classification Model v4

Accuracy: 99.33%

Precision: 98.23%

Recall: 99.10%

ROC AUC: 0.9996

The headline URL accuracy is lifted by many easy bare-domain examples in the test set. Path-rich legitimate URLs are harder to classify, so ScamGuard treats the URL score as a strong signal, not a safety guarantee.

Training URLs: 2,03,993

Test URLs: 35,999

SHA-256: 301cb136006c33a9382bf8fcb9fe789d87f1886b72a9e13d49916aad2c2bd584
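One way to keep the bare-domain caveat measurable is to report accuracy separately per URL shape instead of as a single headline number. The sketch below assumes a simple split (any non-root path or query string counts as "path-rich") and uses a few made-up toy examples; neither the split rule nor the data comes from ScamGuard's benchmark.

```python
from collections import defaultdict
from urllib.parse import urlparse

def url_shape(url: str) -> str:
    """Classify a URL as 'bare-domain' or 'path-rich' (hypothetical split rule)."""
    parsed = urlparse(url if "://" in url else f"http://{url}")
    has_path = parsed.path not in ("", "/")
    return "path-rich" if has_path or parsed.query else "bare-domain"

def accuracy_by_shape(examples):
    """examples: iterable of (url, true_label, predicted_label), 1 = malicious, 0 = legit."""
    correct, total = defaultdict(int), defaultdict(int)
    for url, truth, pred in examples:
        shape = url_shape(url)
        total[shape] += 1
        correct[shape] += int(truth == pred)
    return {shape: correct[shape] / total[shape] for shape in total}

# Toy, made-up examples for illustration only.
toy = [
    ("example.com", 0, 0),
    ("bank.example.in/login?next=/home", 0, 1),  # legit path-rich URL wrongly flagged
    ("paypa1-verify.xyz", 1, 1),
    ("docs.example.org/guide/setup", 0, 0),
]
print(accuracy_by_shape(toy))
```

If the test set is dominated by bare domains, a per-shape breakdown like this shows whether the weaker path-rich slice is hidden inside a high overall accuracy.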

Progression

Indian-script smoke test history

The smoke test tracks scam recall (scam messages caught, out of 9) and legitimate messages correctly passed (out of 6) across real Indian scripts, while keeping the small sample size visible.

v4.1 NLLB: scam 5/9 · legit 2/6 · 7/15 overall

v4.2 M2M100: scam 9/9 · legit 0/6 · 9/15 overall

v5.0 multilingual: scam 9/9 · legit 6/6 · 15/15 overall

v5.1 India expanded: scam 9/9 · legit 6/6 · 15/15 overall

v5.2 synthetic: scam 9/9 · legit 6/6 · 15/15 overall

Coverage

Indian language and scam-pattern coverage

ScamGuard separates smoke-verified languages, training-covered languages, and new v5.2 additions instead of treating all coverage as equal.

हिंदी · Hindi · Smoke verified

தமிழ் · Tamil · Smoke verified

বাংলা · Bengali · Smoke verified

తెలుగు · Telugu · Smoke verified

मराठी · Marathi · Training covered

ગુજરાતી · Gujarati · Training covered

ಕನ್ನಡ · Kannada · Training covered

മലയാളം · Malayalam · Training covered

ਪੰਜਾਬੀ · Punjabi · Training covered

اردو · Urdu · Training covered

ଓଡ଼ିଆ · Odia · Training covered

অসমীয়া · Assamese · New v5.2

नेपाली · Nepali · New v5.2

English · Benchmark covered

20 scam templates

KYC block · Lottery prize · UPI fraud · Government scheme · Income tax refund · Job fraud · Courier fee · Electricity bill · TRAI SIM block · Personal loan · Insurance bonus · Aadhaar link · RBI notice · EPF withdrawal · IRCTC refund · FASTag KYC · Legal notice · Passport issue · PAN misuse · LPG subsidy

Evidence

What we can cite publicly

Published results include the sample size, the measurable outcome, and the limitation that should stay attached to the claim.

Evidence	Samples	Result	Limitation
SMS v5.2 held-out benchmark	18,696	96.54% accuracy, 95.84% recall	Good benchmark coverage, but real-world fraud wording will keep changing.
Indian-script smoke test	15	15/15 correct across scam and legitimate banking SMS	Small hand-crafted proxy; not a replacement for large native-labeled corpora.
URL model v4 held-out benchmark	35,999	99.33% accuracy, ROC AUC 0.9996	Easy bare-domain samples inflate headline accuracy; path-rich legitimate URLs remain harder.
South Indian language coverage	Smoke test + synthetic training	Covered in pipeline, but benchmark depth varies by language	Needs larger native-labeled Tamil, Telugu, Kannada, and Malayalam validation sets.

Honest Claim Policy

ScamGuard is a seatbelt, not a guarantee.

We publish checksums, sample sizes, and limitations. We do not claim 100% real-world scam detection, equal accuracy in every language, or guaranteed safety.

No guaranteed safety claims

Dataset size shown beside metrics

Model checksums included

Language limitations visible
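Publishing a SHA-256 checksum lets anyone confirm that a model file is byte-for-byte the one the benchmarks were run on. A minimal verification sketch using Python's standard `hashlib` follows; the model file name in the usage line is hypothetical, and only the digest string comes from this page.

```python
import hashlib
from pathlib import Path

# Published digest for the SMS v5.2 model, copied from this page.
SMS_V52_SHA256 = "48abaaa3c82c74707ddd1bfa485dc4294aa1aca80ff72ad3be9f387ff943b69d"

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large model files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: Path, expected: str) -> bool:
    """True if the file's SHA-256 equals the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()

# Hypothetical usage; the actual model file name is not given on this page:
# matches_published(Path("sms_model_v5.2.bin"), SMS_V52_SHA256)
```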