Christopher 3122124781 🔤 Update Unicode scanner to v2026.03.0 (#73)
* 🔤 Update Unicode scanner to v2026.03.0

refactor: Expand common Unicode exclusions and improve
documentation

Update version to 2026.03.0 across all references. Enhance the
--exclude-common flag to cover additional typography characters
including soft hyphens, superscripts, subscripts, Roman numerals,
combining diacritical marks, and the replacement character. These
additions reduce false positives in documentation and markdown
repositories while maintaining security against actual threats.
Improve help text clarity for the --exclude-common option.

* fix:🔧 Correct Roman numeral range and remove false positives

Extend the upper bound of the Roman numeral Unicode range from U+2179
to U+217F to include the complete set of Roman numeral characters.
Remove checks for combining diacritical marks (U+0300-U+030C) and the
replacement character (U+FFFD), as these are not security threats and
cause false positives in legitimate text processing.
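
As a rough illustration of the range fix, the sketch below tests whether a hex code point falls inside the Roman numeral range. The function name `is_roman_numeral` and the hex-string input are hypothetical, and the lower bound U+2160 is an assumption taken from the Unicode Number Forms block rather than from the scanner itself.

```shell
#!/bin/sh
# Hypothetical helper, not the scanner's actual code.
# Takes a hex code point such as "217A" and returns 0 when it lies in
# U+2160-U+217F. The lower bound U+2160 is an assumption; only the
# upper bound U+217F is stated in the commit message.
is_roman_numeral() {
  cp=$(( 0x$1 ))
  [ "$cp" -ge $(( 0x2160 )) ] && [ "$cp" -le $(( 0x217F )) ]
}

# U+217A-U+217F fall outside a range that ends at U+2179.
for hex in 2160 2179 217A 217F 2180; do
  if is_roman_numeral "$hex"; then
    printf 'U+%s is in the Roman numeral range\n' "$hex"
  else
    printf 'U+%s is outside the Roman numeral range\n' "$hex"
  fi
done
```

With an upper bound of U+2179, the code points U+217A through U+217F would have been missed, which appears to be the gap this fix closes.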

* 🔧 refactor: improve docs and version display

Update help text to clarify --exclude-common behavior and
mention soft hyphen exclusion. Replace hardcoded version
string with VERSION variable for dynamic version display
in header output.
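
A minimal sketch of the version change described above, assuming a `print_header` function and a one-line banner; both are hypothetical, and only the `VERSION` variable name and the version number come from the commit message.

```shell
#!/bin/sh
# Single source of truth for the version string.
VERSION="2026.03.0"

# Hypothetical header function: the banner wording is illustrative only.
# The version is interpolated from VERSION instead of being hardcoded.
print_header() {
  printf 'Unicode Scanner v%s\n' "$VERSION"
}

print_header
```

Keeping the version in one variable means a release bump touches a single line rather than every place the string is printed.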

* 🔤 docs: clarify exclude-common flag behavior

Update help text to clarify that --exclude-common also suppresses
AI-confusion and homograph checks. Fix alignment of header banner
text to center properly within the box borders.

* 🔤 refactor: improve Unicode detection logic

Reorganize help text for better readability by rewrapping
lines at 72 characters. Remove soft hyphen from common
Unicode exclusions and refine Roman numeral detection to
exclude Latin-lookalike confusables (I, VI, X, v, x) while
maintaining detection of other Roman numeral characters.

* 🔤Expand Unicode exclusion patterns

Expand the --exclude-common option to cover additional
common Unicode characters including common spaces, angle
quotes, and per mille sign. Update documentation and add
clarifying notes about superscript character coverage to
reduce false positives in documentation and markdown
repositories.

* fix: 🔧 Expand subscript digit range to U+2089

Extend the subscript digits Unicode range from U+2080-U+2084
to U+2080-U+2089 to include all subscript digits. Update the
regex pattern from ^208[0-4]$ to ^208[0-9]$ to match the
complete range of subscript digit characters.

* docs: 📝 Add clarification on subscript digits range

Add an explanatory comment clarifying that the subscript digits
regex pattern ^208[0-9]$ also covers U+2085-U+2089, code points
that are not currently included in the harmful_patterns list.
This documents the intentional broader matching for future
compatibility.
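
The subscript digit change can be sketched as follows. The two regex patterns are the ones quoted in the commit message; the function names and the convention of passing the code point as a four-digit hex string are assumptions made for illustration, not the scanner's actual implementation.

```shell
#!/bin/sh
# Hypothetical helpers, not the scanner's actual code.
# Each takes a four-digit hex code point such as "2085".

# New pattern: matches U+2080-U+2089 (all ten subscript digits).
is_subscript_digit() {
  printf '%s\n' "$1" | grep -Eq '^208[0-9]$'
}

# Old pattern, kept only for comparison: matches U+2080-U+2084.
is_subscript_digit_old() {
  printf '%s\n' "$1" | grep -Eq '^208[0-4]$'
}

for hex in 2080 2084 2085 2089 208A; do
  if is_subscript_digit_old "$hex"; then old="match"; else old="no match"; fi
  if is_subscript_digit "$hex"; then new="match"; else new="no match"; fi
  printf 'U+%s  old: %s  new: %s\n' "$hex" "$old" "$new"
done
```

U+2085 through U+2089 match only the new pattern, while U+208A (a subscript plus sign, not a digit) matches neither.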
2026-03-10 22:12:45 -05:00

BigBearScripts

YouTube Link: https://www.youtube.com/@bigbeartechworld

BigBearCommunity

If you have a suggestion for a video, post in the BigBearCommunity.

Support My Work

ko-fi

Readme MIT 1 MiB
Languages
Shell 99.2%
JavaScript 0.8%