fix: Various link parsing issues
Fixes #11218 (closed) and #11144 (closed); partially addresses #11145 (no longer parses _http as a scheme).
The regular expression at the core of the parser should now find all syntactically valid generic URIs and all JIDs (including many that are invalid according to PRECIS). Match results are then slightly adjusted and filtered procedurally, so that some common cases work and some annoying or useless cases are not turned into links.
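A minimal sketch of the "broad regex first, procedural filtering second" idea, for illustration only. The pattern, the `find_links` helper, and the trailing-punctuation rule are all assumptions made for this example and are much simpler than the regex and filters in this change:

```python
import re

# Hypothetical, heavily simplified pattern (NOT the one in this change):
# either "scheme:rest" or a bare "local@domain" token.
CANDIDATE_RX = re.compile(
    r"(?P<uri>[A-Za-z][A-Za-z0-9+.-]*:[^\s]+)"
    r"|(?P<addr>[^\s@:/]+@[^\s@:/]+)"
)

# Assumed rule: punctuation at the end of a token belongs to the sentence.
TRAILING_PUNCT = ".,;:!?)"


def find_links(text):
    """Return (kind, token) pairs that survive the post-match filtering."""
    results = []
    for m in CANDIDATE_RX.finditer(text):
        kind = m.lastgroup  # "uri" or "addr"
        # Adjust: strip sentence punctuation the greedy regex swallowed.
        token = m.group().rstrip(TRAILING_PUNCT)
        # Filter: drop candidates left with an empty part, e.g. "ok:)".
        sep = ":" if kind == "uri" else "@"
        head, _, tail = token.partition(sep)
        if not head or not tail:
            continue
        results.append((kind, token))
    return results
```

The regex stays permissive and declarative, while the judgment calls about what is worth linking live in ordinary code that is easier to test and adjust.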
Small architectural drive-by change: removed class Address(BaseUri), which was used for scheme-less foo@bar things. They go to XMPPAddress now and get formatted into an xmpp: URI, which is displayed as a tooltip over the unchanged foo@bar hyperlink.
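As a rough illustration of the visible-text versus tooltip split, the sketch below turns a bare address into an xmpp: URI while leaving the displayed text untouched. The `xmpp_link` helper is hypothetical and is not the XMPPAddress implementation; it only percent-encodes the local part and assumes the domain is already plain ASCII:

```python
from urllib.parse import quote


def xmpp_link(addr):
    """Return (display_text, tooltip_uri) for a bare local@domain address.

    Hypothetical helper for illustration; real JID handling is stricter.
    """
    # Split on the last "@" so the local part keeps any earlier characters.
    local, _, domain = addr.rpartition("@")
    # Percent-encode characters that have special meaning inside a URI.
    uri = "xmpp:" + quote(local, safe="") + "@" + domain
    # The hyperlink text stays as typed; the URI is only shown on hover.
    return addr, uri
```

For example, `xmpp_link("juliet@capulet.lit")` yields the unchanged text `juliet@capulet.lit` together with the tooltip `xmpp:juliet@capulet.lit`.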
Known regressions: tokens starting with www. were previously treated as http links; now they aren't. A more generic approach for turning domain names with official TLDs into https links might be desirable as a later improvement.
TODO:
- Improve tests some more (splice bare URIs and JIDs into a piece of text and call _parse_uris instead of using regexes directly)