
fix: Various link parsing issues

mjk requested to merge mjk/gajim:fix-link-parsing into master

Fixes #11218 (closed) and #11144 (closed); partially addresses #11145 (no longer parses _http as a scheme).

The regular expression at the core of the parser should now find all syntactically valid generic URIs and all JIDs (including many that are invalid according to PRECIS). Match results are then slightly adjusted/filtered procedurally so that some common cases work and some annoying/useless cases are not matched.
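For illustration only, here is a minimal sketch of the "permissive regex first, procedural filtering second" idea. It is not the code from this merge request: `CANDIDATE_RX`, `TRAILING_PUNCT` and `find_uris` are hypothetical names, the pattern only follows the RFC 3986 scheme syntax loosely, and JID matching is left out. Note that this sketch skips a token like _http://... entirely, which may differ from what the actual parser does.

```python
import re

# Hypothetical, simplified candidate pattern (NOT the regex from this MR).
# Scheme per RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
# The lookbehind refuses to start a scheme mid-word or right after "_",
# so "_http://..." does not yield "http" as a scheme.
CANDIDATE_RX = re.compile(r'(?<![\w+.-])[A-Za-z][A-Za-z0-9+.-]*:[^\s<>"]+')

# Characters that are usually sentence punctuation rather than part of a link.
TRAILING_PUNCT = '.,;:!?\'")'


def find_uris(text):
    """Return URI-like substrings of text after a procedural cleanup pass."""
    results = []
    for match in CANDIDATE_RX.finditer(text):
        # Step 2: adjust the raw match by trimming trailing punctuation.
        candidate = match.group().rstrip(TRAILING_PUNCT)
        _scheme, _, rest = candidate.partition(':')
        # Step 3: filter out matches that have nothing left after the scheme.
        if not rest:
            continue
        results.append(candidate)
    return results
```

With this sketch, `find_uris('see _http://example.org, or https://gajim.org/.')` yields only the second link, with the sentence-final period trimmed.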

Small architectural drive-by change: removed class Address(BaseUri), which was used for scheme-less foo@bar things. They now go to XMPPAddress and get formatted into an xmpp: URI, which is displayed as a tooltip over the unchanged foo@bar hyperlink.
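As a rough sketch of the formatting step described above (assumptions: `to_xmpp_uri` is a hypothetical helper, not the XMPPAddress implementation, and only the local part is percent-encoded, which is a simplification of RFC 5122):

```python
from urllib.parse import quote


def to_xmpp_uri(address: str) -> str:
    """Turn a scheme-less foo@bar address into an xmpp: URI (hypothetical helper)."""
    local, sep, domain = address.partition('@')
    if not (local and sep and domain):
        raise ValueError(f'not a local@domain address: {address!r}')
    # Percent-encode the local part; the domain is passed through unchanged
    # in this sketch.
    return f'xmpp:{quote(local, safe="")}@{domain}'


# The link text stays as typed; the generated URI is what a tooltip would show.
link_text = 'foo@bar'
tooltip = to_xmpp_uri(link_text)
```

Here `tooltip` is `xmpp:foo@bar`, while the visible hyperlink text remains `foo@bar`.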

Known regressions: tokens starting with www. were previously treated as http links; now they aren't. A more generic approach for turning domain names with official TLDs into https links might be desirable as a later improvement.

TODO:

  • Improve tests some more (splice bare URIs and JIDs into a piece of text and call _parse_uris instead of using regexes directly)
Edited by Philipp Hörist
