[FIX] BMP for MUC nicks, dip toes in vCard avatars

Can, your, Bifrost, Do, That. -lh
LDA 2024-09-21 13:18:59 +02:00
commit 6167732e83
6 changed files with 378 additions and 17 deletions


@ -1,22 +1,25 @@
# Parsee - the jealous XMPP<=>Matrix bridge # Parsee - the jealous XMPP<=>Matrix bridge
Parsee is a Matrix<=>XMPP bridge written in C99, with Cytoplasm, similar to Bifrost, but it is Parsee is a Matrix<=>XMPP bridge written in C99, with Cytoplasm, similar to Bifrost, but it is
NOT a drop-in replacement. NOT a drop-in replacement.
Currently, it is *alpha* stage, which means that I wouldn't recommend using this in production, as I can change anything, at any time. Currently, it is *alpha* stage, which means that I wouldn't recommend using this in production,
as I can change anything, at any time, and it may behave strangely at times.
## Why? ## Why?
### Naming ### Naming
The name 'Parsee' is actually a reference to [Parsee Mizuhashi](https://en.touhouwiki.net/wiki/Parsee_Mizuhashi), The name 'Parsee' is actually a reference to [Parsee Mizuhashi](https://en.touhouwiki.net/wiki/Parsee_Mizuhashi),
a "*bridge* princess". a "*bridge* princess". The other name you actually can sometimes see explains itself, so I won't
be talking about it.
### Reasoning (personal to LDA) ### Reasoning (personal to LDA)
I hate Bifrost. I also wanted to dip my toes in XMPP, XML, and bridges a bit. Also, as a sister I hate Bifrost. I also wanted to dip my toes in XMPP, XML, and bridges a bit. Also, as a sister
project to KappaChat, this means that I can integrate Parsee with KappaChat however I wish it project to KappaChat, this means that I can integrate Parsee with KappaChat however I wish it
to be, which allows me to mess around with a codebase I'm already familiar with. to be, which allows me to mess around with a codebase I'm already familiar with.
A more "up-to-date" reason may be to have a small, 'Just Werks' bridging solution *that is good*. A more "up-to-date" reason may be to have a small, 'Just Werks' bridging solution *that is good*,
and maybe as a testing ground for Cytoplasm features I sometimes add.
Well, I'm *trying* to do that, at least. (Well, I'm *trying* to do that, at least.
Please scream at me if that fails(or just doesn't run on an overclocked Raspberry Please scream at me if that fails(or just doesn't run on an overclocked Raspberry
Pi 4B, which, by the way, is literally where Parsee+XMPP is running for now.) Pi 4B, which, by the way, is literally where Parsee+XMPP is running for now.))
### "Why not just use Matrix lol" ### "Why not just use Matrix lol"
### "Why not just use XMPP lol" ### "Why not just use XMPP lol"
@ -26,7 +29,7 @@ a bridge may be a good way to start.
## BUILDING ## BUILDING
```sh ```sh
$ cc configure.c -o configure $ cc configure.c -o configure # that or use tcc -run to consolidate these two steps.
$ ./configure # use -s if you want static Parsee+MbedTLS, use -s -l if LMDB is needed $ ./configure # use -s if you want static Parsee+MbedTLS, use -s -l if LMDB is needed
$ make $ make
$ make [PREFIX=...] install # run as root if on a protected dir like /usr $ make [PREFIX=...] install # run as root if on a protected dir like /usr
@ -65,17 +68,21 @@ returns with a landing page, then this side works. You can read it for some more
## DOCS ## DOCS
Currently, the main sources of documentation are the Ayadocs(for headers) and the manpages Currently, the main sources of documentation are the Ayadocs(for headers) and the manpages
(see `etc/man`) (see `etc/man`).
## TODOS before 1.0 rolls around ## TODOS before 1.0 rolls around
- PROPER FUCKING AVATARS - Make Parsee go *vroooooooooommmmmmm*, by NOT asking the server constantly
about what is available and what is not, as that is a source of latency, and
thus slowdowns.
- PROPER FUCKING VCARD AVATARS
XMPP->Matrix is decent, Matrix->XMPP is effectively a WIP XMPP->Matrix is decent, Matrix->XMPP is effectively a WIP
- Add [libomemo](https://github.com/gkdr/libomemo) or something as an optional dependency. - Add [libomemo](https://github.com/gkdr/libomemo) or something as an optional dependency.
- It depends on more stuff anyways, and I don't want to weigh down the - It depends on more stuff anyways, and I don't want to weigh down the
dependency list of Parsee for that. dependency list of Parsee for that.
- Matrix's libolm is deprecated. They replaced it with a Rust version that - Matrix's libolm is deprecated. They replaced it with a Rust version that
pulls in *way too many* dependencies, and that lacks a C binding. We may pulls in *way too many* dependencies, and that lacks a C binding. We may
put in the work of either forking off libolm or making a binding to KappaChat. ~~put in the work of either forking off libolm or~~ be making a binding with
KappaChat(when I get around to remaking UI :p).
- Josh did in fact tell me that maybe C bindings may happen. I'd be - Josh did in fact tell me that maybe C bindings may happen. I'd be
willing to help out, but IDK. In any case, this will at best be an willing to help out, but IDK. In any case, this will at best be an
extension packagers may integrate properly. extension packagers may integrate properly.


@ -12,6 +12,7 @@
#include <signal.h> #include <signal.h>
#include <stdlib.h> #include <stdlib.h>
#include <Unistring.h>
#include <Parsee.h> #include <Parsee.h>
#include <XMPP.h> #include <XMPP.h>
#include <AS.h> #include <AS.h>
@ -71,6 +72,18 @@ Main(Array *args, HashMap *env)
start = UtilTsMillis(); start = UtilTsMillis();
{
Unistr *s = UnistrCreate("Array 日本語🌋");
size_t i;
for (i = 0; i < UnistrSize(s); i++)
{
uint32_t cp = UnistrGetch(s, i);
Log(LOG_INFO, "%X", cp);
}
UnistrFree(s);
}
memset(&conf, 0, sizeof(conf)); memset(&conf, 0, sizeof(conf));
Log(LOG_INFO, Log(LOG_INFO,
"%s - v%s[%s] (Cytoplasm %s)", "%s - v%s[%s] (Cytoplasm %s)",


@ -8,6 +8,7 @@
#include <stdlib.h> #include <stdlib.h>
#include <StanzaBuilder.h> #include <StanzaBuilder.h>
#include <Unistring.h>
#include <Matrix.h> #include <Matrix.h>
#include <AS.h> #include <AS.h>
@ -21,10 +22,17 @@ JoinMUC(ParseeData *data, HashMap *event, char *jid, char *muc, char *name)
{ {
char *sender = GrabString(event, 1, "sender"); char *sender = GrabString(event, 1, "sender");
char *nick = StrDuplicate(name); Unistr *uninick = UnistrCreate(name);
Unistr *filtered = UnistrFilter(uninick, UnistrIsBMP);
char *nick = UnistrC(filtered);
char *rev = StrConcat(3, muc, "/", nick); char *rev = StrConcat(3, muc, "/", nick);
int nonce = 0; int nonce = 0;
Log(LOG_DEBUG, "MUCJOINER: filtered '%s' to '%s'", name, nick);
UnistrFree(uninick);
UnistrFree(filtered);
while (!XMPPJoinMUC(data->jabber, jid, rev, true) && nonce < 32) while (!XMPPJoinMUC(data->jabber, jid, rev, true) && nonce < 32)
{ {
char *nonce_str = StrInt(nonce); char *nonce_str = StrInt(nonce);

src/Unistr.c Normal file

@ -0,0 +1,225 @@
#include <Unistring.h>
#include <Cytoplasm/Memory.h>
#include <Cytoplasm/Str.h>
#include <Cytoplasm/Log.h>
#include <string.h>
struct Unistr {
size_t length;
uint32_t *codepoints;
};
void
UnistrAddch(Unistr *unistr, uint32_t u)
{
if (!unistr || !u)
{
return;
}
unistr->length++;
unistr->codepoints = Realloc(
unistr->codepoints,
unistr->length * sizeof(*unistr->codepoints)
);
unistr->codepoints[unistr->length - 1] = u;
}
static bool
UTFIsN(char *off, size_t available, int n, uint8_t pc)
{
int i;
uint8_t *offu = (uint8_t *) off;
if ((available < n) || ((*offu >> (8-n-1)) != pc))
{
return false;
}
for (i = 0; i < n - 1; i++)
{
if ((offu[i+1] >> 6) != 0x02) /* continuation bytes are 10xxxxxx */
{
return false;
}
}
return true;
}
Unistr *
UnistrCreate(char *src)
{
size_t len, i;
Unistr *str;
char *start;
if (!src)
{
return NULL;
}
len = strlen(src);
str = Malloc(sizeof(*str));
str->length = 0;
str->codepoints = NULL;
/* We can't just set the length to {len}. */
for (i = 0; i < len; i++)
{
char byte = src[i];
size_t available = len - i;
if ((byte & 0x80) == 0)
{
/* This is a regular codepoint */
UnistrAddch(str, byte & 0x7F);
continue;
}
else if (UTFIsN(&src[i], available, 2, 0x06)) /* lead byte 110xxxxx */
{
char a = src[i+0] & 0x1F;
char b = src[i+1] & 0x3F;
uint32_t u = (a << (6 * 1)) | b;
/* Overlongs are errors. */
if (u < 0x0080 || u > 0x07FF)
{
UnistrFree(str);
return NULL;
}
UnistrAddch(str, u);
i += 2 - 1;
continue;
}
else if (UTFIsN(&src[i], available, 3, 0x0E)) /* lead byte 1110xxxx */
{
char a = src[i+0] & 0x0F;
char b = src[i+1] & 0x3F;
char c = src[i+2] & 0x3F;
uint32_t u =
(a << (6 * 2)) |
(b << (6 * 1)) |
(c << (6 * 0)) ;
/* Overlongs are errors. */
if (u < 0x0800 || u > 0xFFFF)
{
UnistrFree(str);
return NULL;
}
UnistrAddch(str, u);
i += 3 - 1;
continue;
}
else if (UTFIsN(&src[i], available, 4, 0x1E)) /* lead byte 11110xxx */
{
char a = src[i+0] & 0x07;
char b = src[i+1] & 0x3F;
char c = src[i+2] & 0x3F;
char d = src[i+3] & 0x3F;
uint32_t u =
(a << (6 * 3)) |
(b << (6 * 2)) |
(c << (6 * 1)) |
(d << (6 * 0)) ;
/* Overlongs are errors. */
if (u < 0x10000 || u > 0x10FFFF)
{
UnistrFree(str);
return NULL;
}
UnistrAddch(str, u);
i += 4 - 1;
continue;
}
}
return str;
}
void
UnistrFree(Unistr *unistr)
{
if (!unistr)
{
return;
}
Free(unistr->codepoints);
Free(unistr);
}
char *
UnistrC(Unistr *unistr)
{
char *ret, *tmp, *utf;
size_t i;
if (!unistr)
{
return NULL;
}
ret = NULL;
for (i = 0; i < unistr->length; i++)
{
uint32_t code = unistr->codepoints[i];
utf = StrUtf8Encode(code);
tmp = ret;
ret = StrConcat(2, ret, utf);
Free(tmp);
Free(utf);
}
return ret;
}
size_t
UnistrSize(Unistr *unistr)
{
return unistr ? unistr->length : 0;
}
uint32_t
UnistrGetch(Unistr *unistr, size_t i)
{
if (!unistr)
{
return 0;
}
return i < unistr->length ? unistr->codepoints[i] : 0;
}
bool
UnistrIsBMP(uint32_t u)
{
if (u == 0)
{
return false;
}
return u <= 0xFFFF;
}
Unistr *
UnistrFilter(Unistr *str, UnistrFilterFunc filter)
{
Unistr *unistr;
size_t i;
if (!str || !filter)
{
return NULL;
}
unistr = UnistrCreate("");
for (i = 0; i < UnistrSize(str); i++)
{
uint32_t code = UnistrGetch(str, i);
if (!filter(code))
{
continue;
}
UnistrAddch(unistr, code);
}
return unistr;
}


@ -356,14 +356,14 @@ IQGet(ParseeData *args, XMLElement *stanza, XMPPThread *thr)
} }
else if (XMLookForTKV(stanza, "vCard", "xmlns", "vcard-temp")) else if (XMLookForTKV(stanza, "vCard", "xmlns", "vcard-temp"))
{ {
Log(LOG_INFO, "vCard information GET for %s", to); char *to_matrix = ParseeGetBridgedUser(args, stanza);
char *name = ASGetName(args->config, NULL, to_matrix);
XMLElement *iqVCard;
Log(LOG_DEBUG, "vCard information GET for %s", to);
/* TODO: "a compliant server MUST respond on behalf of the
* requestor and not forward the IQ to the requestee's
* connected resource". */
if (!strncmp(to, "parsee@", 7)) if (!strncmp(to, "parsee@", 7))
{ {
XMLElement *iqVCard = XMLCreateTag("iq"); iqVCard = XMLCreateTag("iq");
XMLAddAttr(iqVCard, "from", to); XMLAddAttr(iqVCard, "from", to);
XMLAddAttr(iqVCard, "to", from); XMLAddAttr(iqVCard, "to", from);
XMLAddAttr(iqVCard, "id", id); XMLAddAttr(iqVCard, "id", id);
@ -394,7 +394,49 @@ IQGet(ParseeData *args, XMLElement *stanza, XMPPThread *thr)
StreamFlush(jabber->stream); StreamFlush(jabber->stream);
pthread_mutex_unlock(&jabber->write_lock); pthread_mutex_unlock(&jabber->write_lock);
XMLFreeElement(iqVCard); XMLFreeElement(iqVCard);
Free(to_matrix);
Free(name);
return;
} }
iqVCard = XMLCreateTag("iq");
XMLAddAttr(iqVCard, "from", to);
XMLAddAttr(iqVCard, "to", from);
XMLAddAttr(iqVCard, "id", id);
XMLAddAttr(iqVCard, "type", "result");
{
XMLElement *vCard = XMLCreateTag("vCard");
char *mto_link = ParseeGenerateMTO(to_matrix);
XMLAddAttr(vCard, "xmlns", "vcard-temp");
{
XMLElement *fn = CreateTagWithText(
"FN", name ? name : to_matrix
);
XMLElement *nick = CreateTagWithText(
"NICKNAME", to_matrix
);
XMLElement *url = CreateTagWithText(
"URL", mto_link
);
/* TODO: Maybe abstract the vCard code. */
/* TODO: Make a function to just get a user's avatar
* automatically. */
XMLAddChild(vCard, nick);
XMLAddChild(vCard, url);
XMLAddChild(vCard, fn);
Free(mto_link);
}
XMLAddChild(iqVCard, vCard);
}
pthread_mutex_lock(&jabber->write_lock);
XMLEncode(jabber->stream, iqVCard);
StreamFlush(jabber->stream);
pthread_mutex_unlock(&jabber->write_lock);
XMLFreeElement(iqVCard);
Free(to_matrix);
Free(name);
} }
#define PS "http://jabber.org/protocol/pubsub" #define PS "http://jabber.org/protocol/pubsub"
else if ((pubsub = XMLookForTKV(stanza, "pubsub", "xmlns", PS))) else if ((pubsub = XMLookForTKV(stanza, "pubsub", "xmlns", PS)))
@ -417,8 +459,7 @@ IQGet(ParseeData *args, XMLElement *stanza, XMPPThread *thr)
b64 = Base64Encode(buf, len); b64 = Base64Encode(buf, len);
Free(buf); Free(buf);
Log(LOG_INFO, "FM=%s", to_matrix); Log(LOG_DEBUG, "IQ-GET: PUBSUB AVATAR OF=%s", to_matrix);
Log(LOG_INFO, "B=%s (%dB)", b64, (int) len);
/* Strike back with a response */ /* Strike back with a response */
reply = XMLCreateTag("iq"); reply = XMLCreateTag("iq");
XMLAddAttr(reply, "type", "result"); XMLAddAttr(reply, "type", "result");

src/include/Unistring.h Normal file

@ -0,0 +1,67 @@
#ifndef PARSEE_UNISTRING_H
#define PARSEE_UNISTRING_H
/*-* <p>A basic data structure to handle Unicode strings easily.</p>
* <p>Mainly used because dealing with UTF-8 directly may be an
* annoyance, and it may be used as a base for Cytoplasm's own
* string management</p>
* --------
* Written-By: LDA
* License: CC0 */
#include <stdbool.h>
#include <stdlib.h>
#include <stdint.h>
/* An opaque structure for a Unistring */
typedef struct Unistr Unistr;
/** Decodes a UTF-8 string into a separate Unistr.
* -------
* Returns: a valid Unistr[HEAP] | NULL
* Thrasher: UnistrFree */
extern Unistr * UnistrCreate(char *src);
/** Returns the length of a unistring.
* ----------
* Returns: the unistring's length | 0 */
extern size_t UnistrSize(Unistr *unistr);
/** Returns the character of a unistring at a location,
* or 0 if it is inaccessible.
* ----------
* Returns: The Unicode codepoint of a specific 0-index | 0 */
extern uint32_t UnistrGetch(Unistr *unistr, size_t i);
/** Adds a singular codepoint to a unistring(IFF not 0 and valid).
* -------------
* Returns: NOTHING
* Modifies: unistr */
extern void UnistrAddch(Unistr *unistr, uint32_t u);
/** Encodes a unistring into a C UTF-8 string
* --------------
* Returns: a valid NULL-terminated string[HEAP] | NULL
* Thrasher: Free */
extern char * UnistrC(Unistr *unistr);
/** Destroys all memory associated with a unistring.
* ----------
* Returns: NOTHING
* Thrashes: {unistr} */
extern void UnistrFree(Unistr *unistr);
/** Returns true IFF the character is within the unicode BMP and
* not 0x0000
* ------------------------------------------------------------
* Returns: whether the character is within the BMP */
extern bool UnistrIsBMP(uint32_t u);
typedef bool (*UnistrFilterFunc)(uint32_t u);
/** "Filters" characters in a Unistring by codepoint, removing
* those with callbacks which return false into a new unistring.
* --------------------
* Returns: a new unistring with filtered characters removed */
extern Unistr * UnistrFilter(Unistr *str, UnistrFilterFunc filter);
#endif