An explanation of how Greasy Fork's code similarity check works

JasonBarnabe (Mod)

Posted: 15.7.2025

In case you're interested, this Computerphile video gives an overview of the logic Greasy Fork uses. I was a little surprised that the stupid trick I came up with is actually a real computer science thing.
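The post doesn't give the exact formula, so what follows is only a rough sketch of the general idea. It assumes the check behaves like normalized compression distance (NCD): compress each script on its own, compress the two concatenated, and see how much new information the second one adds. `csize`, `ncd` and `fake_script` are hypothetical names for illustration, not Greasy Fork's code, and the scripts are generated filler rather than real user scripts.

```python
import gzip
import random

def csize(data: bytes) -> int:
    """Size in bytes of the gzip-compressed data."""
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: close to 0 when y adds almost nothing
    new once x has been seen, close to 1 when x and y share little."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def fake_script(seed: int, n_lines: int = 300) -> bytes:
    """Deterministic, code-shaped filler (hypothetical) so no real scripts are needed."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        name = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        lines.append(f"let {name} = fn{rng.randint(0, 9999)}({rng.randint(0, 9999)});")
    return "\n".join(lines).encode()

original = fake_script(seed=1)
# A lightly edited copy (first 5 "let " become "var ") and an unrelated script.
tweaked = original.replace(b"let ", b"var ", 5)
unrelated = fake_script(seed=2)

print(f"copy:      {ncd(original, tweaked):.2f}")
print(f"unrelated: {ncd(original, unrelated):.2f}")
```

Lower means more similar. Here `original + tweaked` is well under 32 KB, so gzip can still "see" the first copy while compressing the second, which is what the window question further down is about.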

Azazello

Posted: 16.7.2025

Interesting. Thanks for sharing.

A very simplistic question about the "look-behind" limit...

Won't script trolls and grifters (and aren't they already?) get around the check by consistently making their scripts larger than the 32 KB (KiB) limit? I imagine they could do this just by adding more and more boilerplate comments.

I'm sure you've seen this kind of growth over the past two years, with the increased use of 'AI' tools to create scripts.

I assume there must be a practical reason why 'simply' increasing the size of the compression window is problematic.

JasonBarnabe (Mod)

Posted: 16.7.2025

The 32 KB window is built into gzip (DEFLATE); it's a hard limit that can't be raised. I've given some thought to switching the whole thing to brotli instead, since brotli has a configurable window of up to 16 MB. I'm not sure of the performance implications of doing this, or in what way it would skew the existing scores (just from switching to brotli, even with the same window size).
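The window limit is easy to see directly. The sketch below puts a block of hard-to-compress padding between a script and a copy of it, and measures how many extra compressed bytes the copy costs. Python's standard-library `lzma` (whose default preset uses an 8 MiB dictionary) stands in for a large-window compressor here, because brotli is not in the standard library; `noise` and `copy_cost` are hypothetical helpers.

```python
import gzip
import lzma
import random

def noise(seed: int, size: int) -> bytes:
    """Deterministic, hard-to-compress filler text."""
    rng = random.Random(seed)
    alphabet = b"abcdefghijklmnopqrstuvwxyz0123456789 ;(){}=\n"
    return bytes(rng.choice(alphabet) for _ in range(size))

def copy_cost(compress, script: bytes, padding: bytes) -> int:
    """Extra compressed bytes from appending a second copy of script after
    padding. Small means the compressor could still reach the first copy."""
    without_copy = len(compress(script + padding))
    with_copy = len(compress(script + padding + script))
    return with_copy - without_copy

script = noise(seed=1, size=4_000)
for pad_size in (1_000, 40_000):  # look-back needed: ~5 KB vs ~44 KB
    padding = noise(seed=2, size=pad_size)
    print(pad_size,
          "gzip:", copy_cost(gzip.compress, script, padding),
          "lzma:", copy_cost(lzma.compress, script, padding))
```

With the small padding both compressors encode the copy almost for free. With 40 KB of padding the first copy is out of gzip's reach, so the copy costs roughly as much as the script itself, while `lzma` still finds it.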

You can't evade the current system by adding lots of comments. Two checks are performed on each comparison: one on the code as submitted, and one on the "cleaned" code, which has been run through terser and prettier. This process removes comments. You could, however, add non-functional code, or simply large blocks of CSS or data URI images. I don't think I've seen scammers do this specifically to get past this automated check yet. They potentially could, but doing so might raise other alarms, whether in a manual check or in a different automated check.
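A rough sketch of why the comment trick fails when both versions are scored. Several things here are assumptions: `strip_comments` is a crude regex stand-in for terser + prettier (not a real JavaScript parser), the score is a 1 minus NCD measure, and reporting the higher of the two results is a guess at how the checks are combined. All names are hypothetical.

```python
import gzip
import random
import re

def csize(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))

def similarity(a: str, b: str) -> float:
    """1 - NCD: near 1 for near-identical inputs, near 0 for unrelated ones."""
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return 1 - (cab - min(ca, cb)) / max(ca, cb)

def strip_comments(js: str) -> str:
    """Crude stand-in for terser + prettier: drop /* */ and // comments and
    blank lines. It would mangle '//' inside strings or regex literals."""
    js = re.sub(r"/\*.*?\*/", "", js, flags=re.DOTALL)
    js = re.sub(r"//[^\n]*", "", js)
    return "\n".join(line.strip() for line in js.splitlines() if line.strip())

def check(original: str, candidate: str) -> tuple[float, float, float]:
    """Score raw and cleaned code; report the higher one (assumed policy)."""
    raw = similarity(original, candidate)
    cleaned = similarity(strip_comments(original), strip_comments(candidate))
    return raw, cleaned, max(raw, cleaned)

rng = random.Random(7)
letters = "abcdefghijklmnopqrstuvwxyz"
original = "\n".join(
    f"let {''.join(rng.choice(letters) for _ in range(8))} = "
    f"fn{rng.randint(0, 9999)}({rng.randint(0, 9999)});"
    for _ in range(300)
)
# ~40 KB of junk comments in front pushes the copied code out of gzip's window.
padding = "\n".join(
    "// " + "".join(rng.choice(letters + " ") for _ in range(77)) for _ in range(500)
)
candidate = padding + "\n" + original

raw, cleaned, score = check(original, candidate)
print(f"raw={raw:.2f} cleaned={cleaned:.2f} score={score:.2f}")
```

The raw score collapses because the padding exceeds the 32 KB window, but once comments are stripped the two scripts are identical again. Padding with non-functional code, CSS or data URIs would survive this kind of cleaning, which matches the caveat above.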