ChaoticNeutralCzech

joined 5 months ago
[–] ChaoticNeutralCzech 37 points 1 month ago (4 children)

Weird Al actually licenses the songs he parodies because relying on fair use is thin ice.

[–] ChaoticNeutralCzech 3 points 1 month ago* (last edited 1 month ago) (2 children)

I know that similar computational problems use indexing and vector-space representation, but how would you build an index of TiBs of almost-random data that makes it faster to find the strictly closest match of an arbitrarily long sequence? I can think of some heuristics, such as bitmapping every occurrence of any 8-pair sequence across each kibibit in the list. A query search would then add the bitmaps of all 8-pair sequences within the query, including ones with up to 2 errors, and use the resulting map to find "hotspots" to be checked with brute force. This would decrease the computation and storage access per query but drastically increase the storage size, which is already hard to manage.
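To make the heuristic above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `build_index`, `neighbours` and `hotspots` are made up, "errors" are treated as substitutions only, Python integers stand in for the bitmaps, and a block of 512 bases is taken to be one kibibit at 2 bits per base. It is nowhere near something you could run on TiBs of data.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

K = 8        # k-mer length (the "8-pair sequence")
BLOCK = 512  # bases per block; 512 bases x 2 bits = 1 kibibit (assumed packing)

def build_index(genome: str) -> dict[str, int]:
    """Map each k-mer to an int used as a bitmap: bit b is set if the k-mer starts in block b."""
    index = defaultdict(int)
    for i in range(len(genome) - K + 1):
        index[genome[i:i + K]] |= 1 << (i // BLOCK)
    return index

def neighbours(kmer: str, max_errors: int) -> set[str]:
    """Return every k-mer within max_errors substitutions of kmer, including kmer itself."""
    found = set()
    for n in range(max_errors + 1):
        for positions in combinations(range(len(kmer)), n):
            for letters in product("ACGT", repeat=n):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                found.add("".join(chars))
    return found

def hotspots(index: dict[str, int], query: str, max_errors: int = 2, top: int = 3) -> list[int]:
    """Add up the bitmaps of all query k-mers (and their near misses); return the best blocks."""
    votes = Counter()
    for i in range(len(query) - K + 1):
        bitmap = 0
        for variant in neighbours(query[i:i + K], max_errors):
            bitmap |= index.get(variant, 0)
        block = 0
        while bitmap:  # one vote per block in which this k-mer (or a variant) occurs
            if bitmap & 1:
                votes[block] += 1
            bitmap >>= 1
            block += 1
    return [block for block, _ in votes.most_common(top)]
```

The blocks returned by `hotspots` are the ones you would then check with brute force. Note that allowing 2 substitutions turns each 8-mer into 1 + 8×3 + 28×9 = 277 variants out of 65,536 possible 8-mers, so with blocks this large most blocks match most k-mers and the votes get noisy.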

However, efficient fuzzy string matching in giant datasets is an interesting problem that computer scientists must have encountered before. Can you find a good paper on a method that works well with random, non-delimited data, instead of just using word-based indices for human languages like Lucene and OpenFTS do?

[–] ChaoticNeutralCzech 5 points 1 month ago* (last edited 1 month ago)

What kind of image is it? Reducing the number of colors in a PNG is usually inferior to JPEG compression. It can be OK for screenshots of text and simple drawings, but otherwise you're better off with lossy JPEG or WebP.

Some instance admins don't even use the built-in pict-rs server so media cannot be uploaded natively at all.

Some only allow month-old accounts to post images, up to 500 KiB.

Others leave the limits at the relatively high defaults: 10 MiB per file and up to 900 frames for soundless animation/video, which must be in WebM format with the VP9 codec (otherwise it needs to be re-encoded, which usually fails because of the short timeout). It's easy to use ffmpeg or HandBrake to create low-bitrate 30-second HD videos that fit, but the limits are not visible to users, and even the defaults are nowhere to be found in the Lemmy documentation; I had to read the source code.
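As a rough sketch of what "fits" means with those defaults, the Python below works out the bitrate budget and builds an ffmpeg command line for it. The helper names, the 30 fps frame rate and the 10% headroom are my assumptions, not anything Lemmy or pict-rs specifies; the ffmpeg options themselves (`-t`, `-an`, `-c:v libvpx-vp9`, `-b:v`) are standard.

```python
def max_bitrate(limit_bytes: int = 10 * 2**20, frames: int = 900, fps: float = 30.0) -> float:
    """Highest average bitrate in bits/s that keeps the whole clip under limit_bytes."""
    seconds = frames / fps  # 900 frames at an assumed 30 fps is 30 s
    return limit_bytes * 8 / seconds

def ffmpeg_args(src: str, dst: str, bitrate_bps: float, seconds: float) -> list[str]:
    """Build an ffmpeg command for a soundless VP9 clip (dst should end in .webm)."""
    return [
        "ffmpeg", "-i", src,
        "-t", str(seconds),            # cut the output to the allowed duration
        "-an",                         # drop the audio track
        "-c:v", "libvpx-vp9",          # VP9 video
        "-b:v", str(int(bitrate_bps)), # target average video bitrate in bits/s
        dst,
    ]

# Leave some headroom for container overhead (the 0.9 factor is a guess).
command = ffmpeg_args("input.mp4", "output.webm", max_bitrate() * 0.9, 30)
```

With the defaults that is a budget of roughly 2.8 Mbit/s, and `command` can be passed to `subprocess.run` if ffmpeg is installed.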

[–] ChaoticNeutralCzech 108 points 1 month ago* (last edited 1 month ago) (19 children)

A bot strips away all spaces and letters that aren't A, T, C or G, then treats the rest like a genetic sequence and checks it against some database.
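The filtering step is presumably something like this one-liner. The function name is made up, and upper-casing the comment first is a guess about how the bot treats lowercase letters:

```python
def to_bases(comment: str) -> str:
    """Keep only A, T, C and G; upper-casing first is an assumption about the bot."""
    return "".join(ch for ch in comment.upper() if ch in "ATCG")
```

Whatever is left is what gets treated as the "genetic sequence".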

Presumably, it runs through many terabytes of data for each comment, as the Gallinula chloropus alone has about 51 billion base pairs, or some 15 GiB. The Genome Ark DB, which has sequences of two common moorhens, contains over 1 PiB. I wonder if a bored sequencing lab employee just wrote it to give their database and computing servers something to do when there is no task running.

No, I won't download the genome and check how close the "closest match" is, but statistically, a given 93-base sequence (186 bits at 2 bits per base) is expected to recur once every 2^186^ bits, or once per 10^40^ PiB. Evaluating 3^m^ × C(93, m) ≥ 4^93^ ÷ (2^50^ × 8), where 3^m^ × C(93, m) counts the sequences that differ from the query in exactly m positions, shows that the 93-base sequence can be expected to appear at least once in a 1 PiB database only if m ≥ 32 mismatches, or over ⅓, are allowed. Not great.

This assumes true randomness, which holds neither for naturally occurring DNA nor for letters in English text, but it should be in the right ballpark. Maybe fewer mismatches if you account for insertions/deletions.
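The estimate can be checked with exact integer arithmetic. This sketch follows the same simplifications as the comment (uniformly random data, substitutions only, exactly m mismatches, and a 1 PiB database counted as 2^50 × 8 candidate positions); the names are made up for the example.

```python
from math import comb

QUERY_LEN = 93        # bases in the query
DB_BITS = 2**50 * 8   # 1 PiB expressed in bits

def variants(m: int) -> int:
    """Number of sequences differing from the query in exactly m of its 93 positions."""
    return 3**m * comb(QUERY_LEN, m)

def min_mismatches() -> int:
    """Smallest m for which at least one match is expected somewhere in the database."""
    needed = 4**QUERY_LEN // DB_BITS  # 2**186 / 2**53 = 2**133, about 1.1e40
    for m in range(QUERY_LEN + 1):
        if variants(m) >= needed:
            return m
    raise ValueError("no threshold found")

print(min_mismatches())
```

Summing over "at most m" mismatches instead of "exactly m" barely moves the result, because the largest term dominates the sum.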

[–] ChaoticNeutralCzech 11 points 1 month ago* (last edited 1 month ago) (1 children)

Two islands, divide them by sex. If you don't, they will eventually overpopulate and start colonizing places like they've been doing for the last 1000 years.

[–] ChaoticNeutralCzech 10 points 1 month ago* (last edited 1 month ago)

Shift + right-click to force the browser's native context menu instead of triggering a JavaScript event.

Ctrl + Shift + E (and then perhaps Ctrl + F5) to see URLs of resources.

[–] ChaoticNeutralCzech 3 points 1 month ago (1 children)

You're going to hate wojak comics

[–] ChaoticNeutralCzech 5 points 1 month ago* (last edited 1 month ago) (1 children)

I don't see trigger zones for the chair and the doorway. Are the other room and the full-res view out of the window in memory at the same time?

[–] ChaoticNeutralCzech 2 points 1 month ago

Engine back guarantee. We'll land within 12 inches or you get the engines back!

[–] ChaoticNeutralCzech 1 points 1 month ago* (last edited 1 month ago)
BIN 00011001
OCT 31 <- C   I   C   D   N   E
DEC 25 <-   O   N   I   E   C   ?
HEX 19
[–] ChaoticNeutralCzech 2 points 1 month ago

Found the 14yo
