Anon 12/08/2023 (Fri) 21:30 No.8980
(167.39 KB 1024x1122 1426109984058.png)
>>8979
Done
Bash implementation for: rendered 4chan thread webpage (Ctrl+A, Ctrl+C, Ctrl+V) -> TXT file -> extractor -> one text file per post
[1]
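The multiple_how.txt in [1] isn't reproduced here, so this is only a rough sketch of the same kind of split, not the actual script. It assumes every post in the pasted TXT starts on a header line containing "No." plus the post number. The function name split_pasted_thread, the posts_txt output dir and the sample lines are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split a pasted 4chan thread (plain text) into one file per post.
# ASSUMPTION: every post starts on a line containing "No.<digits>" (the post header);
# all following lines up to the next header belong to that post.
set -euo pipefail

split_pasted_thread() {
  local input=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    # A header line starts a new post; name the file after the post number.
    match($0, /No\.[0-9]+/) {
      if (out != "") close(out)
      out = dir "/" substr($0, RSTART + 3, RLENGTH - 3) ".txt"
    }
    # Lines before the first header (page chrome) are skipped.
    out != "" { print > out }
  ' "$input"
}

# Demo with a tiny made-up sample (not real thread data).
cat > sample_thread.txt <<'EOF'
Board navigation junk
Anonymous 12/08/23(Fri)21:30:00 No.101
first post text
Anonymous 12/08/23(Fri)21:31:00 No.102
>>101
a reply
EOF

split_pasted_thread sample_thread.txt posts_txt
ls posts_txt
```

Lines before the first header get dropped. Any post body that happens to contain something like No.123 would wrongly start a new file, which is the main weakness of keying on that pattern.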

Done, mostly
Bash implementation for: Desuarchive thread webpage source code -> HTML file -> extractor -> one HTML file per post. Proof/example (remote pin=w3s):
https://dweb.link/ipfs/bafybeifiub4ovlskqwosrrnj6r7ofwgbzhfi34kv2pso3j2wrfqtczdrum/desuarchive/mlp/thread/20278564/
[2]

Both methods can take a minute or more to process a thread.

[1] see https://ipfs.filebase.io/ipfs/Qmb7pn6qDfb75QZx65W26JPiffZe5rGjJgpCC4R4hV6F4Y/4chan/mlp/thread/40219665/multiple_how.txt
[2] via
>$ curl -sL https://desuarchive.org/mlp/thread/20278564/ > 20278564.htm
>$ cat 20278564.htm | perl -pE "s/<div class=\"post stub stub_doc_id_/\n<div class=\"post stub stub_doc_id_/g" | perl -pE "s/^<aside class=\"posts\">\n//g" | tail -n +191 | head -n199919991999 | head -n -203 | sed "s/ <\/aside>//g" | xxd -p | tr -d \\n | sed "s/../%&/g" | perl -pE "s/%0a/\n/g" | xargs -d "\n" sh -c 'for args do id=$(echo $args | sed "s/.*%22%20%69%64%3d%22//g" | sed "s/%22.*//g" | sed "s/%//g" | xxd -p -r -); echo $args | sed "s/%//g" | xxd -p -r - > $id.html; done' _; rm .html
>$ # this partly helped: https://code.whatever.social/questions/296536/how-to-urlencode-data-for-curl-command # 337 posts incl. OP; ls | wc -l = 339, the difference of 2 being the how file and the complete thread file
