Anon 12/08/2023 (Fri) 21:30 No.8980
File: 1426109984058.png (167.39 KB, 1024x1122)
>>8979
Done
Bash implementation for: 4chan rendered thread webpage ctrl+a, ctrl+c, ctrl+v -> TXT file -> extractor -> text files per-post [1]
Done, mostly
Bash implementation for: Desuarchive thread webpage source code -> HTML file -> extractor -> HTML files per-post. Proof/example (remote pin=w3s): https://dweb.link/ipfs/bafybeifiub4ovlskqwosrrnj6r7ofwgbzhfi34kv2pso3j2wrfqtczdrum/desuarchive/mlp/thread/20278564/ [2]
Both methods can take a minute or more to process a thread.
[1] see https://ipfs.filebase.io/ipfs/Qmb7pn6qDfb75QZx65W26JPiffZe5rGjJgpCC4R4hV6F4Y/4chan/mlp/thread/40219665/multiple_how.txt
[2] via
>$ curl -sL https://desuarchive.org/mlp/thread/20278564/ > 20278564.htm
>$ cat 20278564.htm | perl -pE "s/<div class=\"post stub stub_doc_id_/\n<div class=\"post stub stub_doc_id_/g" | perl -pE "s/^<aside class=\"posts\">\n//g" | tail -n +191 | head -n199919991999 | head -n -203 | sed "s/ <\/aside>//g" | xxd -p | tr -d \\n | sed "s/../%&/g" | perl -pE "s/%0a/\n/g" | xargs -d "\n" sh -c 'for args do id=$(echo $args | sed "s/.*%22%20%69%64%3d%22//g" | sed "s/%22.*//g" | sed "s/%//g" | xxd -p -r -); echo $args | sed "s/%//g" | xxd -p -r - > $id.html; done' _; rm .html
>$ # this partly helped: https://code.whatever.social/questions/296536/how-to-urlencode-data-for-curl-command
# 337 posts incl. OP, while ls | wc -l = 339; the difference of 2 is the how file and the complete thread file
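For anyone trying to follow the one-liner in [2]: its core trick is to percent-encode every byte of the page, so each post becomes one token with no quotes or spaces for xargs and sh to choke on. The post id is then cut out while still encoded, and everything is decoded back at the end. Below is a minimal sketch of just that part on a made-up sample post. The function names (pct_encode, pct_decode, post_id) and the sample string are mine, not from the original command, and it assumes xxd is installed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the sample post are hypothetical.

# Percent-encode every byte of stdin, e.g. "AB" -> "%41%42".
pct_encode() {
  xxd -p | tr -d '\n' | sed 's/../%&/g'
}

# Reverse step: drop the '%' signs and turn the hex back into bytes.
pct_decode() {
  sed 's/%//g' | xxd -r -p
}

# Extract the post id from an encoded post: the text between
# '" id="' (%22%20%69%64%3d%22) and the next '"' (%22).
post_id() {
  sed 's/.*%22%20%69%64%3d%22//' | sed 's/%22.*//' | pct_decode
}

sample='<div class="post" id="20278564">hello</div>'
encoded=$(printf '%s' "$sample" | pct_encode)

printf '%s\n' "$encoded" | post_id; echo      # prints 20278564
printf '%s\n' "$encoded" | pct_decode; echo   # prints the sample back unchanged
```

Because every byte gets its own % prefix, the sed patterns can only match on byte boundaries, so the id match cannot start halfway through a hex pair.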