Permabooru Anon 07/09/2024 (Tue) 06:55 No.10645
>>10644
http://owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/posts/1/series:my%20little%20pony:%20friendship%20is%20magic
Only 349,918 images, and I'm guessing that is about 300 GB. Last page as of now:
http://owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/posts/14580/series:my%20little%20pony:%20friendship%20is%20magic
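Quick sanity check on those numbers, as a rough sketch. The 24-per-page figure is my assumption (it is what 349,918 images over 14,580 pages works out to), and the 300 GB is only the guess from above, taken here as 300 * 10^9 bytes:

```shell
images=349918
pages=14580
per_page=24                  # assumption: fixed page size, inferred from the two numbers above
guess_bytes=300000000000     # the 300 GB guess, taken as 300 * 10^9 bytes

calc_pages=$(( (images + per_page - 1) / per_page ))   # ceiling division
last_page=$(( images - (pages - 1) * per_page ))       # posts left over for the final page
avg_kb=$(( guess_bytes / images / 1000 ))              # average size per image in kB

echo "pages at $per_page per page: $calc_pages"
echo "posts on the last page: $last_page"
echo "average size if the guess holds: ~$avg_kb kB"
```

If the page count printed here ever stops matching the real last page number, the page size assumption is wrong or the image count has moved.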

About http://owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/posts/2/my_little_pony
. links to http://owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/post/166768/bafybeif6izzainbeoxekd73mn3sgzpyq7xniutr6arwpcnp22gx4lc22se
.. artist is marked as "Do Not Support" due to paywall(s): http://owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/dns/7
.. odd CID: http://xbzszf4a4z46wjac7pgbheizjgvwaf3aydtjxg7vsn3onhlot6sppfad.onion/ipfs/zb2rhY77vBcF3gVzsZDtY1SfitgQQ4RYCCry1rwcPduQusoSH
... imported data from http...?format=car = "bafkreiavxpqylcfik7wyfkfr42kmnud2oeojqqceozmsedbrl6qym43zui"
.. not seen as ipfs-online: http://xbzszf4a4z46wjac7pgbheizjgvwaf3aydtjxg7vsn3onhlot6sppfad.onion/ipfs/bafybeif6izzainbeoxekd73mn3sgzpyq7xniutr6arwpcnp22gx4lc22se
... based on not showing up after 3m30s of running "ipfs dag export $cid>/dev/null"
... IPLD/chunks: 262144-byte raw blocks (256K bafkrei...)
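The "wait 3m30s and see" check could be wrapped up roughly like this. It is only a sketch: it assumes GNU coreutils `timeout` is installed, and the function name `cid_online` is made up for illustration. Note that a failed export inside the time limit also counts as "not seen".

```shell
# cid_online CID [SECONDS]
# Returns 0 if the whole DAG under CID could be exported within the limit,
# non-zero otherwise (timeout(1) itself exits with 124 when the limit is hit).
cid_online() {
    cid=$1
    limit=${2:-210}    # 210 s = the 3m30s wait used above
    timeout "$limit" ipfs dag export "$cid" >/dev/null 2>&1
}

# Example call (not run here):
#   cid_online "$cid" && echo "online" || echo "not seen"
```

This only says whether the local node could fetch every block in time, so a slow provider looks the same as no provider.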

WARC (not spanning hosts):
$ seq 1 14580 | sed "s/^/http:\/\/owmvhpxyisu6fgd7r2fcswga\
vs7jly4znldaey33utadwmgbbp4pysad.onion\/posts\//g" | sed "s/$/\/\
series:my%20little%20pony:%20friendship%20is%20magic/g" > 14580pp.txt
$ TZ=UTC torsocks wget --input-file=14580pp.txt --level=1 \
--adjust-extension --convert-links --restrict-file-names=windows \
--warc-max-size=99123456 --warc-cdx -e robots=off --warc-file=owm\
vhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion \
1>1wget1.txt 2>1wget2.txt
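The URL list can also be built without escaping every slash in sed. A minimal sketch; `gen_page_urls` is a made-up helper name, and it passes the tag as a printf argument so the %20 in it is not treated as a format:

```shell
# gen_page_urls HOST TAG LAST: print one listing URL per page, 1..LAST
gen_page_urls() {
    host=$1
    tag=$2
    last=$3
    i=1
    while [ "$i" -le "$last" ]; do
        printf 'http://%s/posts/%s/%s\n' "$host" "$i" "$tag"
        i=$((i + 1))
    done
}

# Example call (not run here), same output file as above:
#   gen_page_urls owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion \
#       'series:my%20little%20pony:%20friendship%20is%20magic' 14580 > 14580pp.txt
```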
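With level=1 or level=2, I think wget downloads the pages/files in the input-file first before recursively downloading any outlinks (as far as I can tell, --level only applies together with -r/--recursive, which isn't passed above, so only the listed pages get fetched).
Next: get the CIDs out of the saved pages; run '$ cat series* | grep "<a h r e f=\"/post/"' and go from there. This, but headless: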
$ cat series* | grep "<a h r e f=\"/post/" | head | sed "s/.*\///g"\
 | sed "s/\".*//g" | xargs -d "\n" sh -c 'for h do torsocks curl -sL \
http://xbzszf4a4z46wjac7pgbheizjgvwaf3aydtjxg7vsn3onhlot6sppfad.onion\
/ipfs/$h?format=car | ipfs dag import --stats --pin-roots=false; ipfs\
 files cp /ipfs/$h /a/tmp/owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33uta\
dwmgbbp4pysad.onion/$h; done' _ # could check if exists before
couldn't post "h r e f" without spaces above; images from permabooru (potpony isn't my fetish)
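For the "could check if exists before" part, something like this might do. It is a sketch under assumptions: post links are taken to look like /post/<number>/<CID> (as in the link quoted earlier), the attribute name is written normally here instead of spaced out, and `extract_cids` / `in_mfs` are made-up helper names.

```shell
# extract_cids: read saved listing HTML on stdin, print each linked CID once.
extract_cids() {
    grep -o 'href="/post/[0-9]*/[A-Za-z0-9]*"' |
        sed 's|.*/||; s|"$||' |
        sort -u
}

# in_mfs PATH: succeeds if PATH already exists in the local MFS.
in_mfs() {
    ipfs files stat "$1" >/dev/null 2>&1
}

# Example loop (not run here); <host> stands for the onion hostname directory:
#   cat series* | extract_cids | while read -r h; do
#       in_mfs "/a/tmp/<host>/$h" && continue    # already copied, skip the download
#       # ...fetch ?format=car, import, and ipfs files cp as in the one-liner above...
#   done
```

The sort -u also drops CIDs that show up on more than one listing page, so nothing gets fetched twice in one run.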