$ seq 1 14580 | sed "s/^/http:\/\/owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion\/posts\//g" \
    | sed "s/$/\/series:my%20little%20pony:%20friendship%20is%20magic/g" > 14580pp.txt
$ TZ=UTC torsocks wget --input-file=14580pp.txt --level=1 \
    --adjust-extension --convert-links --restrict-file-names=windows \
    --warc-max-size=99123456 --warc-cdx -e robots=off \
    --warc-file=owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion \
    1>1wget1.txt 2>1wget2.txt

With --level=1 or --level=2, I think wget downloads the pages/files listed in the input file first, before recursively downloading any outlinks. Next: get the CIDs out of the saved pages; start from '$ cat series* | grep "<a href=\"/post/"' and build the rest of the pipeline from there. This, but headless:
$ cat series* | grep "<a href=\"/post/" | head | sed "s/.*\///g" \
    | sed "s/\".*//g" | xargs -d "\n" sh -c 'for h do
        torsocks curl -sL http://xbzszf4a4z46wjac7pgbheizjgvwaf3aydtjxg7vsn3onhlot6sppfad.onion/ipfs/$h?format=car \
            | ipfs dag import --stats --pin-roots=false
        ipfs files cp /ipfs/$h /a/tmp/owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion/$h
    done' _  # could check if the CID already exists locally before fetching

Images are from permabooru (potpony isn't my fetish).
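The grep/sed extraction above can be sanity-checked offline against a fabricated anchor line. The `<a href="/post/CID">` markup and the `QmExampleCID123` value are placeholders for whatever the saved pages actually contain; the point is just that the two seds keep the part between the last slash and the closing quote:

```shell
# Offline dry run of the extraction pipeline: feed it one fake anchor
# line and confirm only the CID survives. Note the first sed is greedy
# (strips up to the LAST slash), so this only works when nothing after
# the href contains a slash on the same line.
printf '<a href="/post/QmExampleCID123">\n' \
    | grep '<a href="/post/' \
    | sed "s/.*\///g" \
    | sed "s/\".*//g"
# prints: QmExampleCID123
```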
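The "check if exists before" idea could look like the sketch below. `have_cid` is a hypothetical stub standing in for something like `ipfs files stat "$MFS_DIR/$h" >/dev/null 2>&1` (checking the same MFS path the `ipfs files cp` writes to), so the skip/fetch control flow can be exercised without a daemon or Tor:

```shell
# Dry-run sketch of the pre-check: skip CIDs already copied into MFS.
# have_cid is a stub; in the real loop it would be
#   ipfs files stat "$MFS_DIR/$h" >/dev/null 2>&1
# with MFS_DIR=/a/tmp/owmvhpxyisu6fgd7r2fcswgavs7jly4znldaey33utadwmgbbp4pysad.onion
have_cid() { [ "$1" = "QmAlreadyImported" ]; }
for h in QmAlreadyImported QmNewCID; do
    if have_cid "$h"; then
        echo "skip $h"      # already imported on a previous run
    else
        echo "fetch $h"     # here: torsocks curl ... | ipfs dag import; ipfs files cp
    fi
done
# prints:
# skip QmAlreadyImported
# fetch QmNewCID
```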