Anon 12/08/2023 (Fri) 01:51 No.8970
(59.38 KB 703x704 channels4_profile.jpg)
In folder "0folder" I have multiple folders named after Google Drive folder IDs. Each should have a corresponding rclone-created "[folder's name].json" file. I'm getting the file IDs out of the JSONs, then downloading WBM CDX files to see what has and hasn't been saved to wbm:
>...[ls | tail -n +20 | head -n20 ... ls | tail -n +30 | head -n20 ...]
>...:/z8/put/gd/from_desumlp_drive.google.com_search/0folder$ h=12cEh_UXq5RUUEjGPcT_RG0wL-dtlVx53
>...:/z8/put/gd/from_desumlp_drive.google.com_search/0folder$ utc; mkdir ${h}_files_cdx; utc; cd ${h}_files_cdx; pwd; cat ../$h.json | grep -v '"IsDir":true,' | jq -r '.[].ID' > 0_fid_1.txt; cat 0_fid_1.txt | xargs -d "\n" sh -c 'for args do echo $args; torsocks curl -L "https://web.archive.org/web/timemap/cdx?url=https://drive.google.com/uc?id=$args" > $args.cdx; done' _; echo "=="; find . -type f -empty | sed "s/\.cdx//g" | sed "s/^..//g" | xargs -d "\n" sh -c 'for args do echo $args; torsocks curl -L "https://web.archive.org/web/timemap/cdx?url=https://drive.google.com/uc?id=$args" > $args.cdx; done' _; find . -type f -empty >> ../0_queue_id.txt; cd ..
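
The one-liner above is easier to follow split into steps. This is only a rough sketch of the same idea, not the exact command that was run. The function names (list_file_ids, cdx_url, fetch_cdx, empty_cdx_ids) are made up for illustration, and it assumes the JSON is an rclone lsjson-style array whose entries have "IsDir" and "ID" fields.

```shell
#!/bin/sh
# Sketch only: all helper names here are hypothetical.

# Print the IDs of non-directory entries from an rclone lsjson-style file.
list_file_ids() {
    jq -r '.[] | select(.IsDir | not) | .ID' "$1"
}

# Build the WBM CDX timemap URL for one Google Drive file ID.
cdx_url() {
    printf 'https://web.archive.org/web/timemap/cdx?url=https://drive.google.com/uc?id=%s\n' "$1"
}

# Read IDs from stdin and save each CDX listing to <ID>.cdx in the current directory.
fetch_cdx() {
    while IFS= read -r id; do
        echo "$id"
        torsocks curl -L "$(cdx_url "$id")" > "$id.cdx"
    done
}

# Print the IDs whose .cdx file under directory $1 is empty (nothing found in wbm).
empty_cdx_ids() {
    find "$1" -type f -name '*.cdx' -empty | sed 's|.*/||; s|\.cdx$||'
}
```

Inside ${h}_files_cdx this could be run as: list_file_ids "../$h.json" > 0_fid_1.txt; fetch_cdx < 0_fid_1.txt; empty_cdx_ids . | fetch_cdx; empty_cdx_ids . >> ../0_queue_id.txt. Note that this writes bare IDs to the queue, whereas the original one-liner appends find's "./<ID>.cdx" paths.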

IDs in "0_queue_id.txt" (not found in wbm) get put into a file "$(date +%s)goodrive.txt" as https://drive.google.com/uc?id=<ID> URLs, then those URLs are submitted to wbm to be saved.
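
A minimal sketch of turning the queue into that URL list, assuming the queue lines are either bare IDs or find-style "./<ID>.cdx" paths. queue_to_urls is a made-up helper name, not part of the spn scripts.

```shell
#!/bin/sh
# Sketch only: queue_to_urls is a hypothetical helper.

# Strip a leading "./" and trailing ".cdx", drop blank lines,
# then prefix each remaining ID with the uc?id= URL.
queue_to_urls() {
    sed 's|^\./||; s|\.cdx$||; /^$/d; s|^|https://drive.google.com/uc?id=|' "$1"
}
```

For example: queue_to_urls 0_queue_id.txt > "$(date +%s)goodrive.txt". The resulting file is what gets handed to spn.sh: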
>.../overcast07-wayback-machine-spn-scripts-31d6bf1$ ./spn.sh -a api:key -d if_not_archived_within=157680000 -s -n -f . 1701996727goodrive.txt 1>6log1.txt 2>6log2.txt & disown
"-d if_not_archived_within=157680000" = don't save to wbm if a capture already exists from the past 5 years (157680000 seconds, i.e. 5 x 365 days). Won't get saved, for now: ">2GB" files, "too large/virus scan" files, and Google Docs files in Google Drive (those don't work at https://drive.google.com/uc?id=).

There was previously a Google Drive folder with terabytes of WARCs of MLP-related websites. I have basically all of the metadata of that folder, so I have the IDs of all files and folders that were in there, recursively. That paid folder with TBs of data (related to iwiftp) is basically empty now. If something really bad happened to it, I hope all of those files and folders were saved by AT back in 2021. But I think some of the grabs in that folder were done in 2022 or 2023, so maybe those are somewhat lost.

pfp from npr
>https://y.com.sb/watch?v=e-9Zgxvqa3o Matthew "React Content is OK when I do it" Judge