/pone/ - World of Equestria

A board for discussing all things animated horse.


/go/ - Golden Oaks General #3 Anon 05/13/2024 (Mon) 03:37 [Preview] No. 10357
Welcome to Golden Oaks! Sit back and get /comfy/ and dive into the vast amounts of data across the web that has been generated by the MLP fandom.

What is Golden Oaks?
Golden Oaks is /endpone/'s archival and analysis thread devoted to the rather broad topic of the fandom itself. Active archival is a major feature, but a variety of topics are also active points of discussion. From analysis of trends and situations, to self reflection and representation (Some /tech/ discussion related to that end is welcome as well!).

What is Golden Oaks not?
Golden Oaks is not for simple discussion of the latest scandal within the fandom or gossip about X. These subjects may be relevant at times when discussing certain happenings and periods within the fandom, but the thread itself shouldn't dive into drama without reason.

What is this place?
This is Endchan's pony board. While we are microscopic, we have regular posting and as distinct a culture as can be had with a handful of anons and drifters. /go/ welcomes outside contributions. Feel free to look around.
Current /NMAiE/: >>8915

Announcement!
A new thread has been created: >>10356 /culture/ (...and More!) for better discussion of certain niche topics and deep dives when this thread is too cluttered, or for topics that may be too tangential but are still useful.


Earlier /go/ Threads:
>>9086
>>3148


Pegasi Anchor Anon 05/13/2024 (Mon) 03:42 [Preview] No.10358 del
(5.55 MB 359x360 PegasusAnchor.gif)
EMERGENCY! SHUTDOWN NOTICES, MASS DELETIONS AND OTHER ARCHIVAL SITUATIONS.

Special anchor post. Reserved only for impending shutdowns of websites or mass deletions (or at least when there is a high potential for such).

Stuff like this:>>8116,>>8983,>>9248


Cerberus Anchor Anon 05/13/2024 (Mon) 03:50 [Preview] No.10359 del
(220.62 KB 800x450 CeberusAnchor.png)
General Updates

Updates on the various archiving-related activities of the thread. Not every little update of course, but stuff that may be more noteworthy, or certain milestones. Completed scripts, important progress and what have you. Use your own discretion.

Posts more like this:>>8361, >>9068, >>9174


Anchor Anon 05/13/2024 (Mon) 03:54 [Preview] No.10360 del
Other Stuff

An anchor for everything else that might be worthy of highlighting. Again, use your own discretion.

Posts more like this:>>8195


How can I get involved? Anon 05/13/2024 (Mon) 08:22 [Preview] No.10367 del
(1.08 MB 788x535 Odds_and_Ends.gif)
Anypony can get started with something! Here is a simple guide (incomplete for the more in-depth stuff) with some suggestions on starting out.

Archiving:
Simply starting out:
Individualized archiving and keeping of records (especially interested in basic accounts and timelines of websites, see:>>4085). If you have any memories to share, that is nice as well! Even old screencaps and old accounts of what a place was like! (If the thread seems too busy or it seems a bit off from the main objectives, I again point to /culture/:>>10356)


For videos: YouTube (and many other websites!): yt-dlp is the best thing available right now.
https://github.com/yt-dlp/yt-dlp
Don't forget the comments!
Use yt-dlp with --write-comments, or this script: https://github.com/egbertbouman/youtube-comment-downloader which we had been using rather intensively in the past. NOTE: neither is working for me as of writing this. I'm uncertain whether they were broken by a YouTube update or something is just wrong on my end, and I don't have much time to test at the moment.
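A minimal sketch of the yt-dlp route, wrapped in Python so it can be reused in scripts. The function names and the "archive" output folder are made up for illustration; the flags themselves (--write-info-json, --write-comments, --paths) are regular yt-dlp options. Whether comment extraction works on a given day depends on YouTube and your yt-dlp version.

```python
import shutil
import subprocess


def build_ytdlp_command(url, output_dir="archive"):
    """Return the yt-dlp argument list for saving a video plus its comments."""
    return [
        "yt-dlp",
        "--write-info-json",    # save metadata to a .info.json file
        "--write-comments",     # store the comments inside that .info.json
        "--paths", output_dir,  # hypothetical output folder
        url,
    ]


def archive_video(url, output_dir="archive"):
    """Run yt-dlp if it is installed; raise a clear error otherwise."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp not found on PATH")
    return subprocess.run(build_ytdlp_command(url, output_dir), check=True)
```

Calling archive_video("https://www.youtube.com/watch?v=VIDEO_ID") would download the video and write the comments into the accompanying .info.json file.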



For full websites, the best option is grab-site from the Archive Team. It is a specialty tool that archives websites in the WARC file format:
https://github.com/ArchiveTeam/grab-site
Note that Windows support is experimental.

If you use Linux, you most likely have Wget installed already. It handles simple websites and single web pages easily enough, although more complicated sites will NOT work well. It can write WARCs, but this is a bit problematic (see: https://wiki.archiveteam.org/index.php/Wget_with_WARC_output). On Windows, PowerShell has a command named wget, but in Windows PowerShell it is an alias for Invoke-WebRequest, so it is very different under the hood. By default, Windows also comes with its own version of curl. I'm not sure how useful either of these is in an archival context compared to their Linux counterparts, though they may at least be useful in some situations. There is a version of GNU Wget for Windows, but it appears to be very outdated.
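For the simple-site case, a rough sketch of a GNU Wget invocation with WARC output, again built from Python. The function name, the one-second wait and the "mysite" WARC name are assumptions for illustration; the options used are standard GNU Wget ones. Complicated sites are still better served by grab-site.

```python
import subprocess


def build_wget_warc_command(url, warc_name):
    """Return a GNU Wget argument list that mirrors a site and writes a WARC."""
    return [
        "wget",
        "--mirror",                   # recursive download with timestamping
        "--page-requisites",          # also fetch images, CSS and scripts
        "--no-parent",                # stay below the starting directory
        "--wait=1",                   # be polite: pause between requests
        f"--warc-file={warc_name}",   # Wget appends .warc.gz to this name
        url,
    ]


def mirror_site(url, warc_name):
    """Run the crawl; needs GNU Wget on PATH (not the PowerShell alias)."""
    return subprocess.run(build_wget_warc_command(url, warc_name), check=True)
```

For example, mirror_site("https://example.org/", "mysite") should leave a mysite.warc.gz next to the mirrored files.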

httrack is a program that has seen some use around here in the past. It might have a use, but its web crawls are incompatible with WARCs:
https://www.httrack.com/
It does have Windows support and I am still interested in investigating potential uses (weirdly, it was sometimes able to grab complex sites relatively well for me), but it is a much lower-tier option due to the lack of WARC support, its own separate web-crawl format, and uncertainty about how active the developer is now.


Linux: You might want to have this installed for the tools mentioned above. No fear, you can do that in Windows now!
https://learn.microsoft.com/en-us/linux/install (need to look up some better Youtube tutorial or something.)


More questions! Anon 05/13/2024 (Mon) 08:34 [Preview] No.10369 del
(136.09 KB 250x273 3195882.gif)
>>10367
More advanced:
This is very broad and something I don't feel I can even attempt to cover with any sort of justice right now. Archival faces a whole host of issues right now, and there are significant problems on several fronts. All of the free data and generous terms of the 2000s and 2010s internet are dying. On top of that, there is a rise in censorship and in desires to rein in the old civil libertarian spirit of the Internet (for good or ill, one cannot deny that a lot of things will be wrongly caught in the crossfire), endangering too many things to count. The central archives that underpin a lot of web history are also at potential risk for a variety of reasons. One bad lawsuit or the wrong person calling it quits (in the case of someplace like archive.today or The Pony Archive) might mean the loss of YEARS of archive work. I don't believe it is within the average person's means to save everything everywhere, but if a lot of people put in a little effort, a lot more could be saved than we expect if these places ever go down. Realistically, any long-term advanced archiving should take these factors into account.

Hardware: This really deserves its own section. Simply downloading and storing a fair bit is pretty easy now and can be done with potato PCs and a few external hard drives. Full-on data hoarding with a plan of keeping something available for years or even decades requires a bit more careful planning. Plus plenty in between! From an old Optiplex or cheap default-configured NASes to full-on enterprise-grade servers, there are a lot of setups that could work for a lot of different people. One thing to remember though: multiple backups! Not everyone can go full 3-2-1 method, but unless you're storing it short term, it is good to have at least two copies of something. Also learn about bit rot!
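One cheap way to keep an eye on two copies (and catch bit rot early) is to compare checksums between them. This is only a sketch using Python's standard library; the function names are made up, and a checksumming filesystem such as ZFS does this far more thoroughly.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(primary, backup):
    """List files from primary that are missing or different in backup."""
    original = build_manifest(primary)
    copy = build_manifest(backup)
    return sorted(name for name in original if copy.get(name) != original[name])
```

Running compare_copies on the main drive and the backup drive every so often gives a list of files to re-copy; an empty list means the backup matches the primary for every file the primary holds.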


Glossary:
For some of the terms you may see around here frequently (needs expansion but this is a start).

IPFS
InterPlanetary File System: something that has been getting a lot of use around here lately. It is a distributed, decentralized network and protocol. A simple TL;DR would be: think BitTorrent without a central server (but more than that). I like what Archivist said here on it:>>10173
>General idea, from one perspective. Are you interested in BitTorrent, but wish it was as elastic and expansive as the web? IPFS may be your solution! It takes good ideas from various things, such as HTTP and BitTorrent. Similar to the web, which has many various things, try not to rely on others to host IPFS data. In BitTorrent you can somewhat rely on other peers to host "important data" (read: some retarded TV show/movie/anime/video game/etc.) forever. Can't expect other peers to host your HTML or folder forever.
I could also invoke comparisons to ZeroNet and Freenet but I think those might be more obscure!

WARC
Web ARChive: a file format designed specifically for archiving websites. This isn't the same as downloading a single web page and is, to put it simply, much better at getting a site intact than most manual scraping.

ZFS
Z File System: the best and only choice of /g/, r/datahoarder, and many techbros for the storage of anything ever. Using anything else is retarded by some people's definition. Very good for long-term storage and protection against bit rot. Lots of other useful features! Depending on your use case, it might not be necessary or ideal.



Useful Links
https://wiki.archiveteam.org/ (The website of the archive team)
https://theponyarchive.com/ (The main and largest archive of the fandom right now)
https://desuarchive.org/ (Main archive of /mlp/ posts)
https://www.youtube.com/@DeletedPonyVideos/ (Randomly uploads deleted pony videos, sometimes useful, though it ain't the Library of Alexandria)
(Plenty more could and will be added here at some point).

Note: I consider this guide to be incomplete and will hopefully have the chance to expand it in the future or even add a section or two. Certainly better guides for Windows users and more tools/archive links for othe


Odds and Ends Anon 05/13/2024 (Mon) 09:38 [Preview] No.10370 del
(596.19 KB 1370x724 551.png)
Alright, this thread is now finally started! It only took like, what? 1 month late! That is the story of my life most of the time. Like most things, these can be a bit bumpy as I am often rushing to post while I have good Internet speed/access/uninterrupted time. All three of which sometimes come at a premium.

The guide is imperfect. Really, there are a lot of issues I could expand on. I focused on introductions for someone who is totally new to this and on clarifying a handful of things, but I feel this could be better. Any suggestions here are welcome, of course.

Speaking of suggestions, /culture/. Anons were sometimes ambivalent on splitting the archiving section of the thread from the history and analysis sections. I understand both sides and have gone back and forth myself. I am not going to ban cultural analysis and history here, or archival on that thread, as I feel the subjects are too intertwined. Right now, I consider /culture/ to be not a full split but an optional place where some topics can be moved to as needed. Will see where it goes from there.

Something I would like to highlight: YT's Comment Converter:>>8361 and the Feed-Analyzer programs:>>8413. I still haven't given up on these even if my life has been... complicated at times. Wherever you are, you still have my eternal salute. As I said before, no pressure to restart on these yourself, though if you ever come by again, you're more than welcome on /endpone/!

Archivist, what can I say? You might have a chaotic posting style but you are saving TBs of data constantly. You also get pic related! Though, don't put pressure on yourself, alright?

There are plenty more anons who have contributed and lurked here. I thank you for your contributions and wish you all the best!


Anon 05/13/2024 (Mon) 14:11 [Preview] No.10373 del
>>10357
Previous thread:
. HTML: >>9086
. JSON: https://endchan.org/pone/res/9086.json

To get the JSON/API version of a thread replace the "html" at the end of the URL with "json". JSON URL of a 4chan thread:
https://megalodon.jp/ref/2024-0507-0022-44/https://a.4cdn.org:443/wsr/thread/1458990.json
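The "html" to "json" swap for Endchan thread URLs can be sketched like this. The function names are made up, and nothing is assumed about the layout of the returned JSON; the fetch helper just hands back whatever the site serves.

```python
import json
import urllib.request


def thread_json_url(html_url):
    """Swap the trailing 'html' of a thread URL for 'json'."""
    if not html_url.endswith(".html"):
        raise ValueError("expected a thread URL ending in .html")
    return html_url[: -len("html")] + "json"


def fetch_thread(html_url, timeout=30):
    """Download and parse the JSON version of a thread (needs network)."""
    with urllib.request.urlopen(thread_json_url(html_url), timeout=timeout) as resp:
        return json.load(resp)
```

For the previous thread, thread_json_url("https://endchan.org/pone/res/9086.html") gives the JSON link listed above.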

The 4chan JSON above showed up as binary, non-text gibberish in Megalodon for some reason. BTW, ウェブ魚拓 (Web Gyotaku) was updated to newer software or whatever at the start of 2024-05. Similar websites (rev.2024.05.10):
https://ja.wikipedia.org/wiki/ウェブアーカイブ#外部リンク

rsync information (has a duplicate "-S, --sparse" section):
https://ss64.com/bash/rsync_options.html


Anon 05/21/2024 (Tue) 07:48 [Preview] No.10381 del
2019 video not in TPA - "13 Kitchen Hacks And Decor Ideas":
https://invidious.incogniweb.net/watch?v=deSdfZb1l5g&list=UUIJ44QRtVGm_gBh_deuL5ow&index=1600&listen=false
. Beheaded pony
. Unicorn milk

2018 video not in TPA - "16 My Little Pony Hacks And Crafts":
https://invidious.materialio.us/watch?v=hJPs9f0SDhI&list=UUIJ44QRtVGm_gBh_deuL5ow&index=1967
. Russian text, DIY "life hacks"
. There were maybe two or more other MLP-related videos by this channel, but I didn't bother

2023 video not in TPA - "Twilight Sparkle Inflation Spell Animation (Watch Carefully At The Description)":
https://invidious.reallyaweso.me/watch?v=9bczdRi7UGk&list=UUKj5UeZ-kt9T4ArZ7PqxEdg&index=10
. reupload of an original animation
. small channel

>>10358
Booru.org is likely shutting down
https://desuarchive.org/mlp/thread/41078194#41092144


Anon 05/22/2024 (Wed) 10:09 [Preview] No.10386 del
Sexy mare:
https://twibooru.org/369640?q=pasties%2C-anthro%2C-human

YouTube videos, accessibility - "Administrator has disabled this endpoint." (works without "&local=true" but with no proxy):
https://inv.tux.pizza/latest_version?id=ydYGVhLPTgo&itag=22&local=true

Working proxy (also "itag=18"):
https://invidious.lunar.icu/latest_version?id=ydYGVhLPTgo&itag=22&local=true

Doesn't get >720p with those numbers. "Even downloading at a low resolution such as 240p or 360p still is infinitely better than nothing (0p)." --https://wiki.archiveteam.org/index.php/YouTube
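The latest_version links above all follow the same pattern, so trying several instances and both itags is easy to script. A small sketch; the function names are made up, and the instance list is just whatever you pass in (instances come and go, and some disable this endpoint, as seen above).

```python
from urllib.parse import urlencode


def latest_version_url(instance, video_id, itag=22, local=True):
    """Build an Invidious latest_version link like the ones in this post."""
    params = {"id": video_id, "itag": itag}
    if local:
        params["local"] = "true"  # ask the instance to proxy the video
    return f"https://{instance}/latest_version?{urlencode(params)}"


def candidate_urls(instances, video_id, itags=(22, 18)):
    """List links to try in order: best quality first, every instance."""
    return [
        latest_version_url(instance, video_id, itag)
        for itag in itags
        for instance in instances
    ]
```

candidate_urls(["invidious.lunar.icu", "inv.tux.pizza"], "ydYGVhLPTgo") returns four links: both instances with itag=22 first, then both with itag=18 as the lower-resolution fallback.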


Anon 05/22/2024 (Wed) 11:09 [Preview] No.10387 del
videos.json from WBM, pony-related channel now has at least one hidden video:
https://gateway.pinata.cloud/ipfs/bafkreidfyzdzx27esnexjtfiv5jom52u7n6yztrom6dnm3rg2lkubnobwq

Related to this channel which has some/all videos in TPA:
https://iv.ggtyler.dev/channel/UCzhBSigqirJ068ye_oN33ow

>>10359
I see that you linked to "Comments Converter":
https://endchan.org/pone/preview/8361.html

Haven't learned a lot about that software, but I guess it does this: takes .info.json with YT comments -> outputs HTML where you can easily read the comments on the YouTube video.
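To illustrate that guess, here is a bare-bones sketch of the ".info.json in, readable HTML out" idea. This is NOT the actual Comments Converter from >>8361. The field names ("comments", "author", "text", "parent", "title") are assumptions based on how yt-dlp usually lays out its .info.json, and the function names are made up.

```python
import html
import json


def comments_to_html(info):
    """Render the comments from a parsed .info.json dict as simple HTML."""
    title = html.escape(str(info.get("title") or "Untitled"))
    rows = []
    for comment in info.get("comments") or []:
        author = html.escape(str(comment.get("author") or "unknown"))
        text = html.escape(str(comment.get("text") or "")).replace("\n", "<br>")
        # Assumption: top-level comments have parent == "root", replies do not.
        is_reply = comment.get("parent", "root") != "root"
        css = "comment reply" if is_reply else "comment"
        rows.append(f'<div class="{css}"><b>{author}</b><p>{text}</p></div>')
    body = "\n".join(rows)
    return f"<html><body><h1>{title}</h1>\n{body}\n</body></html>"


def convert_file(path):
    """Load a .info.json file from disk and return the HTML as a string."""
    with open(path, encoding="utf-8") as handle:
        return comments_to_html(json.load(handle))
```

All text is HTML-escaped before being written out, so a comment containing markup cannot break the archived page.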

>>10386
>Sexy mare
Also Roseluck:
https://twibooru.org/3230646

pfp from npr video:
>https://web.archive.org/web/20240520154214/https://iv.ggtyler.dev/watch?v=MN-qXPyIDAs "Places to Avoid Getting Cut or Stabbed at All Cost"


Anon 05/22/2024 (Wed) 12:22 [Preview] No.10388 del
In the previous thread I saw that gateway.ipfs.io is gone (redirects to ipfs.io). Not so bad, I found that site to be unimpressive. Worse: cf-ipfs.com seems to be gone now too (3-test). It redirects to ipfs.io or dweb.link: both of which performed significantly worse than CF gateways from what I've seen (TTFB). #FuckGoogle image from cyb.
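For anyone who wants to compare gateways themselves, a rough sketch of a TTFB check. The gateway list only contains hosts mentioned in this thread, the function names are made up, and a single timing from one connection is not a benchmark.

```python
import time
import urllib.request

# Hosts mentioned in this thread; swap in whatever gateways you want to test.
GATEWAYS = ["ipfs.io", "dweb.link", "gateway.pinata.cloud"]


def gateway_urls(cid, gateways=GATEWAYS):
    """Build path-style gateway links (https://host/ipfs/CID) for one CID."""
    return [f"https://{host}/ipfs/{cid}" for host in gateways]


def time_to_first_byte(url, timeout=30):
    """Seconds from sending the request to reading the first body byte."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)
    return time.monotonic() - start
```

Looping time_to_first_byte over gateway_urls(cid) a few times, and looking at the spread rather than one number, gives a fairer picture of which gateway is slow.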

>>10367
>httrack might have a use
Use case: for some reason you can't use GNU/Linux and are, like, stuck with Windows 7. I have tried various versions of wget for Windows and they were all horrible/unusable for writing .warc.gz
>but it's web crawls are incompatible with WARCs: https://www.httrack.com/
I was wondering about that in the past. Now I know.


Anon 05/23/2024 (Thu) 00:48 [Preview] No.10389 del
>>10369
>website archive.today
I recommend also using that site because Wayback Machine (WBM) sometimes deletes stuff. Multiple times, it seems that I got blocked from archive.ph. If I go there now in Brave, it shows a white page with the message "Welcome to nginx". However, if I go to that site in lynx browser, it works as expected.

>>10386
Was saved here - video file showed up yesterday in WBM but not today:
https://web.archive.org/web/20240522100404/https://invidious.lunar.icu/latest_version?id=ydYGVhLPTgo&itag=22&local=true

HTTP 403:
https://invidious.perennialte.ch/latest_version?id=ydYGVhLPTgo&itag=22&local=true

Working proxy:
https://invidious.privacyredirect.com/latest_version?id=ydYGVhLPTgo&itag=22&local=true

>>10388
>cf-ipfs.com redirects to ipfs.io or dweb.link: both of which performed significantly worse than CF gateways
I think I first noticed this on 2024-05-21 UTC. A 2024-05-20 12:41 UTC capture of cf-ipfs.com (with no redirect away from cf-ipfs.com):
https://cf-ipfs.com/ipfs/QmUamt7diQP54eRnmzqMZNEtXNTzbgkQvZuBsgM6qvbd57

That text file is basically this:
># The Great Web
>The Great Web is a web that lasts. It is based on three simple ideas.
>## Access
>Anyone who can store secret and compute digital signatures can use the Great Web. Humans, robots, animals, plants, and even mycelium can use it without discrimination and limits.
>## Immutability
>Particles in the Great Web can survive through spacetime thanks to frozen content addressing. So the Great Web can last indefinitely.
>## Universality
>The Great Web is built by connecting particles through cyberlinks. The result is universally acceptable language, dynamic but understandable and acceptable by anyone.


Anon 05/23/2024 (Thu) 10:30 [Preview] No.10390 del
My little pony spotted at the beginning of this episode of a shitty commie series:
https://fmoviesz.to/tv/yellowjackets-30648/1-6

( ebook blocked by Pinata: https://gateway.pinata.cloud/ipfs/bafykbzacedpavr5qwaqwdmckccacz3nx42yicq27jh63t54nc35aemm5wih7m )


Anon 05/25/2024 (Sat) 10:10 [Preview] No.10395 del
Post on pony butt (cross-thread): >>10394

>>10390
>My little pony spotted at the beginning of this episode of a shitty commie series:...
Also maybe in the "eat the bugs" episode (pic related):
https://fmoviesz.to/tv/yellowjackets-30648/1-9

The premise of that series is stupid. A group of girls get stuck in the Canadian wilderness, in a forest, due to a plane crash. Plane crash survivors couldn't make it back to society after like a year. Just burn down the forest. That would get the attention of someone who could save you. And/or learn to live nomadically instead of continually living "stranded" in the same place.


Anon 05/28/2024 (Tue) 10:07 [Preview] No.10406 del
(555.34 KB 755x1024 BB.png)
>>10390
Certainly, pony references in the media count as on topic, even if briefly, in my book.

>>10387
The comment converter formats and sets up the YT comments to be used in an archive fashion, where they can be viewed with avatars and are more easily readable than YT's infinite scrolling format (that was the idea anyway). Examples here:>>8381

>>10381
>Booru.org is likely shutting down
Never liked that place but still a shame considering stuff might be lost in that mess.

>>10386
> "Even downloading at a low resolution such as 240p or 360p still is infinitely better than nothing (0p)
My mentality as well.

>>10388
>I was wondering about that in the past. Now I know.
Found something of interest, though not a full solution. The National Library of Australia has tried to make a program that converts crawls:
https://github.com/nla/httrack2warc
Maybe not the most active development and has some potential issues of its own but good to note.


Anon 05/29/2024 (Wed) 08:48 [Preview] No.10409 del
Here is a shortened link ( info page at https://mub.me/yxUpk/stats ):
https://mub.me/yxUpk
->
https://tpa.mares.workers.dev/?output=html&id=

Informs me of 2022 video not in TPA - "MLP:FIM | FULL PMV | Princess Celestia & Princess Luna | Tribute 8 | Running Up That Hill":
https://vid.puffyan.us/watch?v=61ajUKx0bUk&list=UUYMslvPvLBto4IWZVglCxNA&index=26

>>10406
>https://github.com/nla/httrack2warc/zipball/master
Interesting. Also include the httrack files to avoid the possibility of the WARCs being called fake.


Anon 05/29/2024 (Wed) 09:04 [Preview] No.10410 del
I think I did download this channel in the past:
https://vid.puffyan.us/watch?v=dO_IOb2HT7s&list=UUOOhgESdE6ncQ9l0WnWIObA&index=17
>Walk of Life PMV (Time Capsule Version)

>>10409
While posting to Endchan on mobile there's a bug where an image attached to a post you made days ago gets automatically attached to the post you are currently writing. Press x to remove attachment. I didn't notice that and that's what happened with that post.


