/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.

Version 508 Anonymous Board owner 11/30/2022 (Wed) 23:40 Id: 7ccfc4 No.1422
https://youtube.com/watch?v=81tEdqBvHVs
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v508/Hydrus.Network.508.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v508/Hydrus.Network.508.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v508/Hydrus.Network.508.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v508/Hydrus.Network.508.-.Linux.-.Executable.tar.gz

I had a great week fixing some bugs, adding hash lookup to the Client API, and finally attaching a Tag Filter to the PTR.

Full changelog: https://hydrusnetwork.github.io/hydrus/changelog.html

tag filter

For normal users: the PTR will start filtering out some bad tags automatically, you don't have to do anything.

The big job I wanted to do this week, adding a Tag Filter to the PTR (and other tag repositories), went a lot better than I expected. I split the work into four pieces and got the first one done. If you are a tag repository administrator, you can now edit a Tag Filter for your repo under the services->administrate services->tag repo->edit tag filter menu. Your tag repo will silently discard any tag mapping uploads that do not pass the filter. So, if you don't want 'tagme' or 'source:some-borked-url' tags, you can now filter them at the point of upload and no one has to create or process any petitions in future.
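For anyone wondering what "discard at the point of upload" means in practice, here is a rough sketch. It is NOT hydrus's actual Tag Filter code: `SimpleTagFilter` and `filter_upload` are hypothetical names, and the filter is simplified to two blacklists (exact tags and whole namespaces), using the 'tagme' and 'source:' examples above.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleTagFilter:
    """Hypothetical, simplified tag filter: blacklists exact tags and whole namespaces."""
    banned_tags: set = field(default_factory=set)
    banned_namespaces: set = field(default_factory=set)

    def allows(self, tag):
        tag = tag.strip().lower()
        if tag in self.banned_tags:
            return False
        # 'source:some-borked-url' -> namespace 'source'; unnamespaced tags get ''
        namespace = tag.split(":", 1)[0] if ":" in tag else ""
        return namespace not in self.banned_namespaces


def filter_upload(mappings, tag_filter):
    """Keep only (tag, file_hash) mappings that pass; everything else is silently dropped."""
    return [(tag, file_hash) for tag, file_hash in mappings if tag_filter.allows(tag)]
```

So with `SimpleTagFilter(banned_tags={"tagme"}, banned_namespaces={"source"})`, an upload containing 'tagme', 'source:some-borked-url' and 'character:samus aran' would only keep the character tag, and nobody has to petition the other two later.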

Regular users will sync to the set filter and can see it under services->review services. Users in advanced mode will also get popups on specific rule changes. The next step here will be to expose this filter more, showing it in manage tags and similar UI, not even trying to pend tags that don't match the filter, and retroactively fixing mappings that already exist on the server (including hard-replacing since-sibling'd tags). I'd like to get at least some of this done before the end of the year, but I'm sure some will spill into 2023.

other highlights

There is a new 'file relationships: show x' action in the 'media' shortcut set, which will show a file's potential dupes, actual dupes, alternates, or false positives in a new page, just like the 'show x' actions that are buried in the thumbnail menu. This works in the media viewer too.

I fixed a recent stupid bug in the duplicate filter that meant if you did a batch of work with only file deletes, they were not going through. Sorry for the trouble! The thing now calculates and reports duplicate decisions and manual file deletes separately. I also added a (+50%, -33%) label to the file size comparison statement, just as a test. The numbers are correct, but I'm not sure I like it--let me know what you think.
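My reading of the (+50%, -33%) label, which is an assumption on my part, is that it describes the same size difference from each file's side: if A is 1.5x B, then A is 50% bigger than B, but B is only 33% smaller than A. A minimal sketch of that arithmetic (`size_comparison_labels` is a hypothetical helper, not the client's code):

```python
def size_comparison_labels(size_a, size_b):
    """Return (A relative to B, B relative to A) as signed, rounded percentages."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("file sizes must be positive")
    a_vs_b = (size_a - size_b) / size_b  # how much bigger/smaller A is, measured against B
    b_vs_a = (size_b - size_a) / size_a  # the same gap, measured against A
    return f"{a_vs_b:+.0%}", f"{b_vs_a:+.0%}"


# a 1.5MB file vs a 1.0MB file
print(size_comparison_labels(1_500_000, 1_000_000))  # ('+50%', '-33%')
```

The two numbers look asymmetric because each percentage is taken against a different base size.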

If you work with the downloader system, URL Classes get two new 'do not allow any extra path components/parameters' checkboxes this week (one for path components, one for parameters), which stop a class from matching URLs that have more path components or parameters than it defines. They should help in situations where hydrus had trouble figuring out which URL Class to match sets of nested URLs to, particularly when the Gallery URL is 'longer' than the Post URL.
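To illustrate the 'longer Gallery URL' problem, here is a toy matcher. It is a hypothetical simplification, not hydrus's URL Class implementation, and the example.com URLs are made up: a post class expecting `art/<id>` will also swallow `art/someartist/page/2` unless extra path components are forbidden.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass
class SimpleURLClass:
    """Hypothetical toy URL class: domain, expected path components, required params."""
    name: str
    domain: str
    path_components: list      # expected components in order; None matches anything
    parameters: tuple = ()     # required query parameter names
    no_extra_path: bool = False
    no_extra_params: bool = False

    def matches(self, url):
        parsed = urlparse(url)
        if parsed.netloc != self.domain:
            return False
        components = [c for c in parsed.path.split("/") if c]
        if len(components) < len(self.path_components):
            return False
        # the new behaviour: refuse URLs with more path components than we define
        if self.no_extra_path and len(components) > len(self.path_components):
            return False
        for expected, actual in zip(self.path_components, components):
            if expected is not None and expected != actual:
                return False
        params = parse_qs(parsed.query)
        if any(p not in params for p in self.parameters):
            return False
        # likewise for query parameters we did not ask for
        if self.no_extra_params and set(params) - set(self.parameters):
            return False
        return True
```

With `no_extra_path=False`, `https://example.com/art/someartist/page/2` matches the post class because its first two components fit `art/<anything>`; with `no_extra_path=True` it is rejected and left for a gallery class to claim.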

With the new /get_files/file_hashes call, the Client API can now do lookups between sha256, md5, sha1, and sha512 hashes. If you provide some hashes in one of these types, the API can now give you known matches for any of the other types. Let's say you have a bunch of sha1 hashes and want to see if you have them imported, or you have a bunch of sha256 hashes for files you have and want to get their known md5s for an external site lookup, this is now possible. The client typically knows all four hashes for every file it has ever imported, so you can do your own 'already deleted' checks too.
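A rough Python sketch of calling it. The parameter names (`hashes` as a JSON-encoded list, `source_hash_type`, `desired_hash_type`), the `Hydrus-Client-API-Access-Key` header, the default address 127.0.0.1:45869, and the `{"hashes": {...}}` response shape are my understanding of the Client API docs, so double-check them against the help before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "http://127.0.0.1:45869"  # assumption: default local Client API address
VALID_TYPES = {"sha256", "md5", "sha1", "sha512"}


def build_file_hashes_url(hashes, source_hash_type, desired_hash_type, base=API_BASE):
    """Build a GET /get_files/file_hashes URL (list args are JSON-encoded, then URL-encoded)."""
    if source_hash_type not in VALID_TYPES or desired_hash_type not in VALID_TYPES:
        raise ValueError("hash types must be one of sha256, md5, sha1, sha512")
    query = urlencode({
        "hashes": json.dumps(list(hashes)),
        "source_hash_type": source_hash_type,
        "desired_hash_type": desired_hash_type,
    })
    return f"{base}/get_files/file_hashes?{query}"


def lookup_hashes(hashes, source_hash_type, desired_hash_type, access_key):
    """Ask the running client for known matches; unknown hashes are simply absent."""
    url = build_file_hashes_url(hashes, source_hash_type, desired_hash_type)
    request = Request(url, headers={"Hydrus-Client-API-Access-Key": access_key})
    with urlopen(request) as response:
        body = json.load(response)
    # assumed response shape: {"hashes": {source_hash: desired_hash, ...}, ...}
    return body.get("hashes", {})
```

For the 'do I already have these?' case, something like `lookup_hashes(my_sha1s, "sha1", "sha256", key)` and checking which sha1s come back would do it.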

next week

More like this. Some small work and some more janitor/serverside stuff. Only three more releases this year, so I'll try to keep things simple.


Release Tomorrow! Anonymous Board owner 12/07/2022 (Wed) 03:07 Id: cbedd6 No.1425
I had a good week mostly fixing bugs, including the recent annoying-but-harmless shutdown bug some users had and the macOS app boot bug. There's also some UI quality of life.

The release should be as normal tomorrow.


Anonymous 12/07/2022 (Wed) 11:39 No.1426
>>1425

Thanks for your continuing work on the project man, much appreciated. I have some feedback/wishes:

1. The 4chan thread watcher via Tor gives a 403 Cloudflare error, while the simple downloader works. Is there something one can do about this beyond switching circuits/proxy/clearnet? I mentioned this before, and the browser header shouldn't make a difference here, since the downloads themselves work, just not the watcher.

2. The 4chan simple downloader fails on .webms in the same setup; these have to be added manually.

3. Add a second/third bar for tabs, so that the titles are not squeezed when having many tabs open.

4. PDF/ebook compatibility, to use Hydrus for more informative tasks (this might get you a lot of support from librarian types).

5. Grouping of tags: as with the namespace/creator group-by function, have colored bars or something that groups all items belonging to a category, or some similar function that gives a "folder-like" grouping in the main window. Tags are awesome for precise searches/classifications, but sometimes the "all in one" feeling that folders offer might be useful. I may have overlooked something, though.

As always, I really love the work you do, and you simplify a lot of time-intensive stuff. I'd love to have a setup that scrapes threads automatically by topic. I was thinking of a hydrus + script setup that identifies threads by title/keyword and then automatically adds them to the watcher. This would save so much time and remove blind spots when covering news topics etc. Any ideas as to the feasibility?


Anonymous Board owner 12/08/2022 (Thu) 00:01 Id: fb3f3f No.1428
>>1426
Thanks, I'm glad you like it!

1) The watcher hits up the 4chan API (in this form https://a.4cdn.org/m/thread/16086187.json), whereas the simple downloader hits the main html. My guess is that's what CF is blocking. If you had to set up headers or a User-Agent to get the html to work, you might want to replicate that for a.4cdn.org. Hydrus Companion may help here to copy cookies, not sure. If CF just blocks API hits from Tor, you may be out of luck.

2) Yeah, I bet the simple downloader looks for img tags or something, whereas webms are under 'video' or an iframe or something. Simple downloader can't do two things at once very well, I'm afraid! If you are feeling brave, you could try editing the simple parsing formula to try to get both img and (video), but I'm not sure if it is that clever.

3) Check out the 'page of pages'. Hit F9->special->page of pages. That'll let you nest your page tabs.

4) Thanks. Yeah, I'd like some basic support for more text formats, and 'any file' support, which will take some extra work for some internal technical reasons. But I'd like it. However, I do not think I can support nice text reading any time soon. I'm terrible at drawing lines, but I know I'll never have the time to make something anywhere near as good as Calibre, so the support will stay as an 'open externally' button, like PDFs now, for a long time.

5) Yeah, I absolutely want [+] expand/tree format for the tag list. That list is my own custom widget, so I have to code everything in it. But this is something I really want for myself too, and options for things like 'start with title tags collapsed'. And custom namespace sorting so you can put creator up top, that sort of stuff.
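On 2), the idea of 'grab img and video at once' can be shown outside hydrus with Python's stdlib HTML parser. This is a hypothetical sketch, not hydrus's parsing formula system, and it assumes a webm shows up either as a `<video>`/`<source>` src or as a plain link ending in .webm; I have not checked how any particular site's html actually lays it out.

```python
from html.parser import HTMLParser


class MediaLinkParser(HTMLParser):
    """Collect src values from <img>, <video>, <source>, plus <a> links ending in .webm."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "video", "source") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "a" and (attrs.get("href") or "").lower().endswith(".webm"):
            self.urls.append(attrs["href"])


def extract_media_urls(html):
    parser = MediaLinkParser()
    parser.feed(html)
    parser.close()
    # de-duplicate while keeping page order
    return list(dict.fromkeys(parser.urls))
```

The point is just that one pass can match several tag types; a formula that only looks at img tags will never see the webm.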

Having auto-scraping has been a long-time thought. I'm hesitant to make too many automatic searchers, since I like human eyeballs behind big decisions. I know some users, however, who have had big success writing their own scripts that fetch the current thread catalog, scan it for threads they want, and then pass them on to Hydrus using the Client API. If you are ok with scripting, you might like to play around with it: https://hydrusnetwork.github.io/hydrus/client_api.html
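A minimal sketch of that kind of script, under some assumptions you should verify: the 4chan catalog is read from `https://a.4cdn.org/{board}/catalog.json` as a list of pages, each with a `threads` list whose entries carry `no`, `sub` and `com`; hydrus is reached at the default Client API address with a `POST /add_urls/add_url` call taking `{"url": ...}` and the `Hydrus-Client-API-Access-Key` header. `matching_thread_urls`, `fetch_catalog` and `send_to_hydrus` are made-up helper names.

```python
import json
from urllib.request import Request, urlopen

CLIENT_API = "http://127.0.0.1:45869"  # assumption: default Client API address


def matching_thread_urls(catalog, board, keywords):
    """Return thread URLs whose subject or comment contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    urls = []
    for page in catalog:
        for thread in page.get("threads", []):
            # 'com' is raw html, so this is a crude substring match
            text = f"{thread.get('sub', '')} {thread.get('com', '')}".lower()
            if any(k in text for k in wanted):
                urls.append(f"https://boards.4chan.org/{board}/thread/{thread['no']}")
    return urls


def fetch_catalog(board, user_agent="Mozilla/5.0"):
    """Fetch the board catalog JSON; set the same User-Agent your other requests use."""
    request = Request(f"https://a.4cdn.org/{board}/catalog.json",
                      headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)


def send_to_hydrus(url, access_key):
    """Hand one thread URL to the client, which should route it to a watcher."""
    body = json.dumps({"url": url}).encode("utf-8")
    request = Request(f"{CLIENT_API}/add_urls/add_url", data=body, method="POST",
                      headers={"Hydrus-Client-API-Access-Key": access_key,
                               "Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

Usage would be something like `for url in matching_thread_urls(fetch_catalog("m"), "m", ["gundam"]): send_to_hydrus(url, key)` on a slow timer. Keep the polling gentle so you aren't hammering the site, and eyeball what it picks up for the first few runs.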

As always, I'm short on work time. When I eventually roll some of 4 + 5 out, let me know how it goes for you, and if you have any other problems or ideas.


Anonymous 12/22/2022 (Thu) 13:02 No.1441
>>1428
1) Will try that out, once I find the time. Thanks for the answer!

2) I'll experiment on that as well and will report back with how it goes.

3) Nice, awesome!

4) It'd be enough to merely have the works indexed inside Hydrus. You can always Ctrl+E to open them in your preferred reader/program with all the capabilities you need for the media itself. It's just so you can collect and organize the files, and at that Hydrus blows anything else I know out of the water. There's some academic software that lets you organize your sources and stuff, but it's more of a list thing. Since you can add tons of metadata... well, let's just say Hydrus and zlib would go hand in hand ;)

5) Yeah, collapsible tag trees would make stuff more easily readable. I know you can organize by custom namespaces and that is already really powerful, but it'd be cool to have tag:one tag:two tag:three below each other in the tag list and in the folder list as well, in "blocks" so to speak.

I feel you on automation. One misstep and you practically DDoS a site or flood your hard drive. I also didn't listen to your warning in the documentation, haha. "and then THEY DO SO only to work through thousands of files": I felt that so hard and it brought a smile to my face. But once you have figured out what you want to parse daily etc., you really want the automation for those specifics.

I really do appreciate your time, man. Yours is one of the few projects where I keep going back to the documentation and checking the forums here, which is rare and a sign of a really solid project. If you go the academic/library route, you might land some licensing deals to capitalize on your outstanding work, like for official/institutional use.


