/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.



Version 442 Anonymous Board owner 06/03/2021 (Thu) 00:01:47 Id: 69c54e No. 1078
https://youtube.com/watch?v=bpEFn3MFyfA
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Linux.-.Executable.tar.gz

I had a great week. An important part of GUI Sessions is overhauled, which should save a lot of hard drive time for larger clients.

gui sessions

I always encourage a backup before you update, but this week it matters more than normal. If you have a client with large sessions with many important things set up, make sure you have a backup done before you update! I feel good about the code, and I try to save data on various failures, but if your situation gives errors for an unforeseen reason, having the backup ready reduces headaches all around!

Like the subscriptions and network objects breakups I've done in the past year, I 'broke up' the monolithic GUI Session object this week. Now, when your session has changes, only those pages that have changed will be saved, saving a ton of CPU and HDD write I/O. Furthermore, sessions that contain identical pages (this happens all the time with session backups) can now share the same stored page, saving a bunch of hard drive space too. Like with subscriptions, some users are pushing multiple gigabytes of session storage in total, so there is a good amount of work to save here.
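To make the mechanism concrete, here is a minimal sketch in Python--not hydrus's actual code: the json_pages and sessions tables and the function names are all hypothetical--showing the two ideas at work: content-addressed storage so identical pages collapse into one row, and dirty-flag tracking so clean pages cost no write I/O on save:

    import hashlib
    import json
    import sqlite3

    # Hypothetical schema: the real hydrus serialisation differs, but the
    # dedup/dirty-save idea is the same.
    db = sqlite3.connect('client.db')
    db.execute('CREATE TABLE IF NOT EXISTS json_pages ( hash BLOB PRIMARY KEY, dump BLOB )')
    db.execute('CREATE TABLE IF NOT EXISTS sessions ( name TEXT, page_hash BLOB )')

    def save_page(page_data) -> bytes:
        # Serialise and key by hash--identical pages across sessions and
        # backups collapse into a single stored row.
        dump = json.dumps(page_data, sort_keys=True).encode('utf-8')
        page_hash = hashlib.sha256(dump).digest()
        db.execute('INSERT OR IGNORE INTO json_pages ( hash, dump ) VALUES ( ?, ? )', (page_hash, dump))
        return page_hash

    def save_session(name, pages):
        # pages: (page_data, dirty, cached_hash) tuples. Only dirty pages
        # are re-serialised; a clean page keeps its existing hash, so an
        # unchanged page costs zero write I/O.
        db.execute('DELETE FROM sessions WHERE name = ?', (name,))
        for page_data, dirty, cached_hash in pages:
            page_hash = save_page(page_data) if dirty else cached_hash
            db.execute('INSERT INTO sessions ( name, page_hash ) VALUES ( ?, ? )', (name, page_hash))
        db.commit()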

You don't have to do anything here. Everything works the same on the front end, and all your existing sessions will be converted on update. Your client should be a little less laggy at times, and client shutdown should be a bit faster.

If any of your old sessions fail to load or convert, a backup will be made so we can check it out later. Let me know if you have any trouble!

Advanced stuff:

Another benefit is the old limit of 'sessions fail to save at about 500k session weight' now applies to pages individually. Please don't immediately try to nuke your sessions with five million new things, but if you do end up with a big session, let me know how other performance works out for you. Now that this bottleneck is gone, we'll start hitting new ones. I believe the next biggest vulnerability is thread starvation with many simultaneous downloaders, so again please don't paste-spam a hundred new queries (for now).

If you have been tracking session weight (under the pages menu), I am rebalancing the weights. Before, the weight was file = 1, URL = 1, but after all our research into this, I am setting it to file = 1, URL = 20. In general, I think a page will fail to save at the new weight of about 10 million. If you are in advanced mode, you can now see each page's weight on page tab right-clicks. Let's get a new feeling for the IRL distribution here, and we can aim for the next optimisation (I suspect it'll eventually be a downloader-page breakup, storing every query or watcher as a separate object). Since URLs seem to be the real killer, see if you can spread bigger downloads across multiple download pages and try to clear out larger completed queries when you can.
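As a rough sketch of the new accounting (my own illustration, with made-up names; the constants come from the numbers above):

    # Hypothetical weight accounting: files count 1, URLs count 20, and a
    # single page refuses to save past about 10 million weight.
    FILE_WEIGHT = 1
    URL_WEIGHT = 20
    PAGE_SAVE_LIMIT = 10_000_000

    def page_weight(num_files: int, num_urls: int) -> int:
        return num_files * FILE_WEIGHT + num_urls * URL_WEIGHT

    # A downloader page with 50k files but 400k URLs is already most of the
    # way to the limit, which is why spreading big jobs across pages helps:
    w = page_weight(num_files=50_000, num_urls=400_000)
    print(w, w <= PAGE_SAVE_LIMIT)  # 8050000 True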


Anonymous Board owner 06/03/2021 (Thu) 00:02:19 Id: 69c54e No.1079
the rest

I did a bunch of little stuff--check the changelog if you are interested.

I have also turned off the interval VACUUM maintenance and hidden the manual task for now. This was proving less and less useful in these days of huge database files, so I will bring it back in future on a per-file basis with some UI and more specific database metadata.
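If you want to vacuum manually in the meantime, here is a minimal sketch, assuming the standard hydrus database filenames; run it only while the client is shut down, and with a backup on hand (VACUUM rebuilds each file, so it needs free disk space roughly equal to the database's size):

    import sqlite3

    # Assumed standard hydrus db filenames; adjust to your install.
    for name in ('client.db', 'client.caches.db', 'client.mappings.db', 'client.master.db'):
        # autocommit mode, since VACUUM cannot run inside a transaction
        conn = sqlite3.connect(name, isolation_level=None)
        conn.execute('VACUUM')
        conn.close()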

EDIT: Thanks to a user submission, the yande.re post parser is updated to pull tags correctly if you are logged in. I hoped my update code would move the link over from the old parser correctly, but it did not. I'll fix this for next week, but if you download from yande.re while logged in, please hit network->downloader components->manage url class links and move 'yande.re file page' from moebooru to 'yande.re post page parser'.

We fixed a couple more problems with the new builds--the Linux and Windows extract builds have their surplus 'ubuntu'/'windows' directories removed, and the Linux executables should have correct permissions again. Sorry for the trouble!

And after some tests, we removed the .py files and the source from the builds. I had long believed it was possible to run the program from source beside the executables, but it seems I was mistaken. Unless you are running the build-adjacent source pretty much on the same machine you built on (as my tests years ago were), you get dll conflicts all over the place. If you want to run from source, just extract the source proper in its own fresh directory. I've also fleshed out the 'running from source' help beyond setting up the environment to talk more about the actual downloading and running of the program. I'll continue work here and hope to roll out some easy one-and-done setup scripts to automate the whole thing.

full list

- gui sessions:
- gui sessions are no longer a monolithic object! now, each page is stored in the database separately, and when a session saves, only those pages that have had changes since the last save are written to db. this will massively reduce long-term HDD writes for clients with large sessions and generally reduce lag during session save intervals
- the new gui sessions are resilient against database damage--if a page fails to load, or is missing from the new store, its information will be recorded and saved, but the rest of the session will load
- the new page storage can now be shared across sessions. multiple backups of a session that use the same page now point to the same record, which massively reduces the size of client.db for large-sessioned clients
- your existing sessions and their backups will obviously be converted to the new system on update. if any fail to load or convert, a backup of the original object will be written to your database directory. the conversion shouldn't take more than a minute or two
- the old max-object limit at which a session would fail to save was around 10M files and/or 500k urls total. it equated to a saved object of larger than 1GB, which hit an internal SQLite limit. sessions overall now have no storage limit, but individual pages now inherit the old limit. Please do not hurry to try to test this out with giganto pages. if you want to run a heap of large long-term downloaders, please spread the job across several pages
- it seems URLs were the real killer here, so I am rebalancing it so URLs now count for 20 weight each. the weight limit at which a _page_ will now fail to save, and the client will start generally moaning at you for the whole session (which can be turned off in the options), is therefore raised to 10M. most of the checks are still session-wide for now, but I will do more work here in future
- if you are in advanced mode, then each page now gives its weight (including combined weight for 'page of pages') from its tab right-click menu. with the new URL weight, let's get a new sense of where the memory is actually hanging around IRL


Anonymous Board owner 06/03/2021 (Thu) 00:03:24 Id: 69c54e No.1080
- the page and session objects are now more healthily plugged into my serialisation system, so it should be much easier to update them in future (e.g. adding memory for tag sort or current file selection)
- .
- the rest:
- when subscriptions die, the little reporting popup now includes the death file velocity ('it found fewer than 1 files in the last 90 days' etc...)
- the client no longer does vacuums automatically in idle time, and the soft/full maintenance action is removed. as average database size has grown, this old maintenance function has increasingly proved more trouble than it is worth. it will return in future as a per-file thing, with better information to the user on past vacuums and empty pages and estimates on duration to completion, and perhaps some database interrupt tech so it can be cancelled. if you really want to do a vacuum for now, do it outside the program through a SQLite interpreter on the files separately
- thanks to a user submission, a yande.re post parser is added that should grab tags correctly if you are logged in. the existing moebooru post parser default has its yande.re example url removed, so the url_class-parser link should move over on update
- for file repositories, the client will not try to sync thumbnails until the repository store counts as 'caught up' (on a busy repo, it was trying to pull thumbs that had been deleted 'in the future'). furthermore, a 404 error due to a thumb being pulled out of sync will no longer print a load of error info to the log. more work will be needed here in future
- I fixed another stupid IPFS pin-commit bug, sorry for the trouble! (issue #894)
- some maintenance-triggered file delete actions are now better about saving a good attached file deletion reason
- when the file maintenance manager does a popup with a lot of thumbnail or file integrity checks, the 'num thumbs regenned/files missing or invalid' number is now preserved through the batches of 256 jobs
- thoroughly tested and brushed up the 'check for missing/invalid files' maintenance code, particularly in relation to its automatic triggering after a repository processing problem, but I still could not figure out specifically why it is not working for some users. we will have to investigate and try some more things
- fixed a typo in client api help regarding the 'service_names_to_statuses_to_display_tags' variable name (I had 'displayed' before, which is incorrect)
- .
- build fixes:
- fixed the new Linux and Windows extract builds being tucked into a little 'ubuntu'/'windows' subfolder, sorry for the trouble! They should both now have the same (note Caps) 'Hydrus Network' as their first directory
- fixed the new Linux build having borked permissions on the executables, sorry for the trouble!
- since I fixed the urllib3 problem we had with serialised sessions and Retry objects, I removed it from the requirements.txts. now 'requests' can pull what it likes
- after testing it with the new build, it looks like I was mistaken years ago in thinking that anyone could run hydrus from source when inside a 'built' release (due to dll conflicts in CWD vs your python install). maybe this is now only true in py3 where dll loading is a little different, but it was likely always true and my old tests only ever worked because I was in the same or a very similar environment, so the dlls were not conflicting. in any case the builds no longer include the .py/.pyw files and the 'hydrus' source folder, since it just doesn't seem to work. if you want to run from source, grab the actual source release in a fresh, non-conflicting directory. I've updated the help regarding this, sorry for any trouble or confusion you have ever run into here
- updated the running from source document to talk more about actually getting the source and fleshed out the info about running the scripts


Anonymous Board owner 06/03/2021 (Thu) 00:05:43 Id: 69c54e No.1081
- misc boring refactoring and db updates:
- created a new 'pages' gui module and moved Pages, Thumbs, Sort/Collect widgets, Management panel, and the new split Session code into it
- wrote new container objects for sessions, notebook pages, and media pages, and wrote a new hash-based data object for a media page's management info and file list
- added a table to the database for storing serialised objects by their hash, and updated the load/save code to work with the new session objects and manage shared page data in the hashed storage
- a new maintenance routine checks which hashed serialisables are still needed by master containers and deletes the orphans (a rough sketch of the idea follows this list). it can be manually fired from the _database->maintenance_ menu. this routine otherwise runs just after boot and then every 24 hours or every 512MB of new hashed serialisables added, whichever comes first
- management controllers now discard the random per-session 'page key' from their serialised key lookup, meaning they serialise the same across sessions (making the above hash-page stuff work better!)
- improved a bunch of access and error code around serialised object load/save
- improved a heap of session code all over
- improved serialised object hashing code
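
As a minimal sketch of that orphan routine, reusing the hypothetical json_pages/sessions schema from the session sketch earlier in the thread, the sweep reduces to a single set-difference delete:

    import sqlite3

    # Anything no master container (here, a session row) still points at
    # is an orphan and can be dropped.
    db = sqlite3.connect('client.db')
    db.execute('DELETE FROM json_pages WHERE hash NOT IN ( SELECT page_hash FROM sessions )')
    db.commit()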

next week

I have one more week of work before my vacation. There's a ton of little jobs I have been putting off--checking new downloaders users sent in, some more help docs to work on, and magically growing multi-column list dialogs--as well as emails and other messages I haven't got to. I'll try to tidy up those loose ends as best I can before I take my break. I'll also deal with any problems with these new GUI Sessions.


Anonymous 06/05/2021 (Sat) 04:57:10 Id: 5b4011 No.1083
excellent update hydev, all of my previous stutters are gone! thank you very much!


Release Tomorrow! Anonymous Board owner 06/09/2021 (Wed) 06:31:03 Id: 1dfb0b No.1084
I had a great week working on small quality of life issues. A couple of bugs are fixed, some UI lag is reduced, and I worked on some layout too. Just a mix of cleanup before my vacation next week.

I have some unavoidable IRL tomorrow, so the release may be a bit later than usual.

>>1083
Great, thanks for letting me know!


