>>8952 >>8964 Next step for the greentext detection thing: the extractors. I was testing on ipfs://bafybeif533b3ekb6ceitlq64povkdu2z2mmuc4fv3v5xd2d2httxbvxobu, which was itself produced by an extractor I wrote and posted in /pag/ some time ago. That one worked OK. Its pipeline is: 4chan -> some board -> some thread (rendered HTML, not source) -> select all text -> save to a text file -> run my extractor -> get multiple text files, one file per post. So bafybeif...xobu is an example of output from a 4chan rendered-HTML copy-and-paste extractor. Sources I haven't written extractors for (yet?): parquet files from the desuarchive /mlp/ dump, desuarchive rendered HTML copy-and-paste, desuarchive source HTML, 4chan source HTML, and maybe also desuarchive search.
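
For anyone who wants to try the copy-and-paste route, here is a rough sketch of the "text file in, one file per post out" step. This is not the extractor from /pag/, just an illustration. The names split_posts, write_posts and HEADER_RE are made up, and the header pattern is an assumption about what a select-all paste of a 4chan thread looks like (a line such as "Anonymous 01/02/23(Mon)12:34:56 No.12345678"), so it will probably need adjusting against a real paste.

```python
import re
from pathlib import Path

# ASSUMPTION: each post in the pasted text starts with a header line like
# "Anonymous 01/02/23(Mon)12:34:56 No.12345678". This pattern is a guess at
# the copy-and-paste layout, not something taken from the original extractor.
HEADER_RE = re.compile(
    r"^.*\d{2}/\d{2}/\d{2}\(\w{3}\)\d{2}:\d{2}:\d{2}\s+No\.(\d+)"
)


def split_posts(text):
    """Return {post_number: body_text} for every header line found in text."""
    posts = {}
    current = None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # A new header starts a new post; its number becomes the key.
            current = match.group(1)
            posts[current] = []
        elif current is not None:
            # Everything until the next header belongs to the current post.
            posts[current].append(line)
    return {no: "\n".join(lines).strip() for no, lines in posts.items()}


def write_posts(posts, out_dir):
    """Write each post body to <out_dir>/<post_number>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for no, body in posts.items():
        (out / f"{no}.txt").write_text(body + "\n", encoding="utf-8")
```

Usage would be something like write_posts(split_posts(Path("thread.txt").read_text(encoding="utf-8")), "posts"), where thread.txt is the saved select-all text. Anything before the first header (board banner, thread title and so on) is dropped, which may or may not be what you want.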