Publishers Always Innovating

fossilesque@mander.xyz · 19 days ago


ChaoticNeutralCzech@feddit.org · 19 days ago

Datasheet websites do that a lot. If it’s PDF.js (Firefox’s PDF viewer, or a fork of it), I just right-click, pick “Show only this frame”, and it goes fullscreen. It might have shenanigans such as disabled printing, but you can press Ctrl+Shift+E (Firefox’s Network Monitor) and reload to see which address the PDF is loaded from, then save it from there.

The worse ones are PDFs that exist only for SEO and contain nothing but keywords and a link to a paywall.

helloworld55@lemm.ee · 18 days ago

Wow, I had no idea about this. I was just in the process of trying to download a PDF from one of these websites. Thanks!

skillissuer@discuss.tchncs.de · 19 days ago

but if they just gave you the pdf, how would they track every mouse movement for their bullshit metrics?

skillissuer@discuss.tchncs.de · 19 days ago

would somebody think of the advertisers?

abbadon420@lemm.ee · 19 days ago

Advertisers? Think of the managers! Managers are nothing without their metrics.

daddy32@lemmy.world · 19 days ago

No, it should obviously take you to the “pay us an enormous amount of money every month” page first.

Final Remix@lemmy.world · 18 days ago

Suck my InterLibraryLoan, Pearson.

Swedneck@discuss.tchncs.de · 18 days ago

suck my pear, son

ornery_chemist@mander.xyz · 18 days ago

https://…/epdf/… -> https://…/pdf/…

Works for some places at least. Super infuriating though. Why use the fast native PDF viewer in the browser when you could use a bloated and buggy JS app?
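The `/epdf/` -> `/pdf/` swap can be scripted if you do it often. A minimal Python sketch, assuming the viewer URL contains a literal `epdf` path segment and that the site serves the raw file at the same path with `pdf` instead (as noted, that only holds for some publishers). The function name `epdf_to_pdf` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def epdf_to_pdf(url: str) -> str:
    """Rewrite an .../epdf/... viewer URL to the matching .../pdf/... URL.

    Assumption (hypothetical): the site serves the raw PDF at the same path
    with the 'epdf' segment replaced by 'pdf'. Only some publishers do this.
    """
    parts = urlsplit(url)
    # Swap only a whole 'epdf' path segment, never a substring of a filename.
    segments = ["pdf" if seg == "epdf" else seg for seg in parts.path.split("/")]
    # Scheme, host, query string and fragment are kept unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical URL, for illustration only.
print(epdf_to_pdf("https://publisher.example/doi/epdf/10.1234/abcd"))
# -> https://publisher.example/doi/pdf/10.1234/abcd
```

Splitting on `/` rather than doing a plain string replace keeps it from mangling unrelated parts of the URL that happen to contain "epdf".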

smpl@discuss.tchncs.de · 18 days ago

Very informative, but I’d change one small thing.

Why use the fast native PDF viewer ~~in the browser~~ when you could use a bloated and buggy JS app?

ornery_chemist@mander.xyz · 18 days ago

Fair, but certain corporate-mandated client-side PDF viewers are… bloatier. Though, I do like not having another window to manage when I open in browser, particularly when doing web searches. It pairs well with tab grouping extensions, and I generally don’t use markup, so no loss for me there.

Imgonnatrythis@sh.itjust.works · 19 days ago

That will be $30.12

AwkwardLookMonkeyPuppet@lemmy.world · 19 days ago

Every four weeks forever, or until you try to cancel and realize we’ve set the cancellation page up to just throw errors every time you get close to actually cancelling.

JackbyDev@programming.dev · 18 days ago

Oh boy, I sure am excited about websites hosting PDFs! I love when the tool that everyone uses for hosting and viewing HTML gets to be blessed with the perfect format that is PDF!

I LOVE PDFS! I love two column PDFs! I love reading like this!

1 3
2 4
5 7
6 8

Instead of like this

1
2
3
4
5
6
7
8

It’s amazing and such a good user experience!

I love that PDFs are so difficult to transform into HTML, too. I would never want to besmirch the publisher’s perfect, one approved layout by resizing the window!

keepthepace@slrpnk.net · 18 days ago

I love that PDFs are so difficult to transform into HTML, too

FYI, if that’s relevant to your field, every new article published on arxiv.org now has an HTML render as well.

And for many older publications, changing “arxiv.org” to “ar5iv.org” in the URL leads to a best-effort HTML rendering from an experiment they ran for a while.
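A minimal Python sketch of that host swap, assuming (as the comment describes) that only the host name needs to change and the rest of the URL stays the same. The function name `arxiv_to_ar5iv` is made up for illustration. The example uses an `/abs/` URL; whether other paths such as `/pdf/` also map across is not something the comment states.

```python
from urllib.parse import urlsplit, urlunsplit


def arxiv_to_ar5iv(url: str) -> str:
    """Replace the arxiv.org host with ar5iv.org, keeping the path intact."""
    parts = urlsplit(url)
    host = parts.netloc
    # Accept the bare and www. forms of arxiv.org; leave any other host alone.
    if host in ("arxiv.org", "www.arxiv.org"):
        host = "ar5iv.org"
    return urlunsplit(parts._replace(netloc=host))


print(arxiv_to_ar5iv("https://arxiv.org/abs/1706.03762"))
# -> https://ar5iv.org/abs/1706.03762
```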

JackbyDev@programming.dev · 18 days ago

That’s really cool! What I’d really like is a tool that converts PDFs to semantic HTML files. I took a peek there and it seems easier for them because they have the original LaTeX source.

I think for arbitrary PDF files the information just isn’t there. I’ve looked into it a bit and the results are all over the place. A tool called pdf2htmlEX is pretty good, but it makes the HTML look exactly like the PDF.

keepthepace@slrpnk.net · 18 days ago

Yes, PDFs are much more permissive and may not have any semantic information at all. Hell, some old publications are just scanned images!

PDF -> semantic seems to be a hard problem that basically requires OCR, like these people are doing

JackbyDev@programming.dev · 18 days ago

Oh nice, thanks for sharing that project. I haven’t heard of it before!

thevoidzero@lemmy.world · 16 days ago

Not just semantics. PDFs don’t even have segmentation into spaces, lines, or paragraphs. The text is just drawn at the positions where the word processor (or whatever other software produced the file) placed it. Many PDF editors simply look at how close the characters are to each other to group them together.

Going one step further, you can convert text to paths, which strips out even the glyph (character) and font information: every character becomes just a geometric shape. In that case you can’t even copy the text, and OCR is your only choice.

PDF is for finalizing something and printing/sharing without the ability to edit.
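To make the “closeness” idea concrete, here is a toy Python sketch of that kind of heuristic. It assumes the characters have already been extracted with their positions; the `Glyph` class and both threshold values are made up for illustration. It ignores rotation, multi-column layouts, and right-to-left text, so it is not how any particular PDF editor actually does it.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the character
    y: float      # baseline; larger y = lower on the page in this sketch
    width: float  # advance width of the character


def group_glyphs(glyphs: list[Glyph], line_tol: float = 2.0,
                 space_gap: float = 1.5) -> list[str]:
    """Rebuild text lines from positioned glyphs using proximity alone.

    line_tol:  max baseline difference for two glyphs to share a line.
    space_gap: horizontal gap above which a space is inserted.
    Both thresholds are arbitrary illustrative values.
    """
    lines: list[list[Glyph]] = []
    # Walk the glyphs top-to-bottom, then left-to-right.
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)   # close enough vertically: same line
        else:
            lines.append([g])     # start a new line
    text_lines = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        out = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            # A wide horizontal gap between neighbours is read as a space.
            if cur.x - (prev.x + prev.width) > space_gap:
                out.append(" ")
            out.append(cur.char)
        text_lines.append("".join(out))
    return text_lines


def word(text: str, x0: float, y: float, w: float = 5.0) -> list[Glyph]:
    """Hypothetical helper: lay out a word as fixed-width glyphs."""
    return [Glyph(c, x0 + i * w, y, w) for i, c in enumerate(text)]


glyphs = word("Hi", 0, 10) + word("there", 15, 10) + word("ok", 0, 22)
print(group_glyphs(glyphs))  # ['Hi there', 'ok']
```

Note that a naive top-to-bottom sort like this would read straight across both columns of a two-column page, which is one reason real extractors need more elaborate layout analysis.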

werefreeatlast@lemmy.world · 18 days ago

Choose your own adventure PDF! 1, 5, 7, 3, 9, 2, 0, 6, 4, 8! What an ending!

Bonsoir@lemmy.ca · 19 days ago

At least you can usually print them as PDF easily. My main issue is that the page title becomes “PDF.js Viewer - [Paper title]”.

ChaoticNeutralCzech@feddit.org · 19 days ago

If it’s PDF.js, it’s just Firefox’s PDF viewer (or a fork of it). I just right-click to “Show only this frame” and it goes fullscreen.

iAvicenna@lemmy.world · 18 days ago

well they have to justify the exorbitant amount of money they charge for publicly funded science articles (apart from the obvious reason of thinking about the shareholders)

FundMECFSResearch@lemmy.blahaj.zone · 19 days ago

I’ll just be happy it doesn’t ask me to make an account

IrritableOcelot@beehaw.org · 19 days ago

Truly. Also, the Springer Nature ones load so slowly for absolutely no reason, and break 10% of the time. I really don’t get what their motivation is. Do they think that after I’ve said no, I don’t want a web version, I will be happy with a different web version?

affiliate@lemmy.world · 18 days ago

nothing beats having to click the download button twice. it’s my favorite

18 days ago

I use SearXNG, and it has an option that automatically replaces links with ones that just give you the PDF, based on the DOI or whatever it’s called.

werefreeatlast@lemmy.world · 18 days ago

Or a virus. It could be an exciting virus.

Kcg@lemmy.ml · 18 days ago

PDF button? Or time to create an account to get a subscription to access that PDF!

keepthepace@slrpnk.net · 18 days ago

You are welcome.