What... how... you made PDF text selection even more broken?



  • It's bad enough that copying text from a PDF usually results in an ill-formatted block of text... but this is just ridiculous.

    (animated gif)

    I can't even.



  • PDFs have never worked properly for selecting text or copying images. I always assumed that it was some insane form of anti-piracy...as if OCR software didn't exist or the print screen key wasn't on every keyboard ever.



  • It's probably an option. But the Preferences page has too many options - I can't be bothered.

    Plus, that's an old version of Reader. The current series is now "Reader DC". Not that the built-in check for updates will find that. But Firefox will identify the plugin as out-of-date and kick you over to the Adobe webpage (which, of course, wants to install 3rd-party shit).


  • FoxDev

    @anotherusername said:

    It's bad enough that copying text from a PDF usually results in an ill-formatted block of text... but this is just ridiculous.

    I can't even.

    damn.... now that is an abuse of the PDF formatting that i have not seen in years.

    i thought all the PDF generator apps had finally got it through their heads that it's a bad idea to obfuscate the text like that.



  • @anotherusername said:

    I can't even.

    I can't even see the problem. Someone wanna clue me in, please?



  • @dcon said:

    Adobe

    You found the Real :wtf:


  • Garbage Person

    There are raisins. I could explain them, but don't have the time ATM.


  • FoxDev

    @Weng said:

    There are raisins. I could explain them, but don't have the time ATM.

    /me sits and waits patiently.

    this should be good......



  • I've always assumed it's because of messed up Postscript. Something like each word being its own element and not being in the right order.


  • FoxDev

    @Dragnslcr said:

    I've always assumed it's because of messed up Postscript. Something like each word being its own element and not being in the right order.

    that would be it, and the reason for that more often than not is a postscript generator that's "copy Protecting" the PDF



  • @LB_ said:

    PDFs have never worked properly for selecting text or copying images. I always assumed that it was some insane form of anti-piracy...

    Not really. It's because PDFs were designed for layout and display, not editing. They don't have "paragraphs", they have text elements which can contain any length of text on one line. It's basically the equivalent of writing a whole document using <div style="position:absolute;white-space:nowrap;overflow-x:visible;"> elements that each contain 1 non-breaking line of text and have arbitrary x,y coordinates. That's fine if you are just trying to lay something out to print, but shit to try to go back and edit.

    Typically, there will be one text element for each line of text, and they'll follow one another in consecutive order, vertically down the page. They don't have to, though, and that's when you get stupid shit like this.

    @dcon said:

    It's probably an option. But the Preferences page has too many options - I can't be bothered.

    Plus, that's an old version of Reader. The current series is now "Reader DC".

    It's not an option. That's the default text select, and I doubt upgrading Reader would help.

    @accalia said:

    it's a bad idea to obfuscate the text like that.

    Other parts of the same PDF were fine, so ... I dunno.

    @NedFodder said:

    I can't even see the problem. Someone wanna clue me in, please?

    Note the position of the mouse cursor; I only highlighted text on one line. The first text I highlighted in that animation should have been

    of the Aprisa SR+ platform is its use of quadrature
    

    Instead it selected a vertical stripe of text.

    Also, even the column select tool is broken. It's possible to highlight just that rectangle of text:

    but the text pasted is:

    o
    of the Aprisa
    a SR+ platform
    m is its use o
    of quadrature
    

    Note that even if you take out the line breaks which shouldn't be there, the first/last character of each chunk of text is repeated when it shouldn't be.

    @Dragnslcr said:

    Something like each word being its own element and not being in the right order.

    That's pretty much exactly what this is, although it's not words per se; it's just short strips of text that are arranged in a down-then-across pattern.
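
For anyone who wants to poke at the two symptoms described in this post, here is a minimal Python sketch. It is not how Reader or any PDF library works: the `TextElement` class, its coordinate convention (y measured from the top of the page), and all three function names are made up for illustration. `reading_order` undoes a down-then-across stream by grouping strips on the same baseline and sorting left to right; `merge_overlap` drops a leading character that repeats the previous chunk's last character, which is the pattern in the column-select paste above. That second heuristic is lossy, since it would also eat a legitimately doubled letter that happens to straddle a chunk boundary.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    """One positioned run of text (hypothetical model of a PDF text strip)."""
    x: float  # left edge of the strip
    y: float  # distance from the top of the page (assumed convention)
    text: str


def stream_order(elements):
    """Naive extraction: concatenate strips in the order they were written."""
    return " ".join(e.text for e in elements)


def reading_order(elements):
    """Group strips that share a y value, then read each line left to right."""
    lines = {}
    for e in elements:
        lines.setdefault(e.y, []).append(e)
    return "\n".join(
        " ".join(e.text for e in sorted(lines[y], key=lambda e: e.x))
        for y in sorted(lines)
    )


def merge_overlap(chunks):
    """Join chunks, dropping a first character that repeats the previous last one."""
    out = ""
    for chunk in chunks:
        if out and chunk and out[-1] == chunk[0]:
            chunk = chunk[1:]
        out += chunk
    return out


# Placeholder strips written column by column (down, then across).
strips = [
    TextElement(0, 0, "A1"), TextElement(0, 12, "B1"),
    TextElement(40, 0, "A2"), TextElement(40, 12, "B2"),
]
print(stream_order(strips))
print(reading_order(strips))

# The pasted chunks quoted in the post above.
pasted = ["o", "of the Aprisa", "a SR+ platform", "m is its use o", "of quadrature"]
print(merge_overlap(pasted))
```

With the placeholder strips, stream order gives `A1 B1 A2 B2`, while reading order gives `A1 A2` and `B1 B2` on separate lines. Grouping on exact y values only works for tidy input like this; a real file would need some tolerance.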



  • Some Jstor pdfs do this. Very irritating, since you have to type out quotes instead of doing the sane thing.



  • According to the PDF properties, it was generated through some process of Microsoft Word -> Acrobat Distiller 9.3.2 (Windows), and the PDF version is 1.6 (Acrobat 7.x).



  • @anotherusername said:

    animation

    OK, I'm TRWTF, I have gif animation turned off.

    I used to produce lots of PDFs in grad school. My process used to be latex -> dvips -> ps2pdf, which produced a lot of the same problems everyone here is complaining about. Then I switched my process to just pdflatex, which produced files that you could actually select text in.



  • @anotherusername said:

    It's not an option. That's the default text select, and I doubt upgrading Reader would help.

    That was my other thought - dumb generation.


  • :belt_onion:

    :wtf: I'll pull a Blakeyrat here and ask what kind of timepod the creator's living in, since Word (since 07 IIRC) has natively been able to export PDFs without that kind of crap as a side-effect...



  • This doesn't answer your question, but the creator's a major engineering firm.


  • ♿ (Parody)

    @anotherusername said:

    Not really. It's because PDFs were designed for layout and display, not editing. They don't have "paragraphs", they have text elements which can contain any length of text on one line. It's basically the equivalent of writing a whole document using <div style="position:absolute;white-space:nowrap;overflow-x:visible;"> elements that each contain 1 non-breaking line of text and have arbitrary x,y coordinates. That's fine if you are just trying to lay something out to print, but shit to try to go back and edit.

    I remember a project a while ago where we needed to get the data out of a ton of PDFs. They were all reports directly from some automated system (as in, not scanned from a dead tree version). So they all looked pretty much identical. But...they weren't. A lot of stuff was juuuuust a few pixels off.

    So it was much more complicated to extract the data than it seemed like it should have been.
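
A common way to cope with "a few pixels off" is to cluster items into rows with a tolerance instead of matching coordinates exactly. The sketch below assumes the text has already been pulled out as `(x, y, text)` tuples by some other tool. The function name `group_rows`, the 3-unit default tolerance, and the sample values are all invented for illustration and are not taken from the project described above.

```python
def group_rows(items, tolerance=3.0):
    """Cluster (x, y, text) items into rows.

    An item joins the current row if its y is within `tolerance` of the
    y of the first item placed in that row; otherwise it starts a new row.
    Cells inside each row are returned in left-to-right order.
    """
    rows = []
    for x, y, text in sorted(items, key=lambda item: item[1]):
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Made-up report fragment: the two cells of each row sit at slightly different y.
items = [
    (10, 100.0, "Name"), (200, 101.5, "Total"),
    (10, 120.4, "Widget"), (200, 119.0, "42"),
]
print(group_rows(items))
```

Comparing against the first item in the row keeps the code short, but a long row that drifts steadily could still split; comparing against a running average is one alternative.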



  • I ran across a publisher's website recently where teachers could buy worksheets and other stuff pertaining to teaching.

    In order to buy the worksheets (and the corresponding solutions) you had to go through a rigmarole of registration - send them an official document that, yes, I am indeed a teacher and all that.

    However, they also had a preview button. Which gave me the whole PDF plus solutions, just with a "PREVIEW" watermark.

    I'm not quite sure what they were thinking.


  • :belt_onion:

    Well it's fine because you can only buy the solutions if you're a teacher. And if you're a teacher and trying to print it out, you'll have PREVIEW watermarked all over the page.

    What, you think I'm missing something?



  • They're thinking that the only people who would be interested are people who will buy and redistribute and won't want the watermark. There are plenty of better sources of information if all you want is the information.


  • :belt_onion:

    However, there are few better ways to cheat...



  • All the (four) textbooks I've written come with full solutions in the back. Cheaters only cheat themselves.


  • :belt_onion:

    @Captain said:

    All the (four) textbooks I've written come with full solutions in the back. Cheaters only cheat themselves.

    Depends on the book. I've never seen one with all the solutions in the back, but I have seen a few with the odd-only answers.



  • @dcon said:

    The current series is now "Reader DC". Not that the built-in check for updates will find that.

    Oddly enough, the updater that runs in the background on startup did offer the DC download, but I dismissed it. The 'Check for updates' option in the Help menu did not.



  • Along those lines, has anybody else ever tried to copy and paste text out of the Kindle Cloud reader? Seems Amazon really doesn't want you to do that. Last time I tried, I think every letter was in its own <div> or something. The most workable thing I found was to get a whole big blob of stuff in dev tools, then copy it out as HTML, and use a text editor find/replace to strip the HTML out.
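
The find/replace step can be scripted as well. This is a rough Python sketch that assumes the copied markup is simple, with one letter per `<div>` as described; the helper name `strip_tags` is made up. A regex like this is not a real HTML parser and will trip on `>` inside attribute values, comments, or scripts, so treat it as the moral equivalent of the text-editor approach, not as anything robust.

```python
import html
import re

# Anything between "<" and the next ">" is treated as a tag (crude assumption).
TAG = re.compile(r"<[^>]+>")


def strip_tags(markup):
    """Delete tag-like spans, then decode entities such as &amp;."""
    return html.unescape(TAG.sub("", markup))


print(strip_tags("<div>H</div><div>i</div><div>&amp;</div>"))
```

For anything messier, the standard library's `html.parser` module is the safer route.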


  • kills Dumbledore

    @ufmace said:

    copy it out as HTML, and use a text editor RegEx find/replace to strip the HTML out

    TPTFY


  • Discourse touched me in a no-no place



  • Oh wow. Okay, so I've upgraded to Reader DC, and pressing any key fucks up the text selection tool so badly that it doesn't work until you exit and restart Reader. As in, you can't select anything with the mouse anymore.

    Needless to say, it's impossible to use the column select tool. To use the column select tool, you have to hold Alt while selecting text, and holding Alt is pressing a key, and pressing a key breaks it.

    I'm wondering if just my installation is fucked or if it's actually Reader DC... oh well, time to downgrade.

    :wtf:

