LibreOffice: Still a WTF



  • After noticing an error on my resume/CV, I had this crazy idea to try to fix it. It was originally created in Word 2007 and saved as Word 2000/2003/97 or whatever. I know it opens and saves fine in Word, because I've changed details as needed.

    I made the evidently stupid mistake of putting my one PC with Windows/Office installed in storage, meaning I'm stuck with this laptop running Mint 13.

    Silly me, I had this silly idea in my head, possibly planted by alien telepathy, that LibreOffice/OpenOffice was "on par" with Microsoft Word. "Well, I guess I can open it with LibreOffice," I told myself naively. I can open it in LibreOffice, but at some point in the conversion process it appears to misinterpret the instructions for bullets and numbering to mean "make the entire document look like a monkey ate and passed a box of crayons and smeared the resulting shit on the document". Absolutely none of the layout was correct. Paragraphs overlapped each other, and everything merged underneath the header, which is where I wanted to make the change, and I couldn't edit the header. I managed to make the change using find/replace, but upon saving, the document was half the size, which I'm pretty sure means that it saved the monkey crayon feces version.

    So I tried AbiWord. It was able to read the text. Everything that wasn't on a bullet, in a table, or in a header, at least. So I got a blank page. Then somebody suggested Google Docs, which worked the best out of all of them, but it still decided to randomly remove bullets and changed the font to some disgusting serif font, and I've no idea how to fix that.

     Is making a word processor really this hard? Do people who use Linux ever actually, you know, make any sort of document that doesn't start with a hashbang? Am I possibly missing something here?

     Anyway I'm basically left with doing the entire thing completely over again, which would be a pain in the ass with Word but is even more so in the abortion of a word processor I'm stuck with.

     

    EDIT: Also: Anybody have any suggestions for Word Processors that run on Linux that don't suck ass?



  • Wine + Microsoft Office, dude.

    Anyway, yes, writing a word processor really is that hard. You'd think that it would be easy to just put some text on the screen. But it's not just "some text on the screen"; it is a lot of intermixed formatting rules, layout rules, line-breaking rules and whatnot.



  • @henke37 said:

    Wine + Microsoft Office, dude.

    I may try that. Never even occurred to me. Thanks. I tend to avoid WINE for a few reasons but this may be a case to make an exception.

     

     

    Anyway, yes, writing a word processor really is that hard. You'd think that it would be easy to just put some text on the screen. But it's not just "some text on the screen"; it is a lot of intermixed formatting rules, layout rules, line-breaking rules and whatnot.
     

    To be fair, LibreOffice and OO seem to work OK for creating basic documents, or opening basic documents for that matter (it was able to open my cover letter just fine). But I think a lot of it boils down to the fact that I'm not really interested in how hard it is to make something if it doesn't work. If it works for what I need, sure, I might find some trivia about how they implemented some feature interesting, but when it doesn't work, "It's hard" merely sounds like an excuse.

    What do hardcore FOSS advocates use to write their resumes? Or do they just hold potential employers under their armpits until they give them a job? I would imagine WINE+MSOffice is not on the table for them, because of their religion. (emacs?)



  • @BC_Programmer said:

    What do hardcore FOSS advocates use to write their resumes? Or do they just hold potential employers under their armpits until they give them a job? I would imagine WINE+MSOffice is not on the table for them, because of their religion. (emacs?)

    If they're applying to a job where demonstrating that they can use LaTeX is going to impress the employer, they use LaTeX. (And they might use it anyway, although in that case they're making a point that nobody cares about.) And they write the LaTeX in their favourite text editor (or possibly LyX if they don't know it and are just pretending).

    Otherwise, they use LibreOffice and generate a PDF (to make absolutely sure that the HR drone receiving it can open it). Its problems with opening Word files are at least as much to do with Word's output routines being incomprehensible as with its own input routines, so things written in LO from the start tend to work quite well. (Historically, it's been a bit of a crapshoot trying to open Word files; there are at least some instances of LO opening them better than Word does, although for recent Word formats, Word generally wins out. You should see what a mess Office can make of LO files, btw, if you want an argument in the other direction.)
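
    For anyone who wants to script the "write it in LO, send a PDF" route, here is a minimal sketch. It assumes LibreOffice is installed and reachable on PATH as `soffice` or `libreoffice`; the function names and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build the LibreOffice command line for a headless PDF export."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(doc)]


def convert_to_pdf(doc: Path, out_dir: Path) -> Path:
    """Run the export and return the path where the PDF is expected to appear."""
    # Assumption: the launcher is called "soffice" or "libreoffice" on this machine.
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice does not appear to be on PATH")
    subprocess.run(build_convert_command(doc, out_dir, binary), check=True)
    return out_dir / (doc.stem + ".pdf")
```

    Something like `convert_to_pdf(Path("resume.odt"), Path("out"))` would then be the whole job. Whether the layout survives is still up to the importer if the source file came from Word.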

     



  • @BC_Programmer said:

    Is making a word processor really this hard?

     

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages!

     

    And as far as opening up old files, Microsoft Word can do a great job of fucking up files from older versions without any help from open-source. 

     

    If you want a reasonably portable, editable file format then use RTF. That's what it was designed for. The Word .doc format started from essentially a binary copy of the in-memory data structures used by the program.



  • Trying to work with the .doc format, which seems to have been set up using the "Let's make this as difficult as possible" design philosophy, is not just hard, but impossible.

    And their .xml format - basically "Let's make .doc .xml-y" - doesn't make it much better. There is a 'standard', but even MS doesn't follow it. And the standard includes such lovely language as "Do this like Office97 did". 

    So, basically, Office stores your documents as scribbled-crayon-contaminated-monkey-faeces. FIFO, it seems!



  • @havokk said:

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages!

    True; but the C# ECMA spec is 553 pages and Mono didn't seem to have as much trouble. (This is probably an apples/oranges comparison in some respects, though.)

     

    And as far as opening up old files, Microsoft Word can do a great job of fucking up files from older versions without any help from open-source.

     

    I've yet to see this. Though I certainly don't doubt it. Office 2007 seemed able to handle a few old Word 6.0 files I found. Though they are so old I don't know what they are supposed to look like anyway.

     

    I'm just miffed because as it is now I'm pretty much fucked if I need to change them for the moment (seems my Office discs are in storage now too :/). I suspect simply replacing the piece of WordArt with an embedded image would allow OO to read it properly. (It would probably look better that way regardless.)

     

     

     



  • @henke37 said:

    Wine + Microsoft Office, dude.

    Anyway, yes, writing a word processor really is that hard. You'd think that it would be easy to just put some text on the screen. But it's not just "some text on the screen"; it is a lot of intermixed formatting rules, layout rules, line-breaking rules and whatnot.

    Office Online / Office 365 / whatever it's called now would be quicker and easier.



  • @havokk said:

    And as far as opening up old files, Microsoft Word can do a great job of fucking up files from older versions without any help from open-source.

    @robbak said:

    So, basically, Office stores your documents as scribbled-crayon-contaminated-monkey-faeces. FIFO, it seems!

    Slashdot's leaking again.



  • @havokk said:

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages!
    Come to think of it, I once used a document from Microsoft (for which you need to sign an NDA, god forbid that somebody would tell the world what a crap format it is) to write a Word importer. It's not a nice format. Apart from the pure binary structure, the blocks of data are all over the place. You need to open it as a random access file and jump all over the place. Obviously, this information was not present in the document they sent us, presumably because they supposed that we'd be using some Microsoft library to open the files.
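
    To give a feel for the "jump all over the place" part: a Word 97-2003 .doc is an OLE2 compound file, a container whose streams are stored as chains of fixed-size sectors that need not be contiguous. Below is a minimal sketch of the very first step only, recognising the container and working out where a sector lives. The function names are illustrative, and this is nowhere near a Word importer.

```python
import struct

# Signature found in the first eight bytes of every OLE2 compound file.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def sector_size(header: bytes):
    """Return the sector size in bytes, or None if this is not an OLE2 container."""
    if len(header) < 32 or header[:8] != OLE_SIGNATURE:
        return None
    # The sector shift is a little-endian uint16 at offset 30; size = 2 ** shift.
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    return 1 << sector_shift


def sector_offset(sector_index: int, size: int) -> int:
    """File offset of a sector; the header occupies the first sector-sized slot."""
    return (sector_index + 1) * size
```

    Reading an actual stream means following the allocation table from sector to sector and seeking to each offset in turn, which is why a sequential read does not get you far.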

     


  • @BC_Programmer said:

    Is making a word processor really this hard?
    If the word processor concerned is not called Word, and you're handing it a .doc that was created in Word, yes.



  • @BC_Programmer said:

    @havokk said:

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages!

    True; but the C# ECMA spec is 553 pages and Mono didn't seem to have as much trouble. (This is probably an apples/oranges comparison in some respects, though.)

    The ECMA C# specification is actually decently standardized, is publicly available and has been open for comments before every version.
    The .doc specification is Microsoft internal and only viewable for outsiders by signing an NDA. Furthermore it's a legacy and crappy mess that has things like "If this then parse with the Office 2000 code".

    OpenXML (the 2007/2010 format) is somewhat better, but still contains a lot of crap.
    It was first specified by ECMA in 2006, later by ISO in 2008. ISO 29500 has a transitional and a strict format. The transitional one is similar to the ECMA variant and the one that Microsoft originally submitted, but it was rejected because, among other things, it allows the inclusion of binary parts according to the (proprietary) Office 97-2003 spec. So that means the ECMA/Transitional format is complete bullcrap.
    The strict variant is relatively decent, but still not supported by Office 2010 (which came out 2 years after the standard was finalized). Most open-source implementations can read ISO 29500 strict relatively fine, but since Office 2010 doesn't generate it, you still have the compatibility issues. I don't know if Office 15/2013 will (Microsoft did promise it would in 2010) and whether it will be the default (it definitely won't be if you convert a 97-2003 file).
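
    If you want to know which flavour a given .docx actually is, the two variants use different XML namespaces, so a rough check is possible. This is a sketch under assumptions: it takes the main part to be at the conventional path `word/document.xml` (a proper reader would follow the package relationships), and the function name is made up.

```python
import zipfile
import xml.etree.ElementTree as ET

TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def ooxml_flavour(docx) -> str:
    """Classify a .docx (path or file-like object) by the namespace of its main part."""
    with zipfile.ZipFile(docx) as archive:
        # Assumption: the main document part sits at the conventional location.
        root = ET.fromstring(archive.read("word/document.xml"))
    # ElementTree spells a namespaced tag as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```
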

    Note that the ODF spec isn't that much better. The first 2 versions (from 2005, aka OASIS 1.0 and ISO 26300:2006, and from 2007, aka OASIS 1.1) contain major faults making them practically unusable (both Microsoft and OpenOffice say so) and thus have never been used in OpenOffice or other software; they used variants*. A usable one (OASIS 1.2) was defined in 2011 and OpenOffice/LibreOffice and other OO software conform relatively well to it. Microsoft said it will support OASIS 1.2 in Office 15/2013, so hopefully OpenDocument will finally become a usable interchange format.

    * A lot of important things are unspecced, e.g. various offsets and sorting order. Most Open source packages followed the de-facto (i.e. OpenOffice) way, but Microsoft Office often did things otherwise, most say deliberately. Fact remains that the spec is at fault for leaving this kind of room for interpretation.



  • @dtech said:

    The ECMA C# specification is actually decently standardized, is publicly available and has been open for comments before every version.
    The .doc specification is Microsoft internal and only viewable for outsiders by signing an NDA. Furthermore it's a legacy and crappy mess that has things like "If this then parse with the Office 2000 code".
     

    I don't have to sign an NDA to view or download the files from [url=http://msdn.microsoft.com/en-us/library/cc313118.aspx]MSDN[/url]. Seems it's now almost 600 pages. I can't find anything like "if this then parse with Office 2000 code". Maybe that was cleaned up.

     

     



  • @havokk said:

    @BC_Programmer said:

    Is making a word processor really this hard?

     

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages

    600+? Well, good thing you're not trying to write an Office Open XML importer, because its specification is 6546 pages long:

    [img]http://blog.janik.cz//images/OOXMLSpec.png[/img]



  • On the original topic, might the problem be that your old computer used some fancy font which is not installed on your laptop?

    Also, I know some Linux distributions are pretty strict about not installing non-OS stuff, so your laptop may very well be missing some essential font, which gets replaced with an improper substitute when you open the document.

    No comment on Google Docs, I don't use it that much.



  • If you're not opening a .doc on the same version as it was written, chances are it could look like crap. I've had documents come through with colours all crazy, just because it was written in a different version of Word to the one I was using, and you'd think something like colours is a simple thing to get right?

    As for the bullets problem, I recently received a .docx file that had been edited by two different versions of Word, and the bullet styles came out completely messed up when it finally got to me, with some bullets inserted as bullet characters and tabbed indentation on a regular line of text and some left as actual list items. All parties used the list button to create them, so it's just some WTFery coming from how different versions deal with the bullets. If even Word can't get this right when they're in full possession of the spec, how is anyone else meant to be able to deal with this level of crap when there are things not even in any official spec for them to refer to?



  • Even SAME versions of Word can be troublesome. 

    When I was working on my master's thesis, I began working in the Word-version-of-the-day (Office XP, I think). I had section breaks, column breaks, table of contents, index ... the whole she-bang. I was using style-based formatting.

    I passed it to my advisor who was annotating it with comments and such. He was using the same version of Word that I was.

    When he passed it back to me, the document was a mess ... the section breaks were all fouled up, the index and table of contents were both completely and utterly broken. And the styles had been undone. He swore all he did was add the comment annotations.

    I spent a week trying to fix it in-situ.

    I gave up and went to LaTeX. I would then give him a PDF to mark up using Acrobat.



  • Been there, done that. 

    Is there any chance that your advisor has a different make/model of PRINTER than you do? 

    I've seen a lot of documents f***ed by Word just because of this.



  • @drclaw said:

    I've seen a lot of documents f***ed by Word just because of this.

    I can attest to this as well, especially in documents that have carefully formatted full-pagewidth tables.



  • Do you want me to get into the whole "specs are meaningless for products like spreadsheets and word processors" thing again?



  • The real problem is that the specs are not pixel exact and do in fact not even attempt to specify exact rendering algorithms.



  • Is now a good time to point out that margins and layouts in Word used to be (if they aren't still, for old formats) set based on, and impacted by, the page margins of your default printer?

    Yes.

    Your printer.



  • @BC_Programmer said:

    What do hardcore FOSS advocates use to write their resumes?
     

    HTML.

    Or some other editing tool, but the final output is saved as PDF. 



  • I keep trying to be productive on Linux too, particularly as I am developing Linux to go onto an embedded device.... but it's just not productive. I just do what I have to do (generally on Ubuntu) and then get the crap out. It has no nice Exchange clients, no Office Word/Excel, no nice code editors, no InDesign, no nice charting tools, and PCB software is 7 years backwards. Web browsers not so bad, streaming music OK, scripting OK but everything I need to script is on Windows. Apart from the kernel/TCP/IP stack, the rest (GUI) is valueless without nice apps.... maybe this is why they give it away?

    Why does it take about 4 days to get a new system up and running with drivers settled? All 4 of my dev PCs have different driver issues.



  •  ObvDeviation: a PowerPointless 2007 document displayed different bullets when viewed in the PP Viewer - for some reason the bullets were a white 10 in a black blob.

    We put it down to the difference in typefaces installed across different machines. Installed full-blown PP and it rendered correctly, so we theorised the full package brought some other fonts along as part of the install.

    (Was cheaper to get another licence for the full package rather than get a techie to fathom out what was wrong with it, given that other people had similar issues and the problem didn't look like it was going to be solved anytime soon.)



  • The Linux desktop experience is a stuttering clusterfuck.  Completely useless unless you are desperate for something to waste your time on.



  • @zelmak said:

    I passed it to my advisor who was annotating it with comments and such. He was using the same version of Word that I was.

    When he passed it back to me, the document was a mess

     

    I don't...

    what do you mean by "passed"?

     


  • @Helix said:

    I keep trying to be productive on Linux too, particularly as I am developing Linux to go onto an embedded device.... but it's just not productive. I just do what I have to do (generally on Ubuntu) and then get the crap out. It has no nice Exchange clients, no Office Word/Excel, no nice code editors, no InDesign, no nice charting tools, PCB software is 7 years backwards.

    Have you considered upgrading the 512MB flash to a 2GB card?

    @Helix said:

    Web browsers not so bad,

    Remind me to surreptitiously get Lynx into our next build... all we have at the moment is curl and wget.

    @Helix said:

    streaming music ok,

    You have speakers?



  • Bullets and Numbering in Word has been broken since at least Word 97 (possibly earlier) and hasn't really been fixed since. Trying to set up anything even slightly different from the supplied settings can cause utter mayhem in your document. The Bullets and Numbering dialog in Word 2010 sits atop the same screwed-up code it always did, and can screw things up in exactly the same way we've known and loved all these years.

    So if you plan to use bullets in a Word document which you plan to attempt to open in any other WP application, you'd genuinely be better off setting a hanging indent manually and then pasting the appropriate bullet character (if necessary, making just the bullet character Wingdings font so you can use a tolerable glyph). That will give you, and your OO WP application, much less grief. But as others have said, it will still to some extent be a crapshoot.

    And though I don't often agree with blakey, he's spot on when he suggests saving the document as RTF. It's still a horrible, verbose file format, but at least RTF is a much more comprehensible and better understood horrible, verbose format than the block-wandering binary WTF-fest that is Microsoft DOC format. That is all.
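
    As a rough illustration of the "hanging indent plus a literal bullet character" approach in RTF, here is a minimal sketch. It assumes plain ASCII item text, the function names and default indents are invented for the example, and it produces only the bare minimum of a document rather than anything Word itself would write.

```python
def rtf_escape(text: str) -> str:
    """Escape the three characters RTF treats specially (assumes ASCII text)."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def bullet_list_rtf(items, left_twips: int = 720, hang_twips: int = 360) -> str:
    """Return a small RTF document whose bullets are plain characters on hanging-indent paragraphs."""
    # \li is the left indent and \fi the first-line indent (negative = hanging), both in twips.
    # \tx puts a tab stop at the text position so the item text lines up after the bullet.
    paragraphs = "".join(
        "\\pard\\li%d\\fi-%d\\tx%d \\bullet\\tab %s\\par\n"
        % (left_twips, hang_twips, left_twips, rtf_escape(item))
        for item in items
    )
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n\\f0\\fs22\n" + paragraphs + "}"
```

    No list machinery is involved at all: each item is an ordinary paragraph, so there is nothing for an importer to misinterpret beyond indents and a tab.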



  • @spamcourt said:

    @havokk said:

    @BC_Programmer said:

    Is making a word processor really this hard?

     

    Yes, it is. The spec doc for the Microsoft Word ".doc" file format is 600+ pages

    600+? Well, good thing you're not trying to write an Office Open XML importer, because its specification is 6546 pages long:
     

    To be fair, 6000 is included in 600+. And yes, that is an actual photo of the spec; it became famous once it was taken in a court. But when looking at it, keep in mind that THIS IS NOT COMPLETE. There are hundreds of references to internal specs (or at least we hope the Office team actually have the specs) not covered there.

     

    About my CV: at the time it made any difference, I used to write it in LaTeX. Much better presentation than Word (I had nothing against writing it in Word, except that it can't handle formatting text, and makes your document look like garbage). Nowadays I simply don't care. The last time I wrote a CV, it was text only, in the body of an email.

     



  • @Cad Delworth said:

    The Bullets and Numbering dialog in Word 2010 sits atop the same screwed-up code it always did, and can screw things up in exactly the same way we've known and loved all these years.
    Burn me once . . .

    I got lambasted by Blakey [url="http://forums.thedailywtf.com/forums/p/25555/277043.aspx#277043"]some time back[/url] for not wanting to use the Word master documents feature any more after getting burned by it in Word 97.  He declared that I had a "mental illness" for not wanting to use Word's "time-saving features".  If Microsoft isn't going to fix a commonly-used feature like bullets, how am I supposed to have any faith that they've fixed the problems with master documents that corrupted my document?

    I've used OpenOffice and LibreOffice very successfully, thank you.  My daughter has used them for schoolwork even when they needed to be e-mailed to the teacher.  You can use a Craftsman, Bosch, DeWalt . . . their interfaces are all a little different, but they all get the job done.  Just like Microsoft Office, OpenOffice, and LibreOffice.



  • @BC_Programmer said:

    ...EDIT: Also: Anybody have any suggestions for Word Processors that run on Linux that don't suck ass?

    No.

    But: if you really want to edit your screwed-up .doc document from a platform with no real support for them (like Linux), then get yourself a Microsoft Live account and load the .doc into SkyDrive, then edit it with its Word Web App. It'll still be a screwed-up .doc document, but you'll be able to edit it just like on Windows and it won't be screwed up any more than if you edited it in Word on Windows.



  • @zelmak said:

    Even SAME versions of Word can be troublesome. 

    My experience (with mid-90s versions of Word, mind) is that this is often to blame on printer drivers: if you chose a different printer or opened the document on another computer that had different printer drivers installed, Word would repaginate the document. This almost invariably led to hard page breaks near the bottom of pages moving onto the next page, causing (almost) completely blank pages and throwing off the table of contents, because you had to actively tell Word to update that.

    Of course, the real WTF is using a word processor for page layout, or at least distributing word processor documents on the assumption the layout will come through correctly. (Something I've been guilty of myself, in the days before I found software to create PDFs on those old Twilight CD-ROMs.)
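
    The repagination effect described above is easy to reproduce with a toy model. This is only a sketch: it treats a page as a fixed number of text lines and is not how Word actually lays out pages; the names are invented for the example.

```python
# Marker for a hard (manual) page break in the toy document.
PAGE_BREAK = object()


def paginate(items, lines_per_page):
    """Split a flat list of text lines and PAGE_BREAK markers into pages."""
    pages, current = [], []
    for item in items:
        if item is PAGE_BREAK:
            # A hard break always ends the current page, however full it is.
            pages.append(current)
            current = []
            continue
        if len(current) == lines_per_page:
            # The page is full: the line overflows onto a new page.
            pages.append(current)
            current = []
        current.append(item)
    pages.append(current)
    return pages
```

    Take a 50-line chapter followed by a hard break and a 10-line chapter. With 50 lines per page it comes out as two pages; on a printer whose printable area only fits 49 lines, the last line of chapter one spills over, the hard break follows immediately, and you get a page holding a single line.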



  • @Cassidy said:

    @BC_Programmer said:

    What do hardcore FOSS advocates use to write their resumes?
     

    HTML.

    Or some other editing tool, but the final output is saved as PDF. 

    LATEX


  • @Gurth said:

    My experience (with mid-90s versions of Word, mind) is that this is often to blame on printer drivers: if you chose a different printer or opened the document on another computer that had different printer drivers installed, Word would repaginate the document. This almost invariably led to hard page breaks near the bottom of pages moving onto the next page, causing (almost) completely blank pages and throwing off the table of contents, because you had to actively tell Word to update that.

    Of course, the real WTF is using a word processor for page layout, or at least distributing word processor documents on the assumption the layout will come through correctly. (Something I've been guilty of myself, in the days before I found software to create PDFs on those old Twilight CD-ROMs.)

    I encountered that with LO too.

    My computer runs Linux. I used Calc to make a basic spreadsheet (only because it's less hassle than playing with tables in Writer; it was just for some mundane text data that has to be filled out). My coworker opened it on his Windows computer in Calc and, for some reason, all the cells ended up being taller, resulting in the table breaking onto a new page. We both use the same printer (it's networked), both using the official HP drivers for our respective OSs.

    I blamed the font at first but I installed it on the Windows machine and it works properly (except looking awful on the screen, don't know if it's the font or Windows version of LO but anti-aliasing seems to be busted).

    I wonder if default margin values are different on Windows and Linux for some reason? I didn't fiddle with it, and I know my coworker didn't either. Might have to check that.

