Stripping is causing a problem.
-
I am finding bug in somebody else's code. This is data related. We have one record.
The name of business is "Nagesh Bakery & Café"
The line of code that is responsible for last character getting mangled like this is
stra = "{0}".format(i['Name'].encode('utf-8').strip())
This is in Python. If I remove the strip part of the code, I get another error. The error is
'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
So Nagesh has applied simple solution and replaced last character with regular "e".
Unfortunately this is not acceptable.
So how to keep the e with accent and have it too?
Side note - if English had won 100 years war, then we would not have to suffer this.
-
Try using AWK... I hear it can handle Unicode.
-
-
@Nagesh try making the string a unicode string. stra=u"{0}".format(...
-
@anotherusername said:
Try using AWK... I hear it can handle Unicode.
This is coding help and I am not appreciation of your joke.
-
Awkward...
-
@Nagesh Well, I don't see your "e with accent", only hieroglyphs:
-
@devjoe said:
@Nagesh try making the string a unicode string. stra=u"{0}".format(...
Now getting this error.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25
-
And just in case:
é
- é
é
- é
Yep, works both ways. NodeBB handles it just fine.
-
-
@DoctorJones Fuck you, I'm commuting right now. I think all other passengers think I'm positively mental right now for laughing my ass off at seemingly nothing.
I guess what I'm trying to say is: <3 , because an upvote is not enough.
Filed under: Ass, keyboard, ass. It's a perfectly good word, you censoring fuck.
-
@Onyx what can I say? I know my audience. ;-)
-
@Onyx said:
I guess what I'm trying to say is: <3 , because an upvote is not enough.
You can always <3 this post:
Filed under: unfortunately, you can't 3< this post
-
-
@Nagesh said:
Now getting this error.
So.... what's trying to convert it into ASCII? Tell it to stop, and you'll be fine.
I don't know much about Python, but if it's coercing it down to ASCII for some other reason (like, database column accepts only single-wide characters) there might not be anything you can do in this situation...
-
@Nagesh said:
if English had won 100 years war, then we would not have to suffer this
We beat the French on the Plains of Abraham in 1759, guess what we got for it...
-
@Tsaukpaetra said:
I don't know much about Python, but if it's coercing it down to ASCII for some other reason
assuming Python < 3 point.... 3ish i think it'll be coercing down to ascii because "Unicode is hard gaiz, can't i just use ascii like all the cool kidz?"
-_-
no, seriously.... that's rather why i stopped using Python. that and the fact that Python 3 was first released 2008-12-03 and it is now well into 2016, five years later and still MASSIVE parts of the python ecosystem are still stuck on Python 2, and most of them have no plans to upgrade.
edit: wow.... i can't math gud at all.... it's OVER SEVEN YEARS YOU SILLY FOX! not five years.
you are obviously too tired to think straight so i'm only going to say this once. @accalia, put down the keyboard and GO TO SLEEP.
-
@Nagesh Is this Python 2? Can you do a
repr(i['Name'])
on that line and tell us what it returns?
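In Python 2, repr() shows a u'...' prefix for unicode objects and a bare '...' for byte strings, so it answers the "which type is this?" question directly. A type check gives the same answer; here is a rough sketch. The helper name kind_of and the sample value (with a trailing U+00A0) are made up for illustration, not taken from the actual code.

```python
def kind_of(value):
    """Classify a value as 'text', 'bytes', or 'other'."""
    if isinstance(value, type(u"")):   # unicode on Python 2, str on Python 3
        return "text"
    if isinstance(value, type(b"")):   # str on Python 2, bytes on Python 3
        return "bytes"
    return "other"

# Hypothetical sample: the business name followed by a no-break space.
sample = u"Nagesh Bakery & Caf\xe9\xa0"

text_kind = kind_of(sample)                   # a real (unicode) string
bytes_kind = kind_of(sample.encode("utf-8"))  # already-encoded bytes
```

If i['Name'] comes back as "bytes", it has already been encoded somewhere upstream and needs a decode() before anything else.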
-
@smallshellscript said:
We beat the French on the Plains of Abraham in 1759
Who hasn't beaten the French? We did it in 1302.
-
-
Why are you making up countries?
-
@Nagesh
stra = i['Name'].strip().encode('utf-8')
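To show why the order matters, here is a small sketch. It assumes the name ends with a U+00A0 NO-BREAK SPACE, which is a guess based on the error message, not something confirmed in the thread. The explicit strip(b" \xa0") only emulates a byte-level strip that treats 0xA0 as whitespace; whether the real strip() does that depends on the platform.

```python
# Assumed sample value: the business name plus a trailing no-break space.
name = u"Nagesh Bakery & Caf\xe9\xa0"

# Strip while the value is still text, then encode to UTF-8.
good = name.strip().encode("utf-8")

# Encode first, then strip at the byte level. U+00A0 encodes as the two
# bytes C2 A0; removing only the A0 leaves a dangling C2 behind.
bad = name.encode("utf-8").strip(b" \xa0")

try:
    bad.decode("utf-8")
    bad_is_valid_utf8 = True
except UnicodeDecodeError:
    bad_is_valid_utf8 = False
```

Here good ends in the two bytes for "é", while bad ends in a stray 0xC2 and is no longer valid UTF-8, which would show up as a mangled last character.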
-
@anonymous234
Yes. 2.7
-
OK, so I don't know exactly what's happening. But:
- It's trying to encode character u"\xa0". That's U+00A0 NO-BREAK SPACE, so you probably have it in your data, you might want to sanitize that. Although I suppose that's what the strip() is for.
- "{0}".format(...) is unnecessary. It's equivalent to str(...), but you're already calling encode() which returns a string.
- i['Name'] should be a Unicode string (represented as u"something"). If it's a normal string, then that encode() is redundant and should carry a decode() first (yes, it's extremely confusing that you're allowed to encode an already encoded string). Unless you somehow changed the default encoding, but that's extremely uncommon and extremely discouraged.
- .strip() should be applied directly to the unicode string (before the .encode()).
In Python 2, mentally substitute str for bytes. unicode objects (u"...") are the real strings.
I suggest you either migrate to Python 3, which does Unicode Right™, or convert all your data to UTF-EBCDIC so that developers will have to explicitly encode and decode and you won't have any more problems of this kind.
-
@anonymous234 said:
Python 3, which does Unicode Right™
I'm not convinced that it does, but it definitely gets it more right than Python 2 ever did.