C strings
-
Can I do stuff like this with these format strings?
Yes. I'd hope it can do everything printf can? Placeholder width, precision, truncation. Does it include time formatting?
Sadly, no. Also, the bigger problem is that the format string must be provided at compile time.
-
It's an OS limitation.
Hmmm. I thought some of those limits aren't true under NTFS but am not an expert.
http://technet.microsoft.com/en-us/library/cc781134(v=WS.10).aspx says file names (not paths) can be 255 Unicode characters, FWIW, and I thought that the native functions can get around the limits, but admittedly most people won't use those.
-
I wish Tuxera hadn't taken down their excellent NTFS internals page as it makes finding this information difficult.
...
According to some other guys, NTFS stores the length of the name in a single byte, and that length counts characters, not bytes as I'd originally believed. So yes, 255 Unicode characters per path element.
-
Like I said, I thought I had read years ago that if you use the native API instead--remembering that Windows is, at this level, a personality over NT, like Posix, or the old OS/2 subsystem--the length and character restrictions go away, but I couldn't find anything to back that up on a quick search just now, so it could well be wrong.
-
Like I said, I thought I had read years ago that if you use the native API instead--remembering that Windows is, at this level, a personality over NT, like Posix, or the old OS/2 subsystem--the length and character restrictions go away, but I couldn't find anything to back that up on a quick search just now, so it could well be wrong.
Precede the filename with \\?\ to make the length restrictions go away. It says so right here…
[spoiler]Top hit for “windows long file names prefix”…[/spoiler]
-
Precede the filename with \\?\ to make the length restrictions go away.
It wasn't just that. Like I said, I had thought you could get rid of the restricted characters limit, too.
-
\\?\ allows you to have files that start/end with dots/spaces and have reserved device names (NUL, etc.).
But you still can't use the reserved characters (such as *) via win32 apis.
-
\\?\ allows you to have files that start/end with dots/spaces and have reserved device names (NUL, etc.). But you still can't use the reserved characters via win32 apis.
If it mattered that much, to unreserve the characters you could use the \\?\ (or like I said, I thought the Native API could do it, although of course that's mostly undocumented) instead of the win32 apis.
Time to switch to the Bad Ideas thread?
-
Better to use ContainsKey for that anyway. Nulls make better sense as an optional extra than as default functionality; 99% of variables never need to hold a null, and having to wrap the ones that do in Nullable<T> serves the extra purpose of pretty much forcing a null check before anyone can even get at the value.
-
NAME
strlen - calculate the length of a string
SYNOPSIS
#include <string.h>
size_t strlen(const char *s);
DESCRIPTION
The strlen() function calculates the length of the string s, excluding the terminating null byte ('\0').
RETURN VALUE
The strlen() function returns the number of bytes in the string s.
Everything works according to spec. Again, it's programmer's fault to assume the length of string is number of characters in it.
I've emboldened the ambiguous parts. "length of string" can mean quite a few things:
- number of characters (counting combining characters as separate characters)
- number of screen spaces occupied (combining characters don't contribute to the result)
- number of "screen cells" occupied (result += 1 for most characters, but result += 2 for wide characters)
- number of bytes used in memory for the string in the particular encoding/normalization it is represented in (this seems to be what strlen() calculates)
- ... ?
I also wouldn't assume anything wrt how strlen() behaves with strings where \0 is a valid part of the encoding of some characters; counting bytes until the first \0 is incorrect for encodings that have those.
Of course the programmer shouldn't use strlen() unless what he's interested in is what strlen() calculates, especially when dealing with strings that are not entirely composed of US-ASCII characters.
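To make the difference between two of those meanings concrete, here is a minimal sketch that compares the byte count from strlen() with a code point count. It assumes the string is valid UTF-8, and utf8_code_points() is a made-up helper name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): counts UTF-8 code
 * points by skipping continuation bytes, i.e. bytes of the form 10xxxxxx.
 * Assumes s points to a valid, null-terminated UTF-8 string. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "caf\xc3\xa9" (the word café in UTF-8), strlen() gives 5 while this helper gives 4. A combining sequence such as "e\xcc\x81" is 3 bytes and 2 code points, yet occupies a single screen cell, which neither function can tell you.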
-
I've emboldened the ambiguous parts. "length of string" can mean quite a few things:
Don't look at the description but at the return value, because your code doesn't deal with the description but with the return value. And the return value is unambiguously documented as the number of bytes.
I also wouldn't assume anything wrt how strlen() behaves with strings where \0 is a valid part of the encoding of some characters; counting bytes until the first \0 is incorrect for encodings that have those.
Except ISO C forbids \0 in text strings.
Of course the programmer shouldn't use strlen() unless what he's interested in is what strlen() calculates, especially when dealing with strings that are not entirely composed of US-ASCII characters.
strlen() is usually used to determine how far you can iterate from the pointer or how much memory you need to allocate when copying rather than how much screen space will be occupied.
-
I also wouldn't assume anything wrt how strlen() behaves with strings where \0 is a valid part of the encoding of some characters; counting bytes until the first \0 is incorrect for encodings that have those.
Which is why you use wcslen() or mb_strlen() instead. Right tool for the right job.
-
strlen() ... dealing with strings that are not entirely composed of US-ASCII characters
That sounds like UB to me.
-
Maybe. Depends on whether there is a null terminator at all. So long as you have the null terminator inside valid memory, it's not.
First, one must understand what a string means in C. A string is a null-terminated sequence of bytes. It is not a sentence, or text, or anything else. It's not whatever the programmer thinks he's passing to the function. If you use a multibyte encoding that allows '\0' bytes, the first C string in memory will run from the address you gave the function to the first '\0' it encounters. If the first byte is '\0', it's considered to be an empty string of length 0. So, if you have the array:
{ 'a', 'b', 'c', 0, '1', '2', '3', 0 }
You have two c strings, one starts at 'a', and is of length 3 (exclude the null terminator in the length), and the second one starts at '1' and is also length 3.
Similarly, if you hand it a multibyte-encoded string, strlen() will return the number of bytes before the first null in the array.
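A minimal sketch of that array, in case it helps. The names packed and next_string() are made up for illustration; the only library call is strlen().

```c
#include <stddef.h>
#include <string.h>

/* One array holding two C strings back to back, as in the post above. */
const char packed[] = { 'a', 'b', 'c', 0, '1', '2', '3', 0 };

/* Hypothetical helper: returns a pointer to the bytes just past the
 * terminator of s. Only valid if another string really follows s inside
 * the same array; otherwise the result must not be dereferenced. */
const char *next_string(const char *s)
{
    return s + strlen(s) + 1;
}
```

strlen(packed) is 3, next_string(packed) points at '1', and strlen() of that is also 3, even though the array itself is 8 bytes long.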
-
I understand what strlen() does with a char*, I'm just not really sure why anyone would be interested in the result it gives when the pointed-to string is a multibyte encoding.
-
You have two c arrays, one starts at 'a', and is of length 3 (exclude the null terminator in the length), and the second one starts at '1' and is also length 3.
Nitpick: 2 strings. One array.
-
Just pointing out it's not UB, unless there is no null terminating character. It may not be useful, and it may result in a bug in the program, but it's a well specified bug.
Nitpick: 2 strings. One array
Corrected. Good catch. I meant to say string but got it mixed up.
-
-
I understand what strlen() does with a char*, I'm just not really sure why anyone would be interested in the result it gives when the pointed-to string is a multibyte encoding.
That all depends on what your program intends to do. If you are interested more in the strings as a whole than in the individual characters, strlen() is the function you want because it tells you how much memory to allocate.
Character counts are mainly interesting when doing position-based substring operations, determining length limits (e.g. when inserting into a database column with a character-based length limit), etc.
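As a rough sketch of the allocation case: copying a string only needs the byte count, whatever the encoding, as long as the encoding never uses a zero byte inside a character (UTF-8 doesn't). copy_string() is a made-up name; it does roughly what POSIX strdup() does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates a null-terminated string. The buffer
 * size is strlen(s) bytes plus one for the terminator; the number of
 * characters never enters into it. Returns NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *copy_string(const char *s)
{
    size_t bytes = strlen(s) + 1;
    char *copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}
```

Passing it "na\xc3\xaf" "ve" (naïve in UTF-8, 5 characters) allocates 7 bytes: 6 from strlen() plus the terminator.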
-
That all depends on what your program intends to do. If you are interested more in the strings as a whole than in the individual characters, strlen() is the function you want because it tells you how much memory to allocate.
Well, that's the whole point of what's being discussed. If you give strlen a sequence of characters with a multibyte encoding, such as UTF-16, some of your bytes are going to be null because you are supposed to read characters many bytes at a time. So you're not going to receive the size you need to allocate, which is going to lead to bugs. tar wondered if that was UB, I explained it was not. It is well defined, but wrong.
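Here's a small sketch of that failure, with the bytes spelled out by hand so it doesn't depend on the platform's byte order. The array is "ab" encoded as UTF-16LE; utf16le_ab and byte_strlen() are names made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* "ab" in UTF-16LE followed by a 16-bit terminator: 6 bytes in total.
 * The high byte of each ASCII character is zero. */
const unsigned char utf16le_ab[] = { 0x61, 0x00, 0x62, 0x00, 0x00, 0x00 };

/* Hypothetical helper: treats arbitrary bytes as a C string, which is
 * what happens when UTF-16 data gets cast to char* and passed to
 * strlen(). The behaviour is defined because a zero byte exists inside
 * the array. */
size_t byte_strlen(const void *bytes)
{
    return strlen((const char *)bytes);
}
```

byte_strlen(utf16le_ab) returns 1: strlen() stops at the zero high byte of 'a', so the result is neither the character count (2) nor the number of bytes to allocate (6).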
-
I think the fact that it is wrong is probably more significant than the fact that it is defined behaviour though.
-
I think the fact that it is wrong is probably more significant than the fact that it is defined behaviour though.
Allow me to rephrase...
It's FUCKING WRONG you asshole. Defining behavior to be WRONG is still wrong and you're a terrible fucking person for liking stuff being wrong. This is why everything is shit!
-
I feel strangely aroused now...
-
It's FUCKING WRONG you asshole. Defining behavior to be WRONG is still wrong and you're a terrible fucking person for liking stuff being wrong.
I'm not sure who you're addressing this to.
-
-
Nice blakeyrant.
-
Thanks.
-
-
If you give strlen a sequence of characters with a multibyte encoding, such as UTF-16
Then it means you've cast char16_t into char and treat it as a char-string instead of a char16_t-string. THAT'S your problem, not strlen().
-
No one said strlen was the problem. The person that came closest to that was OffByOne, and even they clarified:
@OffByOne said: Of course the programmer shouldn't use strlen() unless what he's interested in is what strlen() calculates, especially when dealing with strings that are not entirely composed of US-ASCII characters.
Aside from that, the problem is not just that the type is different. The problem[1] is that a c-string has specific rules that are not necessarily enforced by every pointer to char16_t even. I could pack several c-strings of valid UTF-16 text into a single array of char16_t, one after the other, and if I use strlen thinking it will walk the array until the end of the array I would get surprising and unexpected behavior, which is wrong for your program but well defined according to the language. Which is what I meant to highlight before.
[1] Problem meaning "the tricky bit that catches beginners unaware". It's not a problem in itself.
-
I could pack several c-strings of valid UTF-16 text into a single array of char16_t, one after the other, and if I use strlen
Sorry to cut you off mid-sentence, but why would you want to use strlen() on a char16_t*?
-
Assuming there's a version of strlen for char16_t that behaves similarly, giving you the number of words instead of bytes. Should have clarified that.
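Something like the following is what I have in mind. As far as I know standard C has no such function, so strlen16() is a made-up name; it counts 16-bit code units (not bytes, and not characters, since surrogate pairs count twice). It assumes a C11 compiler with <uchar.h>.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical strlen() analogue for char16_t strings: returns the
 * number of 16-bit code units before the first zero code unit. */
size_t strlen16(const char16_t *s)
{
    const char16_t *p = s;
    while (*p != 0)
        ++p;
    return (size_t)(p - s);
}

/* Two UTF-16 strings packed into one array, one after the other. */
const char16_t packed16[] = { u'a', u'b', 0, u'x', u'y', u'z', 0 };
```

strlen16(packed16) returns 2, not 7: just like strlen(), it stops at the first terminator instead of walking to the end of the array, which is the surprise I was describing.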
-
-
Naming things is hard.
-
Yes it is. So what?
-
I meant modulo control characters; just characters with values >31 and <128.
-
Wait, are we talking about C or C++ here?
C++ seems to have std::char_traits<char16_t>::length(char16_t*) for returning the number of characters instead of bytes.
(and of course std::char_traits<char32_t>::length(char32_t*) for char32_t since it's actually a template)
-
Great another thread about stuff that only C++ programmers understand ... ;-)
-
It's a very elite club...
-
(Not to imply that knowledge of dark C++ secrets is intrinsically of any value...)
-
Great another thread about stuff that only C++ programmers understand ... ;-)
Better than the PHP threads.
-
-
we need more of those...
I never understand those. Some of the C/C++ goes over my head, but I learn from them. MEGO during PHP time.
-
MEGO during PHP time.
mine too.
but i want to see the person who would be posting those topics more than i want the topics.
he's been absent too long. :-(
-
-
It reminds me of old gentlemen talking about arbitrary classifications of things that nobody else in the pub understands.
-
Old men can be pretty racist when they've had a few...
-
characters with values >31 and <128
Awesome! We've still got a DEL (127)…
(The term you were looking for is printable ASCII characters.)
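For what it's worth, a sketch of the off-by-one: the printable ASCII range is 0x20 (space) through 0x7E (tilde), so the upper bound should exclude 127. is_printable_ascii() is a made-up name; the standard isprint() gives the same answer in the "C" locale but is locale-dependent.

```c
/* Hypothetical helper: true for printable ASCII, i.e. 0x20 (space)
 * through 0x7E ('~'). DEL (0x7F, decimal 127) is a control character
 * and is excluded, unlike with the test "> 31 && < 128". */
int is_printable_ascii(int c)
{
    return c >= 0x20 && c <= 0x7E;
}
```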
-
Great another thread about stuff that only C++ programmers understand ...
Because only C++ has those problems.