I think i might be producing WTF code

Heikki Artiainen

I think my preferred style of coding might be an abomination. I tend to make solutions with lots of public static variables and public static methods that implement the "api" for that class.
I cleaned up the code shown a bit to make it more like what i would pass on(translated comments and re-read them and added a few, but not many). Deleted contents of most functions (they all hold the same style so including everything felt reduntant)This is for a project that only i will work on and maintain. Would still like to hear your opinnions on just how bad this is, since one the goals of the project is to just get better in readability.

I kind of fear the maintenance guys i have passed code onto have been afraid to call me out, since i did not hear complaints.

public class Post : IComparable<Post>
	{
		public static Point center = new Point(0, 0);
		//Post.point.toString() -> post
		public static Dictionary<string, Post> posts = new Dictionary<string, Post>();
		//when to delete -> post. Used to find disliked post quickly
		public static UtilityClasses.OverflowDictionary<long, Post> dyingThreads = new UtilityClasses.OverflowDictionary<long, Post>();
		//ordered by inevitable deletion moment. Used to delete naturally dying threads
		public static UtilityClasses.IndexedQueue<Post> deleteQueue = new UtilityClasses.IndexedQueue<Post>(1000);

		//posts sorted by axises, used to pull closest posts to user
		public static Loyc.Collections.BDictionary<float, Point> AxisXpoints = new Loyc.Collections.BDictionary<float, Point>();
		public static Loyc.Collections.BDictionary<float, Point> AxisYpoints = new Loyc.Collections.BDictionary<float, Point>();

		string storageLocation, type;
		long timePosted, deleteAt;
		double angle, distance;
		int likes = 0;
		bool uploadComplete = false;

		JObject serializedForm;
		CloudBlockBlob blob;
		Point point;

		static string createPost(string json) {
			JObject o = JObject.Parse(json);
			Point pos = new Point(JsonConvert.DeserializeObject<float>(o["x"].ToString()), JsonConvert.DeserializeObject<float>(o["x"].ToString()));
			string storageLocationName = pos.ToString();
			CloudBlockBlob blob = AzureStuff.cloudBlobContainer.GetBlockBlobReference(storageLocationName);

			string readLink = blob.Uri + blob.GetSharedAccessSignature(AzureStuff.readOnlyPolicy);
			string uploadLink = blob.Uri + blob.GetSharedAccessSignature(AzureStuff.uploadPolicy);
			Post p = new Post(pos, o["type"].ToString(), readLink);

			posts[storageLocationName] = p;
			deleteQueue.push(p);
			return uploadLink;
		}
		
                //called often, the rest of the solution is built around this
		static string getPostsAroundPoint(Point p)
		{
			JArray arr = new JArray();
                        //lots of code
			return arr.ToString();
		}

		static void markPostAsUploaded(string loc);
		static void updateBackup();

		//TODO synchronizes likes across clients
		void updateLikeCounts();

		string serialize();
		string toString();

		static Post deserialize(string json);
		void calculateDeleteMoment() {
			deleteAt = timePosted + effectOfLikes();
			if (deleteAt < DateTimeOffset.Now.ToUnixTimeMilliseconds()) {
				delete();
			}
		}

		//used to update time till deletion
		int effectOfLikes() {
			switch (likes) {
				case -1: return (int)TimeSpan.FromMinutes(60 * 12).TotalMilliseconds;
				case -2: return (int)TimeSpan.FromMinutes(60 * 8).TotalMilliseconds;
				case -3: return (int)TimeSpan.FromMinutes(60 * 4).TotalMilliseconds;
				case -4: return (int)TimeSpan.FromMinutes(60).TotalMilliseconds;
				case -5: return (int)TimeSpan.FromMinutes(10).TotalMilliseconds;
				case -6: return 0;
				default: return (int)TimeSpan.FromDays(7).TotalMilliseconds;
			}
		}
		void delete(long timerEstimate = 0) {
			string key = point.ToString();
			dyingThreads.remove(this.deleteAt, this);
			posts.Remove(key);
			deleteQueue.remove(this);
			AxisXpoints.Remove(point.X);
			AxisYpoints.Remove(point.Y);

			long currentTime = DateTimeOffset.Now.ToUnixTimeMilliseconds();
			Post p = deleteQueue.peek();
			while (p != null && p.deleteAt < currentTime) {
				p.delete();
			}
		}
		public void like() {
			likes++;
			updateLikeCounts();

			if (likes > 0) return; //over 0 like's means there is no need to change timers
			string key = point.ToString();
			if (likes == 0) {
				dyingThreads.remove(deleteAt, this);
				return;
			}
			deleteAt = DateTimeOffset.Now.ToUnixTimeMilliseconds() + effectOfLikes();
			if (deleteAt < DateTimeOffset.Now.ToUnixTimeMilliseconds()) {
				delete();
				return;
			}
		}

		public void dislike();
		public int CompareTo([AllowNull] Post other);
		private Post(Point p, string type, string storage) {

			if (type == "text") {
				storageLocation = type.Substring(10, type.Length - 10);
			}
			else {
				storageLocation = storage;
			}
			distance = GetDistance(p, center);
			angle = angleOf(p, center);
			timePosted = DateTimeOffset.Now.ToUnixTimeMilliseconds();
			calculateDeleteMoment();

			JObject o = new JObject();
			o["sas"] = storageLocation;
			o["type"] = type;
			o["posted"] = timePosted;
			o["likes"] = likes;
			o["x"] = point.X;
			o["y"] = point.X;
			serializedForm = o;
		}
	}

What do you think?

Zecc

All those public dictionaries feel like a smell. Makes me think you're doing all sorts of manipulations on them elsewhere instead of hiding them under methods in this class. This will make it more difficult to refactor in case you need to change the kind of container in the future or add more logic around operations.

Their names all start with lowercase, which is not the typical style choice.

@Heikki-Artiainen said in I think i might be producing WTF code:

Point pos = new Point(JsonConvert.DeserializeObject<float>(o["x"].ToString()), JsonConvert.DeserializeObject<float>(o["x"].ToString()));

You've got o["x"].ToString() twice. Are you sure the second one isn't o["y"].ToString()?

I'm pretty sure you could have just written that as Point pos = new Point(o.Value<float>("x"), o.Value<float>("y")), or heck just var pos = JsonConvert.DeserializeObject<Point>(json) without going through JObject.Parse first (though I do notice you refer to o["type"].ToString() later).

And you also have a static Post deserialize(string json) method? And they are both private?

Also, for what it's worth the plural of axis is "axes", not "axises".

dkf

@Zecc said in I think i might be producing WTF code:

All those public dictionaries feel like a smell.

It's particularly a problem if the dictionaries are writable. If they are (and they seem to be to me) then there's going to be fun working out who is changing that state. That they're static does not make things smell better.

I'm guessing all that static state ought to actually be put somewhere that will survive “unexpected application restart”. Like a database…

Heikki Artiainen

@Zecc All those public dictionaries feel like a smell. Makes me think you're doing all sorts of manipulations on them elsewhere instead of hiding them under methods in this class.
I am not even touching them from the outside yet, but i am reusing that same class on the client application side, and in there i will be maintaining the state of the dictionaries from the outside the class.

"You've got o["x"].ToString() twice. Are you sure the second one isn't o["y"].ToString()? " Haa! I wrote the code above yesterday, and have not had the time to debug it yet. That's what i'll be doing today. Thank you for noticing that one for me. Would have been one of the more anoying ones to find.

Point pos = new Point(o.Value<float>("x"), o.Value<float>("y")), or heck just var pos = JsonConvert.DeserializeObject<Point>(json)
im gonna start doing that from now on. Thats shorter and clearer. Im trying to minimize the amount of data sent so i am intentionally not using the very handy .DeserializeObject<>. I am sending only the fundamental parts of the classes, and reconstructing the rest of the object based on that, since the intetion is to run this on free trial / free promotional credits tier of service plans.

for what it's worth the plural of axis is "axes", not "axises".
God damn. Did not know that. Dont really want to fix it either since, when reading the word 'axes' my brain jumps to chopping wood at the cottage and vikings instead of coordinate systems. Fug

@dkf I'm guessing all that static state ought to actually be put somewhere that will survive “unexpected application restart”. Like a database…
I am planning on doing that once i have demo'ed the application and found out if the actual end user application is worth developping further.

dkf

@Heikki-Artiainen said in I think i might be producing WTF code:

@dkf

I'm guessing all that static state ought to actually be put somewhere that will survive “unexpected application restart”. Like a database…
I am planning on doing that once i have demo'ed the application and found out if the actual end user application is worth developping further.

Factor them out into their own little data storage class. Which can be internally just a bunch of private static fields with suitable accessors (that you refactoring later), but at least then you know where the rest of the code is fondling the state. If you're not changing what actually goes in there for now, such refactorings are pretty quick. (Then inject the state storage into the class you showed us instead of assuming static methods and you'll be a long way towards having the storage mess cleaned up.)

The point is this: a five-minute clean up now will save you a lot of time and effort later. Keeping "persistent" state segregated (and only read out of or written to via read-only objects) makes it much easier to reason about. Being easy to reason about makes it possible to go for the obviously-correct approach elsewhere, instead of having to stick to no-obvious-problems...

Gribnit

@Heikki-Artiainen is this the only, central, most important, or somehow other "singular" point? if not, probably want to make the parts that could differ from other points, private to any given point, vs static.

then the methods that deal with those can stop being static. then there can be more than one point. then you need to clamp down on protestantism with an iron fist.

Gribnit

@Zecc said in I think i might be producing WTF code:

Also, for what it's worth the plural of axis is "axes", not "axises".

Sorry, I changed that. It's axii now, axes was attracting lumberjacks.

Heikki Artiainen

@Gribnit The center point meant to be the origin point. I calculate and store the distance and angle between the origin all the other points. I should propable put that inside the point class definition instead, and rename it to origin.

Im thinking of keeping a list of largest clusters of points in which direction they are in relation to that origin and keep a list of very dense areas.
Thought i would try to make an algorith for that tries to reason about those by using just the angle and distance and lets me assign some "local density metric" value to each point

How can i get similiar highlights as you have in your posts btw?

Gribnit

@Heikki-Artiainen

Posts can use Markdown, seems to include GitHub Markdown extensions too, at least somewhat. Posts can also use HTML.
Here, they cannot use them on the same line. I understand this is blamed on Markdown.

Do you intend to deal with many millions of points? At millions, there is a representation change you would want to make. Up to about there, maybe, or perhaps 100Ks, the classic OO model would make sense.

The classic OO model, assuming 2D Cartesian space, is that a Point has an x and a y real-valued attribute. There are finer factorings if you want to mix 2D and 3D space. A 3D point would also have a z real-valued attribute. Iiuc, a quaternion would have an i, and so forth. The textbook exercise this resembles places Point, Point3D, and Quaternion in an inheritance relationship,

Point
   x
   y

Point3D extends Point
   z

Quaternion extends Point3D
   i

Note that so far, all the attributes would be modeled as private fields, vs static variables, of 3 classes. Any number of instances can be created of any of these, memory allowing. It's not clear to me whether the distinction between static and instance variables is clear to you.

If there is one and only one Point which is the Origin, and Point were an interface, it might model in Java like this:


public enum SpecialPoints implements Point {
   ORIGIN(0D,0D);

   private final double x;
   private final double y;

   SpecialPoints(final double x, final double y) {
       this.x = x;
       this.y = y;
   }
   // ... getters, no setters.
   // and this is probably wrong b/c it's written in a web form
}

Does a point ever need to change, once it exists? Once the distinction between instance and static data is more clear there's more to look at.

Heikki Artiainen

@Gribnit well that would be the dream, but i doubt it will happen. Im just trying to limit the amount of traffic waste that happens i serve each client 1000 or 10000 points when 10 would have been enough

Heikki Artiainen

@Gribnit And also i just really enjoy this kind of coding where i get to actually ponder about a CS problem and create an algorith. So much more rewarding than the drudgework that is 90% of a software project.

Gribnit

@Heikki-Artiainen said in I think i might be producing WTF code:

@Gribnit well that would be the dream, but i doubt it will happen. Im just trying to limit the amount of traffic waste that happens i serve each client 1000 or 10000 points when 10 would have been enough

Oh okay, you can maybe send them a histogram derived from the points, instead of sending the points at all then? The clean ways to do this in Java and in C# diverge.

Heikki Artiainen

@Gribnit
Its unbelieveable that hard to parse for humans things like HTML ever became the standard when things like Markdown show how much better things could be done. Shouldnt the first solution tried be the one that is easy to adopt? >:(
The user applications for this will be made in unity so throwing in HTML compatability would be over engineering at this point. That is also the reason why my coordinates are floats (im sure theres plenty of great engineering reasons their coordinates are floats.. but fuck me is that not the worst possible data type for coordinates ?)

I might move to 3D points, but the scale of the Z axis would so tiny by comparison, that i would just ignore it in the calculations .. were talking range of -1 to 1 vs -1 000,000 to 1 000 000. where increments 0.01 below are imperceptible (just ignored) A point does not need to change once it is created, but as i understand final and const qualifiers are just for readability effectively since any compiler of even relative competense should optimize everything to those.

I have been writing this code with the goal being to maintain 100,000 posts up at the same time at least running on a potato (in the client application i dont care about performance. It wont be limited by mem or processing speed.

The biggest reason i made this thread was i realised, that i may become blind to how weird my coding style is, because i was in the position of being the one who's coding style was spreading at work for a few years. The code on display is not fully representative, as to the debth of qualifiers i would use in a work environment

Heikki Artiainen

@Gribnit ~~can you go into further detail ? Sounds interesting~~
Ah i just didnt know the correct term for this. I was actually thinking of doing "bucketing" for the coordinates to intervals where it should be impossible for a client to notice that they are only served a smaller set instead of the whole. The intervals for the histogram i will have to test and find in real time tho.
I'll add that to the list of good ideas on how to improve the performance if it becomes a problem. Thank you

dkf

@Heikki-Artiainen said in I think i might be producing WTF code:

Shouldnt the first solution tried be the one that is easy to adopt?

The first solution tried is almost invariably the first solution thought of. It typically takes a lot more effort to consistently come up with one that is easy to adopt.

Gribnit

@dkf said in I think i might be producing WTF code:

@Heikki-Artiainen said in I think i might be producing WTF code:

Shouldnt the first solution tried be the one that is easy to adopt?

The first solution tried is almost invariably the first solution thought of. It typically takes a lot more effort to consistently come up with one that is easy to adopt.

The simplest thing is probably opaque, naive gzip compression, btw

Gąska

//Post.point.toString() -> post
public static Dictionary<string, Post> posts

Any particular reason why you don't index by point directly, without the string conversion?

Gąska

public static Loyc.Collections.BDictionary<float, Point> AxisXpoints

First and foremost, never index with float. Floats are quite finicky with their last few decimal places especially when transmitting over network as string (decoder may not always agree with encoder), so equality comparisons of any kind are best avoided. But even if it does happen work, you'll be screwed as soon as your first NaN appears, since NaN != NaN.

Since you don't actually need exact lookup (I'm guessing but I'm fairly confident in this guess), you could replace the dictionary with just a list of points sorted by X (and the AxisYPoints by a list sorted by Y). Although if you want to be super fancy, you should instead use a quadtree, a data structure specifically optimized for quickly finding points inside a given rectangle.

Interestingly, .Net Framework comes with a good implementation of quadtree container in the standard library, but it's package-private so you can't actually use it. Find something on Nuget or copy Microsoft's code from Github into your project (you'll need to adapt it a bit because it operates on rectangular elements rather than points. Or you may switch your posts to be rectangles instead.)

Heikki Artiainen

This post is deleted!

Heikki Artiainen

@Gąska
Mostly superstition.
if i were to map by point i believe the hashing algorithm, used(by default) would be something done over the complete memory representation of the object.
I assume that my points are going cluster heavily, and so would pref the extra bits of input in helping produce less uniform hashes
I also think that would make it way harder to keep backups- since my DB would be machine dependent?

Gąska

@Heikki-Artiainen said in I think i might be producing WTF code:

@Gąska
Mostly superstition.
if i were to map by point i believe the hashing algorithm, used(by default) would be something done over the complete memory representation of the object.
I assume that my points are going cluster heavily, and so would pref the extra bits of input in helping produce less uniform hashes

Constant string conversions are going to hurt your performance much more than slightly inefficient lookup. Remember, premature optimization is the root of all evil.

I also think that would make it way harder to keep backups- since my DB would be machine dependent?

You still have to serialize the value part anyway, so you're not getting any gains here either. Just keep it as a regular object until immediately before DB store.

Also, just noticed this part:

                //called often, the rest of the solution is built around this
		static string getPostsAroundPoint(Point p)
		{
			JArray arr = new JArray();
                        //lots of code
			return arr.ToString();
		}

Converting objects to strings is bad enough. Converting AN ENTIRE ARRAY of objects to strings is definitely a red flag. And doing so in a function that's "called often, the rest of the solution is built around this" is fucking DEFCON 2.

Remember: converting to string is just about the worst thing performance-wise you can do. Only ever do it if you really, really need the text form (e.g. when sending data to client), and do it immediately before you need it (e.g. in the function that sends data to client). And if you ever find yourself converting the string you made back to an object, you've definitely done something wrong.

Gribnit

@Heikki-Artiainen said in I think i might be producing WTF code:

@Gąska
Mostly superstition.
if i were to map by point i believe the hashing algorithm, used(by default) would be something done over the complete memory representation of the object.
I assume that my points are going cluster heavily, and so would pref the extra bits of input in helping produce less uniform hashes
I also think that would make it way harder to keep backups- since my DB would be machine dependent?

Isn't this a comparative structure, tho? Hash clustering doesn't apply to a BDictionary if B means Binary. If the structure isn't comparing the actual points, but some hash of them, it seems like a complicated way to randomly associate different posts.

Gribnit

@Heikki-Artiainen said in I think i might be producing WTF code:

I tend to make solutions with lots of public static variables and public static methods that implement the "api" for that class.

My dad codes the same way. He made a really neat data structure one time... too bad there could only be one of it.

Bulb

@Heikki-Artiainen said in I think i might be producing WTF code:

I tend to make solutions with lots of public static variables and public static methods that implement the "api" for that class.

I am not an accessor zealot and think making a member public instead of having getter and setter without any logic is just fine (especially in C# where you can replace it with a property later if needed). But these members look like something that does have invariants that should be ensured.

My suggestion would be to split this to two (or more) separate classes:

The class keeping the global state, with the static members. It would be just a module in languages that have that, but due to limitations of C# (Java has the same) it has to be a class. You should make the members private, or at worst internal, and manipulate them from methods that ensure the state is consistent.
The class holding the actual instances. If it needs to be friends with the global state class(es), that's better than making things public, because it still gives you the encapsulation envelope, it's just bigger. Many new languages have privacy at package rather than class level exactly because encapsulation often needs to span a couple of related types and a maybe some free (static) functions.

You might later want to make that global state non-static for sake of testing, or refactor it to store it persistent storage or something, and properly encapsulating it will help with that. And properly encapsulated means there should be just a handful of public members, really. Design the components to have small interfaces between them and then mold it over classes and packages as that's all C# gives you.

Zecc

@Heikki-Artiainen said in I think i might be producing WTF code:

@Gribnit And also i just really enjoy this kind of coding where i get to actually ponder about a CS problem and create an algorith. So much more rewarding than the drudgework that is 90% of a software project.

Binary space partitioning, octrees, kd-trees, potentially visible set, bounding volume hierarchy... these are things I've heard.

Whether this helps you or just adds noise, I'll leave that to you.

Bulb

@Zecc Also, this is one of the cases where prototyping without a database and later refactoring to use it might be a waste of effort. If you need some kind of spatial index, at least PostgreSQL (PostGIS extension) and SQLite (SpatiaLite extension) support them (other databases have similar extensions implementing this Simple Features standard), so it would be easier to just use them from the start instead of writing a prototype in C# just to replace it with the SQL version later anyway.

dkf

@Bulb said in I think i might be producing WTF code:

SpatiaLite

TIL

Bulb

@dkf It's funny how the Open Geospatial Consortium managed to usurp completely generic names for their standards: Simple Features Access, Well-Known Text, Well-Known Binary…

I worked with maps in my previous job. We used SQLite to do much of the map processing, but with our own geometry format, because it was already implemented and used throughout the software when I came in and introduced SQLIte as much superior to hand-rolling joins in C++. But later I started using QGIS, which is a fairly decent free tool for working with maps, and that is backed by Spatialite or PostGIS, so I also created a conversion to that format.

Watson

@Bulb said in I think i might be producing WTF code:

@dkf It's funny how the Open Geospatial Consortium managed to usurp completely generic names for their standards: Simple Features Access, Well-Known Text, Well-Known Binary…

Word, Access, Publisher...

Bulb

@Watson Those are applications, not standards. And two of the three names at least have something to do with the purpose of the application. But what does a “well known text” tell you about what it is?

Gribnit

@Bulb said in I think i might be producing WTF code:

@Watson Those are applications, not standards. And two of the three names at least have something to do with the purpose of the application. But what does a “well known text” tell you about what it is?

says it's a text. and well known. got any other questions?

LaoC

@Gribnit said in I think i might be producing WTF code:

@Heikki-Artiainen said in I think i might be producing WTF code:

I tend to make solutions with lots of public static variables and public static methods that implement the "api" for that class.

My dad codes the same way. He made a really neat data structure one time... too bad there could only be one of it.

It's "High Level Language", dad, not "High Lander"!

Gribnit

@LaoC truth, he was always more comfortable down in a tarpit. Forth and PostScript got leveraged to the hilt, but C, I dunno, C was too far from the machine to make sense I guess.

Gąska

@Watson said in I think i might be producing WTF code:

@Bulb said in I think i might be producing WTF code:

@dkf It's funny how the Open Geospatial Consortium managed to usurp completely generic names for their standards: Simple Features Access, Well-Known Text, Well-Known Binary…

Word, Access, Publisher...

Windows...

dkf

@Gribnit said in I think i might be producing WTF code:

Forth and PostScript got leveraged to the hilt, but C, I dunno, C was too far from the machine to make sense I guess.

Postscript, despite being effectively a Forth variant, is quite a long way from the machine. There's a lot going on underneath.

Gribnit

@dkf said in I think i might be producing WTF code:

@Gribnit said in I think i might be producing WTF code:

Forth and PostScript got leveraged to the hilt, but C, I dunno, C was too far from the machine to make sense I guess.

Postscript, despite being effectively a Forth variant, is quite a long way from the machine. There's a lot going on underneath.

Indeed. Perhaps a more appropriate distance is "too far from a pure and concise mathematical formalism".

Also C ain't far from the machine.

GOG

@Bulb said in I think i might be producing WTF code:

I am not an accessor zealot and think making a member public instead of having getter and setter without any logic is just fine (especially in C# where you can replace it with a property later if needed).

This bit kind of hit a nerve so I have to ask: if you're already making allowance for replacing a field with a property, why not just make it an auto property in the first place?

I mean, it takes very little extra code (still a one-liner) and allows you to refactor the worky bits of the class to your heart's content without any changes to the calling code.

Bulb

@GOG Auto-properties make it simple, so now it's why not vs. but why. They were not always there, and are not in all languages. It is unlikely to allow refactoring the classes without changes to the calling code anyway, because the field starts as effectively public and thus the calling code is still part of the same capsule.

The boundaries behind which you can safely refactor have interfaces, and interfaces don't have member variables, so if it was designed as properly encapsulated unit, it couldn't have a public member. But most encapsulation units have more than one class and between them accessors don't help.

GOG

@Bulb

It is unlikely to allow refactoring the classes without changes to the calling code anyway, because the field starts as effectively public and thus the calling code is still part of the same capsule.

Either I'm misunderstanding you or this very strongly suggests that there's something fishy about the encapsulation boundaries.

Normally, I'd expect the contract between the caller and callee to be: the caller may expect to receive value of type T as a result of the the call and it's none of the caller's business where that value came from. Alternatively, the callee will receive a value of type T from the caller and it's none of the caller's business what happens to that value (other than observing a desired effect, perhaps).

The point being that the calling code should make no assumptions as to what the called code does internally.

If the calling code needs to know what the called code does and how it does it in order to work, it means that it needs refactoring - either to not do that, or to get rolled into one class with called code (since they will always change as a unit).

The boundaries behind which you can safely refactor have interfaces, and interfaces don't have member variables, so if it was designed as properly encapsulated unit, it couldn't have a public member. But most encapsulation units have more than one class and between them accessors don't help.

I'm pretty sure I'm misunderstanding you now, because I'm having a hard time wrapping my head around "most encapsulation units have more than one class".

Encapsulation is meant to provide a black box to the caller - one that offers a specific contract/interface, but whose inner working are opaque. We can also do encapsulation as a series of nested black boxes, with - in C#, at least - accessibility levels providing the natural boundaries for nested encapsulation.

It seems to me that if "encapsulation units" are defined across those boundaries (as might happen with several tightly coupled classes), we're asking for problems down the line, when it becomes possible to call, say, half-a-black-box (just one of the several classes in the unit), even if its inner workings remain mostly opaque.

What am I missing?

dkf

@GOG said in I think i might be producing WTF code:

If the calling code needs to know what the called code does and how it does it in order to work, it means that it needs refactoring - either to not do that, or to get rolled into one class with called code (since they will always change as a unit).

It's not that simple. Really. Interfaces describe contracts, but some of the interfaces' constraints may be on the semantics and those are not necessarily described by the types of the interface. But they should be described in the text of the interface nonetheless; semantic constraints are arbitrarily difficult to describe and enforce formally so in the interest of getting shit done this century we don't mess with that. (For example, if the interface states that the object is not handed to another thread — not something that it is easy to see from the types at all — then the caller may assume that that is indeed the case. And this is certainly not the only non-trivial semantic requirement that could be specified.) If an implementation then goes ahead and doesn't comply with the interface, that's their own damn fault.

What am I missing?

Package-level scoping in Java (and its equivalent in C#). The classes in that scope level cooperate with each other closely, but have a much more restrictive interface for use from outside that scope.

Gąska

@dkf said in I think i might be producing WTF code:

For example, if the interface states that the object is not handed to another thread — not something that it is easy to see from the types at all

Rust to the rescue!

Send and Sync - The Rustonomicon

dkf

@Gąska said in I think i might be producing WTF code:

Rust to the rescue!

Now you just have to do that for all the other semantic properties that we can think of! (It's an open ended list where the proofs for some parts are potentially unsolved conjectures, and where all proofs must be constructive.)

Gąska

@dkf I get your point, it's just funny that you picked the one example which actually has been successfully implemented in the type system of one fairly popular programming language.

dkf

@Gąska I thought about that part way, but couldn't be bothered to think of a better example.

Bulb

@GOG said in I think i might be producing WTF code:

@Bulb

It is unlikely to allow refactoring the classes without changes to the calling code anyway, because the field starts as effectively public and thus the calling code is still part of the same capsule.

Either I'm misunderstanding you or this very strongly suggests that there's something fishy about the encapsulation boundaries.

Normally, I'd expect the contract between the caller and callee to be: the caller may expect to receive value of type T as a result of the the call and it's none of the caller's business where that value came from. Alternatively, the callee will receive a value of type T from the caller and it's none of the caller's business what happens to that value (other than observing a desired effect, perhaps).

The point being that the calling code should make no assumptions as to what the called code does internally.

The calling code does not need to make assumptions about what the called code does internally, but what the called code does internally affects what contract it can or cannot provide. The large components with interfaces that have just a few methods can be pretty good abstractions, have few preconditions and lot of degrees of freedom in how they are implemented. But as the number of methods, and especially getters and setters, grows, the contract becomes more complicated and the coupling tighter, to the point the class itself isn't really encapsulated.

If the calling code needs to know what the called code does and how it does it in order to work, it means that it needs refactoring - either to not do that, or to get rolled into one class with called code (since they will always change as a unit).

Many methods are coupled to both their invocant and their arguments. You can move those methods around as much as you like, they'll still be coupled to more than one class. And you can't merge the classes, because those need to be shaped according to your data. When you have multiple entity types, you need class for each, but many methods will need to understand the whole bunch, because there are constraints across the entities.

The boundaries behind which you can safely refactor have interfaces, and interfaces don't have member variables, so if it was designed as properly encapsulated unit, it couldn't have a public member. But most encapsulation units have more than one class and between them accessors don't help.

I'm pretty sure I'm misunderstanding you now, because I'm having a hard time wrapping my head around "most encapsulation units have more than one class".

Encapsulation is meant to provide a black box to the caller - one that offers a specific contract/interface, but whose inner working are opaque. We can also do encapsulation as a series of nested black boxes, with - in C#, at least - accessibility levels providing the natural boundaries for nested encapsulation.

Inner workings of a method are opaque almost by definition. But if you have a class with a bunch of getters and setters and have to set them in specific ways to call the methods, the contract is so complicated that it no longer deserves being called encapsulated. There is little room for changing the implementation without changing the interface as well.

It seems to me that if "encapsulation units" are defined across those boundaries (as might happen with several tightly coupled classes), we're asking for problems down the line, when it becomes possible to call, say, half-a-black-box (just one of the several classes in the unit), even if its inner workings remain mostly opaque.

What am I missing?

The “encapsulation units” are the boundaries. There are many cases when they don't coincide with class boundaries, and trying to force them to makes the code harder to understand and refactor, not easier. When the a class can be properly encapsulated, go for it. But sometimes it can't and then it's better not to bother and instead make sure the whole package has good minimal interface that forms a good encapsulation boundary.

That's why many new languages are giving up on classes altogether and moving the privacy boundary to the module level. You can still have modules with one public datatype, but often you end up with several closely related types in a module, or on the other hand submodules for separate aspects of one main datatype.

GOG

@dkf said in I think i might be producing WTF code:

Package-level scoping in Java (and its equivalent in C#). The classes in that scope level cooperate with each other closely, but have a much more restrictive interface for use from outside that scope.

I covered this in passing, when mentioning accessibility levels; encapsulation always refers to some scope relative to something else. Assembly-level internal bits may be viewed as encapsulated from outside the assembly, but are absolutely to be considered as part of the assembly as a unit.

Of course, if anyone uses reflection to get at the bits that aren't normally accessible, they deserve everything they get.

It's not that simple. Really. Interfaces describe contracts, but some of the interfaces' constraints may be on the semantics and those are not necessarily described by the types of the interface. But they should be described in the text of the interface nonetheless; semantic constraints are arbitrarily difficult to describe and enforce formally so in the interest of getting shit done this century we don't mess with that. (For example, if the interface states that the object is not handed to another thread — not something that it is easy to see from the types at all — then the caller may assume that that is indeed the case. And this is certainly not the only non-trivial semantic requirement that could be specified.) If an implementation then goes ahead and doesn't comply with the interface, that's their own damn fault.

I don't really disagree with any of the above, but I can't really see how it factors into the broader question of exposing internals or no.

In any case, any contract you cannot meaningfully enforce is scarcely a contract at all. More of a nod and a wink and a solemn Boy Scout promise Not to Do This Bad Thing. I acknowledge that this is something we sometimes have to live with, but we should at least make a solid attempt at not creating such situations.

@Bulb said in I think i might be producing WTF code:

The calling code does not need to make assumptions about what the called code does internally, but what the called code does internally affects what contract it can or cannot provide. The large components with interfaces that have just a few methods can be pretty good abstractions, have few preconditions and lot of degrees of freedom in how they are implemented. But as the number of methods, and especially getters and setters, grows, the contract becomes more complicated and the coupling tighter, to the point the class itself isn't really encapsulated.

It seems to me the that what you're describing here is a poorly specified interface, or maybe we still have a misunderstanding.

The presence of getters should not affect any other methods in the interface at all. A getter is - at the end of the day - nothing more than a zero-argument function that extracts some portion of the encapsulated state.

Setters are trickier, in that they modify the encapsulated state, which is why they're best avoided, unless necessary. They still shouldn't affect the workings of the other parts of the interface, since the entire type should be implemented so as to fulfill the interface regardless of state (inner state consistency should be assured at the point of mutation).

Inner workings of a method are opaque almost by definition. But if you have a class with a bunch of getters and setters and have to set them in specific ways to call the methods, the contract is so complicated that it no longer deserves being called encapsulated. There is little room for changing the implementation without changing the interface as well.

I almost wanted to skip the entire rest of your post just to get at this bit, because if you have to set your setters in specific ways to call the methods, you're doing encapsulation wrong, wrong, wrong. You might as well go wild with goto.

(I'm not really ranting, I'm just amazed.)

If a method has prerequisites, the place to assure they are met is in its arguments, and - on the implementation level - in the validation code you have before you have it start actually doing what it's supposed to be doing.

This goes for semantic prerequisites, as well as purely formal ones. If the correctness of a given result can only be assured by the object having a particular state at the time of the call, this should be assured by requiring the state to be specified via the call.

(Alternatively, you can create an immutable object with a known good state prior to the call, typically through constructor injection.)

Many methods are coupled to both their invocant and their arguments.

This, I frankly don't understand. A method coupled to its invocant means something, somewhere, has gone horribly wrong. Unless, you mean simply that without the invocant the method has no purpose, but that's a subtly different issue.

The whole point of OOP was that objects take care of their own data. You should be able to use the object just as easily, regardless of what you use it for. The object shouldn't be coupled to the caller at all.

The caller will be coupled to the object - or its interface, at least - but ideally you ought to be able to swap bits out at will without changing anything else. Most of the stuff I do at work is designed with this explicit goal in mind, simply because that's the only way to manage change at the pace required.

You can move those methods around as much as you like, they'll still be coupled to more than one class. And you can't merge the classes, because those need to be shaped according to your data. When you have multiple entity types, you need class for each, but many methods will need to understand the whole bunch, because there are constraints across the entities.

Isn't that where polymorphism is supposed to be your friend? I'm honestly having a hard time envisioning a design where this would be an issue (working in C#), because that's what inheritance, interfaces and generics are for.

The “encapsulation units” are the boundaries. There are many cases when they don't coincide with class boundaries, and trying to force them to makes the code harder to understand and refactor, not easier. When the a class can be properly encapsulated, go for it. But sometimes it can't and then it's better not to bother and instead make sure the whole package has good minimal interface that forms a good encapsulation boundary.

I have no cause to disbelieve you on this, so I don't, but I can't say that I fully comprehend it either.

To me, this speaks of some fundamental issues with the design, because I can't think of any situation where "put a well-understood and self-standing piece of code in a black box" (which is what encapsulating it in a class is) wouldn't make it easier to understand and refactor. The class boundary is - broadly speaking - the point where I can stop thinking and just plug it in per contract, safe in the knowledge that it will Just Work.

If it doesn't Just Work, then either the class is buggy (testing should have caught this), or my design is shit.

I've made a bunch of shitty designs in my life, and been bit by 'em. I try not to do so these days.

Jaloopa

@Gąska said in I think i might be producing WTF code:

fairly popular programming language.

[Citation needed]

Bulb

@GOG said in I think i might be producing WTF code:

if you have to set your setters in specific ways to call the methods, you're doing encapsulation wrong, wrong, wrong.

Yes.

@GOG said in I think i might be producing WTF code:

You might as well go wild with goto.

but no.

Often it is just the sign that the class, which has its shape dictated by the data, isn't a good unit of encapsulation.

@GOG said in I think i might be producing WTF code:

A method coupled to its invocant means something, somewhere, has gone horribly wrong.

Method is part of the invocant (class) so nothing has gone wrong.

Now if the method is coupled to the arguments, something might have gone wrong, but sometimes you simply need to work on multiple objects and the logic simply needs to understand all of them.

@GOG said in I think i might be producing WTF code:

Isn't that where polymorphism is supposed to be your friend? I'm honestly having a hard time envisioning a design where this would be an issue (working in C#), because that's what inheritance, interfaces and generics are for.

Different entities are not related by inheritance. They are related by, well, relations.

Besides, inheritance is maintenance programmer's worst enemy. The code might look clean and elegant to the team that created it, but it will not to the maintenance programmer. Inheritance is evil.

Gąska

@Jaloopa said in I think i might be producing WTF code:

@Gąska said in I think i might be producing WTF code:

fairly popular programming language.

[Citation needed]

GitHub Language Stats

dkf

@Bulb said in I think i might be producing WTF code:

Besides, inheritance is maintenance programmer's worst enemy. The code might look clean and elegant to the team that created it, but it will not to the maintenance programmer.

Nothing ever looks clean to the maintenance programmer. Doubly so after they've had their hands on it for a while.