Endianness and implications

When I first heard about endianness, I freaked out a bit.
I was wondering, why not settle on one rule, such as big endian? It is the way most people write numbers, so it is easier for us to understand. I thought other endianness rules existed only because some people wanted to swim against the stream. I was quite wrong. I was also worried that this would mess with my number definitions, bit-shifting operations, etc.

Naturally, I put on my tinfoil hat and started doing my homework and a few experiments!

The first thing I discovered was that all my machines use little endian. I found this out using this field. As you may see, the page is pretty plain, only mentioning what the property represents. Only a few good months later (yesterday), when I looked up this page again, the first paragraph in the Examples section struck me like a stray falling chunk of ice. All Windows systems run on little endian?! What is this sorcery?

So I had to look that up. It turns out the statement is almost true. [1]
One detail which was missed there is the fact that Windows 8 runs on ARM chips too[2], and ARM cores can support big endian as well as little endian. [3]

This, coupled with the fact that .NET also runs on Windows on ARM, Windows Phone and Xbox, and that Mono, one of my target platforms, runs on Linux, which itself runs on a variety of architectures with various endianness rules, means that I must consider endianness in my code.
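For anyone who wants to repeat the check outside .NET, here is a minimal sketch in Python (swapped in purely for illustration, it is not what I used above). The helper name host_is_little_endian is made up for this example; sys.byteorder and struct.pack are standard library.

```python
import struct
import sys


def host_is_little_endian() -> bool:
    """Return True when the host stores the least significant byte first.

    Hypothetical helper for illustration: it packs the value 1 as a
    native-order 16-bit integer and inspects the first byte in memory.
    """
    return struct.pack("=H", 1)[0] == 1


if __name__ == "__main__":
    # sys.byteorder reports "little" or "big" for the running interpreter.
    print("sys.byteorder:", sys.byteorder)
    print("little endian host:", host_is_little_endian())
```

Both checks should always agree with each other. Which answer you get depends on the machine, which is exactly the point.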

Upon experimenting a bit, I have removed one of my fears, regarding bit-shifting: “left-shift” and “right-shift” seem to imply a dependence on the order of bits/bytes.
Well, there isn’t any. Shifts operate on the value, not on its layout in memory, so one can think of bits and bytes as big endian when using bit-shifting. Shifting to the left means shifting towards more significant positions (wrapping around into the least significant one if the shift is circular) and shifting to the right means shifting towards less significant positions (wrapping around into the most significant one if the shift is circular).
This seems obvious when assuming everything is in big endian, as we are used to from the numbers we represent in natural language. The knowledge of little endian managed to confuse me about this at first, but thankfully I figured it out before I did anything silly. I hope this bit of knowledge comes in handy for someone else.
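A small sketch of the point about shifts, in Python for brevity. The names rotl32 and rotr32 are hypothetical helpers written for this example, and the 32-bit width is an assumption. The loop round-trips the same value through both byte layouts and shows that the shift result does not care which one was used.

```python
MASK32 = 0xFFFFFFFF  # assumed word size for this example: 32 bits


def rotl32(x: int, n: int) -> int:
    """Circular left shift: bits leaving the top re-enter at the bottom."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x: int, n: int) -> int:
    """Circular right shift: bits leaving the bottom re-enter at the top."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


value = 0x00FF0000
shifted = value >> 8  # moves FF one byte towards the less significant end

for order in ("big", "little"):
    raw = value.to_bytes(4, order)          # the layout differs per byte order...
    decoded = int.from_bytes(raw, order)    # ...but the value is the same
    assert decoded >> 8 == shifted

print(hex(shifted))
print(hex(rotl32(0x80000001, 1)))
```

The shift is defined on the number, so the byte order only shows up once the number is turned into a sequence of bytes.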

As for number definitions, compilers take care of them, no matter what type and size. 0x00FF0000 is written most significant digit first, just like big endian (00 FF 00 00), and the compiler will automatically store it as 00 00 FF 00 on little endian targets.
I state this, obvious as it may seem to some, because thinking about it together with bit-shifting, bit masking and endianness can create a conglomerate of confusion.
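To see the two layouts of that literal side by side, here is a minimal Python sketch using the standard struct module, where ">" requests big endian and "<" requests little endian. The variable names are only illustrative.

```python
import struct

VALUE = 0x00FF0000

# Same number, two explicit in-memory layouts for an unsigned 32-bit integer.
big_endian_bytes = struct.pack(">I", VALUE)
little_endian_bytes = struct.pack("<I", VALUE)

# Masking and shifting work on the value, so they give the same result
# no matter which layout the machine uses internally.
channel = (VALUE & 0x00FF0000) >> 16

print(big_endian_bytes)     # b'\x00\xff\x00\x00'
print(little_endian_bytes)  # b'\x00\x00\xff\x00'
print(hex(channel))         # 0xff
```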

So, why would endianness even matter to software if this is the case? Well, transmission.

For example, convention says that networked numbers should be big endian (a.k.a. “network byte order”).[4]
This is so all nodes of a network interpret the sequence of bytes correctly.
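As a rough sketch of what “settling on a byte order” looks like in code, here is a made-up message header packed in network byte order with Python's struct module. The header layout (a 16-bit message id followed by a 32-bit payload length) and the function names are assumptions for this example and have nothing to do with any real protocol.

```python
import struct

# "!" selects network byte order (big endian) with no padding.
# Hypothetical layout: uint16 message id, uint32 payload length.
HEADER = struct.Struct("!HI")


def encode_header(message_id: int, payload_length: int) -> bytes:
    """Serialize the header so every node sees the same byte sequence."""
    return HEADER.pack(message_id, payload_length)


def decode_header(data: bytes) -> tuple[int, int]:
    """Parse a header produced by encode_header, on any host."""
    message_id, payload_length = HEADER.unpack(data[:HEADER.size])
    return message_id, payload_length


wire = encode_header(0x0102, 0x0A0B0C0D)
print(wire.hex())           # 01020a0b0c0d
print(decode_header(wire))  # (258, 168496141)
```

Because the format string names the byte order explicitly, the bytes on the wire are identical whether the sender is little endian or big endian.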

Similarly, any network protocol designer should settle on endianness, and every implementer should be careful about it.
I failed to do it in vProto… here, here and here. I’ll fix this soon.
I will go for little endian to maintain backwards compatibility.

Another case for transmission is files. When a file has any chance of being moved/copied to another computer, its format had better have an established number endianness.
Or, pray that all your numbers are palindromes in base 256.
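Here is a minimal sketch of both options, in Python. An in-memory io.BytesIO stands in for a real file, the choice of little endian for the format is arbitrary, and every function name is hypothetical.

```python
import io
import struct


def write_u32_le(stream, value: int) -> None:
    """Write an unsigned 32-bit integer in the format's fixed order (little endian)."""
    stream.write(struct.pack("<I", value))


def read_u32_le(stream) -> int:
    """Read it back using the same fixed order, whatever the host uses."""
    (value,) = struct.unpack("<I", stream.read(4))
    return value


def is_base256_palindrome(value: int, width: int = 4) -> bool:
    """True when the bytes read the same in both orders."""
    raw = value.to_bytes(width, "big")
    return raw == raw[::-1]


buffer = io.BytesIO()
write_u32_le(buffer, 0x00FF0000)
buffer.seek(0)
round_tripped = read_u32_le(buffer)

# Reading the same bytes with the wrong order silently gives another number.
(misread,) = struct.unpack(">I", buffer.getvalue())

print(hex(round_tripped))                 # 0xff0000
print(hex(misread))                       # 0xff00
print(is_base256_palindrome(0x01020201))  # True
print(is_base256_palindrome(0x00FF0000))  # False
```

Only the palindromes survive a reader that guesses the wrong order, so fixing the order in the format is the safer bet.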

And finally, the one thing you actually don’t have to worry about is bit endianness (inside a byte/word). The hardware already takes care of it: bits inside a byte are not individually addressable, so software never sees their order, and whatever bit order a bus or network link uses on the wire is handled by the hardware at each end. I haven’t found a single issue due to bit endianness in my research.

After finding this bit of information, I felt safe to remove my tinfoil hat.


[1] ‘‘Following Intel convention, word data always is stored with the most-significant byte in the higher memory location (see figure 2-13).’’ @ page 2-10 @ iAPX 86, iAPX 88 user’s manual
[2] ‘‘[…] including ARM-based systems from partners NVIDIA Corp., Qualcomm Inc. and Texas Instruments Inc.’’ @ first paragraph @ Microsoft Announces Support of System on a Chip Architectures From Intel, AMD, and ARM for Next Version of Windows
[3] ‘‘The processor can treat words of data in memory as being stored in either: / • Byte-invariant big-endian format / • Little-endian format.’’ @ 3.3 @ Cortex-R4 and Cortex-R4F Technical Reference Manual
[4] ‘‘[…] express numbers in decimal and to picture data in «big-endian» order’’ @ 2nd paragraph of page 3 @ RFC 1700 – Assigned Numbers


So, why am I posting about this?
Mainly because a worrying number of friends of mine have no idea what endianness is and how it will affect their code, which they hope to be cross-platform and often includes networking and/or saving files.
I’m just trying to save some people from a few headaches.

Remote Method Invocation in vProto

I finally finished a huge update of vProto!

The major feature in this update is RMI (Remote Method Invocation) – a very convenient way to interact with a remote entity.

I only saw one example of this before, in a library that I don’t even remember. It was really difficult finding this information, mainly because the classes that are used belong to WCF and are considered internal to the framework. They are documented, but the examples are really difficult to follow. (either this or I really failed at searching)

My savior was trial and error, as always. My classes are very light so the method is very easy to understand. Everything is outlined in this file. The rest is basic housekeeping logic.


Stop discrimination against varargs

Back when I was just learning Lua, every single source I found made sure to mention that “everything is a first-class citizen except for varargs” (which are second-class).

Back then, I didn’t know why that was. They seemed like a mere variable declaration to me. Now I know, though – because of the stack and upvalues. Varargs means the function handles a variable number of positions in the stack. To transfer that to a closure, it needs a variable number of upvalues, which may be tricky to set up.

A common workaround is to do local varargs = {…} in the function which gets varargs and to use that upvalue in closures (eventually unpacking later).

I seriously believe this should be done automatically by the library for the sake of helping people learn Lua and getting rid of this painfully common workaround.

The compiler/interpreter should detect when a non-vararg closure uses varargs from an enclosing function. If so, they should be put in a table and set as a special upvalue. Usage of … in the closure would then be equivalent to using unpack with the special upvalue (ideally done internally, not as a preprocessor-ish feature). And it would be nice if {…} were converted to that special upvalue when compiled.

If I had the time, I would attempt to implement this in vanilla Lua. But sadly I don’t have the time (look at the gap between this post and the previous).

I chose to rant about this because it is, to me, the only truly missing feature of Lua, one of my favorite languages.


Numbers

I am going to talk about numbers here because I really feel like doodling about this.

I have recently seen a video from Numberphile entitled “Do numbers EXIST?”, and it quite confuses me.

Why do numbers have to exist? Why do they have to not exist? Why aren’t numbers just numbers?

When I do mathematics (and I do this a lot), I don’t think of numbers as anything else. Numbers are numbers, nothing more and nothing less. They might be known, unknown, rational, irrational, purely imaginary, complex, quaternions, and so on. Why do they need to be associated with something?

I’m not trying to offend anyone, but you surely have better things to do than figuring out whether numbers really exist or not, or whether they exist in time and space.

Anyway, here’s a challenge: does infinity exist? Does 0 exist? Have fun!

The National

Ah… There’s nothing like typing on the screen of a 7-inch tablet.

This sucks. They’ve sent us to the dorms of some economic school. The rooms are old, moldy, ugly, hot and smelly. Thank God my dad is in the military so we can stay at the garrison dorms, which are 10^3 times better… And we have our own bathroom here. With walls. All four.

So… 90/600 points. I got 0 on all problems except one, where I scored 90.
The tests were really difficult and my poor implementations failed easily…
But on the fourth problem – the one on which I scored 90/100 – we had to find the longest common subset of the given sets. I made extensive use of iterators and used tens of times less memory than the limit (~1/64 MB) and pretty darn good execution times (0.25/0.30 seconds in the worst possible case). But where did I fail..? The first test! Which had only one set… And my algorithm expected at least two… Oh, the irony…
I can’t wait to get my code… If I can.

Edit: My participant ID was 1137. So close. :(

More distractions

I think I’m spending most of my time trying to organize my time. I’m sure this is equivalent to procrastinating, because it takes away time that I could spend doing something else. Like working on my website.

Speaking of which, I started adding some stuff to my portfolio. I finally took the time to take some screenshots and strike some keys.

Real life, again

Well, I went to a contest in Iași this weekend (technically, last weekend at the moment of writing this – 37 minutes after midnight).

Long story short, I failed.

Out of the two subjects, the second seemed the easier. So I invested 2 out of the 3 hours trying to solve it. I couldn’t even get the right results… Only 1 person got more than 0 points on it.

On the first one, I managed to do something… I think I would’ve gotten at least 20/100 on it if I hadn’t submitted the wrong source code at the end.

Anyway, this is the results spreadsheet. Sorry for hotlinking, couldn’t risk letting anyone who doesn’t understand Romanian get lost on the page.

By the way, I’ve been told the subjects were chosen and/or composed by students. Anyone with a sense of reality should know what that means. :)

Qualification

I was told 10 minutes ago by my teacher that I have qualified for the national Olympiad!

Apparently it takes ages to add this information to that website.

Anyway, I’ve gotta start grinding some C++!

Edit: I’ve been sent some source code, so here is mine, for anyone who can make use of it. Also, I’ll post the best code too. If anyone can read it… good for you.

Due to the stubbornness of the great syntax highlighter plugin, I am forced to post the codes on Pastebin.

The only thing I can say is, try the biggest key on your keyboard. Press it. It will make everyone’s life better.

He (the author) clearly has a different experience level. He’s familiar with many more classes, structures and functions.

And just for the lulz, I have to mention that there was a guy in the laboratory complaining that the compiler is broken because he gets syntax errors. Way to go, mate.

Edit 2: Can someone help me identify that signature at the end?

I hope he doesn’t really have TIAs. That wouldn’t be good.

Introductory post

(“first” is too generic)

The Website

At the time of writing this, the website is still incomplete. I’m still trying to solve the riddles of CSS to give the theme a more uniform look (symmetrical paddings, etc.). A habit of mine is blaming JavaScript for everything (mainly because it’s the only client scripting language involved here)…

I hate web development. It would be just so much better if I could draw basic squares, lines and text with basic colors and textures. Actually, that might be possible with canvases, but I don’t really know, and I don’t really care… yet. The way web browsers render is awesome, though. Too bad I don’t know how to control that very well.

On a related note, many thanks to TheBL1TzZ, my friend on Steam, for giving me the motivational boost to set up my website.

“Real Life”

In case anyone actually cares about my interaction with the environment outside of my home, I’ll post a bit of what I care about and what I consider to be important for any future events or opportunities I might have.

To start, I went to the informatics Olympiad this Saturday (that would be the 2nd of March). I’ve been going to this Olympiad since the 9th grade (that would make this the 3rd year in a row), and pretty much every subject was irrelevant to any situation I have encountered or thought I might encounter since I started programming (aka useless).

This year, the tides have turned. The second subject asked us to find the length of the longest reoccurring substring in a list of strings. To those who are not familiar with the terms, a string is basically a piece of text. A substring is a piece of text inside another (inside the string).

Well I scored 65/100 on this one. I actually expected 100/100 because I thoroughly tested the code to fit in the memory and execution time requirements. The only explanation I can think of is that they didn’t compile my code with a compiler that supports the C++11 standard.

In the reference documents they supplied us with, the string class description said it wraps over an array of chars internally. I guess the older standards had the string class wrap over a char* or something. I don’t know because I don’t use C++, so I didn’t learn much. Still, what I know is way over the level of the Olympiad.

So, from my little understanding, it means that, in C++11, reusing a string variable will just overwrite the old one and not clog up precious little bytes this way.

Oh well, no excuses, but still, I got the first place. I hope they give me a shiny piece of paper, at least. (You don’t need to know Romanian to read that table)

Looking forward to participating next year.