I want to dig into this because I have always dreamed of trying to build an .exe on my own, without the help of ready-made compilers.
Answer 1, authority 100%
Each specific processor (for example, Intel Core i3-4160 or ARM Cortex-A9) has its own microarchitecture and implements an instruction set architecture (ISA).
Microarchitecture defines the structure of the processor at the level of electronic components and logic gates.
The instruction set architecture (ISA), roughly speaking, determines which instructions the processor can execute. This architecture is abstracted from the microarchitecture. Processors from different companies can implement the same architecture (for example, many Intel and AMD processors implement the same x86 family of architectures).
If two processors implement the same ISA, they can run the same programs. The ISA defines which instructions are available to the programmer, which registers can be used, how paging and virtual memory work, and so on. It also defines the instruction format that the processor understands.
Every program for a processor is just a sequence of consecutive instructions. On startup, the processor fetches an instruction from memory at an address called the reset vector and keeps executing instructions from there until the power is turned off.
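The fetch-and-execute cycle described above can be sketched with a toy machine. Everything here is invented for illustration: the three opcodes, the single accumulator register and the reset vector address are assumptions and do not correspond to any real ISA. A real processor also has no need for a halt to end the loop; the sketch uses one only so that the function returns.

```python
# A toy processor, invented for illustration only (not a real ISA).
RESET_VECTOR = 0x10                     # assumed address where execution starts
LOAD, ADD, HALT = 0x01, 0x02, 0xFF      # made-up opcodes

def run(memory):
    """Fetch and execute instructions starting at the reset vector."""
    pc = RESET_VECTOR                   # program counter
    acc = 0                             # the only register of this toy machine
    while True:
        opcode = memory[pc]             # fetch
        if opcode == LOAD:              # LOAD imm8: acc = imm8
            acc = memory[pc + 1]
            pc += 2
        elif opcode == ADD:             # ADD imm8: acc = (acc + imm8) mod 256
            acc = (acc + memory[pc + 1]) & 0xFF
            pc += 2
        elif opcode == HALT:            # stop and hand back the accumulator
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pc:#06x}")

# The "program" is nothing more than bytes placed at the reset vector.
memory = bytearray(32)
memory[RESET_VECTOR:RESET_VECTOR + 5] = bytes([LOAD, 2, ADD, 3, HALT])
result = run(memory)
```

Here `result` is 5: the program loads 2, adds 3 and halts.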
Writing a program in machine code is quite simple: you take the ISA manual for your processor (for example, the Intel 64 and IA-32 Architectures Software Developer's Manuals) and write out the required instructions byte by byte.
Of course, nobody writes in machine code nowadays, because it is hard for a person to work with large quantities of numbers and complex instruction formats (especially on x86). Because of these difficulties, assembly languages were invented, which introduce simple mnemonics for processor instructions.
For example, a single x86 assembler instruction, MOV, can encode about 20 different MOV processor instructions [1]. The assembler reads your assembly language program and translates it into a binary file [2], which, again, is simply a sequence of bytes encoding consecutive processor instructions.
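To make the "one mnemonic, many instructions" point concrete, here is a minimal sketch that hand-encodes four forms of MOV for 32-bit mode. The opcodes (B8+rd, 89 /r, 0F 20 /r, 0F 22 /r) are taken from the Intel manual; the helper names are my own and cover only register-direct operands, so treat this as an illustration rather than a usable assembler.

```python
import struct

# Register numbers as used in the ModRM byte (32-bit general-purpose registers).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the three ModRM fields (2 + 3 + 3 bits) into one byte."""
    return bytes([(mod << 6) | (reg << 3) | rm])

def mov_reg_imm32(dst, value):
    # B8+rd id : MOV r32, imm32
    return bytes([0xB8 + REG[dst]]) + struct.pack("<I", value)

def mov_reg_reg(dst, src):
    # 89 /r : MOV r/m32, r32 (mod = 11 selects a register operand)
    return bytes([0x89]) + modrm(0b11, REG[src], REG[dst])

def mov_reg_from_cr(dst, cr):
    # 0F 20 /r : MOV r32, CRn
    return bytes([0x0F, 0x20]) + modrm(0b11, cr, REG[dst])

def mov_cr_from_reg(cr, src):
    # 0F 22 /r : MOV CRn, r32
    return bytes([0x0F, 0x22]) + modrm(0b11, cr, REG[src])

encodings = {
    "mov eax, 1":   mov_reg_imm32("eax", 1).hex(" "),
    "mov eax, ebx": mov_reg_reg("eax", "ebx").hex(" "),
    "mov eax, cr0": mov_reg_from_cr("eax", 0).hex(" "),
    "mov cr0, eax": mov_cr_from_reg(0, "eax").hex(" "),
}
```

All four lines use the same mnemonic, yet `encodings` holds four different opcodes: `b8 01 00 00 00`, `89 d8`, `0f 20 c0` and `0f 22 c0`.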
This is what an excerpt from an assembly language program might look like:
    cli
    lgdt (gdtr)
    mov %cr0, %eax
    or $0x1, %eax
    mov %eax, %cr0
This is what a machine language program looks like:
    0000000 05ea 007c 3100 8ec0 8ed8 bcd0 7c00 1688
    0000010 7cdb c031 c08e 00bb 8a80 db16 b67c b100
    0000020 b502 b000 e830 0053 59e8 8400 75c0 fa30
    0000030 010f f416 0f7c c020 8366 01c8 220f eac0
    0000040 7c44 0008 b866 0010 d88e c08e e08e e88e
    0000050 d08e 00bc 07c0 e800 03a4 0000 ebf4 befd
    0000060 7cbc 03e8 f400 fdeb 5350 30fc b4ff ac0e
    0000070 c084 0474 10cd f7eb 585b b4c3 cd02 7213
    0000080 3102 c3c0 1e9c 0657 fa56 c031 d88e 10bf
    0000090 f705 8ed0 bec0 0500 058a 2650 048a 2650
    00000a0 04c6 c600 be05 8026 be3c 2658 0488 8858
    00000b0 3105 74c0 4001 075e 1f5f c39d 3241 2030
    00000c0 7369 6420 7369 6261 656c 2e64 4820 6c61
    00000d0 2074 6874 2065 5043 2e55 0000 0000 0000
    00000e0 0000 0000 ffff 0000 9a00 00cf ffff 0000
    00000f0 9200 00cf 0017 7cdc 0000 0000 0000 0000
Obviously, assembly code is easier both to read and to write.
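The assembly excerpt and the dump above appear to be the same code seen at two levels. The sketch below assumes the dump is hexdump-style output (a hex offset followed by 16-bit words stored low byte first); under that assumption the bytes at offsets 0x2f to 0x3e match the five instructions of the excerpt. Only the two relevant rows are copied in, and the function names are my own.

```python
# Two rows copied from the dump above: a hex offset, then 16-bit words.
DUMP = """
0000020 b502 b000 e830 0053 59e8 8400 75c0 fa30
0000030 010f f416 0f7c c020 8366 01c8 220f eac0
"""

def dump_to_bytes(text):
    """Return {offset: byte}, assuming each word is stored low byte first."""
    memory = {}
    for row in text.split("\n"):
        fields = row.split()
        if not fields:
            continue                            # skip blank lines
        offset = int(fields[0], 16)
        for word in fields[1:]:
            value = int(word, 16)
            memory[offset] = value & 0xFF       # low byte comes first in the file
            memory[offset + 1] = value >> 8     # high byte follows
            offset += 2
    return memory

memory = dump_to_bytes(DUMP)

def hex_at(start, length):
    """Format `length` bytes beginning at offset `start` as hex text."""
    return " ".join(f"{memory[start + i]:02x}" for i in range(length))

decoded = [
    ("cli",            hex_at(0x2F, 1)),   # FA
    ("lgdt (gdtr)",    hex_at(0x30, 5)),   # 0F 01 /2 with a 16-bit address
    ("mov %cr0, %eax", hex_at(0x35, 3)),   # 0F 20 /r
    ("or $0x1, %eax",  hex_at(0x38, 4)),   # 66 prefix, then 83 /1 ib
    ("mov %eax, %cr0", hex_at(0x3C, 3)),   # 0F 22 /r
]
```

Each pair in `decoded` puts a mnemonic next to the bytes the assembler produced for it, which is exactly the lookup you would otherwise do by hand in the manual.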
You now know enough to open the manual, use it like a dictionary, write a program in machine code and execute it on the processor. But this will not work if you want to write a program that runs under an operating system.
The operating system is another level of abstraction, one that completely takes away our ability to use the processor without restriction and make it execute any instructions we like [3]. An OS does many different things, but we will dwell on only one: launching executable files.
As I said, a program for a processor is just a sequence of instructions, but a program for an operating system is a byte sequence with a special structure that contains more than just processor instructions.
Take Windows 10 as an example: it works with .exe executable files, which have a special format called Portable Executable. It has a rather complicated structure. In addition to the machine instructions themselves, it contains the information needed to determine the addresses and sizes of sections, the import and export tables, a special signature, and so on.
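As a small taste of that structure, here is a minimal sketch that reads the first few header fields of a Portable Executable image: the `MZ` signature, the pointer at offset 0x3C, the `PE\0\0` signature and the 20-byte COFF file header. The field layout follows Microsoft's PE format documentation; the function names and the fake in-memory image are my own and exist only so the example runs without a real .exe.

```python
import struct

def read_pe_summary(image):
    """Return a few header fields from the bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The 4-byte field at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header (20 bytes) follows the signature directly.
    (machine, section_count, _timestamp, _symbol_table, _symbol_count,
     optional_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", image, pe_offset + 4)
    return {
        "machine": machine,                      # 0x014C = i386, 0x8664 = x64
        "sections": section_count,
        "optional_header_size": optional_header_size,
        "characteristics": characteristics,
    }

def make_fake_headers():
    """Build just enough bytes to pass the checks above (not a runnable .exe)."""
    image = bytearray(0x58)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)    # PE signature starts at 0x40
    image[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<HHIIIHH", image, 0x44,
                     0x014C, 1, 0, 0, 0, 0xE0, 0x0102)
    return bytes(image)

summary = read_pe_summary(make_fake_headers())
```

To look at a real file instead, pass `open(path, "rb").read()` to `read_pe_summary`. Everything after these headers (the optional header, the section table, the import and export tables) is where most of the complexity lives and is not covered here.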
Therefore, to manually write a machine code program that will run on, say, Windows 10, we must not only write the program itself but also wrap it in the Portable Executable format.
But even this is not enough. We will also have to learn the conventions known as the ABI (application binary interface) and write our machine code according to exactly those conventions and no others.
All the pieces of the puzzle have to fit together: the program must be valid for the processor, the binary file format must be understood by the operating system, the program must be able to communicate correctly with the OS, and so on. All of this is very hard to guarantee if you are writing the program in a hex editor.
You can start by writing programs in assembly language (yes, you will have to learn the syntax of a particular assembler and either the Intel or the AT&T dialect). “Hello, World” in NASM looks like this:
    ; ----------------------------------------------------------------------------
    ; helloworld.asm
    ;
    ; This is a Win32 console program that writes "Hello, World" on one line and
    ; then exits. It needs to be linked with a C library.
    ; ----------------------------------------------------------------------------

            global  _main
            extern  _printf

            section .text
    _main:
            push    message
            call    _printf
            add     esp, 4
            ret
    message:
            db      'Hello, World', 10, 0
Do you need it?
Nowadays computers have become very complex, with dozens of abstraction layers. Even the ISA instructions of modern processors are not atomic: the processor executes each such instruction as a set of even smaller instructions called micro-operations (collections of which make up the microcode).
In fact, the ability to write in assembly language (and even more so in machine code) is pretty useless. The ability to simply read and understand an assembly listing is far more practical and can really come in handy.
It is impractical first of all because you will not write anything more complex than “Hello, World!” in machine code. In assembly you can, but you will spend an enormous amount of time on it that you could have spent on more useful things.
1. Interestingly, the MOV instruction in x86 is Turing-complete, i.e. any program can be written using this one instruction alone. There is even a special compiler that uses only this instruction.
2. Some assemblers can directly produce executable files in the desired format, including Portable Executable.
3. I am talking about modern operating systems such as Windows or Linux.