A Brief Analysis of Mach-O File Loading in dyld


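0x00 Overview

Les commencements ont des charmes inexprimables.
All beginnings hold an inexpressible charm.

By reading the dyld source code, this post gives a brief analysis of how dyld loads a Mach-O file into memory.

0x01 Source Code Analysis

The Mach-O format is broadly similar to ELF. Detailed write-ups are easy to find online, so the format is not repeated here.

Basics of the Mach-O file format

dyld source code download

1.1 Data Structures

1.1.1 macho_header

This data structure provides an API for working with the header of a Mach-O file. The functions are all simple and need little explanation.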

macho_header
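As a rough illustration of what working with the header involves, the sketch below defines a stand-in for the 64-bit header layout plus two small helpers. The names `Header64`, `is64Bit`, and `firstLoadCommand` are hypothetical and exist only for this example; the struct mirrors the field order of `mach_header_64` but is not dyld's own type.

```cpp
#include <cstdint>

// Stand-in for the 64-bit Mach-O header (same field order as mach_header_64).
struct Header64 {
    uint32_t magic;       // file magic
    int32_t  cputype;     // CPU family
    int32_t  cpusubtype;  // CPU variant
    uint32_t filetype;    // executable, dylib, ...
    uint32_t ncmds;       // number of load commands
    uint32_t sizeofcmds;  // total size of all load commands in bytes
    uint32_t flags;
    uint32_t reserved;
};

constexpr uint32_t kMagic32 = 0xfeedface;  // MH_MAGIC
constexpr uint32_t kMagic64 = 0xfeedfacf;  // MH_MAGIC_64

// Hypothetical helper: does the magic mark a 64-bit image?
inline bool is64Bit(const Header64& h) { return h.magic == kMagic64; }

// Hypothetical helper: the load commands start right after the header.
inline const uint8_t* firstLoadCommand(const Header64* h) {
    return reinterpret_cast<const uint8_t*>(h) + sizeof(Header64);
}
```

The second helper is the same pointer arithmetic that the listings later in this post use when they compute the start of the load commands from `sizeof(macho_header)`.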

1.1.2 ImageLoader

//
// ImageLoader is an abstract base class. To support loading a particular executable
// file format, you make a concrete subclass of ImageLoader.
//
// For each executable file (dynamic shared object) in use, an ImageLoader is instantiated.
//
// The ImageLoader base class does the work of linking together images, but it knows nothing
// about any particular file format.
//
//
class ImageLoader {
...
};

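Every loaded Mach-O file gets one such ImageLoader instance. There is too much code to reproduce here, so refer to the source.

Each concrete kind of Mach-O file is handled by a class derived from ImageLoader. The rough inheritance relationship is shown in the figure:

[Figure: ImageLoader class hierarchy]

At load time, a different concrete class is instantiated depending on the Mach-O format.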
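The pattern of an abstract base class with one concrete subclass per format can be sketched as follows. The class names `Loader`, `ClassicLoader`, `CompressedLoader` and the factory `makeLoader` are hypothetical simplifications; dyld's real classes are ImageLoader, ImageLoaderMachO, ImageLoaderMachOClassic, and ImageLoaderMachOCompressed, and they carry far more state than this.

```cpp
#include <memory>
#include <string>

// Simplified stand-in for the abstract base class.
class Loader {
public:
    virtual ~Loader() = default;
    virtual std::string kind() const = 0;
};

class ClassicLoader : public Loader {
public:
    std::string kind() const override { return "classic"; }
};

class CompressedLoader : public Loader {
public:
    std::string kind() const override { return "compressed"; }
};

// Hypothetical factory: pick the concrete subclass from a property of the file.
inline std::unique_ptr<Loader> makeLoader(bool hasDyldInfo) {
    if (hasDyldInfo)
        return std::unique_ptr<Loader>(new CompressedLoader());
    return std::unique_ptr<Loader>(new ClassicLoader());
}
```

Callers only ever hold a pointer to the base class, which is why the linking logic in the base class can stay independent of the file format.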

1.2 Source Code Analysis

1.2.1 _main

//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
{
... // set up a number of global variables
try {
// add dyld itself to UUID list
addDyldImageToUUIDList();
CRSetCrashLogMessage(sLoadingCrashMessage);
// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath); // build an ImageLoader for the main Mach-O
... // remainder omitted
}
}

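Once _main is called, it roughly does the following:

  • Selects the runtime environment (for example, the iOS simulator)
  • Initializes data and sets up global variables and context information
  • Checks whether the file is restricted

After these steps it calls instantiateFromLoadedImage, which processes the Mach-O and instantiates an ImageLoader for it.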

1.2.2 instantiateFromLoadedImage

// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoader* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{

// try mach-o loader
if ( isCompatibleMachO((const uint8_t*)mh, path) ) { // check that the image is acceptable on this machine
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext); // instantiate the image
addImage(image);
return image;
}

throw "main executable not a known format";
}

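The logic of this function is very simple. It does three things:

  • Checks whether the Mach-O file is acceptable
  • Instantiates the image
  • Adds the image to the list of managed images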

1.2.3 isCompatibleMachO

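Start with the isCompatibleMachO function.

According to its comments, a Mach-O file is considered compatible if any one of the following three conditions holds:

  1. The subtype in mach_header is in the list of subtypes compatible with the running processor.
  2. The subtype in mach_header is the same as the subtype of the running processor.
  3. The subtype in mach_header runs on all variants of that processor.

The kernel header machine.h defines the CPU_TYPE and CPU_SUBTYPE constants: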

#define CPU_TYPE_ANY		((cpu_type_t) -1)

#define CPU_TYPE_VAX ((cpu_type_t) 1)
/* skip ((cpu_type_t) 2) */
/* skip ((cpu_type_t) 3) */
/* skip ((cpu_type_t) 4) */
/* skip ((cpu_type_t) 5) */
#define CPU_TYPE_MC680x0 ((cpu_type_t) 6)
#define CPU_TYPE_X86 ((cpu_type_t) 7)
#define CPU_TYPE_I386 CPU_TYPE_X86 /* compatibility */
#define CPU_TYPE_X86_64 (CPU_TYPE_X86 | CPU_ARCH_ABI64)

/* skip CPU_TYPE_MIPS ((cpu_type_t) 8) */
/* skip ((cpu_type_t) 9) */
#define CPU_TYPE_MC98000 ((cpu_type_t) 10)
#define CPU_TYPE_HPPA ((cpu_type_t) 11)
#define CPU_TYPE_ARM ((cpu_type_t) 12)
#define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_MC88000 ((cpu_type_t) 13)
#define CPU_TYPE_SPARC ((cpu_type_t) 14)
#define CPU_TYPE_I860 ((cpu_type_t) 15)
/* skip CPU_TYPE_ALPHA ((cpu_type_t) 16) */
/* skip ((cpu_type_t) 17) */
#define CPU_TYPE_POWERPC ((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64 (CPU_TYPE_POWERPC | CPU_ARCH_ABI64)
/*
* Machine subtypes (these are defined here, instead of in a machine
* dependent directory, so that any program can get all definitions
* regardless of where is it compiled).
*/


/*
* Capability bits used in the definition of cpu_subtype.
*/

#define CPU_SUBTYPE_MASK 0xff000000 /* mask for feature flags */
#define CPU_SUBTYPE_LIB64 0x80000000 /* 64 bit libraries */


/*
* Object files that are hand-crafted to run on any
* implementation of an architecture are tagged with
* CPU_SUBTYPE_MULTIPLE. This functions essentially the same as
* the "ALL" subtype of an architecture except that it allows us
* to easily find object files that may need to be modified
* whenever a new implementation of an architecture comes out.
*
* It is the responsibility of the implementor to make sure the
* software handles unsupported implementations elegantly.
*/

#define CPU_SUBTYPE_MULTIPLE ((cpu_subtype_t) -1)
#define CPU_SUBTYPE_LITTLE_ENDIAN ((cpu_subtype_t) 0)
#define CPU_SUBTYPE_BIG_ENDIAN ((cpu_subtype_t) 1)

/*
* Machine threadtypes.
* This is none - not defined - for most machine types/subtypes.
*/

#define CPU_THREADTYPE_NONE ((cpu_threadtype_t) 0)

/*
* VAX subtypes (these do *not* necessary conform to the actual cpu
* ID assigned by DEC available via the SID register).
*/


#define CPU_SUBTYPE_VAX_ALL ((cpu_subtype_t) 0)
#define CPU_SUBTYPE_VAX780 ((cpu_subtype_t) 1)
#define CPU_SUBTYPE_VAX785 ((cpu_subtype_t) 2)
#define CPU_SUBTYPE_VAX750 ((cpu_subtype_t) 3)
#define CPU_SUBTYPE_VAX730 ((cpu_subtype_t) 4)
#define CPU_SUBTYPE_UVAXI ((cpu_subtype_t) 5)
#define CPU_SUBTYPE_UVAXII ((cpu_subtype_t) 6)
#define CPU_SUBTYPE_VAX8200 ((cpu_subtype_t) 7)
#define CPU_SUBTYPE_VAX8500 ((cpu_subtype_t) 8)
#define CPU_SUBTYPE_VAX8600 ((cpu_subtype_t) 9)
#define CPU_SUBTYPE_VAX8650 ((cpu_subtype_t) 10)
#define CPU_SUBTYPE_VAX8800 ((cpu_subtype_t) 11)
#define CPU_SUBTYPE_UVAXIII ((cpu_subtype_t) 12)

Simply put, cputype identifies the CPU platform, such as x86, ARM, or PowerPC, while subtype identifies a specific version within a platform, such as ARMv6 or ARMv7.
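To make the bit layout concrete, the sketch below recomputes two of the 64-bit type constants from the listing and strips the flag bits from a subtype. The constant and function names are hypothetical. CPU_ARCH_ABI64 does not appear in the excerpt above and is assumed here to be 0x01000000, its value in machine.h.

```cpp
#include <cstdint>

constexpr int32_t  kArchABI64   = 0x01000000;  // CPU_ARCH_ABI64 (assumed value)
constexpr int32_t  kTypeX86     = 7;           // CPU_TYPE_X86
constexpr int32_t  kTypeARM     = 12;          // CPU_TYPE_ARM
constexpr int32_t  kTypeX86_64  = kTypeX86 | kArchABI64;
constexpr int32_t  kTypeARM64   = kTypeARM | kArchABI64;
constexpr uint32_t kSubtypeMask = 0xff000000;  // CPU_SUBTYPE_MASK

// Strip the capability bits from a cpusubtype value.
constexpr uint32_t baseSubtype(uint32_t cpusubtype) {
    return cpusubtype & ~kSubtypeMask;
}

// Strip the ABI bits to recover the CPU family.
constexpr int32_t cpuFamily(int32_t cputype) {
    return cputype & 0x00ffffff;
}
```

With these definitions `kTypeX86_64` is 0x01000007 and `kTypeARM64` is 0x0100000C, and a subtype such as 0x80000003 (a base subtype of 3 with the CPU_SUBTYPE_LIB64 flag set) reduces to 3.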



bool isCompatibleMachO(const uint8_t* firstPage, const char* path)
{

#if CPU_SUBTYPES_SUPPORTED
// Case where CPU subtype checking is supported
// It is deemed compatible if any of the following are true:
// 1) mach_header subtype is in list of compatible subtypes for running processor
// 2) mach_header subtype is same as running processor subtype
// 3) mach_header subtype runs on all processor variants
const mach_header* mh = (mach_header*)firstPage;
if ( mh->magic == sMainExecutableMachHeader->magic ) {
// Is the magic of the given Mach-O the same as that of the main executable?
// On this first call, mh and sMainExecutableMachHeader should point to the same Mach-O
if ( mh->cputype == sMainExecutableMachHeader->cputype ) {
if ( (mh->cputype & CPU_TYPE_MASK) == sHostCPU ) {
// Can the image being loaded run on the current platform?
// get preference ordered list of subtypes that this machine can use
const cpu_subtype_t* subTypePreferenceList = findCPUSubtypeList(mh->cputype, sHostCPUsubtype);
if ( subTypePreferenceList != NULL ) {
// If a preference list exists for this CPU subtype, check against it
// if image's subtype is in the list, it is compatible
for (const cpu_subtype_t* p = subTypePreferenceList; *p != CPU_SUBTYPE_END_OF_LIST; ++p) {
if ( *p == mh->cpusubtype )
return true;
}
// have list and not in list, so not compatible
throwf("incompatible cpu-subtype: 0x%08X in %s", mh->cpusubtype, path);
}
// unknown cpu sub-type, but if exact match for current subtype then ok to use
if ( mh->cpusubtype == sHostCPUsubtype )
// The image's CPU subtype is the same as that of the running environment
return true;
}

// cpu type has no ordered list of subtypes
// These two CPU types accept Mach-O files of any subtype
switch (mh->cputype) {
case CPU_TYPE_I386:
case CPU_TYPE_X86_64:
// subtypes are not used or these architectures
return true;
}
}
}
#else
// For architectures that don't support cpu-sub-types
// this just check the cpu type.
// When subtype checking is not supported, only check that the image's cputype matches.
const mach_header* mh = (mach_header*)firstPage;
if ( mh->magic == sMainExecutableMachHeader->magic ) {
if ( mh->cputype == sMainExecutableMachHeader->cputype ) {
return true;
}
}
#endif
return false;
}
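The decision logic of the subtype-aware branch can be condensed into a small standalone sketch. The `Host` struct, its fields, and `isCompatible` are hypothetical simplifications: the real function reads globals such as sHostCPU and sHostCPUsubtype, also compares the magic, and throws instead of returning false when a preference list exists but the subtype is missing from it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of the running machine.
struct Host {
    int32_t cputype;
    int32_t cpusubtype;
    std::vector<int32_t> preferredSubtypes;  // empty means "no list known"
    bool subtypesIgnored;                    // e.g. the x86 family in the listing
};

// Returns true when an image with the given type/subtype may run on the host.
inline bool isCompatible(const Host& host, int32_t cputype, int32_t cpusubtype) {
    if (cputype != host.cputype)
        return false;
    if (!host.preferredSubtypes.empty()) {
        // 1) subtype appears in the host's ordered list of usable subtypes
        for (int32_t s : host.preferredSubtypes)
            if (s == cpusubtype)
                return true;
        return false;  // list exists but subtype is missing (dyld throws here)
    }
    // 2) exact match with the running processor's subtype
    if (cpusubtype == host.cpusubtype)
        return true;
    // 3) architecture does not use subtypes at all
    return host.subtypesIgnored;
}
```

The ordering matters: an existing preference list is authoritative, so the exact-match and "any subtype" rules only apply when no list is available.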

1.2.4 instantiateMainExecutable

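This function mainly uses sniffLoadCommands to determine whether the Mach-O file uses the compressed format (compressed LINKEDIT information), and then instantiates the corresponding class.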

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
// sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
bool compressed;
unsigned int segCount;
unsigned int libCount;
const linkedit_data_command* codeSigCmd;
const encryption_info_command* encryptCmd;
sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd); // determine whether the Mach-O is classic or compressed
// instantiate concrete class based on content of load commands
if ( compressed )
return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
else
#if SUPPORT_CLASSIC_MACHO
return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
throw "missing LC_DYLD_INFO load command";
#endif
}

The logic of this function is also very simple. Its flow is shown in the figure below:

[Figure: instantiateMainExecutable flowchart]

1.2.5 sniffLoadCommands

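This function mainly does two things:

  • Determines whether the Mach-O file is classic or compressed.
  • Counts the segments in the Mach-O file.

Along the way it also counts the dependent libraries (libCount) and records the LC_CODE_SIGNATURE and LC_ENCRYPTION_INFO commands.
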
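The core of this function is a bounds-checked walk over the load commands. Below is a minimal sketch of that walk over a plain byte buffer. `LoadCommand`, `SniffResult`, and `sniff` are hypothetical names; the numeric command values are the ones defined in <mach-o/loader.h>, and the segment handling of the real function is left out.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Every load command starts with these two fields (as in struct load_command).
struct LoadCommand {
    uint32_t cmd;      // command type
    uint32_t cmdsize;  // total size of this command in bytes
};

constexpr uint32_t kDyldInfo     = 0x22;                // LC_DYLD_INFO
constexpr uint32_t kDyldInfoOnly = 0x22 | 0x80000000u;  // LC_DYLD_INFO_ONLY
constexpr uint32_t kLoadDylib    = 0x0c;                // LC_LOAD_DYLIB

struct SniffResult {
    bool compressed = false;
    unsigned libCount = 0;
};

// Walk `ncmds` commands stored back to back in `cmds`, validating each size.
inline SniffResult sniff(const std::vector<uint8_t>& cmds, uint32_t ncmds) {
    SniffResult r;
    size_t offset = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (cmds.size() - offset < sizeof(LoadCommand))
            throw std::runtime_error("load command header exceeds buffer");
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(LoadCommand))
            throw std::runtime_error("load command too small");
        if (lc.cmdsize > cmds.size() - offset)
            throw std::runtime_error("load command exceeds sizeofcmds");
        if (lc.cmd == kDyldInfo || lc.cmd == kDyldInfoOnly)
            r.compressed = true;   // compressed LINKEDIT information present
        else if (lc.cmd == kLoadDylib)
            ++r.libCount;          // one more dependent library
        offset += lc.cmdsize;
    }
    return r;
}
```

Checking each size against the remaining bytes before advancing keeps a malformed cmdsize from moving the cursor outside the buffer, which is the same purpose the nextCmd checks serve in dyld.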
// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
const linkedit_data_command** codeSigCmd,
const encryption_info_command** encryptCmd)
{
*compressed = false;
*segCount = 0;
*libCount = 0;
*codeSigCmd = NULL;
*encryptCmd = NULL;

const uint32_t cmd_count = mh->ncmds;
// number of load commands, stored in the ncmds field of the Mach-O header
const struct load_command* const startCmds = (struct load_command*)(((uint8_t*)mh) + sizeof(macho_header));
// start of the load commands: startCmds = Mach-O address + header size
const struct load_command* const endCmds = (struct load_command*)(((uint8_t*)mh) + sizeof(macho_header) + mh->sizeofcmds);
// end of the load commands: endCmds = Mach-O address + header size + sizeofcmds
const struct load_command* cmd = startCmds;
bool foundLoadCommandSegment = false;
for (uint32_t i = 0; i < cmd_count; ++i) {
// iterate over every load command
uint32_t cmdLength = cmd->cmdsize;
struct macho_segment_command* segCmd;
if ( cmdLength < 8 ) {
// format check: throw if the command length is too small
dyld::throwf("malformed mach-o image: load command #%d length (%u) too small in %s",
i, cmdLength, path);
}
const struct load_command* const nextCmd = (const struct load_command*)(((char*)cmd)+cmdLength);
if ( (nextCmd > endCmds) || (nextCmd < cmd) ) {
// format check: throw if nextCmd, computed from the current command's length, points to an invalid location
dyld::throwf("malformed mach-o image: load command #%d length (%u) would exceed sizeofcmds (%u) in %s",
i, cmdLength, mh->sizeofcmds, path);
}
switch (cmd->cmd) {
// handle each command type differently
case LC_DYLD_INFO:
case LC_DYLD_INFO_ONLY:
*compressed = true;
// the Mach-O file uses the compressed format
break;
case LC_SEGMENT_COMMAND:
segCmd = (struct macho_segment_command*)cmd;
#if __MAC_OS_X_VERSION_MIN_REQUIRED
// rdar://problem/19617624 allow unmapped segments on OSX (but not iOS)
// Throw if the segment's filesize is larger than its vmsize.
// TODO: explain in more detail together with the kernel side of Mach-O loading
if ( (segCmd->filesize > segCmd->vmsize) && (segCmd->vmsize != 0) )
#else
if ( segCmd->filesize > segCmd->vmsize )
#endif
dyld::throwf("malformed mach-o image: segment load command %s filesize is larger than vmsize", segCmd->segname);
// ignore zero-sized segments
// skip segments of size 0 and count the remaining ones
if ( segCmd->vmsize != 0 )
*segCount += 1;
if ( context.codeSigningEnforced ) {
// when code signing is enforced, apply stricter validity checks to the segment
uintptr_t vmStart = segCmd->vmaddr;
uintptr_t vmSize = segCmd->vmsize;
uintptr_t vmEnd = vmStart + vmSize;
uintptr_t fileStart = segCmd->fileoff;
uintptr_t fileSize = segCmd->filesize;

// validate the fields and throw if the Mach-O file is malformed
if ( (intptr_t)(vmEnd) < 0)
dyld::throwf("malformed mach-o image: segment load command %s vmsize too large", segCmd->segname);
if ( vmStart > vmEnd )
dyld::throwf("malformed mach-o image: segment load command %s wraps around address space", segCmd->segname);
if ( vmSize != fileSize ) {
if ( (segCmd->initprot == 0) && (fileSize != 0) )
dyld::throwf("malformed mach-o image: unaccessable segment %s has filesize != 0", segCmd->segname);
else if ( vmSize < fileSize )
dyld::throwf("malformed mach-o image: segment %s has vmsize < filesize", segCmd->segname);
}
if ( inCache ) {
if ( (fileSize != 0) && (segCmd->initprot == (VM_PROT_READ | VM_PROT_EXECUTE)) ) {
if ( foundLoadCommandSegment )
throw "load commands in multiple segments";
foundLoadCommandSegment = true;
}
}
else if ( (fileStart < mh->sizeofcmds) && (fileSize != 0) ) {
// <rdar://problem/7942521> all load commands must be in an executable segment
if ( (fileStart != 0) || (fileSize < (mh->sizeofcmds+sizeof(macho_header))) )
dyld::throwf("malformed mach-o image: segment %s does not span all load commands", segCmd->segname);
if ( segCmd->initprot != (VM_PROT_READ | VM_PROT_EXECUTE) )
dyld::throwf("malformed mach-o image: load commands found in segment %s with wrong permissions", segCmd->segname);
if ( foundLoadCommandSegment )
throw "load commands in multiple segments";
foundLoadCommandSegment = true;
}

const struct macho_section* const sectionsStart = (struct macho_section*)((char*)segCmd + sizeof(struct macho_segment_command));
const struct macho_section* const sectionsEnd = &sectionsStart[segCmd->nsects];
for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
if (!inCache && sect->offset != 0 && ((sect->offset + sect->size) > (segCmd->fileoff + segCmd->filesize)))
dyld::throwf("malformed mach-o image: section %s,%s of '%s' exceeds segment %s booundary", sect->segname, sect->sectname, path, segCmd->segname);
}
}
break;
case LC_SEGMENT_COMMAND_WRONG:
dyld::throwf("malformed mach-o image: wrong LC_SEGMENT[_64] for architecture");
break;
case LC_LOAD_DYLIB:
case LC_LOAD_WEAK_DYLIB:
case LC_REEXPORT_DYLIB:
case LC_LOAD_UPWARD_DYLIB:
*libCount += 1;
break;
case LC_CODE_SIGNATURE:
*codeSigCmd = (struct linkedit_data_command*)cmd; // only support one LC_CODE_SIGNATURE per image
break;
case LC_ENCRYPTION_INFO:
case LC_ENCRYPTION_INFO_64:
*encryptCmd = (struct encryption_info_command*)cmd; // only support one LC_ENCRYPTION_INFO[_64] per image
break;
}
cmd = nextCmd;
}

if ( context.codeSigningEnforced && !foundLoadCommandSegment )
throw "load commands not in a segment";

// <rdar://problem/13145644> verify every segment does not overlap another segment
if ( context.codeSigningEnforced ) {
// when code signing is enforced, apply stricter checks to make sure no segments overlap each other
uintptr_t lastFileStart = 0;
uintptr_t linkeditFileStart = 0;
const struct load_command* cmd1 = startCmds;
for (uint32_t i = 0; i < cmd_count; ++i) {
if ( cmd1->cmd == LC_SEGMENT_COMMAND ) {
struct macho_segment_command* segCmd1 = (struct macho_segment_command*)cmd1;
uintptr_t vmStart1 = segCmd1->vmaddr;
uintptr_t vmEnd1 = segCmd1->vmaddr + segCmd1->vmsize;
uintptr_t fileStart1 = segCmd1->fileoff;
uintptr_t fileEnd1 = segCmd1->fileoff + segCmd1->filesize;

if (fileStart1 > lastFileStart)
lastFileStart = fileStart1;

if ( strcmp(&segCmd1->segname[0], "__LINKEDIT") == 0 ) {
linkeditFileStart = fileStart1;
}

const struct load_command* cmd2 = startCmds;
for (uint32_t j = 0; j < cmd_count; ++j) {
if ( cmd2 == cmd1 )
continue;
if ( cmd2->cmd == LC_SEGMENT_COMMAND ) {
struct macho_segment_command* segCmd2 = (struct macho_segment_command*)cmd2;
uintptr_t vmStart2 = segCmd2->vmaddr;
uintptr_t vmEnd2 = segCmd2->vmaddr + segCmd2->vmsize;
uintptr_t fileStart2 = segCmd2->fileoff;
uintptr_t fileEnd2 = segCmd2->fileoff + segCmd2->filesize;
if ( ((vmStart2 <= vmStart1) && (vmEnd2 > vmStart1) && (vmEnd1 > vmStart1))
|| ((vmStart2 >= vmStart1) && (vmStart2 < vmEnd1) && (vmEnd2 > vmStart2)) )
dyld::throwf("malformed mach-o image: segment %s vm overlaps segment %s", segCmd1->segname, segCmd2->segname);
if ( ((fileStart2 <= fileStart1) && (fileEnd2 > fileStart1) && (fileEnd1 > fileStart1))
|| ((fileStart2 >= fileStart1) && (fileStart2 < fileEnd1) && (fileEnd2 > fileStart2)) )
dyld::throwf("malformed mach-o image: segment %s file content overlaps segment %s", segCmd1->segname, segCmd2->segname);
}
cmd2 = (const struct load_command*)(((char*)cmd2)+cmd2->cmdsize);
}
}
cmd1 = (const struct load_command*)(((char*)cmd1)+cmd1->cmdsize);
}

if (lastFileStart != linkeditFileStart)
dyld::throwf("malformed mach-o image: __LINKEDIT must be last segment");
}

// fSegmentsArrayCount is only 8-bits
if ( *segCount > 255 )
dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

// fSegmentsArrayCount is only 8-bits
if ( *libCount > 4095 )
dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

if ( needsAddedLibSystemDepency(*libCount, mh) )
*libCount = 1;
}
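The pairwise overlap test in the nested loop above can be expressed as a single predicate on half-open ranges. This is a sketch with hypothetical names (`Range`, `overlaps`); by my reading it accepts and rejects the same pairs as the two-clause condition in the listing, including the cases where one range is empty, but it is not dyld's code.

```cpp
#include <cstdint>

// Half-open range [start, end).
struct Range {
    uint64_t start;
    uint64_t end;
};

// Two non-empty half-open ranges overlap when each starts before the other ends.
// Empty ranges are treated as overlapping nothing.
inline bool overlaps(const Range& a, const Range& b) {
    if (a.end <= a.start || b.end <= b.start)
        return false;
    return a.start < b.end && b.start < a.end;
}
```

Ranges that merely touch, such as [0x1000, 0x2000) and [0x2000, 0x3000), do not count as overlapping, which is what allows consecutive segments to sit back to back.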

1.2.6 instantiateMainExecutable (Classic)

Initialization is much the same for classic and compressed images, so look at the Classic initialization function first.

// create image for main executable
ImageLoaderMachOClassic* ImageLoaderMachOClassic::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path,
unsigned int segCount, unsigned int libCount, const LinkContext& context)
{
ImageLoaderMachOClassic* image = ImageLoaderMachOClassic::instantiateStart(mh, path, segCount, libCount);
// instantiate the image

// set the parameters needed for PIE (Position Independent Executables)
// TODO: study PIE in more detail
// set slide for PIE programs
image->setSlide(slide);

// for PIE record end of program, to know where to start loading dylibs
if ( slide != 0 )
fgNextPIEDylibAddress = (uintptr_t)image->getEnd();

// set a number of flags
image->disableCoverageCheck();
image->instantiateFinish(context);
image->setMapped(context);

#if __i386__
// kernel may have mapped in __IMPORT segment read-only, we need it read/write to do binding
if ( image->fReadOnlyImportSegment ) {
for(unsigned int i=0; i < image->fSegmentsCount; ++i) {
if ( image->segIsReadOnlyImport(i) )
image->segMakeWritable(i, context);
}
}
#endif

// if context.verboseMapping is set, print a detailed log
if ( context.verboseMapping ) {
dyld::log("dyld: Main executable mapped %s\n", path);
for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
const char* name = image->segName(i);
if ( (strcmp(name, "__PAGEZERO") == 0) || (strcmp(name, "__UNIXSTACK") == 0) )
dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segPreferredLoadAddress(i), image->segPreferredLoadAddress(i)+image->segSize(i));
else
dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segActualLoadAddress(i), image->segActualEndAddress(i));
}
}

return image;
}

As you can see, the core of the work is delegated to ImageLoaderMachOClassic::instantiateStart.

1.2.7 instantiateStart

// construct ImageLoaderMachOClassic using "placement new" with SegmentMachO objects array at end
ImageLoaderMachOClassic* ImageLoaderMachOClassic::instantiateStart(const macho_header* mh, const char* path,
unsigned int segCount, unsigned int libCount)
{
size_t size = sizeof(ImageLoaderMachOClassic) + segCount * sizeof(uint32_t) + libCount * sizeof(ImageLoader*);
ImageLoaderMachOClassic* allocatedSpace = static_cast<ImageLoaderMachOClassic*>(malloc(size));
if ( allocatedSpace == NULL )
throw "malloc failed";
uint32_t* segOffsets = ((uint32_t*)(((uint8_t*)allocatedSpace) + sizeof(ImageLoaderMachOClassic)));
bzero(&segOffsets[segCount], libCount*sizeof(void*)); // zero out lib array
return new (allocatedSpace) ImageLoaderMachOClassic(mh, path, segCount, segOffsets, libCount);
}

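The core logic still does not appear here. This function only allocates memory based on the counts gathered earlier and computes the pointer to the segment-offset array. The actual work happens in the ImageLoaderMachOClassic constructor.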
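The technique used by instantiateStart, one allocation holding the object plus its variable-length arrays, constructed with placement new, can be sketched in isolation. `Image`, `makeImage`, and `destroyImage` are hypothetical names. Unlike the listing, which stores the segment offsets first and the library pointers after them, this sketch places the pointer array first so that every element stays naturally aligned.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical object that keeps two variable-length arrays directly behind itself.
class Image {
public:
    Image(uint32_t segCount, uint32_t libCount)
        : fSegCount(segCount), fLibCount(libCount) {}

    // Library pointer array begins right after the object itself.
    void** libs() {
        return reinterpret_cast<void**>(reinterpret_cast<uint8_t*>(this) + sizeof(Image));
    }
    // Segment offset array follows the library pointers.
    uint32_t* segOffsets() {
        return reinterpret_cast<uint32_t*>(libs() + fLibCount);
    }

    uint32_t fSegCount;
    uint32_t fLibCount;
};

static_assert(sizeof(Image) % alignof(void*) == 0, "trailing pointer array must stay aligned");

// Allocate object and trailing arrays in one block, then construct in place.
inline Image* makeImage(uint32_t segCount, uint32_t libCount) {
    size_t size = sizeof(Image) + libCount * sizeof(void*) + segCount * sizeof(uint32_t);
    void* space = std::malloc(size);
    if (space == nullptr)
        throw std::bad_alloc();
    Image* image = new (space) Image(segCount, libCount);
    std::memset(image->libs(), 0, libCount * sizeof(void*));  // zero the lib array
    return image;
}

// Objects built with placement new need an explicit destructor call before free.
inline void destroyImage(Image* image) {
    image->~Image();
    std::free(image);
}
```

A single allocation keeps the per-image overhead small and the arrays adjacent to the object, at the cost of manual size arithmetic and manual destruction.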

1.2.8 ImageLoaderMachO

ImageLoaderMachOClassic::ImageLoaderMachOClassic(const macho_header* mh, const char* path, 
unsigned int segCount, uint32_t segOffsets[], unsigned int libCount)
: ImageLoaderMachO(mh, path, segCount, segOffsets, libCount), fStrings(NULL), fSymbolTable(NULL), fDynamicInfo(NULL)
{
}

ImageLoaderMachO::ImageLoaderMachO(const macho_header* mh, const char* path, unsigned int segCount,
uint32_t segOffsets[], unsigned int libCount)
: ImageLoader(path, libCount), fCoveredCodeLength(0), fMachOData((uint8_t*)mh), fLinkEditBase(NULL), fSlide(0),
fEHFrameSectionOffset(0), fUnwindInfoSectionOffset(0), fDylibIDOffset(0),
fSegmentsCount(segCount), fIsSplitSeg(false), fInSharedCache(false),
#if TEXT_RELOC_SUPPORT
fTextSegmentRebases(false),
fTextSegmentBinds(false),
#endif
#if __i386__
fReadOnlyImportSegment(false),
#endif
fHasSubLibraries(false), fHasSubUmbrella(false), fInUmbrella(false), fHasDOFSections(false), fHasDashInit(false),
fHasInitializers(false), fHasTerminators(false), fRegisteredAsRequiresCoalescing(false)
{
fIsSplitSeg = ((mh->flags & MH_SPLIT_SEGS) != 0);

// construct SegmentMachO object for each LC_SEGMENT cmd using "placement new" to put
// each SegmentMachO object in array at end of ImageLoaderMachO object
const uint32_t cmd_count = mh->ncmds;
const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
const struct load_command* cmd = cmds;
for (uint32_t i = 0, segIndex=0; i < cmd_count; ++i) {
if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
const struct macho_segment_command* segCmd = (struct macho_segment_command*)cmd;
// ignore zero-sized segments
if ( segCmd->vmsize != 0 ) {
// record offset of load command
segOffsets[segIndex++] = (uint32_t)((uint8_t*)segCmd - fMachOData);
}
}
cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
}

}

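There is nothing complicated here either. The constructor walks the load commands and records in segOffsets the offset of every segment command whose vmsize is non-zero. The kernel has already mapped the main executable before dyld gets control, so no segment data is copied at this point.

After this returns, all that remains is the call to addImage.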

1.2.9 addImage

static void addImage(ImageLoader* image)
{

// add to master list
// add the image to the container of all images while holding the lock
allImagesLock();
sAllImages.push_back(image);
allImagesUnlock();

// update mapped ranges
// update the record of mapped memory ranges
uintptr_t lastSegStart = 0;
uintptr_t lastSegEnd = 0;
for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
if ( image->segUnaccessible(i) )
continue;
uintptr_t start = image->segActualLoadAddress(i);
uintptr_t end = image->segActualEndAddress(i);
if ( start == lastSegEnd ) {
// two segments are contiguous, just record combined segments
lastSegEnd = end;
}
else {
// non-contiguous segments, record last (if any)
if ( lastSegEnd != 0 )
addMappedRange(image, lastSegStart, lastSegEnd);
lastSegStart = start;
lastSegEnd = end;
}
}
if ( lastSegEnd != 0 )
addMappedRange(image, lastSegStart, lastSegEnd);


if ( sEnv.DYLD_PRINT_LIBRARIES || (sEnv.DYLD_PRINT_LIBRARIES_POST_LAUNCH && (sMainExecutable!=NULL) && sMainExecutable->isLinked()) ) {
dyld::log("dyld: loaded: %s\n", image->getPath());
}

}

addImage only updates some bookkeeping data:

  • Adds the image to the managing container
  • Updates the memory layout information
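The second step, merging segments that sit back to back into one mapped range, can be sketched on its own. `MappedRange` and `coalesce` are hypothetical names; the loop follows the same last-start/last-end bookkeeping as the listing but collects the ranges in a vector instead of calling addMappedRange, and it omits the check for inaccessible segments.

```cpp
#include <cstdint>
#include <vector>

// Half-open address range [start, end).
struct MappedRange {
    uint64_t start;
    uint64_t end;
};

// Collapse segments that touch end-to-start into single ranges,
// visiting them in load-command order as the listing does.
inline std::vector<MappedRange> coalesce(const std::vector<MappedRange>& segments) {
    std::vector<MappedRange> out;
    uint64_t lastStart = 0;
    uint64_t lastEnd = 0;
    for (const MappedRange& seg : segments) {
        if (seg.start == lastEnd) {
            lastEnd = seg.end;  // contiguous: extend the pending range
        } else {
            if (lastEnd != 0)
                out.push_back({lastStart, lastEnd});  // flush the pending range
            lastStart = seg.start;
            lastEnd = seg.end;
        }
    }
    if (lastEnd != 0)
        out.push_back({lastStart, lastEnd});  // flush whatever is left
    return out;
}
```

For example, segments at [0x1000, 0x2000), [0x2000, 0x5000), and [0x8000, 0x9000) collapse into two ranges, [0x1000, 0x5000) and [0x8000, 0x9000).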

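0x02 Summary

The whole loading process breaks down into three parts:

  • Validating the data
  • Building the segment information into the image instance, based on the Mach-O header
  • Adding the image to the managing container

The key functions are shown in the figure below:

[Figure: key functions]

Further details of the Mach-O loading process require analyzing the XNU kernel.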

References

1. Analysis of dyld (source code, code signing, etc.)

http://cocoahuke.com/2016/02/14/dyld%E5%8A%A0%E8%BD%BD%E8%BF%87%E7%A8%8B/

2. The whole process of Mach-O file loading (1)

http://dongaxis.github.io/2015/01/01/mac-o%E6%96%87%E4%BB%B6%E5%8A%A0%E8%BD%BD%E7%9A%84%E5%85%A8%E8%BF%87%E7%A8%8B-1/

PS:

More articles can be found on my learning blog at http://BLOGIMAGE/. Feedback is welcome, and please point out anything I got wrong :)
