2016-03-01

A Brief Analysis of Mach-O File Loading in dyld


0x00 Overview

Les commencements ont des charmes inexprimables.
Beginnings have inexpressible charms.

By reading the dyld source code, this post gives a brief analysis of how dyld loads a Mach-O file into memory.

0x01 Source Analysis

The Mach-O format is broadly similar to ELF, and detailed write-ups are easy to find online, so it is not repeated here.

Basics of the Mach-O file format

dyld source download

1.1 Data Structures

1.1.1 macho_header

This data structure provides an API for operating on the header of a Mach-O file. Its functions are simple and need little explanation.

[Figure: macho_header]

1.1.2 ImageLoader

//
// ImageLoader is an abstract base class.  To support loading a particular executable
// file format, you make a concrete subclass of ImageLoader.
//
// For each executable file (dynamic shared object) in use, an ImageLoader is instantiated.
//
// The ImageLoader base class does the work of linking together images, but it knows nothing
// about any particular file format.
//
//
class ImageLoader {
...
};

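Every loaded Mach-O file has a corresponding ImageLoader instance. The class is too large to reproduce here; refer to the source.

Each concrete kind of Mach-O file is handled by a class derived from ImageLoader. The rough inheritance hierarchy is shown in the figure:

[Figure: ImageLoader class hierarchy]

At load time, the concrete class to instantiate is chosen according to the format of the Mach-O file.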
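The listings quoted later in this post suggest a hierarchy of ImageLoader, then ImageLoaderMachO, then ImageLoaderMachOClassic and ImageLoaderMachOCompressed. The following is a minimal sketch of that shape only. Every class and function name ending in `Sketch`, as well as `makeLoader`, is hypothetical and is not dyld's real code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the abstract base class: knows nothing about file formats.
class ImageLoaderSketch {
public:
    explicit ImageLoaderSketch(std::string path) : path_(std::move(path)) {}
    virtual ~ImageLoaderSketch() = default;
    virtual const char* formatName() const = 0;  // supplied by concrete subclasses
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

// Hypothetical stand-in for the Mach-O specific layer shared by both variants.
class MachOSketch : public ImageLoaderSketch {
public:
    using ImageLoaderSketch::ImageLoaderSketch;
};

class ClassicSketch : public MachOSketch {
public:
    using MachOSketch::MachOSketch;
    const char* formatName() const override { return "classic"; }
};

class CompressedSketch : public MachOSketch {
public:
    using MachOSketch::MachOSketch;
    const char* formatName() const override { return "compressed"; }
};

// Choose the concrete class from a property of the file, as the text describes.
std::unique_ptr<ImageLoaderSketch> makeLoader(const std::string& path, bool compressed) {
    if (compressed)
        return std::unique_ptr<ImageLoaderSketch>(new CompressedSketch(path));
    return std::unique_ptr<ImageLoaderSketch>(new ClassicSketch(path));
}
```

Callers only ever hold a pointer to the base class, which is why the rest of the linking logic can stay format-agnostic.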

1.2 Source Walkthrough

1.2.1 _main

//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
		int argc, const char* argv[], const char* envp[], const char* apple[], 
		uintptr_t* startGlue)
{
	... // a series of operations on global variables
	try {
		// add dyld itself to UUID list
		addDyldImageToUUIDList();
		CRSetCrashLogMessage(sLoadingCrashMessage);
		// instantiate ImageLoader for main executable
		sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath); // wrap the Mach-O in an image
		... // remainder omitted
	}
}

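After _main is called, it roughly does the following:

Selects the runtime environment (iOS simulator)
Initializes data and sets up global variables and context information
Checks whether the file is restricted

Once these steps are complete, instantiateFromLoadedImage is called to start loading the Mach-O and instantiate it as an ImageLoader.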

1.2.2 instantiateFromLoadedImage

// The kernel maps in main executable before dyld gets control.  We need to 
// make an ImageLoader* for the already mapped in main executable.
static ImageLoader* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
	// try mach-o loader
	if ( isCompatibleMachO((const uint8_t*)mh, path) ) { // check that the file is acceptable
		ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext); // instantiate
		addImage(image);
		return image;
	}
	
	throw "main executable not a known format";
}

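The logic of this function is very simple. It does three things in total:

Checks whether the Mach-O file is acceptable
Initializes the instance
Adds the image to the list of managed images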

1.2.3 isCompatibleMachO

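First, look at isCompatibleMachO.

According to its comments, a Mach-O file is considered compatible if any one of the following three conditions holds:

The subtype in mach_header is in the list of subtypes compatible with the running processor.
The subtype in mach_header is the same as the subtype of the running processor.
The subtype in mach_header runs on all variants of the processor.

The kernel header machine.h defines the CPU_TYPE and CPU_SUBTYPE constants.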

#define CPU_TYPE_ANY		((cpu_type_t) -1)

#define CPU_TYPE_VAX		((cpu_type_t) 1)
/* skip				((cpu_type_t) 2)	*/
/* skip				((cpu_type_t) 3)	*/
/* skip				((cpu_type_t) 4)	*/
/* skip				((cpu_type_t) 5)	*/
#define	CPU_TYPE_MC680x0	((cpu_type_t) 6)
#define CPU_TYPE_X86		((cpu_type_t) 7)
#define CPU_TYPE_I386		CPU_TYPE_X86		/* compatibility */
#define	CPU_TYPE_X86_64		(CPU_TYPE_X86 | CPU_ARCH_ABI64)

/* skip CPU_TYPE_MIPS		((cpu_type_t) 8)	*/
/* skip 			((cpu_type_t) 9)	*/
#define CPU_TYPE_MC98000	((cpu_type_t) 10)
#define CPU_TYPE_HPPA           ((cpu_type_t) 11)
#define CPU_TYPE_ARM		((cpu_type_t) 12)
#define CPU_TYPE_ARM64          (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_MC88000	((cpu_type_t) 13)
#define CPU_TYPE_SPARC		((cpu_type_t) 14)
#define CPU_TYPE_I860		((cpu_type_t) 15)
/* skip	CPU_TYPE_ALPHA		((cpu_type_t) 16)	*/
/* skip				((cpu_type_t) 17)	*/
#define CPU_TYPE_POWERPC		((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64		(CPU_TYPE_POWERPC | CPU_ARCH_ABI64)
/*
 *	Machine subtypes (these are defined here, instead of in a machine
 *	dependent directory, so that any program can get all definitions
 *	regardless of where is it compiled).
 */

/*
 * Capability bits used in the definition of cpu_subtype.
 */
#define CPU_SUBTYPE_MASK	0xff000000	/* mask for feature flags */
#define CPU_SUBTYPE_LIB64	0x80000000	/* 64 bit libraries */


/*
 *	Object files that are hand-crafted to run on any
 *	implementation of an architecture are tagged with
 *	CPU_SUBTYPE_MULTIPLE.  This functions essentially the same as
 *	the "ALL" subtype of an architecture except that it allows us
 *	to easily find object files that may need to be modified
 *	whenever a new implementation of an architecture comes out.
 *
 *	It is the responsibility of the implementor to make sure the
 *	software handles unsupported implementations elegantly.
 */
#define	CPU_SUBTYPE_MULTIPLE		((cpu_subtype_t) -1)
#define CPU_SUBTYPE_LITTLE_ENDIAN	((cpu_subtype_t) 0)
#define CPU_SUBTYPE_BIG_ENDIAN		((cpu_subtype_t) 1)

/*
 *     Machine threadtypes.
 *     This is none - not defined - for most machine types/subtypes.
 */
#define CPU_THREADTYPE_NONE		((cpu_threadtype_t) 0)

/*
 *	VAX subtypes (these do *not* necessary conform to the actual cpu
 *	ID assigned by DEC available via the SID register).
 */

#define	CPU_SUBTYPE_VAX_ALL	((cpu_subtype_t) 0) 
#define CPU_SUBTYPE_VAX780	((cpu_subtype_t) 1)
#define CPU_SUBTYPE_VAX785	((cpu_subtype_t) 2)
#define CPU_SUBTYPE_VAX750	((cpu_subtype_t) 3)
#define CPU_SUBTYPE_VAX730	((cpu_subtype_t) 4)
#define CPU_SUBTYPE_UVAXI	((cpu_subtype_t) 5)
#define CPU_SUBTYPE_UVAXII	((cpu_subtype_t) 6)
#define CPU_SUBTYPE_VAX8200	((cpu_subtype_t) 7)
#define CPU_SUBTYPE_VAX8500	((cpu_subtype_t) 8)
#define CPU_SUBTYPE_VAX8600	((cpu_subtype_t) 9)
#define CPU_SUBTYPE_VAX8650	((cpu_subtype_t) 10)
#define CPU_SUBTYPE_VAX8800	((cpu_subtype_t) 11)
#define CPU_SUBTYPE_UVAXIII	((cpu_subtype_t) 12)

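Put simply, cputype identifies the CPU architecture (x86, ARM, PowerPC, and so on), while subtype identifies a specific variant within an architecture, for example ARMv6 or ARMv7.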
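To make the bit layout concrete, here is a small sketch that decodes a cputype and a cpusubtype. The numeric values for x86 and ARM are taken from the excerpt above; `CPU_ARCH_MASK` (0xff000000) and `CPU_ARCH_ABI64` (0x01000000) come from the same header but are not shown in the excerpt. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCpuArchMask    = 0xff000000u;  // CPU_ARCH_MASK
constexpr uint32_t kCpuArchAbi64   = 0x01000000u;  // CPU_ARCH_ABI64
constexpr uint32_t kCpuSubtypeMask = 0xff000000u;  // CPU_SUBTYPE_MASK

constexpr uint32_t kCpuTypeX86    = 7;
constexpr uint32_t kCpuTypeArm    = 12;
constexpr uint32_t kCpuTypeX86_64 = kCpuTypeX86 | kCpuArchAbi64;
constexpr uint32_t kCpuTypeArm64  = kCpuTypeArm | kCpuArchAbi64;

// Strip the ABI bits to recover the architecture family (hypothetical helper).
constexpr uint32_t cpuFamily(uint32_t cputype) { return cputype & ~kCpuArchMask; }

// True when the 64-bit ABI flag is set (hypothetical helper).
constexpr bool is64Bit(uint32_t cputype) { return (cputype & kCpuArchAbi64) != 0; }

// Strip capability bits such as CPU_SUBTYPE_LIB64 from a subtype (hypothetical helper).
constexpr uint32_t baseSubtype(uint32_t cpusubtype) { return cpusubtype & ~kCpuSubtypeMask; }
```

For example, a subtype field of 0x80000003 decodes to base subtype 3 with the CPU_SUBTYPE_LIB64 capability bit set.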

bool isCompatibleMachO(const uint8_t* firstPage, const char* path)
{
#if CPU_SUBTYPES_SUPPORTED
	// the case where CPU subtype checks are supported
	// It is deemed compatible if any of the following are true:
	//  1) mach_header subtype is in list of compatible subtypes for running processor
	//  2) mach_header subtype is same as running processor subtype
	//  3) mach_header subtype runs on all processor variants
	const mach_header* mh = (mach_header*)firstPage;
	if ( mh->magic == sMainExecutableMachHeader->magic ) { 
		// does the magic of the given Mach-O match that of the main executable?
		// on this first call, mh and sMainExecutableMachHeader should point to the same Mach-O
		if ( mh->cputype == sMainExecutableMachHeader->cputype ) {
			if ( (mh->cputype & CPU_TYPE_MASK) == sHostCPU ) {
				// can the image being loaded run on the current platform?
				// get preference ordered list of subtypes that this machine can use
				const cpu_subtype_t* subTypePreferenceList = findCPUSubtypeList(mh->cputype, sHostCPUsubtype);
				if ( subTypePreferenceList != NULL ) {
					// if a preference list exists for this CPU subtype, check against it
					// if image's subtype is in the list, it is compatible
					for (const cpu_subtype_t* p = subTypePreferenceList; *p != CPU_SUBTYPE_END_OF_LIST; ++p) {
						if ( *p == mh->cpusubtype )
							return true;
					}
					// have list and not in list, so not compatible
					throwf("incompatible cpu-subtype: 0x%08X in %s", mh->cpusubtype, path);
				}
				// unknown cpu sub-type, but if exact match for current subtype then ok to use
				if ( mh->cpusubtype == sHostCPUsubtype ) 
					// the image's subtype is the same as the host CPU subtype
					return true;
			}
			
			// cpu type has no ordered list of subtypes
			// these two CPU types accept Mach-O files of every subtype
			switch (mh->cputype) {
				case CPU_TYPE_I386:
				case CPU_TYPE_X86_64:
					// subtypes are not used or these architectures
					return true;
			}
		}
	}
#else
	// For architectures that don't support cpu-sub-types
	// this just check the cpu type.
	// without subtype support, only check that the cputype matches the main executable
	const mach_header* mh = (mach_header*)firstPage;
	if ( mh->magic == sMainExecutableMachHeader->magic ) {
		if ( mh->cputype == sMainExecutableMachHeader->cputype ) {
			return true;
		}
	}
#endif
	return false;
}
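The decision order in the listing can be condensed into a standalone sketch. This is an illustration under simplifying assumptions, not dyld's implementation: `HostCpu` and `isCompatible` are hypothetical, the host and main-executable checks are merged into one, and the sketch returns false where the listing throws. The ARM subtype numbers in the usage note (6, 9, 11 for ARMv6, ARMv7, ARMv7s) are the machine.h values, but the preference list itself is made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of the running processor.
struct HostCpu {
    uint32_t cputype;
    uint32_t cpusubtype;
    std::vector<uint32_t> preferredSubtypes;  // empty means no list is known
    bool subtypesIgnored;                     // true for i386 / x86_64 in the listing
};

bool isCompatible(const HostCpu& host, uint32_t imgType, uint32_t imgSubtype) {
    if (imgType != host.cputype)
        return false;
    if (!host.preferredSubtypes.empty()) {
        // Condition 1: the subtype appears in the host's preference list.
        for (uint32_t s : host.preferredSubtypes)
            if (s == imgSubtype)
                return true;
        return false;  // the listing throws "incompatible cpu-subtype" here
    }
    // Condition 2: exact match with the host subtype.
    if (imgSubtype == host.cpusubtype)
        return true;
    // Condition 3: the architecture does not distinguish subtypes.
    return host.subtypesIgnored;
}
```

With a host described as `HostCpu{12, 11, {11, 9, 6}, false}`, an image with subtype 9 is accepted through the list, while subtype 12 is rejected because a list exists and does not contain it.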

1.2.4 instantiateMainExecutable

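This function mainly uses sniffLoadCommands to determine whether the Mach-O file is a compressed one (that is, whether it carries an LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command), and then instantiates the corresponding class.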

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
	//dyld::log("ImageLoader=%ld, ImageLoaderMachO=%ld, ImageLoaderMachOClassic=%ld, ImageLoaderMachOCompressed=%ld\n",
	//	sizeof(ImageLoader), sizeof(ImageLoaderMachO), sizeof(ImageLoaderMachOClassic), sizeof(ImageLoaderMachOCompressed));
	bool compressed;
	unsigned int segCount;
	unsigned int libCount;
	const linkedit_data_command* codeSigCmd;
	const encryption_info_command* encryptCmd;
	sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd); // determine whether the Mach-O is classic or compressed
	// instantiate concrete class based on content of load commands
	if ( compressed ) 
		return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
	else
#if SUPPORT_CLASSIC_MACHO
		return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
		throw "missing LC_DYLD_INFO load command";
#endif
}

The logic of this function is also very simple. Its flow is shown in the figure below:

[Figure: instantiateMainExecutable flowchart]

1.2.5 sniffLoadCommands

This function mainly does two things:

Determines whether the Mach-O file is classic or compressed.
Counts the segments in the Mach-O file.
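The two tasks above can be sketched on a plain byte buffer, without any Apple headers. This is a simplified illustration rather than dyld's code: `sniff`, `SniffResult` and `appendCmd` are hypothetical names, only 64-bit segments are handled, and the struct definitions mirror the field order of `load_command` and `segment_command_64`. The constants are the values of LC_SEGMENT_64, LC_DYLD_INFO and LC_DYLD_INFO_ONLY from mach-o/loader.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct LoadCommand { uint32_t cmd; uint32_t cmdsize; };

struct SegmentCommand64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

constexpr uint32_t kLcSegment64    = 0x19;
constexpr uint32_t kLcDyldInfo     = 0x22;
constexpr uint32_t kLcDyldInfoOnly = 0x22u | 0x80000000u;

struct SniffResult { bool compressed; unsigned segCount; };

// cmds holds the raw load command bytes; ncmds is the count from the header.
SniffResult sniff(const std::vector<uint8_t>& cmds, uint32_t ncmds) {
    SniffResult r{false, 0};
    std::size_t off = 0;
    for (uint32_t i = 0; i < ncmds; ++i) {
        if (cmds.size() - off < sizeof(LoadCommand))
            throw std::runtime_error("load command runs past sizeofcmds");
        LoadCommand lc;
        std::memcpy(&lc, cmds.data() + off, sizeof lc);
        if (lc.cmdsize < sizeof(LoadCommand) || lc.cmdsize > cmds.size() - off)
            throw std::runtime_error("malformed load command size");
        if (lc.cmd == kLcDyldInfo || lc.cmd == kLcDyldInfoOnly) {
            r.compressed = true;
        } else if (lc.cmd == kLcSegment64) {
            if (lc.cmdsize < sizeof(SegmentCommand64))
                throw std::runtime_error("segment command too small");
            SegmentCommand64 seg;
            std::memcpy(&seg, cmds.data() + off, sizeof seg);
            if (seg.filesize > seg.vmsize)
                throw std::runtime_error("filesize is larger than vmsize");
            if (seg.vmsize != 0)  // zero-sized segments are not counted
                ++r.segCount;
        }
        off += lc.cmdsize;
    }
    return r;
}

// Helper for building synthetic input in examples and tests.
template <typename T>
void appendCmd(std::vector<uint8_t>& buf, const T& c) {
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&c);
    buf.insert(buf.end(), p, p + sizeof(T));
}
```

Copying each command out with memcpy avoids alignment assumptions about the buffer; the real function instead casts pointers into the already mapped image.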

// determine if this mach-o file has classic or compressed LINKEDIT and number of segments it has
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
											unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
											const linkedit_data_command** codeSigCmd,
											const encryption_info_command** encryptCmd)
{
	*compressed = false;
	*segCount = 0;
	*libCount = 0;
	*codeSigCmd = NULL;
	*encryptCmd = NULL;

	const uint32_t cmd_count = mh->ncmds;
	// number of load commands, stored in the ncmds field of the Mach-O header
	const struct load_command* const startCmds    = (struct load_command*)(((uint8_t*)mh) + sizeof(macho_header));
	// start of the load commands: startCmds = Mach-O address + header size
	const struct load_command* const endCmds = (struct load_command*)(((uint8_t*)mh) + sizeof(macho_header) + mh->sizeofcmds);
	// end of the load commands: endCmds = Mach-O address + header size + sizeofcmds
	const struct load_command* cmd = startCmds;
	bool foundLoadCommandSegment = false;
	for (uint32_t i = 0; i < cmd_count; ++i) {
		// walk every load command
		uint32_t cmdLength = cmd->cmdsize;
		struct macho_segment_command* segCmd;
		if ( cmdLength < 8 ) {
			// format check: throw if the command length is too small
			dyld::throwf("malformed mach-o image: load command #%d length (%u) too small in %s",
											   i, cmdLength, path);
		}
		const struct load_command* const nextCmd = (const struct load_command*)(((char*)cmd)+cmdLength);
		if ( (nextCmd > endCmds) || (nextCmd < cmd) ) {
			// format check: throw if nextCmd, derived from the current command length, points to an invalid location
			dyld::throwf("malformed mach-o image: load command #%d length (%u) would exceed sizeofcmds (%u) in %s",
											   i, cmdLength, mh->sizeofcmds, path);
		}
		switch (cmd->cmd) {
			// handle each type of load command differently
			case LC_DYLD_INFO:
			case LC_DYLD_INFO_ONLY:
				*compressed = true;
				// this is a compressed Mach-O file
				break;
			case LC_SEGMENT_COMMAND:
				segCmd = (struct macho_segment_command*)cmd;
#if __MAC_OS_X_VERSION_MIN_REQUIRED
				// rdar://problem/19617624 allow unmapped segments on OSX (but not iOS)
				// throw if the segment's filesize is larger than its vmsize
				// TODO: explain in detail together with the kernel side of Mach-O loading
				if ( (segCmd->filesize > segCmd->vmsize) && (segCmd->vmsize != 0) )
#else
				if ( segCmd->filesize > segCmd->vmsize )
#endif
				    dyld::throwf("malformed mach-o image: segment load command %s filesize is larger than vmsize", segCmd->segname);
				// ignore zero-sized segments
				// skip zero-sized segments when counting segments
				if ( segCmd->vmsize != 0 )
					*segCount += 1;
				if ( context.codeSigningEnforced ) {
					// with enforced code signing, segments get stricter validity checks
					uintptr_t vmStart   = segCmd->vmaddr;
					uintptr_t vmSize    = segCmd->vmsize;
					uintptr_t vmEnd     = vmStart + vmSize;
					uintptr_t fileStart = segCmd->fileoff;
					uintptr_t fileSize  = segCmd->filesize;
					
					// validate the fields; throw if the Mach-O file is malformed
					if ( (intptr_t)(vmEnd) < 0)
						dyld::throwf("malformed mach-o image: segment load command %s vmsize too large", segCmd->segname);
					if ( vmStart > vmEnd )
						dyld::throwf("malformed mach-o image: segment load command %s wraps around address space", segCmd->segname);
					if ( vmSize != fileSize ) {
						if ( (segCmd->initprot == 0) && (fileSize != 0) )
							dyld::throwf("malformed mach-o image: unaccessable segment %s has filesize != 0", segCmd->segname);
						else if ( vmSize < fileSize )
							dyld::throwf("malformed mach-o image: segment %s has vmsize < filesize", segCmd->segname);
					}
					if ( inCache ) {
						if ( (fileSize != 0) && (segCmd->initprot == (VM_PROT_READ | VM_PROT_EXECUTE)) ) {
							if ( foundLoadCommandSegment )
								throw "load commands in multiple segments";
							foundLoadCommandSegment = true;
						}
					}
					else if ( (fileStart < mh->sizeofcmds) && (fileSize != 0) ) {
						// <rdar://problem/7942521> all load commands must be in an executable segment
						if ( (fileStart != 0) || (fileSize < (mh->sizeofcmds+sizeof(macho_header))) )
							dyld::throwf("malformed mach-o image: segment %s does not span all load commands", segCmd->segname); 
						if ( segCmd->initprot != (VM_PROT_READ | VM_PROT_EXECUTE) ) 
							dyld::throwf("malformed mach-o image: load commands found in segment %s with wrong permissions", segCmd->segname); 
						if ( foundLoadCommandSegment )
							throw "load commands in multiple segments";
						foundLoadCommandSegment = true;
					}

					const struct macho_section* const sectionsStart = (struct macho_section*)((char*)segCmd + sizeof(struct macho_segment_command));
					const struct macho_section* const sectionsEnd = &sectionsStart[segCmd->nsects];
					for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
						if (!inCache && sect->offset != 0 && ((sect->offset + sect->size) > (segCmd->fileoff + segCmd->filesize)))
							dyld::throwf("malformed mach-o image: section %s,%s of '%s' exceeds segment %s booundary", sect->segname, sect->sectname, path, segCmd->segname);
					}
				}
				break;
			case LC_SEGMENT_COMMAND_WRONG:
				dyld::throwf("malformed mach-o image: wrong LC_SEGMENT[_64] for architecture"); 
				break;
			case LC_LOAD_DYLIB:
			case LC_LOAD_WEAK_DYLIB:
			case LC_REEXPORT_DYLIB:
			case LC_LOAD_UPWARD_DYLIB:
				*libCount += 1;
				break;
			case LC_CODE_SIGNATURE:
				*codeSigCmd = (struct linkedit_data_command*)cmd; // only support one LC_CODE_SIGNATURE per image
				break;
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				*encryptCmd = (struct encryption_info_command*)cmd; // only support one LC_ENCRYPTION_INFO[_64] per image
				break;
		}
		cmd = nextCmd;
	}

	if ( context.codeSigningEnforced && !foundLoadCommandSegment )
		throw "load commands not in a segment";

	// <rdar://problem/13145644> verify every segment does not overlap another segment
	if ( context.codeSigningEnforced ) {
		// with enforced code signing, additionally verify that no segment overlaps another
		uintptr_t lastFileStart = 0;
		uintptr_t linkeditFileStart = 0;
		const struct load_command* cmd1 = startCmds;
		for (uint32_t i = 0; i < cmd_count; ++i) {
			if ( cmd1->cmd == LC_SEGMENT_COMMAND ) {
				struct macho_segment_command* segCmd1 = (struct macho_segment_command*)cmd1;
				uintptr_t vmStart1   = segCmd1->vmaddr;
				uintptr_t vmEnd1     = segCmd1->vmaddr + segCmd1->vmsize;
				uintptr_t fileStart1 = segCmd1->fileoff;
				uintptr_t fileEnd1   = segCmd1->fileoff + segCmd1->filesize;

				if (fileStart1 > lastFileStart)
					lastFileStart = fileStart1;

				if ( strcmp(&segCmd1->segname[0], "__LINKEDIT") == 0 ) {
					linkeditFileStart = fileStart1;
				}

				const struct load_command* cmd2 = startCmds;
				for (uint32_t j = 0; j < cmd_count; ++j) {
					if ( cmd2 == cmd1 )
						continue;
					if ( cmd2->cmd == LC_SEGMENT_COMMAND ) {
						struct macho_segment_command* segCmd2 = (struct macho_segment_command*)cmd2;
						uintptr_t vmStart2   = segCmd2->vmaddr;
						uintptr_t vmEnd2     = segCmd2->vmaddr + segCmd2->vmsize;
						uintptr_t fileStart2 = segCmd2->fileoff;
						uintptr_t fileEnd2   = segCmd2->fileoff + segCmd2->filesize;
						if ( ((vmStart2 <= vmStart1) && (vmEnd2 > vmStart1) && (vmEnd1 > vmStart1)) 
						|| ((vmStart2 >= vmStart1) && (vmStart2 < vmEnd1) && (vmEnd2 > vmStart2)) )
							dyld::throwf("malformed mach-o image: segment %s vm overlaps segment %s", segCmd1->segname, segCmd2->segname);
						if ( ((fileStart2 <= fileStart1) && (fileEnd2 > fileStart1) && (fileEnd1 > fileStart1))
						  || ((fileStart2 >= fileStart1) && (fileStart2 < fileEnd1) && (fileEnd2 > fileStart2)) )
							dyld::throwf("malformed mach-o image: segment %s file content overlaps segment %s", segCmd1->segname, segCmd2->segname); 
					}
					cmd2 = (const struct load_command*)(((char*)cmd2)+cmd2->cmdsize);
				}
			}
			cmd1 = (const struct load_command*)(((char*)cmd1)+cmd1->cmdsize);
		}

		if (lastFileStart != linkeditFileStart)
			dyld::throwf("malformed mach-o image: __LINKEDIT must be last segment");
	}

	// fSegmentsArrayCount is only 8-bits
	if ( *segCount > 255 )
		dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

	// fSegmentsArrayCount is only 8-bits
	if ( *libCount > 4095 )
		dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

	if ( needsAddedLibSystemDepency(*libCount, mh) )
		*libCount = 1;
}
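The overlap test near the end of the listing is written as two three-part conditions. Those conditions are equivalent to the usual half-open interval overlap test restricted to non-empty ranges, which the sketch below spells out. `Range`, `overlaps` and `anyOverlap` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open address or file range [start, end).
struct Range { uint64_t start; uint64_t end; };

// Two non-empty ranges overlap when each one starts before the other ends.
bool overlaps(const Range& a, const Range& b) {
    if (a.start >= a.end || b.start >= b.end)
        return false;  // an empty range never overlaps anything
    return a.start < b.end && b.start < a.end;
}

// Pairwise check over all segments, comparing each unordered pair once.
bool anyOverlap(const std::vector<Range>& segs) {
    for (std::size_t i = 0; i < segs.size(); ++i)
        for (std::size_t j = i + 1; j < segs.size(); ++j)
            if (overlaps(segs[i], segs[j]))
                return true;
    return false;
}
```

Adjacent segments such as [0x1000, 0x2000) and [0x2000, 0x3000) do not count as overlapping, which matches the strict comparisons in the listing.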

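1.2.5 ImageLoaderMachOClassic::instantiateMainExecutable

Initialization for classic and compressed images is much the same, so look at the classic initialization function first.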

// create image for main executable
ImageLoaderMachOClassic* ImageLoaderMachOClassic::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, 
																		unsigned int segCount, unsigned int libCount, const LinkContext& context)
{
	ImageLoaderMachOClassic* image = ImageLoaderMachOClassic::instantiateStart(mh, path, segCount, libCount);
	// instantiate the image
	
	// set the parameters needed for PIE (Position Independent Executables)
	// TODO: analyze PIE in more detail
	// set slide for PIE programs
	image->setSlide(slide);

	// for PIE record end of program, to know where to start loading dylibs
	if ( slide != 0 )
		fgNextPIEDylibAddress = (uintptr_t)image->getEnd();

	// set a number of parameters
	image->disableCoverageCheck();
	image->instantiateFinish(context);
	image->setMapped(context);

#if __i386__
	// kernel may have mapped in __IMPORT segment read-only, we need it read/write to do binding
	if ( image->fReadOnlyImportSegment ) {
		for(unsigned int i=0; i < image->fSegmentsCount; ++i) {
			if ( image->segIsReadOnlyImport(i) )
				image->segMakeWritable(i, context);
		}
	}
#endif
	
	// print a detailed log if context.verboseMapping is set
	if ( context.verboseMapping ) {
		dyld::log("dyld: Main executable mapped %s\n", path);
		for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
			const char* name = image->segName(i);
			if ( (strcmp(name, "__PAGEZERO") == 0) || (strcmp(name, "__UNIXSTACK") == 0)  )
				dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segPreferredLoadAddress(i), image->segPreferredLoadAddress(i)+image->segSize(i));
			else
				dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segActualLoadAddress(i), image->segActualEndAddress(i));
		}
	}

	return image;
}

As you can see, the core loading code is still further down, in ImageLoaderMachOClassic::instantiateStart.
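The slide passed to setSlide is the difference between where the image actually sits in memory and the address recorded in its load commands. A minimal sketch of that arithmetic follows, assuming that a segment's actual address is its preferred address plus the slide. `SegmentSketch` and the three helpers are hypothetical names, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduced view of a segment: preferred address and size only.
struct SegmentSketch { uint64_t vmaddr; uint64_t vmsize; };

// slide = address where the image landed - address it asked for.
constexpr uint64_t computeSlide(uint64_t actualBase, uint64_t preferredBase) {
    return actualBase - preferredBase;
}

// Where a segment really starts once the slide is applied.
constexpr uint64_t actualLoadAddress(const SegmentSketch& s, uint64_t slide) {
    return s.vmaddr + slide;
}

// One past the last byte of the slid segment.
constexpr uint64_t actualEndAddress(const SegmentSketch& s, uint64_t slide) {
    return actualLoadAddress(s, slide) + s.vmsize;
}
```

For instance, an image that prefers 0x100000000 but is mapped at 0x10a3f2000 has a slide of 0xa3f2000, so a segment with vmaddr 0x100004000 actually starts at 0x10a3f6000. A slide of zero means the image sits at its preferred address, which is why the listing only records fgNextPIEDylibAddress when the slide is non-zero.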

1.2.6 instantiateStart

// construct ImageLoaderMachOClassic using "placement new" with SegmentMachO objects array at end
ImageLoaderMachOClassic* ImageLoaderMachOClassic::instantiateStart(const macho_header* mh, const char* path,
																		unsigned int segCount, unsigned int libCount)
{
	size_t size = sizeof(ImageLoaderMachOClassic) + segCount * sizeof(uint32_t) + libCount * sizeof(ImageLoader*);
	ImageLoaderMachOClassic* allocatedSpace = static_cast<ImageLoaderMachOClassic*>(malloc(size));
	if ( allocatedSpace == NULL )
		throw "malloc failed";
	uint32_t* segOffsets = ((uint32_t*)(((uint8_t*)allocatedSpace) + sizeof(ImageLoaderMachOClassic)));
	bzero(&segOffsets[segCount], libCount*sizeof(void*));	// zero out lib array
	return new (allocatedSpace) ImageLoaderMachOClassic(mh, path, segCount, segOffsets, libCount);
}

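The core loading code still does not appear here. This function only allocates memory sized from the counts obtained earlier and computes the pointer to the segment offset array. The actual logic lives in the ImageLoaderMachOClassic constructor.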
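The allocation pattern used by instantiateStart, one malloc holding the object followed by a variable-length array, with the object constructed in place by placement new, can be sketched in isolation. `ImageSketch` is a hypothetical class that mimics only this pattern; it keeps a single trailing array of segment offsets and omits the library pointer array that the listing places after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

class ImageSketch {
public:
    // Allocate the object and its trailing offset table in one block.
    static ImageSketch* create(unsigned segCount) {
        std::size_t size = sizeof(ImageSketch) + segCount * sizeof(uint32_t);
        void* raw = std::malloc(size);
        if (raw == nullptr)
            throw std::bad_alloc();
        return new (raw) ImageSketch(segCount);  // placement new: construct in raw
    }

    // Objects built with placement new need an explicit destructor call
    // followed by releasing the raw storage.
    static void destroy(ImageSketch* img) {
        img->~ImageSketch();
        std::free(img);
    }

    // The table starts right after the object itself.
    uint32_t* segOffsets() {
        return reinterpret_cast<uint32_t*>(
            reinterpret_cast<uint8_t*>(this) + sizeof(ImageSketch));
    }

    unsigned segCount() const { return segCount_; }

private:
    explicit ImageSketch(unsigned n) : segCount_(n) {
        for (unsigned i = 0; i < n; ++i)
            segOffsets()[i] = 0;  // zero the trailing table
    }
    ~ImageSketch() = default;

    unsigned segCount_;
};
```

Keeping the table in the same allocation saves a second malloc per image and keeps the data next to the object that uses it, at the cost of manual lifetime management.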

1.2.7 ImageLoaderMachO

ImageLoaderMachOClassic::ImageLoaderMachOClassic(const macho_header* mh, const char* path, 
													unsigned int segCount, uint32_t segOffsets[], unsigned int libCount)
 : ImageLoaderMachO(mh, path, segCount, segOffsets, libCount), fStrings(NULL), fSymbolTable(NULL), fDynamicInfo(NULL)
{
}

ImageLoaderMachO::ImageLoaderMachO(const macho_header* mh, const char* path, unsigned int segCount, 
																uint32_t segOffsets[], unsigned int libCount)
 : ImageLoader(path, libCount), fCoveredCodeLength(0), fMachOData((uint8_t*)mh), fLinkEditBase(NULL), fSlide(0),
	fEHFrameSectionOffset(0), fUnwindInfoSectionOffset(0), fDylibIDOffset(0), 
fSegmentsCount(segCount), fIsSplitSeg(false), fInSharedCache(false),
#if TEXT_RELOC_SUPPORT
	fTextSegmentRebases(false),
	fTextSegmentBinds(false),
#endif
#if __i386__
	fReadOnlyImportSegment(false),
#endif
	fHasSubLibraries(false), fHasSubUmbrella(false), fInUmbrella(false), fHasDOFSections(false), fHasDashInit(false),
	fHasInitializers(false), fHasTerminators(false), fRegisteredAsRequiresCoalescing(false)
{
	fIsSplitSeg = ((mh->flags & MH_SPLIT_SEGS) != 0);        

	// construct SegmentMachO object for each LC_SEGMENT cmd using "placement new" to put 
	// each SegmentMachO object in array at end of ImageLoaderMachO object
	const uint32_t cmd_count = mh->ncmds;
	const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
	const struct load_command* cmd = cmds;
	for (uint32_t i = 0, segIndex=0; i < cmd_count; ++i) {
		if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
			const struct macho_segment_command* segCmd = (struct macho_segment_command*)cmd;
			// ignore zero-sized segments
			if ( segCmd->vmsize != 0 ) {
				// record offset of load command
				segOffsets[segIndex++] = (uint32_t)((uint8_t*)segCmd - fMachOData);
			}
		}
		cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
	}

}

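There is nothing complicated here either. The constructor walks the load commands and records in segOffsets the offset of the load command of every non-empty segment. No file data is read at this point, because the kernel had already mapped the main executable before dyld got control.

After this returns, all that remains is the call to addImage.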

1.2.8 addImage

static void addImage(ImageLoader* image)
{
	// add to master list
	// append the image to the container of all images while holding the lock
    allImagesLock();
        sAllImages.push_back(image);
    allImagesUnlock();
	
	// update mapped ranges
	// update the record of mapped memory ranges
	uintptr_t lastSegStart = 0;
	uintptr_t lastSegEnd = 0;
	for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
		if ( image->segUnaccessible(i) ) 
			continue;
		uintptr_t start = image->segActualLoadAddress(i);
		uintptr_t end = image->segActualEndAddress(i);
		if ( start == lastSegEnd ) {
			// two segments are contiguous, just record combined segments
			lastSegEnd = end;
		}
		else {
			// non-contiguous segments, record last (if any)
			if ( lastSegEnd != 0 )
				addMappedRange(image, lastSegStart, lastSegEnd);
			lastSegStart = start;
			lastSegEnd = end;
		}		
	}
	if ( lastSegEnd != 0 )
		addMappedRange(image, lastSegStart, lastSegEnd);

	
	if ( sEnv.DYLD_PRINT_LIBRARIES || (sEnv.DYLD_PRINT_LIBRARIES_POST_LAUNCH && (sMainExecutable!=NULL) && sMainExecutable->isLinked()) ) {
		dyld::log("dyld: loaded: %s\n", image->getPath());
	}
	
}

addImage likewise only updates some bookkeeping data:

It adds the image to the managing container
It updates the mapped-range information
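The range update in addImage merges segments that sit back to back into a single mapped range and skips inaccessible ones. The sketch below reproduces that loop on plain structs. `Seg`, `MappedRange` and `coalesce` are hypothetical names, and the sketch returns the ranges instead of calling addMappedRange.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Seg { uint64_t start; uint64_t end; bool accessible; };
struct MappedRange { uint64_t start; uint64_t end; };

// Merge contiguous accessible segments into ranges, in segment order.
std::vector<MappedRange> coalesce(const std::vector<Seg>& segs) {
    std::vector<MappedRange> out;
    uint64_t lastStart = 0;
    uint64_t lastEnd = 0;
    for (const Seg& s : segs) {
        if (!s.accessible)
            continue;  // e.g. a segment with no access permissions
        if (s.start == lastEnd) {
            lastEnd = s.end;  // contiguous: extend the pending range
        } else {
            if (lastEnd != 0)
                out.push_back(MappedRange{lastStart, lastEnd});  // flush pending range
            lastStart = s.start;
            lastEnd = s.end;
        }
    }
    if (lastEnd != 0)
        out.push_back(MappedRange{lastStart, lastEnd});
    return out;
}
```

Given an inaccessible segment at [0x0, 0x1000), two adjacent segments covering [0x1000, 0x4000), and a separate one at [0x8000, 0x9000), the result is two ranges: [0x1000, 0x4000) and [0x8000, 0x9000).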

0x02 Summary

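The whole loading process breaks down into three parts:

Validating the data
Building the segment details into the image instance from the information in the Mach-O header
Adding the image to the managing container

The important functions are shown in the figure below:

[Figure: important functions]

Further details of the Mach-O loading process require analyzing the XNU kernel.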



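Title: A Brief Analysis of Mach-O File Loading in dyld

Author: mrh

Published: 2016-03-01 17:06

Last updated: 2016-03-23 23:35

Original link: http://turingh.github.io/2016/03/01/dyld中macho加载的简单分析/

License: "Attribution-NonCommercial-ShareAlike 3.0". Please keep the original link and author when reposting.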