Android Unpacking 2: An Analysis of the so File Loading Process

youncyb · published 20 days ago · 266 views · reverse


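This article walks through how Android loads so files, focusing on the mechanism Android apps use to load native libraries and the key steps involved. It covers the in-memory layout of a so file, how the loader works, and the system calls it relies on.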

1. Introduction

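One step in Android app hardening (packing) is protecting the so files, and one such technique is loading the so with a custom loader. That technique builds directly on how so files are normally loaded, so defeating it requires a thorough understanding of the so loading process.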

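Before Android 7.0, the Java core library sources lived under libcore/luni/, where luni stands for lang, util, net, and io, the most commonly used Java packages. From Android 7.0 onward, the core library lives under libcore/ojluni/, where oj stands for OpenJDK. [1] This analysis is based on Android10-d4-release and assumes a solid understanding of the so (ELF) format.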

2. System.loadLibrary

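Java offers two ways to load a library file: System.load() and System.loadLibrary(). The former takes an absolute path, while the latter only needs the library name. Android code usually uses the latter. Before diving in, Figure 1 gives an overview of the call chain: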

Figure 1: the System.loadLibrary call chain (image not reproduced)

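System.loadLibrary calls Runtime.loadLibrary0, which first uses loader.findLibrary() to resolve the full path of the so. Here loader is usually a PathClassLoader. It holds the complete information for the dex, so it can work out the full path of the so.

BootClassLoader, which Android uses to load common framework classes, does not implement findLibrary, so it is treated the same as a null loader here.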

private synchronized void loadLibrary0(ClassLoader loader, Class<?> callerClass, String libname) {
  ....
    String libraryName = libname;
  if (loader != null && !(loader instanceof BootClassLoader)) {
    String filename = loader.findLibrary(libraryName); // resolve the full path of the so
    ....
      String error = nativeLoad(filename, loader); // hand over to the native method
    if (error != null) {
      throw new UnsatisfiedLinkError(error);
    }
    return;
  }
  ....
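
To make the "name only" convention concrete, here is a minimal C++ sketch of how a library name turns into candidate file paths. map_library_name mirrors the documented behaviour of System.mapLibraryName on Android ("foo" becomes "libfoo.so"); candidate_paths and the directory used in the usage note are hypothetical and only illustrate the idea behind findLibrary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same mapping that java.lang.System.mapLibraryName performs on Android:
// "foo" -> "libfoo.so".
std::string map_library_name(const std::string& libname) {
  return "lib" + libname + ".so";
}

// Hypothetical helper: one candidate path per search directory, in order.
// A real class loader would probe each candidate and return the first hit.
std::vector<std::string> candidate_paths(const std::vector<std::string>& dirs,
                                         const std::string& libname) {
  std::vector<std::string> paths;
  for (size_t i = 0; i < dirs.size(); ++i) {
    paths.push_back(dirs[i] + "/" + map_library_name(libname));
  }
  return paths;
}
```

For example, with a single assumed search directory /data/app/demo/lib/arm64, the name "foo" yields the candidate /data/app/demo/lib/arm64/libfoo.so.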

After a few direct calls in between, execution reaches JavaVMExt::LoadNativeLibrary. It first checks whether the so has already been loaded; if not, it loads it through android::OpenNativeLibrary.

bool JavaVMExt::LoadNativeLibrary(JNIEnv* env,
                                  const std::string& path,
                                  jobject class_loader,
                                  jclass caller_class,
                                  std::string* error_msg) {
  ....
  SharedLibrary* library;
  Thread* self = Thread::Current();
  {
    MutexLock mu(self, *Locks::jni_libraries_lock_);
    library = libraries_->Get(path); // look up the so among the already-loaded libraries
  }
  
  ....
    
  if (library != nullptr) {
    if (library->GetClassLoaderAllocator() != class_loader_allocator) {// the same so cannot be loaded by different class loaders
			
      ....
      
    if (!library->CheckOnLoadResult()) { // waits until JNI_OnLoad has finished running
      StringAppendF(error_msg, "JNI_OnLoad failed on a previous attempt "
          "to load \"%s\"", path.c_str());
      return false;
    }
    return true;
  }
    ....
      
  ScopedLocalRef<jstring> library_path(env, GetLibrarySearchPath(env, class_loader));
  ...
  void* handle = android::OpenNativeLibrary(
      env,
      runtime_->GetTargetSdkVersion(),
      path_str,
      class_loader,
      (caller_location.empty() ? nullptr : caller_location.c_str()),
      library_path.get(),
      &needs_native_bridge,
      &nativeloader_error_msg);
  ...
  // Record the handle of the loaded so in the libraries_ table
  // Create a new entry.
  // TODO: move the locking (and more of this logic) into Libraries.
  bool created_library = false;
  {
    // Create SharedLibrary ahead of taking the libraries lock to maintain lock ordering.
    std::unique_ptr<SharedLibrary> new_library(
        new SharedLibrary(env,
                          self,
                          path,
                          handle,
                          needs_native_bridge,
                          class_loader,
                          class_loader_allocator));

    MutexLock mu(self, *Locks::jni_libraries_lock_);
    library = libraries_->Get(path);
    if (library == nullptr) {  // We won race to get libraries_lock.
      library = new_library.release();
      libraries_->Put(path, library);
      created_library = true;
    }
  }
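
The tail of LoadNativeLibrary follows a common pattern: look the library up, build the new entry outside the lock, then re-check under the lock and keep whichever entry got there first. The sketch below shows that pattern in isolation. Library, LibraryTable, and load_once are hypothetical stand-ins, not ART's types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an entry in the table of loaded libraries.
struct Library {
  std::string path;
  int handle;
};

class LibraryTable {
 public:
  // Returns the registered entry, or nullptr if the path is unknown.
  Library* Get(const std::string& path) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = libs_.find(path);
    return it == libs_.end() ? nullptr : it->second.get();
  }

  // Inserts candidate unless the path is already registered.
  // *created tells the caller whether its candidate won the race.
  Library* PutIfAbsent(std::unique_ptr<Library> candidate, bool* created) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = libs_.find(candidate->path);
    if (it != libs_.end()) {
      *created = false;  // someone else registered it first; candidate is dropped
      return it->second.get();
    }
    Library* raw = candidate.get();
    libs_[raw->path] = std::move(candidate);
    *created = true;
    return raw;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::unique_ptr<Library> > libs_;
};

// Returns true if this call created the table entry.
bool load_once(LibraryTable& table, const std::string& path, int handle) {
  if (table.Get(path) != nullptr) return false;             // already loaded
  std::unique_ptr<Library> lib(new Library{path, handle});  // built outside the lock
  bool created = false;
  table.PutIfAbsent(std::move(lib), &created);
  return created;
}
```

Calling load_once twice with the same path registers the entry once. The second call returns false and the first handle is kept.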

OpenNativeLibrary branches on whether class_loader is present, but following either branch down eventually reaches do_dlopen().

static void* dlopen_ext(const char* filename,
                        int flags,
                        const android_dlextinfo* extinfo,
                        const void* caller_addr) {
  ScopedPthreadMutexLocker locker(&g_dl_mutex);
  g_linker_logger.ResetState();
  void* result = do_dlopen(filename, flags, extinfo, caller_addr);
  if (result == nullptr) {
    __bionic_format_dlerror("dlopen failed", linker_get_error_buffer());
    return nullptr;
  }
  return result;
}

3. do_dlopen

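Both dlopen and android_dlopen_ext end up calling do_dlopen. Start with the arguments passed in on this path: path and RTLD_NOW. path is the canonical absolute path of the so file. RTLD_NOW means that all symbols (variables and functions) the so references are resolved and relocated at load time. Its counterpart, RTLD_LAZY, defers resolving a function until that function is first used.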

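do_dlopen takes the following parameters:

  • name: the name of the so
  • flags: always RTLD_NOW on this call path
  • extinfo: additional loading information
  • caller_addr: the return address of the calling function, obtained with __builtin_return_address(0)

void* do_dlopen(const char* name, int flags,
                const android_dlextinfo* extinfo,
                const void* caller_addr)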

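It first calls find_containing_library to check whether the caller's return address falls inside an already loaded shared library; if so, it returns that library's soinfo* pointer. This soinfo identifies the caller and is used later to decide whether the caller is allowed to open the requested system so.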

soinfo* const caller = find_containing_library(caller_addr);

soinfo* find_containing_library(const void* p) {
  ElfW(Addr) address = reinterpret_cast<ElfW(Addr)>(p);
  for (soinfo* si = solist_get_head(); si != nullptr; si = si->next) {
    if (address < si->base || address - si->base >= si->size) {
      continue;
    }
    ElfW(Addr) vaddr = address - si->load_bias;
    for (size_t i = 0; i != si->phnum; ++i) {
      const ElfW(Phdr)* phdr = &si->phdr[i];
      if (phdr->p_type != PT_LOAD) {
        continue;
      }
      if (vaddr >= phdr->p_vaddr && vaddr < phdr->p_vaddr + phdr->p_memsz) {
        return si;
      }
    }
  }
  return nullptr;
}
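
The lookup above is a two-stage range check: a coarse test against the whole mapping, then a precise test against each loadable segment after subtracting load_bias. A minimal sketch with simplified, hypothetical structs (Module and LoadSegment stand in for soinfo and the PT_LOAD program headers) and made-up addresses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-ins for PT_LOAD program headers and soinfo.
struct LoadSegment { uint64_t vaddr; uint64_t memsz; };
struct Module {
  uint64_t base;       // start of the reserved mapping
  uint64_t size;       // size of the reserved mapping
  uint64_t load_bias;  // runtime address = load_bias + p_vaddr
  std::vector<LoadSegment> segments;
};

// Returns the index of the module that contains address, or -1.
int find_containing_module(const std::vector<Module>& modules, uint64_t address) {
  for (size_t m = 0; m < modules.size(); ++m) {
    const Module& mod = modules[m];
    // Coarse filter: the address must fall inside the module's mapping.
    if (address < mod.base || address - mod.base >= mod.size) continue;
    // Fine filter: it must also fall inside one loadable segment.
    const uint64_t vaddr = address - mod.load_bias;
    for (size_t s = 0; s < mod.segments.size(); ++s) {
      const LoadSegment& seg = mod.segments[s];
      if (vaddr >= seg.vaddr && vaddr < seg.vaddr + seg.memsz) {
        return static_cast<int>(m);
      }
    }
  }
  return -1;
}

// Sample data: one module with two segments and a gap between them.
std::vector<Module> sample_modules() {
  Module mod;
  mod.base = 0xa0000000;
  mod.size = 0x4000;
  mod.load_bias = 0xa0000000;
  mod.segments.push_back(LoadSegment{0x0, 0x1200});
  mod.segments.push_back(LoadSegment{0x3000, 0x800});
  return std::vector<Module>(1, mod);
}
```

With this sample data, 0xa0001000 resolves to module 0, while 0xa0002000 lies inside the mapping but in the gap between the two segments and is therefore not attributed to any module.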

It then calls find_library to load the so file, which in turn calls find_libraries.

soinfo* si = find_library(ns, translated_name, flags, extinfo, caller);

static soinfo* find_library(android_namespace_t* ns,
                            const char* name, int rtld_flags,
                            const android_dlextinfo* extinfo,
                            soinfo* needed_by) {
  soinfo* si = nullptr;

  if (name == nullptr) {
    si = solist_get_somain();
    
    // load the so
  } else if (!find_libraries(ns,
                             needed_by,
                             &name,
                             1,
                             &si,
                             nullptr,
                             0,
                             rtld_flags,
                             extinfo,
                             false /* add_as_children */,
                             true /* search_linked_namespaces */)) {
    if (si != nullptr) {
      soinfo_unload(si);
    }
    return nullptr;
  }

  // increment the so's reference count
  si->increment_ref_count();

  return si;
}

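find_libraries does the following:

3.1 Creating LoadTask objects

A LoadTask object is created for every so to load. Only one so is being loaded here, so only one LoadTask is created. The object wraps the operations performed on the soinfo structure.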

  for (size_t i = 0; i < library_names_count; ++i) {
    const char* name = library_names[i];
    load_tasks.push_back(LoadTask::create(name, start_with, ns, &readers_map));
  }

3.2 Reading and loading the so

Next, find_library_internal is called to load the so.

for (size_t i = 0; i<load_tasks.size(); ++i) {
  LoadTask* task = load_tasks[i];
  soinfo* needed_by = task->get_needed_by();
  ...
    ...
  if (!find_library_internal(const_cast<android_namespace_t*>(task->get_start_from()),
                             task,
                             &zip_archive_cache,
                             &load_tasks,
                             rtld_flags,
                             search_linked_namespaces || is_dt_needed)) {
    return false;
  }
}
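
The loop above walks load_tasks by index rather than by iterator, and the list can grow while it is being walked, because load_library appends a task for every DT_NEEDED dependency (section 3.2.4). The sketch below shows that growing-worklist traversal. The dependency table and the seen set are assumptions for illustration; the real linker detects duplicates through the already-loaded check in find_library_internal.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency table: library name -> names listed as DT_NEEDED.
typedef std::map<std::string, std::vector<std::string> > DepTable;

// Returns the libraries in the order the worklist visits them (breadth-first).
std::vector<std::string> load_order(const DepTable& deps, const std::string& root) {
  std::vector<std::string> tasks(1, root);
  std::set<std::string> seen;
  seen.insert(root);
  // Index-based loop: tasks grows during iteration, which would invalidate iterators.
  for (size_t i = 0; i < tasks.size(); ++i) {
    const std::string name = tasks[i];  // copy, because push_back may reallocate
    DepTable::const_iterator it = deps.find(name);
    if (it == deps.end()) continue;
    for (size_t j = 0; j < it->second.size(); ++j) {
      const std::string& needed = it->second[j];
      if (seen.insert(needed).second) tasks.push_back(needed);  // new dependency
    }
  }
  return tasks;
}

// Sample data with a shared dependency (libc.so is needed twice).
DepTable sample_deps() {
  DepTable deps;
  deps["libapp.so"].push_back("libfoo.so");
  deps["libapp.so"].push_back("libc.so");
  deps["libfoo.so"].push_back("libc.so");
  deps["libfoo.so"].push_back("libm.so");
  return deps;
}
```

With the sample table, the root libapp.so is visited first, then its direct dependencies libfoo.so and libc.so, and finally libm.so, which is reached through libfoo.so.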

3.2.1 find_library_internal

It first checks whether the so is already loaded in memory; if not, it calls load_library.

static bool find_library_internal(android_namespace_t* ns,
                                  LoadTask* task,
                                  ZipArchiveCache* zip_archive_cache,
                                  LoadTaskList* load_tasks,
                                  int rtld_flags,
                                  bool search_linked_namespaces) {
  soinfo* candidate;

  // First check whether the requested so is already loaded; if so, store it in the task
  // and return. Otherwise fall through to load_library.
  if (find_loaded_library_by_soname(ns, task->get_name(), search_linked_namespaces, &candidate)) {
    LD_LOG(kLogDlopen,
           "find_library_internal(ns=%s, task=%s): Already loaded (by soname): %s",
           ns->get_name(), task->get_name(), candidate->get_realpath());
    task->set_soinfo(candidate);
    return true;
  }

	... ...

  // Load the so via load_library; the so name, extinfo, and so on all live in the task object.
  if (load_library(ns, task, zip_archive_cache, load_tasks, rtld_flags, search_linked_namespaces)) {
    return true;
  }

3.2.2 load_library

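load_library first checks whether a file descriptor (fd) for the so has already been supplied. If not, it opens the so with open_library to obtain one. Either way, it then hands over to the other load_library overload.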

static bool load_library(android_namespace_t* ns,
                         LoadTask* task,
                         ZipArchiveCache* zip_archive_cache,
                         LoadTaskList* load_tasks,
                         int rtld_flags,
                         bool search_linked_namespaces) {
  // Get the so name, needed_by, and extinfo
  const char* name = task->get_name();
  soinfo* needed_by = task->get_needed_by();
  const android_dlextinfo* extinfo = task->get_extinfo();
  
  // If the caller already opened the so (fd passed via extinfo), go straight to the overloaded load_library.
  if (extinfo != nullptr && (extinfo->flags & ANDROID_DLEXT_USE_LIBRARY_FD) != 0) {
    file_offset = 0;
    if ((extinfo->flags & ANDROID_DLEXT_USE_LIBRARY_FD_OFFSET) != 0) {
      file_offset = extinfo->library_fd_offset;
    }

		... ...

    task->set_fd(extinfo->library_fd, false);
    task->set_file_offset(file_offset);
    return load_library(ns, task, load_tasks, rtld_flags, realpath, search_linked_namespaces);
  }

  // Otherwise open the so with open_library, then call the overloaded load_library
  int fd = open_library(ns, zip_archive_cache, name, needed_by, &file_offset, &realpath);
  if (fd == -1) {
    DL_ERR("library \"%s\" not found", name);
    return false;
  }

  task->set_fd(fd, true);
  task->set_file_offset(file_offset);

  return load_library(ns, task, load_tasks, rtld_flags, realpath, search_linked_namespaces);
}

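The overloaded load_library first calls soinfo_alloc() to allocate a soinfo structure, then calls task->read() to read the ELF headers into the task's ElfReader.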

  soinfo* si = soinfo_alloc(ns, realpath.c_str(), &file_stat, file_offset, rtld_flags);
  if (si == nullptr) {
    return false;
  }

  task->set_soinfo(si);

  // Read the ELF header and some of the segments.
  if (!task->read(realpath.c_str(), file_stat.st_size)) {
    soinfo_free(si);
    task->set_soinfo(nullptr);
    return false;
  }

3.2.3 ElfReader.Read

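This function calls five functions to read and validate the so file. ReadElfHeader reads the ELF header, and VerifyElfHeader checks that:

  1. e_ident starts with the ELF magic (ELFMAG)
  2. e_ident[EI_CLASS] matches the bitness (32-bit or 64-bit) of the linker
  3. e_ident[EI_DATA] is little-endian
  4. e_type is ET_DYN (shared object)
  5. e_version is EV_CURRENT
  6. e_machine matches the target architecture returned by GetTargetElfMachine()
  7. e_shentsize equals sizeof(ElfW(Shdr))
  8. e_shstrndx is not 0

Checks 7 and 8 are only fatal for apps targeting Android O (API 26) or above; older targets just get a warning.
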
  bool read(const char* realpath, off64_t file_size) {
    ElfReader& elf_reader = get_elf_reader();
    return elf_reader.Read(realpath, fd_, file_offset_, file_size);
  }
  
...
bool ElfReader::Read(const char* name, int fd, off64_t file_offset, off64_t file_size) {
  if (did_read_) {
    return true;
  }
  name_ = name;
  fd_ = fd;
  file_offset_ = file_offset;
  file_size_ = file_size;

  if (ReadElfHeader() &&
      VerifyElfHeader() &&
      ReadProgramHeaders() &&
      ReadSectionHeaders() &&
      ReadDynamicSection()) {
    did_read_ = true;
  }

  return did_read_;
}

bool ElfReader::VerifyElfHeader() {
  if (memcmp(header_.e_ident, ELFMAG, SELFMAG) != 0) {
    DL_ERR("\"%s\" has bad ELF magic: %02x%02x%02x%02x", name_.c_str(),
           header_.e_ident[0], header_.e_ident[1], header_.e_ident[2], header_.e_ident[3]);
    return false;
  }

  // Try to give a clear diagnostic for ELF class mismatches, since they're
  // an easy mistake to make during the 32-bit/64-bit transition period.
  int elf_class = header_.e_ident[EI_CLASS];
#if defined(__LP64__)
  if (elf_class != ELFCLASS64) {
    if (elf_class == ELFCLASS32) {
      DL_ERR("\"%s\" is 32-bit instead of 64-bit", name_.c_str());
    } else {
      DL_ERR("\"%s\" has unknown ELF class: %d", name_.c_str(), elf_class);
    }
    return false;
  }
#else
  if (elf_class != ELFCLASS32) {
    if (elf_class == ELFCLASS64) {
      DL_ERR("\"%s\" is 64-bit instead of 32-bit", name_.c_str());
    } else {
      DL_ERR("\"%s\" has unknown ELF class: %d", name_.c_str(), elf_class);
    }
    return false;
  }
#endif

  if (header_.e_ident[EI_DATA] != ELFDATA2LSB) {
    DL_ERR("\"%s\" not little-endian: %d", name_.c_str(), header_.e_ident[EI_DATA]);
    return false;
  }

  if (header_.e_type != ET_DYN) {
    DL_ERR("\"%s\" has unexpected e_type: %d", name_.c_str(), header_.e_type);
    return false;
  }

  if (header_.e_version != EV_CURRENT) {
    DL_ERR("\"%s\" has unexpected e_version: %d", name_.c_str(), header_.e_version);
    return false;
  }

  if (header_.e_machine != GetTargetElfMachine()) {
    DL_ERR("\"%s\" is for %s (%d) instead of %s (%d)",
           name_.c_str(),
           EM_to_string(header_.e_machine), header_.e_machine,
           EM_to_string(GetTargetElfMachine()), GetTargetElfMachine());
    return false;
  }

  if (header_.e_shentsize != sizeof(ElfW(Shdr))) {
    // Fail if app is targeting Android O or above
    if (get_application_target_sdk_version() >= __ANDROID_API_O__) {
      DL_ERR_AND_LOG("\"%s\" has unsupported e_shentsize: 0x%x (expected 0x%zx)",
                     name_.c_str(), header_.e_shentsize, sizeof(ElfW(Shdr)));
      return false;
    }
    DL_WARN_documented_change(__ANDROID_API_O__,
                              "invalid-elf-header_section-headers-enforced-for-api-level-26",
                              "\"%s\" has unsupported e_shentsize 0x%x (expected 0x%zx)",
                              name_.c_str(), header_.e_shentsize, sizeof(ElfW(Shdr)));
    add_dlwarning(name_.c_str(), "has invalid ELF header");
  }

  if (header_.e_shstrndx == 0) {
    // Fail if app is targeting Android O or above
    if (get_application_target_sdk_version() >= __ANDROID_API_O__) {
      DL_ERR_AND_LOG("\"%s\" has invalid e_shstrndx", name_.c_str());
      return false;
    }

    DL_WARN_documented_change(__ANDROID_API_O__,
                              "invalid-elf-header_section-headers-enforced-for-api-level-26",
                              "\"%s\" has invalid e_shstrndx", name_.c_str());
    add_dlwarning(name_.c_str(), "has invalid ELF header");
  }

  return true;
}
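
The first five checks can be reproduced on a raw byte buffer without any linker internals. The sketch below is a simplified illustration, not bionic's code: check_elf_header and sample_header are hypothetical names, the field offsets and constant values come from the ELF specification, and the e_machine and section header checks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Returns an empty string when the checks pass, otherwise a short reason.
// Covers a subset of VerifyElfHeader: magic, class, endianness, e_type, e_version.
std::string check_elf_header(const std::vector<uint8_t>& h, bool expect_64bit) {
  if (h.size() < 24) return "too short";
  static const uint8_t kMagic[4] = {0x7f, 'E', 'L', 'F'};
  if (std::memcmp(h.data(), kMagic, 4) != 0) return "bad magic";
  const uint8_t want_class = expect_64bit ? 2 : 1;  // ELFCLASS64 : ELFCLASS32
  if (h[4] != want_class) return "wrong class";     // e_ident[EI_CLASS]
  if (h[5] != 1) return "not little-endian";        // e_ident[EI_DATA] == ELFDATA2LSB
  const unsigned e_type = h[16] | (h[17] << 8);     // uint16 at offset 16
  if (e_type != 3) return "not ET_DYN";
  const uint32_t e_version = h[20] | (h[21] << 8) | (h[22] << 16) |
                             (static_cast<uint32_t>(h[23]) << 24);  // uint32 at offset 20
  if (e_version != 1) return "bad e_version";       // EV_CURRENT
  return "";
}

// Builds the first 24 bytes of a 64-bit little-endian shared object header.
std::vector<uint8_t> sample_header() {
  std::vector<uint8_t> h(24, 0);
  h[0] = 0x7f; h[1] = 'E'; h[2] = 'L'; h[3] = 'F';
  h[4] = 2;    // ELFCLASS64
  h[5] = 1;    // ELFDATA2LSB
  h[6] = 1;    // EI_VERSION
  h[16] = 3;   // e_type = ET_DYN
  h[18] = 183; // e_machine = EM_AARCH64 (not checked in this sketch)
  h[20] = 1;   // e_version = EV_CURRENT
  return h;
}
```

Passing the sample header with expect_64bit set to false reports "wrong class", which corresponds to the "is 64-bit instead of 32-bit" diagnostic in the listing above.
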
3.2.3_1 ReadProgramHeaders

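Every entry in the program header table has a fixed size, so the size of the table is:

$\text{size} = \text{phdr\_num\_} \times \text{sizeof(ElfW(Phdr))}$

With the size known, phdr_fragment_.Map() maps the table into memory; the mapping is ultimately done by mmap.

At this point the file content has not been copied into memory; mmap only sets up the mapping structures, and pages are read in when they are first accessed. [2]

The file_offset_ argument is the offset of the ELF image inside the file referred to by fd_ (taken from extinfo->library_fd_offset or set by open_library), not e_entry. For a so opened as a standalone file it is 0, so it can be ignored here. What matters is e_phoff and size.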

bool ElfReader::ReadProgramHeaders() {
  phdr_num_ = header_.e_phnum;
  size_t size = phdr_num_ * sizeof(ElfW(Phdr));
  ...
  if (!phdr_fragment_.Map(fd_, file_offset_, header_.e_phoff, size)) {
    DL_ERR("\"%s\" phdr mmap failed: %s", name_.c_str(), strerror(errno));
    return false;
  }
	...
  phdr_table_ = static_cast<ElfW(Phdr)*>(phdr_fragment_.data());
  return true;
}

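Now read MappedFileFragment::Map, starting with page_start. It simply clears the low 12 bits to get the start of the page. (The explanation uses 32-bit values; the code is 64-bit, but the idea is the same.)

$\text{offset\_addr} \mathbin{\&} \text{0xfffff000}$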


#define PAGE_SIZE 4096
constexpr off64_t kPageMask = ~static_cast<off64_t>(PAGE_SIZE-1); 

off64_t page_start(off64_t offset) {
  return offset & kPageMask; // mask is 0xfffff000 when viewed as 32 bits
}

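The matching page_offset function clears everything above the low 12 bits to get the offset within the page. The code below is equivalent to this formula:

$\text{offset\_addr} \mathbin{\&} \text{0x0fff}$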

size_t page_offset(off64_t offset) {
  return static_cast<size_t>(offset & (PAGE_SIZE-1));
}
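
As a quick check of the two helpers, the sketch below re-implements them on unsigned 64-bit values and adds page_end, the rounding-up counterpart that the PAGE_END macro provides later in LoadSegments. The 4096-byte page size is the value quoted for Android 10; the unsigned types are a simplification of the off64_t used above.

```cpp
#include <cassert>
#include <cstdint>

const uint64_t kPageSize = 4096;  // page size assumed throughout this article

// Clears the low 12 bits: start of the page that contains offset.
uint64_t page_start(uint64_t offset) { return offset & ~(kPageSize - 1); }

// Keeps only the low 12 bits: position of offset inside its page.
uint64_t page_offset(uint64_t offset) { return offset & (kPageSize - 1); }

// Rounds up to the next page boundary (unchanged if already aligned).
uint64_t page_end(uint64_t offset) { return page_start(offset + kPageSize - 1); }
```

For the offset 0x4EA9, page_start gives 0x4000, page_offset gives 0xEA9, and page_end gives 0x5000; adding the first two always reproduces the original offset.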

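mmap maps memory in units of pages (4096 bytes on Android 10), so the mapping starts on a page boundary and any part of the last page beyond the end of the file reads as zeros. The mapped size must also be at least the size of the program header table in the file. mmap returns the virtual address of the page that contains the program header table.

Note that only the pointer returned by mmap is a real virtual address; every other address-like value in this function is an offset from the start of the file. For example, suppose: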

phdr_vaddr=0x00004EA9

phdr_page=0x00004000

phdr_in_page_offset=0x00000EA9

map_start=0xa0004000

The virtual address of the program header table, map_start + phdr_in_page_offset = 0xa0004EA9, is stored in data_.

bool MappedFileFragment::Map(int fd, off64_t base_offset, size_t elf_offset, size_t size) {
  off64_t offset;
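  CHECK(safe_add(&offset, base_offset, elf_offset)); // offset is now the file offset (relative to fd) where the program header table starts

  off64_t page_min = page_start(offset); // file offset of the page containing the program header table
  off64_t end_offset;

  CHECK(safe_add(&end_offset, offset, size)); // end_offset is now the file offset (relative to fd) where the program header table ends
  CHECK(safe_add(&end_offset, end_offset, page_offset(offset))); // add the in-page offset so the mapped region is at least as large as the table in the file

  size_t map_size = static_cast<size_t>(end_offset - page_min); // size of the file mapping
  CHECK(map_size >= size);

  uint8_t* map_start = static_cast<uint8_t*>(
                          mmap64(nullptr, map_size, PROT_READ, MAP_PRIVATE, fd, page_min)); // returns the virtual address of the page containing the phdr table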

  if (map_start == MAP_FAILED) {
    return false;
  }

  map_start_ = map_start;
  map_size_ = map_size;

  data_ = map_start + page_offset(offset); // virtual address of the program header table
  size_ = size;

  return true;
}
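
The arithmetic in Map can be checked without touching a file. The sketch below repeats the same steps as pure functions; MapGeometry and map_geometry are hypothetical names, the overflow checks done by safe_add are omitted, and the table size 0x120 used in the usage note is an assumed value.

```cpp
#include <cassert>
#include <cstdint>

// Values that Map() derives before and after calling mmap64.
struct MapGeometry {
  uint64_t page_min;    // file offset handed to mmap64 (page aligned)
  uint64_t map_size;    // length handed to mmap64
  uint64_t data_delta;  // data_ minus map_start
};

// Same arithmetic as MappedFileFragment::Map, without the overflow checks.
MapGeometry map_geometry(uint64_t base_offset, uint64_t elf_offset, uint64_t size) {
  const uint64_t kPageSize = 4096;
  const uint64_t offset = base_offset + elf_offset;
  const uint64_t in_page = offset & (kPageSize - 1);
  MapGeometry g;
  g.page_min = offset - in_page;
  g.map_size = (offset + size + in_page) - g.page_min;
  g.data_delta = in_page;
  return g;
}
```

With base_offset 0, elf_offset 0x4EA9, and size 0x120, page_min is 0x4000, map_size is 0x1E72, and data_delta is 0xEA9. If mmap64 returned 0xa0004000, data_ would be 0xa0004EA9, matching the example values above.
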
3.2.3_2 ReadSectionHeaders

As in 3.2.3_1, this maps the section header table.

bool ElfReader::ReadSectionHeaders() {
  shdr_num_ = header_.e_shnum;

  if (shdr_num_ == 0) {
    DL_ERR_AND_LOG("\"%s\" has no section headers", name_.c_str());
    return false;
  }

  size_t size = shdr_num_ * sizeof(ElfW(Shdr));
  if (!CheckFileRange(header_.e_shoff, size, alignof(const ElfW(Shdr)))) {
    DL_ERR_AND_LOG("\"%s\" has invalid shdr offset/size: %zu/%zu",
                   name_.c_str(),
                   static_cast<size_t>(header_.e_shoff),
                   size);
    return false;
  }

  if (!shdr_fragment_.Map(fd_, file_offset_, header_.e_shoff, size)) {
    DL_ERR("\"%s\" shdr mmap failed: %s", name_.c_str(), strerror(errno));
    return false;
  }

  shdr_table_ = static_cast<const ElfW(Shdr)*>(shdr_fragment_.data());
  return true;
}
3.2.3_3 ReadDynamicSection
  1. First, find the dynamic section header: the Section Header Table entry whose sh_type is SHT_DYNAMIC.
bool ElfReader::ReadDynamicSection() {
  // Find dynamic_shdr: the section header entry whose sh_type is SHT_DYNAMIC
  const ElfW(Shdr)* dynamic_shdr = nullptr;
  for (size_t i = 0; i < shdr_num_; ++i) {
    if (shdr_table_[i].sh_type == SHT_DYNAMIC) {
      dynamic_shdr = &shdr_table_ [i];
      break;
    }
  }
  2. Then find the Program Header Table entry whose p_type is PT_DYNAMIC and check that its offset and size match those of the dynamic section header found in step 1.
  size_t pt_dynamic_offset = 0;
  size_t pt_dynamic_filesz = 0;
  for (size_t i = 0; i < phdr_num_; ++i) {
    const ElfW(Phdr)* phdr = &phdr_table_[i];
    if (phdr->p_type == PT_DYNAMIC) {
      pt_dynamic_offset = phdr->p_offset;
      pt_dynamic_filesz = phdr->p_filesz;
    }
  }
 if (pt_dynamic_offset != dynamic_shdr->sh_offset) {
   ....
 if (pt_dynamic_filesz != dynamic_shdr->sh_size) {
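  3. Once offset and size agree in both tables, get the section header of .dynstr (the dynamic string table). For the dynamic section, sh_link always holds the index of .dynstr in the section header table.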
  const ElfW(Shdr)* strtab_shdr = &shdr_table_[dynamic_shdr->sh_link];

  if (strtab_shdr->sh_type != SHT_STRTAB) {
    DL_ERR_AND_LOG("\"%s\" .dynamic section has invalid link(%d) sh_type: %d (expected SHT_STRTAB)",
                   name_.c_str(), dynamic_shdr->sh_link, strtab_shdr->sh_type);
    return false;
  }
  4. Map the dynamic section (.dynamic) and store its virtual address in dynamic_.
  if (!CheckFileRange(dynamic_shdr->sh_offset, dynamic_shdr->sh_size, alignof(const ElfW(Dyn)))) {
    DL_ERR_AND_LOG("\"%s\" has invalid offset/size of .dynamic section", name_.c_str());
    return false;
  }

  if (!dynamic_fragment_.Map(fd_, file_offset_, dynamic_shdr->sh_offset, dynamic_shdr->sh_size)) {
    DL_ERR("\"%s\" dynamic section mmap failed: %s", name_.c_str(), strerror(errno));
    return false;
  }

  dynamic_ = static_cast<const ElfW(Dyn)*>(dynamic_fragment_.data());
  5. Map the dynamic string section (.dynstr) and store its virtual address in strtab_.
  if (!CheckFileRange(strtab_shdr->sh_offset, strtab_shdr->sh_size, alignof(const char))) {
    DL_ERR_AND_LOG("\"%s\" has invalid offset/size of the .strtab section linked from .dynamic section",
                   name_.c_str());
    return false;
  }

  if (!strtab_fragment_.Map(fd_, file_offset_, strtab_shdr->sh_offset, strtab_shdr->sh_size)) {
    DL_ERR("\"%s\" strtab section mmap failed: %s", name_.c_str(), strerror(errno));
    return false;
  }

  strtab_ = static_cast<const char*>(strtab_fragment_.data());
  strtab_size_ = strtab_fragment_.size();

Finally, control returns to the point where task->read() was called.
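
The cross-check between the two header tables can be sketched on trimmed-down records. Phdr and Shdr below are hypothetical structs that keep only the fields involved; the numeric constants are the standard ELF values. The sketch treats any mismatch as a failure, whereas the parts of the original function elided above may handle it differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint32_t kPtDynamic = 2;   // PT_DYNAMIC
const uint32_t kShtStrtab = 3;   // SHT_STRTAB
const uint32_t kShtDynamic = 6;  // SHT_DYNAMIC

// Hypothetical reduced records with only the fields the check needs.
struct Phdr { uint32_t type; uint64_t offset; uint64_t filesz; };
struct Shdr { uint32_t type; uint64_t offset; uint64_t size; uint32_t link; };

// True if PT_DYNAMIC and the SHT_DYNAMIC section describe the same file range
// and the section's sh_link points at a string table.
bool dynamic_views_agree(const std::vector<Phdr>& phdrs, const std::vector<Shdr>& shdrs) {
  const Shdr* dyn = nullptr;
  for (size_t i = 0; i < shdrs.size(); ++i) {
    if (shdrs[i].type == kShtDynamic) { dyn = &shdrs[i]; break; }
  }
  if (dyn == nullptr) return false;

  uint64_t pt_offset = 0;
  uint64_t pt_filesz = 0;
  for (size_t i = 0; i < phdrs.size(); ++i) {
    if (phdrs[i].type == kPtDynamic) {
      pt_offset = phdrs[i].offset;
      pt_filesz = phdrs[i].filesz;
    }
  }
  if (pt_offset != dyn->offset || pt_filesz != dyn->size) return false;
  if (dyn->link >= shdrs.size()) return false;  // sh_link must be a valid index
  return shdrs[dyn->link].type == kShtStrtab;
}

std::vector<Shdr> sample_shdrs() {
  std::vector<Shdr> s;
  s.push_back(Shdr{0, 0, 0, 0});                     // index 0: null section
  s.push_back(Shdr{kShtStrtab, 0x300, 0x80, 0});     // index 1: .dynstr
  s.push_back(Shdr{kShtDynamic, 0x1d00, 0x1c0, 1});  // index 2: .dynamic, sh_link = 1
  return s;
}

std::vector<Phdr> sample_phdrs() {
  std::vector<Phdr> p;
  p.push_back(Phdr{1, 0, 0x2000});                // PT_LOAD
  p.push_back(Phdr{kPtDynamic, 0x1d00, 0x1c0});   // PT_DYNAMIC
  return p;
}
```

The sample tables agree. Changing the PT_DYNAMIC offset in the program header, as some packers do, makes the check fail.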

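3.2.4 Adding dependency tasks

Every entry in the dynamic section whose d_tag is DT_NEEDED is added to load_tasks so that the dependencies are loaded later. Control then returns to find_libraries.

// Continued from 3.2.2 load_library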
for_each_dt_needed(task->get_elf_reader(), [&](const char* name) {
    LD_LOG(kLogDlopen, "load_library(ns=%s, task=%s): Adding DT_NEEDED task: %s",
           ns->get_name(), task->get_name(), name);
    load_tasks->push_back(LoadTask::create(name, si, ns, task->get_readers_map()));
  });


// Helper that invokes action for every DT_NEEDED entry
template<typename F>
static void for_each_dt_needed(const ElfReader& elf_reader, F action) {
  for (const ElfW(Dyn)* d = elf_reader.dynamic(); d->d_tag != DT_NULL; ++d) {
    if (d->d_tag == DT_NEEDED) {
      action(fix_dt_needed(elf_reader.get_string(d->d_un.d_val), elf_reader.name()));
    }
  }
}
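
A DT_NEEDED entry does not store the library name itself; d_val is an offset into the dynamic string table. The sketch below walks a small, hand-built dynamic array to show the lookup. Dyn is a hypothetical simplified struct, the tag values are the standard ELF ones, and the sample data is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const int64_t kDtNull = 0;    // DT_NULL terminates the dynamic array
const int64_t kDtNeeded = 1;  // DT_NEEDED: val is an offset into the string table

// Hypothetical simplified dynamic entry.
struct Dyn { int64_t tag; uint64_t val; };

// Collects the names referenced by DT_NEEDED entries, in table order.
std::vector<std::string> needed_libraries(const std::vector<Dyn>& dynamic,
                                          const std::string& strtab) {
  std::vector<std::string> names;
  for (size_t i = 0; i < dynamic.size() && dynamic[i].tag != kDtNull; ++i) {
    if (dynamic[i].tag != kDtNeeded) continue;
    if (dynamic[i].val >= strtab.size()) continue;  // skip out-of-range offsets
    // Names are NUL-terminated strings inside the table.
    names.push_back(std::string(strtab.c_str() + dynamic[i].val));
  }
  return names;
}

// String table layout: offset 1 -> "libc.so", offset 9 -> "libm.so".
std::string sample_strtab() {
  return std::string("\0libc.so\0libm.so\0", 17);
}

std::vector<Dyn> sample_dynamic() {
  std::vector<Dyn> d;
  d.push_back(Dyn{kDtNeeded, 1});
  d.push_back(Dyn{14, 0});         // DT_SONAME, ignored by the walk
  d.push_back(Dyn{kDtNeeded, 9});
  d.push_back(Dyn{kDtNull, 0});
  return d;
}
```

For the sample data the walk yields libc.so followed by libm.so, which is the order in which the corresponding LoadTask objects would be appended.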

3.2.5 ElfReader::Load

In find_libraries, ElfReader::Load maps the so's PT_LOAD segments into memory. Load calls three functions in order:

  1. ReserveAddressSpace(address_space)
  2. LoadSegments()
  3. FindPhdr()
  for (auto&& task : load_list) {
    address_space_params* address_space =
        (reserved_address_recursive || !task->is_dt_needed()) ? &extinfo_params : &default_params;
    if (!task->load(address_space)) {
      return false;
    }
  }

// The Load function
bool ElfReader::Load(address_space_params* address_space) {
  CHECK(did_read_);
  if (did_load_) {
    return true;
  }
  if (ReserveAddressSpace(address_space) && LoadSegments() && FindPhdr()) {
    did_load_ = true;
  }

  return did_load_;
}
3.2.5_1 ReserveAddressSpace

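This method uses mmap to reserve address space for the loadable segments in advance.

The mapping flags are MAP_PRIVATE | MAP_ANONYMOUS, that is, a private anonymous mapping. Anonymous means the region is not backed by any file descriptor; private means the region is a private copy, so writes to it do not affect any underlying file or other processes.

The protection is PROT_NONE, which means the region cannot be read, written, or executed yet.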

bool ElfReader::ReserveAddressSpace(address_space_params* address_space) {
  ElfW(Addr) min_vaddr;
  // Returns the page-aligned size of the range spanned by all PT_LOAD segments; min_vaddr is the page-aligned lowest p_vaddr among them
  load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);
	...

  uint8_t* addr = reinterpret_cast<uint8_t*>(min_vaddr);
  void* start;
  
  //
	...
  start = ReserveAligned(load_size_, kLibraryAlignment);
	...
    
  load_start_ = start;
  load_bias_ = reinterpret_cast<uint8_t*>(start) - addr;
  return true;
}
static void* ReserveAligned(size_t size, size_t align) {
  int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
  if (align == PAGE_SIZE) {
    void* mmap_ptr = mmap(nullptr, size, PROT_NONE, mmap_flags, -1, 0);
    if (mmap_ptr == MAP_FAILED) {
      return nullptr;
    }
    return mmap_ptr;
  }
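
The two ingredients of the reservation step are the size of the span to reserve and the resulting load_bias. The sketch below computes the span the way the comment on phdr_table_get_load_size describes and reserves inaccessible memory with the same mmap flags. Segment, load_span_size, and reserve are hypothetical names, and the alignment handling of ReserveAligned is left out.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reduced PT_LOAD record.
struct Segment { uint64_t vaddr; uint64_t memsz; };

// Page-aligned size of the range covering all segments.
// *min_vaddr receives the page-aligned lowest address.
uint64_t load_span_size(const std::vector<Segment>& segs, uint64_t* min_vaddr) {
  const uint64_t kPageSize = 4096;
  uint64_t lo = ~static_cast<uint64_t>(0);
  uint64_t hi = 0;
  for (size_t i = 0; i < segs.size(); ++i) {
    if (segs[i].vaddr < lo) lo = segs[i].vaddr;
    if (segs[i].vaddr + segs[i].memsz > hi) hi = segs[i].vaddr + segs[i].memsz;
  }
  if (lo > hi) {  // no segments at all
    *min_vaddr = 0;
    return 0;
  }
  lo &= ~(kPageSize - 1);                          // round down
  hi = (hi + kPageSize - 1) & ~(kPageSize - 1);    // round up
  *min_vaddr = lo;
  return hi - lo;
}

// Reserves address space that cannot be read, written, or executed yet.
void* reserve(size_t size) {
  void* p = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```

Two segments at 0x0 (0x1234 bytes) and 0x3000 (0x800 bytes) need a 0x4000-byte reservation with min_vaddr 0. load_bias is then the address returned by the reservation minus min_vaddr, so a segment's runtime address is load_bias + p_vaddr.
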
3.2.5_2 LoadSegments

ReserveAddressSpace reserved the address space; LoadSegments now maps each PT_LOAD segment, in order, into that reserved space.

bool ElfReader::LoadSegments() {
  for (size_t i = 0; i < phdr_num_; ++i) {
    const ElfW(Phdr)* phdr = &phdr_table_[i];

    if (phdr->p_type != PT_LOAD) {
      continue;
    }

    // Segment addresses in memory.
    // in-memory addresses of the segment: start and end
    ElfW(Addr) seg_start = phdr->p_vaddr + load_bias_;
    ElfW(Addr) seg_end   = seg_start + phdr->p_memsz;

    ElfW(Addr) seg_page_start = PAGE_START(seg_start);
    ElfW(Addr) seg_page_end   = PAGE_END(seg_end);

    ElfW(Addr) seg_file_end   = seg_start + phdr->p_filesz;

    // File offsets.
    // file offsets of the segment: start and end
    ElfW(Addr) file_start = phdr->p_offset;
    ElfW(Addr) file_end   = file_start + phdr->p_filesz;

    ElfW(Addr) file_page_start = PAGE_START(file_start);
    ElfW(Addr) file_length = file_end - file_page_start;

 ... ...

      void* seg_addr = mmap64(reinterpret_cast<void*>(seg_page_start),
                            file_length,
                            prot,
                            MAP_FIXED|MAP_PRIVATE,
                            fd_,
                            file_offset_ + file_page_start);
      if (seg_addr == MAP_FAILED) {
        DL_ERR("couldn't map \"%s\" segment %zd: %s", name_.c_str(), i, strerror(errno));
        return false;
      }
    }

    // if the segment is writable, and does not end on a page boundary,
    // zero-fill it until the page limit.
    if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
      memset(reinterpret_cast<void*>(seg_file_end), 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
    }

    seg_file_end = PAGE_END(seg_file_end);

    // seg_file_end is now the first page address after the file
    // content. If seg_end is larger, we need to zero anything
    // between them. This is done by using a private anonymous
    // map for all extra pages.
    if (seg_page_end > seg_file_end) {
      size_t zeromap_size = seg_page_end - seg_file_end;
      void* zeromap = mmap(reinterpret_cast<void*>(seg_file_end),
                           zeromap_size,
                           PFLAGS_TO_PROT(phdr->p_flags),
                           MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
                           -1,
                           0);
      if (zeromap == MAP_FAILED) {
        DL_ERR("couldn't zero fill \"%s\" gap: %s", name_.c_str(), strerror(errno));
        return false;
      }

      prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, zeromap, zeromap_size, ".bss");
    }
  }
  return true;
}
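
To see how the different start, end, and length values relate, the sketch below repeats the arithmetic of LoadSegments for a single segment without calling mmap. SegmentPlan and plan_segment are hypothetical names, and the segment values in the usage note are made up; only the formulas come from the listing above.

```cpp
#include <cassert>
#include <cstdint>

// What LoadSegments would map and zero for one PT_LOAD segment.
struct SegmentPlan {
  uint64_t seg_page_start;   // address of the file-backed mapping
  uint64_t file_page_start;  // file offset of that mapping (before adding file_offset_)
  uint64_t file_length;      // length of the file-backed mapping
  uint64_t tail_zero_bytes;  // bytes zeroed by memset after the file content
  uint64_t anon_zero_bytes;  // size of the anonymous zero mapping (.bss pages)
};

SegmentPlan plan_segment(uint64_t load_bias, uint64_t p_offset, uint64_t p_vaddr,
                         uint64_t p_filesz, uint64_t p_memsz, bool writable) {
  const uint64_t kPageSize = 4096;
  const uint64_t kMask = kPageSize - 1;
  const uint64_t seg_start = p_vaddr + load_bias;
  const uint64_t seg_end = seg_start + p_memsz;
  const uint64_t seg_page_end = (seg_end + kMask) & ~kMask;
  const uint64_t seg_file_end = seg_start + p_filesz;

  SegmentPlan plan;
  plan.seg_page_start = seg_start & ~kMask;
  plan.file_page_start = p_offset & ~kMask;
  plan.file_length = (p_offset + p_filesz) - plan.file_page_start;
  // Writable segments get the rest of their last file-backed page cleared.
  plan.tail_zero_bytes =
      (writable && (seg_file_end & kMask) != 0) ? kPageSize - (seg_file_end & kMask) : 0;
  // Whole pages beyond the file content come from an anonymous mapping.
  const uint64_t file_end_page = (seg_file_end + kMask) & ~kMask;
  plan.anon_zero_bytes = seg_page_end > file_end_page ? seg_page_end - file_end_page : 0;
  return plan;
}
```

For a writable segment with p_offset 0x1e00, p_vaddr 0x2e00, p_filesz 0x300, p_memsz 0x1500, and load_bias 0xa0000000, the file-backed mapping starts at 0xa0002000 and covers 0x1100 bytes from file offset 0x1000. The last 0xf00 bytes of the final file-backed page are zeroed, and one extra anonymous page (0x1000 bytes) is mapped for the rest of the segment.
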
3.2.5_3 FindPhdr

This method sets loaded_phdr_ and uses CheckPhdr to verify that the mapped program header table lies entirely within a PT_LOAD segment.

bool ElfReader::FindPhdr() {
  const ElfW(Phdr)* phdr_limit = phdr_table_ + phdr_num_;

  // If there is a PT_PHDR, use it directly.
  for (const ElfW(Phdr)* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_PHDR) {
      return CheckPhdr(load_bias_ + phdr->p_vaddr);
    }
  }

  // Otherwise, check the first loadable segment. If its file offset
  // is 0, it starts with the ELF header, and we can trivially find the
  // loaded program header from it.
  for (const ElfW(Phdr)* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
    if (phdr->p_type == PT_LOAD) {
      if (phdr->p_offset == 0) {
        ElfW(Addr)  elf_addr = load_bias_ + phdr->p_vaddr;
        const ElfW(Ehdr)* ehdr = reinterpret_cast<const ElfW(Ehdr)*>(elf_addr);
        ElfW(Addr)  offset = ehdr->e_phoff;
        return CheckPhdr(reinterpret_cast<ElfW(Addr)>(ehdr) + offset);
      }
      break;
    }
  }

  DL_ERR("can't find loaded phdr for \"%s\"", name_.c_str());
  return false;
}
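
CheckPhdr is not shown above. The sketch below illustrates the kind of range test it is described as performing: the whole program header table, from its loaded address to its end, must fit inside the file-backed part of one PT_LOAD segment. LoadSeg and phdr_table_is_loaded are hypothetical names, and the entry size 56 in the usage note is the size of a 64-bit program header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reduced PT_LOAD record.
struct LoadSeg { uint64_t vaddr; uint64_t filesz; };

// True if [loaded, loaded + phnum * phentsize) lies inside one loaded segment.
bool phdr_table_is_loaded(const std::vector<LoadSeg>& segs, uint64_t load_bias,
                          uint64_t loaded, size_t phnum, size_t phentsize) {
  const uint64_t loaded_end = loaded + static_cast<uint64_t>(phnum) * phentsize;
  for (size_t i = 0; i < segs.size(); ++i) {
    const uint64_t seg_start = segs[i].vaddr + load_bias;
    const uint64_t seg_end = seg_start + segs[i].filesz;
    if (seg_start <= loaded && loaded_end <= seg_end) return true;
  }
  return false;
}
```

With one segment of 0x1000 bytes at vaddr 0 and load_bias 0xa0000000, a table of 8 entries at 0xa0000040 passes the check, while a claimed table of 100 entries would run past the segment and be rejected.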

Finally, once ElfReader::Load() has finished, the soinfo structure is filled in.

bool load(address_space_params* address_space) {
  ElfReader& elf_reader = get_elf_reader();
  if (!elf_reader.Load(address_space)) {
    return false;
  }

  si_->base = elf_reader.load_start();
  si_->size = elf_reader.load_size();
  si_->set_mapped_by_caller(elf_reader.is_mapped_by_caller());
  si_->load_bias = elf_reader.load_bias();
  si_->phnum = elf_reader.phdr_count();
  si_->phdr = elf_reader.loaded_phdr();

  return true;
}

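At this point the so has been fully read and loaded. The task object retains important information about the so, such as the phdr_table pointer, the shdr_table pointer, the dynamic section pointer, and the dynamic string section pointer.

The process performed the following steps:

  1. Read the ELF header
  2. Check e_ident, e_type, e_machine, e_version, e_shentsize, and e_shstrndx in the ELF header
  3. Read the program header table
  4. Read the section header table
  5. Read the dynamic section and verify that its offset and size in the program header table match those in the section header table; read the dynamic string section
  6. Reserve address space for the PT_LOAD segments
  7. Load the PT_LOAD segments
  8. Set loaded_phdr_

With the dynamic section information in hand, the next step is linking the so.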

References

  1. https://github.com/qianbinbin/android-notes/blob/master/print-log-in-libcore.md

  2. mmap 文件映射内存详解 (a detailed explanation of mmap file-to-memory mapping)