The Road to Implementing a Custom Linker

2024-4-17 17:36:26 Author: mp.weixin.qq.com

Environment

◆Phone: Pixel 6a

◆OS: Android 12

◆Linker source: android12-d1-release

◆A good IDE makes reading source code much easier; CLion is recommended here

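Preface

The Linker is an important component of Android. It is the dynamic linker: at runtime it loads the so files a process needs, ties them together by resolving their dependencies and symbols, and performs operations such as relocation on them. Its source lives in the bionic/linker directory. By studying and analyzing that source, we can also implement a custom Linker loader of our own.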

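How the Linker loads a library

The Linker's loading process consists of the following main steps:

1. Read the program header table and obtain the dynamic segment information

2. Load the so files listed in the dynamic segment

3. Perform relocation

4. Initialize the global variables in the so file

5. Call the initialization functions in the so file

6. Done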
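Step 1 can be observed on a live process. The sketch below is not taken from the Linker source. It assumes a Linux or Android libc that provides dl_iterate_phdr in <link.h>, and the names DynamicSegment, collect_dynamic and list_dynamic_segments are made up for illustration. It walks the program headers of every loaded object and records where its PT_DYNAMIC segment lives.

```cpp
#include <link.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where one loaded object keeps its dynamic section (hypothetical helper type).
struct DynamicSegment {
    std::string object_name;  // may be empty for the main executable
    ElfW(Addr) address;       // runtime address of the ElfW(Dyn) array
    std::size_t entry_count;  // number of ElfW(Dyn) slots, derived from p_memsz
};

// Callback invoked by dl_iterate_phdr once per loaded object.
static int collect_dynamic(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(data != nullptr);
    auto* out = static_cast<std::vector<DynamicSegment>*>(data);
    for (std::size_t i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& phdr = info->dlpi_phdr[i];
        if (phdr.p_type != PT_DYNAMIC) continue;
        DynamicSegment seg;
        seg.object_name = info->dlpi_name != nullptr ? info->dlpi_name : "";
        seg.address = info->dlpi_addr + phdr.p_vaddr;  // load bias + virtual address
        seg.entry_count = phdr.p_memsz / sizeof(ElfW(Dyn));
        out->push_back(seg);
    }
    return 0;  // zero tells dl_iterate_phdr to continue with the next object
}

// Returns the dynamic segment of every object currently loaded in this process.
std::vector<DynamicSegment> list_dynamic_segments() {
    std::vector<DynamicSegment> result;
    dl_iterate_phdr(collect_dynamic, &result);
    return result;
}
```

The dynamic segment found this way is what steps 2 and 3 consume: it names the needed libraries and points at the relocation tables.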

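A detailed look at the core functions in the Linker source

This article only covers the linking process when an so is opened through dlopen; linking under JNI is out of scope.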
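For readers who have not used the API directly, here is a minimal sketch of the caller's side of dlopen. It only uses the standard dlopen, dlsym, dlerror and dlclose calls from <dlfcn.h>; the wrapper name load_symbol is hypothetical and not part of the Linker or of ImyLinker.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Opens `library` and resolves `symbol` from it (hypothetical helper).
// Returns nullptr on failure and stores the dlerror() text in *error.
void* load_symbol(const char* library, const char* symbol, std::string* error) {
    assert(symbol != nullptr && error != nullptr);
    void* handle = dlopen(library, RTLD_NOW);  // library == nullptr means the main program
    if (handle == nullptr) {
        const char* message = dlerror();
        *error = message != nullptr ? message : "dlopen failed";
        return nullptr;
    }
    dlerror();  // clear any stale error before calling dlsym
    void* address = dlsym(handle, symbol);
    const char* message = dlerror();
    if (message != nullptr) {
        *error = message;
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately left open so that `address` stays valid.
    return address;
}
```

Everything analyzed in the rest of the article happens inside that single dlopen call.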

We open an so file with dlopen, so let's start with how dlopen is implemented.

static void* dlopen_ext(const char* filename,
                        int flags,
                        const android_dlextinfo* extinfo,
                        const void* caller_addr) {
    void* result = do_dlopen(filename, flags, extinfo, caller_addr);
    return result;
}

As we can see, dlopen_ext ends up calling do_dlopen, which is the real entry point of the linker. Let's look at the implementation of do_dlopen, located in bionic/linker/linker.cpp.

void* do_dlopen(const char* name, int flags,
                const android_dlextinfo* extinfo,
                const void* caller_addr) {
 ...
  soinfo* const caller = find_containing_library(caller_addr);
  android_namespace_t* ns = get_caller_namespace(caller);
 ...
 ...
  soinfo* si = find_library(ns, translated_name, flags, extinfo, caller);
  loading_trace.End();

  if (si != nullptr) {
    void* handle = si->to_handle();
    LD_LOG(kLogDlopen,
           "... dlopen calling constructors: realpath=\"%s\", soname=\"%s\", handle=%p",
           si->get_realpath(), si->get_soname(), handle);
    si->call_constructors();
    failure_guard.Disable();
    LD_LOG(kLogDlopen,
           "... dlopen successful: realpath=\"%s\", soname=\"%s\", handle=%p",
           si->get_realpath(), si->get_soname(), handle);
    return handle;
  }
  return nullptr;
}

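For brevity, only part of the core code is shown here. do_dlopen locates the requested so file through find_library, then calls the so file's constructors (that is, init_array), and finally returns a handle to the so file.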
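To make the constructor step concrete, the sketch below shows how a function ends up in init_array in the first place. It relies on the GCC/Clang `constructor` attribute, which is a compiler extension; the names g_constructor_calls, on_library_load and constructor_calls are hypothetical. When this code is compiled into an so, the function should run during dlopen, before dlopen returns the handle; when compiled into an executable, it runs before main.

```cpp
// Constant-initialized to zero before any constructor function runs.
static int g_constructor_calls = 0;

// GCC/Clang extension: the compiler emits a pointer to this function into
// the .init_array section, which the linker walks when it calls constructors.
__attribute__((constructor)) static void on_library_load() {
    ++g_constructor_calls;
}

// Reports how many times the constructor above has been invoked.
int constructor_calls() {
    return g_constructor_calls;
}
```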

Note the line android_namespace_t* ns = get_caller_namespace(caller);
It obtains the caller's namespace. Namespaces were introduced in Android 7.0 to resolve naming conflicts between so files; they are not discussed in detail here.

Let's look at the implementation of find_library.

static soinfo* find_library(android_namespace_t* ns,
                            const char* name, int rtld_flags,
                            const android_dlextinfo* extinfo,
                            soinfo* needed_by) {
  soinfo* si = nullptr;
  if (name == nullptr) {
    si = solist_get_somain();
  } else if (!find_libraries(ns,
                             needed_by,
                             &name,
                             1,
                             &si,
                             nullptr,
                             0,
                             rtld_flags,
                             extinfo,
                             false /* add_as_children */)) {
    if (si != nullptr) {
      soinfo_unload(si);
    }
    return nullptr;
  }
  si->increment_ref_count();
  return si;
}

find_library is only a relay: the real work is done by find_libraries. find_libraries is one of the more important parts, so we will analyze it in detail.

bool find_libraries(android_namespace_t *ns,
                    soinfo *start_with,
                    const char *const library_names[],
                    size_t library_names_count,
                    soinfo *soinfos[],
                    std::vector<soinfo *> *ld_preloads,
                    size_t ld_preloads_count,
                    int rtld_flags,
                    const android_dlextinfo *extinfo,
                    bool add_as_children,
                    std::vector<android_namespace_t *> *namespaces = nullptr) {
...
    // Step 0: prepare.
    // Start by creating a LoadTask for each requested library and adding it to the task list.
    std::unordered_map<const soinfo *, ElfReader> readers_map;
    LoadTaskList load_tasks;
    for (size_t i = 0; i < library_names_count; ++i) {
        const char *name = library_names[i];
        LOGI("load task create %s ", name);
        load_tasks.push_back(LoadTask::create(name, start_with, ns, &readers_map));
    }
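 ...
    // This step is an important part of dynamic linking. It expands the list of libraries
    // to load (load_tasks) so that it includes every dependency named by a DT_NEEDED entry.
    // DT_NEEDED entries live in the dynamic segment of an ELF (Executable and Linkable Format)
    // file and name the other libraries that the current library needs. The dependencies are
    // not mapped into memory at this point; the loop only prepares their load tasks.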
  for (size_t i = 0; i < load_tasks.size(); ++i) {
        LoadTask *task = load_tasks[i];
        soinfo *needed_by = task->get_needed_by();
        bool is_dt_needed = needed_by != nullptr && (needed_by != start_with || add_as_children);
        task->set_extinfo(is_dt_needed ? nullptr : extinfo);
        task->set_dt_needed(is_dt_needed);

        LOGI("find_libraries(ns=%s): task=%s, is_dt_needed=%d", "null",
             task->get_name(), is_dt_needed);
        //
        // Note: start from the namespace that is stored in the LoadTask. This namespace
        // is different from the current namespace when the LoadTask is for a transitive
        // dependency and the lib that created the LoadTask is not found in the
        // current namespace but in one of the linked namespace.
        if (!find_library_internal(const_cast<android_namespace_t *>(task->get_start_from()),
                                   task,
                                   &zip_archive_cache,
                                   &load_tasks,
                                   rtld_flags)) {
            return false;
        }
  }
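The shape of this loop is worth isolating: load_tasks grows while it is being iterated, which is why the loop uses an index rather than iterators. The following sketch reproduces only that traversal pattern on a plain dependency map. DepGraph and expand_load_tasks are hypothetical names, and the `seen` set is a simplification standing in for the linker's "already loaded" lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Library name -> names listed in its DT_NEEDED entries (illustrative data only).
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Expands `root` into a breadth-first list of everything that must be loaded.
std::vector<std::string> expand_load_tasks(const DepGraph& needed, const std::string& root) {
    assert(!root.empty());
    std::vector<std::string> tasks{root};
    std::unordered_set<std::string> seen{root};
    // Index-based loop: push_back below may reallocate and would invalidate iterators.
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const std::string current = tasks[i];  // copy, for the same reason
        const auto it = needed.find(current);
        if (it == needed.end()) continue;  // no dependencies recorded
        for (const std::string& dep : it->second) {
            if (seen.insert(dep).second) {
                tasks.push_back(dep);  // newly discovered dependency, visited later
            }
        }
    }
    return tasks;
}
```

With liba.so needing libb.so and libc.so, and libb.so needing libc.so and libd.so, the resulting order is liba.so, libb.so, libc.so, libd.so: direct dependencies come before transitive ones.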

We will come back to what follows this for loop in find_libraries later. First, let's look at the implementation of find_library_internal.

static bool find_library_internal(android_namespace_t *ns,
                                  LoadTask *task,
                                  ZipArchiveCache *zip_archive_cache,
                                  LoadTaskList *load_tasks,
                                  int rtld_flags) {
    soinfo *candidate;
    if (find_loaded_library_by_soname(ns, task->get_name(), true /* search_linked_namespaces */,
                                      &candidate)) {
        LOGI(
                "find_library_internal(ns=%s, task=%s): Already loaded (by soname): %s",
                ns->get_name(), task->get_name(), candidate->get_realpath());
        task->set_soinfo(candidate);
        return true;
    }
    //start  load_library
    if (load_library(ns, task, zip_archive_cache, load_tasks, rtld_flags,
                     true /* search_linked_namespaces */)) {
        return true;
    }
    return false;
}
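The "look up by soname first, load only on a miss" pattern in find_library_internal can be sketched independently of the linker's data structures. Everything below is hypothetical (LoadedLibrary, LibraryTable, the loader callback); the real linker searches its soinfo lists and linked namespaces rather than a hash map.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for soinfo in this sketch.
struct LoadedLibrary {
    std::string soname;
};

class LibraryTable {
public:
    using Loader = std::function<std::unique_ptr<LoadedLibrary>(const std::string&)>;

    explicit LibraryTable(Loader loader) : loader_(std::move(loader)) {
        assert(loader_);
    }

    // Returns the existing entry for `soname`, loading it only on the first request.
    LoadedLibrary* find_or_load(const std::string& soname) {
        const auto it = loaded_.find(soname);
        if (it != loaded_.end()) {
            return it->second.get();  // already loaded: reuse it
        }
        std::unique_ptr<LoadedLibrary> lib = loader_(soname);
        if (!lib) {
            return nullptr;  // loading failed
        }
        LoadedLibrary* raw = lib.get();
        loaded_.emplace(soname, std::move(lib));
        return raw;
    }

private:
    Loader loader_;
    std::unordered_map<std::string, std::unique_ptr<LoadedLibrary>> loaded_;
};
```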

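The logic of find_library_internal is straightforward: it first checks whether the so file has already been loaded, and if not, calls load_library to load it. Let's look at the implementation of load_library.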

static bool load_library(android_namespace_t *ns,
                         LoadTask *task,
                         ZipArchiveCache *zip_archive_cache,
                         LoadTaskList *load_tasks,
                         int rtld_flags,
                         bool search_linked_namespaces) {
    const char *name = task->get_name();
    soinfo *needed_by = task->get_needed_by();
    // extinfo is null when going through plain dlopen, so the extinfo handling is omitted here
    LOGI(
            "load_library(ns=%s, task=%s, flags=0x%x, search_linked_namespaces=%d): calling "
            "open_library",
            ns->get_name(), name, rtld_flags, search_linked_namespaces);
    // Open the file.
    off64_t file_offset;
    std::string realpath;
    int fd = open_library(ns, zip_archive_cache, name, needed_by, &file_offset, &realpath);
    if (fd == -1) {
        LOGE("library \"%s\" not found", name);
        return false;
    }
    task->set_fd(fd, true);
    task->set_file_offset(file_offset);
    return load_library(ns, task, load_tasks, rtld_flags, realpath, search_linked_namespaces);
}

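This overload first calls open_library to open the so file and then calls the second load_library overload to do the loading. We won't expand open_library here. Because of Android's permission mechanisms and the library path restrictions that apply under JNI, it looks rather complicated, but in essence it is just an open(). Let's look at the second load_library overload.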

static bool load_library(android_namespace_t *ns,
                         LoadTask *task,
                         LoadTaskList *load_tasks,
                         int rtld_flags,
                         const std::string &realpath,
                         bool search_linked_namespaces) {
    ...
    // The accessibility check is not needed here, so this code is skipped:
    //  if ((fs_stat.f_type != TMPFS_MAGIC) && (!ns->is_accessible(realpath))) {
    soinfo *si = soinfo_alloc(ns, realpath.c_str(), &file_stat, file_offset, rtld_flags);
    task->set_soinfo(si);
    // Read the ELF header and some of the segments.
    if (!task->read(realpath.c_str(), file_stat.st_size)) {
        task->remove_cached_elf_reader();
        task->set_soinfo(nullptr);
        soinfo_free(si);
        return false;
    }                
    ...
    for_each_dt_needed(task->get_elf_reader(), [&](const char* name) {
        LD_LOG(kLogDlopen, "load_library(ns=%s, task=%s): Adding DT_NEEDED task: %s",
               ns->get_name(), task->get_name(), name);
        load_tasks->push_back(LoadTask::create(name, si, ns, task->get_readers_map()));
    });
    return true;
}

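The implementation of load_library is fairly involved, so we only analyze the core call:
task->read(realpath.c_str(), file_stat.st_size)

It reads the ELF file information through ElfReader::Read, which is located in bionic/linker/linker_phdr.cpp.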

bool ElfReader::Read(const char *name, int fd, off64_t file_offset, off64_t file_size) {
    if (did_read_) {
        return true;
    }
    name_ = name;
    fd_ = fd;
    file_offset_ = file_offset;
    file_size_ = file_size;
    if (ReadElfHeader() &&
        VerifyElfHeader() &&
        ReadProgramHeaders() &&
        ReadSectionHeaders() &&
        ReadDynamicSection()) {
        did_read_ = true;
    }
    return did_read_;
}

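ElfReader::Read reads the ELF header, the program headers, the section headers and the dynamic section. This information is the foundation of the rest of the linking process. The implementation is fairly simple and mostly consists of parsing the ELF file format.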
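As an illustration of the first two stages, here is a minimal sketch of reading and sanity-checking an ELF header. It is not the bionic code: read_elf_header and verify_elf_header are hypothetical free functions, the checks are a subset chosen for the example, and a little-endian target is assumed. As far as I can tell, the real VerifyElfHeader additionally insists on ET_DYN and on the expected e_machine.

```cpp
#include <elf.h>
#include <fcntl.h>
#include <link.h>
#include <unistd.h>

#include <cassert>
#include <cstring>

// Reads the ELF header found at offset 0 of `path` into *out.
bool read_elf_header(const char* path, ElfW(Ehdr)* out) {
    assert(path != nullptr && out != nullptr);
    const int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) return false;
    const ssize_t n = pread(fd, out, sizeof(*out), 0);
    close(fd);
    return n == static_cast<ssize_t>(sizeof(*out));
}

// Checks a subset of the header fields a loader would care about.
bool verify_elf_header(const ElfW(Ehdr)& header) {
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0) return false;  // "\177ELF"
    const int expected_class = sizeof(void*) == 8 ? ELFCLASS64 : ELFCLASS32;
    if (header.e_ident[EI_CLASS] != expected_class) return false;  // bitness must match
    if (header.e_ident[EI_DATA] != ELFDATA2LSB) return false;      // assumption: little-endian
    if (header.e_version != EV_CURRENT) return false;
    // The program header entries must have the size this build expects.
    return header.e_phentsize == sizeof(ElfW(Phdr));
}
```

Once the header is trusted, e_phoff and e_phnum tell the reader where the program header table is and how many entries it has.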

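We also need to look at another piece, for_each_dt_needed(task->get_elf_reader(), [&](const char* name), because this is where our custom linker differs from the original.
In the linker source, this call walks the DT_NEEDED entries of the so file and adds each named so file to the load list. To keep the flow simple, my implementation opens the system libraries directly with dlopen instead. (This does not cover the case where the so links against third-party so files.)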

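    for_each_dt_needed(task->get_elf_reader(), [&](const char *name) {
        // Our so depends on this library. Instead of creating a LoadTask for it,
        // let the system linker load it through dlopen.
        void *handle = dlopen(name, RTLD_NOW);
        if (handle == nullptr) {
            LOGE("dlopen of dependency \"%s\" failed: %s", name, dlerror());
        }
    });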
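To show what enumerating DT_NEEDED involves, the sketch below extracts the names straight from a file on disk. It is a swapped-in approach, not the ElfReader code: it finds the dynamic array through PT_DYNAMIC and the string table through DT_STRTAB, and it may differ from how ElfReader locates the string table. vaddr_to_offset, collect_needed and read_dt_needed are hypothetical names.

```cpp
#include <elf.h>
#include <fcntl.h>
#include <link.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Translates a virtual address into a file offset using the PT_LOAD segments.
static bool vaddr_to_offset(const ElfW(Phdr)* phdrs, std::size_t count,
                            ElfW(Addr) vaddr, std::size_t* offset) {
    for (std::size_t i = 0; i < count; ++i) {
        const ElfW(Phdr)& p = phdrs[i];
        if (p.p_type == PT_LOAD && vaddr >= p.p_vaddr && vaddr - p.p_vaddr < p.p_filesz) {
            *offset = static_cast<std::size_t>(vaddr - p.p_vaddr + p.p_offset);
            return true;
        }
    }
    return false;
}

// Collects DT_NEEDED names from an ELF image of `size` bytes mapped at `base`.
static void collect_needed(const char* base, std::size_t size, std::vector<std::string>* names) {
    const auto* ehdr = reinterpret_cast<const ElfW(Ehdr)*>(base);
    if (std::memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0) return;
    if (ehdr->e_phentsize != sizeof(ElfW(Phdr)) || ehdr->e_phoff > size) return;
    if (ehdr->e_phnum > (size - ehdr->e_phoff) / sizeof(ElfW(Phdr))) return;
    const auto* phdrs = reinterpret_cast<const ElfW(Phdr)*>(base + ehdr->e_phoff);

    // Locate the dynamic array inside the file.
    const ElfW(Dyn)* dyn = nullptr;
    std::size_t dyn_count = 0;
    for (std::size_t i = 0; i < ehdr->e_phnum; ++i) {
        const ElfW(Phdr)& p = phdrs[i];
        if (p.p_type == PT_DYNAMIC && p.p_offset <= size && p.p_filesz <= size - p.p_offset) {
            dyn = reinterpret_cast<const ElfW(Dyn)*>(base + p.p_offset);
            dyn_count = p.p_filesz / sizeof(ElfW(Dyn));
        }
    }

    // First pass: find the string table that DT_NEEDED values index into.
    std::size_t strtab = 0;
    bool have_strtab = false;
    for (std::size_t i = 0; i < dyn_count && dyn[i].d_tag != DT_NULL; ++i) {
        if (dyn[i].d_tag == DT_STRTAB) {
            have_strtab = vaddr_to_offset(phdrs, ehdr->e_phnum, dyn[i].d_un.d_ptr, &strtab);
        }
    }
    if (!have_strtab || strtab >= size) return;

    // Second pass: each DT_NEEDED value is an offset into that string table.
    for (std::size_t i = 0; i < dyn_count && dyn[i].d_tag != DT_NULL; ++i) {
        if (dyn[i].d_tag != DT_NEEDED) continue;
        if (dyn[i].d_un.d_val >= size - strtab) continue;  // offset outside the file
        const char* name = base + strtab + dyn[i].d_un.d_val;
        names->emplace_back(name, strnlen(name, size - strtab - dyn[i].d_un.d_val));
    }
}

// Returns the DT_NEEDED names of the ELF file at `path` (empty on any error).
std::vector<std::string> read_dt_needed(const char* path) {
    assert(path != nullptr);
    std::vector<std::string> names;
    const int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) return names;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size >= static_cast<off_t>(sizeof(ElfW(Ehdr)))) {
        const std::size_t size = static_cast<std::size_t>(st.st_size);
        void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            collect_needed(static_cast<const char*>(map), size, &names);
            munmap(map, size);
        }
    }
    close(fd);
    return names;
}
```

Each returned name is what the linker would turn into a LoadTask, and what the simplified implementation above hands to dlopen.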

Now let's go back to find_libraries and see what happens after find_library_internal.

        soinfo *si = task->get_soinfo();
        LOGI(" after find_library_internal call is_dt_needed %s %i", task->get_name(),
             is_dt_needed);
        if (is_dt_needed) {
            needed_by->add_child(si);
        }
        ...
        address_space_params default_params;
        size_t relro_fd_offset = 0;
        for (auto &&task: load_list) {
            address_space_params *address_space = &default_params;
            if (!task->load(address_space)) {
                return false;
            }
        }

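As you can see, this differs from the linker source; the code is simplified here as well. Since we open the library directly through dlopen, the extinfo_params argument is not needed. The core of this part is the call if (!task->load(address_space)).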

 bool load(address_space_params *address_space) {
        ElfReader &elf_reader = get_elf_reader();
        if (!elf_reader.Load(address_space)) {
            return false;
        }
        si_->base = elf_reader.load_start();
        si_->size = elf_reader.load_size();
        si_->set_mapped_by_caller(elf_reader.is_mapped_by_caller());
        si_->load_bias = elf_reader.load_bias();
        si_->phnum = elf_reader.phdr_count();
        si_->phdr = elf_reader.loaded_phdr();
        si_->set_gap_start(elf_reader.gap_start());
        si_->set_gap_size(elf_reader.gap_size());
        return true;
    }

Let's continue with the implementation of elf_reader.Load(address_space).

bool ElfReader::Load(address_space_params *address_space) {
    if (did_load_) {
        return true;
    }
    if (ReserveAddressSpace(address_space) && LoadSegments()
        && FindPhdr() && FindGnuPropertySection()) {
        did_load_ = true;
#if defined(__aarch64__)
        // For Armv8.5-A loaded executable segments may require PROT_BTI.
        LOGI("isBTICompatible = %d", note_gnu_property_.IsBTICompatible());
        if (note_gnu_property_.IsBTICompatible()) {
            did_load_ = (phdr_table_protect_segments(phdr_table_, phdr_num_, load_bias_,
                                                     &note_gnu_property_) == 0);
        }
#endif
    }
    return did_load_;
}

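ReserveAddressSpace() is where the dynamic linker reserves enough virtual address space for the ELF file it is about to load. Based on the program header table of the ELF file, it computes the size of the address range needed by all loadable segments and tries to reserve a region of that size in the process's address space. This is the preparatory step before the ELF file is mapped from disk into memory.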
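The size computation and the reservation can be sketched as follows. This is a simplified illustration rather than the bionic implementation: load_size and reserve_address_space are hypothetical names, a fixed 4 KiB page size is assumed, and the reservation ignores the address hints and extinfo handling that the real code has to deal with.

```cpp
#include <elf.h>
#include <link.h>
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

constexpr ElfW(Addr) kPageSize = 4096;  // assumption: 4 KiB pages

constexpr ElfW(Addr) page_start(ElfW(Addr) addr) { return addr & ~(kPageSize - 1); }
constexpr ElfW(Addr) page_end(ElfW(Addr) addr) { return page_start(addr + kPageSize - 1); }

// Returns the page-aligned span covered by all PT_LOAD segments and stores the
// page-aligned lowest virtual address in *out_min_vaddr. Returns 0 if there are none.
std::size_t load_size(const ElfW(Phdr)* table, std::size_t count, ElfW(Addr)* out_min_vaddr) {
    assert(table != nullptr && out_min_vaddr != nullptr);
    ElfW(Addr) min_vaddr = ~static_cast<ElfW(Addr)>(0);
    ElfW(Addr) max_vaddr = 0;
    bool found_pt_load = false;
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i].p_type != PT_LOAD) continue;  // only loadable segments count
        found_pt_load = true;
        if (table[i].p_vaddr < min_vaddr) min_vaddr = table[i].p_vaddr;
        if (table[i].p_vaddr + table[i].p_memsz > max_vaddr) {
            max_vaddr = table[i].p_vaddr + table[i].p_memsz;
        }
    }
    if (!found_pt_load) {
        *out_min_vaddr = 0;
        return 0;
    }
    min_vaddr = page_start(min_vaddr);
    max_vaddr = page_end(max_vaddr);
    *out_min_vaddr = min_vaddr;
    return static_cast<std::size_t>(max_vaddr - min_vaddr);
}

// Reserves `size` bytes of inaccessible address space; returns nullptr on failure.
void* reserve_address_space(std::size_t size) {
    void* start = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return start == MAP_FAILED ? nullptr : start;
}
```

The load bias is then the difference between the reserved start address and min_vaddr; adding it to any p_vaddr gives the runtime address of that segment.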

Next comes LoadSegments().

  ...  
  void* seg_addr = mmap64(reinterpret_cast<void*>(seg_page_start),
                            file_length,
                            prot,
                            MAP_FIXED|MAP_PRIVATE,
                            fd_,
                            file_offset_ + file_page_start);
  if (seg_addr == MAP_FAILED) {
        DL_ERR("couldn't map \"%s\" segment %zd: %s", name_.c_str(), i, strerror(errno));
        return false;
  }

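LoadSegments() maps the loadable segments of the ELF file into the process's virtual address space. There is room to extend the implementation here: when mmap is given a file descriptor, the path of the file shows up in /proc/self/maps. The mapping can be done in another way instead, which hides the so file.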
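One possible "other way" is to allocate anonymous memory and copy the file contents into it. The sketch below only demonstrates the visible difference in /proc/self/maps; it is not the ImyLinker implementation and not a complete segment loader (a real loader would still have to place each segment at its address and set its page protections). mapping_name, map_file_backed and map_anonymous_copy are hypothetical helpers.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Returns the pathname column of the /proc/self/maps entry that covers `addr`,
// or an empty string if the entry has no name or no entry was found.
std::string mapping_name(const void* addr) {
    const auto target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        fields >> range >> perms >> offset >> dev >> inode;
        std::getline(fields >> std::ws, path);  // stays empty for unnamed mappings
        const auto dash = range.find('-');
        if (dash == std::string::npos) continue;
        const std::uintptr_t start = std::stoull(range.substr(0, dash), nullptr, 16);
        const std::uintptr_t end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (target >= start && target < end) return path;
    }
    return "";
}

// Maps the first `length` bytes of `path` the way LoadSegments does: file-backed.
void* map_file_backed(const char* path, std::size_t length) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) return nullptr;
    void* mem = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping keeps its own reference to the file
    return mem == MAP_FAILED ? nullptr : mem;
}

// Copies the first `length` bytes of `path` into anonymous memory instead.
void* map_anonymous_copy(const char* path, std::size_t length) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) return nullptr;
    void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        close(fd);
        return nullptr;
    }
    std::size_t done = 0;
    while (done < length) {
        const ssize_t n = pread(fd, static_cast<char*>(mem) + done, length - done,
                                static_cast<off_t>(done));
        if (n <= 0) break;  // short file or read error: the rest stays zero-filled
        done += static_cast<std::size_t>(n);
    }
    close(fd);
    return mem;
}
```

For the file-backed mapping, mapping_name returns the file's path; for the anonymous copy it returns an empty string, even though both regions hold the same bytes.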

Next is the FindPhdr() function.

const ElfW(Phdr) *phdr_limit = phdr_table_ + phdr_num_;
    // If there is a PT_PHDR, use it directly.
    for (const ElfW(Phdr) *phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
        if (phdr->p_type == PT_PHDR) {
            return CheckPhdr(load_bias_ + phdr->p_vaddr);
        }
    }
    // Otherwise, check the first loadable segment. If its file offset
    // is 0, it starts with the ELF header, and we can trivially find the
    // loaded program header from it.
    for (const ElfW(Phdr) *phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
        if (phdr->p_type == PT_LOAD) {
            if (phdr->p_offset == 0) {
                ElfW(Addr) elf_addr = load_bias_ + phdr->p_vaddr;
                const ElfW(Ehdr) *ehdr = reinterpret_cast<const ElfW(Ehdr) *>(elf_addr);
                ElfW(Addr) offset = ehdr->e_phoff;
                return CheckPhdr(reinterpret_cast<ElfW(Addr)>(ehdr) + offset);
            }
            break;
        }
    }

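The purpose of this code is to locate, during dynamic linking, the in-memory address of the program header table (PHDR) of the loaded ELF file. This step is essential for the relocation and initialization that follow.

In the ELF format, the program header table describes how the segments of the file (code segment, data segment and so on) are mapped into the process's virtual address space.

Unlike phdr_table_, which is a temporary copy used internally by the linker, loaded_phdr_ points to the program header table that has been mapped into memory and will actually be used.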
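Both branches of FindPhdr end in CheckPhdr, which has to confirm that the candidate address really lies inside mapped file content. A minimal sketch of that check, written as a free function with a hypothetical name (phdr_is_mapped) and explicit parameters instead of ElfReader members, could look like this:

```cpp
#include <elf.h>
#include <link.h>

#include <cassert>
#include <cstddef>

// Returns true if a program header table of `count` entries located at runtime
// address `loaded` lies entirely within the file-backed part of a PT_LOAD segment.
bool phdr_is_mapped(const ElfW(Phdr)* table, std::size_t count,
                    ElfW(Addr) load_bias, ElfW(Addr) loaded) {
    assert(table != nullptr);
    const ElfW(Addr) loaded_end = loaded + count * sizeof(ElfW(Phdr));
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i].p_type != PT_LOAD) continue;
        const ElfW(Addr) seg_start = table[i].p_vaddr + load_bias;
        const ElfW(Addr) seg_end = seg_start + table[i].p_filesz;  // file content only
        if (seg_start <= loaded && loaded_end <= seg_end) {
            return true;  // the whole table sits inside this segment
        }
    }
    return false;
}
```

Only when this check passes is it safe to treat the address as the loaded program header table and keep using it after the temporary copy is gone.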

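That completes the part of the linker that sets up the so's memory image. What remains is relocation, initializing global variables, calling constructors and so on. Due to limited space, those will be analyzed in a follow-up article.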

The source code is available here:
ImyLinker (https://github.com/IIIImmmyyy/ImyLinker)

Kanxue ID: IIImmmyyy

https://bbs.kanxue.com/user-home-810816.htm

*This article is a featured post on the Kanxue forum, originally written by IIImmmyyy. Please credit the Kanxue community when reposting.


Source: https://mp.weixin.qq.com/s?__biz=MjM5NTc2MDYxMw==&mid=2458550539&idx=2&sn=e3a883e6de9929783e4920b1ae75802d&chksm=b18db18186fa38971cf9a67439421e62a1c3e1dbeb2cdc974c70ab52186fe92738ed759cf003&scene=58&subscene=0#rd