Today (well, a few months ago) someone asked me this question. The requirement was a bit unusual: they wanted to run part of some dynamically delivered code in a forked child process. Since they couldn't know in advance what the delivered code does, it might crash, which is why they wanted to execute it in a child process.
Rather than starting from the system source code or the security angle, let's just write the implementation, run it, and work backwards from the exception output.
Having written dual-process anti-debugging before, we know that forking the app process and executing code there works, and a JNI call that builds a string or the like is also fine. But typical anti-debugging code is an infinite loop, which blocks the child process so it never continues any further. If you drop the loop, for example with the following code:
jstring myFork(JNIEnv* env, jobject obj) {
    pid_t pid = fork();
    const char *tmp;
    if (pid) {
        tmp = "father";
        XLOGE("father pid=%d", getpid());
    } else {
        XLOGE("chlild pid=%d", getpid());
        tmp = "chlild";
    }
    return env->NewStringUTF(tmp);
}
This defines a JNI function that forks the current process and returns a string; it is invoked from a click handler:
bt.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        String fork = fork();
        Log.e("zhuo", fork);
    }
});
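For completeness (the original post doesn't show this part), the Java side presumably declares a native String fork() method and binds it to myFork. A minimal dynamic-registration sketch; the class name com/zhuotong/myunpack/MainActivity is guessed from the log output and purely illustrative:

#include <jni.h>

// Hypothetical glue code: binds the Java-side native method `fork()` to myFork.
static JNINativeMethod gMethods[] = {
    {"fork", "()Ljava/lang/String;", (void *) myFork},
};

jint JNI_OnLoad(JavaVM *vm, void *reserved) {
    JNIEnv *env = NULL;
    if (vm->GetEnv((void **) &env, JNI_VERSION_1_6) != JNI_OK)
        return JNI_ERR;
    // Assumed class name, inferred from the package in the log below.
    jclass clazz = env->FindClass("com/zhuotong/myunpack/MainActivity");
    if (clazz == NULL || env->RegisterNatives(clazz, gMethods, 1) < 0)
        return JNI_ERR;
    return JNI_VERSION_1_6;
}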
After running it, the logs show that both processes executed; the child also returned its string and logged it, but immediately afterwards things blow up:
08-08 15:54:45.518 27638-27638/com.zhuotong.myunpack E/zhuo: father pid=27638
08-08 15:54:45.518 27638-27638/com.zhuotong.myunpack E/zhuo: father
08-08 15:54:45.518 28273-28273/? E/zhuo: chlild pid=28273
08-08 15:54:45.518 28273-28273/? E/zhuo: chlild
08-08 15:54:45.518 28273-28273/? A/Looper: Thread identity changed from 0x276b00006bf6 to 0x276b00006e71 while dispatching to android.view.ViewRootImpl$ViewRootHandler android.view.View$UnsetPressedState@42702040 what=0
08-08 15:54:45.538 28273-28273/? A/libc: Fatal signal 11 (SIGSEGV) at 0x7507f028 (code=1), thread 28273 (com.tencent.mm)
You can pin this crash down step by step by tracing the call chain; alternatively, anyone who has read the Looper implementation should recognize that Looper log line:
http://androidxref.com/4.4_r1/xref/frameworks/base/core/java/android/os/Looper.java
public static void loop() {
    final Looper me = myLooper();
    if (me == null) {
        throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
    }
    final MessageQueue queue = me.mQueue;

    // Make sure the identity of this thread is that of the local process,
    // and keep track of what that identity token actually is.
    Binder.clearCallingIdentity();
    final long ident = Binder.clearCallingIdentity();

    for (;;) {
        Message msg = queue.next(); // might block
        if (msg == null) {
            // No message indicates that the message queue is quitting.
            return;
        }

        // This must be in a local variable, in case a UI event sets the logger
        Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }

        msg.target.dispatchMessage(msg);

        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }

        // Make sure that during the course of dispatching the
        // identity of the thread wasn't corrupted.
        final long newIdent = Binder.clearCallingIdentity();
        if (ident != newIdent) {
            Log.wtf(TAG, "Thread identity changed from 0x"
                    + Long.toHexString(ident) + " to 0x"
                    + Long.toHexString(newIdent) + " while dispatching to "
                    + msg.target.getClass().getName() + " "
                    + msg.callback + " what=" + msg.what);
        }

        msg.recycle();
    }
}
Because ident != newIdent, the check fires and Log.wtf is called: http://androidxref.com/4.4_r1/xref/frameworks/base/core/java/android/util/Log.java#255
public static int wtf(String tag, String msg) {
    return wtf(LOG_ID_MAIN, tag, msg, null, false);
}

static int wtf(int logId, String tag, String msg, Throwable tr, boolean localStack) {
    TerribleFailure what = new TerribleFailure(msg, tr);
    int bytes = println_native(logId, ASSERT, tag,
            msg + '\n' + getStackTraceString(localStack ? what : tr));
    sWtfHandler.onTerribleFailure(tag, what);
    return bytes;
}

private static TerribleFailureHandler sWtfHandler = new TerribleFailureHandler() {
    public void onTerribleFailure(String tag, TerribleFailure what) {
        RuntimeInit.wtf(tag, what);
    }
};

// RuntimeInit.wtf: reports the failure to the ActivityManager, which may decide to kill us.
public static void wtf(String tag, Throwable t) {
    try {
        if (ActivityManagerNative.getDefault().handleApplicationWtf(
                mApplicationObject, tag, new ApplicationErrorReport.CrashInfo(t))) {
            // The Activity Manager has already written us off -- now exit.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    } catch (Throwable t2) {
        Slog.e(TAG, "Error reporting WTF", t2);
        Slog.e(TAG, "Original WTF:", t);
    }
}
That's the Log.wtf flow, and the fatal crash doesn't actually happen here. But the relationship is already visible: when a remote binder call comes in and invokes an interface inside our process, clearCallingIdentity is called to clear the caller's identity and save it as a long, and once the work is done restoreCallingIdentity restores it. http://androidxref.com/4.4_r1/xref/frameworks/native/libs/binder/IPCThreadState.cpp#375
int64_t IPCThreadState::clearCallingIdentity()
{
    int64_t token = ((int64_t)mCallingUid<<32) | mCallingPid;
    clearCaller();
    return token;
}

void IPCThreadState::clearCaller()
{
    mCallingPid = getpid();
    mCallingUid = getuid();
}
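As an aside, this is the standard save/restore pattern a native binder service uses around these calls. An illustrative sketch, not code from the article; restoreCallingIdentity simply takes the saved token back:

#include <cstdint>
#include <binder/IPCThreadState.h>

// Inside a binder call handler: temporarily drop the remote caller's identity
// so privileged work is attributed to our own uid/pid, then restore it.
void handleTransaction() {
    int64_t token = android::IPCThreadState::self()->clearCallingIdentity();
    // ... do work as our own uid/pid instead of the remote caller's ...
    android::IPCThreadState::self()->restoreCallingIdentity(token);
}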
So the returned long is just the uid and pid packed together. Parent pid 27638 = 0x6BF6, child pid 28273 = 0x6E71; looking back at the log above, it's exactly the pid change across fork that triggers that Looper line.
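A quick sketch to verify that reading: unpacking the two tokens from the Looper log gives the same uid but different pids (this little program is mine, purely for demonstration):

#include <cstdint>
#include <cstdio>

// Decode the identity tokens from the Looper log above: per clearCallingIdentity,
// the high 32 bits are the uid and the low 32 bits are the pid.
int main() {
    for (int64_t token : {0x276b00006bf6LL, 0x276b00006e71LL}) {
        printf("uid=%d pid=%d\n", (int32_t)(token >> 32), (int32_t)(token & 0xffffffff));
    }
    // Prints uid=10091 pid=27638 and uid=10091 pid=28273: same uid, but the pid
    // changed from parent to child, which is exactly what trips the Looper check.
    return 0;
}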
The function where the crash actually happens is readAligned: http://androidxref.com/4.4_r1/xref/frameworks/native/libs/binder/Parcel.cpp#911
template<class T>
status_t Parcel::readAligned(T *pArg) const {
    COMPILE_TIME_ASSERT_FUNCTION_SCOPE(PAD_SIZE(sizeof(T)) == sizeof(T));

    if ((mDataPos+sizeof(T)) <= mDataSize) {
        const void* data = mData+mDataPos;
        mDataPos += sizeof(T);
        *pArg = *reinterpret_cast<const T*>(data);
        return NO_ERROR;
    } else {
        return NOT_ENOUGH_DATA;
    }
}

template<class T>
T Parcel::readAligned() const {
    T result;
    if (readAligned(&result) != NO_ERROR) {
        result = 0;
    }
    return result;
}
The crash happens when executing *pArg = *reinterpret_cast<const T*>(data); the memory… (this is where the draft cuts off; presumably the point was that for an incoming transaction the Parcel's data pointer refers directly into the binder-mmapped region, which, as shown below, no longer exists in the forked child).
…I found this sitting unpublished in my drafts, but the later part was lost, and I can't be bothered to trace the code again and rewrite it, so straight to the conclusion: if the child is not blocked, it will definitely crash; if it is blocked, and you've confirmed the code you call never reaches binder, it can run.
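To make that conclusion concrete, here is a minimal sketch (my own, under the assumption that the delivered code never touches binder or the framework) of the shape that can work: the child runs the risky code, hands the result back over a pipe, and leaves with _exit() without ever returning into the framework, so no Looper or binder code runs in it.

#include <unistd.h>
#include <sys/wait.h>
#include <cstring>

// Run `risky` in a forked child; a crash only kills the child. The result
// string is passed back to the parent through a pipe.
int runIsolated(const char *(*risky)(void), char *out, size_t outlen) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    if (pid == 0) {                      // child
        close(fds[0]);
        const char *r = risky();         // may crash; only the child dies
        write(fds[1], r, strlen(r));
        _exit(0);                        // _exit, not exit: skip atexit handlers
    }
    close(fds[1]);                       // parent
    ssize_t n = read(fds[0], out, outlen - 1);
    out[n > 0 ? n : 0] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);               // reap; status also reveals a child crash
    return 0;
}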
Addendum: from memory, it goes roughly like this. After zygote finishes forking, the new process runs onZygoteInit(), which starts the binder thread pool.
virtual void onZygoteInit()
{
    // Re-enable tracing now that we're no longer in Zygote.
    atrace_set_tracing_enabled(true);

    sp<ProcessState> proc = ProcessState::self();
    ALOGV("App process: starting thread pool.\n");
    proc->startThreadPool();
}

ProcessState::ProcessState()
    : mDriverFD(open_driver())
    , mVMStart(MAP_FAILED)
    , mManagesContexts(false)
    , mBinderContextCheckFunc(NULL)
    , mBinderContextUserData(NULL)
    , mThreadPoolStarted(false)
    , mThreadPoolSeq(1)
{
    if (mDriverFD >= 0) {
        // XXX Ideally, there should be a specific define for whether we
        // have mmap (or whether we could possibly have the kernel module
        // availabla).
#if !defined(HAVE_WIN32_IPC)
        // mmap the binder, providing a chunk of virtual address space to receive transactions.
        mVMStart = mmap(0, BINDER_VM_SIZE, PROT_READ, MAP_PRIVATE | MAP_NORESERVE, mDriverFD, 0);
        if (mVMStart == MAP_FAILED) {
            // *sigh*
            ALOGE("Using /dev/binder failed: unable to mmap transaction memory.\n");
            close(mDriverFD);
            mDriverFD = -1;
        }
#else
        mDriverFD = -1;
#endif
    }

    LOG_ALWAYS_FATAL_IF(mDriverFD < 0, "Binder driver could not be opened. Terminating.");
}
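One detail worth remembering here: fork() duplicates only the calling thread, so the binder thread pool started above never exists in our manually forked child. A minimal demonstration, my own sketch with nothing Android-specific in it:

#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Stands in for a binder pool thread: alive in the parent, gone after fork().
static void *worker(void *) {
    for (;;) pause();
    return NULL;
}

int main() {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);

    pid_t pid = fork();
    if (pid == 0) {
        // The child is single-threaded again: even if the binder fd and mapping
        // were still usable, no thread is left to pump /dev/binder.
        printf("child %d runs with a single thread\n", getpid());
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}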
It opens the /dev/binder driver device, then uses mmap() to map a chunk of kernel address space, storing the binder fd in the ProcessState member mDriverFD. The mmap call lands in the kernel and triggers binder_mmap: kernel/drivers/android/binder.c
static int binder_mmap(struct file *filp, struct vm_area_struct *vma /* describes the user-space virtual address range, 0~3G */)
{
    int ret;
    /* describes a contiguous kernel virtual address range; on 32-bit it lies between 3G+896M+8M and 4G */
    struct vm_struct *area;
    struct binder_proc *proc = filp->private_data;
    const char *failure_string;
    struct binder_buffer *buffer;

    if (proc->tsk != current)
        return -EINVAL;

    /* the requested size may not exceed 4M; clamp it to 4M if larger */
    if ((vma->vm_end - vma->vm_start) > SZ_4M)
        vma->vm_end = vma->vm_start + SZ_4M;

    binder_debug(BINDER_DEBUG_OPEN_CLOSE,
                 "binder_mmap: %d %lx-%lx (%ld K) vma %lx pagep %lx\n",
                 proc->pid, vma->vm_start, vma->vm_end,
                 (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
                 (unsigned long)pgprot_val(vma->vm_page_prot));

    /* check whether the vma (a contiguous user-space virtual address range) carries forbidden flags */
    if (vma->vm_flags & FORBIDDEN_MMAP_FLAGS) {
        ret = -EPERM;
        failure_string = "bad vm_flags";
        goto err_bad_arg;
    }
    /* set VM_DONTCOPY, clear VM_MAYWRITE */
    vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;

    /* take the binder_mmap_lock mutex: the proc structure is about to be modified and may be contended */
    mutex_lock(&binder_mmap_lock);
    /* a process may only mmap once; to map again it must unmap the previous mapping first */
    if (proc->buffer) {
        ret = -EBUSY;
        failure_string = "already mapped";
        goto err_already_mapped;
    }

    /* reserve a contiguous kernel virtual address range of the same size as the user-space range;
     * note the virtual range is reserved once here, while physical pages are allocated and mapped on demand */
    area = get_vm_area(vma->vm_end - vma->vm_start, VM_IOREMAP);
    if (area == NULL) {
        ret = -ENOMEM;
        failure_string = "get_vm_area";
        goto err_get_vm_area_failed;
    }
    /* record the kernel virtual address in proc->buffer */
    proc->buffer = area->addr;
    /* record the offset between the user-space and kernel-space virtual addresses,
     * so the user-space address can be computed from buffer plus user_buffer_offset */
    proc->user_buffer_offset = vma->vm_start - (uintptr_t)proc->buffer;
    /* release the mutex */
    mutex_unlock(&binder_mmap_lock);

#ifdef CONFIG_CPU_CACHE_VIPT
    /* Is the CPU cache VIPT (Virtual Index Physical Tag), i.e. indexed by virtual address and
     * tagged by physical address? Not relevant here; if interested see
     * https://blog.csdn.net/Q_AN1314/article/details/78980191 */
    if (cache_is_vipt_aliasing()) {
        while (CACHE_COLOUR((vma->vm_start ^ (uint32_t)proc->buffer))) {
            pr_info("binder_mmap: %d %lx-%lx maps %p bad alignment\n",
                    proc->pid, vma->vm_start, vma->vm_end, proc->buffer);
            vma->vm_start += PAGE_SIZE;
        }
    }
#endif

    /* allocate the array that holds the physical page addresses */
    proc->pages = kzalloc(sizeof(proc->pages[0]) * ((vma->vm_end - vma->vm_start) / PAGE_SIZE), GFP_KERNEL);
    if (proc->pages == NULL) {
        ret = -ENOMEM;
        failure_string = "alloc page array";
        goto err_alloc_pages_failed;
    }
    /* record the size of the virtual address range in proc->buffer_size */
    proc->buffer_size = vma->vm_end - vma->vm_start;

    /* install the vma operations: open, close, fault
     * open  -> binder_vma_open:  just logs pid, address range, size and flags (vm_flags and vm_page_prot)
     * close -> binder_vma_close: clears proc->vma and proc->vma_vm_mm and queues proc on
     *                            binder_deferred_workqueue, which a dedicated binder thread drains
     * fault -> binder_vm_fault:  simply returns VM_FAULT_SIGBUS */
    vma->vm_ops = &binder_vm_ops;
    /* stash the proc pointer in vma->vm_private_data */
    vma->vm_private_data = proc;

    /* allocate one physical page first, and map it into both the kernel range and the user-space range */
    if (binder_update_page_range(proc, 1, proc->buffer, proc->buffer + PAGE_SIZE, vma)) {
        ret = -ENOMEM;
        failure_string = "alloc small buf";
        goto err_alloc_small_buf_failed;
    }
    /* with the page allocated and mapped, the start of the kernel range becomes the first binder_buffer */
    buffer = proc->buffer;
    /* link the kernel virtual memory into proc's buffers and free-buffers lists, with the free flag set to 1 */
    INIT_LIST_HEAD(&proc->buffers);
    list_add(&buffer->entry, &proc->buffers);
    buffer->free = 1;
    binder_insert_free_buffer(proc, buffer);
    /* async transactions may only use half of the whole address range */
    proc->free_async_space = proc->buffer_size / 2;
    barrier();
    proc->files = get_files_struct(current);
    proc->vma = vma;
    proc->vma_vm_mm = vma->vm_mm; /* vma->vm_mm: the mm_struct describing the process's virtual address space; one per process */

    /*pr_info("binder_mmap: %d %lx-%lx maps %p\n",
             proc->pid, vma->vm_start, vma->vm_end, proc->buffer);*/
    return 0;

/* error handling */
err_alloc_small_buf_failed:
    kfree(proc->pages);
    proc->pages = NULL;
err_alloc_pages_failed:
    mutex_lock(&binder_mmap_lock);
    vfree(proc->buffer);
    proc->buffer = NULL;
err_get_vm_area_failed:
err_already_mapped:
    mutex_unlock(&binder_mmap_lock);
err_bad_arg:
    pr_err("binder_mmap: %d %lx-%lx %s failed %d\n",
           proc->pid, vma->vm_start, vma->vm_end, failure_string, ret);
    return ret;
}
vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; is the key line: because the mapping is marked do-not-copy, the child created by fork simply doesn't get this memory region at all…
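You can reproduce exactly this effect from user space: madvise(MADV_DONTFORK) sets VM_DONTCOPY on a mapping, and the forked child then faults on access just like our child touching the binder region (my own sketch, not from the article):

#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// madvise(MADV_DONTFORK) sets VM_DONTCOPY on the mapping, so the child of
// fork() does not inherit it and faults on access.
int main() {
    char *p = (char *)mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;
    madvise(p, 4096, MADV_DONTFORK);
    p[0] = 42;                            // fine in the parent
    pid_t pid = fork();
    if (pid == 0) {
        printf("child reads %d\n", p[0]); // SIGSEGV: the mapping was not copied
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    printf("child %s\n", WIFSIGNALED(status) ? "crashed with SIGSEGV" : "exited");
    return 0;
}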
That's roughly the cause. So perhaps, after forking the child, one could reopen /dev/binder, re-establish the binder connection to system_server, and swap out the binder proxies bound at the Java layer? Might be worth a try; it feels like it could work.