指点成金-最美分享吧

登录

Android之dropbox 分析

佚名 举报

篇首语:本文由小编为大家整理,主要介绍了Android之dropbox 分析相关的知识,希望对你有一定的参考价值。

参考技术A 简介

adb查询

app接口

dropbox启动

dropbox日志路径:/data/system/dropbox

记录的系统错误
1.系统正常启动后的自检工作
1)SYSTEM_BOOT
开机一次,记录一次

2)SYSTEM_RESTART
如果system_server在设备运行过程中异常,则会有记录

3)SYSTEM_LAST_KMSG
kernel异常。
pstore是persistent storage的缩写,内核发生异常通过此把异常日志记录下来,方便定位问题。
ramoops指的是采用ram保存oops信息(kernel 异常信息)的一个功能,利用pstore技术实现。

4)SYSTEM_TOMBSTONE
TOMBSTONE 是 android 用来记录 native 进程崩溃的 core dump 日志, 系统服务在启动完成后会增加一个 Observer 来侦测 tombstone 日志文件的变化, 每当生成新的 tombstone 文件, 就会增加一条 SYSTEM_TOMBSTONE 记录到 DropBoxManager 中.

5)SYSTEM_RECOVERY_LOG/SYSTEM_RECOVERY_KMSG
SYSTEM_RECOVERY_KMSG:recovery kerenl日志
SYSTEM_RECOVERY_LOG:recovery 升级或恢复出厂设置等等日志

6)SYSTEM_FSCK
文件系统完整性校验日志

7)SYSTEM_AUDIT
kernel 异常信息的查漏补缺日志

2.java/native crash。-- crash/native_crash
java/native层异常的区分在于eventType:crash/native_crash

3.anr 异常。-- anr
这里涉及广播、Service、Provider等组件的anr以及触摸按键事件的anr

4.wtf(What a Terrible Failure)。--- wtf
android.util.Log.wtf(String, String),应用可调用布局异常点

5.strict mode。---**_strictmode
严格模式,主要为性能监测使用
StrictMode (严格模式), 顾名思义, 就是在比正常模式检测得更严格, 通常用来监测不应当在主线程执行的网络, 文件等操作. 任何 StrictMode 违例都会被 ActivityManagerService 在 DropBoxManager 中记录为一次 strict_mode 违例.

6.lowmem。低内存报告

7.watchdog
如果 WatchDog 监测到系统进程(system_server)出现问题, 会增加一条 watchdog 记录到 DropBoxManager 中, 并终止系统进程的执行.

8.其他
1)netstats_error/netstats_dump
NetworkStatsService 负责收集并持久化存储网络状态的统计数据, 当遇到明显的网络状态错误时, 它会增加一条 netstats_error 记录到 DropBoxManager.

2)BATTERY_DISCHARGE_INFO
BatteryService 负责检测充电状态, 并更新手机电池信息. 当遇到明显的 discharge 事件, 它会增加一条 BATTERY_DISCHARGE_INFO 记录到 DropBoxManager.

3)storage_benchmark/storage_trim
StorageManagerService 负责存储设备管理,例如sdcard或usb mass storage
fstrim提升磁盘性能,缓解Android卡顿

4)network_watchlist_report
NetworkWatchlistService

5)incident
frameworks/base/cmds/incidentd

6)keymaster
system/security/keystore

参考学习

Android ANR log trace日志文件分析

Trace文件怎么获取?

traces.txt只保留最后一次ANR的信息,Android系统有个DropBox功能功能,它能记录系统出现的crash错误.因此保留有发生过的ANR的信息.(log路径:/data/system/dropbox)

获取系统crash log: adb shell dumpsys dropbox --print >>log.txt

系统生成的Trace文件保存在data/anr,可以用过命令adb pull data/anr/取出

Trace文件怎么生成的?

当APP(包括系统APP和用户APP)进程出现ANR、应用响应慢或WatchDog的监视没有得到回馈时,系统会dump此时的top进程,进程中Thread的运行状态就都dump到这个Trace文件中了.

导致ANR的常见几种情况:
  • KeyDispatchTimeout(5s): 按键或触摸事件在特定时间内无法处理完成
  • BroadcastTimeout(10s): 广播在特定时间内无法处理完成
  • ServiceTimeout(20s): Service在特定的时间无法处理完成 另外还有ProviderTimeout和WatchDog等导致的ANR

应用ANR产生的时候,ActivityManagerService的appNotResponding方法就会被调用,然后在/data/anr/traces.txt文件中写入ANR相关信息.

final void appNotResponding(ProcessRecord app, ActivityRecord activity,            ActivityRecord parent, boolean aboveSystem, final String annotation) {        ArrayList firstPids = new ArrayList(5);        SparseArray lastPids = new SparseArray(20);         if (mController != null) {            try {                // 0 == continue, -1 = kill process immediately                int res = mController.appEarlyNotResponding(app.processName, app.pid, annotation);                if (res < 0 && app.pid != MY_PID) {                    app.kill("anr", true);                }            } catch (RemoteException e) {                mController = null;                Watchdog.getInstance().setActivityController(null);            }        }         long anrTime = SystemClock.uptimeMillis();        if (MONITOR_CPU_USAGE) {            updateCpuStatsNow();    // 更新CPU使用率        }         synchronized (this) {            // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.            if (mShuttingDown) {                Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);                return;            } else if (app.notResponding) {                Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);                return;            } else if (app.crashing) {                Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);                return;            }             // In case we come through here for the same app before completing            // this one, mark as anring now so we will bail out.            app.notResponding = true;             // Log the ANR to the event log.            EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,                    app.processName, app.info.flags, annotation);             // Dump thread traces as quickly as we can, starting with "interesting" processes.            firstPids.add(app.pid);             int parentPid = app.pid;            if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid;            if (parentPid != app.pid) firstPids.add(parentPid);             if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);             for (int i = mLruProcesses.size() - 1; i >= 0; i--) {                ProcessRecord r = mLruProcesses.get(i);                if (r != null && r.thread != null) {                    int pid = r.pid;                    if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {                        if (r.persistent) {                            firstPids.add(pid);                        } else {                            lastPids.put(pid, Boolean.TRUE);                        }                    }                }            }        }         // Log the ANR to the main log.        StringBuilder info = new StringBuilder();        info.setLength(0);        info.append("ANR in ").append(app.processName);        if (activity != null && activity.shortComponentName != null) {            info.append(" (").append(activity.shortComponentName).append(")");        }        info.append("");        info.append("PID: ").append(app.pid).append("");        if (annotation != null) {            info.append("Reason: ").append(annotation).append("");        }        if (parent != null && parent != activity) {            info.append("Parent: ").append(parent.shortComponentName).append("");        }         final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);        // dumpStackTraces是输出traces文件的函数        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,                NATIVE_STACKS_OF_INTEREST);         String cpuInfo = null;        if (MONITOR_CPU_USAGE) {            updateCpuStatsNow();    // 再次更新CPU信息            synchronized (mProcessCpuTracker) {                // 输出ANR发生前一段时间内的CPU使用率                cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);            }            info.append(processCpuTracker.printCurrentLoad());            info.append(cpuInfo);        }        // 输出ANR发生后一段时间内的CPU使用率        info.append(processCpuTracker.printCurrentState(anrTime));         Slog.e(TAG, info.toString());        if (tracesFile == null) {            // There is no trace file, so dump (only) the alleged culprit"s threads to the log            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);        }        // 将ANR信息同时输出到DropBox中        addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,                cpuInfo, tracesFile, null);         if (mController != null) {            try {                // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately                int res = mController.appNotResponding(app.processName, app.pid, info.toString());                if (res != 0) {                    if (res < 0 && app.pid != MY_PID) {                        app.kill("anr", true);                    } else {                        synchronized (this) {                            mServices.scheduleServiceTimeoutLocked(app);                        }                    }                    return;                }            } catch (RemoteException e) {                mController = null;                Watchdog.getInstance().setActivityController(null);            }        }         // Unless configured otherwise, swallow ANRs in background processes & kill the process.        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;         synchronized (this) {            mBatteryStatsService.noteProcessAnr(app.processName, app.uid);             if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {                app.kill("bg anr", true);                return;            }             // Set the app"s notResponding state, and look up the errorReportReceiver            makeAppNotRespondingLocked(app,                    activity != null ? activity.shortComponentName : null,                    annotation != null ? "ANR " + annotation : "ANR",                    info.toString());             //Set the trace file name to app name + current date format to avoid overrinding trace file            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);            if (tracesPath != null && tracesPath.length() != 0) {                File traceRenameFile = new File(tracesPath);                String newTracesPath;                int lpos = tracesPath.lastIndexOf (".");                if (-1 != lpos)                    newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);                else                    newTracesPath = tracesPath + "_" + app.processName;                 traceRenameFile.renameTo(new File(newTracesPath));            }            // 显示ANR提示对话框            // Bring up the infamous App Not Responding dialog            Message msg = Message.obtain();            HashMap map = new HashMap();            msg.what = SHOW_NOT_RESPONDING_MSG;            msg.obj = map;            msg.arg1 = aboveSystem ? 1 : 0;            map.put("app", app);            if (activity != null) {                map.put("activity", activity);            }             mUiHandler.sendMessage(msg);        }    }

 

避免ANR?

  1. UI线程尽量只做跟UI相关的工作

  2. 耗时的工作(比如数据库操作,I/O,连接网络或者别的有可能阻碍UI线程的操作)把它放入单独的线程处理

  3. 尽量用Handler来处理UIthread和别的thread之间的交互

分析ANR的Log:

出现ANR的log如下:

06-22 10:37:46.204  3547  3604 E ActivityManager: ANR in org.code:MessengerService    // ANR出现的进程包名 E ActivityManager: PID: 17027        // ANR进程ID E ActivityManager: Reason: executing service org.code/.ipc.MessengerService    //导致ANR的原因 E ActivityManager: Load: 8.31 / 8.12 / 8.47 E ActivityManager: CPU usage from 0ms to 6462ms later:    //CPU在ANR发生后的使用情况 E ActivityManager:   61% 3547/system_server: 21% user + 39% kernel / faults: 13302 minor 2 major E ActivityManager:   0.2% 475/debuggerd: 0% user + 0.1% kernel / faults: 6086 minor 1 major E ActivityManager:   10% 5742/com.android.phone: 5.1% user + 5.1% kernel / faults: 7597 minor E ActivityManager:   6.9% 5342/com.android.systemui: 2.1% user + 4.8% kernel / faults: 4158 minor E ActivityManager:   0.1% 477/debuggerd64: 0% user + 0.1% kernel / faults: 4013 minor E ActivityManager:   5.7% 16313/org.code: 3.2% user + 2.4% kernel / faults: 2412 minor E ActivityManager:   3.7% 17027/org.code:MessengerService: 1.7% user + 2% kernel / faults: 2571 minor 6 major E ActivityManager:   2.6% 306/cfinteractive: 0% user + 2.6% kernel                    ... ...E ActivityManager:  +0% 17168/kworker/0:1: 0% user + 0% kernel E ActivityManager: 0% TOTAL: 0% user + 0% kernel + 0% softirq    // CUP占用情况 E ActivityManager: CPU usage from 5628ms to 6183ms later: E ActivityManager:   42% 3547/system_server: 17% user + 24% kernel / faults: 11 minor E ActivityManager:     12% 3604/ActivityManager: 1.7% user + 10% kernel E ActivityManager:     12% 3609/android.display: 8.7% user + 3.5% kernel E ActivityManager:     3.5% 5304/Binder_6: 1.7% user + 1.7% kernel E ActivityManager:     3.5% 5721/Binder_A: 1.7% user + 1.7% kernel E ActivityManager:     3.5% 5746/Binder_C: 3.5% user + 0% kernel E ActivityManager:     1.7% 3599/Binder_1: 0% user + 1.7% kernel E ActivityManager:     1.7% 3600/Binder_2: 0% user + 1.7% kernel I ActivityManager: Killing 17027:org.code:MessengerService/u0a85 (adj 1): bg anr I art     : Wrote stack traces to "/data/anr/traces.txt"    //art这个TAG打印对traces操作的信息 D GraphicsStats: Buffer count: 9 W ActivityManager: Scheduling restart of crashed service org.code/.ipc.MessengerService in 1000ms 

 

log打印了ANR的基本信息,我们可以分析CPU使用率得知ANR的简单情况;如果CPU使用率很高,接近100%,可能是在进行大规模的计算更可能是陷入死循环;如果CUP使用率很低,说明主线程被阻塞了,并且当IOwait很高,可能是主线程在等待I/O操作的完成.

对于ANR只是分析Log很难知道问题所在,我们还需要通过Trace文件分析stack调用情况.

 ----- pid 17027 at 2017-06-22 10:37:39 -----    // ANR出现的进程pid和时间(和上述log中pid一致)Cmd line: org.code:MessengerService                // ANR出现的进程名Build fingerprint: "Homecare/qucii8976v3_64:6.0.1/pansen06141150:eng/test-keys"        // 下面记录系统版本,内存等状态信息ABI: "arm64"Build type: optimizedZygote loaded classes=6576 post zygote classes=13Intern table: 13780 strong; 17 weakJNI: CheckJNI is on; globals=283 (plus 158 weak)Libraries: /system/lib64/libandroid.so /system/lib64/libcompiler_rt.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libwebviewchromium_loader.so libjavacore.so (7)Heap: 29% free, 8MB/12MB; 75731 objectsDumping cumulative Gc timingsTotal number of allocations 75731Total bytes allocated 8MBTotal bytes freed 0BFree memory 3MBFree memory until GC 3MBFree memory until OOME 183MBTotal memory 12MBMax memory 192MBZygote space size 3MBTotal mutator paused time: 0Total time waiting for GC to complete: 0Total GC count: 0Total GC time: 0Total blocking GC count: 0Total blocking GC time: 0 suspend all histogram:    Sum: 76us 99% C.I. 0.100us-28us Avg: 7.600us Max: 28usDALVIK THREADS (15):// Signal Catcher是记录traces信息的线程// Signal Catche(线程名)、(daemon)表示守护进程、prio(线程优先级,默认是5)、tid(线程唯一标识ID)、Runnable(线程当前状态)"Signal Catcher" daemon prio=5 tid=3 Runnable//线程组名称、suspendCount、debugSuspendCount、线程的Java对象地址、线程的Native对象地址  | group="system" sCount=0 dsCount=0 obj=0x12d8f0a0 self=0x5598ae55d0  //sysTid是线程号(主线程的线程号和进程号相同)  | sysTid=17033 nice=0 cgrp=default sched=0/0 handle=0x7fb2350450  | state=R schedstat=( 4348125 172343 3 ) utm=0 stm=0 core=1 HZ=100  | stack=0x7fb2254000-0x7fb2256000 stackSize=1013KB  | held mutexes= "mutator lock"(shared held)  native: #00 pc 0000000000489e28  /system/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, char const*, art::ArtMethod*, void*)+236)  native: #01 pc 0000000000458fe8  /system/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+220)  native: #02 pc 0000000000465bc8  /system/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+688)  native: #03 pc 0000000000466ae0  /system/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+276)  native: #04 pc 000000000046719c  /system/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+188)  native: #05 pc 0000000000467a84  /system/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+492)  native: #06 pc 0000000000431194  /system/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+96)  native: #07 pc 000000000043e604  /system/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1256)  native: #08 pc 000000000043f214  /system/lib64/libart.so (art::SignalCatcher::Run(void*)+452)  native: #09 pc 0000000000068714  /system/lib64/libc.so (__pthread_start(void*)+52)  native: #10 pc 000000000001c604  /system/lib64/libc.so (__start_thread+16)  (no managed stack frames) //main(线程名)、prio(线程优先级,默认是5)、tid(线程唯一标识ID)、Sleeping(线程当前状态)"main" prio=5 tid=1 Sleeping  | group="main" sCount=1 dsCount=0 obj=0x73132d10 self=0x5598a5f5e0  //sysTid是线程号(主线程的线程号和进程号相同)  | sysTid=17027 nice=0 cgrp=default sched=0/0 handle=0x7fb6db6fe8  | state=S schedstat=( 420582038 5862546 143 ) utm=24 stm=18 core=6 HZ=100  | stack=0x7fefba3000-0x7fefba5000 stackSize=8MB  | held mutexes=  // java 堆栈调用信息(这里可查看导致ANR的代码调用流程)(分析ANR最重要的信息)  at java.lang.Thread.sleep!(Native method)  - sleeping on <0x0c60f3c7> (a java.lang.Object)  at java.lang.Thread.sleep(Thread.java:1031)  - locked <0x0c60f3c7> (a java.lang.Object)    // 锁住对象0x0c60f3c7  at java.lang.Thread.sleep(Thread.java:985)  at android.os.SystemClock.sleep(SystemClock.java:120)  at org.code.ipc.MessengerService.onCreate(MessengerService.java:63)    //导致ANR的代码  at android.app.ActivityThread.handleCreateService(ActivityThread.java:2877)  at android.app.ActivityThread.access$1900(ActivityThread.java:150)  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1427)  at android.os.Handler.dispatchMessage(Handler.java:102)  at android.os.Looper.loop(Looper.java:148)  at android.app.ActivityThread.main(ActivityThread.java:5417)  at java.lang.reflect.Method.invoke!(Native method)  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)

 

在log中显示的pid在traces文件中与之对应,然后通过查看堆栈调用信息分析ANR的代码.

上述ANR实际上在org.code.ipc.MessengerService.onCreate中63行调用SystemClock.sleep(10000)代码导致的;这是发生在Java层的,如果在native层,那么堆栈中就不会有相关调用的路径,这种情况只能在native层添加更多的Log,一步步来查找了


线程状态的分类: java中定义的线程状态:

    // libcore/libart/src/main/java/java/lang/Thread.java    /**     * A representation of a thread"s state. A given thread may only be in one     * state at a time.     */    public enum State {        /**         * The thread has been created, but has never been started.         */        NEW,        /**         * The thread may be run.         */        RUNNABLE,        /**         * The thread is blocked and waiting for a lock.         */        BLOCKED,        /**         * The thread is waiting.         */        WAITING,        /**         * The thread is waiting for a specified amount of time.         */        TIMED_WAITING,        /**         * The thread has been terminated.         */        TERMINATED    }

 

C代码中定义的线程状态:

// /art/runtime/thread_state.henum ThreadState {  //                                   Thread.State   JDWP state  kTerminated = 66,                 // TERMINATED     TS_ZOMBIE    Thread.run has returned, but Thread* still around  kRunnable,                        // RUNNABLE       TS_RUNNING   runnable  kTimedWaiting,                    // TIMED_WAITING  TS_WAIT      in Object.wait() with a timeout  kSleeping,                        // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()  kBlocked,                         // BLOCKED        TS_MONITOR   blocked on a monitor  kWaiting,                         // WAITING        TS_WAIT      in Object.wait()  kWaitingForGcToComplete,          // WAITING        TS_WAIT      blocked waiting for GC  kWaitingForCheckPointsToRun,      // WAITING        TS_WAIT      GC waiting for checkpoints to run  kWaitingPerformingGc,             // WAITING        TS_WAIT      performing GC  kWaitingForDebuggerSend,          // WAITING        TS_WAIT      blocked waiting for events to be sent  kWaitingForDebuggerToAttach,      // WAITING        TS_WAIT      blocked waiting for debugger to attach  kWaitingInMainDebuggerLoop,       // WAITING        TS_WAIT      blocking/reading/processing debugger events  kWaitingForDebuggerSuspension,    // WAITING        TS_WAIT      waiting for debugger suspend all  kWaitingForJniOnLoad,             // WAITING        TS_WAIT      waiting for execution of dlopen and JNI on load code  kWaitingForSignalCatcherOutput,   // WAITING        TS_WAIT      waiting for signal catcher IO to complete  kWaitingInMainSignalCatcherLoop,  // WAITING        TS_WAIT      blocking/reading/processing signals  kWaitingForDeoptimization,        // WAITING        TS_WAIT      waiting for deoptimization suspend all  kWaitingForMethodTracingStart,    // WAITING        TS_WAIT      waiting for method tracing to start  kWaitingForVisitObjects,          // WAITING        TS_WAIT      waiting for visiting objects  kWaitingForGetObjectsAllocated,   // WAITING        TS_WAIT      waiting for getting the number of allocated objects  kStarting,                        // NEW            TS_WAIT      native thread started, not yet ready to run managed code  kNative,                          // RUNNABLE       TS_RUNNING   running in a JNI native method  kSuspended,                       // RUNNABLE       TS_RUNNING   suspended by GC or debugger};

 

两者可以看出在C代码中定义更为详细,Traces中显示的线程状态都是C代码定义的.我们可以通过查看线程状态对应的信息分析ANR问题.

如: TimedWaiting对应的线程状态是TIMED_WAITING

kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout 执行了无超时参数的wait函数

kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep() 执行了带有超时参数的sleep函数

ZOMBIE                              线程死亡,终止运行RUNNING/RUNNABLE                    线程可运行或正在运行TIMED_WAIT                          执行了带有超时参数的wait、sleep或join函数MONITOR                             线程阻塞,等待获取对象锁WAIT                                执行了无超时参数的wait函数INITIALIZING                        新建,正在初始化,为其分配资源STARTING                            新建,正在启动NATIVE                              正在执行JNI本地函数VMWAIT                              正在等待VM资源SUSPENDED                           线程暂停,通常是由于GC或debug被暂停

 

以上是关于Android之dropbox 分析的主要内容,如果未能解决你的问题,请参考以下文章