In late 2019, while working at HashCube on mobile puzzle games, we encountered one of those frustrating production crashes that every mobile engineer dreads: random, hard-to-reproduce native crashes that only seemed to happen in the wild.
What started as a mysterious JSAbortIfWrongThread crash turned into a fascinating investigation that revealed fundamental gaps in how Cocos2d-x handles Android lifecycle events.
The Problem
Our mobile game, built with Cocos2d-x using JavaScript bindings, was experiencing native crashes in production. The symptoms were maddening:
- Random occurrence: No clear reproduction steps
- Device pattern: Primarily older Android devices
- Crash signature: JSAbortIfWrongThread deep inside the JS engine
- Stack trace: Minimal useful context in Crashlytics
- Frequency: Steady reports in user logs but impossible to reproduce locally
The crash appeared to be coming from somewhere deep in the JavaScript engine, which initially led us down the wrong path entirely.
Initial Debugging Attempts
Crashlytics Wasn't Enough
Our first instinct was to rely on Crashlytics for insights. Unfortunately, the native crash reports were frustratingly vague:
Fatal Exception: JSAbortIfWrongThread
at [native code]
Multiple attempts to reproduce the crash locally failed. We initially suspected rendering issues, which led to optimization work that, while beneficial for performance, didn't solve the actual problem.
The Bugsnag Breakthrough
My teammate Ashish suggested integrating Bugsnag to get richer event context. This decision proved crucial. Bugsnag's breadcrumb feature started showing us patterns that Crashlytics missed.
The key insight came from repeatedly seeing this in the logs just before crashes:
onActivitySaveInstanceState
This breadcrumb appeared consistently before crashes, often without any user input. Something was triggering the Android activity lifecycle without user interaction.
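To make the idea of breadcrumbs concrete, here is a minimal sketch of a bounded event trail: it keeps only the most recent lifecycle events so they can be attached to a crash report. This is not Bugsnag's API; the class name BreadcrumbTrail and its methods are hypothetical, and the real SDK does considerably more (timestamps, metadata, persistence).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a crash reporter's breadcrumb buffer (not Bugsnag's API). */
class BreadcrumbTrail {
    private final int capacity;
    private final Deque<String> events = new ArrayDeque<>();

    BreadcrumbTrail(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records an event, discarding the oldest one once the buffer is full. */
    synchronized void leave(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    /** Returns a copy of the retained events, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}
```

With a trail like this fed from the activity lifecycle callbacks, a crash report would carry the last few events with it, which is how a recurring onActivitySaveInstanceState entry right before each crash becomes visible.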
Following the Trail
The Background Pattern
Digging deeper into the logs, we noticed a suspicious pattern:
- App goes into background
- onActivitySaveInstanceState is called
- App comes back to foreground
- Crash occurs shortly after
The critical detail: users weren't manually switching apps or restarting. Something else was causing the activity to restart.
Developer Settings Revelation
The breakthrough came when I enabled "Limit background processes" in Android Developer Options and set it to 1 process maximum.
Boom. Instant reproduction.
This setting forces Android to aggressively kill background apps to reclaim memory. When our game returned to the foreground, Android would recreate the activity, but Cocos2d-x wasn't properly handling this lifecycle transition.
Comparing with the Original Game
Our game had been ported from Game Closure's DevKit to Cocos2d-x-js, so we had a perfect reference point. We tested the original Game Closure version under the same conditions:
- Set process limit to 1
- Background the game
- Force Android to kill it
- Resume the game
The Game Closure version handled this perfectly - it cleaned up properly and restarted fresh. We also checked other apps and games - they all either cleaned up properly or restarted completely. Our Cocos2d-x-js port was the outlier, trying to resurrect from an invalid state.
The Root Cause
Android's Silent App Killing
Here's what was actually happening:
- User backgrounds the app: Normal behavior
- Android kills the process: Silently, due to memory pressure
- User returns to app: Android recreates the activity
- Cocos2d-x revival: Engine attempts to restore state
- Native crash: JS engine and OpenGL context are in inconsistent states
The JSAbortIfWrongThread error was just a symptom. The real issue was that Cocos2d-x wasn't designed to handle process death and revival gracefully when using JavaScript bindings.
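The failure mode can be illustrated with a toy model in plain Java. It rests on an assumption suggested only by the name of the crash: that the JS engine is bound to the thread that created it, and that a recreated activity ends up calling into the old engine from a new render thread. ToyJsEngine and ToyProcess are hypothetical names; nothing here is Cocos2d-x or SpiderMonkey code.

```java
/** Toy stand-in for a JS runtime that may only be touched by the thread that created it. */
class ToyJsEngine {
    private final Thread owner = Thread.currentThread();

    String evaluate(String script) {
        if (Thread.currentThread() != owner) {
            // Plays the role of the engine's wrong-thread abort.
            throw new IllegalStateException("abort: engine used from wrong thread");
        }
        return "ok: " + script;
    }
}

/** Toy stand-in for an app process whose engine state outlives any single activity. */
class ToyProcess {
    private ToyJsEngine engine; // created once, then reused by every later activity

    /** Simulates one activity instance: start a fresh render thread and run a script on it. */
    String runActivity(String script) {
        String[] outcome = new String[1];
        Thread renderThread = new Thread(() -> {
            try {
                if (engine == null) {
                    engine = new ToyJsEngine(); // first activity: engine is bound to this thread
                }
                outcome[0] = engine.evaluate(script);
            } catch (IllegalStateException e) {
                outcome[0] = e.getMessage();
            }
        });
        renderThread.start();
        try {
            renderThread.join(); // wait so the outcome is visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for render thread", e);
        }
        return outcome[0];
    }
}
```

Calling runActivity twice on the same ToyProcess succeeds the first time and hits the wrong-thread check the second time, because the second "activity" brings a new thread but inherits the old engine. That is the shape of the problem: leftover engine state meeting a freshly created activity.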
Community Confirmation
Our investigation led us to critical community resources that confirmed the broader pattern:
GitHub Issue #20466: "Android killed activity may cause JS engine crash"
This issue was opened by the maintainer after I reported the problem. It documented the crash in the JavaScript branch of Cocos2d-x v3.17.1:
- Error: malformed UTF-8 character sequence at offset 0
- Crash signature: js_abortifwrongthread (MOZ_CRASH)
- Root cause: JS engine fails to handle activity recreation properly
- Quick reproduction: Enable "Don't keep activities" in Developer Options
The issue helped document the problem for others facing the same crash.
Forum Discussion: Crash on device language change
Hatim, who later implemented our fix, had reported in this forum thread that the crash wasn't limited to memory pressure scenarios:
- Configuration changes (like language) also trigger activity recreation
- Same crash occurs when returning to game after language change
- Affects any touch event after activity recreation
- Confirmed on Android 5.x through 9
- Only affects JavaScript bindings, not C++ projects
The Common Thread: Whether Android kills your activity due to memory pressure or configuration changes, the result is the same – Cocos2d-x's JavaScript engine can't handle the activity recreation, leading to native crashes when the engine tries to execute JS code in an invalid state.
The Solution
Initial Workaround Ideas
My first instincts were to work around the issue rather than fix it:
- Full app restart using libraries like ProcessPhoenix
- Manifest changes with android:launchMode="singleTask"
- State detection to trigger restart in onCreate
While these might have masked the problem, they weren't addressing the root cause.
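For reference, the state-detection idea could have looked roughly like the sketch below, written as plain Java so it runs without the Android SDK. It assumes two signals: a flag that lives as long as the process (a static field in a real activity) and whether onCreate received a saved-state bundle. LaunchStateDetector and its enum are hypothetical names, not something we shipped.

```java
/** Hypothetical helper: one instance stands in for static state that lives as long as the process. */
class LaunchStateDetector {
    enum Launch { FRESH_START, RECREATED_IN_LIVE_PROCESS, RESTORED_AFTER_PROCESS_DEATH }

    private boolean engineStarted = false; // would be a static field in a real activity

    /**
     * Classifies an onCreate call.
     *
     * @param hasSavedState stands in for "savedInstanceState != null" in onCreate
     */
    synchronized Launch classify(boolean hasSavedState) {
        boolean processWasAlive = engineStarted;
        engineStarted = true;
        if (processWasAlive) {
            // The process survived, so old engine state is still around.
            return Launch.RECREATED_IN_LIVE_PROCESS;
        }
        // New process: saved state means Android is restoring a previously killed app.
        return hasSavedState ? Launch.RESTORED_AFTER_PROCESS_DEATH : Launch.FRESH_START;
    }
}
```

Anything other than FRESH_START would have been the cue to force a full restart. It tells you that something went wrong, but it does nothing about the stale engine itself, which is why it remained a workaround.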
The Real Fix
The breakthrough came when Ramprasad (our CTO) suggested looking into cocos2d-x-lite for inspiration. Following this lead, Hatim implemented the proper fix, a cleanup step in the activity's onDestroy method:
    @Override
    protected void onDestroy() {
        Cocos2dxAudioFocusManager.unregisterAudioFocusListener(this);
        CAAgent.onDestroy(); // analytics cleanup
        super.onDestroy();
        if (mGLSurfaceView != null) {
            Cocos2dxHelper.terminateProcess(); // Critical: terminates native engine safely
        }
    }
The key line is Cocos2dxHelper.terminateProcess(), which shuts the native engine down when the activity is destroyed instead of leaving it in a corrupted state for the next activity instance to revive. This was a much cleaner solution than any of the workarounds I'd been considering.
Impact and Lessons
Immediate Impact
- Crash elimination: Complete resolution in production
- User experience: No more silent crashes during app resume
- Device coverage: Especially beneficial for low-memory Android devices
Process Improvements
- Better observability: Bugsnag's breadcrumbs were game-changing for debugging
- QA practices: We started testing under different developer settings
- Lifecycle awareness: Better understanding of Android process management
Technical Insights
This bug revealed a fundamental gap in how Cocos2d-x handles Android lifecycle events when JavaScript bindings are involved. The engine assumed process continuity that Android doesn't guarantee on memory-constrained devices.
Key Takeaways
- Observability matters: The right debugging tools can reveal patterns invisible to basic crash reporting
- Test edge cases: Developer settings can simulate real-world conditions that normal testing misses
- Understand your platform: Android's aggressive memory management can break assumptions about app lifecycle
- Community resources: Open source projects often have others who've hit the same issues
- Follow the breadcrumbs: Seemingly unrelated log entries can be crucial clues
Reproduction Steps for Testing
If you're working with Cocos2d-x on Android, you can test for this issue:
- Enable Developer Options on an Android device
- Set "Limit background processes" to 1
- Launch your app, then background it
- Open another app to trigger process kill
- Return to your app
If you see crashes on app resume, you're likely hitting the same lifecycle management issue.
This debugging experience reinforced my belief that the most interesting bugs often hide behind misleading error messages. Sometimes the crash you see isn't the problem you need to solve.