From 30e1e157aa28d7927b9a0f4df5d299cf57280554 Mon Sep 17 00:00:00 2001
From: Bruno Bornsztein <bruno.bornsztein@gmail.com>
Date: Sun, 11 Jan 2026 09:32:55 -0600
Subject: [PATCH 1/2] docs: analyze Claude sandboxing vs cloud execution
 approaches

Research and compare different approaches for secure remote task execution:
- Claude Code native sandboxing (bubblewrap/Seatbelt)
- Devcontainers for container-based isolation
- Hetzner/VPS (current approach)
- Fly.io Machines for per-task VM isolation

Recommend hybrid approach starting with native sandbox on existing
infrastructure, with devcontainers and Fly.io as future enhancements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 docs/REMOTE_EXECUTION_ANALYSIS.md | 282 ++++++++++++++++++++++++++++++
 1 file changed, 282 insertions(+)
 create mode 100644 docs/REMOTE_EXECUTION_ANALYSIS.md

diff --git a/docs/REMOTE_EXECUTION_ANALYSIS.md b/docs/REMOTE_EXECUTION_ANALYSIS.md
new file mode 100644
index 00000000..981e3270
--- /dev/null
+++ b/docs/REMOTE_EXECUTION_ANALYSIS.md
@@ -0,0 +1,282 @@
+# Remote Execution & Sandboxing Analysis
+
+This document analyzes different approaches to running Claude Code securely for our Task TUI application, comparing Claude Code's native sandboxing, devcontainers, and cloud-based approaches.
+
+## Current State
+
+Our task TUI already has substantial cloud infrastructure:
+
+- **`task cloud init`** - Interactive wizard for Hetzner/VPS setup
+- **`task cloud status/logs/sync`** - Remote management commands
+- **SSH access via Wish** - Connect to TUI from anywhere (`ssh -p 2222 server`)
+- **Git worktrees** - File-level isolation between parallel tasks
+- **Runner user** - Non-root execution on remote servers
+
+What we lack: **kernel-level sandboxing** to prevent malicious code from affecting the host.
+
+## Approaches Compared
+
+### 1. Claude Code Native Sandboxing
+
+Claude Code has built-in OS-level sandboxing using:
+- **Linux**: bubblewrap (namespace-based isolation)
+- **macOS**: Seatbelt sandbox
+
+**Capabilities:**
+| Feature | Description |
+|---------|-------------|
+| Filesystem isolation | R/W only to working directory, read-only elsewhere |
+| Network isolation | Only approved domains accessible |
+| Process isolation | All child processes inherit restrictions |
+| Auto-allow mode | Commands run without permission prompts if within sandbox |
+
+**Pros:**
+- ✅ Zero additional infrastructure needed
+- ✅ Works locally and on any Linux/macOS server
+- ✅ Makes `--dangerously-skip-permissions` much lower risk, because the sandbox bounds what an unprompted command can touch
+- ✅ Configurable via `settings.json`
+- ✅ Handles filesystem AND network restrictions
+
+**Cons:**
+- ❌ Doesn't isolate between concurrent tasks (shared kernel)
+- ❌ No Windows support yet
+- ❌ Broad domain allowlists can be bypassed (domain fronting)
+- ❌ Unix socket access can break isolation (e.g., docker.sock)
+
+**Integration with our app:**
+```go
+// Already supported in executor.go
+args := []string{"claude"}
+if dangerous {
+    args = append(args, "--dangerously-skip-permissions")
+}
+// Sandbox restrictions apply only once sandboxing is enabled in Claude Code's settings
+```
+
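+To enable it, we could write a `.claude/settings.json` into each task's worktree.
+The key names below are illustrative; verify them against the Claude Code settings reference before relying on them:
+```json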
+{
+  "sandbox": {
+    "permissions": {
+      "fs": {
+        "write": {"allow": ["$CWD/**"]}
+      },
+      "network": {
+        "allowedDomains": ["github.com", "api.anthropic.com"]
+      }
+    }
+  }
+}
+```
+
+### 2. Devcontainers
+
+Claude Code provides a [reference devcontainer implementation](https://github.com/anthropics/claude-code/tree/main/.devcontainer) with:
+
+- Node.js 20 base image
+- Custom firewall (iptables) restricting outbound traffic
+- VS Code integration via the Dev Containers extension (formerly Remote - Containers)
+- Pre-configured shell environment (ZSH + fzf + git)
+
+**Pros:**
+- ✅ Strong isolation via Docker
+- ✅ Consistent environment across machines
+- ✅ Can run `--dangerously-skip-permissions` safely
+- ✅ Network firewall blocks unauthorized connections
+- ✅ Isolates between tasks (each task = separate container)
+
+**Cons:**
+- ❌ Docker daemon required on host
+- ❌ Container startup overhead (~5-10 seconds)
+- ❌ More complex than native sandboxing
+- ❌ Credential exfiltration still possible within container
+- ❌ Full devcontainer features expect VS Code or other devcontainer-aware tooling (plain `docker run` only uses the image)
+
+**Integration with our app:**
+
+Instead of git worktrees, we'd spawn Docker containers:
+```go
+func executeInDevcontainer(task *db.Task, worktreePath, prompt string) error {
+    // Create ephemeral container from project's .devcontainer
+    containerName := fmt.Sprintf("task-%d", task.ID)
+
+    cmd := exec.Command("docker", "run",
+        "--name", containerName,
+        "--rm",
+        "-v", fmt.Sprintf("%s:/workspace", worktreePath),
+        "-e", fmt.Sprintf("TASK_ID=%d", task.ID),
+        "--network=task-network", // Custom network with egress rules
+        "task-devcontainer:latest",
+        "claude", "--dangerously-skip-permissions", "-p", prompt)
+
+    return cmd.Run()
+}
+```
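+
+Container-per-task also makes per-task resource caps straightforward. The sketch below is a hypothetical helper (`dockerRunArgs` is not part of our codebase) that builds the `docker run` argument list with memory, CPU, and PID limits. The limit values are placeholders, not tuned numbers:
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// dockerRunArgs builds the argument list for an ephemeral, resource-capped
+// task container. Hypothetical helper; the limits are placeholder values.
+func dockerRunArgs(taskID int64, worktreePath, image string) []string {
+    return []string{
+        "run", "--rm",
+        "--name", fmt.Sprintf("task-%d", taskID),
+        "--memory", "2g", // hard memory cap for the container
+        "--cpus", "1.5", // CPU quota
+        "--pids-limit", "256", // bounds runaway process creation
+        "-v", worktreePath + ":/workspace",
+        "-w", "/workspace",
+        image,
+    }
+}
+
+func main() {
+    args := dockerRunArgs(42, "/srv/worktrees/42", "task-devcontainer:latest")
+    fmt.Println("docker " + strings.Join(args, " "))
+}
+```
+
+The command to run inside the container (for example `claude ...`) would be appended after the image name before the slice is passed to `exec.Command("docker", args...)`.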
+
+### 3. Remote Hetzner/VPS (Current Approach)
+
+What we have now:
+- Dedicated Linux server with `runner` user
+- Tasks run in git worktrees
+- SSH access via Wish on port 2222
+- Systemd service for auto-restart
+
+**Pros:**
+- ✅ Already implemented and working
+- ✅ Full Linux environment
+- ✅ Persistent state across sessions
+- ✅ SSH access from anywhere
+- ✅ Can be combined with native sandboxing
+
+**Cons:**
+- ❌ No isolation between tasks (shared filesystem)
+- ❌ Compromised task can affect others
+- ❌ Paying for idle server
+- ❌ Single point of failure
+
+**Enhancement**: Add Claude Code's native sandboxing:
+```ini
+# In the systemd unit's [Service] section
+ExecStart=/home/runner/bin/taskd -addr :2222 -dangerous
+# Sandboxing still has to be enabled in each worktree's Claude Code settings
+```
+
+### 4. Fly.io Machines (Sprites)
+
+Fly.io Machines are fast-starting VMs (~300ms) with:
+- Usage-based billing (pay only while a machine is running)
+- Auto-suspend on idle
+- Ephemeral or persistent volumes
+- Global edge network
+
+**Pros:**
+- ✅ True VM isolation (not containers)
+- ✅ Fast cold starts (~300ms vs ~30s for full VMs)
+- ✅ Per-second billing, scale to zero
+- ✅ Each task = separate machine (perfect isolation)
+- ✅ Can persist state via volumes
+- ✅ Global distribution
+
+**Cons:**
+- ❌ Not implemented yet
+- ❌ More complex orchestration
+- ❌ Network latency for remote ops
+- ❌ Fly.io dependency
+- ❌ Costs add up for many concurrent tasks
+
+**Integration concept:**
+```go
+func executeOnFlyMachine(task *db.Task) error {
+    // Create ephemeral Fly Machine
+    machine, err := flyClient.CreateMachine(MachineConfig{
+        Image: "task-worker:latest",
+        Size:  "shared-cpu-1x",
+        Env: map[string]string{
+            "TASK_ID":        strconv.Itoa(task.ID),
+            "ANTHROPIC_KEY":  os.Getenv("ANTHROPIC_API_KEY"),
+            "PROJECT_REPO":   task.ProjectURL,
+        },
+        AutoStop: &AutoStop{
+            IdleTimeout: 5 * time.Minute,
+            Strategy:    "suspend", // or "stop" for full isolation
+        },
+    })
+    if err != nil {
+        return fmt.Errorf("create machine: %w", err)
+    }
+
+    // Machine clones repo, runs Claude, pushes results
+    return machine.WaitForCompletion()
+}
+```
+
+## Recommendation
+
+**Hybrid approach** combining multiple layers:
+
+### Phase 1: Enhance Current Setup (Low effort, immediate value)
+1. **Enable Claude Code sandboxing** on our existing Hetzner setup
+2. **Configure allowed domains** in `.claude/settings.json` per project
+3. **Use `--dangerously-skip-permissions`** once the sandbox is confirmed active, since the sandbox then bounds what unprompted commands can do
+
+```go
+// internal/executor/executor.go - add sandbox config to worktree setup
+func (e *Executor) setupWorktreeSandboxConfig(worktreePath string) error {
+    sandboxConfig := map[string]interface{}{
+        "sandbox": map[string]interface{}{
+            "permissions": map[string]interface{}{
+                "fs": map[string]interface{}{
+                    "write": map[string][]string{
+                        "allow": []string{"$CWD/**", "/tmp/**"},
+                    },
+                },
+                "network": map[string]interface{}{
+                    "allowedDomains": []string{
+                        "github.com",
+                        "api.github.com",
+                        "api.anthropic.com",
+                        "registry.npmjs.org",
+                        // Add project-specific domains
+                    },
+                },
+            },
+        },
+    }
+    // Write to .claude/settings.json in worktree
+    return writeJSON(filepath.Join(worktreePath, ".claude", "settings.json"), sandboxConfig)
+}
+```
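+
+The `writeJSON` helper used above is not shown in this document. A minimal sketch, assuming it only needs to create the parent directory and write indented JSON. The temp-file-then-rename step is an added assumption, not something the existing code is known to do:
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+    "os"
+    "path/filepath"
+)
+
+// writeJSON marshals v as indented JSON and writes it to path,
+// creating parent directories as needed. Hypothetical helper.
+func writeJSON(path string, v any) error {
+    if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
+        return fmt.Errorf("create parent dir: %w", err)
+    }
+    data, err := json.MarshalIndent(v, "", "  ")
+    if err != nil {
+        return fmt.Errorf("marshal: %w", err)
+    }
+    // Write to a temp file first, then rename, so a crash mid-write
+    // does not leave a truncated settings file behind.
+    tmp := path + ".tmp"
+    if err := os.WriteFile(tmp, append(data, '\n'), 0o644); err != nil {
+        return fmt.Errorf("write temp file: %w", err)
+    }
+    return os.Rename(tmp, path)
+}
+
+func main() {
+    dir, err := os.MkdirTemp("", "worktree")
+    if err != nil {
+        panic(err)
+    }
+    defer os.RemoveAll(dir)
+
+    path := filepath.Join(dir, ".claude", "settings.json")
+    if err := writeJSON(path, map[string]any{"example": true}); err != nil {
+        panic(err)
+    }
+    fmt.Println("wrote", path)
+}
+```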
+
+### Phase 2: Devcontainers for Full Isolation (Medium effort)
+1. **Add project-level `.devcontainer/`** configs
+2. **Run tasks in ephemeral containers** instead of worktrees
+3. **Custom firewall rules** per project type
+4. **VS Code integration** for developers who want GUI
+
+### Phase 3: Fly.io for Scale (Higher effort, future)
+1. **Task-per-machine model** for ultimate isolation
+2. **Auto-scaling** based on queue depth
+3. **Geographic distribution** for low latency
+4. **Pay-per-use** economics at scale
+
+## Comparison Matrix
+
+| Feature | Native Sandbox | Devcontainer | Hetzner VPS | Fly.io |
+|---------|----------------|--------------|-------------|--------|
+| Setup complexity | ⭐ Low | ⭐⭐ Medium | ⭐⭐ Medium | ⭐⭐⭐ High |
+| Task isolation | ⭐ Process | ⭐⭐⭐ Container | ⭐ Process | ⭐⭐⭐ VM |
+| Startup time | ⭐⭐⭐ Instant | ⭐⭐ ~5-10s | ⭐⭐⭐ Instant | ⭐⭐ ~300ms |
+| Cost at scale | ⭐⭐⭐ Free | ⭐⭐ Docker overhead | ⭐⭐ Fixed monthly | ⭐⭐⭐ Pay-per-use |
+| Idle cost | N/A | N/A | ⭐ ~$5-20/mo | ⭐⭐⭐ $0 |
+| Skip permissions | ✅ Yes | ✅ Yes | ⚠️ Risky | ✅ Yes |
+| Already implemented | ⚠️ Partial | ❌ No | ✅ Yes | ❌ No |
+
+## Implementation Priority
+
+1. **Immediate**: Enable `sandbox` settings in executor.go for worktrees
+2. **Short-term**: Default to `--dangerously-skip-permissions` for sandboxed tasks (the flag itself already exists)
+3. **Medium-term**: Create reference devcontainer for our task-worker
+4. **Long-term**: Evaluate Fly.io if scaling beyond single server
+
+## Security Considerations
+
+With any approach, these remain concerns:
+
+1. **Credential exfiltration** - Claude has access to API keys within its environment
+2. **Allowed domains** - GitHub.com access means attacker could push to repos
+3. **Prompt injection** - Malicious code in repo could manipulate Claude
+4. **Resource exhaustion** - Tasks could consume excessive CPU/memory
+
+Mitigations:
+- Use read-only API tokens where possible
+- Consider separate Claude API keys per project
+- Review Claude's actions in task logs
+- Set resource limits (idle suspension exists today, but it does not cap CPU or memory for an active task)
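+
+As one small piece of the resource-limit mitigation, a wall-clock cap per task can be sketched with the standard library alone. `runWithTimeout` is a hypothetical helper, not existing code. It bounds elapsed time only; CPU and memory caps would need cgroups, systemd, or container limits. It also kills only the direct child process, so a real executor would need to handle process groups:
+
+```go
+package main
+
+import (
+    "context"
+    "errors"
+    "fmt"
+    "os/exec"
+    "time"
+)
+
+// runWithTimeout runs the command and kills it if limit elapses first.
+// Hypothetical helper for illustration.
+func runWithTimeout(limit time.Duration, name string, args ...string) error {
+    ctx, cancel := context.WithTimeout(context.Background(), limit)
+    defer cancel()
+
+    err := exec.CommandContext(ctx, name, args...).Run()
+    if errors.Is(ctx.Err(), context.DeadlineExceeded) {
+        // Report the timeout rather than the raw "signal: killed" error.
+        return fmt.Errorf("task exceeded %s: %w", limit, context.DeadlineExceeded)
+    }
+    return err
+}
+
+func main() {
+    // "sleep 5" stands in for a long-running task.
+    err := runWithTimeout(200*time.Millisecond, "sleep", "5")
+    fmt.Println("result:", err)
+}
+```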
+
+## Conclusion
+
+The best path forward combines **Claude Code's native sandboxing** (Phase 1) with our existing Hetzner infrastructure. This gives us:
+
+- Immediate security improvements with minimal changes
+- Ability to safely use `--dangerously-skip-permissions`
+- Foundation for devcontainer/Fly.io expansion later
+
+The native sandbox addresses most security concerns while keeping our current architecture intact. Devcontainers and Fly.io provide upgrade paths when we need stronger isolation or better scaling.

From 2270593f14e99dff5e399cacc55157f0457e990f Mon Sep 17 00:00:00 2001
From: Bruno Bornsztein <bruno.bornsztein@gmail.com>
Date: Thu, 15 Jan 2026 13:48:37 -0600
Subject: [PATCH 2/2] docs: add implementation recommendations for Claude
 sandboxing

Add SANDBOX_RECOMMENDATIONS.md with concrete implementation guidance
for enabling Claude Code's native sandboxing in the task TUI app.

Key findings:
- Native sandboxing is already built into Claude Code (bubblewrap/Seatbelt)
- Can be enabled immediately via .claude/settings.json config
- Requires minimal code changes (~50 lines in executor.go)
- Better than Hetzner/Fly.io for immediate security improvements
- Devcontainers and Fly.io remain good future options

Recommendation: Implement Phase 1 (native sandboxing) immediately
for filesystem/network isolation and safe auto-execution mode.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---
 docs/SANDBOX_RECOMMENDATIONS.md | 227 ++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 docs/SANDBOX_RECOMMENDATIONS.md

diff --git a/docs/SANDBOX_RECOMMENDATIONS.md b/docs/SANDBOX_RECOMMENDATIONS.md
new file mode 100644
index 00000000..127e94d5
--- /dev/null
+++ b/docs/SANDBOX_RECOMMENDATIONS.md
@@ -0,0 +1,227 @@
+# Claude Code Sandboxing Implementation for Task TUI
+
+## Executive Summary
+
+After reviewing Claude Code's native sandboxing and devcontainer features, **the best approach is to implement Claude Code's native sandboxing immediately** as Phase 1, with devcontainers and Fly.io as optional future enhancements.
+
+The existing REMOTE_EXECUTION_ANALYSIS.md document provides an excellent foundation. This document adds implementation details and final recommendations based on the actual Claude Code capabilities.
+
+## Key Findings
+
+### Claude Code Native Sandboxing (Ready to Use Now)
+
+Claude Code already has OS-level sandboxing built-in using:
+- **Linux**: bubblewrap (namespace-based isolation)
+- **macOS**: Seatbelt sandbox
+
+**How it works:**
+1. Filesystem restrictions - R/W only to working directory, read-only elsewhere
+2. Network restrictions - only approved domains accessible
+3. Process isolation - all subprocesses inherit restrictions
+4. Auto-allow mode - commands run without permission prompts if within sandbox
+
+**Critical insight**: Your executor already runs Claude Code (executor.go:1082), so no new infrastructure is needed. Sandboxing can be turned on by configuring it via `.claude/settings.json` files.
+
+### What This Means for Your Task TUI
+
+Your current architecture at executor.go:1029-1109 runs Claude like this:
+```go
+script := fmt.Sprintf(`TASK_ID=%d TASK_SESSION_ID=%s claude %s--chrome "$(cat %q)"`,
+    taskID, sessionID, dangerousFlag, promptFile.Name())
+```
+
+The sandbox machinery already ships with this binary, but it only takes effect once it is enabled and configured in settings.
+
+## Implementation Recommendation
+
+### Phase 1: Enable Sandboxing (Immediate, Low Effort)
+
+**Goal**: Add sandbox configuration to each task's worktree.
+
+**Changes needed**:
+
+1. **Modify setupWorktree() in executor.go** to create sandbox config:
+
+```go
+// After line 2005, add:
+if err := e.setupSandboxConfig(worktreePath, task.Project); err != nil {
+    e.logger.Warn("could not setup sandbox config", "error", err)
+}
+```
+
+2. **Add new method to Executor**:
+
+```go
+// setupSandboxConfig creates a .claude/settings.json with sandbox configuration
+func (e *Executor) setupSandboxConfig(worktreePath, project string) error {
+    claudeDir := filepath.Join(worktreePath, ".claude")
+    if err := os.MkdirAll(claudeDir, 0755); err != nil {
+        return fmt.Errorf("create .claude dir: %w", err)
+    }
+
+    settingsPath := filepath.Join(claudeDir, "settings.json")
+
+    // Get project-specific allowed domains
+    allowedDomains := e.getProjectAllowedDomains(project)
+
+    sandboxConfig := map[string]interface{}{
+        "sandbox": map[string]interface{}{
+            "permissions": map[string]interface{}{
+                "fs": map[string]interface{}{
+                    "write": map[string][]string{
+                        "allow": []string{"$CWD/**", "/tmp/**"},
+                    },
+                },
+                "network": map[string]interface{}{
+                    "allowedDomains": allowedDomains,
+                },
+            },
+        },
+    }
+
+    data, err := json.MarshalIndent(sandboxConfig, "", "  ")
+    if err != nil {
+        return err
+    }
+
+    return os.WriteFile(settingsPath, data, 0644)
+}
+
+// getProjectAllowedDomains returns network domains allowed for a project
+func (e *Executor) getProjectAllowedDomains(project string) []string {
+    // Base domains needed for Claude to function
+    base := []string{
+        "api.anthropic.com",
+        "api.github.com",
+        "github.com",
+        "registry.npmjs.org",
+        "pypi.org",
+    }
+
+    // Check if project has custom allowed domains
+    proj, err := e.db.GetProjectByName(project)
+    if err == nil && proj != nil {
+        // You could add a new field to projects: allowed_domains
+        // For now, return base + common dev tools
+    }
+
+    return base
+}
+```
+
+3. **Update TASK_DANGEROUS_MODE usage**:
+
+Your current code at executor.go:1078-1081 only uses `--dangerously-skip-permissions` when TASK_DANGEROUS_MODE=1. Once sandboxing is configured and verified to be active, this can become the default:
+
+```go
+// Instead of checking TASK_DANGEROUS_MODE, enable by default when sandbox is configured
+dangerousFlag := "--dangerously-skip-permissions "
+```
+
+**Benefits of this approach**:
+- ✅ Zero infrastructure changes needed
+- ✅ Works on your existing Hetzner VPS immediately
+- ✅ Enables automatic command execution (no permission prompts)
+- ✅ Protects against filesystem and network abuse
+- ✅ Each task gets isolated permissions via worktrees
+- ✅ ~50 lines of code, can be done in 1 hour
+
+**Security gains**:
+- Tasks can't modify files outside their worktree
+- Tasks can't connect to arbitrary servers
+- Malicious code/dependencies are contained
+- The blast radius of a prompt injection is reduced (the injection itself is not prevented)
+
+### Phase 2: Devcontainers (Medium Term, If Needed)
+
+**When to consider**: If you need:
+- Stronger isolation between concurrent tasks
+- Per-task resource limits
+- Reproducible environments across machines
+- Team collaboration features
+
+**Implementation**: Replace git worktrees with Docker containers in executor.go:executeTask()
+
+**Effort**: ~2-3 days of development + testing
+
+### Phase 3: Fly.io Machines (Future, If Scaling)
+
+**When to consider**: If you:
+- Run >10 concurrent tasks regularly
+- Need geographic distribution
+- Want to stop paying for idle VPS
+- Need true VM-level isolation
+
+**Effort**: ~1-2 weeks of development + migration
+
+## Comparison Matrix Update
+
+| Feature | Native Sandbox (Phase 1) | Devcontainer (Phase 2) | Fly.io (Phase 3) |
+|---------|-------------------------|------------------------|------------------|
+| Implementation time | 1 hour | 2-3 days | 1-2 weeks |
+| Works with current code | ✅ Minimal changes | ⚠️ Moderate changes | ❌ Major refactor |
+| Filesystem isolation | ⭐⭐ Worktree-level | ⭐⭐⭐ Container-level | ⭐⭐⭐ VM-level |
+| Network isolation | ✅ Domain allowlist | ✅ iptables firewall | ✅ VPC isolation |
+| Auto-execute (skip perms) | ✅ Yes | ✅ Yes | ✅ Yes |
+| Cost at scale | ⭐⭐⭐ Free | ⭐⭐ Docker overhead | ⭐⭐⭐ Pay-per-use |
+| Idle cost | N/A | N/A | ⭐⭐⭐ $0 |
+
+## Recommended Action Plan
+
+### This Week
+1. ✅ Read documentation (done)
+2. Add `setupSandboxConfig()` method to executor.go
+3. Call it from `setupWorktree()` after line 2005
+4. Test with a sample task
+5. Deploy to Hetzner VPS
+
+### This Month
+1. Monitor sandbox violations in logs
+2. Tune allowed domains per project type
+3. Add project-level `allowed_domains` field to database
+4. Document for team usage
+
+### Future (If Needed)
+1. Evaluate devcontainers if task isolation becomes an issue
+2. Consider Fly.io if costs or scaling become concerns
+
+## Code Changes Summary
+
+**Files to modify**:
+- `internal/executor/executor.go` - Add sandbox configuration (~50 lines)
+- `internal/db/projects.go` - Add `allowed_domains` field (optional, ~10 lines)
+
+**No changes needed**:
+- SSH server, TUI, task lifecycle
+- Worktree management
+- Claude hooks system
+- Database schema (except optional domains field)
+
+## Security Considerations
+
+Even with sandboxing, these risks remain:
+
+1. **Credential exfiltration** - Claude has access to API keys in its environment
+   - Mitigation: Use read-only tokens where possible
+
+2. **Allowed domain abuse** - GitHub access means an attacker could push to any repo the task's credentials can write to
+   - Mitigation: Use separate git credentials per project
+
+3. **Prompt injection** - Malicious code in repo could manipulate Claude
+   - Mitigation: Review Claude's actions, use hooks for suspicious activity
+
+4. **Resource exhaustion** - Tasks could consume excessive CPU/memory
+   - Mitigation: The current suspension system covers idle tasks; an active task would still need explicit CPU/memory limits
+
+## Conclusion
+
+**The path forward is clear**: Implement Claude Code's native sandboxing (Phase 1) immediately. It's:
+- Already built into the tool you're using
+- Requires minimal code changes (~50 lines)
+- Works with your current architecture
+- Provides substantial security improvements
+- Enables `--dangerously-skip-permissions` safely
+
+Devcontainers and Fly.io remain excellent options for future enhancement, but aren't necessary to get significant security and UX benefits right now.
+
+The existing REMOTE_EXECUTION_ANALYSIS.md document correctly identified this as the best approach. This document confirms that assessment and provides concrete implementation details.