diff --git a/docs/REMOTE_EXECUTION_ANALYSIS.md b/docs/REMOTE_EXECUTION_ANALYSIS.md
new file mode 100644
index 00000000..981e3270
--- /dev/null
+++ b/docs/REMOTE_EXECUTION_ANALYSIS.md
@@ -0,0 +1,282 @@
+# Remote Execution & Sandboxing Analysis
+
+This document analyzes different approaches to running Claude Code securely for our Task TUI application, comparing Claude Code's native sandboxing, devcontainers, and cloud-based approaches.
+
+## Current State
+
+Our task TUI already has substantial cloud infrastructure:
+
+- **`task cloud init`** - Interactive wizard for Hetzner/VPS setup
+- **`task cloud status/logs/sync`** - Remote management commands
+- **SSH access via Wish** - Connect to TUI from anywhere (`ssh -p 2222 server`)
+- **Git worktrees** - File-level isolation between parallel tasks
+- **Runner user** - Non-root execution on remote servers
+
+What we lack: **kernel-level sandboxing** to prevent malicious code from affecting the host.
+
+## Approaches Compared
+
+### 1. Claude Code Native Sandboxing
+
+Claude Code has built-in OS-level sandboxing using:
+- **Linux**: bubblewrap (namespace-based isolation)
+- **macOS**: Seatbelt sandbox
+
+**Capabilities:**
+| Feature | Description |
+|---------|-------------|
+| Filesystem isolation | R/W only to working directory, read-only elsewhere |
+| Network isolation | Only approved domains accessible |
+| Process isolation | All child processes inherit restrictions |
+| Auto-allow mode | Commands run without permission prompts if within sandbox |
+
+**Pros:**
+- ✅ Zero additional infrastructure needed
+- ✅ Works locally and on any Linux/macOS server
+- ✅ Enables `--dangerously-skip-permissions` safely
+- ✅ Configurable via `settings.json`
+- ✅ Handles filesystem AND network restrictions
+
+**Cons:**
+- ❌ Doesn't isolate between concurrent tasks (shared kernel)
+- ❌ No Windows support yet
+- ❌ Broad domain allowlists can be bypassed (domain fronting)
+- ❌ Unix socket access can break isolation (e.g., docker.sock)
+
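To make the allowlist trade-off above concrete, here is a minimal Go sketch of hostname matching against a domain allowlist. It is an illustration only: `hostAllowed` and its `*.` wildcard convention are assumptions made for this example, not Claude Code's actual matching logic. It shows why one broad entry admits many more hosts than an exact entry does.

```go
package main

import (
	"fmt"
	"strings"
)

// hostAllowed reports whether host matches an entry in allow.
// Entries are either exact hostnames ("api.anthropic.com") or
// wildcard suffixes ("*.github.com"). Both conventions are
// assumptions for this sketch, not a documented Claude Code format.
func hostAllowed(host string, allow []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, entry := range allow {
		entry = strings.ToLower(entry)
		if strings.HasPrefix(entry, "*.") {
			// Wildcard: any subdomain matches, the bare domain does not.
			if strings.HasSuffix(host, entry[1:]) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	allow := []string{"api.anthropic.com", "*.github.com"}
	hosts := []string{"api.anthropic.com", "gist.github.com", "github.com", "evil.example"}
	for _, h := range hosts {
		fmt.Printf("%-20s allowed=%v\n", h, hostAllowed(h, allow))
	}
}
```

Every host under a wildcard entry becomes reachable, including user-controlled content hosts, so exact entries are the more conservative default.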
+**Integration with our app:**
+```go
+// Already supported in executor.go
+args := []string{"claude"}
+if dangerous {
+    args = append(args, "--dangerously-skip-permissions")
+}
+// Claude Code's sandbox applies once it is enabled in settings
+```
+
+To enable it, task execution could write a `.claude/settings.json` like this into each worktree (verify the key names against the Claude Code settings reference):
+```json
+{
+    "sandbox": {
+        "permissions": {
+            "fs": {
+                "write": {"allow": ["$CWD/**"]}
+            },
+            "network": {
+                "allowedDomains": ["github.com", "api.anthropic.com"]
+            }
+        }
+    }
+}
+```
+
+### 2. Devcontainers
+
+Claude Code provides a [reference devcontainer implementation](https://github.com/anthropics/claude-code/tree/main/.devcontainer) with:
+
+- Node.js 20 base image
+- Custom firewall (iptables) restricting outbound traffic
+- VS Code integration with Remote Containers extension
+- Pre-configured shell environment (ZSH + fzf + git)
+
+**Pros:**
+- ✅ Strong isolation via Docker
+- ✅ Consistent environment across machines
+- ✅ Can run `--dangerously-skip-permissions` safely
+- ✅ Network firewall blocks unauthorized connections
+- ✅ Isolates between tasks (each task = separate container)
+
+**Cons:**
+- ❌ Docker daemon required on host
+- ❌ Container startup overhead (~5-10 seconds)
+- ❌ More complex than native sandboxing
+- ❌ Credential exfiltration still possible within container
+- ❌ Reference setup assumes VS Code or compatible tooling
+
+**Integration with our app:**
+
+Instead of running Claude directly in the git worktree, we'd mount the worktree into an ephemeral Docker container:
+```go
+func executeInDevcontainer(task *db.Task, worktreePath, prompt string) error {
+    // Run an ephemeral container from an image built from the project's .devcontainer
+    containerName := fmt.Sprintf("task-%d", task.ID)
+
+    cmd := exec.Command("docker", "run",
+        "--name", containerName,
+        "--rm",
+        "-v", fmt.Sprintf("%s:/workspace", worktreePath),
+        "-e", fmt.Sprintf("TASK_ID=%d", task.ID),
+        "--network=task-network", // Custom network with egress rules
+        "task-devcontainer:latest",
+        "claude", "--dangerously-skip-permissions", "-p", prompt)
+
+    return cmd.Run()
+}
+```
+
+### 3. Remote Hetzner/VPS (Current Approach)
+
+What we have now:
+- Dedicated Linux server with `runner` user
+- Tasks run in git worktrees
+- SSH access via Wish on port 2222
+- Systemd service for auto-restart
+
+**Pros:**
+- ✅ Already implemented and working
+- ✅ Full Linux environment
+- ✅ Persistent state across sessions
+- ✅ SSH access from anywhere
+- ✅ Can be combined with native sandboxing
+
+**Cons:**
+- ❌ No isolation between tasks (shared filesystem)
+- ❌ Compromised task can affect others
+- ❌ Paying for idle server
+- ❌ Single point of failure
+
+**Enhancement**: Add Claude Code's native sandboxing:
+```ini
+# In systemd service
+ExecStart=/home/runner/bin/taskd -addr :2222 -dangerous
+# Claude picks up the sandbox config from each worktree's .claude/settings.json
+```
+
+### 4. Fly.io Machines (Sprites)
+
+Fly.io Machines are fast-starting VMs (~300ms) with:
+- Per-invocation billing (pay only when running)
+- Auto-suspend on idle
+- Ephemeral or persistent volumes
+- Global edge network
+
+**Pros:**
+- ✅ True VM isolation (not containers)
+- ✅ Fast cold starts (~300ms vs ~30s for full VMs)
+- ✅ Per-second billing, scale to zero
+- ✅ Each task = separate machine (perfect isolation)
+- ✅ Can persist state via volumes
+- ✅ Global distribution
+
+**Cons:**
+- ❌ Not implemented yet
+- ❌ More complex orchestration
+- ❌ Network latency for remote ops
+- ❌ Fly.io dependency
+- ❌ Costs add up for many concurrent tasks
+
+**Integration concept:**
+```go
+func executeOnFlyMachine(task *db.Task) error {
+    // Create ephemeral Fly Machine (flyClient and MachineConfig are conceptual, not a real SDK)
+    machine, err := flyClient.CreateMachine(MachineConfig{
+        Image: "task-worker:latest",
+        Size:  "shared-cpu-1x",
+        Env: map[string]string{
+            "TASK_ID":           fmt.Sprintf("%d", task.ID),
+            "ANTHROPIC_API_KEY": os.Getenv("ANTHROPIC_API_KEY"),
+            "PROJECT_REPO":      task.ProjectURL,
+        },
+        AutoStop: &AutoStop{
+            IdleTimeout: 5 * time.Minute,
+            Strategy:    "suspend", // or "stop" for full isolation
+        },
+    })
+    if err != nil {
+        return fmt.Errorf("create fly machine: %w", err)
+    }
+    return machine.WaitForCompletion() // Machine clones repo, runs Claude, pushes results
+}
+```
+
+## Recommendation
+
+**Hybrid approach** combining multiple layers:
+
+### Phase 1: Enhance Current Setup (Low effort, immediate value)
+1. **Enable Claude Code sandboxing** on our existing Hetzner setup
+2. **Configure allowed domains** in `.claude/settings.json` per project
+3. **Use `--dangerously-skip-permissions`**, since the sandbox provides the protection
+
+```go
+// internal/executor/executor.go - add sandbox config to worktree setup
+func (e *Executor) setupWorktreeSandboxConfig(worktreePath string) error {
+    sandboxConfig := map[string]interface{}{
+        "sandbox": map[string]interface{}{
+            "permissions": map[string]interface{}{
+                "fs": map[string]interface{}{
+                    "write": map[string][]string{
+                        "allow": []string{"$CWD/**", "/tmp/**"},
+                    },
+                },
+                "network": map[string]interface{}{
+                    "allowedDomains": []string{
+                        "github.com",
+                        "api.github.com",
+                        "api.anthropic.com",
+                        "registry.npmjs.org",
+                        // Add project-specific domains
+                    },
+                },
+            },
+        },
+    }
+    // Write to .claude/settings.json in worktree (writeJSON is a helper, not shown)
+    return writeJSON(filepath.Join(worktreePath, ".claude", "settings.json"), sandboxConfig)
+}
+```
+
+### Phase 2: Devcontainers for Full Isolation (Medium effort)
+1. **Add project-level `.devcontainer/`** configs
+2. **Run tasks in ephemeral containers** instead of worktrees
+3. **Custom firewall rules** per project type
+4. **VS Code integration** for developers who want GUI
+
+### Phase 3: Fly.io for Scale (Higher effort, future)
+1. **Task-per-machine model** for ultimate isolation
+2. **Auto-scaling** based on queue depth
+3. **Geographic distribution** for low latency
+4. **Pay-per-use** economics at scale
+
+## Comparison Matrix
+
+| Feature | Native Sandbox | Devcontainer | Hetzner VPS | Fly.io |
+|---------|----------------|--------------|-------------|--------|
+| Setup complexity | ⭐ Low | ⭐⭐ Medium | ⭐⭐ Medium | ⭐⭐⭐ High |
+| Task isolation | ⭐ Process | ⭐⭐⭐ Container | ⭐ Process | ⭐⭐⭐ VM |
+| Startup time | ⭐⭐⭐ Instant | ⭐⭐ ~5-10s | ⭐⭐⭐ Instant | ⭐⭐ ~300ms |
+| Cost at scale | ⭐⭐⭐ Free | ⭐⭐ Docker overhead | ⭐⭐ Fixed monthly | ⭐⭐⭐ Pay-per-use |
+| Idle cost | N/A | N/A | ⭐ ~$5-20/mo | ⭐⭐⭐ $0 |
+| Skip permissions | ✅ Yes | ✅ Yes | ⚠️ Risky | ✅ Yes |
+| Already implemented | ⚠️ Partial | ❌ No | ✅ Yes | ❌ No |
+
+## Implementation Priority
+
+1. **Immediate**: Enable `sandbox` settings in executor.go for worktrees
+2. **Short-term**: Make `--dangerously-skip-permissions` the default once the sandbox is configured (the flag already exists)
+3. **Medium-term**: Create reference devcontainer for our task-worker
+4. **Long-term**: Evaluate Fly.io if scaling beyond single server
+
+## Security Considerations
+
+With any approach, these remain concerns:
+
+1. **Credential exfiltration** - Claude has access to API keys within its environment
+2. **Allowed domains** - `github.com` access means an attacker could push to repos
+3. **Prompt injection** - Malicious code in repo could manipulate Claude
+4. **Resource exhaustion** - Tasks could consume excessive CPU/memory
+
+Mitigations:
+- Use read-only API tokens where possible
+- Consider separate Claude API keys per project
+- Review Claude's actions in task logs
+- Set resource limits (today we only suspend tasks after they go idle)
+
+## Conclusion
+
+The best path forward combines **Claude Code's native sandboxing** (Phase 1) with our existing Hetzner infrastructure. This gives us:
+
+- Immediate security improvements with minimal changes
+- Ability to safely use `--dangerously-skip-permissions`
+- Foundation for devcontainer/Fly.io expansion later
+
+The native sandbox addresses most security concerns while keeping our current architecture intact. Devcontainers and Fly.io provide upgrade paths when we need stronger isolation or better scaling.
diff --git a/docs/SANDBOX_RECOMMENDATIONS.md b/docs/SANDBOX_RECOMMENDATIONS.md
new file mode 100644
index 00000000..127e94d5
--- /dev/null
+++ b/docs/SANDBOX_RECOMMENDATIONS.md
@@ -0,0 +1,227 @@
+# Claude Code Sandboxing Implementation for Task TUI
+
+## Executive Summary
+
+After reviewing Claude Code's native sandboxing and devcontainer features, **the best approach is to implement Claude Code's native sandboxing immediately** as Phase 1, with devcontainers and Fly.io as optional future enhancements.
+
+The existing `REMOTE_EXECUTION_ANALYSIS.md` document provides the foundation. This document adds implementation details and final recommendations based on the actual Claude Code capabilities.
+
+## Key Findings
+
+### Claude Code Native Sandboxing (Ready to Use Now)
+
+Claude Code already has OS-level sandboxing built in, using:
+- **Linux**: bubblewrap (namespace-based isolation)
+- **macOS**: Seatbelt sandbox
+
+**How it works:**
+1. Filesystem restrictions - R/W only to working directory, read-only elsewhere
+2. Network restrictions - only approved domains accessible
+3. Process isolation - all subprocesses inherit restrictions
+4. Auto-allow mode - commands run without permission prompts if within sandbox
+
+**Critical insight**: Your executor already runs Claude Code (executor.go:1082), so you get sandboxing **for free** by simply configuring it via `.claude/settings.json` files.
+
+### What This Means for Your Task TUI
+
+Your current architecture at executor.go:1029-1109 runs Claude like this:
+```go
+script := fmt.Sprintf(`TASK_ID=%d TASK_SESSION_ID=%s claude %s--chrome "$(cat %q)"`,
+    taskID, sessionID, dangerousFlag, promptFile.Name())
+```
+
+The sandbox is already available in this invocation; it just needs to be configured properly.
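Because the native sandbox relies on bubblewrap on Linux and Seatbelt on macOS, the executor could check that a backend is present before launching a task. The following is a minimal sketch, assuming the backends are exposed as `bwrap` and `sandbox-exec` on `PATH`. Those binary names and both helper functions are assumptions for illustration and are not taken from executor.go.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
)

// sandboxBinary returns the executable this sketch expects for a GOOS value.
// The names are assumptions: bubblewrap is commonly installed as "bwrap",
// and Seatbelt profiles are commonly applied through "sandbox-exec".
func sandboxBinary(goos string) (string, bool) {
	switch goos {
	case "linux":
		return "bwrap", true
	case "darwin":
		return "sandbox-exec", true
	default:
		return "", false
	}
}

// checkSandboxBackend returns an error when no backend is known for goos
// or when the expected binary cannot be found on PATH.
func checkSandboxBackend(goos string) error {
	bin, ok := sandboxBinary(goos)
	if !ok {
		return fmt.Errorf("no known sandbox backend for %s", goos)
	}
	if _, err := exec.LookPath(bin); err != nil {
		return fmt.Errorf("sandbox backend %q not found: %w", bin, err)
	}
	return nil
}

func main() {
	if err := checkSandboxBackend(runtime.GOOS); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("sandbox backend available")
}
```

A check like this would let a task fail early, or fall back to permission prompts, rather than run with skipped permissions on a host that cannot sandbox.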
+
+## Implementation Recommendation
+
+### Phase 1: Enable Sandboxing (Immediate, Low Effort)
+
+**Goal**: Add sandbox configuration to each task's worktree.
+
+**Changes needed**:
+
+1. **Modify setupWorktree() in executor.go** to create sandbox config:
+
+```go
+// After line 2005, add:
+if err := e.setupSandboxConfig(worktreePath, task.Project); err != nil {
+    e.logger.Warn("could not setup sandbox config", "error", err)
+}
+```
+
+2. **Add new method to Executor**:
+
+```go
+// setupSandboxConfig creates a .claude/settings.json with sandbox configuration
+func (e *Executor) setupSandboxConfig(worktreePath, project string) error {
+    claudeDir := filepath.Join(worktreePath, ".claude")
+    if err := os.MkdirAll(claudeDir, 0755); err != nil {
+        return fmt.Errorf("create .claude dir: %w", err)
+    }
+
+    settingsPath := filepath.Join(claudeDir, "settings.json")
+
+    // Get project-specific allowed domains
+    allowedDomains := e.getProjectAllowedDomains(project)
+
+    sandboxConfig := map[string]interface{}{
+        "sandbox": map[string]interface{}{
+            "permissions": map[string]interface{}{
+                "fs": map[string]interface{}{
+                    "write": map[string][]string{
+                        "allow": []string{"$CWD/**", "/tmp/**"},
+                    },
+                },
+                "network": map[string]interface{}{
+                    "allowedDomains": allowedDomains,
+                },
+            },
+        },
+    }
+
+    data, err := json.MarshalIndent(sandboxConfig, "", "  ")
+    if err != nil {
+        return fmt.Errorf("marshal sandbox config: %w", err)
+    }
+
+    return os.WriteFile(settingsPath, data, 0644)
+}
+
+// getProjectAllowedDomains returns network domains allowed for a project
+func (e *Executor) getProjectAllowedDomains(project string) []string {
+    // Base domains: Claude API, GitHub, and common package registries
+    base := []string{
+        "api.anthropic.com",
+        "api.github.com",
+        "github.com",
+        "registry.npmjs.org",
+        "pypi.org",
+    }
+
+    // Check if project has custom allowed domains
+    proj, err := e.db.GetProjectByName(project)
+    if err == nil && proj != nil {
+        // You could add a new field to projects: allowed_domains
+        // For now, return base + common dev tools
+    }
+
+    return base
+}
+```
+
+3. **Update TASK_DANGEROUS_MODE usage**:
+
+Your current code at executor.go:1078-1081 only uses `--dangerously-skip-permissions` when TASK_DANGEROUS_MODE=1. With sandboxing configured, you can safely enable this by default:
+
+```go
+// Instead of checking TASK_DANGEROUS_MODE, enable by default when sandbox is configured
+dangerousFlag := "--dangerously-skip-permissions "
+```
+
+**Benefits of this approach**:
+- ✅ Zero infrastructure changes needed
+- ✅ Works on your existing Hetzner VPS immediately
+- ✅ Enables automatic command execution (no permission prompts)
+- ✅ Protects against filesystem and network abuse
+- ✅ Each task gets its own sandbox configuration via its worktree
+- ✅ ~50 lines of code, can be done in about an hour
+
+**Security gains**:
+- Tasks can't modify files outside their worktree
+- Tasks can't connect to arbitrary servers
+- Malicious code/dependencies are contained
+- The damage from prompt injection is limited (the attack itself remains possible; see Security Considerations)
+
+### Phase 2: Devcontainers (Medium Term, If Needed)
+
+**When to consider**: If you need:
+- Stronger isolation between concurrent tasks
+- Per-task resource limits
+- Reproducible environments across machines
+- Team collaboration features
+
+**Implementation**: Replace git worktrees with Docker containers in executor.go:executeTask()
+
+**Effort**: ~2-3 days of development + testing
+
+### Phase 3: Fly.io Machines (Future, If Scaling)
+
+**When to consider**: If you:
+- Run >10 concurrent tasks regularly
+- Need geographic distribution
+- Want to stop paying for idle VPS
+- Need true VM-level isolation
+
+**Effort**: ~1-2 weeks of development + migration
+
+## Comparison Matrix Update
+
+| Feature | Native Sandbox (Phase 1) | Devcontainer (Phase 2) | Fly.io (Phase 3) |
+|---------|-------------------------|------------------------|------------------|
+| Implementation time | 1 hour | 2-3 days | 1-2 weeks |
+| Works with current code | ✅ Minimal changes | ⚠️ Moderate changes | ❌ Major refactor |
+| Filesystem isolation | ⭐⭐ Worktree-level | ⭐⭐⭐ Container-level | ⭐⭐⭐ VM-level |
+| Network isolation | ✅ Domain allowlist | ✅ iptables firewall | ✅ VPC isolation |
+| Auto-execute (skip perms) | ✅ Yes | ✅ Yes | ✅ Yes |
+| Cost at scale | ⭐⭐⭐ Free | ⭐⭐ Docker overhead | ⭐⭐⭐ Pay-per-use |
+| Idle cost | N/A | N/A | ⭐⭐⭐ $0 |
+
+## Recommended Action Plan
+
+### This Week
+1. ✅ Read documentation (done)
+2. Add `setupSandboxConfig()` method to executor.go
+3. Call it from `setupWorktree()` after line 2005
+4. Test with a sample task
+5. Deploy to Hetzner VPS
+
+### This Month
+1. Monitor sandbox violations in logs
+2. Tune allowed domains per project type
+3. Add project-level `allowed_domains` field to database
+4. Document for team usage
+
+### Future (If Needed)
+1. Evaluate devcontainers if task isolation becomes an issue
+2. Consider Fly.io if costs or scaling become concerns
+
+## Code Changes Summary
+
+**Files to modify**:
+- `internal/executor/executor.go` - Add sandbox configuration (~50 lines)
+- `internal/db/projects.go` - Add `allowed_domains` field (optional, ~10 lines)
+
+**No changes needed**:
+- SSH server, TUI, task lifecycle
+- Worktree management
+- Claude hooks system
+- Database schema (except optional domains field)
+
+## Security Considerations
+
+Even with sandboxing, these risks remain:
+
+1. **Credential exfiltration** - Claude has access to API keys in its environment
+   - Mitigation: Use read-only tokens where possible
+
+2. **Allowed-domain abuse** - GitHub access means an attacker could push to repos
+   - Mitigation: Use separate git credentials per project
+
+3. **Prompt injection** - Malicious code in repo could manipulate Claude
+   - Mitigation: Review Claude's actions, use hooks for suspicious activity
+
+4. **Resource exhaustion** - Tasks could consume excessive CPU/memory
+   - Mitigation: The current suspension system covers idle tasks; active CPU/memory use is not capped by it
+
+## Conclusion
+
+**The path forward is clear**: Implement Claude Code's native sandboxing (Phase 1) immediately. It:
+- Is already built into the tool you're using
+- Requires minimal code changes (~50 lines)
+- Works with your current architecture
+- Provides substantial security improvements
+- Enables `--dangerously-skip-permissions` safely
+
+Devcontainers and Fly.io remain excellent options for future enhancement, but aren't necessary to get significant security and UX benefits right now.
+
+The existing `REMOTE_EXECUTION_ANALYSIS.md` document correctly identified this as the best approach. This document confirms that assessment and provides concrete implementation details.