Turning 1,870 JSONL Files into Automated Skill Updates: A Pipeline Story

When you're managing an AI platform, manual analysis of user interaction logs can eat up hours. I faced a challenge: I needed to sift through nearly 2,000 JSONL files to identify gaps in my existing skill set. Instead of repeating that grind, I built a repeatable pipeline that automates the process. Below, I answer key questions about how it works, the tradeoffs, and why this approach matters for anyone building AI tools.

What was the problem you faced?

I had a simple but expensive question: scan all my Claude Code JSONL files and suggest new skills. The data lived across 45 project directories, totaling about 904 MB and containing 2,752 real user-typed prompts. To find skill gaps, I had to cross-reference those prompts against my existing 56 skills (9 user-defined and 47 plugin-based). Doing that manually meant a deep dive that took far too long to repeat, so I needed a way to automate the entire workflow and run it on demand without burning hours.
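To make that scale concrete, here's a minimal sketch of the scanning step. The ~/.claude/projects layout and the field names (a "type" of "user", message content as a plain string) are assumptions about Claude Code's JSONL session format, not necessarily the exact schema my pipeline handles:

```python
import json
from pathlib import Path

# Assumed location of Claude Code's per-project session logs.
PROJECTS_DIR = Path.home() / ".claude" / "projects"

def iter_user_prompts(projects_dir: Path = PROJECTS_DIR):
    """Yield (project_name, prompt_text) for every user-typed prompt found."""
    for jsonl_file in projects_dir.rglob("*.jsonl"):
        text = jsonl_file.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the scan
            if not isinstance(entry, dict) or entry.get("type") != "user":
                continue
            message = entry.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if isinstance(content, str) and content.strip():
                yield jsonl_file.parent.name, content.strip()

if __name__ == "__main__":
    prompts = list(iter_user_prompts())
    projects = {project for project, _ in prompts}
    print(f"{len(prompts)} user prompts across {len(projects)} project directories")
```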

How did you build the pipeline?

The pipeline sits at the core of my AI ecosystem, tightly integrated with Nexus and ARIA. I wrote Node.js scripts to parse the JSONL files and extract meaningful user interactions, then used Python for the ranking logic: each prompt is compared against existing skills, and anything unmatched is scored by relevance and frequency. After ranking, the system returned 12 candidate skills. The entire pipeline runs on my own server using local compute, keeping costs low, and outputs are written back as actionable configuration files.
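The ranking step can be sketched roughly like this. The similarity measure (keyword overlap with a 0.3 threshold) and the frequency-times-project-spread weighting are illustrative stand-ins; the real scoring logic differs in detail:

```python
import re
from collections import Counter

def tokenize(text: str) -> set[str]:
    """Lowercase keyword set; tokens under three letters are dropped."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))

def matches_existing(prompt: str, skills: list[dict]) -> bool:
    """Treat a prompt as covered when it overlaps enough with any skill."""
    words = tokenize(prompt)
    for skill in skills:
        skill_words = tokenize(skill["name"] + " " + skill["description"])
        if skill_words and len(words & skill_words) / len(skill_words) >= 0.3:
            return True
    return False

def rank_candidates(prompts, skills, top_n=12):
    """Cluster unmatched prompts by a crude keyword key, then rank by
    frequency weighted by how many projects the pattern spans."""
    counts = Counter()
    spread = {}
    for project, prompt in prompts:
        if matches_existing(prompt, skills):
            continue
        key = frozenset(sorted(tokenize(prompt))[:5])  # crude cluster key
        counts[key] += 1
        spread.setdefault(key, set()).add(project)
    ranked = sorted(counts, key=lambda k: counts[k] * len(spread[k]), reverse=True)
    return [(sorted(k), counts[k], len(spread[k])) for k in ranked[:top_n]]
```

Feed it the 56 existing skills as name/description dicts plus the scanned prompts, and you get back the shape of the first run: a ranked shortlist of unmatched patterns, capped at 12.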

What skills were created from the first run?

Out of 12 candidates, six shipped immediately. One standout was narrative-docs-update, which captures my policy of documentary-grade writing; it surfaced 147 hits across 30 projects. Another was whats-next, a skill that briefs me on session restarts (62+ hits). The other four targeted specific repetitive tasks like code review summaries and environment setup. Each skill is a reusable, user-level action that my AI platform can invoke, saving me from manually performing those same actions every session.
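To show what "reusable, user-level action" looks like on disk, here's a hypothetical sketch of writing a candidate back as a skill. The SKILL.md layout with YAML frontmatter follows the common Agent Skills convention; the directory, description, and instructions below are illustrative, not my actual files:

```python
from pathlib import Path

# Hypothetical output location for user-level skills.
SKILLS_DIR = Path.home() / ".claude" / "skills"

def write_skill(name: str, description: str, instructions: str) -> Path:
    """Write a candidate back as a SKILL.md with YAML frontmatter."""
    skill_dir = SKILLS_DIR / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{instructions}\n",
        encoding="utf-8",
    )
    return skill_file

# e.g. the whats-next candidate from the first run (wording is illustrative):
write_skill(
    "whats-next",
    "Brief me on open threads and next actions when a session restarts.",
    "Summarize the previous session's state, list unfinished tasks, "
    "and suggest the next concrete step.",
)
```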

What were the tradeoffs and limitations?

On the first run, some prompts were misclassified as noise because of inconsistent formatting in the JSONL files. I had to manually tweak the filter logic at 2 AM to catch edge cases—not ideal, but it fixed the issue. Also, the pipeline is a batch process, not real-time. It assumes static data, meaning if new JSONL files appear before the next run, they won't be included until the pipeline is triggered again. I plan to address this in v2 by adding incremental processing and real-time event triggers.
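For anyone hitting the same noise problem, a defensive per-line filter might look like the sketch below. The specific heuristics (dropping slash-command echoes, interruption markers, and oversized pastes) are guesses at typical edge cases, not my exact 2 AM fix:

```python
import json

def is_real_prompt(raw_line: str) -> bool:
    """Heuristic noise filter for a single JSONL line."""
    try:
        entry = json.loads(raw_line)
    except json.JSONDecodeError:
        return False  # inconsistent formatting: drop unparseable lines
    if not isinstance(entry, dict) or entry.get("type") != "user":
        return False
    message = entry.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str):
        return False  # structured tool results, not typed text
    text = content.strip()
    if not text or text.startswith(("<command-", "[Request interrupted")):
        return False  # slash-command echoes and interruption markers
    if len(text) > 4000:
        return False  # pasted logs or file dumps, not a typed prompt
    return True
```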

Why should others automate their analysis?

This isn't just about skills—it's about turning manual grunt work into a system. If you're building AI tools, you've probably faced the same slog: repeating analysis that a script could do. Automating this process saved me hours per session, and it scales as your project count grows. Next, I'm wiring this pipeline into Nexus for continuous updates. What repetitive tasks are you automating in your builds? Let me know—I'm always hunting for the next pipeline opportunity.
