Unify HTML, CSS, and JS into a single, element-centric JSON structure.
This library is designed for developers who need to extract data from HTML while preserving its visual and functional context. Unlike traditional parsers, `@osmn-byhn/htmlparser` inlines styles from `<style>` tags and resolves JavaScript event handlers (like `onclick`) into their actual function bodies.
- Deep Extraction: Don't just get the HTML; get the "computed" feel of it. Styles that live in the `<head>` are automatically mapped to the elements they target in the `<body>`.
- Function Intelligence: If an element has an `onclick="doSomething()"`, this library searches the `<script>` tags, finds `doSomething`, and includes its full source code in the JSON entry for that element.
- AI Friendly: The unified, self-contained JSON output is perfect for feeding into LLMs (Large Language Models) for UI analysis, code generation, or automated testing.
- Zero Heavy Dependencies: Built with performance and simplicity in mind.
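
The Function Intelligence step can be pictured with a small sketch. This is not the library's implementation, only an illustration of the idea under simplifying assumptions: the handler is a plain call such as `sayHi()`, the target is a classic `function` declaration, and braces inside strings, comments, or default parameters are not handled. The helper name `resolveHandler` is hypothetical.

```javascript
// Hypothetical helper, not part of @osmn-byhn/htmlparser.
// Given an inline handler such as "sayHi()" and the text of a <script> tag,
// return the source of the matching function declaration, or null.
function resolveHandler(handler, scriptSource) {
  // Take the leading identifier of the handler: "sayHi()" -> "sayHi".
  const nameMatch = /^\s*([A-Za-z_$][\w$]*)\s*\(/.exec(handler);
  if (!nameMatch) return null;
  // "$" is legal in identifiers but special in regular expressions.
  const name = nameMatch[1].replace(/\$/g, "\\$&");

  // Locate "function <name>(" in the script text.
  const declaration = new RegExp("function\\s+" + name + "\\s*\\(").exec(scriptSource);
  if (!declaration) return null;

  // Walk forward from the first "{" and stop once the braces balance.
  const open = scriptSource.indexOf("{", declaration.index);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < scriptSource.length; i++) {
    if (scriptSource[i] === "{") depth++;
    else if (scriptSource[i] === "}") depth--;
    if (depth === 0) return scriptSource.slice(declaration.index, i + 1);
  }
  return null; // unbalanced braces
}

const script = "function sayHi() { console.log('Hi!'); }";
console.log(resolveHandler("sayHi()", script));
// function sayHi() { console.log('Hi!'); }
```

A real resolver would likely need a JavaScript parser to cope with arrow functions, methods, and nested scopes; the brace counter here only shows the lookup in its simplest form.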
```bash
npm install @osmn-byhn/htmlparser
# or
pnpm add @osmn-byhn/htmlparser
# or
yarn add @osmn-byhn/htmlparser
```

```javascript
import { extractUnifiedFromHTML } from "@osmn-byhn/htmlparser";

const html = `
<html>
  <head>
    <style>.btn { color: red; }</style>
  </head>
  <body>
    <button class="btn" onclick="sayHi()">Click Me</button>
    <script>function sayHi() { console.log('Hi!'); }</script>
  </body>
</html>
`;

async function main() {
  const result = await extractUnifiedFromHTML(html);
  // The first child of <body> is the button.
  const button = result.body.children[0];
  console.log(button.inlineStyle); // { color: 'red' }
  console.log(button.events.click.function); // "function sayHi() { ... }"
}

main();
```

```javascript
// ES modules (top-level await)
import { extractUnifiedFromHTML } from "@osmn-byhn/htmlparser";

const result = await extractUnifiedFromHTML('<div>Hello</div>');
console.log(result.body);
```

```javascript
// CommonJS
const { extractUnifiedFromHTML } = require("@osmn-byhn/htmlparser");

extractUnifiedFromHTML('<div>Hello</div>').then(result => {
  console.log(result.body);
});
```

The output is a `UnifiedExtraction` object:
| Field | Description |
|---|---|
| `metadata` | Stats: `totalElements`, `maxDepth`, `totalTextNodes`, etc. |
| `body` | The root `UnifiedElement` (usually the `<body>` tag). |
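
The `metadata` stats can be cross-checked by walking `body` yourself. The sketch below is an illustration under stated assumptions: it runs on a hand-written tree shaped like the element structure described in this README, counts the root as depth 1, and counts every non-empty `textContent` as one text node. The library may count differently. `collectStats` is a hypothetical helper.

```javascript
// Hypothetical helper, not part of @osmn-byhn/htmlparser.
// Recursively walks an element tree and accumulates simple stats.
function collectStats(
  element,
  depth = 1, // assumption: the root element is depth 1
  stats = { totalElements: 0, maxDepth: 0, totalTextNodes: 0 }
) {
  stats.totalElements += 1;
  stats.maxDepth = Math.max(stats.maxDepth, depth);
  // Assumption: a non-empty textContent counts as one text node.
  if (element.textContent) stats.totalTextNodes += 1;
  for (const child of element.children ?? []) {
    collectStats(child, depth + 1, stats);
  }
  return stats;
}

// Hand-written sample tree, standing in for result.body.
const sampleBody = {
  tag: "body",
  children: [
    { tag: "h1", textContent: "Title", children: [] },
    { tag: "div", children: [{ tag: "button", textContent: "Click Me", children: [] }] },
  ],
};

console.log(collectStats(sampleBody));
// { totalElements: 4, maxDepth: 3, totalTextNodes: 2 }
```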
Every element in the tree has this structure:
```json
{
  "tag": "div",
  "id": "main-container",
  "class": "active primitive",
  "attrs": { "data-custom": "value" },
  "inlineStyle": {
    "color": "red",
    "font-size": "16px"
  },
  "events": {
    "click": {
      "handler": "myFunc()",
      "function": "function myFunc() { ... }"
    }
  },
  "children": [ ... ],
  "textContent": "Hello World"
}
```

- Web Scraping: Extract data from modern web pages while keeping the styling info associated with the data points.
- LLM / AI Processing: Convert messy HTML into a structured JSON format that AI can easily understand and reason about.
- UI-to-Code: Build tools that convert existing websites into React/Vue/Tailwind components, with all styles and logic attached to each element.
- Automated Audits: Programmatically check if elements have specific styles or correctly mapped event handlers.
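
As a sketch of the Automated Audits use case, the snippet below walks a tree and reports event handlers whose function body was not resolved. It assumes an unresolved handler leaves `function` empty or missing, which this README does not state. `findUnresolvedHandlers` and the sample tree are hypothetical.

```javascript
// Hypothetical audit helper, not part of @osmn-byhn/htmlparser.
// Returns one entry per event whose "function" field is empty or missing.
function findUnresolvedHandlers(element, path = element.tag, issues = []) {
  for (const [eventName, event] of Object.entries(element.events ?? {})) {
    if (!event.function) {
      issues.push({ path, event: eventName, handler: event.handler });
    }
  }
  (element.children ?? []).forEach((child, index) => {
    // Build a readable path such as "body > a[1]".
    findUnresolvedHandlers(child, `${path} > ${child.tag}[${index}]`, issues);
  });
  return issues;
}

// Hand-written sample tree, standing in for result.body.
const auditTree = {
  tag: "body",
  children: [
    {
      tag: "button",
      events: { click: { handler: "sayHi()", function: "function sayHi() { console.log('Hi!'); }" } },
      children: [],
    },
    { tag: "a", events: { click: { handler: "track()" } }, children: [] },
  ],
};

// Reports a single issue: the click handler "track()" at "body > a[1]".
console.log(findUnresolvedHandlers(auditTree));
```

The same traversal can be adapted to style checks, for example by comparing `inlineStyle` values against an expected set.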
MIT © osmn-byhn