• Scrape - extract page content as Markdown, JSON, HTML, or a screenshot
• Crawl - follow every link on a page and collect the content
• Map - scan a site and return a list of all its URLs
• Search - search the web and return the content of the pages found
• Extract - pull structured data from one page or thousands
• Deals with bottlenecks
• Can click, scroll, wait, and log in
• Parses PDF, DOCX, and images
• Configurable: which tags to exclude, how deep to crawl, which headers to send
• You can now submit thousands of links at once - they are processed asynchronously
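To make the configuration options above concrete, here is a minimal sketch of what the request bodies might look like. It assumes the service is Firecrawl's hosted REST API (the post only links to it). The base URL, the endpoint names, and the field names (`formats`, `excludeTags`, `headers`, `maxDepth`, `limit`) are assumptions to check against the official docs, and the helper functions are hypothetical. Nothing is sent over the network; the code only builds the requests.

```python
import json
import urllib.request

# Assumed base URL for the hosted API; verify against the official docs.
API_BASE = "https://api.firecrawl.dev/v1"


def build_scrape_payload(url, formats=("markdown",), exclude_tags=(), headers=None):
    """Build the JSON body for scraping a single page (field names assumed)."""
    payload = {"url": url, "formats": list(formats)}
    if exclude_tags:
        # Tags to strip from the page before conversion.
        payload["excludeTags"] = list(exclude_tags)
    if headers:
        # HTTP headers to send along with the page request.
        payload["headers"] = dict(headers)
    return payload


def build_crawl_payload(url, max_depth=2, limit=100):
    """Build the JSON body for a crawl: how deep to go and how many pages."""
    return {"url": url, "maxDepth": max_depth, "limit": limit}


def build_request(endpoint, payload, api_key):
    """Wrap a payload in a POST request object without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually run a scrape, pass the result of `build_request("scrape", payload, api_key)` to `urllib.request.urlopen` with a real API key. Keeping payload construction separate from sending makes the options easy to inspect before any credits are spent.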
They recently released something new: Firestarter.
A platform for building bots on your own data (see video)
You can scrape a site, train a bot on it, and have it answer from your sources
Demo: tools.firecrawl.dev/firestarter
Github: github.com/mendableai/firestarter
🆔 @Max_Academy