Extracts text from an HTML URL

This task downloads a web page and extracts its text and title.

Input

The task downloads a web page and extracts its text and title. If multiple URLs are provided, only the first 20 are used.

[1.0.0]:
First release
Version  AI Model           Created     Link
1.0.0    intern_translator  28.03.2023

API

The REST API lets you call the tool at the same cost as running the tool. Generate a personal access token before using the REST API.

Parameters

  • url (URL): The URL(s) of the HTML page(s) to extract text from
  • user_agent (User Agent): The user agent string sent to the website when the page is downloaded
  • max_content_length (Max Content Length): The maximum length of the content that is returned
  • Call the REST API with cURL (the URL is quoted so the shell does not interpret "&" or parentheses)
    curl -v -H "Authorization: Bearer PERSONAL_ACCESS_TOKEN" "https://api.anysolve.ai/rest/v1/intern-web-extract-text-from-url/1.0.0?url=https%3A%2F%2Fde.wikipedia.org%2Fwiki%2FJayne_Mansfield&user_agent=Mozilla%2F5.0%20(Windows%20NT%2010.0%3B%20Win64%3B%20x64%3B%20rv%3A112.0)%20Gecko%2F20100101%20Firefox%2F112.0&max_content_length=4000"
  • Install the package with pip
    python3 -m pip install anysolve
  • Run in Python 3
    import os
    from anysolve import AnySolve
    anysolve_token = os.environ.get('ANYSOLVE_PERSONAL_ACCESS_TOKEN') # Read your personal access token from the environment
    client = AnySolve(anysolve_token)
    res = client.run('intern-web-extract-text-from-url','1.0.0', {'url': 'https://de.wikipedia.org/wiki/Jayne_Mansfield', 'user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:112.0) Gecko/20100101 Firefox/112.0', 'max_content_length': '4000'})
    print(res)
  • Coming soon: Within AnySolve ChatComplete prompts you can use the following command to execute the task:
    /run('intern-web-extract-text-from-url','1.0.0', url='https://de.wikipedia.org/wiki/Jayne_Mansfield', user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:112.0) Gecko/20100101 Firefox/112.0', max_content_length='4000')
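
The query string in the cURL call has to be percent-encoded by hand, which is easy to get wrong. As a minimal sketch, assuming only the endpoint and the three parameter names shown above, the request URL could be built with the Python standard library. The helper name build_request_url is hypothetical and not part of the anysolve package.

```python
from urllib.parse import urlencode, quote

# Endpoint copied from the cURL example above.
BASE_URL = "https://api.anysolve.ai/rest/v1/intern-web-extract-text-from-url/1.0.0"


def build_request_url(url, user_agent, max_content_length=4000):
    """Return the endpoint URL with percent-encoded query parameters.

    Hypothetical helper for illustration; it only builds the URL and
    does not send a request.
    """
    params = {
        "url": url,
        "user_agent": user_agent,
        "max_content_length": str(max_content_length),
    }
    # quote_via=quote encodes spaces as %20 (as in the cURL example)
    # instead of the "+" that urlencode produces by default.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_request_url(
        "https://de.wikipedia.org/wiki/Jayne_Mansfield",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:112.0) Gecko/20100101 Firefox/112.0",
    ))
```

The printed URL can be passed to cURL in double quotes together with the Authorization header shown above. Unlike the hand-written example, this sketch also encodes the parentheses in the user agent, which is equally valid.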