Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calling api with schema isn't working as intended #1134

Closed
Ala2hhh opened this issue Jan 21, 2025 · 2 comments
Closed

calling api with schema isn't working as intended #1134

Ala2hhh opened this issue Jan 21, 2025 · 2 comments

Comments

@Ala2hhh
Copy link

Ala2hhh commented Jan 21, 2025

using the Extract by JSON Schema example provided in the paper, and also can be found in colab

is not giving a response with the provided schema and the data of the output is in MD rather than json

here is the curl command:

curl -v -X POST https://r.jina.ai/ \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -H "X-No-Cache: true" \
  -d @- <<EOFEOF
  {
    "url": "https://news.ycombinator.com/",
    "jsonSchema": {
      "type": "object",
      "properties": {
        "title": {"type": "string", "description": "News thread title"},
        "url": {"type": "string", "description": "Thread URL"},
        "summary": {"type": "string", "description": "Article summary"},
        "keywords": {"type": "list", "description": "Descriptive keywords"},
        "author": {"type": "string", "description": "Thread author"},
        "comments": {"type": "integer", "description": "Comment count"}
      },
    "required": ["title", "url", "date", "points", "author", "comments"]
    }
  }
EOFEOF

Am I calling the API incorrectly?

@nomagick
Copy link
Member

Hi @Ala2hhh .
You are missing the "respondWith" or "returnFormat" or "engine" parameter.
Specify any of them to "readerlm-v2".
Also, the instruction is very important for the model to behave.

Try this:

curl -v -X POST https://r.jina.ai/ \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -H "X-No-Cache: true" \
  -d @- <<EOFEOF
  {
    "url": "https://news.ycombinator.com/",
    "instruction": "Extract the specified information from a list of news threads and present it in a structured JSON format.",
    "jsonSchema": {
      "type": "object",
      "properties": {
        "title": {"type": "string", "description": "News thread title"},
        "url": {"type": "string", "description": "Thread URL"},
        "summary": {"type": "string", "description": "Article summary"},
        "keywords": {"type": "list", "description": "Descriptive keywords"},
        "author": {"type": "string", "description": "Thread author"},
        "comments": {"type": "integer", "description": "Comment count"}
      },
    "required": ["title", "url", "date", "points", "author", "comments"]
    },
    "respondWith": "readerlm-v2"
  }
EOFEOF

Speaking of this feature, sometimes I find the model crashes by repeating nonsense words or does not yield enough content, in this case, the parameters of the generation, namely repetition_penalty, and potentially others, need to be further tweaked. However, we currently don't expose these parameters to the API.
So for JSON generation, it's better to run the model locally so you can tweak the parameters.

@Ala2hhh
Copy link
Author

Ala2hhh commented Jan 24, 2025

Thank you that was helpful

@Ala2hhh Ala2hhh closed this as completed Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants