SLM For Template Extraction With Ollama — NuExtract, a family of SLMs fine-tuned for templated data extraction

You can use Ollama to run this cool set of new local models, tuned on Qwen and Phi-3, that Simon Willison just noted:

ollama run sroecker/nuextract-tiny-v1.5

And verify it’s there with

ollama list
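
If you'd rather fetch the model without dropping into an interactive session, ollama pull does the same download:

ollama pull sroecker/nuextract-tiny-v1.5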

And as a refresher, this is how you call Ollama locally; note that sampling parameters such as temperature go inside the options object:

curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 100
  }
}'
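
Because "stream" is false, the whole completion comes back as a single JSON object, with the generated text in its response field. A quick sketch of pulling that field out with jq (the same trick used on the extraction output further down):

curl -s -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "temperature": 0.7
  }
}' | jq -r '.response'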

And for this model you'll want to set the temperature to 0.0. Put the following prompt in a file called template.txt:

<|input|>
### Template:
{
    "Model": {
        "Name": "",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": []
    },
    "Usage": {
        "Use case": [],
        "Licence": ""
    }
}

### Text:
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on human and
automated benchmarks. Our models are released under the Apache 2.0 license.
Code: <https://github.com/mistralai/mistral-src>
Webpage: <https://mistral.ai/news/announcing-mistral-7b/>

<|output|>
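
To reuse this for your own documents, only the JSON template and the text between ### Text: and <|output|> change. Here's a minimal sketch of assembling such a prompt from two files (schema.json and doc.txt are hypothetical file names for this example):

# Build a NuExtract prompt from a JSON template and a source text.
# schema.json and doc.txt are placeholder names for this sketch.
{
  echo "<|input|>"
  echo "### Template:"
  cat schema.json
  echo
  echo "### Text:"
  cat doc.txt
  echo
  echo "<|output|>"
} > template.txt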

And when you run it, do this

curl -s -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d "$(jq -n --arg prompt "$(cat template.txt)" --arg model "sroecker/nuextract-tiny-v1.5" --argjson stream false '{
    model: $model,
    prompt: $prompt,
    stream: $stream,
    options: {temperature: 0.0}
}')"

The output you get is:

### Template:{"Model":{"Name":"","Number of parameters":"","Number of max token":"","Architecture":[]},"Usage":{"Use case":[],"Licence":""}}### Text:Mistral 7B, a 7\textquotesingle billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms the best open 13B model (Llama 2) across all evaluated benchmarks, and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B \textquotesingle Instruct, that surpasses Llama 2 13B \textquotesingle chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license. Code: <https://github.com/mistralai/mistral-src> Webpage: <https://mistral.ai/news/announcing-mistral-7b/>

To pull out just the filled-in template from the response, pipe it through jq:

curl -s -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d "$(jq -n --arg prompt "$(cat template.txt)" --arg model "sroecker/nuextract-tiny-v1.5" --argjson stream false '{
    model: $model,
    prompt: $prompt,
    stream: $stream,
    options: {temperature: 0.0}
}')" | jq -r '.response' | jq

Which gives you this

{
  "Model": {
    "Name": "Mistral 7B",
    "Number of parameters": "7–billion",
    "Number of max token": "",
    "Architecture": [
      "grouped-query attention (GQA)",
      "sliding window attention (SWA)"
    ]
  },
  "Usage": {
    "Use case": [],
    "Licence": "Apache 2.0"
  }
}
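
If you end up running this a lot, the whole pipeline wraps neatly into a small shell function (a sketch; the function name and default file are my own):

# Hypothetical helper: send a NuExtract prompt file to Ollama and
# pretty-print just the extracted JSON.
nuextract() {
  local prompt_file="${1:-template.txt}"
  curl -s -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg prompt "$(cat "$prompt_file")" --arg model "sroecker/nuextract-tiny-v1.5" '{
        model: $model,
        prompt: $prompt,
        stream: false,
        options: {temperature: 0.0}
    }')" | jq -r '.response' | jq
}

# Usage:
# nuextract template.txt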