Releases: ollama/ollama

v0.3.9

31 Aug 19:26
a1cef4d

What's Changed

  • Fixed an error that would occur when running Ollama on Linux machines with the ARM architecture
  • Ollama will now show an improved error message when attempting to run unsupported models
  • Fixed issue where Ollama would not auto-detect the chat template for Llama 3.1 models
  • OLLAMA_HOST will now work with URLs that contain paths (see the sketch below)
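
A minimal sketch of the OLLAMA_HOST change, assuming the ollama Python client accepts the same host values as the CLI; the reverse-proxy URL below is hypothetical:

import os
import ollama

# Ollama served behind a reverse proxy under a path rather than at the root
# (hypothetical URL). OLLAMA_HOST values that include a path are now accepted.
os.environ['OLLAMA_HOST'] = 'https://example.com/ollama'

client = ollama.Client(host=os.environ['OLLAMA_HOST'])
print(client.list())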

Full Changelog: v0.3.8...v0.3.9

v0.3.8

28 Aug 01:09
93ea924

What's Changed

  • Fixed an error where the ollama CLI could not be found on the PATH after upgrading Ollama on Windows

Full Changelog: v0.3.7...v0.3.8

v0.3.7

20 Aug 17:45
0f92b19

New Models

  • Hermes 3: Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research, which includes support for tool calling.
  • Phi 3.5: A lightweight AI model with 3.8 billion parameters whose performance overtakes similarly sized and larger models.
  • SmolLM: A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset.

What's Changed

  • CUDA 12 support: improving performance by up to 10% on newer NVIDIA GPUs
  • Improved performance of ollama pull and ollama push on slower connections
  • Fixed issue where setting OLLAMA_NUM_PARALLEL would cause models to be reloaded on lower VRAM systems
  • Ollama on Linux is now distributed as a tar.gz file, which contains the ollama binary along with required libraries.

Full Changelog: v0.3.6...v0.3.7

v0.3.6

13 Aug 20:27
4c4fe3f

What's Changed

  • Fixed issue where /api/embed would return an error instead of loading the model when the input field was not provided.
  • ollama create can now import Phi-3 models from Safetensors
  • Added progress information to ollama create when importing GGUF files
  • Ollama will now import GGUF files faster by minimizing file copies

Full Changelog: v0.3.5...v0.3.6

v0.3.5

11 Aug 20:50
15c2d8f

What's Changed

  • Fixed "Incorrect function" error when downloading models on Windows
  • Fixed issue where temporary files would not be cleaned up
  • Fixed a rare error on startup caused by invalid model data
  • Ollama will now provide an error instead of crashing on Windows when running models that are too large to fit into total memory

Full Changelog: v0.3.4...v0.3.5

v0.3.4

06 Aug 16:28
de4fc29

New embedding models

  • BGE-M3: a large embedding model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
  • BGE-Large: a large embedding model trained in English.
  • Paraphrase-Multilingual: A multilingual embedding model trained on parallel data for 50 languages.

New embedding API with batch support

Ollama now supports a new API endpoint /api/embed for embedding generation:

curl http://localhost:11434/api/embed -d '{
  "model": "all-minilm",
  "input": ["Why is the sky blue?", "Why is the grass green?"]
}'

This API endpoint supports new features:

  • Batches: generate embeddings for several documents in one request
  • Normalized embeddings: embeddings are now normalized, improving similarity results
  • Truncation: inputs longer than the model's context length are truncated by default; setting the new truncate parameter to false returns an error instead (see the sketch after this list)
  • Metrics: responses include load_duration, total_duration and prompt_eval_count metrics
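
As a sketch (not from the official docs), the same request can be made from Python with the requests library; the model name and inputs mirror the curl example above, and truncate is set to false so over-length inputs raise an error rather than being silently cut:

import requests

# Batch request to the /api/embed endpoint shown above. Setting "truncate"
# to false makes the server return an error if an input exceeds the model's
# context length instead of truncating it.
resp = requests.post(
    'http://localhost:11434/api/embed',
    json={
        'model': 'all-minilm',
        'input': ['Why is the sky blue?', 'Why is the grass green?'],
        'truncate': False,
    },
)
data = resp.json()

print(len(data['embeddings']))   # one embedding per input, in order
print(data['total_duration'], data['load_duration'], data['prompt_eval_count'])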

See the API documentation for more details and examples.

What's Changed

  • Fixed initial slow download speeds on Windows
  • NUMA support will now be autodetected by Ollama to improve performance
  • Fixed issue where the /api/embed endpoint would sometimes return embedding results out of order

Full Changelog: v0.3.3...v0.3.4

v0.3.3

02 Aug 16:37
ce1fb44

What's Changed

  • The /api/embed endpoint now returns statistics: total_duration, load_duration, and prompt_eval_count
  • Added usage metrics to the /v1/embeddings OpenAI compatibility API (see the sketch after this list)
  • Fixed issue where /api/generate would respond with an empty string if provided a context
  • Fixed issue where /api/generate would return an incorrect value for context
  • /show modelfile will now render MESSAGE commands correctly
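
For the /v1/embeddings change, a minimal sketch using the official openai Python package pointed at Ollama's OpenAI-compatible endpoint (the api_key value is a placeholder; Ollama does not check it):

from openai import OpenAI

# Point the OpenAI client at Ollama's compatibility endpoint.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

resp = client.embeddings.create(
    model='all-minilm',
    input=['Why is the sky blue?'],
)

# New in this release: usage metrics are populated in the response.
print(resp.usage)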

Full Changelog: v0.3.2...v0.3.3

v0.3.2

01 Aug 01:14
4c14855

What's Changed

  • Fixed issue where ollama pull would not resume download progress
  • Fixed issue where phi3 would report an error on older versions

Full Changelog: v0.3.1...v0.3.2

v0.3.1

30 Jul 04:18
5d66578

Gemma 2

Google's Gemma 2 is now available in a new small 2B parameter model!

ollama run gemma2:2b

New models

  • Gemma 2 2B: A new 2B parameter model by Google DeepMind

What's Changed

  • Added support for the min_p sampling option (see the sketch after this list)
  • ollama create will now autodetect required stop parameters when importing certain models
  • Ollama on Windows will now show better error messages if required files are missing
  • Fixed issue where /save would cause parameters to be saved incorrectly
  • OpenAI-compatible API will now return a finish_reason of tool_calls if a tool call occurred.
  • Performance and reliability improvements when downloading models using ollama pull
  • Ollama's Linux install script will now return a better error on unsupported CUDA versions
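
A minimal sketch of the new min_p option, passed through the standard options object of the /api/generate endpoint (the 0.05 threshold is only an illustrative value):

import requests

# min_p keeps only tokens whose probability is at least min_p times the
# probability of the most likely token.
resp = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'gemma2:2b',
        'prompt': 'Why is the sky blue?',
        'stream': False,
        'options': {'min_p': 0.05},
    },
)
print(resp.json()['response'])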

Full Changelog: v0.3.0...v0.3.1

v0.3.0

25 Jul 01:57
bbf8f10

[Image: Ollama selecting the right tool for the job, holding up a hammer to nail wooden boards]

Tool support

Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.

Example tools include:

  • Functions and APIs
  • Web browsing
  • Code interpreter
  • much more!

To use tools, provide the tools field when using Ollama's Chat API:

import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'What is the weather in Toronto?'}],

    # provide a weather checking tool to the model
    tools=[{
      'type': 'function',
      'function': {
        'name': 'get_current_weather',
        'description': 'Get the current weather for a city',
        'parameters': {
          'type': 'object',
          'properties': {
            'city': {
              'type': 'string',
              'description': 'The name of the city',
            },
          },
          'required': ['city'],
        },
      },
    },
  ],
)

print(response['message']['tool_calls'])
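
A possible follow-up (a hedged sketch, not part of the release notes): execute the requested tool locally and return its output to the model as a tool message. This continues the example above; get_current_weather is a hypothetical local function you would implement yourself.

def get_current_weather(city):
    # Hypothetical local implementation; a real version would call a weather API.
    return f'The current weather in {city} is 20 degrees and sunny.'

available_functions = {'get_current_weather': get_current_weather}

# Rebuild the conversation: the user turn plus the assistant turn containing the tool call.
messages = [
    {'role': 'user', 'content': 'What is the weather in Toronto?'},
    response['message'],
]

for call in response['message'].get('tool_calls') or []:
    function_to_call = available_functions[call['function']['name']]
    result = function_to_call(**call['function']['arguments'])
    # Pass the tool output back to the model as a 'tool' role message.
    messages.append({'role': 'tool', 'content': result})

final_response = ollama.chat(model='llama3.1', messages=messages)
print(final_response['message']['content'])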

New models

  • Llama 3.1: a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes with support for tool calling.
  • Mistral Large 2: Mistral's new 123B flagship model that is significantly more capable in code generation, tool calling, mathematics, and reasoning, with a 128k context window and support for dozens of languages.
  • Firefunction v2: An open weights function calling model based on Llama 3, competitive with GPT-4o function calling capabilities.
  • Llama-3-Groq-Tool-Use: A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use/function calling.

What's Changed

  • Fixed duplicate error message when running ollama create

Full Changelog: v0.2.8...v0.3.0