ChatLLM
An LLM (Large Language Model) is a next token generation model that generates the most probable next token given a set of tokens.
This is a rapidly developing research area and the definitions of the model types have some nuances, but broadly speaking, they come in three flavours:
- Foundation - Trained on a massive corpus of text from scratch
- Instruct - Foundation model, fine-tuned on instruction following (eg: text-davinci-003)
- Chat - Instruct models, fine-tuned on chatting with humans (eg: gpt-3.5-turbo)
The Instruct- and Chat-based fine-tunes are the most useful for application
development. The classes associated with them in Embedia are LLM and
ChatLLM respectively.
ChatLLM is an abstract class. Inherit from this class and define the _reply
method. The _reply method should take in a string as the input prompt, send it
to the large language model fine-tuned on chat and return the reply. To use
ChatLLM, call the class instance like a function with the input prompt as the
argument.
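As a quick illustration of this pattern, here is a minimal, hypothetical subclass (the EchoChatLLM class below is made up for demonstration and calls no real model):

```python
import asyncio

from embedia import ChatLLM


class EchoChatLLM(ChatLLM):
    """Toy ChatLLM that echoes the prompt instead of calling a real model."""

    async def _reply(self, prompt):
        # A real subclass would send the prompt (or the whole chat history)
        # to a chat-tuned model here and return its reply as a string.
        return f"You said: {prompt}"


if __name__ == '__main__':
    chatllm = EchoChatLLM()
    reply = asyncio.run(chatllm('Hello there'))
    print(reply)  # -> "You said: Hello there"
```

Real subclasses, like the OpenAI-backed ones in the Usage section below, follow the same shape but call an actual chat model inside _reply.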
You can convert an LLM to a ChatLLM (Learn more about: Converting LLM to
ChatLLM)
Attributes
- tokenizer (Tokenizer, Optional): Used for counting the number of tokens in the prompt and response.
- max_input_tokens (int, Optional): Used for checking that the sum of all the message contents in chat_history is less than max_input_tokens.
- chat_history (List[Message]): Contains all the messages sent and received by the ChatLLM instance. It is automatically initialized with an empty list when the class instance is created. (Learn more about: Message schema)
- llm (Optional[LLM]): Contains the LLM instance if the ChatLLM object was created using the from_llm classmethod. Otherwise, it is None.
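Building on the EchoChatLLM sketch above, the snippet below shows how the optional constructor arguments can be passed and how these attributes can be inspected. The WhitespaceTokenizer is a toy stand-in; it assumes the Tokenizer base class only needs _tokenize to return a sequence whose length is the token count (a real tiktoken-based tokenizer appears later on this page), and the max_input_tokens value of 4096 is an arbitrary example.

```python
import asyncio

from embedia import ChatLLM, Tokenizer


class WhitespaceTokenizer(Tokenizer):
    """Toy tokenizer: treats whitespace-separated words as tokens."""

    async def _tokenize(self, text):
        return text.split()


class EchoChatLLM(ChatLLM):
    def __init__(self):
        # Both keyword arguments are optional.
        super().__init__(tokenizer=WhitespaceTokenizer(), max_input_tokens=4096)

    async def _reply(self, prompt):
        return f"You said: {prompt}"


if __name__ == '__main__':
    chatllm = EchoChatLLM()
    print(chatllm.chat_history)       # starts as an empty list
    print(chatllm.llm)                # None, since from_llm was not used
    asyncio.run(chatllm('Hello'))
    print(len(chatllm.chat_history))  # 2: the user prompt and the assistant reply
```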
Methods
_reply (abstract): Implement this method to generate the reply given a prompt. Do not call this method directly. Instead, use the __call__ method.
- Input: Optional[str]
- Output: str
The input prompt is kept optional in the _reply method because a Message
object with the role attribute set to MessageRole.user and the content
attribute set to the input prompt will automatically be added to the
chat_history when the __call__ method is called with the input prompt.
__call__: Internally calls the _reply method. Use this method by calling the class instance like a function with the input text as the argument.
- Input: str
- Output: str
- Adds a Message object with the role attribute set to MessageRole.user and the content attribute set to the input prompt to the chat_history before calling the _reply method.
- Counts the number of tokens in the system prompt, input prompt, and output reply if the tokenizer argument is passed to the constructor.
- Checks that the sum of all the message contents in chat_history is less than max_input_tokens if the max_input_tokens argument is passed to the constructor.
- Publishes a ChatLLMStart event before calling the _reply method.
- The _reply method is called with the input prompt as the argument if it is defined to take it. Else, it is called without any arguments.
- Publishes a ChatLLMEnd event after calling the _reply method.
- Adds a Message object with the role attribute set to MessageRole.assistant and the content attribute set to the output reply to the chat_history after calling the _reply method.
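For intuition, here is a stripped-down toy that mirrors the call sequence listed above using plain dicts. It is not Embedia's actual implementation; the token counting and event publishing steps are only noted in comments.

```python
# Conceptual sketch of the __call__ flow described above (not Embedia's code).
import asyncio


class ToyChat:
    def __init__(self):
        self.chat_history = []

    async def _reply(self, prompt):
        return f"echo: {prompt}"

    async def __call__(self, prompt):
        # 1. Add the user prompt to the history (Embedia adds a Message object).
        self.chat_history.append({'role': 'user', 'content': prompt})
        # 2. (Embedia counts tokens and publishes a ChatLLMStart event here.)
        reply = await self._reply(prompt)
        # 3. (Embedia publishes a ChatLLMEnd event here.)
        # 4. Add the assistant reply to the history.
        self.chat_history.append({'role': 'assistant', 'content': reply})
        return reply


if __name__ == '__main__':
    print(asyncio.run(ToyChat()('Hi')))
```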
If the ChatLLM object is created using the from_llm classmethod, instead
of calling the _reply method, it calls the _complete method of the LLM
object with the following prompt structure:
system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:

set_system_prompt: This erases the chat_history and sets a Message object with the role set to MessageRole.system and the content set to the provided system prompt as the first message in the chat_history.
- Input: str
- Output: None
- Publishes a ChatLLMInit event after adding the system prompt to the chat history.
save_chat: Saves the chat_history in a pickle file.
- Input: str (path to the pickle file)
- Output: None

load_chat: Loads the chat_history from a pickle file.
- Input: str (path to the pickle file)
- Output: None

from_llm (classmethod): Converts an LLM instance into a ChatLLM instance.
- Input: LLM (instance of your LLM subclass)
- Output: ChatLLM (instance of ChatLLM)
Usage
Basic Usage
A ChatLLM might have a variety of use cases in your web app. One example is
adding a chatbot with a certain personality, like a physics teacher or a content
writer.
You can connect any ChatLLM to Embedia. It might be an open-source model like Llama-2, Vicuna, Falcon, etc. or a paid API from OpenAI, Google, Anthropic, etc.
Make sure to connect instruct-based models to LLM and chat-based models to ChatLLM.
Let's connect OpenAI's gpt-3.5-turbo model to Embedia. Since OpenAI's
ChatCompletion API is stateless, we'll need to pass it the entire chat history
instead of just the current prompt.
This functionality would be built differently if using Google's PaLM 2 model, for example. At the time of writing, the google-generativeai library keeps a copy of the entire chat history itself, so we'd need to pass it only the current prompt (see the sketch just below).
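For such stateful clients, the _reply override would forward only the incoming prompt instead of self.chat_history. Here is a minimal, hedged sketch; the FakeStatefulSession class and its send_message method are hypothetical placeholders standing in for a provider SDK object, not a real library API.

```python
import asyncio

from embedia import ChatLLM


class FakeStatefulSession:
    """Stand-in for a provider SDK object that keeps its own chat history."""

    def __init__(self):
        self.turns = []

    async def send_message(self, prompt):
        self.turns.append(prompt)      # the SDK remembers earlier turns itself
        return f"(reply #{len(self.turns)})"


class StatefulChatLLM(ChatLLM):
    def __init__(self, session):
        super().__init__()
        self.session = session

    async def _reply(self, prompt):
        # Only the current prompt is forwarded; the session already holds the
        # earlier messages.
        return await self.session.send_message(prompt)


if __name__ == '__main__':
    chatllm = StatefulChatLLM(FakeStatefulSession())
    asyncio.run(chatllm('What is the capital of France?'))
```

The OpenAI example below takes the opposite approach and re-sends the full chat_history on every call.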
```python
import asyncio
import os

import openai

from embedia import ChatLLM


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
```
Running the above code will print the following output because there are two
events published internally, namely: ChatLLMStart and ChatLLMEnd. (Learn
more about: Publish-Subscribe Event System)
[time: 2023-09-24T06:31:26.402688+00:00] [id: 140338366051664] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:31:27.507751+00:00] [id: 140338366051664] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.

Saving and loading chat_history
You can save and load the chat_history variable in a pickle file by using
the save_chat and load_chat methods respectively.
```python
import asyncio
import os

import openai

from embedia import ChatLLM


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
    reply = asyncio.run(chatllm('What is the capital of Italy?'))
    asyncio.run(chatllm.save_chat('openai_chatllm.pkl'))
    asyncio.run(chatllm.load_chat('openai_chatllm.pkl'))
    assert os.path.exists('openai_chatllm.pkl')
    print(chatllm.chat_history)
```
Running the above code will print the following output:
[time: 2023-09-24T06:36:25.321676+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:36:26.326593+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.
[time: 2023-09-24T06:36:26.329971+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of Italy?
[time: 2023-09-24T06:36:27.316278+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of Italy is Rome.
[Message(role=<MessageRole.user: 'user'>, content='What is the capital of France?', id='42a95b2e-89b4-4638-ac79-60633604e2a2', created_at='2023-09-24 06:36:25.321613+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of France is Paris.', id='6a756ef5-3a1a-4d77-8126-3c31f468feba', created_at='2023-09-24 06:36:26.326527+00:00'), Message(role=<MessageRole.user: 'user'>, content='What is the capital of Italy?', id='1d1ea9fc-bac2-40b2-a7ef-580b0c4ce771', created_at='2023-09-24 06:36:26.329854+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of Italy is Rome.', id='ea33491b-1471-4a2d-bf5f-531cf953bef2', created_at='2023-09-24 06:36:27.316237+00:00')]

Adding a system prompt
You can set the system prompt for the ChatLLM subclass by using its
set_system_prompt method. This erases the chat_history and sets the provided
system prompt as the first message in the chat_history. There are a bunch of
predefined system prompts available in the Persona class. (Learn more about:
Persona prompts) You can also create your
own system prompt.
Using a lower temperature when asking the LLM to write code gives more predictable results.
```python
import asyncio
import os

import openai

from embedia import ChatLLM, Persona


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Running the above code will print the following output:
[time: 2023-09-24T06:53:26.213236+00:00] [id: 140328954393936] [event: ChatLLM Init]
system (None tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T06:53:26.215463+00:00] [id: 140328954393936] [event: ChatLLM Start]
user (None tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T06:53:29.265166+00:00] [id: 140328954393936] [event: ChatLLM End]
assistant (None tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)

Adding the optional Tokenizer
Notice that the number of tokens is None in the above-printed log. This is
because the optional tokenizer argument was not passed to the ChatLLM
constructor. Let's link a Tokenizer in the next example (Learn more about:
Tokenizer)
Note that the way your tokenizer counts the number of tokens might slightly vary from how a service provider (eg: OpenAI) counts them. They might add a few tokens internally for the service to function properly.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Running the above code will print the following output:
[time: 2023-09-24T07:04:30.005131+00:00] [id: 139954984253664] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:04:30.016053+00:00] [id: 139954984253664] [event: ChatLLM Start]
user (11 tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T07:04:33.392050+00:00] [id: 139954984253664] [event: ChatLLM End]
assistant (65 tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)

Adding the optional max_input_tokens parameter
There's also another optional parameter in the ChatLLM constructor called
max_input_tokens. If the sum of all the message contents in chat_history is
greater than max_input_tokens, the class will raise a ValueError.
Note that max_input_tokens will not have any effect if the tokenizer
argument is not passed to the ChatLLM constructor.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer(), max_input_tokens=1)
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
There will be two messages in the chat_history when the __call__ method is
invoked: one with the system prompt (23 tokens) and the other with the user
prompt (11 tokens). The sum of these two is 34 tokens. Hence, the above code
will throw the following error:

ValueError: Length of input text: 34 token(s) is longer than max_input_tokens: 1

Converting LLM to ChatLLM
You can convert an instance of an LLM subclass into an instance of a ChatLLM
subclass using the from_llm classmethod present in the ChatLLM class. Once
you've converted the LLM instance, you can use it exactly like a ChatLLM
instance.
This is very useful since a lot of LLM service providers (and even open-source models) only provide a next-token generation interface and not a chat interface.
The tokenizer and max_input_tokens parameters behave the same way as they
would for an LLM. Setting the system prompt is also supported for these
kinds of instances.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import LLM, ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAILLM(LLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt,
                                                     max_tokens=100,
                                                     temperature=0.1)
        return completion.choices[0].text


if __name__ == '__main__':
    llm = OpenAILLM()
    chatllm = ChatLLM.from_llm(llm)
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Internally, Embedia combines all the messages from the chat_history in the
following format:
system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:

This entire string is then sent to the __call__ function of the underlying
LLM. This makes an LLM with a next-token generation interface behave like an
LLM with a chat interface.
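As an illustration of that flattening step (not Embedia's actual source code), the helper below joins role-prefixed messages and appends a trailing "assistant:" marker; the Msg namedtuple stands in for Embedia's Message objects.

```python
from collections import namedtuple

# Illustration only: stand-in for Embedia's Message (role is shown as a plain
# string here; in Embedia it is a MessageRole enum whose value is that string).
Msg = namedtuple('Msg', ['role', 'content'])


def flatten_chat_history(chat_history):
    """Join messages into the role-prefixed prompt format shown above,
    ending with 'assistant:' so the next-token LLM completes the reply."""
    lines = [f"{m.role}: {m.content}" for m in chat_history]
    lines.append("assistant:")
    return "\n".join(lines)


history = [
    Msg('system', 'You are an expert in writing Python code.'),
    Msg('user', 'Count the number of python code lines in the current folder'),
]
print(flatten_chat_history(history))
```

The trailing "assistant:" line is what prompts the next-token model to produce the assistant's reply as a plain completion.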
Running the above code will trigger a ChatLLMInit event when the system prompt is
set, and then an LLMStart and an LLMEnd event when the __call__ method of the LLM is
called.
[time: 2023-09-24T07:30:30.641946+00:00] [id: 139949812477728] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:30:30.644154+00:00] [id: 139949834272432] [event: LLM Start]
Prompt (43 tokens):
system: You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
user: Count the number of python code lines in the current folder
assistant:
[time: 2023-09-24T07:30:33.204390+00:00] [id: 139949834272432] [event: LLM End]
Completion (69 tokens):
import os
def count_lines(path):
    count = 0
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.py'):
                with open(os.path.join(root, file)) as f:
                    count += len(f.readlines())
    return count
print(count_lines('.'))