ChatLLM
An LLM (Large Language Model) is a next token generation model that generates the most probable next token given a set of tokens.
This is a rapidly developing research area and the definitions of the model types have some nuances, but broadly speaking, they come in three flavours:
- Foundation - Trained on a massive corpus of text from scratch
- Instruct - Foundation model, fine-tuned on instruction following (eg: text-davinci-003)
- Chat - Instruct models, fine-tuned on chatting with humans (eg: gpt-3.5-turbo)
The Instruct- and Chat-based fine-tunes are the most useful for application
development. The classes associated with them in Embedia are LLM and
ChatLLM respectively.
ChatLLM is an abstract class. Inherit from this class and define the _reply
method. The _reply method should take in a string as the input prompt, send it
to the large language model fine-tuned on chat and return the reply. To use
ChatLLM, call the class instance like a function with the input prompt as the
argument.
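As a quick illustration of this pattern, here is a minimal, hypothetical subclass (the EchoChatLLM class below is made up for demonstration and calls no real model):

```python
import asyncio

from embedia import ChatLLM


class EchoChatLLM(ChatLLM):
    """Toy ChatLLM that echoes the prompt instead of calling a real model."""

    async def _reply(self, prompt):
        # A real subclass would send the prompt (or the whole chat history)
        # to a chat-tuned model here and return its reply as a string.
        return f"You said: {prompt}"


if __name__ == '__main__':
    chatllm = EchoChatLLM()
    reply = asyncio.run(chatllm('Hello there'))
    print(reply)  # -> "You said: Hello there"
```

Real subclasses, like the OpenAI-backed ones in the Usage section below, follow the same shape but call an actual chat model inside _reply.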
You can convert an LLM to a ChatLLM (Learn more about: Converting LLM to
ChatLLM)
Attributes
- tokenizer (Tokenizer, Optional): Used for counting the number of tokens in the prompt and response.
- max_input_tokens (int, Optional): Used for checking that the sum of all the message contents in chat_history is less than max_input_tokens.
- chat_history (List[Message]): Contains all the messages sent and received by the ChatLLM instance. It is automatically initialized with an empty list when the class instance is created. (Learn more about: Message schema)
- llm (Optional[LLM]): Contains the LLM instance if the ChatLLM object was created using the from_llm classmethod. Otherwise, it is None.
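Building on the EchoChatLLM sketch above, the snippet below shows how the optional constructor arguments can be passed and how these attributes can be inspected. The WhitespaceTokenizer is a toy stand-in; it assumes the Tokenizer base class only needs _tokenize to return a sequence whose length is the token count (a real tiktoken-based tokenizer appears later on this page), and the max_input_tokens value of 4096 is an arbitrary example.

```python
import asyncio

from embedia import ChatLLM, Tokenizer


class WhitespaceTokenizer(Tokenizer):
    """Toy tokenizer: treats whitespace-separated words as tokens."""

    async def _tokenize(self, text):
        return text.split()


class EchoChatLLM(ChatLLM):
    def __init__(self):
        # Both keyword arguments are optional.
        super().__init__(tokenizer=WhitespaceTokenizer(), max_input_tokens=4096)

    async def _reply(self, prompt):
        return f"You said: {prompt}"


if __name__ == '__main__':
    chatllm = EchoChatLLM()
    print(chatllm.chat_history)       # starts as an empty list
    print(chatllm.llm)                # None, since from_llm was not used
    asyncio.run(chatllm('Hello'))
    print(len(chatllm.chat_history))  # 2: the user prompt and the assistant reply
```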
Methods
_reply (abstract): Implement this method to generate the reply given a prompt. Do not call this method directly. Instead, use the __call__ method.
- Input: Optional[str]
- Output: str
The input prompt is kept optional in the _reply method because a Message
object with the role attribute set to MessageRole.user and the content
attribute set to the input prompt will automatically be added to the
chat_history when the __call__ method is called with the input prompt.
__call__: Internally calls the _reply method. Use this method by calling the class instance like a function with the input text as the argument.
- Input: str
- Output: str
- Adds a Message object with the role attribute set to MessageRole.user and the content attribute set to the input prompt to the chat_history before calling the _reply method.
- Counts the number of tokens in the system prompt, input prompt, and output reply if the tokenizer argument is passed to the constructor.
- Checks that the sum of all the message contents in chat_history is less than max_input_tokens if the max_input_tokens argument is passed to the constructor.
- Publishes a ChatLLMStart event before calling the _reply method.
- The _reply method is called with the input prompt as the argument if it is defined to take it. Else, it is called without any arguments.
- Publishes a ChatLLMEnd event after calling the _reply method.
- Adds a Message object with the role attribute set to MessageRole.assistant and the content attribute set to the output reply to the chat_history after calling the _reply method.
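For intuition, here is a stripped-down toy that mirrors the call sequence listed above using plain dicts. It is not Embedia's actual implementation; the token counting and event publishing steps are only noted in comments.

```python
# Conceptual sketch of the __call__ flow described above (not Embedia's code).
import asyncio


class ToyChat:
    def __init__(self):
        self.chat_history = []

    async def _reply(self, prompt):
        return f"echo: {prompt}"

    async def __call__(self, prompt):
        # 1. Add the user prompt to the history (Embedia adds a Message object).
        self.chat_history.append({'role': 'user', 'content': prompt})
        # 2. (Embedia counts tokens and publishes a ChatLLMStart event here.)
        reply = await self._reply(prompt)
        # 3. (Embedia publishes a ChatLLMEnd event here.)
        # 4. Add the assistant reply to the history.
        self.chat_history.append({'role': 'assistant', 'content': reply})
        return reply


if __name__ == '__main__':
    print(asyncio.run(ToyChat()('Hi')))
```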
If the ChatLLM object is created using the from_llm classmethod, instead
of calling the _reply method, it calls the _complete method of the LLM
object with the following prompt structure:
system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:

set_system_prompt: This erases the chat_history and sets a Message object with the role set to MessageRole.system and the content set to the provided system prompt as the first message in the chat_history.
- Input: str
- Output: None
- Publishes a ChatLLMInit event after adding the system prompt to the chat history.
save_chat: Saves the chat_history in a pickle file.
- Input: str (path to the pickle file)
- Output: None

load_chat: Loads the chat_history from a pickle file.
- Input: str (path to the pickle file)
- Output: None

from_llm (classmethod): Converts an LLM instance into a ChatLLM instance.
- Input: LLM (instance of your LLM subclass)
- Output: ChatLLM (instance of ChatLLM)
Usage
Basic Usage
A ChatLLM might have a variety of use cases in your web app. One example is
adding a chatbot with a certain personality, like a physics teacher or a content
writer.
You can connect any ChatLLM to Embedia. It might be an open-source model like Llama-2, Vicuna, Falcon, etc. or a paid API from OpenAI, Google, Anthropic, etc.
Make sure to connect instruct-based models to LLM and chat-based models to ChatLLM.
Let's connect OpenAI's gpt-3.5-turbo model to Embedia. Since OpenAI's
ChatCompletion API is stateless, we'll need to pass it the entire chat history
instead of just the current prompt.
This functionality would be built differently if using Google's PaLM 2 model, for example. At the time of writing, the google-generativeai library keeps a copy of the entire chat history itself, so we'd need to pass it only the current prompt (see the sketch just below).
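For such stateful clients, the _reply override would forward only the incoming prompt instead of self.chat_history. Here is a minimal, hedged sketch; the FakeStatefulSession class and its send_message method are hypothetical placeholders standing in for a provider SDK object, not a real library API.

```python
import asyncio

from embedia import ChatLLM


class FakeStatefulSession:
    """Stand-in for a provider SDK object that keeps its own chat history."""

    def __init__(self):
        self.turns = []

    async def send_message(self, prompt):
        self.turns.append(prompt)      # the SDK remembers earlier turns itself
        return f"(reply #{len(self.turns)})"


class StatefulChatLLM(ChatLLM):
    def __init__(self, session):
        super().__init__()
        self.session = session

    async def _reply(self, prompt):
        # Only the current prompt is forwarded; the session already holds the
        # earlier messages.
        return await self.session.send_message(prompt)


if __name__ == '__main__':
    chatllm = StatefulChatLLM(FakeStatefulSession())
    asyncio.run(chatllm('What is the capital of France?'))
```

The OpenAI example below takes the opposite approach and re-sends the full chat_history on every call.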
```python
import asyncio
import os

import openai

from embedia import ChatLLM


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
```
Running the above code will print the following output because there are two
events published internally, namely: ChatLLMStart and ChatLLMEnd. (Learn
more about: Publish-Subscribe Event System)
[time: 2023-09-24T06:31:26.402688+00:00] [id: 140338366051664] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:31:27.507751+00:00] [id: 140338366051664] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.

Saving and loading chat_history
You can save and load the chat_history variable in a pickle file by using
the save_chat and load_chat methods respectively.
```python
import asyncio
import os

import openai

from embedia import ChatLLM


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
    reply = asyncio.run(chatllm('What is the capital of Italy?'))
    asyncio.run(chatllm.save_chat('openai_chatllm.pkl'))
    asyncio.run(chatllm.load_chat('openai_chatllm.pkl'))
    assert os.path.exists('openai_chatllm.pkl')
    print(chatllm.chat_history)
```
Running the above code will print the following output:
[time: 2023-09-24T06:36:25.321676+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:36:26.326593+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.
[time: 2023-09-24T06:36:26.329971+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of Italy?
[time: 2023-09-24T06:36:27.316278+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of Italy is Rome.
[Message(role=<MessageRole.user: 'user'>, content='What is the capital of France?', id='42a95b2e-89b4-4638-ac79-60633604e2a2', created_at='2023-09-24 06:36:25.321613+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of France is Paris.', id='6a756ef5-3a1a-4d77-8126-3c31f468feba', created_at='2023-09-24 06:36:26.326527+00:00'), Message(role=<MessageRole.user: 'user'>, content='What is the capital of Italy?', id='1d1ea9fc-bac2-40b2-a7ef-580b0c4ce771', created_at='2023-09-24 06:36:26.329854+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of Italy is Rome.', id='ea33491b-1471-4a2d-bf5f-531cf953bef2', created_at='2023-09-24 06:36:27.316237+00:00')]

Adding a system prompt
You can set the system prompt for the ChatLLM subclass by using its
set_system_prompt method. This erases the chat_history and sets the provided
system prompt as the first message in the chat_history. There are a bunch of
predefined system prompts available in the Persona class. (Learn more about:
Persona prompts) You can also create your
own system prompt.
Using a lower temperature when asking the LLM to write code gives more predictable results.
```python
import asyncio
import os

import openai

from embedia import ChatLLM, Persona


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Running the above code will print the following output:
[time: 2023-09-24T06:53:26.213236+00:00] [id: 140328954393936] [event: ChatLLM Init]
system (None tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T06:53:26.215463+00:00] [id: 140328954393936] [event: ChatLLM Start]
user (None tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T06:53:29.265166+00:00] [id: 140328954393936] [event: ChatLLM End]
assistant (None tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)

Adding the optional Tokenizer
Notice that the number of tokens is None in the above-printed log. This is
because the optional tokenizer argument was not passed to the ChatLLM
constructor. Let's link a Tokenizer in the next example (Learn more about:
Tokenizer)
Note that the way your tokenizer counts the number of tokens might slightly vary from how a service provider (eg: OpenAI) counts them. They might add a few tokens internally for the service to function properly.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Running the above code will print the following output:
[time: 2023-09-24T07:04:30.005131+00:00] [id: 139954984253664] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:04:30.016053+00:00] [id: 139954984253664] [event: ChatLLM Start]
user (11 tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T07:04:33.392050+00:00] [id: 139954984253664] [event: ChatLLM End]
assistant (65 tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)

Adding the optional max_input_tokens parameter
There's also another optional parameter in the ChatLLM constructor called
max_input_tokens. If the sum of all the message contents in chat_history is
greater than max_input_tokens, the class will raise a ValueError.
Note that max_input_tokens will not have any effect if the tokenizer
argument is not passed to the ChatLLM constructor.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer(), max_input_tokens=1)
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content


if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
There will be two messages in the chat_history when the __call__ method is
invoked: one with the system prompt (23 tokens) and the other with the user
prompt (11 tokens). The sum of these two is 34 tokens. Hence, the above code
will throw the following error:

ValueError: Length of input text: 34 token(s) is longer than max_input_tokens: 1

Converting LLM to ChatLLM
You can convert an instance of an LLM subclass into an instance of a ChatLLM
subclass using the from_llm classmethod present in the ChatLLM class. Once
you've converted the LLM instance, you can use it exactly like a ChatLLM
instance.
This is very useful since a lot of LLM service providers (and even open-source models) only provide a next-token generation interface and not a chat interface.
The tokenizer and max_input_tokens parameters behave the same way as they
would for an LLM. Setting the system prompt is also supported for these
kinds of instances.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import LLM, ChatLLM, Persona, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAILLM(LLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt,
                                                     max_tokens=100,
                                                     temperature=0.1)
        return completion.choices[0].text


if __name__ == '__main__':
    llm = OpenAILLM()
    chatllm = ChatLLM.from_llm(llm)
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
```
Internally, Embedia combines all the messages from the chat_history in the
following format:
system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:

This entire string is then sent to the __call__ function of the underlying
LLM. This makes an LLM with a next-token generation interface behave like an
LLM with a chat interface.
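As an illustration of that flattening step (not Embedia's actual source code), the helper below joins role-prefixed messages and appends a trailing "assistant:" marker; the Msg namedtuple stands in for Embedia's Message objects.

```python
from collections import namedtuple

# Illustration only: stand-in for Embedia's Message (role is shown as a plain
# string here; in Embedia it is a MessageRole enum whose value is that string).
Msg = namedtuple('Msg', ['role', 'content'])


def flatten_chat_history(chat_history):
    """Join messages into the role-prefixed prompt format shown above,
    ending with 'assistant:' so the next-token LLM completes the reply."""
    lines = [f"{m.role}: {m.content}" for m in chat_history]
    lines.append("assistant:")
    return "\n".join(lines)


history = [
    Msg('system', 'You are an expert in writing Python code.'),
    Msg('user', 'Count the number of python code lines in the current folder'),
]
print(flatten_chat_history(history))
```

The trailing "assistant:" line is what prompts the next-token model to produce the assistant's reply as a plain completion.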
Running the above code will trigger a ChatLLMInit event when the system prompt is
set, and then an LLMStart and an LLMEnd event when the __call__ method of the LLM is
called.
[time: 2023-09-24T07:30:30.641946+00:00] [id: 139949812477728] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:30:30.644154+00:00] [id: 139949834272432] [event: LLM Start]
Prompt (43 tokens):
system: You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
user: Count the number of python code lines in the current folder
assistant:
[time: 2023-09-24T07:30:33.204390+00:00] [id: 139949834272432] [event: LLM End]
Completion (69 tokens):
import os
def count_lines(path):
    count = 0
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.py'):
                with open(os.path.join(root, file)) as f:
                    count += len(f.readlines())
    return count
print(count_lines('.'))