TextDoc
TextDoc represents a text document that can be used with the VectorDB and
EmbeddingModel classes. It has built-in helper functions for text processing.
Attributes
- contents (str): The contents of the text document.
- meta (dict, optional): Any metadata related to the text document. Defaults to None.
- id (str, optional): The id of the text document. Defaults to a random UUID.
- created_at (str, optional): The timestamp of the text document. Defaults to the current time in the system's timezone.
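To make the attribute defaults concrete, here is a minimal sketch of a class with the same four fields. `TextDocSketch` is a hypothetical stand-in and not the library's implementation. The ISO 8601 timestamp format is an assumption, since the documentation only says the value is a string.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TextDocSketch:
    """Hypothetical stand-in for TextDoc; not the library's implementation."""

    contents: str
    meta: Optional[dict] = None
    # Random UUID by default, as the attribute list describes.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Current time in the system's timezone; ISO 8601 is an assumed format.
    created_at: str = field(
        default_factory=lambda: datetime.now().astimezone().isoformat()
    )


doc = TextDocSketch(contents="Hello, world.", meta={"source": "example"})
print(doc.id, doc.created_at)
```

Using `default_factory` means each instance gets its own id and timestamp at construction time, rather than sharing one value computed when the class is defined.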
Methods
from_file (classmethod): Create a TextDoc instance from a file.
- Input: path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
- Output: TextDoc
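The from_file signature can be illustrated with a small sketch. `FileDocSketch` is a hypothetical class that mirrors only the documented parameters. Reading the whole file into contents is an assumption about behavior, not something the documentation states.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FileDocSketch:
    """Hypothetical stand-in showing the documented from_file signature."""

    def __init__(self, contents: str, meta: Optional[dict] = None):
        self.contents = contents
        self.meta = meta

    @classmethod
    def from_file(
        cls, path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
    ) -> "FileDocSketch":
        # Assumption: the whole file is read as text with the given encoding.
        return cls(Path(path).read_text(encoding=encoding), meta=meta)


# Write a temporary file, then load it back through the classmethod.
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "note.txt"
    file_path.write_text("first line\nsecond line", encoding="utf-8")
    file_doc = FileDocSketch.from_file(str(file_path), meta={"source": "note.txt"})

print(file_doc.contents)
```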
split_on_separator: Split the contents on a separator and return a list of TextDoc instances.
- Input: separator: str = "\n", strip_after_split: bool = False
- Output: List[TextDoc]
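The effect of the two split_on_separator parameters can be sketched on plain strings. `split_on_separator_sketch` is a hypothetical helper. It assumes that strip_after_split trims whitespace from each piece and that empty pieces are kept. It returns strings, whereas the documented method returns TextDoc instances.

```python
from typing import List


def split_on_separator_sketch(
    contents: str, separator: str = "\n", strip_after_split: bool = False
) -> List[str]:
    """Hypothetical illustration; the real method returns TextDoc instances."""
    pieces = contents.split(separator)
    if strip_after_split:
        # Assumption: stripping removes leading and trailing whitespace per piece.
        pieces = [piece.strip() for piece in pieces]
    return pieces


text = "alpha \n beta\n gamma"
raw = split_on_separator_sketch(text)
clean = split_on_separator_sketch(text, strip_after_split=True)
print(raw)    # ['alpha ', ' beta', ' gamma']
print(clean)  # ['alpha', 'beta', 'gamma']
```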
extract_regex: Extract TextDoc instances from the contents using a regex pattern.
- Input: pattern: str
- Output: List[TextDoc]
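A rough picture of regex extraction, using only Python's standard `re` module. `extract_regex_sketch` is a hypothetical helper that assumes each full match becomes one result. How the real method treats capture groups or metadata is not covered by the documentation, so the sketch returns plain strings.

```python
import re
from typing import List


def extract_regex_sketch(contents: str, pattern: str) -> List[str]:
    """Hypothetical illustration; the real method returns TextDoc instances."""
    # Assumption: each full (non-overlapping) match yields one result.
    return [match.group(0) for match in re.finditer(pattern, contents)]


matches = extract_regex_sketch(
    "Order A-102 shipped; order B-7 pending.", r"[A-Z]-\d+"
)
print(matches)  # ['A-102', 'B-7']
```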