Legal Research Assistant using Arctic and Llama 3.1-405B in Snowflake Notebooks
Legal research, traditionally a time-consuming process that can take 6-8 hours, is being revolutionized by AI-powered tools. By leveraging models like Llama 3.1-405B and Snowflake Arctic in Snowflake Notebooks, this process can now be reduced to under 30 minutes. These advancements allow for rapid analysis of vast legal databases, helping professionals identify relevant cases, statutes, and precedents with unprecedented speed and accuracy. This shift in efficiency is driven by AI's ability to process and cross-reference extensive data sets in minutes, freeing up lawyers to focus on higher-level tasks such as strategy and case analysis. In addition to saving time, AI also introduces new ways to uncover legal insights, surfacing ideas and patterns that might otherwise be missed in manual research. However, it's important to remember that while AI accelerates research, it should complement, not replace, thorough legal work.
The benefits of AI in legal research are clear: enhanced productivity, improved access to information, and more strategic use of lawyers’ time. With tools like these, legal professionals can cut down their research time significantly and spend more time on tasks that require human expertise.
CourtListener
CourtListener is a comprehensive legal research tool, offering free access to millions of legal opinions, court cases, and legal filings through its extensive database. Developed by the Free Law Project, CourtListener aggregates legal information from courts across the United States, making it a go-to resource for legal professionals, researchers, and the public. The platform’s API is particularly powerful, providing developers access to vast amounts of court records, allowing for advanced querying, data analysis, and integration into legal research systems.
With the API, users can search across various jurisdictions, including federal, state, and local courts, retrieving metadata, opinions, and citations. It supports real-time updates and enables automation of legal research tasks, making it highly valuable for legal tech solutions such as the Legal Research Assistant you’re building. Moreover, it aligns with the growing trend of legal data being more accessible and usable for technology-driven applications.
CourtListener’s open API is an essential tool for legal professionals aiming to automate and enhance their research workflows.
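For instance, a quick query against the v4 search endpoint might look like the sketch below. This is illustrative only: the "court" filter value is an assumption based on the CourtListener REST documentation, the auth header mirrors the callAPI helper defined later in this post, and you will need your own API token.

import requests

# Illustrative sketch only: search the v4 endpoint for opinions ("type=o")
# from a specific court. The "court" filter name is an assumption based on
# the CourtListener REST docs; the auth header matches the callAPI helper below.
API_TOKEN = "<your-courtlistener-token>"  # placeholder

params = {
    "q": "breach of contract",
    "type": "o",        # "o" = opinions; the helper later in this post uses "rd"
    "court": "scotus",  # assumed filter; adjust per the API documentation
}
response = requests.get(
    "https://www.courtlistener.com/api/rest/v4/search/",
    params=params,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
print(response.json().get("count"))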
Workflow
The workflow for the Legal Research Assistant integrates several powerful tools to streamline legal research. For a given query, the CourtListener API first retrieves the top 10 relevant previous judgments, providing a strong foundation of case law. These judgments are then summarized with the Snowflake Arctic LLM, tailoring each summary to the user's specific query. This targeted summarization highlights the key elements of each case, making it easier to glean insights quickly. Finally, Llama 3.1-405B, with its large context window, generates a comprehensive final answer. This step mimics the retrieval-augmented generation (RAG) approach, combining the original query with the summarized judgments to produce a nuanced, well-informed response. By employing this multi-step workflow, legal professionals can achieve rapid, accurate, and contextually rich results in a fraction of the time legal research traditionally takes.
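Before diving into the individual helpers, here is a condensed sketch of the whole pipeline. It assumes the helper functions defined in the next sections (callAPI, readPDF, split_text, snowflake_arctic_summarize) and uses a simplified final prompt; the full prompt appears at the end of the post.

import json
from snowflake.cortex import Complete

def run_pipeline(question):
    # 1. Retrieve candidate judgments from the CourtListener API
    results = json.loads(callAPI(question))["results"][:10]

    # 2. Summarize each judgment with Snowflake Arctic, focused on the query
    case_summaries = []
    for result in results:
        text = readPDF("https://storage.courtlistener.com/" + result["filepath_local"])
        chunk_summaries = [snowflake_arctic_summarize(chunk, question)
                           for chunk in split_text(text, 2000)]
        case_summaries.append(" ".join(chunk_summaries))

    # 3. Combine the query and summaries for Llama 3.1-405B (simplified prompt)
    prompt = f"Question: {question}\nRelevant past cases: {' '.join(case_summaries)}"
    return Complete('llama3.1-405b', prompt)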
Let’s start with the Code!
We will start by writing some helper functions.
import requests

def callAPI(question):
    url = f"https://www.courtlistener.com/api/rest/v4/search/?q={question}&type=rd"
    token = ""
    # You can get the token from https://www.courtlistener.com/help/api/rest/search/
    # Set up the headers with the Bearer Token
    headers = {
        "Authorization": f"Bearer {token}"
    }
    # Make the GET request
    response = requests.get(url, headers=headers)
    # Check the status code and print the response
    if response.status_code == 200:
        print(response.json())  # Assuming the response is JSON
    else:
        print(f"Failed with status code: {response.status_code}")
    return response.text
This code defines a function callAPI(question) that interacts with the CourtListener API. Given a legal question or query, it sends a request to the API to retrieve relevant court cases or judgments. The function sets up an authentication token and sends a GET request to the API endpoint. If the request is successful, it prints the JSON response containing the case data; otherwise, it prints an error message with the status code. This function allows legal researchers to programmatically fetch case details related to a specific query.
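A quick usage sketch (the results field and the filepath_local key appear in the full workflow shown later in this post):

import json

raw = callAPI("Can I sue my wife for ignoring me?")
data = json.loads(raw)
for result in data["results"][:3]:
    # Each result carries a local filepath that resolves against CourtListener's storage
    print("https://storage.courtlistener.com/" + result["filepath_local"])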
from snowflake.cortex import Complete

def snowflake_arctic_summarize(text, question):
    # Build a query-focused summarization prompt for Snowflake Arctic
    prompt = (
        f"Write a summary of this case systematically, including all penal codes "
        f"mentioned with respect to the query {question} and the text is {text}. "
        f"Summarize in 250 words only"
    )
    return Complete('snowflake-arctic', prompt)
This snippet sends a case's text, together with the user's query, to the Snowflake Arctic LLM and asks it to generate a concise, query-specific summary. The summarization systematically highlights key aspects, including any penal codes mentioned, and limits the output to 250 words. This step condenses the case information into a form that is far easier to digest during legal research.
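For example, assuming case_text holds the text of one judgment (e.g., extracted with the readPDF helper below):

# Assumes this runs inside Snowflake Notebooks (or another environment with an
# active Snowflake session), where snowflake.cortex.Complete is available.
question = "Can I sue my wife for ignoring me?"
case_text = "..."  # placeholder: text of one judgment, e.g. from readPDF below
summary = snowflake_arctic_summarize(case_text, question)
print(summary)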
import requests
import fitz  # PyMuPDF

def readPDF(pdf_url):
    pdf_response = requests.get(pdf_url)
    # Check if the request was successful
    if pdf_response.status_code == 200:
        # Step 2: Save the PDF to a local file
        pdf_file_path = 'downloaded_file.pdf'
        with open(pdf_file_path, 'wb') as pdf_file:
            pdf_file.write(pdf_response.content)
        print("PDF downloaded successfully.")
        # Step 3: Extract text from the PDF using PyMuPDF
        with fitz.open(pdf_file_path) as pdf_document:
            pdf_text = ""
            for page_num in range(len(pdf_document)):
                page = pdf_document.load_page(page_num)  # Load each page
                pdf_text += page.get_text()  # Extract text from the page
        print("Text extracted from the PDF.")
        return pdf_text
    else:
        print(f"Failed to download PDF. Status code: {pdf_response.status_code}")
        return ""  # Return an empty string so downstream code can continue
This function downloads a PDF from a given URL, saves it locally, and then extracts its text using the PyMuPDF library. The extracted text is returned, allowing you to analyze or summarize the content of the PDF, such as legal documents or case files, for further use in legal research. It efficiently converts PDF files into a text format that can be processed programmatically.
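As a usage sketch, with a filepath taken from a CourtListener search result (as in the full workflow later):

# "filepath_local" comes from a CourtListener search result, as shown later
filepath_local = "recap/example.pdf"  # hypothetical placeholder path
case_text = readPDF("https://storage.courtlistener.com/" + filepath_local)
if case_text:
    print(f"Extracted {len(case_text)} characters")
else:
    print("No text extracted; skipping this case")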
import tiktoken

def split_text(text, max_tokens):
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    tokens = enc.encode(text)
    chunks = []
    current_chunk = []
    current_length = 0
    for token in tokens:
        current_chunk.append(token)
        current_length += 1
        if current_length >= max_tokens:
            chunks.append(enc.decode(current_chunk))
            current_chunk = []
            current_length = 0
    if current_chunk:  # Append any remaining tokens
        chunks.append(enc.decode(current_chunk))
    return chunks
This function splits a large text into smaller chunks based on a specified token limit. It uses the tiktoken library to encode the text into tokens suitable for models like GPT-3.5-turbo, then divides the text into chunks that fit within the max_tokens limit. This ensures that large texts can be processed efficiently in systems with token constraints, such as language models with a limited context window.
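For example:

long_text = "word " * 10000  # stand-in for an extracted judgment
chunks = split_text(long_text, max_tokens=2000)
print(f"{len(chunks)} chunks, first chunk starts with: {chunks[0][:40]!r}")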
import json
import tqdm

question = "Can I sue my wife for ignoring me?"
response = callAPI(question)
response = json.loads(response)

count = 0
finalSummary = " "
for result in response['results']:
    count += 1
    print(count)
    filePath = "https://storage.courtlistener.com/" + result['filepath_local']
    text = readPDF(filePath)
    text_chunks = split_text(text, 2000)
    summaries = []
    for chunk in tqdm.tqdm(text_chunks):
        summary = snowflake_arctic_summarize(chunk, question)
        summaries.append(summary)
    combined_summary = " ".join(summaries)
    finalSummary += combined_summary + f" {filePath}"
    if count == 10:
        break
This code executes the complete workflow for legal research. It starts by querying the CourtListener API with a legal question and fetching relevant case files. For each case file, it downloads the PDF, extracts the text, and splits it into smaller chunks (due to token limitations). Each chunk is then summarized with the Snowflake Arctic LLM, and the summaries are combined into a running final summary. The process stops after summarizing ten cases, giving a concise overview of the most relevant judgments related to the query.
Finally, we hand everything to Llama 3.1-405B, taking advantage of its larger context window to generate the final answer.
prompt = (
    f"You are an Assistant to a lawyer and want to do good research for a case {question}. "
    f"You have collated past cases {finalSummary}. Now use this past evidence to build good research "
    f"for the case {question}. Please include citations both in favor of initiating the disciplinary "
    f"proceedings and against it. Intensively use the penal codes that are mentioned in the past cases. "
    f"Be very careful as this is a sensitive matter. The answer should be based only on the past cases. "
    f"At the end of the answer, also give the reference links to the cases cited. "
    f"Don't forget to give the links at the end."
)
Complete('llama3.1-405b', prompt)
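In a Snowflake Notebook cell the returned string renders directly; if you want to reuse the answer elsewhere (for example, in the Streamlit app below), capture it in a variable first:

answer = Complete('llama3.1-405b', prompt)
print(answer)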
Streamlit Application
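The full application is not reproduced here; below is a minimal sketch of what a Streamlit front end could look like, assuming the workflow above has been wrapped into a single function such as the run_pipeline sketch from the Workflow section.

import streamlit as st

st.title("Legal Research Assistant")
st.caption("CourtListener + Snowflake Arctic + Llama 3.1-405B")

question = st.text_input("Enter your legal question")

if st.button("Research") and question:
    with st.spinner("Retrieving and summarizing relevant judgments..."):
        # run_pipeline wraps retrieval, Arctic summarization, and the final
        # Llama 3.1-405B answer, as sketched in the Workflow section
        answer = run_pipeline(question)
    st.markdown(answer)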
Conclusion
In conclusion, the integration of powerful tools like the CourtListener API, Snowflakes Arctic LLM, and Llama 3.1–405B offers a revolutionary approach to legal research. By streamlining the process from retrieving relevant cases to generating context-aware summaries and comprehensive answers, this workflow significantly reduces the time and effort required to perform thorough legal analysis. What once took hours can now be accomplished in minutes, without compromising on depth or accuracy. This combination of AI-driven tools allows legal professionals to focus more on strategic thinking and case preparation, ultimately transforming the way legal research is conducted in the modern era. The future of legal work is not just faster but also smarter, with AI assisting at every step to ensure high-quality results.
References:
- https://pro.bloomberglaw.com/insights/technology/ai-impact-on-legal-research/
- https://www.onelegal.com/blog/legal-generative-ai/
- https://insight.thomsonreuters.com.au/legal/posts/what-is-ai-assisted-research-for-lawyers-in-australia
- https://www.courtlistener.com/help/api/rest/search/
- https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions