
Technical Documentation

TinyGen is a simplified version of Codegen, a code generation service designed to transform code from a public GitHub repository based on a specific prompt. The service outputs the difference (diff) between the original and transformed code. To build this service, my implementation uses FastAPI, OpenAI's GPT-3.5 model, a Supabase database, AWS EC2, and a collection of small Python utility functions to fetch, transform, and return information.

Local Setup

Step 1: Dependencies

Tinygen uses Python 3. All necessary libraries are listed in the requirements.txt file; install them by running pip install -r requirements.txt. An illustrative requirements.txt is sketched after the list below.
 
  • Python ≥ 3.9.7
  • Libraries: difflib, os, OpenAI, python-dotenv, PyGithub, supabase, uvicorn, pydantic, fastapi
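
For reference, a minimal requirements.txt covering the third-party packages might look like the sketch below; difflib and os ship with the Python standard library and are not installed via pip, and exact version pins are omitted here:

fastapi
uvicorn
pydantic
python-dotenv
openai
PyGithub
supabase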
 

Step 2: Environment Variables

You can create a .env file locally to store the key information below, or pass it via secrets when deployed.
 
# ChatGPT API Access
OPENAI_API_KEY=""

# GitHub API Access
GITHUB_TOKEN=""

# Supabase Configuration
SUPABASE_URL=""
SUPABASE_KEY=""
 

Step 3: Running Tinygen

Go to the app/ directory and run:
 
python3 -m uvicorn main:app --reload
 

FastAPI App

The run_tiny_gen function serves as the core function and endpoint of the FastAPI application, processing requests to transform code from a public GitHub repository based on a user prompt. It uses a series of utility functions to fetch the original code, apply transformations via ChatGPT, and generate a diff of the changes. See the Utility Functions section below for details on these helpers.
  • TinyGenRequest: Pydantic model capturing GitHub repo URL (repoUrl) and transformation prompt (prompt).
  • DiffResponse: Pydantic model outlining the response with the diff of the code changes.
  • get_repo_files_as_string: Fetches GitHub repo code as a string.
  • ask_chatgpt: Processes code and prompt with ChatGPT, returning transformed code.
  • calculate_code_diff: Calculates diff between original and transformed code.
  • router: APIRouter defining /run endpoint and returning DiffResponse.
 
from fastapi import FastAPI, APIRouter, HTTPException
from pydantic import BaseModel

from utils.github_interaction import get_repo_files_as_string
from utils.chatgpt_interaction import ask_chatgpt
from utils.calculate_diff import calculate_code_diff

router = APIRouter()


class TinyGenRequest(BaseModel):
    repoUrl: str
    prompt: str


class DiffResponse(BaseModel):
    diff: str


@router.post("/run", response_model=DiffResponse)
async def run_tiny_gen(request: TinyGenRequest):
    try:
        original_code = get_repo_files_as_string(request.repoUrl)
        fixed_code = ask_chatgpt(request.prompt, original_code)

        reflection_text = (
            "Review changes against requirements: '{prompt}'. "
            "Reply with [CONFIDENT] for no further improvements needed, or "
            "[REVISION NEEDED] for more adjustments."
        ).format(prompt=request.prompt)
        code_for_reflection = "Modified Code:\n{modified} \n Original Code:\n{original}\n\n".format(
            original=original_code, modified=fixed_code)
        reflection_response = ask_chatgpt(reflection_text, code_for_reflection)

        # Check if the response indicates that revisions are needed
        if "[REVISION NEEDED]" in reflection_response:
            fixed_code = reflection_response

        diff = calculate_code_diff(original_code, fixed_code)
        return DiffResponse(diff=diff)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


app = FastAPI()
app.include_router(router)
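
For illustration, once the server is running locally (Step 3), the /run endpoint can be exercised with a short Python snippet; the repository URL, the prompt, and the use of the requests package are assumptions made only for this example:

import requests  # not part of TinyGen's requirements; installed separately for this example

# Placeholder repository URL and prompt
payload = {
    "repoUrl": "https://github.com/octocat/Hello-World",
    "prompt": "Add a docstring to every function",
}

response = requests.post("http://127.0.0.1:8000/run", json=payload)
response.raise_for_status()

# The response body matches DiffResponse: {"diff": "..."}
print(response.json()["diff"])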
 
 
 

Utility Functions

chatgpt_interaction.py

 
Within the ChatGPT interaction utility, the ask_chatgpt function sends a prompt along with a code block to OpenAI's ChatGPT model and returns the modified code with the model's suggested changes incorporated. Key variables include:
  • prompt: A string representing the user's query or request for code modification.
  • code_block: A string containing the original code that needs analysis or modification.
  • client: An instance of the OpenAI client, initialized with an API key loaded from environment variables, used to communicate with the OpenAI API.
  • completion: The response from the OpenAI API, containing the model's output based on the prompt and code block provided.
 
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])


def ask_chatgpt(prompt, code_block):
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"{prompt}. Code: {code_block},\n Please return the full modified code with all suggested changes incorporated."
            }
        ]
    )
    return completion.choices[0].message.content
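
A small usage sketch, with a placeholder prompt and code block:

# Placeholder prompt and code block for illustration
original_code = "def add(a, b):\n    return a + b"
modified_code = ask_chatgpt("Add type hints to this function", original_code)
print(modified_code)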
 

calculate_diff.py

 
The calculate_code_diff function uses difflib to compute a unified diff between two code blocks, simplifying the comparison of changes. Its parameters are:
  • original_code: A string containing the original version of the code.
  • suggested_changes: A string containing the modified version of the code.
 
import difflib


def calculate_code_diff(original_code, suggested_changes):
    diff = difflib.unified_diff(original_code.splitlines(), suggested_changes.splitlines())
    # Join the diff lines into a single string
    diff_str = '\n'.join(diff)
    # Print the diff for debugging
    print(diff_str)
    return diff_str
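
A minimal usage sketch with two illustrative one-line snippets:

before = "print('hello')"
after = "print('hello, world')"

diff = calculate_code_diff(before, after)
# diff holds unified-diff text, including lines such as:
#   -print('hello')
#   +print('hello, world')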
 

github_interaction.py

 
This script retrieves the contents of a specified GitHub repository as a single string, excluding binary files and READMEs. It uses the PyGithub package (imported as github) and requires a GitHub token for authentication. The process involves:
  1. Extracting the repository's full name from the given URL.
  2. Fetching the repository's contents, navigating through directories and files.
  3. Decoding file content from base64 when it is not binary and appending it to a collective string.
 
 
Key functions include:
  • is_binary_string(bytes_data): Determines if the provided bytes data is binary.
  • get_repo_files_as_string(repo_url: str): Fetches and concatenates non-binary file contents of the specified repository into a single string, excluding README files.
 
from github import Github
import base64
import os
from dotenv import load_dotenv

load_dotenv()

# GitHub API token for authentication
github_token = os.environ['GITHUB_TOKEN']


def is_binary_string(bytes_data):
    """
    Checks if the given bytes data represents a binary string.
    """
    text_characters = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7f})
    return bool(bytes_data.translate(None, text_characters))


def get_repo_files_as_string(repo_url: str):
    g = Github(github_token)
    # Extract the repository's full name (user/repo) from the URL
    repo_name = repo_url.rstrip('/').split('/')[-2] + '/' + repo_url.rstrip('/').split('/')[-1]
    repo = g.get_repo(repo_name)

    contents = repo.get_contents("")
    repo_code = ""
    while contents:
        file_content = contents.pop(0)
        if file_content.type == "dir":
            contents.extend(repo.get_contents(file_content.path))
        else:
            # Skip README.md files
            if file_content.name.lower().startswith('readme'):
                continue
            # Check if the content is binary
            if file_content.encoding == 'base64' and is_binary_string(base64.b64decode(file_content.content)):
                # Handle binary files differently if needed
                continue
            else:
                # Decode and append file content to repo_code
                content = file_content.decoded_content.decode('utf-8')
                repo_code += content + "\n\n"  # Separate files with newlines

    return repo_code.strip()  # Strip trailing newline
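
A brief usage sketch; the function expects a plain repository URL of the form https://github.com/<owner>/<repo>, and the URL below is just a placeholder:

# Placeholder public repository URL
repo_code = get_repo_files_as_string("https://github.com/octocat/Hello-World")
print(len(repo_code), "characters of source fetched")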
 

supabase_interaction.py

 
This Python script facilitates data insertion into a Supabase table through a store function, which takes the name of the table and the data to be inserted as parameters. Key variables in the script include SUPABASE_URL, the URL of the Supabase project, and SUPABASE_KEY, the API key used for authentication. The supabase variable holds the Supabase client initialized with that URL and key.
 
  • Note: For production-level access to the Supabase DB, the service_role secret is used.
 
import os
from supabase import create_client, Client
from typing import Dict
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize the Supabase client
SUPABASE_URL = os.getenv("SUPABASE_URL")
SUPABASE_KEY = os.getenv("SUPABASE_KEY")
supabase: Client = create_client(SUPABASE_URL, SUPABASE_KEY)


def store(table_name: str, data: Dict):
    """Inserts data into the specified Supabase table."""
    response = supabase.table(table_name).insert(data).execute()
    return response
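
As a sketch of how the endpoint could record each run, the call below uses a hypothetical table name and column names that are not taken from the actual schema:

# Hypothetical table and column names, for illustration only
store("tinygen_requests", {
    "repo_url": request.repoUrl,   # fields from the incoming TinyGenRequest
    "prompt": request.prompt,
    "diff": diff,                  # diff string produced by calculate_code_diff
})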
 
Example of user input inserted into the Supabase table (screenshot of a stored row).
 
 

Getting Tinygen Live!

To get Tinygen live, I decided to run the FastAPI app on an EC2 instance.
 
 
Nginx Configuration for the EC2 Instance:
server {
    listen 80;
    server_name [[ENTER_YOUR_INSTANCE_IP_HERE]];

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
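
For this proxy to work, the FastAPI app must be listening on 127.0.0.1:8000 on the instance. One option, assumed here rather than taken from the actual deployment, is to keep the Step 3 uvicorn command running in the background without --reload:

nohup python3 -m uvicorn main:app --host 127.0.0.1 --port 8000 &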
 
 

Try Tinygen out!

 
Run in Postman:
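
In Postman (or any HTTP client), send a POST request to http://<your-instance-ip>/run with a JSON body along these lines; the repository URL and prompt are placeholders, and the response will contain a single diff field:

{
  "repoUrl": "https://github.com/octocat/Hello-World",
  "prompt": "Add a docstring to every function"
}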