
Building a Smart Python-to-R Code Converter with Gemini AI-Powered Validation and Feedback

In this tutorial, we build a Python-to-R code converter that uses Google’s free Gemini API to validate its output and suggest improvements. We start by defining the conversion logic, mapping Python functions, libraries, and syntactic patterns to their closest R equivalents. We then ask Gemini to assess each R translation and return a validation score, improvement suggestions, and a refined version of the R code. By combining static conversion rules with AI-driven review, we aim to produce more accurate R code from Python scripts than the rules alone would.

import re
import requests
import json
import os
from typing import Dict, List, Tuple, Optional


# Placeholder only: substitute your own key, and avoid committing a real key to source control
os.environ['GEMINI_API_KEY'] = 'Use Your Own API Key'

We begin by importing the libraries we need: re for pattern matching, requests for HTTP calls, and json for parsing responses. We also set the Gemini API key as an environment variable so that the validator can read it at runtime when it calls Google’s AI services for code validation.
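As a small sketch of how the key lookup could be made a little more defensive, the helper below reads the key from the environment and treats the tutorial's placeholder string as "no key". The function name `load_gemini_key` is hypothetical and not part of the original script.

```python
import os
from typing import Optional

# Placeholder text used in the tutorial's setup cell.
PLACEHOLDER = "Use Your Own API Key"


def load_gemini_key(env_var: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the API key from the environment, or None if it is unset or a placeholder.

    Hypothetical helper for illustration; the tutorial reads the variable directly.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER:
        return None
    return key
```

Returning None for the placeholder lets the rest of the pipeline skip validation instead of sending a request that is bound to be rejected.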

class GeminiValidator:
    """
    Uses Google's free Gemini API to validate and improve R code conversions
    """


    def __init__(self, api_key: Optional[str] = None):
        """
        Initialize with Gemini API key
        Get your free API key from: https://aistudio.google.com/
        """
        self.api_key = api_key or os.getenv('GEMINI_API_KEY')
        if not self.api_key:
            print("⚠  No Gemini API key provided. Set GEMINI_API_KEY environment variable")
            print("   or get a free key from: https://aistudio.google.com/")


        self.base_url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent"


    def validate_conversion(self, python_code: str, r_code: str) -> Dict:
        """
        Use Gemini to validate the Python to R conversion
        """
        if not self.api_key:
            return {
                "validation_score": "N/A",
                "suggestions": ["Set up Gemini API key for validation"],
                "improved_code": r_code,
                "error": "No API key provided"
            }


        prompt = f"""
        You are an expert in both Python and R programming languages, especially for statistical analysis.


        I have converted Python code to R code. Please validate this conversion and provide feedback.


        ORIGINAL PYTHON CODE:
        ```python
        {python_code}
        ```


        CONVERTED R CODE:
        ```r
        {r_code}
        ```


        Please analyze the conversion and provide:
        1. A validation score (0-100) for accuracy
        2. List of any errors or issues found
        3. Suggestions for improvement
        4. An improved version of the R code if needed


        Focus on:
        - Correct function mappings (pandas to dplyr, numpy to base R, etc.)
        - Proper R syntax and idioms
        - Statistical accuracy
        - Code efficiency and best practices


        Respond in JSON format:
        {{
            "validation_score": <number>,
            "issues_found": [<list of issues>],
            "suggestions": [<list of suggestions>],
            "improved_code": "<improved R code>",
            "summary": "<brief summary of the conversion quality>"
        }}
        """


        try:
            headers = {
                'Content-Type': 'application/json',
            }


            data = {
                "contents": [{
                    "parts": [{
                        "text": prompt
                    }]
                }]
            }


            response = requests.post(
                f"{self.base_url}?key={self.api_key}",
                headers=headers,
                json=data,
                timeout=30
            )


            if response.status_code == 200:
                result = response.json()
                text_response = result['candidates'][0]['content']['parts'][0]['text']


                try:
                    text_response = re.sub(r'```json\n?', '', text_response)
                    text_response = re.sub(r'\n?```', '', text_response)


                    validation_result = json.loads(text_response)
                    return validation_result
                except json.JSONDecodeError:
                    return {
                        "validation_score": "N/A",
                        "issues_found": ["Could not parse Gemini response"],
                        "suggestions": [text_response],
                        "improved_code": r_code,
                        "summary": "Gemini response received but could not be parsed as JSON"
                    }
            else:
                return {
                    "validation_score": "N/A",
                    "issues_found": [f"API Error: {response.status_code}"],
                    "suggestions": ["Check API key and internet connection"],
                    "improved_code": r_code,
                    "summary": f"API request failed with status {response.status_code}"
                }


        except Exception as e:
            return {
                "validation_score": "N/A",
                "issues_found": [f"Exception: {str(e)}"],
                "suggestions": ["Check API key and internet connection"],
                "improved_code": r_code,
                "summary": f"Error during validation: {str(e)}"
            }

We define the GeminiValidator class to handle the validation of our R code using Google’s Gemini API. Inside it, we craft a detailed prompt that contains both the original Python code and the converted R code, asking Gemini to evaluate the accuracy, suggest improvements, and rewrite the R code if necessary. We then send this prompt to the Gemini endpoint and parse the JSON response to extract feedback for improving our code conversion. If the request fails or the reply is not valid JSON, the method returns a fallback dictionary that keeps the unmodified R code.
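The response-parsing step can be sketched in isolation. The helper below is a minimal illustration, not the tutorial's exact code: it strips an optional Markdown fence around the model's reply and, as an added assumption, falls back to the outermost brace-delimited span when the model wraps the JSON in prose. The name `parse_model_json` is hypothetical.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # a Markdown code fence marker


def parse_model_json(text: str) -> Dict[str, Any]:
    """Parse a JSON object from a model reply that may be fenced or padded with prose."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the whole opening fence line (the fence plus an optional language tag).
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
    if cleaned.rstrip().endswith(FENCE):
        cleaned = cleaned.rstrip()[: -len(FENCE)]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fallback (an assumption, not in the tutorial): take the outermost {...} span.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            return json.loads(cleaned[start:end + 1])
        raise
```

A caller can wrap this in the same try/except as validate_conversion and return the fallback dictionary when json.JSONDecodeError propagates.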

class EnhancedPythonToRConverter:
    """
    Enhanced Python to R converter with Gemini AI validation
    """


    def __init__(self, gemini_api_key: Optional[str] = None):
        self.validator = GeminiValidator(gemini_api_key)


        self.import_mappings = {
            'pandas': 'library(dplyr)\nlibrary(tidyr)\nlibrary(readr)',
            'numpy': 'library(base)',
            'matplotlib.pyplot': 'library(ggplot2)',
            'seaborn': 'library(ggplot2)\nlibrary(RColorBrewer)',
            'scipy.stats': 'library(stats)',
            'sklearn': 'library(caret)\nlibrary(randomForest)\nlibrary(e1071)',
            'statsmodels': 'library(stats)\nlibrary(lmtest)',
            'plotly': 'library(plotly)',
        }


        self.function_mappings = {
            'pd.DataFrame': 'data.frame',
            'pd.read_csv': 'read.csv',
            'pd.read_excel': 'read_excel',
            'df.head': 'head',
            'df.tail': 'tail',
            'df.shape': 'dim',
            'df.info': 'str',
            'df.describe': 'summary',
            'df.mean': 'mean',
            'df.median': 'median',
            'df.std': 'sd',
            'df.var': 'var',
            'df.sum': 'sum',
            'df.count': 'length',
            'df.groupby': 'group_by',
            'df.merge': 'merge',
            'df.drop': 'select',
            'df.dropna': 'na.omit',
            'df.fillna': 'replace_na',
            'df.sort_values': 'arrange',
            'df.value_counts': 'table',


            'np.array': 'c',
            'np.mean': 'mean',
            'np.median': 'median',
            'np.std': 'sd',
            'np.var': 'var',
            'np.sum': 'sum',
            'np.min': 'min',
            'np.max': 'max',
            'np.sqrt': 'sqrt',
            'np.log': 'log',
            'np.exp': 'exp',
            'np.random.normal': 'rnorm',
            'np.random.uniform': 'runif',
            'np.linspace': 'seq',
            'np.arange': 'seq',


            'plt.figure': 'ggplot',
            'plt.plot': 'geom_line',
            'plt.scatter': 'geom_point',
            'plt.hist': 'geom_histogram',
            'plt.bar': 'geom_bar',
            'plt.boxplot': 'geom_boxplot',
            'plt.show': 'print',
            'sns.scatterplot': 'geom_point',
            'sns.histplot': 'geom_histogram',
            'sns.boxplot': 'geom_boxplot',
            'sns.heatmap': 'geom_tile',


            'scipy.stats.ttest_ind': 't.test',
            'scipy.stats.chi2_contingency': 'chisq.test',
            'scipy.stats.pearsonr': 'cor.test',
            'scipy.stats.spearmanr': 'cor.test',
            'scipy.stats.normaltest': 'shapiro.test',
            'stats.ttest_ind': 't.test',


            'sklearn.linear_model.LinearRegression': 'lm',
            'sklearn.ensemble.RandomForestRegressor': 'randomForest',
            'sklearn.model_selection.train_test_split': 'sample',
        }


        self.syntax_patterns = [
            (r'\bTrue\b', 'TRUE'),
            (r'\bFalse\b', 'FALSE'),
            (r'\bNone\b', 'NULL'),
            (r'\blen\(', 'length('),
            (r'range\((\d+)\)', r'1:\1'),
            (r'range\((\d+),\s*(\d+)\)', r'\1:\2'),
            (r'\.split\(', '.strsplit('),
            (r'\.strip\(\)', '.str_trim()'),
            (r'\.lower\(\)', '.str_to_lower()'),
            (r'\.upper\(\)', '.str_to_upper()'),
            (r'\[0\]', '[1]'),
            (r'f"([^"]*)"', r'paste0("\1")'),
            (r"f'([^']*)'", r"paste0('\1')"),
        ]


    def convert_imports(self, code: str) -> str:
        """Convert Python import statements to R library statements."""
        lines = code.split('\n')
        converted_lines = []


        for line in lines:
            line = line.strip()
            if line.startswith('import ') or line.startswith('from '):
                if ' as ' in line:
                    if 'import' in line and 'as' in line:
                        parts = line.split(' as ')
                        module = parts[0].replace('import ', '').strip()
                        if module in self.import_mappings:
                            converted_lines.append(f"# {line}")
                            converted_lines.append(self.import_mappings[module])
                        else:
                            converted_lines.append(f"# {line} # No direct R equivalent")
                    elif 'from' in line and 'import' in line and 'as' in line:
                        converted_lines.append(f"# {line} # Handle specific imports manually")
                elif line.startswith('from '):
                    parts = line.split(' import ')
                    module = parts[0].replace('from ', '').strip()
                    if module in self.import_mappings:
                        converted_lines.append(f"# {line}")
                        converted_lines.append(self.import_mappings[module])
                    else:
                        converted_lines.append(f"# {line} # No direct R equivalent")
                else:
                    module = line.replace('import ', '').strip()
                    if module in self.import_mappings:
                        converted_lines.append(f"# {line}")
                        converted_lines.append(self.import_mappings[module])
                    else:
                        converted_lines.append(f"# {line} # No direct R equivalent")
            else:
                converted_lines.append(line)


        return '\n'.join(converted_lines)


    def convert_functions(self, code: str) -> str:
        """Convert Python function calls to R equivalents."""
        for py_func, r_func in self.function_mappings.items():
            code = code.replace(py_func, r_func)
        return code


    def apply_syntax_patterns(self, code: str) -> str:
        """Apply regex patterns to convert Python syntax to R syntax."""
        for pattern, replacement in self.syntax_patterns:
            code = re.sub(pattern, replacement, code)
        return code


    def convert_pandas_operations(self, code: str) -> str:
        """Convert common pandas operations to dplyr/tidyr equivalents."""
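        # Convert boolean filters first, e.g. df[df['x'] > 1] -> df[df$x > 1, ];
        # running this after the column rewrite below would leave nothing to match.
        code = re.sub(r'df\[df\[[\'"](.*?)[\'"]\]\s*([><=!]+)\s*([^]]+)\]', r'df[df$\1 \2 \3, ]', code)


        # Column access: df['x'] -> df$x and df.x -> df$x
        code = re.sub(r'df\[[\'"](.*?)[\'"]\]', r'df$\1', code)
        code = re.sub(r'df\.(\w+)', r'df$\1', code)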


        return code


    def convert_plotting(self, code: str) -> str:
        """Convert matplotlib/seaborn plotting to ggplot2."""
        conversions = [
            (r'plt\.figure\(figsize=\((\d+),\s*(\d+)\)\)', r'# Set figure size in ggplot theme'),
            (r'plt\.title\([\'"](.*?)[\'"]\)', r'+ ggtitle("\1")'),
            (r'plt\.xlabel\([\'"](.*?)[\'"]\)', r'+ xlab("\1")'),
            (r'plt\.ylabel\([\'"](.*?)[\'"]\)', r'+ ylab("\1")'),
            (r'plt\.legend\(\)', r'+ theme(legend.position="right")'),
            (r'plt\.grid\(True\)', r'+ theme(panel.grid.major = element_line())'),
        ]


        for pattern, replacement in conversions:
            code = re.sub(pattern, replacement, code)


        return code


    def add_r_context(self, code: str) -> str:
        """Add R-specific context and comments."""
        r_header = '''# R Statistical Analysis Code
# Converted from Python using Enhanced Converter with Gemini AI Validation
# Install required packages: install.packages(c("dplyr", "ggplot2", "tidyr", "readr"))


'''
        return r_header + code


    def convert_code(self, python_code: str) -> str:
        """Main conversion method that applies all transformations."""
        code = python_code.strip()


        code = self.convert_imports(code)
        code = self.convert_functions(code)
        code = self.convert_pandas_operations(code)
        code = self.convert_plotting(code)
        code = self.apply_syntax_patterns(code)
        code = self.add_r_context(code)


        return code


    def convert_and_validate(self, python_code: str, use_gemini: bool = True) -> Dict:
        """
        Convert Python code to R and validate with Gemini AI
        """
        r_code = self.convert_code(python_code)


        result = {
            "original_python": python_code,
            "converted_r": r_code,
            "validation": None
        }


        if use_gemini and self.validator.api_key:
            print("🔍 Validating conversion with Gemini AI...")
            validation = self.validator.validate_conversion(python_code, r_code)
            result["validation"] = validation


            if validation.get("improved_code") and validation.get("improved_code") != r_code:
                result["final_r_code"] = validation["improved_code"]
            else:
                result["final_r_code"] = r_code
        else:
            result["final_r_code"] = r_code
            if not self.validator.api_key:
                result["validation"] = {"note": "Set GEMINI_API_KEY for AI validation"}


        return result


    def print_results(self, results: Dict):
        """Pretty print the conversion results"""
        print("=" * 80)
        print("🐍 ORIGINAL PYTHON CODE")
        print("=" * 80)
        print(results["original_python"])


        print("\n" + "=" * 80)
        print("📊 CONVERTED R CODE")
        print("=" * 80)
        print(results["final_r_code"])


        if results.get("validation"):
            validation = results["validation"]
            print("\n" + "=" * 80)
            print("🤖 GEMINI AI VALIDATION")
            print("=" * 80)


            if validation.get("validation_score"):
                print(f"📈 Score: {validation['validation_score']}/100")


            if validation.get("summary"):
                print(f"📝 Summary: {validation['summary']}")


            if validation.get("issues_found"):
                print("\n⚠  Issues Found:")
                for issue in validation["issues_found"]:
                    print(f"   • {issue}")


            if validation.get("suggestions"):
                print("\n💡 Suggestions:")
                for suggestion in validation["suggestions"]:
                    print(f"   • {suggestion}")

We define the EnhancedPythonToRConverter class to handle the entire transformation pipeline from Python to R. Inside the constructor, we map key libraries, functions, and syntax patterns between the two languages. We then create modular methods to convert import statements, function calls, pandas operations, and matplotlib plots to their R equivalents. Finally, we integrate Gemini AI to automatically validate the translated R code and print improvement suggestions, enabling us to enhance conversion accuracy and reliability with a single method call.
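To make the syntax-pattern step concrete, here is a minimal, standalone sketch of the same idea using a subset of the patterns. The names `SYNTAX_PATTERNS` and `apply_patterns` are illustrative and not taken from the class above.

```python
import re
from typing import List, Tuple

# A small subset of the converter's (pattern, replacement) pairs.
# \b is a word boundary, so "True" is rewritten but "Trueish" is not.
SYNTAX_PATTERNS: List[Tuple[str, str]] = [
    (r"\bTrue\b", "TRUE"),
    (r"\bFalse\b", "FALSE"),
    (r"\bNone\b", "NULL"),
    (r"\blen\(", "length("),
]


def apply_patterns(code: str, patterns: List[Tuple[str, str]] = SYNTAX_PATTERNS) -> str:
    """Apply each regex substitution in order and return the rewritten code."""
    for pattern, replacement in patterns:
        code = re.sub(pattern, replacement, code)
    return code


print(apply_patterns("flag = True if len(xs) else None"))
# prints: flag = TRUE if length(xs) else NULL
```

Because these are plain text substitutions, they also rewrite matches inside string literals and comments, which is one reason a second validation pass is useful.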

def setup_gemini_key():
    """
    Instructions for setting up Gemini API key
    """
    print("🔑 SETTING UP GEMINI API KEY")
    print("=" * 50)
    print("1. Go to https://aistudio.google.com/")
    print("2. Sign in with your Google account")
    print("3. Click 'Get API Key'")
    print("4. Create a new API key")
    print("5. Copy the key and set it as environment variable:")
    print("   For Colab: import os; os.environ['GEMINI_API_KEY'] = 'your_key_here'")
    print("   For local: export GEMINI_API_KEY='your_key_here'")
    print("\n✅ The API is FREE to use within generous limits!")


def demo_with_gemini():
    """
    Demo function that shows how to use the enhanced converter
    """
    print("🚀 ENHANCED PYTHON TO R CONVERTER WITH GEMINI AI")
    print("=" * 60)


    api_key = os.getenv('GEMINI_API_KEY')
    if not api_key:
        print("⚠  No Gemini API key found. Running without validation.")
        setup_gemini_key()
        print("\n" + "=" * 60)


    converter = EnhancedPythonToRConverter(api_key)


    python_example = '''
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


# Load and analyze data
df = pd.read_csv('sales_data.csv')
print(df.head())
print(df.describe())


# Statistical analysis
mean_sales = df['sales'].mean()
std_sales = df['sales'].std()
correlation = df['sales'].corr(df['marketing_spend'])


# Data filtering and grouping
high_sales = df[df['sales'] > mean_sales]
monthly_avg = df.groupby('month')['sales'].mean()


# Visualization
plt.figure(figsize=(10, 6))
plt.scatter(df['marketing_spend'], df['sales'])
plt.title('Sales vs Marketing Spend')
plt.xlabel('Marketing Spend')
plt.ylabel('Sales')
plt.show()


# Statistical test
t_stat, p_value = stats.ttest_ind(df['sales'], df['competitor_sales'])
print(f"T-test result: {t_stat:.3f}, p-value: {p_value:.3f}")
'''


    results = converter.convert_and_validate(python_example, use_gemini=bool(api_key))


    converter.print_results(results)


    return results

We create a helper function, setup_gemini_key(), that prints the steps for generating a free Gemini API key and setting it as an environment variable. In the demo_with_gemini() function, we run the converter on a sample Python data analysis script. We perform the conversion, invoke Gemini AI for validation if an API key is available, and print the resulting feedback, showing the full path from Python input to reviewed R output.
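The dictionary returned by convert_and_validate() can also be consumed programmatically instead of printed. The sketch below assumes the keys shown earlier ("converted_r", "final_r_code", "validation"); the helper name `summarize_result` and the sample values are made up for illustration.

```python
from typing import Any, Dict


def summarize_result(results: Dict[str, Any]) -> str:
    """Build a one-line summary from a convert_and_validate()-style result dict."""
    validation = results.get("validation") or {}
    score = validation.get("validation_score", "N/A")
    issues = validation.get("issues_found", [])
    # True when Gemini's improved code replaced the rule-based conversion.
    changed = results.get("final_r_code") != results.get("converted_r")
    return f"score={score}, issues={len(issues)}, gemini_rewrote_code={changed}"


# Hypothetical sample data, not real Gemini output.
sample = {
    "converted_r": "x <- 1",
    "final_r_code": "x <- 1L",
    "validation": {"validation_score": 80, "issues_found": ["integer literal"]},
}
print(summarize_result(sample))
# prints: score=80, issues=1, gemini_rewrote_code=True
```

A summary like this is easier to log or assert on in tests than the formatted console output.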

def colab_setup():
    """
    Easy setup function for Google Colab
    """
    print("📱 GOOGLE COLAB SETUP")
    print("=" * 40)
    print("1. Run this cell to install dependencies:")
    print("   !pip install requests")
    print("\n2. Set your Gemini API key:")
    print("   import os")
    print("   os.environ['GEMINI_API_KEY'] = 'your_key_here'")
    print("\n3. Run the demo:")
    print("   results = demo_with_gemini()")


if __name__ == "__main__":
    demo_with_gemini()

We provide a convenient colab_setup() function to help users quickly configure their environment in Google Colab. It includes step-by-step instructions for installing dependencies, setting the Gemini API key, and running the demo. Finally, in the __main__ block, we call demo_with_gemini() to automatically execute the conversion and validation pipeline when the script is run directly.

In conclusion, we’ve built a tool that translates Python code to R with rule-based substitutions and then uses Gemini AI to review and refine the result. We walked through the conversion of imports, function mappings, DataFrame operations, and plotting routines, with Gemini providing a second layer of validation for accuracy and best practices. Because the rules are plain string and regex substitutions, the converted R code is best treated as a first draft to review, but the pipeline gives us a quick starting point for moving analytical scripts from Python to R.



The post Building a Smart Python-to-R Code Converter with Gemini AI-Powered Validation and Feedback appeared first on MarkTechPost.