It's Time For AI: LLM-Based Unit Tests for OpenSource Repositories
By Ria Mundhara, Intern; Marina Kalikanin, Member of Technical Staff; Harshil Dadlani, Member of
Technical Staff; Rajat Ghosh, Staff Data Scientist; Debojyoti Dutta, VP of Engineering
Executive Summary
In this article, we present how a coding agent is built on a large language model (LLM) and how we measure its performance on the Nutanix Cloud Platform, showcasing its capability to manage intricate workflows.
Introduction
When we compare coding large language models (LLMs) and natural language (NL) LLMs, such as Llama3 vs. CodeLlama, we can readily identify some distinctions. In fact, coding LLMs are significantly more challenging to develop and work with than NL LLMs, for the following reasons.
1. Precision and Syntax Sensitivity: Code is a formal language with strict syntax rules and
structures. A minor error, such as a misplaced bracket or a missing semicolon, can lead to
errors that prevent the code from functioning. This requires the LLM to have a high degree
of precision and an understanding of syntactic correctness, which is generally more
stringent than the flexibility seen in natural language.
2. Execution Semantics: Code not only needs to be syntactically correct, but it also has to be
semantically valid—that is, it needs to perform the function it is supposed to do. Unlike
natural language, where the meaning can be implicitly interpreted and still understood
even if somewhat imprecisely expressed, code execution needs to yield very specific outcomes. If a code LLM gets the semantics wrong, the program might not work at all or might perform unintended operations.
3. Context and Dependency Management: Code often involves multiple files or modules that
interact with each other, and changes in one part can affect others. Understanding and
managing these dependencies and contexts is crucial for a coding LLM, which adds a layer
of complexity compared to handling standalone text in natural language.
4. Variety of Programming Languages: There are many programming languages, each with its
own syntax, idioms, and usage contexts. A coding LLM needs to potentially handle multiple
languages, understand their unique characteristics, and switch contexts appropriately. This
is analogous to a multilingual NL LLM but often with less tolerance for error.
5. Data Availability and Diversity: While there is a vast amount of natural language data
available from books, websites, and other sources, high-quality, annotated programming
data can be more limited. Code also lacks the redundancy and variability of natural
languages, which can make training more difficult.
6. Understanding the Underlying Logic: Writing effective code involves understanding
algorithms and logic. This requires not only language understanding but also
computational thinking, which adds an additional layer of complexity for LLMs designed to
generate or interpret code.
7. Integration and Testing Requirements: For a coding LLM, the generated code often needs to
be tested to ensure it works as intended. This involves integrating with software
development environments and tools, which is more complex than the generally self-
contained process of generating text in natural language.
Each of these aspects makes the development and effective operation of coding LLMs a
challenging task, often requiring more specialized knowledge and sophisticated techniques
compared to natural language LLMs.
The deployment and life-cycle management of an LLM-serving API is challenging because of the autoregressive nature of the transformer-based generation algorithm. For code LLMs, the problem is more acute for the following reasons:
1. Real-Time Performance: In many applications, coding LLMs are expected to provide real-
time assistance to developers, such as for code completion, debugging, or even generating
code snippets on the fly. Meeting these performance expectations requires highly efficient
models and infrastructure to minimize latency, which can be technically challenging and
resource-intensive.
2. Scalability and Resource Management: Code generation tasks can be computationally
expensive, especially when handling complex codebases or generating lengthy code
outputs. Efficiently scaling the service to handle multiple concurrent users without
degrading performance demands sophisticated resource management and possibly
significant computational resources. Also, attention computation at inference time has quadratic time complexity with respect to the input sequence length, and input sequences for code models are often significantly longer than for NL models.
3. Context Management: Effective code generation often requires understanding not just the
immediate code snippet but also broader project contexts, such as libraries used, the overall
software architecture, and even the specific project's coding standards. Maintaining and
accessing this contextual information in a way that is both accurate and efficient adds
complexity to the serving infrastructure.
4. Security Concerns: Serving a coding LLM involves potential security risks, not only in terms
of the security of the model itself (e.g., preventing unauthorized access) but also ensuring
that the code it generates does not introduce security vulnerabilities into user projects.
Ensuring both model and output security requires rigorous security measures and constant
vigilance.
In summary, code LLMs are much harder to train and deploy for inference than NL LLMs. In this article, we cover benchmarking of a code generation API developed entirely on Nutanix infrastructure.
Figure 1 shows an LLM-assisted code generation workflow. It combines a context and a prompt through a prompt template to generate the input sequence for a large language model (LLM). The LLM then generates an output, which is passed to the evaluation system. If the output is not satisfactory, the user can revise the prompt, the prompt template, or the LLM used. Table 1 shows the taxonomy for this LLM-assisted code generation workflow.
Table 1: Taxonomy of the LLM-assisted code generation workflow

Prompt: Instruction to an LLM. Example: "Write unit test to the following function."

Prompt Template: Template to combine prompt and context. Example:
<PROMPT>
Context:
<CONTEXT>
Response:

Input: A combination of prompt and context through the prompt template. Example: the prompt template filled with a specific prompt and context.

LLM: Large language model. Examples: CodeLlama, Starcoder.

Output: Output generated by the LLM. Example:
import unittest
class TestTwoSum(unittest.TestCase):
    def test_two_sum_normal(self):
    def test_two_sum_no_solution(self):
        self.assertIsNone(two_sum([1, 2, 3, 4], 10))
    def test_two_sum_negative_numbers(self):
        self.assertEqual(two_sum([-3, 4, 3, 90], 0), [0, 2])
    def test_two_sum_same_element_twice(self):
        self.assertIsNone(two_sum([3, 3], 6))
    def test_two_sum_one_element(self):
        self.assertIsNone(two_sum([3], 3))
    def test_two_sum_empty_list(self):
        self.assertIsNone(two_sum([], 3))

Evaluation: Accuracy assessment by a subject matter expert. Provide feedback on the quality of the generated output and experiment with the prompt, prompt template, and/or LLM for a given context.
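To make the workflow in Table 1 concrete, the following is a minimal sketch of how a prompt template can be rendered with a specific prompt and context and sent to a model-serving endpoint. The template text, endpoint URL, and response schema are illustrative assumptions, not the exact implementation used in our benchmarks.

import requests

# Illustrative prompt template mirroring Table 1: <PROMPT> and <CONTEXT> are
# replaced by the instruction and by the source code under test.
PROMPT_TEMPLATE = """{prompt}

Context:
{context}

Response:
"""

def build_input(prompt: str, context: str) -> str:
    """Combine prompt and context through the prompt template."""
    return PROMPT_TEMPLATE.format(prompt=prompt, context=context)

def generate_unit_tests(source_code: str,
                        api_url: str = "http://localhost:8000/generate") -> str:
    """Send the rendered input sequence to an LLM-serving endpoint (URL and schema are assumptions)."""
    payload = {"input": build_input("Write unit test to the following function", source_code)}
    response = requests.post(api_url, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["output"]

The generated text returned by such a call is what the evaluation step in Table 1 then reviews.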
As shown in Figure 2, the App layer runs on top of the infrastructure layer of the Nutanix GPT-in-a-Box 2.0 system used in the testing described below. The infrastructure layer can be deployed in two steps, starting with Prism Element console login followed by VM resource configuration. Figure 3 shows the UI for the Prism Element controller.
Figure 3: The UI showing the setup for a Prism Element console on which the transformer model
for this article was trained. It shows the AHV hypervisor summary, storage summary, VM
summary, hardware summary, monitoring for cluster-wide controller IOPS, monitoring for
cluster-wide controller I/O bandwidth, monitoring for cluster-wide controller latency, cluster CPU
usage, cluster memory usage, granular health indicators, and data resiliency status.
After logging into Prism Element, we create a virtual machine (VM) hosted on our Nutanix AHV cluster. As shown in Figure 4, the VM has the following resource configuration settings: Ubuntu 22.04 operating system, 16 single-core vCPUs, 64 GB of RAM, and an NVIDIA A100 tensor core passthrough GPU with 40 GB of memory. The GPU is installed with the NVIDIA RTX 15.0 driver for the Ubuntu OS (NVIDIA-Linux-x86_64-525.60.13-grid.run). Large deep learning models with transformer architectures require GPUs or other compute accelerators with high memory bandwidth, large register files, and L1 memory.
Figure 4: The VM resource configuration UI pane on Nutanix Prism Element. As shown, it helps a user configure the number of vCPU(s), the number of cores per vCPU, memory size (GiB), and GPU choice. We used an NVIDIA A100 80G for this article.
The NVIDIA A100 Tensor Core GPU is designed to power the world’s highest-performing elastic
datacenters for AI, data analytics, and HPC. Powered by the NVIDIA Ampere™ architecture, A100
is the engine of the NVIDIA data center platform. A100 provides up to 20X higher performance
over the prior generation and can be partitioned into seven GPU instances to dynamically adjust
to shifting demands.
To peek into the detailed features of the A100 GPU, we run the `nvidia-smi` command, a command-line utility built on top of the NVIDIA Management Library (NVML) and intended to aid in the management and monitoring of NVIDIA GPU devices. The output of the `nvidia-smi` command is shown in Figure 6. It shows the driver version to be 515.86.01 and the CUDA version to be 11.7. Figure 5 shows several critical features of the A100 GPU we used. The details of these features are described in Table 2.
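For a scripted check of the same information, `nvidia-smi` can also be queried non-interactively. The snippet below is a minimal sketch that wraps the standard `--query-gpu` fields; the exact fields you can request depend on the installed driver.

import subprocess

def gpu_summary() -> str:
    """Query name, driver version, memory, and utilization for each visible GPU via nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total,memory.used,utilization.gpu",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # On the VM described above, this is expected to report the A100 and its driver version.
    print(gpu_summary())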
Table 2
Benchmarking Hypothesis
We aim to study the impact of input and output token size on latency, as well as identify any
memory or time bottlenecks in the workflow. It is instructive to choose the right code datasets for
this benchmarking, and we chose to use code from the GitHub repositories for three popular
Python packages: NumPy, PyTorch, and Seaborn. These packages were chosen because their
repositories include distinct complexities that could affect the unit test generation.
NumPy is a package for highly optimized array operations. Its codebase includes a wide
range of mathematical functions which are relatively straightforward to write unit tests for.
PyTorch is a popular optimized Deep Learning tensor library. Its complexity in model
architectures introduces unique challenges in test generation.
Seaborn is a Python data visualization library. Unlike NumPy and PyTorch, Seaborn’s focus
on rendering visualizations adds a layer of complexity in terms of testing image outputs.
For the code LLM API, we have used Meta-Llama-3-8B-Instruct. The API server was implemented
using FastAPI.
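The server code itself is not reproduced in this article; the sketch below shows roughly what such an endpoint can look like, assuming an OpenAI-compatible vLLM server hosting the model. The route name, request schema, and URLs are illustrative assumptions.

from fastapi import FastAPI
from pydantic import BaseModel
import requests

app = FastAPI()

# Assumed address of an OpenAI-compatible vLLM server hosting Meta-Llama-3-8B-Instruct.
VLLM_URL = "http://localhost:8001/v1/completions"
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"

class TestRequest(BaseModel):
    source_code: str
    max_tokens: int = 1024

@app.post("/generate-tests")
def generate_tests(req: TestRequest) -> dict:
    """Render the prompt template and forward it to the model server."""
    prompt = (
        "Write unit test to the following function\n\n"
        f"Context:\n{req.source_code}\n\nResponse:\n"
    )
    resp = requests.post(
        VLLM_URL,
        json={"model": MODEL_NAME, "prompt": prompt, "max_tokens": req.max_tokens},
        timeout=300,
    )
    resp.raise_for_status()
    return {"tests": resp.json()["choices"][0]["text"]}

The latency measurements below are taken around calls to an endpoint of this kind, from the moment the request is issued to the moment the generated tests are written to a file.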
Results
Latency
First, we measured the latency for each of the requests and compared it with the corresponding
input/output token counts. Specifically, we measured the following metrics:
Latency: The time elapsed from the moment the API endpoint is called to when the output
is received and written to a test file.
Input Token Count: The number of tokens in the API call query.
Output Token Count: The number of tokens in the API call response.
As expected, the latency for all three packages closely fits an exponential distribution (p-value < 0.001). Figure 6 shows the fitted distribution, with the P99 latencies in red.
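For reference, a fit and P99 figure of this kind can be computed from the raw per-request latencies roughly as follows. This is a sketch with synthetic data; the variable and function names are ours, not taken from the benchmark code.

import numpy as np
from scipy import stats

def summarize_latencies(latencies_s: list[float]) -> dict:
    """Fit an exponential distribution to request latencies and report the P99 latency."""
    data = np.asarray(latencies_s)
    loc, scale = stats.expon.fit(data)                    # maximum-likelihood fit
    ks_stat, p_value = stats.kstest(data, "expon", args=(loc, scale))
    return {
        "p99_s": float(np.percentile(data, 99)),
        "expon_loc": loc,
        "expon_scale": scale,
        "ks_p_value": p_value,
    }

if __name__ == "__main__":
    # Synthetic example only; the real inputs are the measured per-request latencies.
    rng = np.random.default_rng(0)
    print(summarize_latencies(rng.exponential(scale=5.0, size=500).tolist()))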
Figure 6: Latency distribution for all 3 packages. The black line shows the fitted exponential
distribution, and the red line denotes P99 latency.
The P99 latency for the NumPy repo appears higher than for the Seaborn and PyTorch repos. This
could be explained by the fact that the NumPy input files were on average larger, and had more
functions per file, than the PyTorch and Seaborn input files.
Figure 7 shows the correlation matrix among latency, input token count, and output token count for each individual package. There is an almost perfect linear correlation between latency and output token count in all cases.
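A correlation matrix of this kind can be computed directly from the per-request records with pandas; the column names and the numbers below are illustrative only, not measured values.

import pandas as pd

# Each row is one API request: measured latency plus input/output token counts.
records = pd.DataFrame(
    {
        "latency_s": [4.2, 11.8, 7.5, 19.3],
        "input_tokens": [820, 2400, 1500, 3100],
        "output_tokens": [210, 640, 400, 1010],
    }
)

# Pearson correlation among latency, input token count, and output token count.
print(records.corr())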
Figure 8 shows the jointplot between latency and output token count for all 3 repositories. It clearly shows that latency increases with output token count. This proportionality can be explained by the fact that the LLM generates one token at a time.
Figure 8: There is an almost perfect linear correlation between latency and output token count for
all 3 packages. There is no statistically significant difference between the regression lines for the 3
packages.
Interestingly, while there is a relatively high correlation between input token counts and latency
for the PyTorch and NumPy repos, this is not the case for the Seaborn repo. Given the heavy
emphasis on visualization within the Seaborn repository, input token count may not be a good
measure of input complexity for the Seaborn repository. Rather, the complexity in unit test
generation for Seaborn comes from validating image, rather than textual, output. This complexity
remains regardless of input length.
For all three packages, we notice outliers in the latency against input token count graph. Where
latency is high for a low input token count, the input file tends to have a large number of utility
functions with no doc strings or comments explaining their use (for example, husl.py from the
Seaborn repo). Where latency is low for a high input token count, the input file tends to be mostly
comments, or lists of configurations and constants that do not need to be unit tested.
Memory Usage
Next, we look at memory usage per line of code during the test generation workflow, in order to find memory bottlenecks in the program. The memory profiler module was used to log memory usage per line of code for all Python scripts in the PyTorch repository. During the unit test generation workflow, four main functions are called:
generate_test_file
parse_code
run_main_agent
run_combiner_agent
Figure 9 shows the memory usage per line of code for each of these four functions.
Figure 9: Memory usage against line number for test generation. Each line represents a file.
From these graphs, we notice some key bottlenecks. First, line 81 of run_main_agent:
for f in self.extracted_functions:
    agent.generate_direct_vllm(
        context=f, file_name=self.file_name, **kwargs
    )
The memory used here scales linearly with the number of functions extracted from the input file.
As a result, files with many function definitions cause the spikes in memory usage observed.
Similar behavior is seen in run_combiner_agent. Memory usage scales linearly with the number of classes, methods, and import statements extracted from the file.
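For reference, per-line memory logging of this kind follows the pattern below when using the memory_profiler package. The decorated function here is a simplified stand-in for run_main_agent so the script runs on its own; it is not the actual implementation.

# Run with:  python -m memory_profiler profile_example.py
from memory_profiler import profile

@profile
def run_main_agent_like(extracted_functions: list[str]) -> list[str]:
    """Simplified stand-in that accumulates one result per extracted function, as the real agent does."""
    results = []
    for f in extracted_functions:
        # In the real workflow this is agent.generate_direct_vllm(context=f, ...);
        # here we only build a placeholder string so the script is self-contained.
        results.append(f"# generated tests for:\n{f}")
    return results

if __name__ == "__main__":
    funcs = [f"def func_{i}(): pass" for i in range(1000)]
    run_main_agent_like(funcs)

Running the script under memory_profiler prints memory usage per line, which is how the per-line figures above were collected.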
Time Complexity
To identify any timing bottlenecks, cProfile was used to profile the timing behavior of unit test generation on the PyTorch repository. The flame graph in Figure 10 describes the relative time spent in different parts of the workflow. As expected, the most time is spent waiting for the vLLM response.
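The profiling itself can be reproduced with the standard library, roughly as follows; the profiled call is a placeholder for the actual test generation entry point, and the output file name is ours.

import cProfile
import pstats

def generate_tests_for_repo() -> None:
    """Placeholder for the actual unit test generation entry point."""
    sum(i * i for i in range(1_000_000))  # stand-in workload

profiler = cProfile.Profile()
profiler.enable()
generate_tests_for_repo()
profiler.disable()

# Print the ten most expensive calls by cumulative time, and dump the stats
# so that external tools can render them as a flame graph.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
stats.dump_stats("test_generation.prof")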
Figure 10: Flame graph showing time spent in different functions. The width of each frame
corresponds to the time spent in that function, and the call stack can be recreated by tracing
frames upwards.
Combining this with insights from the latency benchmarking, we know that larger test files
require more time to be generated.
Insights
The response time varies proportionally with the output token count, and memory usage
varies proportionally with the number of classes, methods and import statements in the
input file.
On average, the response times for both use cases vary between 0 and 20 seconds.
Conclusion
This article demonstrates how we can benchmark an LLM-based unit test writing API for different
open-source repositories. The benchmarking process not only highlights the efficiency and
coverage of the generated tests but also provides insights into the strengths and limitations of
the LLM in diverse codebases. By systematically evaluating performance metrics such as
accuracy, execution time, and test coverage across multiple repositories, we can better
understand the contexts in which LLMs excel and where improvements are needed. Future work
could focus on refining the model's understanding of complex logic patterns and enhancing its
adaptability to various coding styles, ultimately leading to more robust and reliable unit test
generation tools.
© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature
and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in
the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). The third-
party products in this article are referenced for demonstration purposes only. Nutanix is not
affiliated with, endorsed by, or sponsored by these third-party companies. The use of these third
party products is solely for illustrative purposes to demonstrate the features and capabilities of
Nutanix's products. This post may contain links to external websites that are not part of
Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content
or accuracy of any external site. Our decision to link to an external site should not be considered
an endorsement of any content on such a site. Certain information contained in this post may
relate to or be based on studies, publications, surveys and other data obtained from third-party
sources and our own internal estimates and research. While we believe these third-party studies,
publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or
completeness of any information obtained from third-party sources.
This post may contain express and implied forward-looking statements, which are not historical
facts and are instead based on our current expectations, estimates and beliefs. The accuracy of
such statements involves risks and uncertainties and depends upon future events, including
those that may be beyond our control, and actual results may differ materially and adversely from
those anticipated or implied by such statements. Any forward-looking statements included
herein speak only as of the date hereof and, except as required by law, we assume no obligation
to update or otherwise revise any of such forward-looking statements to reflect subsequent
events or circumstances.