Documentation Generation

Code documentation is crucial for any software development project. It helps to make code understandable, maintainable and reduces the time required to fix bugs. Documentation for classes and functions is particularly important as it gives an overview of what the code does, how to use it, and any constraints or limitations. However, writing documentation can be time-consuming and often gets neglected.

In this article, we will explore how with help of DotnetPrompt a Large Language Model (LLM) can be used to automatically generate documentation for a class with minimal effort.

Problem:

We have a source file with a Logger class. This class provides logging functionality to other parts of the codebase. The Logger class has a predefined log messages in methods that look like this.

public void GetDocumentCodes(string id, int[] codes)
{
    WriteDebug(10001, new { id, codes },
        () => $"Reading document codes for id {id}.");
}

public void CreateMetadataCompleted(Guid id, TimeSpan timeToFilter)
{
    WriteInfo(11004, new { id, timeToFilter },
         () => $"Get comparison metadata for id {id} in {timeToFilter:c}");
}

public void IndexClientFailed(Exception exception)
{
    WriteError(13001, new { message = exception.Message },
        () => $"Failed to execute index client with error {exception.Message}", exception);
}

We need to create documentation for this class that includes a log code, log level and log message. This documentation should be presented in a markdown table format.

Solution:

We will use custom Chain with an LLM to generate the documentation for the Logger class. The Chain will take the source file as input, and generate markdown table documentation for the class. Inside the chain we will setup OpenAIModel with few prompt examples.

Step 1: Install Required Packages

We will be using the OpenAI GPT-3 API to generate the documentation. You will need to sign up for an API key from OpenAI and install the DotnetPrompt from NuGet.

> dotnet add package DotnetPrompt.All --version 1.0.0-alpha.1

Step 2: Initialize OpenAI API Key

We would not use configuration or something like this, so we just store API key as a constant.

public static class Constants
{
    public const string OpenAIKey = "YOUR-KEY";    
}

Step 3: Define the Prompt Examples and setup `ModelChain`

We will define a few prompt examples for the LLM to generate the documentation. These examples will give the LLM an idea of the format and structure we want the documentation to take.

Inside out custom chain we will use basic ModelChain to make a call to LLM.

private ModelChain BuildFewPromptLLModelChain()
{
    var example = new PromptTemplate("Code:\n{code}\nTableRow: {row}");
    var suffix = new PromptTemplate("Code:\n{code}\nTableRow: ");

    var examples = new List<IDictionary<string, string>>()
    {
        new Dictionary<string, string>()
        {
            {
                "code",
                "public void GetDocumentCodes(string id, int[] codes)\r\n {\r\n WriteDebug(10001, new { id, codes },\r\n () => $\"Reading document metadata for id {id}.\");\r\n }"
            },
            {
                "row",
                "| Debug | 10001 | GetDocumentCodes |  Reading document metadata for id {globalId}. |"
            }
        },
        new Dictionary<string, string>()
        {
            {
                "code",
                "public void CreateMetadataCompleted(Guid id, TimeSpan timeToFilter)\r\n {\r\n WriteInfo(11004, new { id, timeToFilter },\r\n () => $\"Get comparison metadata for id {id} in {timeToFilter:c}\");\r\n }"
            },
            {
                "row",
                "| Info | 11004 | CreateComparisonMetadataCompleted | Get comparison metadata for id {id} in {timeToFilter:c} |"
            }
        },
        new Dictionary<string, string>()
        {
            {
                "code",
                "public void IndexClientFailed(Exception exception)\r\n {\r\n WriteError(13001, new { message = exception.Message },\r\n () => $\"Failed to execute index client with error {exception.Message}\", exception);\r\n }"
            },
            {
                "row",
                "| Error | 13001 | IndexClientFailed | Failed to execute index client with error {exception.Message} |"
            }
        },
    };

    var prompt = new FewShotPromptTemplate(example, suffix, examples)
    {
        ExampleSeparator = "---"
    };

    var model = new ModelChain(prompt, new OpenAIModel(Constants.OpenAIKey, OpenAIModelConfiguration.Default),
        _logger);
    return model;
}

Step 4: Build the dataflow

Next we need to setup our dataflow

private readonly TransformBlock<IList<ChainMessage>, ChainMessage> _finalizatorBlock;
private readonly TransformManyBlock<ChainMessage, ChainMessage> _transformationBlockOne;
private readonly CancellationTokenSource _cts = new(TimeSpan.FromMinutes(1));
private readonly ModelChain _llmModelChain;

public ConvertDotnetTestsToMdTableChain(ILogger<ConvertDotnetTestsToMdTableChain> logger)
{
    _logger = logger;
    var dataflowOptions = new ExecutionDataflowBlockOptions() { CancellationToken = _cts.Token };

    // "transform" .cs file to list of methods
    _transformationBlockOne = new TransformManyBlock<ChainMessage, ChainMessage>(ReadMethodsFromFile, dataflowOptions);

    // LLM set up with several examples through few-prompt learning
    _llmModelChain = BuildFewPromptLLModelChain();

    // buffer to collect rows
    var batchRowsBlock = new BatchBlock<ChainMessage>(100, new GroupingDataflowBlockOptions() { CancellationToken = _cts.Token });

    // "transform" from list of rows to table
    _finalizatorBlock = new TransformBlock<IList<ChainMessage>, ChainMessage>(CombineRowsToTable, dataflowOptions);

    var linkOptions = new DataflowLinkOptions() { PropagateCompletion = true };

    // set up chain
    _transformationBlockOne.LinkTo(_llmModelChain.InputBlock, linkOptions);
    _llmModelChain.OutputBlock.LinkTo(batchRowsBlock, linkOptions);
    batchRowsBlock.LinkTo(_finalizatorBlock, linkOptions);
}

The dataflow here goes like this:

TransformManyBlock -> get a single ChainMessage with file name and produce ChainMessage for each method.
ModelChain -> Consume ChainMessage with method and extract documentation row from it (this could be launch in parallel, so we could simultaniously generate 5-10 rows).
BatchBlock -> Collect results ChainMessage from models.
TransformBlock -> Combine results from BatchBlock into a final ChainMessage with a table.

In fact you could pass any data between internal blocks, only first and last need to consume and return ChainMessage. The only recomendation here is to pass Id from input block to output block (without it executor would not work for example)

First and last data block we would make as a field to publish them as Input and Output of our chain.

public ITargetBlock<ChainMessage> InputBlock => _transformationBlockOne;
public ISourceBlock<ChainMessage> OutputBlock => _finalizatorBlock;

Step 5: Parse the Source File

Before we can generate the markdown table documentation, we need to extract the methods of the Logger class from the source file. For that we have a _transformationBlockOne which action ReadMethodsFromFile could look like this.

private IEnumerable<ChainMessage> ReadClassesFromFile(string arg)
{
    _logger.LogInformation($"Reading file {arg}");

    var file = File.ReadAllText(arg);

    var regex = new Regex(@"\bpublic\svoid\s([a-zA-Z0-9_]+)\(([^)]*)\)\s*{([^{}]*(?:{[^{}]*}[^{}]*)*)}");

    var methods = regex.Matches(file);

    _logger.LogInformation($"Extracted {methods.Count}");

    var fromFile = new List<ChainMessage>();
    foreach (var match in methods)
    {
        fromFile.Add(new ChainMessage(
            new Dictionary<string, string>()
            {
                { "code", match.ToString() }
            }));
    }

    return fromFile;
}

Step 6: Combine rows into table

private ChainMessage CombineRowsToTable(IList<ChainMessage> message)
{
    _logger.LogInformation("Finalization");

    var result = message.SelectMany(i => i.Values).Where(i => i.Key == "text").Select(i => i.Value);
    var resultText = string.Join('\n', result);
    return new ChainMessage(new Dictionary<string, string>() { { DefaultOutputKey, resultText } })
        { Id = message.First().Id };
}

Step 7: Run method

We need implemetation of Run method. Here we consume input ChainMessage with a single value - file name and post it to dataflow.

public bool Run(ChainMessage message)
{
    if (InputBlock.Completion.IsCompleted)
    {
        throw new InvalidOperationException("This chain would not accept any more messages");
    }

    var launched = InputBlock.Post(message);
    InputBlock.Complete();
    return launched;
}

public void Cancel()
{
    _cts.Cancel();
    _llmModelChain.Cancel();
}

Note the Complete method. We telling our chain that no more data will be added after inital file name. It's important to complete this one, because otherwise BufferBlock will be waiting forever or until it capacity full.

Step 8: Generate the Markdown Table Documentation

The usage of the chain is starighforward: we run the chain and wait for completion:

var chain = new ConvertDotnetTestsToMdTableChain(TestLogger.Create<ConvertDotnetTestsToMdTableChain>());
var input = new Dictionary<string, string>()
{
    {
        "file",
        @"Data\Logger.cs"
    }
};

chain.Run(new ChainMessage(input));

var result = await chain.OutputBlock.ReceiveAsync();
var resultText = result.Values["table"];

Console.WriteLine(resultText);

Final result

  Standard Output: 
 | Info | 17001 | WorkerStart | Start worker |
 | Info | 17003 | StartProcessingDocument | Start processing document |
 | Info | 17004 | EndProcessingDocument | End processing document. ElapsedMillisecond: {elapsedMillisecond} |
 | Error | 17100 | ProcessDocumentHandledException | Processing Handled Exception |
 | Warning | 171001 | ChangeTypeArgumentOutOfRangeWarning | Wrong change type. Exception message: {exception.Message} |

And here is a log of how our chain worked

2023-03-02T23:15:09 | Information | Reading file Data\Logger.cs
2023-03-02T23:15:09 | Information | Extracted 5 methods
2023-03-02T23:15:09 | Trace | Sending LLM request
2023-03-02T23:15:10 | Information | Result of ModelChain:  | Info | 17001 | WorkerStart | Start worker |
2023-03-02T23:15:10 | Trace | Sending LLM request
2023-03-02T23:15:11 | Information | Result of ModelChain:  | Info | 17003 | StartProcessingDocument | Start processing document |
2023-03-02T23:15:11 | Trace | Sending LLM request
2023-03-02T23:15:13 | Information | Result of ModelChain:  | Info | 17004 | EndProcessingDocument | End processing document. ElapsedMillisecond: {elapsedMillisecond}
2023-03-02T23:15:13 | Trace | Sending LLM request
2023-03-02T23:15:14 | Information | Result of ModelChain:  | Error | 17100 | ProcessDocumentHandledException | Processing Handled Exception |
2023-03-02T23:15:14 | Trace | Sending LLM request
2023-03-02T23:15:16 | Information | Result of ModelChain:  | Warning | 171001 | ChangeTypeArgumentOutOfRangeWarning | Wrong change type. Exception message: {exception.Message} |
2023-03-02T23:15:16 | Information | Finalization

Conclusion

In this article, we have seen how an LLM can be used to automatically generate documentation for a class. We used the OpenAI API to generate markdown table documentation for a Logger class by defining a few prompt examples and parsing the source file.

While the documentation generated by the LLM may not be perfect, it can serve as a starting point for further refinement and can save developersa significant amount of time.

The obvious improvement would be to provide list of examples as a parameter to make this chain suitable to generate documentation based on any kind of methods. This approach can be extended to generate documentation for other classes and functions in a codebase, making documentation a less time-consuming and tedious task.