ParaX AI Whitepaper



The further development of Web3 is challenged by the difficulty of engaging non-technical users, primarily due to its inherent complexity. Intent-based interaction emerges as a valuable solution, offering a clear and succinct way to convey user intentions while shielding users from intricate technical details. In response, we introduce the ParaX-GPT system, designed with user intent at its core. The system leverages a real-time updated knowledge base to provide contextual support, translates complex user intentions into manageable subtasks, and coordinates execution by scheduling suitable types of agents. Ultimately, this integration of AI automation around user intent aims to elevate the interactive experience within Web3.

1 Background

Web 3.0 was proposed during the peak of cryptocurrency and blockchain technology development, aiming to address the limitations of centralized Web 2.0 by emphasizing decentralization, privacy protection, security, ecosystem integration, and tokenization. Its key principles include reducing reliance on centralized entities, enhancing privacy protection through encryption and blockchain storage, integrating communication between different networks, and implementing mechanisms like NFTs to facilitate ownership representation and micro-payments, creating more efficient economic flow.
However, for non-technical users, using decentralized applications often requires learning specialized knowledge, such as public-private key management and the principles of on-chain transactions. This raises the cognitive threshold and makes practical operations such as digital wallet management and transaction parameter adjustment cumbersome. In addition, as an open network, Web3 brings security risks such as phishing attacks and private key leaks. Taken together, these issues dampen ordinary users' enthusiasm for participating in Web3. Therefore, to truly bring Web3 to the mass market, we must reduce the cognitive burden on users while providing better interactive experiences and stronger security measures. This means that, within a secure framework, we must simplify and optimize the Web3 interaction experience to attract more non-technical users.
An interaction method centered around user intent can make it more convenient for users to handle transactions while isolating them from the complexity of the underlying technology. For example, in the Ethereum environment, intent allows users to simply express "I want to send all my ETH to my friend" without needing to understand blockchain terms such as gas fees or nonce values. This expression captures only the user's core goal: sending ETH. The application captures this intent by automatically creating and configuring transactions, guiding users through necessary steps like providing the recipient's address. The application is also responsible for the actual interaction with the blockchain, such as initiating transactions and tracking their status. This intent-based approach simplifies a typically complex workflow into a single operation that non-technical individuals can easily perform on Ethereum, greatly enhancing convenience. It enables them to focus on their main objectives without being burdened by technical details. As a result, "intent" and its application on Ethereum have garnered widespread attention within the community.
The combination of large language models (LLMs) and domain knowledge bases holds great potential. LLMs can accurately identify and understand natural language intentions from users and can invoke external APIs to supplement their own functional shortcomings; meanwhile, domain knowledge bases provide the technical and procedural knowledge necessary for on-chain interactions. Combined, they can automatically transform user intentions into the raw data required for on-chain operations and execute them upon obtaining user authorization. For instance, when a user expresses "transfer all of my ETH to Bob," the LLM quickly parses this intent and prompts the user to complete all necessary parameters; the knowledge base then provides the technical details required to construct the necessary raw transaction data. This automated process significantly reduces the complexity users face when performing operations. By combining foundational LLMs with domain-specific knowledge bases, we can conveniently connect users with Web3 technology while reducing cognitive difficulty, simplifying operational procedures, and driving intent-based automated interaction.
In addition, there are two main methods for integrating domain-specific knowledge. The first is to train domain knowledge directly into the model through fine-tuning [yu2021finetuning] (for example, Delta-Tuning [ding2022delta] and LoRA [hu2021lora]); the second is to embed domain knowledge into model prompts through In-Context Learning [kossen2023incontext], commonly implemented with the LangChain AI framework. Both methods combine the general capabilities of large models with domain-specific expertise, thereby providing users with more professional and convenient services.
However, to the best of our knowledge, there is no prior work on handling automated DeFi operations. It is from this standpoint that we propose our intent-based AI Middle Layer solution. Our exploration pioneers the combination of the AI and DeFi fields, bringing DeFi to a wider range of users. It is foreseeable that many interesting things can be accomplished using LLMs, especially intent understanding tasks, which align perfectly with their capabilities.

2 Introduction

Web3 is a complex technical field. Blockchain technology, encryption algorithms, smart contracts, and other technologies make the application of AI in this field very challenging. Although LLMs have achieved great success in dialogue-based AI grounded in domain-specific knowledge bases, current LLM technology is still imperfect, and there are many urgent challenges in applying it to the web3 field. We discuss these challenges from the following aspects:
  • Limited by the temporal constraints of datasets, current LLMs cannot keep up with the rapidly changing iteration speed of web3 domain knowledge.
  • In real-world scenarios, some complex user queries are usually composed of multiple sub-tasks. A single model invocation cannot satisfy user queries, so multiple model scheduling and cooperation are required.
  • Even if models perform well in zero-shot or few-shot scenarios, their reliability and interpretability may be insufficient compared to domain experts.
Our vision is to explore the possibility of combining artificial intelligence with DeFi (Decentralized Finance) to enable more users to participate in and benefit from DeFi. While addressing these pain points, we use LLMs to make users' experiences in web3 more convenient, including question answering over web3 domain knowledge and automation of web3 operations. On this basis, we propose a user-centric multi-agent system called ParaX-GPT that can reproduce and enhance human operations in web3 workflows, achieving a degree of automation. ParaX-GPT builds a real-time updated knowledge base, interprets user intent as an intermediate expression, and then schedules corresponding agents: simple operational intentions are interpreted directly, while complex intentions are handled through planned execution and multi-agent collaboration to produce high-quality responses.
The entire process can be divided into five stages:
  • Real-time data preparation and updates: Outdated data makes it difficult to answer many emerging domain questions effectively. Pulling the latest web3 information in real time keeps the database current and capable of answering novel questions, rather than being limited to a small knowledge scope.
  • Intent Interpretation: In this stage, LLMs interpret the user's initial query and transform it into various types of intent tasks, each corresponding to different formats.
  • Intent Completion: User intentions are often abstract and vague, so LLMs and real-time on-chain API calls are used to resolve implicit user inputs. If the user intent ultimately cannot be resolved into a specific intent format, the intent is rejected and more information is requested from the user.
  • Intent Execution: For each type of intention, a corresponding execution process is designed; for example, querying real-time data and summarizing it, or encoding on-chain operation data in plaintext for return.
  • Response Generation: Finally, all prediction results are integrated to generate replies for users.
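The five stages above can be sketched end to end as a toy pipeline. Everything below is an illustrative stand-in (hypothetical function names, stubbed data sources), not the production system:

```python
# Minimal sketch of the five-stage ParaX-GPT pipeline described above.
# All function bodies are illustrative stand-ins, not the real system.

def refresh_knowledge_base(store: dict) -> dict:
    """Stage 1: pull the latest web3 information into the knowledge base."""
    store["last_update"] = "now"  # placeholder for a real-time data feed
    return store

def interpret_intent(query: str) -> dict:
    """Stage 2: map the raw query onto a typed intent task."""
    kind = "write" if "supply" in query or "transfer" in query else "read"
    return {"type": kind, "raw": query, "params": {}}

def complete_intent(intent: dict, on_chain_lookup) -> dict:
    """Stage 3: fill implicit parameters via on-chain APIs."""
    intent["params"]["balance"] = on_chain_lookup(intent["raw"])
    return intent

def execute_intent(intent: dict) -> dict:
    """Stage 4: read intents query data; write intents encode a transaction."""
    if intent["type"] == "read":
        return {"result": f"summary for: {intent['raw']}"}
    return {"calldata": "0x...", "needs_signature": True}

def generate_response(outcome: dict) -> str:
    """Stage 5: integrate results into a user-facing reply."""
    return f"Done: {outcome}"

# Usage with a stubbed on-chain lookup:
kb = refresh_knowledge_base({})
intent = interpret_intent("supply all of my USDC")
intent = complete_intent(intent, on_chain_lookup=lambda q: 1650)
reply = generate_response(execute_intent(intent))
```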
Through this process, ParaX-GPT is capable of generating intermediate expressions of intent based on the user's initial intent. It orchestrates and integrates the capabilities of multiple LLMs to handle the complex subtasks associated with different intentions. Simultaneously, we iteratively refine this process by fine-tuning the LLMs using input from authorized users.
In recent years, the emergence of large-scale language models (LLMs) has brought revolutionary changes to the field of artificial intelligence [4]. Both OpenAI (GPT-4) and Anthropic (Claude 2), with closed-source models, possess industry-leading LLMs and have developed APIs for developers to use. Through training and optimization with RLHF [5], their LLMs greatly enhance their understanding capabilities and align more closely with human preferences. MetaGPT [6] and AutoGPT achieve impressive results in various application areas through complex chained calls built on LangChain and LLM APIs.
In the open-source domain, Meta's Llama 2 [7] has demonstrated superior performance and continues to receive attention from developers for further development and iteration towards better versions. Numerous developers are researching how to effectively utilize LLMs in downstream tasks. Fine-tuning and in-context learning are the two major directions for development and application. Fine-tuning is currently very popular, and its technical difficulty keeps decreasing, as evidenced by HuggingFace's introduction of a fine-tuning UI and OpenAI's release of a fine-tuning interface for GPT-3.5-Turbo.
With GPT-3.5 Turbo fine-tuning, developers can now customize GPT-3.5 Turbo for their specific use cases using their own data. It is foreseeable that many interesting things will be accomplished using LLMs, especially in natural language understanding tasks where they excel.

4 Intent Definition: What Even is an Intent

The user's intent can usually be divided into query intent and interaction intent, based on whether it generates transactions on the chain: compare "Query my positions on ParaX" with "Supply all my USDC to ParaX". For query intents, an LLM can usually extract the semantics and answer well. For interaction intents, however, a more standardized form is needed, because on-chain operations represented by smart contracts are often complex and customized.
In traditional Ethereum Dapp user experiences, users interact with Ethereum by creating and signing transactions in specific formats that provide all necessary information for the Ethereum Virtual Machine (EVM) to execute state transitions. However, in an intent-based transaction paradigm, users only need to define their goals while leaving the implementation details to professionals. This allows users to outsource transaction creation to third parties without giving up complete control over the transactions, simplifying user participation and improving efficiency.
  • A transaction specifies concrete actions (imperative)
    • I want to swap 1650 USDC for at least 1 ETH on UniswapV3.
  • An intent specifies desired outcomes/goals (declarative)
    • I want to swap 1650 USDC for as much ETH as possible.
Figure 1: Intent
Specifically, Intent is a declarative way of expressing transactions that does not specify an execution path but defines a set of constraints. Under these constraints, third parties can choose appropriate state transition paths to fulfill the Intent and facilitate decentralized transaction outsourcing. Intent has several key features:
  • Intent defines a set of constraints for state transitions rather than a single path.
  • Third parties select suitable execution paths based on Intent, leveraging their expertise.
  • When combined with Abstract Account, Intent enables batched aggregation operations, increasing efficiency.
  • An Intent that includes user signatures represents user authorization for third-party operations under agreed conditions and can be executed accordingly.
  • Users still retain control over the final execution results.
Formally speaking, a complete Intent consists of an intent action and extra intent parameters. The user's DeFi interaction process can be simplified as follows:
  1. Express the intent as Intent(A, P) = A + P, where A represents the Intent Action and P represents the Intent Extra.
  2. Outcome = MonitorTx(Broadcast(SignedTx(ConstructTx(Intent(A, P)))))
    • Users only need to approve the parameters created by third parties to complete the interaction; all other tasks are handled on their behalf.
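Under the notation above, a minimal sketch of Intent(A, P) and the MonitorTx(Broadcast(SignedTx(ConstructTx(...)))) chain might look like this; every chain interaction is stubbed, and the names simply mirror the formulas:

```python
from dataclasses import dataclass, field

# Sketch of Intent(A, P) = A + P and the execution chain
# Outcome = MonitorTx(Broadcast(SignedTx(ConstructTx(Intent(A, P))))).
# All chain interactions are stubs; this is not a real client.

@dataclass
class Intent:
    action: str                                 # A: the Intent Action
    extra: dict = field(default_factory=dict)   # P: the Intent Extra

def construct_tx(intent: Intent) -> dict:
    # A third party picks a concrete state-transition path for the intent.
    return {"action": intent.action, "params": intent.extra, "calldata": "0x..."}

def sign_tx(tx: dict, user_approves: bool) -> dict:
    # The user only approves the parameters created by the third party.
    if not user_approves:
        raise PermissionError("user rejected the constructed transaction")
    return {**tx, "signature": "0xsigned"}

def broadcast(signed_tx: dict) -> str:
    return "0xtxhash"   # stub: would submit to the network

def monitor_tx(tx_hash: str) -> str:
    return "confirmed"  # stub: would poll for the receipt

intent = Intent("swap", {"sell": "1650 USDC", "buy": "as much ETH as possible"})
outcome = monitor_tx(broadcast(sign_tx(construct_tx(intent), user_approves=True)))
```

The user keeps control at exactly one point, the sign step: nothing is broadcast without explicit approval of the third party's constructed parameters.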


5 Large Language Models

Large Language Models (LLMs) are Transformer-based neural architectures with hundreds of billions of parameters or more. Trained on expansive textual datasets and exemplified by iconic models such as GPT-3, PaLM, Galactica, and LLaMA 2, LLMs have evolved into formidable tools in the realm of AI. This evolution can be attributed to a combination of progressive training techniques and vast data reservoirs. Among the capabilities they have mastered are:
  • Proficient Language Skills: LLMs are highly skilled at producing fluent and coherent language.
  • Broad World Knowledge: LLMs have a vast amount of information stored from their training data.
  • Contextual Learning Proficiency: LLMs can learn and apply knowledge from input text, reducing the need for frequent fine-tuning or parameter updates.
  • Logical Reasoning Proficiency: LLMs demonstrate logical reasoning abilities, although they may occasionally make errors in their conclusions.
  • Alignment Proficiency: LLMs possess the ability to understand and follow instructions given in human language.
However, like all technological marvels, LLMs aren't without their challenges. A primary and conspicuous limitation lies in their handling of context: the length of input that LLMs can process is inherently limited. GPT-4, for instance, extends the context window to 32K tokens, which might suffice for several pages of text, but real-world scenarios often demand more. Envision a legal tech firm wanting to parse numerous lengthy legal documents with LLMs; the context limitation becomes a tangible constraint.

6 ParaX-GPT

6.1 Comparison with Classic Solution

ParaX-GPT is compared with the classic solution along three dimensions: timeliness of data, scope of data, and iterative optimization.

6.2 Design Goals

  1. Up-to-date and cross-domain knowledge base: Unlike traditional web3 AI robots, ParaX-GPT can answer a wider range of questions and updates its databases in real time.
  2. Interpretation of query intent: For user query intentions, comprehensive replies are provided by requesting real-time on-chain data and matching the latest web3 domain knowledge.
  3. Automation of interaction intent: Complex DeFi operations require substantial user-level work; enhancing and automating user intentions reduces the cost of use.
  4. Iterative fine-tuning process: With permission, user inputs can be formatted as fine-tuning sample data, improving model performance in the next iteration.
  5. Intent decomposition and plan execution: Accurately decompose user input into planned single-step or multi-step intent parsing tasks to ensure successful task execution.
  6. Verification through simulated execution: By simulating the execution of interaction intentions, the conversion rate of intent interpretation is improved.
  7. Well-designed specific intent interpretation: For specific intentions such as dual-currency wealth management and airdrop planning scenarios, AI decisions are provided based on the latest industry data, market conditions, and experience to assist users in making judgments.

7 Basic Components

In order to convert user intent into the desired summary output and executable on-chain operation output (the construction parameters of blockchain transactions), three main steps are necessary: 1) interpretation of the intent; 2) completion of ambiguous intent; 3) scheduling agents to execute the intent. This process involves multiple calls to the LLM. We divide the overall architecture into two parts based on functionality, intent interpretation and intent execution, with multiple technologies integrated into its process design.
Figure 3: Intent Interpreter

7.1 Knowledge Database

Semantic search is designed to provide more precise and meaningful search results that better reflect the user's intent, rather than just matching keywords, which makes it particularly useful for complex queries. To integrate large language models (LLMs) into critical applications, it is necessary to overcome their inherent unpredictability, which can lead to hallucinations, manifested as erroneous reasoning or obvious mistakes. This poses a significant challenge for applications that prioritize accuracy, comprehensibility, and reliability. Currently, cutting-edge solutions primarily focus on "retrieval-augmented generation," a strategy that anchors the LLM in factual information. In pursuit of this goal, two main candidates have emerged as potential remedies:
  • Vector Database: A vector database is a type of database that stores and manages unstructured data, such as text, images, or audio, in vector embeddings (high-dimensional vectors) to make it easy to find and retrieve similar objects quickly.
  • Knowledge Graph: A knowledge graph takes data and expresses it as relationships in graph form. With the graph, it’s possible to more easily see that a particular customer is connected to a particular set of products, and those products sit in a particular product hierarchy.
Knowledge graphs are optimized for graph-based queries and traversing relationships, while vector databases are designed for efficient similarity searches and nearest neighbor queries. Graph databases predominantly care about relationships between objects; a social network is a standard example, where the relationships of interest are pairwise relationships between objects in the data. A vector database cares about vectors, which are simply numeric arrays: the way deep learning networks represent data, whether that data is text, images, or anything else.
Figure 2: Graph DB vs. Vector DB
Vector Databases are valuable for representing and retrieving multimodal data in constructing a Knowledge Database. However, in comparison to knowledge graphs, they exhibit certain limitations in intent understanding:
  • Complex Questions: Knowledge graphs excel in intricate queries involving layered relationships, while vector databases struggle with precise responses.
  • Complete Responses: Vector databases' result limits and similarity scoring hinder complete answers, unlike knowledge graphs, which provide accurate, focused responses.
  • Credibility: Vector databases may infer inaccurate connections, unlike knowledge graphs which ensure credible information flow.
  • Error Correction: Knowledge graphs offer transparent error identification and correction, while vector databases lack this transparency.
  • Transparency: Knowledge graphs trace queries transparently, aiding error correction; vector databases lack this insight.
We will combine vector databases and knowledge graphs to transparently address the challenges posed by LLM hallucinations.
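The contrast can be made concrete with a toy example: a vector store answers "what is similar?" by scoring embeddings, while a knowledge graph answers "what is connected?" by looking up explicit relationships. The sentences, embeddings, and triples below are invented for illustration:

```python
import math

# Toy contrast between a vector database (similarity search) and a
# knowledge graph (relationship lookup). Embeddings are hand-made 3-vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Vector database: nearest-neighbour search over embeddings.
vector_store = {
    "ETH is the native asset of Ethereum": [0.9, 0.1, 0.0],
    "USDC is a dollar-pegged stablecoin": [0.1, 0.9, 0.0],
}

def vector_search(query_vec, k=1):
    ranked = sorted(vector_store,
                    key=lambda t: cosine(query_vec, vector_store[t]),
                    reverse=True)
    return ranked[:k]

# Knowledge graph: explicit (subject, relation) -> object triples,
# traversed rather than scored.
graph = {
    ("USDC", "deployed_on"): "Ethereum",
    ("ParaX", "runs_on"): "Ethereum",
}

def graph_query(subject, relation):
    return graph.get((subject, relation))

similar = vector_search([0.2, 0.8, 0.0])     # fuzzy: closest text by meaning
connected = graph_query("USDC", "deployed_on")  # exact: stated relationship
```

The fuzzy result is useful for retrieval context; the exact result is auditable, which is why combining both helps against hallucinations.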

7.2 Fine-tuning LLM

Pre-trained large language models (LLMs) are capable of impressive out-of-the-box tasks such as text generation, summarization, and coding. However, LLMs are not universally applicable solutions for every use case. The complexity of LLMs can lead to errors when faced with challenges beyond their capabilities. In such cases, fine-tuning the LLM can be a viable solution. Fine-tuning refers to the process of retraining the base model using new data. It is a powerful technique that optimizes existing models with smaller datasets to align them with the distribution of new datasets. This approach avoids complete retraining and enables efficient application of LLMs for specific tasks.
  • Supervised fine-tuning (SFT) revolves around curated prompt and response datasets. This form is crucial for models like ChatGPT, which aim to understand and follow user instructions. Reinforcement Learning from Human Feedback (RLHF), on the other hand, involves human reviewers and reward models to guide model behavior, but requires significant resource investment.
  • Parameter-efficient fine-tuning (PEFT) [8] is an area of research focused on reducing the cost of model parameter updates. Techniques like Low-Rank Adaptation (LoRA [9]) stand out in this field by optimizing parameter updates.
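The economics of LoRA can be illustrated without any ML framework: instead of updating a full d x k weight matrix W, one trains two low-rank factors B (d x r) and A (r x k) and applies W + BA at inference. The dimensions below are arbitrary toy values:

```python
# LoRA parameter-count sketch: full fine-tuning touches d*k weights,
# while LoRA trains only d*r + r*k weights (r << d, k).
# Pure-Python toy; dimensions are illustrative.

d, k, r = 64, 64, 4

full_update_params = d * k    # parameters updated by full fine-tuning
lora_params = d * r + r * k   # parameters trained by LoRA

def lora_forward(x, W, A, B):
    """y = x @ (W + B @ A), applying the low-rank product on the fly."""
    # x: 1 x d row vector; W: d x k; B: d x r; A: r x k
    def matvec(vec, M):
        return [sum(vec[i] * M[i][j] for i in range(len(vec)))
                for j in range(len(M[0]))]
    xb = matvec(x, B)      # 1 x r
    delta = matvec(xb, A)  # 1 x k
    base = matvec(x, W)
    return [b + dlt for b, dlt in zip(base, delta)]

print(full_update_params, lora_params)  # 4096 vs 512
```

With r = 4 against 64 x 64 matrices, LoRA trains an eighth of the parameters; in real models with billions of weights the ratio is far more dramatic.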
While LLM fine-tuning is a powerful technique, it may not always be the ideal approach in certain situations. This is especially true when some models lack options for fine-tuning through APIs or when there is insufficient data available for fine-tuning. In such cases, alternative approaches like In-Context Learning or Retrieval augmentation can be considered as possible solutions. For instance, attaching relevant documents while writing an article can help adjust the model’s response and improve its understanding of the context. Similarly, models that are tailored to personalized data can utilize retrieval augmentation to generate more personalized outputs.
In our solution, we combine the benefits of both fine-tuned LLM and In-Context Learning to achieve a better and more accurate understanding of user intent.

7.3 Fine-tuning Optimization Based on User Feedback

User intentions are often complex and multifaceted, and LLM may have understanding errors for complex expressions. Therefore, more data is needed for fine-tuning to enable it to handle diverse user inputs. We obtain user-permitted intent inputs through sampling for fine-tuning training, iterating towards better versions of LLM to facilitate the optimization of the entire process.

7.4 Prompt Engineering

LLMs' abilities go beyond casual conversation and can be applied in various contexts. By employing prompt engineering strategies, we can maximize the diverse functionalities of LLMs. Prompt engineering involves designing prompts that guide LLMs toward generating desired outputs. A well-crafted prompt can greatly improve the performance and efficiency of the LLM by giving clear instructions, minimizing ambiguity, and steering the LLM's understanding in the intended direction. In intent understanding tasks, it is necessary to design prompts that enable the LLM to accurately extract user intentions. For ambiguous results, the prompt should explicitly instruct the LLM to output "No" in order to avoid erroneous outputs; for clear intentions, the prompt provides guidance for effective output generation.
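A prompt of this shape might look like the following sketch. The template text is illustrative, not the production prompt; only the "output exactly: No" escape hatch mirrors the guideline above:

```python
# Sketch of a prompt template for intent extraction. The wording is a
# hypothetical example, not the prompt ParaX-GPT actually uses.

INTENT_PROMPT = """You are an intent extractor for a Web3 assistant.
Extract the user's intent as JSON with keys "action" and "params".
If the intent is ambiguous or cannot be extracted, output exactly: No

User input: {user_input}
Output:"""

def build_prompt(user_input: str) -> str:
    """Fill the template with the live user query."""
    return INTENT_PROMPT.format(user_input=user_input)

prompt = build_prompt("supply all of my USDC to ParaX")
```

The explicit "No" instruction gives the downstream code a single unambiguous sentinel to detect rejection, instead of having to parse free-form refusals.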

7.5 Intelligent Agent

We utilize the capabilities of a large-scale language model (LLM) to perform semantic extraction tasks. The main component of this module can be understood as an AI Agent, which iteratively calls the API of LLM for reasoning until reaching the iteration limit or successfully completing the information extraction process. The functionalities of this intelligent agent include:
  • Knowledge Database Augmentation: Utilizing a knowledge database, the agent searches for parallels within user input, thereby furnishing supplementary contextual information.
  • Real-time External API Information Delivery: By engaging external APIs, the agent enriches the context with real-time data, such as querying the user's current holdings of USDT assets.
  • Strategically Crafted Prompts: Well-constructed prompts guide the model towards producing desired outcomes.
  • Semantic Extraction Competence of LLM: The LLM exhibits remarkable aptitude in semantic extraction tasks.
This intelligent agent utilizes input data and tools to accomplish specified tasks, covering publicly available APIs that enhance the LLM's capabilities at various levels. While traditional LLMs are primarily used for processing textual data, challenges arise when extracting structured data from text. These challenges often involve understanding underlying contexts, identifying patterns, and inferring relationships between entities defined by these patterns. Equipped with appropriate tools, however, LLMs can bridge the gap between textual inputs and structured data; the key is the LLM's ability to call external APIs during inference. For example, through function calling, certain fine-tuned LLMs (such as OpenAI's GPT models) can determine when to invoke functions based on user input and respond with JSON outputs that conform to function signatures. This enables developers to reliably extract structured data from models; classic applications include translating natural language into API calls or database queries. Additionally, we enhance our data retrieval capabilities by making API calls across different Web3 infrastructures, including the public interfaces of Chainbase and TheGraph. This holistic approach ensures that our LLM-based semantic extraction produces accurate and comprehensive results.
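The function-calling contract can be sketched as follows. The schema shape follows the general function-calling pattern (a name, a description, and a JSON Schema for parameters); the `transfer_token` function and the hard-coded model reply are invented for illustration, standing in for what a function-calling model might return for "transfer all my USDT to address X":

```python
import json

# Sketch of function calling for structured extraction: the model is given
# JSON schemas and replies with a call conforming to one of them. The model
# reply below is hard-coded to illustrate the contract, not generated.

functions = [
    {
        "name": "transfer_token",
        "description": "Transfer an ERC-20 token to an address",
        "parameters": {
            "type": "object",
            "properties": {
                "token": {"type": "string"},
                "amount": {"type": "string"},
                "to": {"type": "string"},
            },
            "required": ["token", "amount", "to"],
        },
    }
]

# Hypothetical model output for "transfer all my USDT to address X":
model_reply = {
    "name": "transfer_token",
    "arguments": json.dumps({"token": "USDT", "amount": "all", "to": "X"}),
}

def dispatch(reply, registry):
    """Validate a model reply against the registered schema, then route it."""
    args = json.loads(reply["arguments"])
    missing = [k for k in registry[reply["name"]]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return reply["name"], args

registry = {f["name"]: f["parameters"] for f in functions}
name, args = dispatch(model_reply, registry)
```

Because the reply is constrained to a declared schema, the caller can validate it mechanically before touching any on-chain API.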

8 Architecture

8.1 Intent Interpretation

User queries submitted may have problems such as lack of background information, unclear goals, and ambiguous parameters. To better interpret the intent, we will create predefined intent task formats for each type of intent. User queries need to extract semantics through carefully designed prompt templates and convert them into corresponding formats of intent tasks using LLM. If the extracted results are not sufficient for further interpretation, an error will be quickly thrown and the user will be asked to provide additional information in a dialog manner.
The user’s intention can be very complex. With the contextual supplementation ability of knowledge databases and the external API call capability of LLM, it is possible to enhance LLM’s semantic extraction capability and extract the user’s intention more accurately.
The core of Intent Interpreter is to understand natural language queries and map them to structured intents, providing standardized inputs for subsequent blockchain interactions. Each intent relies on resources called tools (which can be public APIs). If the tools available to LLM cannot meet the requirements of an intent, an error will be thrown. Therefore, during the process of intent interpretation, strict validation is performed to determine if an executable intent schema can be obtained. For example:
Figure 4: Architecture
{
  "read_intentions": [
    {
      "location": {
        "chain": "Ethereum",
        "protocol": "ParaX"
      },
      "action": "query the asset with the highest APY"
    }
  ],
  "write_intentions": [
    {
      "location": {
        "chain": "Ethereum",
        "protocol": "ParaX",
        "from_address": "0x00...0"
      },
      "action": "supply all of the hold USDC"
    }
  ]
}
In order to better understand the intent, we adopt a few-shot approach in each step of intent interpretation, demonstrating how intent should be interpreted to guide LLM's output. Each demonstrated example includes a user request and its corresponding output, representing the expected sequence of task analysis. By considering the dependencies between tasks, these demonstrations can help LLM make accurate decisions regarding execution order and resource dependencies.
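A few-shot prompt of this kind can be assembled mechanically: each demonstration pairs a user request with its expected intent output, and the demonstrations are prepended to the live query. The demonstration strings below are illustrative examples in the spirit of the schemas above, not the system's actual shots:

```python
# Sketch of the few-shot approach: demonstrations of request -> intent
# pairs are prepended to the live query to guide the LLM's output.
# The example pairs are illustrative, not the production demonstrations.

DEMONSTRATIONS = [
    (
        "query the asset with the highest APY on ParaX",
        '{"type": "read", "location": {"chain": "Ethereum", "protocol": "ParaX"}}',
    ),
    (
        "supply all of my USDC to ParaX",
        '{"type": "write", "location": {"chain": "Ethereum", "protocol": "ParaX"}}',
    ),
]

def few_shot_prompt(query: str) -> str:
    """Render demonstrations followed by the live query, leaving the
    trailing 'Intent:' slot for the model to complete."""
    shots = "\n\n".join(f"Request: {q}\nIntent: {a}" for q, a in DEMONSTRATIONS)
    return f"{shots}\n\nRequest: {query}\nIntent:"

prompt = few_shot_prompt("withdraw 100 USDC from ParaX")
```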

8.2 Intent Completion

After completion, the ambiguous fields of an intent are resolved into a full schema. For example, a completed write intent:

{
  "location": {
    "chain": "Ethereum",
    "protocol": "ParaX",
    "from_address": "0x00...0"
  },
  "action": "supply all of the hold USDC"
}

8.3 Intent Execution

Each category of intent has its specific way of handling, and the intent execution layer abstracts the processing of different types of intent. After obtaining one or more intent tasks, we schedule the corresponding type of agent to execute these intentions. Based on the number and order of subtasks, they can be divided into several categories:
  • Single-intent task: Interpreting user queries as a single intent.
  • Sequential structure task: Intentions are executed in sequence, with each subsequent intent depending on the execution result of the previous one.
  • Graph structure task: Multiple intentions may have parallel or dependency relationships.
Among them, graph structure intent paths are the most complex cases. If LLM cannot infer a specific execution path, it will throw an error and prompt users to provide richer intent descriptions. After confirming the intent tasks, we proceed with distribution. Each agent instantiates an intent based on its corresponding pattern and processes its target accordingly. A response is generated based on the processing result.
For example:
  • I want to know the hash value of my last five transaction records: This intent requires querying real-time transaction records for this address. Therefore, we can call an API that can query data on-chain in real time to retrieve account history and return a response result.
  • I want to transfer all my USDT tokens to address X: This intent indicates that the user wants to construct a signed transaction. The agent needs to query relevant transaction ABI and complete parameter supplementation before encoding transaction parameters, ultimately returning a response result.
The returned response indicates that the intention has been successfully interpreted and executed. The response depends on user input; different inputs generate different types of intentions and corresponding processing procedures. The response may resemble a helpful answer from an intelligent assistant, or it may be transaction parameters awaiting the user's signature.
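For graph structure tasks, the scheduling reduces to ordering intents so that every intent runs after its dependencies; independent intents could then run in parallel. A minimal sketch using the standard library's topological sorter, with stubbed handlers and an invented three-intent example:

```python
from graphlib import TopologicalSorter

# Sketch of graph-structure task scheduling: intents with dependencies are
# ordered topologically; each agent may read its upstream results.
# The intents and handler behaviour are illustrative stubs.

# intent -> set of intents it depends on
dependencies = {
    "swap_eth_for_usdc": set(),
    "supply_usdc": {"swap_eth_for_usdc"},
    "query_position": {"supply_usdc"},
}

order = list(TopologicalSorter(dependencies).static_order())

results = {}
for intent in order:
    # Each agent instantiates the intent and can see upstream outcomes.
    upstream = {dep: results[dep] for dep in dependencies[intent]}
    results[intent] = f"executed {intent} after {sorted(upstream)}"
```

If the dependency dict contains a cycle, `TopologicalSorter` raises `CycleError`, which maps naturally onto the error-and-ask-for-more-detail path described above.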

8.4 Fork Network Simulation

Because on-chain transactions require strict parameters, we expect intentions that submit on-chain transactions to be correctly interpreted and executed. We therefore run a fork network environment consistent with the mainnet and expose APIs that give the LLMs involved in the execution process the ability to verify the execution results of intentions on the fork network. After a transaction is submitted on the fork network, some query results are typically verified to infer the expected execution of the intention: for example, whether the xToken quantity for a supply operation has increased as expected. By verifying that an intention executes successfully on the fork network, we can anticipate that it will also be successfully submitted and executed on the real Ethereum network, giving users an expectation of a correct execution result.
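The verification step can be sketched with the fork network stubbed as an in-memory ledger. In practice this would be JSON-RPC calls against a mainnet fork; here a dict of balances stands in, and the 1:1 xToken minting rule is a simplifying assumption for illustration:

```python
# Sketch of post-execution verification on a fork network. The fork is a
# toy in-memory ledger; supplying mints xTokens 1:1 in this simplified model.

class ForkNetwork:
    def __init__(self, balances):
        self.balances = dict(balances)

    def submit_supply(self, user, amount):
        # Toy rule: supplying USDC mints the same amount of xUSDC.
        self.balances[(user, "USDC")] -= amount
        self.balances[(user, "xUSDC")] = self.balances.get((user, "xUSDC"), 0) + amount

    def balance_of(self, user, token):
        return self.balances.get((user, token), 0)

def verify_supply_intent(fork, user, amount):
    """Submit on the fork, then check the xToken balance grew as expected."""
    before = fork.balance_of(user, "xUSDC")
    fork.submit_supply(user, amount)
    return fork.balance_of(user, "xUSDC") == before + amount

fork = ForkNetwork({("alice", "USDC"): 1650})
ok = verify_supply_intent(fork, "alice", 1650)
```

Only after this check passes would the same constructed transaction be handed to the user for signing against the real network.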

8.5 Intentions for Specific Scenarios

Generally speaking, intentions are generic directions, and ParaX-GPT interprets and executes them based on its current knowledge base and the range of APIs accessible to its models. However, customized intention types have been designed for mainstream web3 scenarios. For topics related to airdrops, ParaX-GPT's design differs from the traditional approach of retrieving similar contexts from a database to answer questions. Instead, it plans from an investment perspective, comparing available airdrop projects across multiple dimensions and providing users with information about token distribution rules tailored to practical application scenarios.

8.6 Iterative Fine-tuning Process

Fine-tuned LLMs can optimize specific stages of intent interpretation; however, generating high-quality fine-tuning samples requires extensive real-world usage. We therefore use an incentive mechanism to encourage users to submit their interaction data for further iterative training, improving model performance. The mechanism awards points as feedback, which can be exchanged for a specified amount of token rewards.
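The point-to-token exchange can be sketched as a simple ledger. The rates (`POINTS_PER_SAMPLE`, `TOKENS_PER_POINT`) and the `IncentiveLedger` interface are assumptions for illustration, not ParaX's actual reward terms.

```python
# Illustrative sketch of the incentive mechanism: users earn points for
# accepted fine-tuning samples and redeem them for token rewards at a
# fixed rate. All rates here are assumptions, not ParaX's actual terms.

POINTS_PER_SAMPLE = 10
TOKENS_PER_POINT = 0.05

class IncentiveLedger:
    def __init__(self):
        self.points = {}

    def record_submission(self, user, accepted_samples):
        """Credit points for fine-tuning samples accepted into training."""
        earned = accepted_samples * POINTS_PER_SAMPLE
        self.points[user] = self.points.get(user, 0) + earned
        return self.points[user]

    def redeem(self, user):
        """Exchange all accrued points for token rewards."""
        pts = self.points.pop(user, 0)
        return pts * TOKENS_PER_POINT

ledger = IncentiveLedger()
ledger.record_submission("alice", 12)  # 12 accepted samples -> 120 points
reward = ledger.redeem("alice")
```

Gating rewards on *accepted* samples (rather than raw submissions) is one way to keep the incentive aligned with training-data quality.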

8.7 Checkpoint Mechanism

The user's intention always involves multiple execution steps, and our goal is to help the user understand where the current trace of execution is failing. This allows the user to improve their input and further complete the execution of their intention. The purpose of the checkpoint mechanism is to provide transparent step feedback for complex intention execution processes, helping users debug and optimize intentions step by step, thereby improving the final conversion success rate.
Specifically, when a user inputs an intention, the system converts it into a multi-step transaction process based on its knowledge base. This process is divided into several execution steps, each of which serves as a checkpoint. After confirmation from the user, the system gradually attempts each checkpoint and provides clear feedback to the user. If a checkpoint fails, the system notifies the user of specific failure reasons. At this point, users can adjust their intention input based on these failure reasons. The system then re-executes from failed checkpoints instead of restarting the entire process. Ultimately, when all checkpoints pass successfully, the user's intention is fully executed.
The advantages of the checkpoint mechanism include dividing complex processes into manageable steps, providing clear failure feedback, avoiding redundant execution, and supporting continuous improvement of input quality. Checkpoints are established through techniques such as intent planning and decomposition, dependency analysis, and exception handling. In summary, the checkpoint mechanism makes intention execution more transparent and manageable for users. It helps with debugging and optimizing input intentions, resulting in an improved success rate for converting intentions into successful blockchain executions.
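The resume-from-failure behavior described above can be sketched as an executor that walks an ordered list of checkpoints, stops at the first failure, and restarts from that index once the user has revised the input. The step names and executor interface are illustrative assumptions, not ParaX's implementation.

```python
# Sketch of the checkpoint mechanism: an intention is decomposed into
# ordered steps; execution stops at the first failing checkpoint and,
# after the user adjusts the input, resumes from that step instead of
# restarting the whole process. Step names are illustrative.

def run_with_checkpoints(steps, start=0):
    """steps: list of (name, fn) pairs; each fn returns True on success.
    Returns (completed, failed_index); failed_index is None if all
    checkpoints passed."""
    completed = []
    for i in range(start, len(steps)):
        name, fn = steps[i]
        if fn():
            completed.append(name)
        else:
            return completed, i  # report the failing checkpoint to the user
    return completed, None

# Example trace: token approval fails on the first run; the user fixes
# the input and execution resumes from the failed checkpoint.
state = {"approved": False}
steps = [
    ("parse_intent", lambda: True),
    ("approve_token", lambda: state["approved"]),
    ("submit_tx", lambda: True),
]
done, failed = run_with_checkpoints(steps)            # stops at index 1
state["approved"] = True                              # user adjusts input
done2, failed2 = run_with_checkpoints(steps, start=failed)
```

Keeping the failing index explicit is what lets the system report a specific failure reason and re-execute only the remaining steps.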

