Faculty of law blogs / UNIVERSITY OF OXFORD

‘Datafying’ Financial Market Regulations: A RegTech Experiment with the FINRA Rulebook


Craig Atkinson
Research Affiliate at SMU Yong Pung How School of Law
Joseph Potvin
Executive Director at Xalgorithms Foundation


Securities markets have been at the forefront of digital transformation since the beginning of scalable computing in the 1960s. In 2022, the Financial Industry Regulatory Authority (FINRA) in the United States began piloting a ‘machine-readable rulebook’ and related online information services. Based on a 2023 response to FINRA’s request for public comment, we outline how the regulator’s work on ‘digital rules’ could be scaled and systematically integrated into the operations of its member entities. The approach combines experimentation with a Large Language Model (LLM), used to help pre-structure FINRA rules in controlled natural language (CNL), with Data With Direction Specification (DWDS) components that represent ‘rules-as-data’ for online publication and access via ‘an Internet of Rules’ (IoR).

US Financial Market Regulations and the FINRA Rules

In the US, the Securities and Exchange Commission (SEC) and FINRA jointly pursue financial market integrity and investor protection. FINRA’s activities include oversight (eg rulemaking, surveillance, fraud detection, and enforcement), membership, and transparency services (eg automated reporting, data analytics, and verification audits).

FINRA’s regulatory role is exercised under SEC authority in addition to a framework of statutes, including inter alia the Securities Act, the Securities Exchange Act, the Trust Indenture Act, the Sarbanes-Oxley Act, and the Dodd-Frank Act. The 850 ‘FINRA Rules’ and thousands of interpretative texts, policy statements, change notices, and other guidance instruments shape the self-regulated market operations of brokerages and registered securities representatives.

In 2018, FINRA began developing a taxonomy of regulatory/industry terms for a ‘machine-readable rulebook’. In late 2022, FINRA launched its ‘Machine-Readable Rulebook Initiative’ to facilitate discovery of regulations, followed by the release of two prototype services covering a set of the 40 most frequently viewed rules:

  • A web application called FIRST (FINRA Rulebook Search Tool) provides an interface for users to locate FINRA rules through a selection of categories.
  • The FINRA API (Application Programming Interface) facilitates automated keyword queries of the database of sampled rules.

These ‘regulatory technology’ (RegTech) prototypes were designed to assist US financial market actors in setting, operationalizing, and conforming with regulatory, reporting, compliance, and risk management obligations.

Experiment using the Data With Direction Specification (DWDS)

The DWDS—materialized as an Internet of Rules (IoR)—provides a distributed, general-purpose method for any person to author, publish, discover, fetch, scrutinize, prioritize, and optionally automate normative assertions (ie rules) on computer networks between rule-makers and rule-takers (ie individuals, organizations and/or machines) with any application and platform.

Based on the DWDS, a free/libre/open reference implementation of an application known as ‘RuleMaker’ offers users a no-code environment for capturing rule metadata (eg jurisdiction, official URL, etc.) as well as for structuring the logic of natural language rules (ie expressions that contain MUST, MAY and SHOULD, and their synonyms and negatives) in a tabular declarative form. In addition to other trials, experimentation followed two phases:

1) The FINRA Rule Book was assessed, and two sample rules were chosen:

  • Rule 3240: ‘Borrowing From or Lending to Customers’
  • Rule 4140: ‘Audit’

2) With assistance from an LLM, the selected rules were pre-structured as a set of simple declarative statements in CNL, in the style of Ron Ross’ ‘RuleSpeak’ guidelines. The RuleMaker application was then used to structure each sentence into the six syntactic elements of the DWDS ‘RuleData’ form. Once each ‘Input Condition’ and ‘Output Assertion’ was authored, all permutations of the inputs were structured into columns (A, B, C…), and the ‘logic gates’ for the examples were set using the RuleMaker interface, which offers both full-sentence and symbol options.


Figure 1 Symbolic Meaning of Logic Gates, adapted from Potvin (2023)

When all the metadata and logic data had been entered into the RuleMaker app for each example, JavaScript Object Notation (JSON) files were automatically generated. Because the outputs of the experiments are both visual representations of logic and data structured in JSON, they are readily human-readable and machine-processable.
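The permutation step described above can be sketched in Python. This is a minimal illustration only, not the RuleMaker implementation or the DWDS ‘RuleData’ schema: the condition sentences are placeholders rather than the text of any FINRA rule, and the field names (`input_conditions`, `output_assertions`, `columns`) and the all-inputs-true logic gate are assumptions made for the example.

```python
import json
from itertools import product
from string import ascii_uppercase

# Placeholder sentences for illustration; not the text of any FINRA rule.
input_conditions = [
    "Placeholder input condition 1 is satisfied.",
    "Placeholder input condition 2 is satisfied.",
]
output_assertion = "Placeholder output assertion: the action MAY proceed."


def build_rule_table(conditions, assertion):
    """Enumerate every true/false permutation of the conditions as a lettered column."""
    columns = []
    permutations = product([True, False], repeat=len(conditions))
    # zip stops at 26 columns, which is enough for this small sketch.
    for label, values in zip(ascii_uppercase, permutations):
        columns.append({
            "column": label,
            "inputs": list(values),
            # Assumed logic gate: the assertion holds only if every input holds.
            "output": all(values),
        })
    return {
        "input_conditions": conditions,
        "output_assertions": [assertion],
        "columns": columns,
    }


table = build_rule_table(input_conditions, output_assertion)
print(json.dumps(table, indent=2))
```

With two input conditions the sketch yields four columns (A to D), one per permutation. The same structure can be read by a person as a table and parsed by a program as JSON, which is the dual property the experiment relies on.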

In the output examples, the discrete sentences of each rule have not yet been aligned to FINRA’s semantic taxonomy. However, now that the logic of the rules has been captured, FINRA’s taxonomy can be applied to the sentence elements. With rules structured in this form, data mapping between the FINRA definitions and new forms of entity data (eg a verifiable Legal Entity Identifier, vLEI) is also possible. For example, ‘Input Condition #1’ of FINRA Rule 3240 specifically refers to a ‘member institution’, which could be verified using vLEI data and then processed to determine output assertions.
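As a rough sketch of such a mapping, the Python below resolves an input condition about a ‘member institution’ from entity data. Everything here is hypothetical: the `EntityRecord` fields, the `input_condition_1` function, and the example identifier are assumptions for illustration, and the vLEI verification itself is assumed to happen elsewhere and is represented only by a boolean.

```python
from dataclasses import dataclass


@dataclass
class EntityRecord:
    """Hypothetical entity data; the field names are assumptions for this sketch."""
    lei: str                   # Legal Entity Identifier
    credential_verified: bool  # outcome of a vLEI check performed elsewhere
    is_member: bool            # membership status from an assumed registry lookup


def is_well_formed_lei(lei: str) -> bool:
    """Format check only: 20 ASCII alphanumeric characters; check digits are not validated."""
    return len(lei) == 20 and lei.isascii() and lei.isalnum()


def input_condition_1(entity: EntityRecord) -> bool:
    """Treat the entity as a verified member institution only if all three checks pass."""
    return (
        is_well_formed_lei(entity.lei)
        and entity.credential_verified
        and entity.is_member
    )


# A made-up identifier with the right shape; it is not a registered LEI.
member = EntityRecord(lei="EXAMPLE" + "0" * 13, credential_verified=True, is_member=True)
unverified = EntityRecord(lei="EXAMPLE" + "0" * 13, credential_verified=False, is_member=True)

print(input_condition_1(member))      # True
print(input_condition_1(unverified))  # False
```

The boolean result is the kind of value that could populate one input row of a rule table, so that the relevant output assertions follow from verified entity data rather than from manual entry.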

Legal-Technical Design Recommendations

Based on the design rationale used to experiment with the logic of FINRA rules, three challenges require the initiative’s attention:

  • ‘Natural Language from Rule-Makers’ to ‘Natural Language for Rule-Takers’: FINRA’s initiative focuses on regulations for securities brokers. A systematic approach to providing auxiliary, practicable, natural language summaries of rules would enhance understanding and compliance.
  • ‘Machine Readable’ to ‘Machine Processable’: The initiative should consider specialized support for speed-optimized methods suited to algorithmic high-frequency transaction systems.
  • ‘Rule Book’ to ‘Rule System’: FINRA’s two online services—web-based rules search and an API for ‘rulesbase’ queries—create potential for additional experimentation and implementation of end-to-end compliance systems.

Conclusion and Future Research

Financial market regulators such as FINRA are starting to engineer their rulebooks to be more functional in two respects. The first is communicating obligation, prohibition, and permission among humans. The second is machine processing of contextual data, which can help determine which rules are ‘in effect’ by jurisdiction and time, ‘applicable’ to a context, and ‘invoked’ for a circumstance, with optional processing for compliance automation. As this experiment demonstrates, practicable CNL for rule-takers, machine processing capabilities, and ‘rule system’ thinking are important design characteristics for such initiatives. Alongside the efforts of FINRA, the Global Legal Entity Identifier Foundation (GLEIF), and others, future endeavours will explore combining the rules-as-data approach with emergent research on ‘supervisory technology’ and Machine Learning (ML) techniques.
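The three-stage test of ‘in effect’, ‘applicable’, and ‘invoked’ can be pictured as successive filters over rule metadata. The Python below is a sketch under stated assumptions, not an account of how FINRA or an Internet of Rules implementation does this: the `Rule` fields, the tag vocabulary, the rule identifiers, and the dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule metadata; every field name is an assumption."""
    rule_id: str
    jurisdiction: str
    effective_from: date
    effective_to: Optional[date]  # None means no end date
    contexts: FrozenSet[str]      # contexts the rule is applicable to
    triggers: FrozenSet[str]      # circumstances that invoke the rule


def in_effect(rule: Rule, jurisdiction: str, on: date) -> bool:
    """'In effect': matching jurisdiction and a date inside the effective window."""
    started = rule.effective_from <= on
    not_ended = rule.effective_to is None or on <= rule.effective_to
    return rule.jurisdiction == jurisdiction and started and not_ended


def applicable(rule: Rule, context: str) -> bool:
    """'Applicable': the rule covers the given context."""
    return context in rule.contexts


def invoked(rule: Rule, event: str) -> bool:
    """'Invoked': the given circumstance triggers the rule."""
    return event in rule.triggers


def sift(rules: List[Rule], jurisdiction: str, on: date, context: str, event: str) -> List[str]:
    """Return the ids of rules that pass all three filters."""
    return [
        r.rule_id
        for r in rules
        if in_effect(r, jurisdiction, on) and applicable(r, context) and invoked(r, event)
    ]


# Invented sample data, not drawn from the FINRA rulebook.
rules = [
    Rule("example-lending", "US", date(2020, 1, 1), None,
         frozenset({"broker-dealer"}), frozenset({"loan-request"})),
    Rule("example-audit", "US", date(2020, 1, 1), date(2021, 12, 31),
         frozenset({"broker-dealer"}), frozenset({"annual-audit"})),
]

matches = sift(rules, "US", date(2023, 6, 1), "broker-dealer", "loan-request")
print(matches)  # ['example-lending']
```

Keeping the three tests as separate functions mirrors the distinction drawn above: a rule can be in effect without being applicable, and applicable without being invoked.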

Craig Atkinson is a Research Affiliate at Singapore Management University (SMU) Yong Pung How School of Law, Centre for AI and Data Governance / Centre for Computational Law

Dr. Joseph Potvin is Executive Director of the Xalgorithms Foundation

