On galaxies, satellites and black holes
Updated: May 22
How our AI understands German tax law — part one
Last year I started a “LegalTech” startup called Taxy.io. There, we work on the semantic analysis of tax law texts such as laws, guidelines and judgments, in order to build a machine understanding of these texts and to provide targeted legal answers to individual questions.
This technology, which is under continuous development, is used by selected customers and by third-party software products in the tax sector.
To gain a better understanding of the underlying data and to derive correlations and rankings, my co-founder Sven Weber and I recently experimented with a network analysis. The analysis examined how tens of thousands of legal texts are linked via paragraph references and citations. The interim results are summarized below.
Structure of the network
The network analysis was carried out on approx. 6,500 laws and guidelines and 45,000 judgments.
One difficulty is that references are written and linked in dozens of different forms, e.g. “§3 paragraph 4 sentence 1 Einkommensteuergesetz (Income Tax Act)” or “§3 p. 4 s. 1 EStG”.
The next step was therefore to develop a grammar that recognizes as many reference forms as possible. Self-references, where a paragraph or article of a law or judgment points to itself, were ignored, since only the dependence of one document on another is of interest.
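As a minimal sketch of what such a reference grammar might look like, the following uses a single regular expression. It is not the grammar used at Taxy.io: the pattern, the `LAW_ALIASES` table, the `extract_references` helper and the node identifier format (`"EStG §3"`) are all assumptions made for illustration, and only a handful of spellings are covered.

```python
import re

# Hypothetical pattern: covers only a few spellings of "paragraph" and "sentence".
REFERENCE = re.compile(
    r"§\s*(?P<section>\d+[a-z]?)"
    r"(?:\s*(?:Abs\.|Absatz|paragraph|p\.)\s*(?P<paragraph>\d+))?"
    r"(?:\s*(?:S\.|Satz|sentence|s\.)\s*(?P<sentence>\d+))?"
    r"\s+(?P<law>[A-ZÄÖÜ][A-Za-zÄÖÜäöüß]+)"
)

# Hypothetical alias table that maps long law names to their abbreviations.
LAW_ALIASES = {"Einkommensteuergesetz": "EStG", "Abgabenordnung": "AO"}


def extract_references(text, source_id=None):
    """Return (target_id, paragraph, sentence) tuples found in `text`.

    References that point back to `source_id` (self-references) are skipped.
    """
    references = []
    for match in REFERENCE.finditer(text):
        law = LAW_ALIASES.get(match["law"], match["law"])
        target_id = f"{law} §{match['section']}"
        if target_id == source_id:
            continue  # ignore self-references
        references.append((target_id, match["paragraph"], match["sentence"]))
    return references


long_form = extract_references("§3 paragraph 4 sentence 1 Einkommensteuergesetz")
short_form = extract_references("§3 p. 4 s. 1 EStG")
```

Both spellings resolve to the same target, `("EStG §3", "4", "1")`, which is the property the graph construction relies on. A production grammar would need many more variants (ranges such as “§§ 8 f.”, articles, court file numbers) than a single expression can reasonably handle.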
Based on the references recognized in the original texts, a directed graph was then built that represents the references from one judgment or law to another document in the database. Each paragraph or article is treated as a document of its own. The graph is stored in a dedicated database designed for large graphs. The calculations were carried out on the Open Telekom Cloud, to which Taxy.io, as a selected startup of the TechBoost program, has access at reduced rates.
As a result, the interlinking of tens of thousands of legal documents in Germany was made visible and machine-processable for the first time.
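The graph construction can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the actual pipeline: the `build_graph` helper is hypothetical, the document identifiers are invented placeholders, and an in-memory dictionary stands in for the graph database.

```python
def build_graph(references):
    """Build a directed graph as {node: set of referenced nodes}.

    `references` maps each source document to the documents it cites.
    Self-references are dropped; every cited document becomes a node too.
    """
    graph = {}
    for source, targets in references.items():
        graph.setdefault(source, set())
        for target in targets:
            if target == source:
                continue  # only references between different documents count
            graph[source].add(target)
            graph.setdefault(target, set())
    return graph


# Invented placeholder data, for illustration only.
example_references = {
    "judgment-1": ["AO §370", "EStG §3"],
    "judgment-2": ["AO §370"],
    "AO §370": ["AO §370"],  # a self-reference, which is ignored
}
example_graph = build_graph(example_references)
edge_count = sum(len(targets) for targets in example_graph.values())
```

Here `example_graph` has four nodes and three directed edges. Storing successors as sets means that a document citing the same target several times still contributes a single edge.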
Overview of the network
What looks like a galaxy with countless stars represents the German legal system. The extracted documents are displayed as about 137,000 points, joined by about 173,000 connecting lines (so-called directed edges), each of which represents a reference from one document to another. The turquoise dots symbolize the laws, regulations and guidelines, the violet dots the corresponding paragraphs and articles, and the yellow dots the case law.
In addition, there are numerous nodes which, like distant comets, do not seem to be connected to the rest of the network. These are, without exception, paragraphs and judgments that do not reference other documents within the data pool.
It is also visible that some nodes have a large number of incoming or outgoing links. The front runners are the documents for §§ 8 f. MarkenG and Art. 103 GG. Sections 8 et seq. of the Trademark Act (Markengesetz) deal with grounds for refusing trademark protection, and Article 103 of the Basic Law (Grundgesetz) deals with the right to be heard, the requirement that criminal offences be defined by law, and the prohibition of double jeopardy.
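The hubs and the “comets” described above can both be found by counting incoming and outgoing edges per node. The following sketch assumes the graph is available as a dictionary of successor sets; the `degree_table` helper and the node names are hypothetical.

```python
def degree_table(graph):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_degree = {node: len(targets) for node, targets in graph.items()}
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1
    return in_degree, out_degree


# Invented placeholder graph, for illustration only.
toy_graph = {
    "judgment-1": {"law-A §1", "law-B §2"},
    "judgment-2": {"law-A §1"},
    "law-A §1": set(),
    "law-B §2": set(),
    "law-C §9": set(),
}
in_degree, out_degree = degree_table(toy_graph)

# The "front runner": the node with the most incoming references.
most_cited = max(in_degree, key=in_degree.get)

# The "comets": nodes with neither incoming nor outgoing references.
isolated = sorted(
    node for node in toy_graph
    if in_degree[node] == 0 and out_degree[node] == 0
)
```

In this toy graph `most_cited` is `"law-A §1"` and `isolated` is `["law-C §9"]`.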
For the semantic search technology on tax law texts, as well as the intelligent matching of client data to tax topics, these links serve as “weights” on the edges from which the relevance or importance of documents and references is derived. A document that is referenced by many important documents also counts as important. Larry Page, co-founder of Google, used a similar principle in PageRank to order the Google results list.
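The idea that a document is important if important documents reference it can be illustrated with a textbook PageRank power iteration. This is a generic sketch, not the ranking used in the Taxy.io products; the damping factor of 0.85, the iteration count and the toy graph are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank on {node: set of successors}.

    Every successor must itself be a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing references is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented placeholder graph: three judgments citing two paragraphs.
citations = {
    "judgment-1": {"law-A §1", "law-B §2"},
    "judgment-2": {"law-A §1"},
    "judgment-3": {"law-A §1"},
    "law-A §1": set(),
    "law-B §2": set(),
}
ranks = pagerank(citations)
top_document = max(ranks, key=ranks.get)
```

The paragraph cited by all three judgments, `"law-A §1"`, ends up with the highest rank, and the ranks sum to 1 because dangling nodes redistribute their share instead of losing it.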
Zooming into the tax cluster
Managing the entire “galaxy” of German law is a feat of strength even for modern databases. The Taxy.io team has therefore focused its further analysis on the tax subcluster, which contains the legal issues relevant to it and is also where the Taxy.io products can derive the greatest added value.
Since the tax universe on its own is still very extensive, thematic clusters were formed.
The starting point was the computation of communities with dedicated community-detection algorithms. From these, the communities around the important tax laws AO, EStG and UStG, together with the related laws and judgments, were selected.
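The selection step can be sketched as follows, assuming a community-detection algorithm has already assigned each node a community id. The source does not name the algorithm, so the assignment is simply taken as input here; the `tax_cluster` helper, the node identifier format (`"AO §370"`) and the toy data are hypothetical.

```python
SEED_LAWS = ("AO", "EStG", "UStG")


def tax_cluster(graph, community_of, seed_laws=SEED_LAWS):
    """Keep only the communities that contain a paragraph of a seed law.

    `graph` is {node: set of successors}; `community_of` maps node -> id.
    Node ids are assumed to start with the law abbreviation, e.g. "AO §370".
    """
    def is_seed(node):
        return node.split(" ")[0] in seed_laws

    kept_communities = {community_of[node] for node in graph if is_seed(node)}
    kept_nodes = {node for node in graph if community_of[node] in kept_communities}
    # Drop edges that lead out of the selected communities.
    return {node: graph[node] & kept_nodes for node in kept_nodes}


# Invented placeholder data, for illustration only.
toy_graph = {
    "AO §370": set(),
    "judgment-1": {"AO §370", "MarkenG §8"},
    "MarkenG §8": set(),
    "judgment-2": {"MarkenG §8"},
}
toy_communities = {"AO §370": 0, "judgment-1": 0, "MarkenG §8": 1, "judgment-2": 1}
cluster = tax_cluster(toy_graph, toy_communities)
```

Only community 0 survives, so `cluster` contains `"AO §370"` and `"judgment-1"`, and the edge from `"judgment-1"` to the trademark paragraph is dropped because its target lies outside the tax cluster.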
In the tax cluster, “gravitational centres” catch the eye, i.e. text parts (paragraphs of laws or judgments) that are particularly strongly linked and thus have a high PageRank.
On the one hand, the German Tax Code (Abgabenordnung, AO) should be mentioned as the “Basic Law of Taxation”. On the other hand, there are also very weakly linked areas, such as the Valuation Act (Bewertungsgesetz, BewG).
When further zooming into the tax universe, one notices that a great deal of primary literature has been devoted to one topic in particular: tax evasion.
By far the most judgments in the sample refer to § 370 AO, the provision on tax evasion: see the violet paragraph to which a large number of golden judgments refer, west of the Coffee Tax Act (KaffeeStG). In this case the gravitational centre is more of a black hole ;).
The star system shown is only a snapshot. Every week, an average of 100 more judgments are added, enlarging the corpus that a lawyer or tax consultant should have on the radar at all times.
Since it is very difficult for employees in tax consulting to keep track of the current legal situation, this technology is offered in combination with other semantic technologies and a comparison with client data. It is available both as a stand-alone solution and integrated via interfaces into tax consulting software (CRM, DMS, accounting, law firm management).