How our AI understands German tax law — part two
Knowledge management is an important topic for all tax consulting and auditing firms. It is estimated that tax departments spend millions of hours of work every year on research carried out in the traditional way. Tax professionals need to understand, process and memorize a wealth of constantly changing information. And even though it may not seem so when first looking at legal texts and administrative guidelines, this work is based on the analysis of natural language.
Machine learning and NLP as tools for intelligent text comprehension
This is where Natural Language Processing (NLP) comes into play: a set of methods for training an algorithm to read and understand text. NLP is not a static toolbox but a collection of constantly evolving approaches. Alongside it, machine learning methods are used to continuously improve the accuracy of search results.
And what does this actually mean?
A small selection of Natural Language Processing methods
Since “artificial intelligence” and “machine learning” have become increasingly popular buzzwords, often used to describe procedures that have little to do with AI, in this article we would like to give a short, non-exhaustive overview of some of the NLP methods we use at Taxy.io.
Semantic network analysis
Given the extensive primary and secondary literature on tax law, the first step is to examine which sections of the literature and related fields are linked, for example by cross-references. This relies on rule-based procedures combined with machine learning, which is where artificial intelligence comes into play: it lets the algorithm recognize what is meant even when, for example, a reference contains a spelling mistake.
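To make the rule-based part more concrete, here is a minimal Python sketch that extracts citations such as “§ 370 AO” with a regular expression and maps slightly misspelled law abbreviations onto a known list. It is an illustration under stated assumptions, not Taxy.io's implementation: the list `KNOWN_LAWS`, the citation pattern and the 0.6 similarity cutoff are hypothetical choices, and the standard library's `difflib` string similarity merely stands in for the machine-learning component described above.

```python
import re
from difflib import get_close_matches
from typing import List, Optional, Tuple

# Hypothetical, illustrative subset of German tax-law abbreviations.
KNOWN_LAWS = ["AO", "EStG", "UStG", "KStG", "GewStG"]

# Hypothetical pattern for citations such as "§ 370 AO", "§ 15a EStG"
# or "§ 4 Abs. 3 EStG": section number, optional paragraph, law abbreviation.
CITATION = re.compile(r"§\s*(\d+[a-z]?)\s*(?:Abs\.\s*\d+\s*)?([A-Za-zÄÖÜäöü]+)")


def normalize_law(raw: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a possibly misspelled abbreviation to a known law, or return None."""
    if raw in KNOWN_LAWS:
        return raw
    # difflib's string similarity stands in for a learned spelling model.
    matches = get_close_matches(raw, KNOWN_LAWS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_references(text: str) -> List[Tuple[str, str]]:
    """Return (section, law) pairs for every recognizable citation in text."""
    references = []
    for section, law in CITATION.findall(text):
        normalized = normalize_law(law)
        if normalized is not None:
            references.append((section, normalized))
    return references


print(extract_references("Nach § 370 AO und § 15a EStG sowie § 4 Abs. 3 EStg."))
# → [('370', 'AO'), ('15a', 'EStG'), ('4', 'EStG')]
```

The misspelled “EStg” is still resolved to “EStG” because it is the closest known abbreviation above the cutoff; a pattern this simple would, however, misread other words following a section number, which is one reason to add learned components.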
Network analysis of German tax laws and judgments, with the highly interlinked tax evasion provision § 370 AO at the centre.
Next, we can calculate how important each reference is. When focusing on German tax law, it is noticeable that the “Abgabenordnung” (AO; Fiscal Code) is of particular importance in the literature. As the AO is the so-called basic tax law, this is hardly surprising at first glance. Looking more closely at the network around the AO, however, it is striking that § 370 AO in particular is referenced very often; this provision deals with tax evasion.
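Two common ways to measure the importance of a node in a citation network are counting its incoming references (in-degree) and computing PageRank. The sketch below does both in plain Python on a small, made-up citation graph. The graph, the damping factor of 0.85 and the iteration count are assumptions for illustration; the article does not say which importance measure is actually used.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy citation graph: each node maps to the provisions it cites.
CITATIONS: Dict[str, List[str]] = {
    "judgment_1": ["§ 370 AO", "§ 169 AO"],
    "judgment_2": ["§ 370 AO", "§ 15 EStG"],
    "commentary_1": ["§ 370 AO", "§ 371 AO"],
    "§ 371 AO": ["§ 370 AO"],
    "§ 169 AO": ["§ 370 AO"],
}


def in_degree(graph: Dict[str, List[str]]) -> Counter:
    """Count how often each node is cited by other nodes."""
    return Counter(target for targets in graph.values() for target in targets)


def pagerank(graph, damping=0.85, iterations=50):
    """Basic power-iteration PageRank; dangling nodes spread rank evenly."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank equally among the provisions it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing citations distributes its rank uniformly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


print(in_degree(CITATIONS).most_common(1))  # → [('§ 370 AO', 5)]
ranks = pagerank(CITATIONS)
print(max(ranks, key=ranks.get))  # → § 370 AO
```

In-degree only counts direct citations, whereas PageRank also rewards being cited by nodes that are themselves frequently cited, which is why it is often preferred for legal citation networks.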
If you would like to read more about the results of our network analysis, please refer to the article by our co-founders Daniel Kirch and Sven Weber.
Classification of texts
Moving from the network as a whole to individual texts, text classification is one of the important tasks in Natural Language Processing. Texts are assigned to certain categories: emails, for example, can be classified as spam or not spam, and customer reviews as positive or negative using sentiment analysis. Texts can also be assigned to specific subject areas. For tax law, this means that our algorithms, which have been specially refined using supervised learning techniques, can recognize which tax law topics a given text covers and automatically assign it to, for example, the subject area of VAT or procedural law.
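As an illustration of supervised topic classification, the following sketch trains a multinomial Naive Bayes classifier with Laplace smoothing on a handful of labelled sentences. The training sentences, the labels and the class `NaiveBayesTopicClassifier` are hypothetical; Naive Bayes is a textbook technique chosen here for brevity, not necessarily the model Taxy.io uses, and a real system would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; the Unicode-aware pattern keeps umlauts intact."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(doc_count / total_docs)
            denominator = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled training sentences (far too few for real use).
TRAIN_TEXTS = [
    "Umsatzsteuer auf Lieferungen und sonstige Leistungen",
    "Vorsteuerabzug aus Eingangsrechnungen mit Umsatzsteuer",
    "Einspruch gegen den Steuerbescheid innerhalb der Einspruchsfrist",
    "Festsetzungsverjährung und Änderung des Steuerbescheids",
]
TRAIN_LABELS = ["VAT", "VAT", "procedural law", "procedural law"]

classifier = NaiveBayesTopicClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(classifier.predict("Ist die Umsatzsteuer auf diese Lieferungen abziehbar?"))  # → VAT
print(classifier.predict("Einspruch gegen den Bescheid"))  # → procedural law
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on longer texts, and add-one smoothing keeps a single unseen word from ruling out a topic entirely.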
Text and topic identification: here, texts have been recognized as judgments and decisions, topics classified, and legislative bodies identified