Logic Programming Language for Data Analysis
View project on GitHub.

What is Logica?

Logica is an open-source, declarative logic programming language for data manipulation.

Logica extends the syntax of logic programming for intuitive and efficient data manipulation. It compiles to SQL, giving you access to the power of SQL engines with the convenience of logic programming syntax.

Examples

One may say that for programming languages like Python and Java, functions are the basic building blocks. For Logica and other logic programming languages, those building blocks are predicates. A logic program is a set of rules that define output predicates from predefined predicates. Those predefined predicates represent input data. For example, here is a rule that identifies the names of expensive books from an existing table of book prices.

    # Logica rule to get expensive books.
    ExpensiveBook(book_name) :-   # book_name is expensive if and only if
      Book(book_name, price),     # book_name costs price
      price > 100;                # and price is greater than 100.

If you are familiar with SQL, you may see that the rule above is equivalent to the following SQL statement. Familiarity with SQL is not required to learn Logica, however.

    -- SQL statement to get expensive books.
    SELECT book_name
    FROM book
    WHERE price > 100;

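To make the "table as predicate" reading concrete, here is a minimal sketch that runs the SQL statement above on an in-memory SQLite database using Python's standard `sqlite3` module. The sample rows are made up for illustration; only the table name `book` and the columns `book_name` and `price` come from the statement above.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the
# SQL statement above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (book_name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Pocket Guide", 15.0), ("Rare Atlas", 250.0), ("Art Folio", 120.0)],
)

# Each row of `book` is one assignment of (book_name, price) for which the
# predicate Book holds; the WHERE clause plays the role of `price > 100`.
expensive_books = sorted(
    name
    for (name,) in conn.execute("SELECT book_name FROM book WHERE price > 100")
)
print(expensive_books)
```

Every row that satisfies both conditions contributes one value of `book_name` to the result, which is how the rule for ExpensiveBook reads as well.
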
A predicate is a statement with variables. Any table can be treated as a predicate, where the column names are the variables and each row is a set of values of the variables that satisfies the statement.

While SQL is quite convenient for small queries like the one above, it gets hard to read as complexity grows. Logica leverages the power of mathematical syntax to scale nicely as complexity grows. Let's assume we have a table BabyNames that, for each name, year, city and gender, specifies the number of babies born with that name. The following program finds a list of popular names, where a name is defined as popular if it was the most popular name in some year.

    # Count babies per year.
    NameCountByYear(name:, year:) += number :-
      BabyNames(name:, year:, number:);

    # For each year pick the most popular.
    TopNameByYear(year) ArgMax= name -> NameCountByYear(name:, year:);

    # Accumulate most popular name into a table, dropping the year.
    PopularName(name: TopNameByYear());

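For comparison, here is one way the same question could be written directly in SQL, run on SQLite through Python's `sqlite3` module. This is a hand-written sketch, not the SQL that Logica generates. The table name `baby_names` and the sample rows are assumptions made for illustration. Note also that when two names tie in a year this query keeps both, which may differ from how ArgMax resolves ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baby_names "
    "(name TEXT, year INTEGER, city TEXT, gender TEXT, number INTEGER)"
)
# Hypothetical sample rows, invented for this sketch.
conn.executemany(
    "INSERT INTO baby_names VALUES (?, ?, ?, ?, ?)",
    [
        ("Emma", 2019, "Austin", "F", 40),
        ("Emma", 2019, "Boston", "F", 35),
        ("Liam", 2019, "Austin", "M", 50),
        ("Liam", 2020, "Austin", "M", 30),
        ("Noah", 2020, "Boston", "M", 45),
        ("Emma", 2020, "Boston", "F", 20),
    ],
)

query = """
WITH name_count_by_year AS (      -- count babies per name and year
  SELECT name, year, SUM(number) AS total
  FROM baby_names
  GROUP BY name, year
),
top_count_by_year AS (            -- highest count within each year
  SELECT year, MAX(total) AS top_total
  FROM name_count_by_year
  GROUP BY year
)
SELECT DISTINCT n.name            -- names that reach the top count in some year
FROM name_count_by_year AS n
JOIN top_count_by_year AS t
  ON n.year = t.year AND n.total = t.top_total
ORDER BY n.name
"""
popular_names = [name for (name,) in conn.execute(query)]
print(popular_names)
```

Even for this small task the SQL needs two intermediate tables and a join, which illustrates the readability point made above.
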
Sometimes data analysis requires solving algorithmic problems, and Logica's syntax is naturally suited to them. Here is a program that finds the prime numbers up to 30.

    # Define numbers 1 to 30.
    Number(x + 1) :- x in Range(30);

    # Defining composite numbers.
    Composite(a * b) distinct :- Number(a), Number(b), a > 1, b > 1;

    # Defining primes as "not composite".
    Prime(n) distinct :- Number(n), n > 1, ~Composite(n);
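
The three rules above can be read as set definitions. The following Python sketch mirrors them one by one with set comprehensions. It is an illustration of the logic only, not of how Logica executes the program, and it assumes that Range(30) yields the integers 0 through 29, as the comment "numbers 1 to 30" implies.

```python
# Number(x + 1) :- x in Range(30);
# Assumption: Range(30) yields 0..29, so this set holds 1..30.
numbers = {x + 1 for x in range(30)}

# Composite(a * b) distinct :- Number(a), Number(b), a > 1, b > 1;
composite = {a * b for a in numbers for b in numbers if a > 1 and b > 1}

# Prime(n) distinct :- Number(n), n > 1, ~Composite(n);
primes = sorted(n for n in numbers if n > 1 and n not in composite)
print(primes)
```

The negation `~Composite(n)` corresponds to the `not in` test: a number is prime when no pair of factors greater than 1 produces it.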

Finally, here is an example of a program that runs over the GDELT Project dataset, finding people mentioned in the context of "artificial general intelligence".

Observe that the program is divided into a rule defining the predicate NewsData and a rule for AgiMentions. The first rule essentially does data cleaning, putting the dataset into a shape that is convenient to use. The second rule then performs the task at hand.

In Logica, problems are naturally split into smaller components that end up being reusable. So if we have more analysis to do with the GDELT dataset in the future, we can take advantage of the NewsData predicate that we just wrote.

    # Structuring the data conveniently.
    NewsData(year:, month:, day:, persons:, quotations:) :-
      `gdelt-bq.gdeltv2.gkg`(persons: persons_str, quotations:, date: date_num),
      # Column `date` in GDELT dataset is given as an integer.
      year == ToInt64(Substr(ToString(date_num), 1, 4)),
      month == ToInt64(Substr(ToString(date_num), 5, 2)),
      day == ToInt64(Substr(ToString(date_num), 7, 2)),
      persons List= (person :- person in Split(persons_str, ";"));

    # Performing the task at hand.
    @OrderBy(AgiMentions, "mentions desc");
    @Limit(AgiMentions, 10);
    AgiMentions(person:, mentions? += 1) distinct :-
      person in persons,
      Like(quotations, "%artificial general intelligence%"),
      NewsData(persons:, quotations:);

This program completes in interactive time when run over the 4TB dataset via BigQuery.
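
The two steps of the program, cleaning and then counting, can be sketched in plain Python on a handful of rows. This only illustrates the logic; it does not touch BigQuery or the real dataset. The sample rows and the helper name `news_data` are hypothetical, and the sketch assumes that Substr uses 1-based positions, as the offsets 1, 5 and 7 in the program suggest.

```python
from collections import Counter

# Hypothetical rows shaped like the three columns the program reads:
# (persons_str, quotations, date_num). This is not real GDELT data.
rows = [
    ("ada lovelace;alan turing", "notes on artificial general intelligence", 20200315),
    ("alan turing", "a talk about artificial general intelligence", 20210102),
    ("grace hopper", "a piece on compilers", 20210102),
]


def news_data(persons_str, quotations, date_num):
    """Mirror of the NewsData rule: split the integer date and the person list."""
    text = str(date_num)
    # Assumption: Substr is 1-based, so Substr(s, 5, 2) corresponds to text[4:6].
    return {
        "year": int(text[0:4]),
        "month": int(text[4:6]),
        "day": int(text[6:8]),
        "persons": persons_str.split(";"),
        "quotations": quotations,
    }


cleaned = [news_data(*row) for row in rows]

# Mirror of AgiMentions: add one mention per person for every matching record.
mentions = Counter(
    person
    for record in cleaned
    if "artificial general intelligence" in record["quotations"]
    for person in record["persons"]
)
top_mentions = mentions.most_common(10)  # order by mentions desc, limit 10
print(top_mentions)
```

Because the cleaning step is a separate function, any further analysis can reuse `cleaned`, which is the same reuse that the NewsData predicate offers.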

Why Logica?

Logica is for engineers, data scientists and other specialists who need to perform complex data processing and analysis. Queries and pipelines written in Logica can run on BigQuery, SQLite and PostgreSQL engines. Information stored in these systems is thus available in Logica.

Logica compiles to SQL and gives you access to the power of SQL engines, including the massively distributed Google BigQuery engine, with the convenience of logic programming syntax. This is useful because BigQuery is orders of magnitude more powerful than state-of-the-art native logic programming engines.

We encourage you to try Logica.

Among other engines, there is partial support for Trino and Databricks. Contributions to improve this support are very welcome!

I have not heard of logic programming. What is it?

Logic programming is a declarative programming paradigm where the program is written as a set of logical statements.

Logic programming has been developed in academia since the late 1960s. Prolog and Datalog are the most prominent examples of logic programming languages. Logica is a successor to Yedalog, a language created at Google earlier. Both Logica and Yedalog belong to the Datalog family.

Datalog and relational databases start from the same idea: think of data as relations and think of data manipulation as a sequence of operations over these relations. But Datalog and SQL differ in how these operations are described. Datalog is inspired by the mathematical syntax of first-order logic, while SQL follows the syntax of natural language.

SQL was based on natural language to give people without formal training in computer programming or mathematics access to databases. This convenience may become costly when the logic that you want to express is nontrivial. There are many examples of hard-to-read SQL queries that correspond to simple logic programs.

Logica follows Yedalog in the attempt to merge these branches back together: extending the elegant syntax of logic programming to solve practical problems and leveraging the tremendous advances of SQL infrastructure for execution.

How does Logica work?

Logica compiles the logic program into a SQL expression, so it can be executed on BigQuery, a state-of-the-art SQL engine. Among database theoreticians, Datalog and SQL are known to be equivalent, and indeed the conversion from Datalog to SQL and back is often straightforward. However, there are a few nuances, for example how to treat disjunction and negation. In Logica we tried to make choices that make the structure of the resulting SQL as easy to understand as possible, thus empowering the user to write programs that execute efficiently.
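
To give a feel for the two nuances named above, the sketch below shows one common way to express disjunction and negation in SQL, run on SQLite through Python's `sqlite3` module. It is hand-written SQL and should not be taken as the output of the Logica compiler. The tables `book` and `rare` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, invented for this sketch.
conn.executescript(
    """
    CREATE TABLE book (book_name TEXT, price REAL);
    CREATE TABLE rare (book_name TEXT);
    INSERT INTO book VALUES ('Pocket Guide', 15), ('Rare Atlas', 250), ('Old Diary', 40);
    INSERT INTO rare VALUES ('Old Diary');
    """
)

# Disjunction: a book is notable if it is expensive OR listed as rare.
# The two alternative conditions become two SELECTs combined with UNION.
notable_sql = """
SELECT book_name FROM book WHERE price > 100
UNION
SELECT book_name FROM rare
"""

# Negation: an ordinary book is a book that is NOT notable.
# Here the negated condition becomes a NOT IN subquery.
ordinary_sql = f"""
SELECT book_name FROM book
WHERE book_name NOT IN ({notable_sql})
ORDER BY book_name
"""

notable = sorted(name for (name,) in conn.execute(notable_sql))
ordinary = [name for (name,) in conn.execute(ordinary_sql)]
print(notable, ordinary)
```

Other encodings are possible, such as `UNION ALL` for disjunction or `NOT EXISTS` and anti-joins for negation, and they can differ in how duplicates and NULL values are handled. Choices of this kind are what the paragraph above refers to.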

Why is it called Logica?

Logica stands for Logic with aggregation.

How to learn?

Learn the basics of Logica with the Colab tutorial located in the tutorial folder. See examples of using Logica in the examples folder. You can try Logica immediately in the browser in the Playground.

It is easy to install Logica on your machine as well.

Installation

Install Logica with `pip`.

    # Install:
    $ python3 -m pip install logica
    # Run:
    $ python3 -m logica
    # (optional) Create alias for convenience:
    $ alias logica='python3 -m logica'

Suppose the following program is saved in a file named hello.l.

    @Engine("sqlite");
    Greeting("Hello world!");

When executed with

    $ logica hello.l run Greeting

it should produce the following table:

    +--------------+
    | col0         |
    +--------------+
    | Hello world! |
    +--------------+

Join the discussion!

If you have any questions or ideas about Logica, you are welcome to post them in the Discussions section of the repo!

Unless otherwise noted, the Logica source files are distributed under the Apache 2.0 license found in the LICENSE file.

Logica is not an officially supported Google product.