# Data sources By __data sources__ we mean databases and other sources of data. A data source provides access to any kind of tabular data. The library has standard data sources for PostgreSQL (via PsycoPg2 and PsycoPg3), MySQL, Sqlite3, Sparql, and its simple built-in in-memory db MemoryDB. A data source is used by a model to retrieve and store rows of data. ## PostgreSQL (PsycoPg2) This code shows how to create data source based on a Postgres database, accessed via PsycoPg2. PsycoPg2 can be installed with `pip install psycopg2-binary`. ~~~python from richard.data_source.PsycoPg2DataSource import PsycoPg2DataSource import psycopg2 connection = psycopg2.connect( database='richard', # Your database host='127.0.0.1', user='patrick', # Your username password='test123', # Your password port=5432 ) ds = PsycoPg2DataSource(connection) ~~~ ## Postgres (PsychoPg3) This code shows how to create data source based on a Postgres database, accessed via PsycoPg3. PsycoPg3 can be installed with `pip install "psycopg[binary]"`. ~~~python from richard.data_source.PsycoPg3DataSource import PsycoPg3DataSource import psycopg with psycopg.connect( dbname='richard', # Your database host='127.0.0.1', user='patrick', # Your username password='test123', # Your password port=5432 ) as connection: ds = PsycoPg3DataSource(connection) ~~~ ## MySQL This code shows how to create data source based on a MySQL database. The MySQL connector can be installed with `pip install mysql-connector-python"`. ~~~python from richard.data_source.MySqlDataSource import MySqlDataSource import mysql.connector connection = mysql.connector.connect( host="localhost", user="patrick", # Your username password="test123", # Your password database="richard" # Your database name ) ds = MySqlDataSource(connection) ~~~ ## SQLite3 This code shows how to create data source based on a SQLite3 database. ~~~python from richard.data_source.Sqlite3DataSource import Sqlite3DataSource connection = sqlite3.connect(':memory:') ds = Sqlite3DataSource(connection) ~~~ ## SPARQL There's a basic adapter for SPARQL databases, that will likely need to be extended for practical use. Here's example code that creates a custom class (here: WikidataDataSource) that wraps `SparqlDataSource` with custom configuration. The reason for this approach is that each of the methods of the base class can be overridden as needed. ~~~python class WikidataDataSource(SparqlDataSource): def __init__(self, result_cache_path: bool=None): """ It's important to add a proper User Agent or you will get many 403 denied responses If you intend to use this data source for your own application, change it to something personal see also: https://foundation.wikimedia.org/wiki/Policy:User-Agent_policy """ super().__init__("https://query.wikidata.org/sparql", result_cache_path=result_cache_path, headers={ "User-Agent": "YOUR_REPO_NAME/1.0 (https://github.com/PATH_TO_YOUR_REPO; YOUR_EMAIL_ADDRESS) YOUR_REPO_NAME/VERSION" } ) ~~~ The `result_cache_path` is the name of a directory where you can cache SPARQL results. Caching will cause a big performance gain for recurring queries. ## Create a custom data source Data sources for other relational databases can be added easily. Other services can be added as data source as well. And this will be described here. Any data source takes the shape of an adapter that makes the underlaying data available via this standard signature: ~~~python def select(self, table: str, columns: list[str], values: list[Simple]) -> list[list[Simple]]: ~~~ Compare this signature to an SQL select query: ~~~sql SELECT `columns` FROM `table` WHERE `column1`=`value1` AND `column2` = `value2`; ~~~ Note that same columns are both used in the "select" and the "where" clauses. Note that if a value is `None`, it must be omitted from the "where" clause. To add a new data source, copy an existing one that best looks like the one you need, and make changes to it. To give you an idea, here's the implementation of `PsycoPg2DataSource`: ~~~python def select(self, table: str, columns: list[str], values: list[Simple]) -> list[list[Simple]]: import psycopg2 where = "TRUE" variables = [] for column, value in zip(columns, values): if value is not None: where += f" AND {column}=%s" variables.append(value) cursor = self.connection.cursor(cursor_factory=psycopg2.extensions.cursor) select = ','.join(columns) cursor.execute(f"SELECT {select} FROM {table} WHERE {where}", variables) return [list(row) for row in (cursor.fetchall())] ~~~ If the data source is able to write as well as read, it implements this signature: ~~~python def insert(self, table: str, columns: list[str], values: list): ~~~ Compare this signature to an SQL insert query: ~~~sql INSERT INTO `table` (`column1`, `column2`) VALUES (`value1`, `value2`); ~~~