Curious but Lazy
Sunday, September 28, 2025
Thursday, August 1, 2024
Analyzing CSV files using SQL with Squirrel SQL client and Apache Calcite JDBC driver
Intro
The JDBC driver
The way this driver works is that instead of the database name, we have to point to a JSON file that defines the database model. This file will in turn specify a directory that holds the csv files. And all the csv files under that directory will be considered as tables in that model. The CSV files need to have a header for column names.
The SQL client
Installation
Configuring the client
As you see, its a schema created using the CsvSchemaFactory, that points to the directory holding our Csv files.
Sunday, July 28, 2024
Analyzing CSV files with SQL using Squirrel SQL Client and csvjdbc driver
Note
Hola. So being an SQL fan, i somehow find it much easier to work on Excel files in SQL, rather than Excel itself. SQL is so beautiful, so easy, yet so powerful !
The JDBC driver
Firstly, we need a JDBC driver that can read CSV files. I am using the one from https://github.com/simoc/csvjdbc. Its read-only, and that's okay for me, since i only wish to analyze the files. The jar can be downloaded from maven : https://mvnrepository.com/artifact/net.sourceforge.csvjdbc/csvjdbc/1.0.42
The way this driver works is that instead of the database name, we need to specify a directory that holds the csv files. And all the csv files under that directory will be considered as tables in that database. The CSV files need to have a header for column names. The separator can be mentioned in the jdbc url.
The SQL client
Installation
Configuring the client
Wednesday, November 1, 2023
IOT with the Raspberry Pi Pico
A few months ago, I got a requirement for a POC, to create a system to monitor an UPS. The inverter would provide its status, e.g. "Mains On", "Inverter On", "Battery Voltage", "Mains High", "Mains Low" etc. over a RS232 serial port. We had to read it and send it over the net to our website where the data would be stored and monitored. The readings were to be taken every 1 minute.
The Microcontroller
- Its a powerful microcontroller board, with many IOs including UART ports.
- It has a Pico W version, which comes with WiFi connectivity.
- Its cheap.( The base version is around Rs 350)
- It can be programmed with MicroPython, which is much easier to program in than C, and the Thonny IDE is very user friendly. Kudos to the MicroPython and Thonny devs !
Development
- As a USB storage device. This mode will be used only once at the start, to copy the Micropython image to the Pico's internal storage. This is necessary to support the Micropython programming environment.
- As a USB device that interacts with the Thonny IDE to develop and test the Micropython programs.
Some Gotchas
- Voltage compatibility - The serial port signals were 0-5V, whereas the Pico's IO, like the Pi, is 0-3.3V. There are various solutions for this, from transistors to chips, but I did it with a resistive voltage divider, since the RS232 baud rate used was low.
- Male and Female pinout differences - This may not be immediately clear, but is obvious when you think of it. Male and Female pinouts are different, since Rx in one has to go to Tx in another and so on. So lookup the pinout as per the type of socket.
- UART parameters matching - When you initialize the Pico's UART, make sure that its parameters like baud-rate, parity, start-stop bits match that of the port you are interfacing with.
- Reads and timeouts - If you read multiple bytes at a time, and it times out, you may be get lesser data than what you asked for.
- Sleeps - Sleeps in between can make you miss data that is being sent in a stream.
- Micropython differences - Since Micropython is not the full python version, you will not have some commonly used classes, e.g. not all data structures are available.
Tuesday, August 22, 2023
Git for Svners
Since Git is a distributed repository system, each developer has his own repository, that keeps track of local changes, as well as can sync with one or multiple remote repositories. Hence the concepts of Git are rather different from SVN, and the same terms like checkout mean different things in SVN and GIT. Being an SVNer earlier, i too found Git frustrating to start with.
Atlassian has a good tutorial that explains these concepts :
https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud
Some important concepts :
- The local repository is a full fledged repository, and independently holds a history of its own branches, commits etc. One could create and work with a local repository, without ever connecting to a remote repository, say for some private work that is not shared.
- We usually start by cloning a remote repository to a local one. See clone. This creates a copy of the remote repository to the local, as well as adds a reference to the remote repository usually as "origin" to the local repo's list of remotes( See the remote command). Thus, when syncing with the original repository, we can specify it as "origin".
- The checkout command does NOT checkout files from remote. Instead it checks out the specified branch to the working directory. i.e. that branch now becomes the current one.
- GIT has a staging area where the changes to be committed are kept. We specify which files are to be staged by using the add command. Without adding, the commit command will have nothing to work on.
- When you push your changes to the remote, you are syncing changes committed in your local repo to the remote one. So you must have committed them first to your local repo. This may not be obvious to SVNers, who would expect the locally updated files to be automatically pushed !
- TODO : See the branch command
- Unlike SVN, we do not need separate working folders for branches. We can switch to another branch in the same working folder, using the checkout command. This means that uncommitted changes can be lost, unless we stash them to a temp location.
Maven for ANTers
So here's a quick summary :
- Maven executes Goals, just like Ant has targets.
- In addition, maven has Phases, each phase being a list of goals. We can also tell maven to execute one or more phases, and then each phase will execute the goals grouped under it. Phases themselves have an order, so that if we execute a given phase, the phases coming before that in the predefined order will be executed first.
- The default phases are validate, compile, test, package, verify, install, deploy.
- Each maven execution run happens in a lifecycle, which has phases under it. The default lifecycles are default, clean and site.
- We can create and define our custom goals. These are packaged in a Plugin.
- Maven also allows us to specify and manage dependencies of the project.
- Maven resources have a groupId, artifactId and version.
- Maven provides for a repository to fetch/store the artifacts. A central maven repository is the default .Also, maven keeps a local repository on the system where its run, to avoid fetching from the remote repository each time.
See also https://www.baeldung.com/maven-goals-phases
Read these links first for a basic understanding :
https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
In ANT, we specify tasks and their dependencies. We can execute a specific task, and only that, along with its specified dependent tasks will be executed. But this also means that tasks like clean, compile, generate, copy, package, install, deploy etc have be specified in details for all projects, along with config like source, staging, target dirs etc. Also external dependencies like jars have to be managed, along with their versions. These may be common amongst multiple projects. Should we check these into source control, or have a separate common location ? What if we want to add a common functionality to all builds ? How do we name and version output artifacts ?
Maven tries to answer these questions. It tries to provide a standard way to execute projects, by promoting convention over configuration :
- Standardized project directory structure. e.g. The src/main/java, src/main/resources src/test etc
- Naming conventions for articfacts using groupid, artifactid, version
- Providing a dependency configuration, and a repository mechanism to store and access needed dependencies
- Out-of-the box implementation of standard build lifecycle, so that a project can be build with minimum configuration
- An inheritance mechanism, so that a build may be shared amongst multiple projects, each overriding only the parts needed.
- Profiles to have different builds for different case, e.g we might just need compile in dev mode, and the full jar with dependencies may have to be built in production mode.
So if we have a simple java project with no dependencies, then a minimal pom with just the groupid, artifactid and version id will be able to build the project from clean to deploy.
Maven works with lifecyles, phases and goals. It has default implementations of these. We can also create our own lifecycles, which is a list of phases, in order. Then there are goals, that execute in a particular phase which do the actual work. e.g. in the default lifecycle, the install phase has a goal install:install.
Goals are usually implemented using java classes called plugins. When creating plugins, we can specify what phase the plugin should execute, tho its not mandatory. The phase in which a plugin executes can also be specified via the build configuration, and this will override the default.
What can we execute with the maven command ?
e.g. we can execute just the install goal using mvn instal:install, instal:install being the name of the goal in the install phase. Earlier phases will not be run. However, it can cause problems like in this case, since this goal looks for the anme of the output jar to install in the execution context, which is missing. In this case, it can be remedied by also installing the jar:jar goal before, so that it gets the jar name.
What does the maven build file consist of ?
- The groupid, artifactid and version id of the project being built, this the minimal info needed.
- The list of dependencies needed by the project.
- The list of dependency repositories, if using any other than the standard maven one.
- The list of repositories if any, to deploy/publish the final artifact( usually jar, war etc)
- The build section to use and configure non-standard maven plugins, in which phase and with what params are they to run. Similarly to customise/change execution of the standard maven plugins.
- A profiles
Saturday, April 9, 2022
What to watch on Prime Video
Movies
English
Youth
The Unbearable Weight of Massive Talent
Dune
Kimi
The tender bar
Notting Hill
Many Tom Cruise movies, including the MI series
Many James Bond movies
Pride and Prejudice
Sense and Sensibility
The Matrix
The Jurrasic movies
The Harry Potter movies
The Lord of the rings
The tomorrow war
Before I fall
2067
Another round
Indian
Sharmaji Namkeen
Sherni
Missing
Bonus
Dev bhoomi