The Final Artefact

Yet another blog on coding and business

Showing ChatGPT How to Draw

Background For a computer, an image is collection of interpretable instructions that amount to a visual representation. Raster images are composed from using pixels, containing unique colours, whereas vector images keep track of points and equations that join them. In \(\LaTeX\) PGF/TikZ is used to generate vector graphics from algebraic descriptions. TikZ is mostly used to conveniently draw various scientific figures. ChatGPT is capable of generating computer code in majority of popular languages....

June 14, 2023 · 4 min · Konrad Zdeb

Using RScript for R Installation Managment

Most frequently, users tend to undertake common R installation and management tasks from within the R session. Frequently making use of commands, like install.packages, update.packages or old.packages to obtain or update packages or update/verify the existing packages. Those common tasks can also be accomplished via the GUI offered within RStudio, which provides an effortless mechanism for undertaking basic package management tasks. This is approach is usually sufficient for the vast majority of cases; however, there are some examples when working within REPL1 to accomplish common installation tasks is not hugely convenient....

January 3, 2022 · 5 min · Konrad Zdeb

Beauty of R and Big-O

Big-O The purpose of this is not to provide yet another primer on the Big-O/\(\Omega\)/\(\Theta\) notation but to share my enduring appreciation for working with R. I will introduce Big-O only briefly to provide context but I would refer all of those who are interested to the linked materials. What is Big-sth notation… When analysing functions, we may be interested in knowing how fast a function grows. For instance, for function \(T(n)=4n^2-2n+2\), after ignoring constants, we would say that \(T(n)\) grows at the order of \(n^2\)....

December 9, 2021 · 6 min · Konrad Zdeb

R-based metaprogramming strategies for handling Hive/CSV interaction (Part I, imports)

Background Handling Hive/CSV interaction is a common reality of many analytical and data environments. The question on exporting data from Hive to CSV and other formats is frequently raised on online forums with answers frequently suggesting making use of sed that combined with nifty regular expressions pipes Hive output into a flat CSV files as an exporting solution. Import of large amounts of data is best handled by suitable tools like Apache Flume....

August 13, 2021 · 9 min · Konrad Zdeb

Why regex is not fuzzy matching

Recently, I cam across an interesting discussion on StackOverflow1 pertaining to approach to fuzzy matching tables in R. Good answer contributed by one of the most resilient and excellent contributors to whom I owe a lot of thanks for help suggested relying on regular expression, combining this with basic sting removal and transformations like toupper to deterministically match the tables. The solution solved the problem and was accepted. So what’s wrong… With this particular problem/solution pair, there is absolutely nothing wrong....

June 29, 2021 · 7 min · Konrad Zdeb

On Sorting Arrays...or why it's good to read the actual assignment

Problem Solving challenges on project Euler or HackerRank is a good past time. For folks working in the wider analaytical / data science field, places like project Euler provide an excellent opportunity to work with academic programming concepts that do not frequently appear in real-life. I was looking at common problem: You are given an unordered array consisting of consecutive integers [1, 2, 3, …, n] without any duplicates....

May 23, 2021 · 4 min · Konrad Zdeb

Using R for File Manipulation

Challenge File manipulation is a frequent task unavoidable in almost every IT business process. Traditionally, file manipulation tasks are accomplished within the ramifications of specific tools native to a given system. As such, the one may consider writing and scheduling shell script to undertake frequent file operations or using more specific purpose-built tools like logrotate in order to archive logs or tools like Kafka are used to build streaming-data pipelines....

March 29, 2021 · 6 min · Konrad Zdeb

Inserting Data into Partitioned Table

Rationale Maintaining partitioned Hive tables is a frequent practice in a business. Properly structured tables are conducive to achieving robust performance through speeding up query execution (see Costa, Costa, and Santos 2019). Frequent use cases pertain to creating tables with hierarchical partition structure. In context of a data that is refreshed daily, the frequently utilised partition structure reflects years, months and dates. Creating partitioned table In HiveQL we would create the table with the following structure using the syntax below....

February 26, 2021 · 7 min · Konrad Zdeb

Poor Man's Robust Shiny App Deployment (Part II)

Introduction This article draws on the past post concerned with utilisation of golem for robust deployment of analytical and reporting solutions. For this article, we will assume that we are working with defined working requirements that utilise some of the Labour Market Statistics disseminated through the nomis portal. Change Plan What we have Reporting requirements Past scripts we used to create reports with accompanying instructions What we want...

February 12, 2021 · 3 min · Konrad

Poor Man's Robust Shiny App Deployment

Not so uncommon problem… RStudio Connect and more modest Shiny Proxy come to mind as most obvious solutions for deploying Shiny applications in production. Application servers are ideal for deploying applications that are to be consumed on a regular basis by larger audiences. In addition to serving the application, managing dependencies and user access or logging user activity are common tasks we would expect for a publishing platform to address....

July 23, 2020 · 5 min · Konrad