sparkutils / frameless   0.17.0-RC5

Apache License 2.0

Scala versions: 2.13 2.12

frameless

A wrapped and published fork of the excellent frameless library

Why?

Over the last 4 years some frameless PRs have lagged in release, often lagging behind the OSS Spark releases as well. Whilst this is a completely understandable and acceptable part of the process, it can hamper the development and release of software that uses frameless (Quality, for example), and differences between runtimes such as Databricks complicate matters further. Spark 3.5 support took 5 months to release, which is not too surprising given the amount of change.

The major shift proposed by #800 amps up the level of change and, although it may ease such changes in the future, introduces another unproven dependency in shim. It is more than reasonable to expect that this change requires even more care to release in frameless proper.

In order to test in a corporate setting the software needs to be a full-blown release on Maven Central. Building local snapshots is not always straightforward, and it is worse still if you need to depend on them.

com.sparkutils.frameless aims to fill that void.

What com.sparkutils.frameless is not

It's not frameless; it just behaves the same, uses the same packages and stays fairly up to date with frameless proper. Check the release logs for confirmation of functionality.

How does com.sparkutils.frameless relate to the rest of com.sparkutils?

com.sparkutils.frameless builds upon typelevel frameless targeting OSS Spark and uses shim to run on a wider variety of runtimes without a major release. Quality, as of 0.1.3, uses com.sparkutils.frameless as a provided scope artefact. testless provides a shaded test pack against com.sparkutils.frameless (and possibly frameless proper in the future, if #800 is merged) to test against various runtimes, primarily Databricks.
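As a rough illustration of what a provided scope dependency looks like for a library that builds on com.sparkutils.frameless, a minimal sketch follows. The artifactId is assumed from the naming scheme shown under "How to use" (Spark 3.5, Scala 2.12) and the version is taken from the Versions table; whether that exact combination is published is an assumption, so check Maven Central before relying on it.

```xml
<!-- Sketch only: coordinates are assumed from the naming scheme, not confirmed -->
<dependency>
    <groupId>com.sparkutils</groupId>
    <!-- frameless-dataset + _<spark major.minor> + _<scala binary version> -->
    <artifactId>frameless-dataset_3.5_2.12</artifactId>
    <version>0.17.0-RC1</version>
    <!-- provided: needed to compile, but the application or runtime supplies the jar -->
    <scope>provided</scope>
</dependency>
```

With provided scope the dependency is on the compile and test classpaths but is not passed on transitively, so the consuming application chooses which frameless build ends up on its own classpath.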

How?

  1. Git submodules - the actual source code will be taken from a specific commit of the https://github.com/chris-twiner/frameless/ fork. Code will be kept in sync on demand.
  2. Maven - SBT builds are not straightforward to achieve in some corporate environments, and keeping sbt plugins updated and available can be almost a full-time role. As such the build moves to Maven, with build helper used to access the correct source locations.
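The combination of a submodule checkout and build helper could look roughly like the fragment below. This is a sketch, not the project's actual pom: it assumes "build helper" means the build-helper-maven-plugin, the submodule directory name `frameless` is hypothetical, and the plugin version is just one known release.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <!-- assumed version; any recent release should behave the same -->
            <version>3.4.0</version>
            <executions>
                <execution>
                    <id>add-frameless-sources</id>
                    <!-- register the extra directory before compilation starts -->
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <!-- hypothetical path: submodule checked out at ./frameless -->
                            <source>${project.basedir}/frameless/dataset/src/main/scala</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

The `add-source` goal adds the listed directories to the module's compile source roots, which lets a Maven module compile code that lives inside the submodule without copying it.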

Why should I use this?

Use it when you need some of the features in the build which are not yet (or possibly won't ever be) included in the official release. If this isn't the case, use the official library.

Will com.sparkutils.frameless continue to exist if 800 is merged and released?

The repo won't go away, nor will the occasional need. However, it's strongly recommended to use the official library wherever possible.

What if I find a bug?

If the bug is also found in frameless proper, raise an issue there. If the issue is one of OSS version changes, look at testless' tested combos to see if and when your combination has been tested, and re-test the nearest combo. It's possible, for example, that a test suite that ran perfectly on a Databricks runtime suddenly stops working due to backported fixes or improvements. It may be that a new, more specific shim version is required for that runtime rather than any change in frameless proper or com.sparkutils.frameless, in which case raise an issue on shim.

If the bug is in functionality affected by the use of any com.sparkutils.frameless specific functionality, then raise it here.

Versions

com.sparkutils.frameless starts off from the 0.16 release of frameless proper and publishes artifacts against the Spark major.minor version.

| Version | Based On | Released | Extras |
|---|---|---|---|
| 0.17.0-RC1 | 0.16.0 | 8th April 24 | #800 - shim usage, #805 - correct Seq/Set encoding and #806 - correct eval implementation for UDF. |

How to use

In order to depend upon both typelevel frameless and com.sparkutils.frameless the following scheme is needed (use the profiles to swap between versions; see testless' pom for a thorough example):

<pom>
<profiles>
    <profile>
        <!-- actual frameless -->
        <id>0.17.0-3.3.4</id>
        <properties>
            <framelessVersion>0.16.0-78-b880261-SNAPSHOT</framelessVersion>
            <framelessRuntime>0.17.0-3.3.2</framelessRuntime>
            <framelessOrg>org.typelevel</framelessOrg>
            <framelessCompatVersion>-spark33</framelessCompatVersion>
            <framelessCoreCompatVersion></framelessCoreCompatVersion>
        </properties>
    </profile>
    <profile>
        <!-- sparkutils frameless -->
        <id>sparkutils-0.17.0-3.5</id>
        <properties>
            <framelessVersion>0.17.0-SNAPSHOT</framelessVersion>
            <framelessRuntime>sparkutils_0.17.0-3.5</framelessRuntime>
            <framelessOrg>com.sparkutils</framelessOrg>
            <framelessCompatVersion>_3.5</framelessCompatVersion>
            <framelessCoreCompatVersion>_3.5</framelessCoreCompatVersion>
        </properties>
    </profile>
    
</profiles>
    
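<properties>
    <shimRuntime>14.3.dbr</shimRuntime>
    <shimRuntimeVersion>0.0.1-RC4</shimRuntimeVersion>
    <!-- assumed example values for the Spark and Scala versions used in the artifactIds below; adjust to your build -->
    <sparkCompatVersion>3.5</sparkCompatVersion>
    <scalaCompatVersion>2.12</scalaCompatVersion>
</properties>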
    
<dependencies>
    <!-- shim runtime of your choosing -->
    <dependency>
        <groupId>com.sparkutils</groupId>
        <artifactId>shim_runtime_${shimRuntime}_${sparkCompatVersion}_${scalaCompatVersion}</artifactId>
        <version>${shimRuntimeVersion}</version>
    </dependency>

    <dependency>
        <groupId>${framelessOrg}</groupId>
        <!-- framelessCoreCompatVersion is used as sparkutils.frameless always publishes the spark major.minor -->
        <artifactId>frameless-core${framelessCoreCompatVersion}_${scalaCompatVersion}</artifactId>
        <version>${framelessVersion}</version>
        <exclusions>
            <!-- exclude the shim -->
            <exclusion>
                <groupId>com.sparkutils</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>${framelessOrg}</groupId>
        <artifactId>frameless-dataset${framelessCompatVersion}_${scalaCompatVersion}</artifactId>
        <version>${framelessVersion}</version>
        <exclusions>
            <!-- exclude the shim -->
            <exclusion>
                <groupId>com.sparkutils</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
</pom>
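
One way to swap between the two profiles above is to make one of them the default and select the other on the command line. The fragment below is a sketch of that idea using Maven's standard `activeByDefault` activation; the choice of the sparkutils profile as the default is an assumption for illustration only.

```xml
<profile>
    <!-- sparkutils frameless, used unless another profile is selected -->
    <id>sparkutils-0.17.0-3.5</id>
    <activation>
        <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
        <framelessVersion>0.17.0-SNAPSHOT</framelessVersion>
        <framelessRuntime>sparkutils_0.17.0-3.5</framelessRuntime>
        <framelessOrg>com.sparkutils</framelessOrg>
        <framelessCompatVersion>_3.5</framelessCompatVersion>
        <framelessCoreCompatVersion>_3.5</framelessCoreCompatVersion>
    </properties>
</profile>
```

Running `mvn -P 0.17.0-3.3.4 test` would then build against typelevel frameless instead, since Maven switches off `activeByDefault` profiles when another profile in the same pom is activated explicitly.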