Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the Cassandra connector #180

Merged
merged 7 commits into from
Aug 8, 2017
Merged

Conversation

rkuris
Copy link
Contributor

@rkuris rkuris commented Jun 14, 2017

This connector allows for a direct connection to Cassandra from
the client side. To configure the connector, you can either
specify a yaml fragment or a gocql configuration block. The default
is to connect to 127.0.0.1:9042.

This change also adds support for testing via ccm in jenkins.

Most of the DML code is just ported over from our closed source
connector, with a few changes:

  • Removal of any remaining stapi-go code
  • Removal of any remaining UNS code
  • Changes to test infrastructure to use ccm
  • Renamed Resolver to KeyspaceMapper

Other changes in this diff:

  • Upgrade to golang 1.8
  • Increased test timeouts since creating a local C* cluster takes a while
  • Abstracted KeyspaceMapper interface for mapping
    (scope, namePrefix) -> keyspace

TODO:

  • Replication factor and strategy is hardcoded; should be calculated
    based on the cluster
  • Column type compatibility is not currently checked
  • TruncateScope is not implemented
  • Provide a better way to just specify a set of contact points
  • If a generated keyspace name is >48 characters, hash it instead
  • Scope creation can't use the keyspace mapper since it doesn't get
    the namePrefix
  • Use templates for the schema portions of the connector

Copy link
Contributor

@klnusbaum klnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to get through the entire diff, but I wanted to give you what I have so far as there's actionable stuff in here.

Also, you've included some additional stuff that's a little distracting from the main thing we're trying to review. Can you move:

  • Upgrade to golang 1.8
  • Increased test timeouts since creating a local C* cluster takes a while

into their own PR?

KsMapper: ksMapper,
Session: session,
}
c.Session = session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wat. I don't think you need this line, since you do the assignment two lines above.

func init() {
dosa.RegisterConnector("cassandra", func(args dosa.CreationArgs) (dosa.Connector, error) {
// set up defaults for configuration
config := gocql.NewCluster("127.0.0.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move 127.0.0.1 into a named constant, i.e.

const(
localhost = "127.0.0.1"
)

}

if mapper, ok := args["keyspace_mapper"]; ok {
if m, ok := mapper.(KeyspaceMapper); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but what if we're not ok? can we error out and say "hey, you tried to use a KeyspaceMapper that isn't actually a KeyspaceMapper"

"github.com/uber-go/dosa/testentity"
)

func TestNewConnector(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you opted not to use any of the assert package in this particular test function? For instance, I think:

	if err != nil {
 		fmt.Printf("Error connecting to Cassandra: %v", err)
 		return
 	}

Could probably be written more succictly with:

assert.NoError(t, err, fmt.Sprintf("Error connecting to Cassandra: %v", err))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that assert.NoError doesn't abort the test, but I can do better than this anyway; it's inconsistent at best.


var testConnector dosa.Connector

func init() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've ever seen an empty init() function. Is it special? Do we need it for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, code restructuring left it behind

)

func TestNewConnector(t *testing.T) {
_ = GetTestConnector(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you do this just in order to kick ccm and boot up a cluster. Maybe instead of faking it, we actually just provide a public method that boots up the casandra cluster for testing purposes?

expected.SerialConsistency = gocql.Serial
expected.PoolConfig = gocql.PoolConfig{HostSelectionPolicy: gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())}
expected.HostFilter = gocql.DataCentreHostFilter("dca1")
assert.Equal(t, config.ClusterConfig.PoolConfig, expected.PoolConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The arguments for all these assert.Equal calls need to be reversed. The assert package says the second argument to Equals should be what the expected output is, and the third argument should be what you actually got. this actually matters since it affects the way the assert package prints out error messages when an assert fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this code from our closed source side. I'll fix it here, but it also exists still there.

assert.Equal(t, config.ClusterConfig.SerialConsistency, expected.SerialConsistency)
assert.Equal(t, config.ClusterConfig.SocketKeepalive, expected.SocketKeepalive)
assert.Equal(t, config.ClusterConfig.PageSize, expected.PageSize)
// no way I can test this. The hostfilter is an anonymous func
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call the host filter function and see if it gives you a certain output?


var testStore *Connector

func initTestStore(t *testing.T) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to return an error here, since it doesn't seem possible for this function to return an error?

@@ -0,0 1,165 @@
// Copyright (c) 2017 Uber Technologies, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a serious concerns with checking in commented code. It makes greping the codebase a lot harder and will like be out of dat by the time it get's uncommented. Are you planning on uncommenting this out in future revisions of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from the closed source connector. At some point we will be adding multi support, but yeah, we should just delete this for now.

@rkuris rkuris force-pushed the rkuris-os-cassandra-connector branch 5 times, most recently from 935fad4 to 1bd0407 Compare June 19, 2017 15:16
Copy link
Contributor

@klnusbaum klnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was way easier to look at. I think everything we need is here now, but there's just some clean up we have to do before landing it. Really excited about these changes!

}

ctx := context.Background()
if err := c.CreateScope(ctx, "example"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you wanna do an assert here like you do for "Error connecting to Cassandra"?

}
ei, _ := dosa.TableFromInstance(&testentity.TestEntity{})
_, err = c.UpsertSchema(ctx, "example", "example", []*dosa.EntityDefinition{&ei.EntityDefinition})
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you wanna do an assert here like you do for "Error connecting to Cassandra"?

"int64key": int64(11),
})

if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you wanna do an assert here like you do for "Error connecting to Cassandra"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The test will fail when you call t.Fatal, with enough diagnostics (the error message) to tell you what the problem is, since a stack is included.


tokenBytes, err := base64.StdEncoding.DecodeString(token)
if err != nil {
return nil, "", common.NewErrBadToken(fmt.Sprintf("bad token: %s, err: %s", token, err.Error()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we no longer want to return an error of type ErrBadToken


expectedUUIDSet := make(map[dosa.UUID]struct{})
expectedIntValueSet := make(map[int]struct{})

// remove any entities from any previous test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just move this code into it's own function called clearTestStore. Then you can get rid of the docstring because the function name documents what the code does for you.

}

// compareStructToSchema compares a dosa EntityDefinition to the gocql TableMetadata
// There are two main cases, one that we can fix by adding some columns and all the other mismatches that we can't fix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding to the docs here that the function only returns an error in the case of a non-reparable mis match being found. That wasn't quite clear to me on the first pass.

// drop the old scope, ignoring errors
_ = c.DropScope(ctx, scope)

ksn := CleanupKeyspaceName(scope)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we use the keyspace mapper here to determine the keyspace?

@@ -37,17 55,197 @@ func (c *Connector) TruncateScope(ctx context.Context, scope string) error {

// DropScope is not implemented
func (c *Connector) DropScope(ctx context.Context, scope string) error {
panic("next diff")
ksn := CleanupKeyspaceName(scope)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we use the keyspace mapper here to determine the keyspace?

return c.Session.Query(createTableString(keyspace, ed)).Exec()
}

func createTableString(keyspace string, ed *dosa.EntityDefinition) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When generating CQL for all of our crud and queries we use go's templating framework, why not also use the templating here? I think it will make the code way easier to read.

"github.com/stretchr/testify/assert"
"github.com/uber-go/dosa"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our code for creating a table string is way too complex to not have tests. Please add tests for that.

@rkuris rkuris force-pushed the rkuris-os-cassandra-connector branch from d8663ae to 3e9396f Compare June 21, 2017 01:37
Copy link
Contributor

@klnusbaum klnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to go. Just a few small things worth thinking about before merging.

// CreateScope creates a keyspace
func (c *Connector) CreateScope(ctx context.Context, scope string) error {
// drop the old scope, ignoring errors
_ = c.DropScope(ctx, scope)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems really dangerous, to just implicitly drop a scope. Can we do better here?

(meant to leave this comment in the previous review, but it looks like it got dropped some how).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do better, but that's not the job of this connector to do that. There should be some additional connectors for authn and authz to prevent doing things like performing scope operations. That connector is TBD. By the time you get here, it's assumed you have appropriate rights to do this.

In the office, we should probably have a connector that holds our "rules" about not dropping production scopes. They need to be a bit more complicated though -- perhaps only dosa-group members should be able to drop production scopes. We've had a few tickets about this already.

// drop the old scope, ignoring errors
_ = c.DropScope(ctx, scope)

ksn := CleanupKeyspaceName(scope)
Copy link
Contributor

@klnusbaum klnusbaum Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In here, and in DropScope, we directly map the the scope to the keyspace name using CleanupKeyspaceName. But in CheckSchema we actually use the KeySpaceMapper associated with the connector to determine the keyspace to use.

I'm guessing this has to do with some of the other problems related around keyspace, scope, and prefix mapping that we've talked about offline. Can we at least leave a TODO here to let people know that this inconsistent behavior is going to be fixed soon?

(meant to leave this comment in the previous review, but it too seemed to have gotten dropped some how).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you actually explained this the description for the PR. My bad.

// parameters were provided, so lets use them
if args != nil {
// TODO the most common case is hosts-only so make that easier
if configYaml, ok := args["yaml"]; ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing, the config should already be yaml, so why would there be a key in the args called "yaml"? Does this mean the config would look like this:

connector:
  name: cassandra
  yaml: ???

I would suggest defining a configuration type similar to the other connectors so that it's clear what configuration is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should have all the connectors take either a snippet of configuration yaml or a structure that is specific to the connector, or possibly a K/V set. The problem with the latter is that sometimes you need more depth than that. We should really clean this up though, but that's a problem for another PR.

@rkuris rkuris force-pushed the rkuris-os-cassandra-connector branch from 3e9396f to db7e12f Compare July 3, 2017 02:44
Also increases the timeout, in preparation for cassandra connector
This is just for reference to make the diff easier
to review.
@rkuris rkuris force-pushed the rkuris-os-cassandra-connector branch from db7e12f to 031a3e8 Compare July 3, 2017 13:22
rkuris and others added 5 commits July 3, 2017 06:23
This shows the differences from the closed source connector and
should be the focal point of the review.

The schema work is in the next diff. The tests do not pass yet because
they depend on the schema being created by the schema diff.

This connector allows for a direct connection to Cassandra from
the client side. To configure the connector, you can either
specify a yaml fragment or a gocql configuration block. The default
is to connect to 127.0.0.1:9042.

Most of the DML code is just ported over from our closed source
connector, with a few changes:
  - Removal of any remaining stapi-go code
  - Removal of any remaining UNS code
  - Changes to test infrastructure to use ccm
  - Renamed Resolver to KeyspaceMapper
  - Abstracted KeyspaceMapper interface for mapping
    (scope, namePrefix) -> keyspace

TODO:
  - Provide a better way to just specify a set of contact points
  - If a generated keyspace name is >48 characters, hash it instead
This is the schema portion of the Cassandra connector work.
The approach for schema management is to always run CheckSchema,
and when it fails, examine the error. CheckSchema will return a
RepairableSchema error containing the necessary changes, which
UpsertSchema converts into a series of create and alter statements.

TODO:
 - Replication factor and strategy is hardcoded; should be calculated
   based on the cluster
 - Column type compatibility is not currently checked
 - TruncateScope is not implemented
 - Scope creation can't use the keyspace mapper since it doesn't get
   the namePrefix
Straight copy from closed source bug fix
SQUASHME ON MERGE
 - Added unit tests to createTableString
 - Moved the create table code to the cql generator
 - Renamed VerifyOrStartCassandra to EnsureLocalCassandraStarted
 - Broke compareStructToSchema into three subfunctions,
   checkPartitionKeys, checkClusteringKeys, and checkColumns
@rkuris rkuris merged commit 9b7197e into master Aug 8, 2017
@rkuris rkuris deleted the rkuris-os-cassandra-connector branch September 27, 2017 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants